Django XML External Entities (XXE) Guide: Examples and Prevention

StackHawk|April 20, 2022

Django XML external entities are deadly. Attackers use them for DoS attacks and to steal confidential data. Here's how to protect yourself.


XML External Entity (XXE) attacks are a way to bypass security firewalls and coerce an application into downloading a threat to itself or sharing information with an attacker. These attacks often lead to loss of confidential information and denial of service outages.

In this article, we'll look at Django XML External Entities and how you can protect yourself from XXE attacks in Django.

XML External Entity Attacks

XXE attacks are injection attacks that take advantage of an application's willingness to process dangerous XML documents.

These documents use XML constructs to interfere with the application's expected behavior. 

Before describing how these attacks function, we should discuss how XML documents are formed. They're composed of entities that hold or refer to content. If an entity already holds its content, it is an internal entity. If it instead contains a pointer to content elsewhere, it is an external entity.

Since you're already familiar with the Internet and blogs, you already know what a URL is. Many XML external entities use URIs, which are a superset of URLs. They look a lot like links in HTML pages, but they can point to a wide variety of different resources. URIs may refer to content or resources—entities—on the host running the application or elsewhere, like the local network or the public Internet. 

Entities can also refer to content defined in the document itself. This can include constructs such as dictionaries that map entity names to terms. Even these internal entities can be used in an attack, as we'll see below.

The Anatomy of an Entity

Let's look at a document with an external entity. 

This document defines an external entity on line 5, inside the DOCTYPE declaration. The entity's name is xxe, and it points to an external website at https://www.example.com/login.

<?xml version="1.0" encoding="ISO-8859-1"?>
   <!DOCTYPE foo [
        <!ELEMENT foo ANY >
        <!ELEMENT bar ANY >
        <!ENTITY xxe SYSTEM "https://www.example.com/login" >
   ]>
   <foo>
    <bar>&xxe;</bar>
   </foo>

So, when the document's body refers to &xxe;, an XML parser that expands external entities will retrieve the contents of https://www.example.com/login and insert them in the document between the <bar> tags.
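
To see entity expansion in isolation, here is a minimal sketch using Python's standard xml.etree.ElementTree module and a harmless internal entity. The entity name greeting and its replacement text are made up for illustration. An external entity is referenced in the body the same way; only its declaration differs.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: "greeting" is an internal entity, so its
# replacement text lives in the DOCTYPE itself.
DOCUMENT = """<?xml version="1.0"?>
<!DOCTYPE foo [
    <!ENTITY greeting "hello from an entity">
]>
<foo>
    <bar>&greeting;</bar>
</foo>"""

# The parser substitutes the entity's text wherever &greeting; appears.
root = ET.fromstring(DOCUMENT)
bar_text = root.find("bar").text
print(bar_text)
```

The printed text is the entity's replacement text rather than the literal reference, which is exactly the substitution an attacker hopes to abuse.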

XML Entity Attacks in Action

Now that we know what an XXE looks like, let's examine a few of the more common flavors of XXE attack, then see how they look in Django.

Malicious Data Injection

If an attacker can coerce your application into retrieving data and inserting it into a document, they can add malicious content like phishing forms and URL redirects. 

So, our example entity is a potential injection attack. It inserts a login form from someone else's website.

<?xml version="1.0" encoding="ISO-8859-1"?>
   <!DOCTYPE foo [
        <!ELEMENT foo ANY >
        <!ELEMENT bar ANY >
        <!ENTITY xxe SYSTEM "https://www.example.com/login" >
   ]>
   <foo>
    <bar>&xxe;</bar>
   </foo>


Network Snooping Attacks

External entities don't have to point to the public Internet. They can point at internal addresses, too. 

So let's change the URI in our sample entity: 

<?xml version="1.0" encoding="ISO-8859-1"?> 
   <!DOCTYPE foo [
        <!ELEMENT foo ANY >
        <!ELEMENT bar ANY >
        <!ENTITY xxe SYSTEM "https://10.1.1.1/login" >
   ]>
   <foo>
    <bar>&xxe;</bar>
   </foo>

If this URL exists, its contents will end up in the document. If it doesn't, an attacker can still learn a lot. Does the 10.1.1.* network exist in your network? Does the 10.1.1.1 host exist? Is it running a webserver? This may sound like a lot of work until you realize that you can build a few thousand of these documents with a Python script and a few minutes to spare. 
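
As a rough illustration of how little effort that takes, and as a way to build fixtures for testing your own parser's defenses, the sketch below fills the sample document with every host address in a network. The function name build_probe_documents and the 10.1.1.0/24 range are assumptions chosen to match the example above.

```python
import ipaddress

# Template based on the sample document above; {target} is filled in per host.
TEMPLATE = """<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE foo [
    <!ELEMENT foo ANY >
    <!ELEMENT bar ANY >
    <!ENTITY xxe SYSTEM "https://{target}/login" >
]>
<foo>
    <bar>&xxe;</bar>
</foo>"""


def build_probe_documents(cidr):
    """Return one XML document per usable host address in the given network."""
    network = ipaddress.ip_network(cidr)
    return [TEMPLATE.format(target=host) for host in network.hosts()]


# Hypothetical range matching the example address in this section.
documents = build_probe_documents("10.1.1.0/24")
print(len(documents))
```

Feeding documents like these to a test instance of your application is a quick way to confirm that its parser rejects them.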

File Retrieval Attacks

Now, let's make a small tweak to our attack. 

<?xml version="1.0" encoding="ISO-8859-1"?>
   <!DOCTYPE foo [
        <!ELEMENT foo ANY >
        <!ELEMENT bar ANY >
        <!ENTITY xxe SYSTEM "file:///etc/hosts" >
   ]>
   <foo>
    <bar>&xxe;</bar>
   </foo>

Instead of retrieving data from a web server, this attack inserts the contents of /etc/hosts into the <bar> tag. Now the attacker knows what your application server knows about its network.

Denial of Service Attacks

Not all files are created equal. While some file retrievals are useful for stealing information or planning deeper attacks, others are useful for Denial of Service (DoS) attacks.

<?xml version="1.0" encoding="ISO-8859-1"?>
   <!DOCTYPE foo [
        <!ELEMENT foo ANY >
        <!ELEMENT bar ANY >
        <!ENTITY xxe SYSTEM "file:///dev/random" >
   ]>
   <foo>
    <bar>&xxe;</bar>
   </foo>

The /dev/random file is a Unix special file that generates random bytes from system noise. It never reaches an end of file, and on some systems reads block until the kernel has gathered enough noise to produce more random data. As a result, you can stall an XML parser by pointing it there. Send a few hundred of these documents, and you've effectively killed the web application.

LOL U R Pwned

Let's look at one last attack before covering how to protect your Django application from XXE attacks. This one takes advantage of XML parser behavior to create a denial of service. 

Here's an example of the billion laughs attack: 

<?xml version="1.0"?>
<!DOCTYPE lolz [
        <!ENTITY lol "lol">
        <!ELEMENT lolz (#PCDATA)>
        <!ENTITY lol1 "&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;">
        <!ENTITY lol2 "&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;">
        <!ENTITY lol3 "&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;">
        <!ENTITY lol4 "&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;">
        <!ENTITY lol5 "&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;">
        <!ENTITY lol6 "&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;">
        <!ENTITY lol7 "&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;">
        <!ENTITY lol8 "&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;">
        <!ENTITY lol9 "&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;">
        ]>
<lolz>&lol9;</lolz>

What happens when an XML parser processes this document?

  1. It encounters the &lol9; entity.

  2. This entity contains 10 &lol8; entities.

  3. Each &lol8; entity has 10 &lol7; entities.

  4. Wait...each &lol7; has 10 &lol6; entities?

  5. And each &lol6; has 10 &lol5; entities!

  6. Keep going, and we end up with 10^9 lols!

So, that's literally a billion laughs. As a result, a tiny XML document expands to a few gigabytes of memory when processed. 
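
The arithmetic behind that claim can be checked in a few lines of Python. This is a back-of-the-envelope sketch: it counts only the raw text of the expanded entity and ignores whatever extra memory a particular parser uses to hold it.

```python
# Nine nested entities (lol1 through lol9), each repeating the previous one 10 times.
LEVELS = 9
FANOUT = 10

copies = FANOUT ** LEVELS             # how many times "lol" appears after expansion
expanded_bytes = copies * len("lol")  # three bytes per copy

print(f"{copies:,} copies of 'lol'")
print(f"{expanded_bytes / 1024 ** 3:.2f} GiB of text")
```

A document well under a kilobyte in size therefore asks the parser to materialize roughly three gigabytes of text.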

Avoiding Django XML External Entities

How do you avoid these attacks if your Django application needs to parse untrusted XML? Quite simply, it turns out. The defusedxml Python package does all the work for you. Update a few import statements, and it'll throw an exception when it comes across a forbidden construct.

No Laughs for You

Let's start with a billion laughs. Put the XML above in a file named lolz.xml, then point this code at it. You'll need a Python environment with defusedxml installed. 

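from defusedxml.ElementTree import parse

def main():
    try:
        # parse() returns an ElementTree; defusedxml raises on forbidden constructs
        et = parse("lolz.xml")
        # Iterating over the root element yields its direct children
        for kid in et.getroot():
            print(kid)
    except Exception as x:
        # Print the exception so we can see why parsing stopped
        print(x)

if __name__ == '__main__':
    main()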

This code uses the standard Python ElementTree interface, wrapped in defusedxml's protection. Instead of importing parse from xml.etree.ElementTree, we import it from defusedxml.ElementTree. After parsing the document, the code walks the root element's child nodes and prints them, if it gets there. If processing the document throws, we print the exception to see why.

But here's the output: 

EntitiesForbidden(name='lol', system_id=None, public_id=None)

Defusedxml refused to parse the document! It saw the first Entity declaration and threw an appropriate exception. 
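
To make the idea of rejecting a document at its first entity declaration concrete, here is a minimal standard-library sketch built on the expat parser's EntityDeclHandler hook. It illustrates the general technique only and is not defusedxml's actual code; the EntityDeclared exception and reject_entity_declarations function are hypothetical names invented for this example.

```python
from xml.parsers import expat


class EntityDeclared(ValueError):
    """Hypothetical exception for this sketch (not part of defusedxml)."""


def reject_entity_declarations(xml_bytes):
    """Scan a document and raise as soon as it declares any entity."""
    parser = expat.ParserCreate()

    def on_entity_decl(name, is_parameter_entity, value, base,
                       system_id, public_id, notation_name):
        # expat calls this for every <!ENTITY ...> declaration it reads.
        raise EntityDeclared(
            f"entity {name!r} declared (system_id={system_id!r})"
        )

    parser.EntityDeclHandler = on_entity_decl
    parser.Parse(xml_bytes, True)  # True marks this as the final chunk


SAMPLE = b"""<?xml version="1.0"?>
<!DOCTYPE foo [
    <!ENTITY xxe SYSTEM "file:///etc/hosts" >
]>
<foo>&xxe;</foo>"""

try:
    reject_entity_declarations(SAMPLE)
except EntityDeclared as error:
    print(error)
```

Because the handler raises while the parser is still reading the DOCTYPE, the entity is never expanded and the file it points to is never opened.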

No File Retrievals

Let's put the file retrieval XML in a file named files.xml and point the parsing code at it.

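from defusedxml.ElementTree import parse

def main():
    try:
        # parse() returns an ElementTree; defusedxml raises on forbidden constructs
        et = parse("files.xml")
        # Iterating over the root element yields its direct children
        for kid in et.getroot():
            print(kid)
    except Exception as x:
        # Print the exception so we can see why parsing stopped
        print(x)

if __name__ == '__main__':
    main()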

The output looks similar to the previous trial: 

EntitiesForbidden(name='xxe', system_id='file:///etc/hosts', public_id=None)


Just like before, defusedxml stopped parsing when it saw an entity declaration. 

Defusedxml Protects You From XXEs

These examples are using defusedxml's ElementTree interface. It also offers interfaces for cElementTree, expatreader, sax, minidom, pulldom, and several other XML parsers and builders. So, it's very easy to plug it into existing Python apps by updating your imports. If you need to write new code, you can still use the interfaces you're accustomed to. 

No More Django XML External Entity Attacks

XML External Entity attacks are dangerous: they can take your application down and expose your clients' data.

You need to protect yourself from them. In this post, we discussed what the attacks look like and then saw how easy it is to protect your Django sites with defusedxml without modifying much more than a few import statements. 

Now that you understand Django external entity attacks, check your code and make sure you're safe. While you're at it, sign up for a free account and take your code security to the next level!

This post was written by Eric Goebelbecker. Eric has worked in the financial markets in New York City for 25 years, developing infrastructure for market data and financial information exchange (FIX) protocol networks. He loves to talk about what makes teams effective (or not so effective!).

