What Are XML External Entities?
Let’s talk about XML. Extensible Markup Language (XML) was created to store, share, and transport data between systems, and it is platform and language independent. That doesn’t make it a renegade language, though: a Document Type Definition (DTD) can define exactly which elements a document may contain and how they nest. The DTD is used to validate XML documents for proper structure.
Let’s go back to the login form. Here, you are registering on the site with your name, email address, and phone number. After you hit Enter, those values become elements in an XML document.
Assume the DTD has already defined the elements user, username, email, and phone. In XML, your registration would translate to this:
<?xml version="1.0" encoding="ISO-8859-1"?>
<user>
  <username>Batman</username>
  <email>batman@superheroes.bla</email>
  <phone>888-123-4567</phone>
</user>
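To see how a server might read those elements, here is a minimal Java sketch using the JDK’s built-in DOM parser. It assumes the root element user contains username, email, and phone children; the class RegistrationReader and its methods are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper that reads the registration fields out of a <user> document.
class RegistrationReader {

    // Parses the XML text into a DOM tree (bytes encoded to match the ISO-8859-1 declaration).
    static Document parse(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.ISO_8859_1)));
    }

    // Returns the text of the first element with the given tag name under the root.
    static String field(Document doc, String tag) {
        Element root = doc.getDocumentElement();
        return root.getElementsByTagName(tag).item(0).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>"
                + "<user><username>Batman</username>"
                + "<email>batman@superheroes.bla</email>"
                + "<phone>888-123-4567</phone></user>";
        Document doc = parse(xml);
        System.out.println(field(doc, "username") + " " + field(doc, "email") + " " + field(doc, "phone"));
    }
}
```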
I mentioned HTML character entities in the last chapter. XML entities work the same way. Take the < character: it could be a less-than sign, or it could be the start of a new tag. To specify that you mean less-than, you can use the entity &lt;.
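A quick way to confirm how &lt; behaves is to let a parser decode it. In this Java sketch (the class name EntityDecodingDemo is hypothetical), 1 &lt; 2 inside an element comes back from the JDK’s DOM parser as 1 < 2, while an unescaped < in the same position makes the document malformed and parsing fails.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

class EntityDecodingDemo {

    // Parses a one-element document and returns the decoded text of its root element.
    static String rootText(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement()
                .getTextContent();
    }

    // Returns true if the parser rejects the document as malformed.
    static boolean isMalformed(String xml) {
        try {
            rootText(xml);
            return false;
        } catch (Exception e) {
            return true;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(rootText("<expr>1 &lt; 2</expr>")); // prints: 1 < 2
        System.out.println(isMalformed("<expr>1 < 2</expr>"));  // prints: true
    }
}
```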
Now let’s talk about internal and external entities.
An internal entity is declared together with its value inside the DTD itself; each reference to it, such as &superhero;, is replaced by that value when the document is parsed.
<!ENTITY superhero "Batman">
<!ENTITY origin "Gotham City">
&superhero;&origin;
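Here is a small Java sketch of what a parser does with those two declarations. The root element name hero and the class name are assumptions added so the snippet forms a complete document; with default settings, the JDK’s DOM parser replaces &superhero; and &origin; with their declared values.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

class InternalEntityDemo {

    // The two internal entities from the text, used inside a hypothetical <hero> root.
    static final String XML =
            "<?xml version=\"1.0\"?>"
            + "<!DOCTYPE hero ["
            + "<!ENTITY superhero \"Batman\">"
            + "<!ENTITY origin \"Gotham City\">"
            + "]>"
            + "<hero>&superhero;&origin;</hero>";

    // Parses the document and returns the root text after entity expansion.
    static String expand(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement()
                .getTextContent();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(expand(XML)); // prints: BatmanGotham City
    }
}
```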
An external entity is an XML reference to an external source like a file path or a URL. When the XML parser meets a reference to an external entity, it fetches the content from that location and inserts it into the document, so that content can end up displayed on the web page.
As you see below, the first external entity, superhero, references a URL. The second external entity, origin, references a file path on a Linux file system, and the contents of the file at that path will be displayed on the web page. The keyword SYSTEM tells the parser to load the entity’s value from the given location and include it in the XML document.
<!ENTITY superhero SYSTEM "http://www.batman.bla">
<!ENTITY origin SYSTEM "file:///usr/batman">
&superhero;&origin;
What Are XML External Entity Attacks?
How can a hacker take advantage of an external entity? Let’s look at this line from the code above:
<!ENTITY origin SYSTEM "file:///usr/batman">
When this entity is referenced, the parser reads the file at that path and substitutes its contents into the document, where they may be displayed.
A hacker may have deliberately chosen to place that path in the XML to gain unauthorized access to the contents of that file. And what other attack does this look like? If you think this looks like a form of injection, you’re right!
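The following Java sketch shows why this line is dangerous. It writes a temporary file, standing in for /usr/batman, and parses a document whose SYSTEM entity points at it. The file contents, the class name, and the root element hero are hypothetical. With a default DocumentBuilderFactory on typical JDK versions, the parser resolves the entity and the file’s text ends up in the parsed document; other parsers and configurations may behave differently.

```java
import java.io.StringReader;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

class ExternalEntityDemo {

    // Builds a document whose entity "origin" points at the given file via SYSTEM.
    static String payloadFor(Path file) {
        return "<?xml version=\"1.0\"?>"
                + "<!DOCTYPE hero [<!ENTITY origin SYSTEM \"" + file.toUri() + "\">]>"
                + "<hero>&origin;</hero>";
    }

    // Parses with the default (unhardened) factory and returns the root element's text.
    static String parseDefault(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement()
                .getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical "sensitive" file standing in for /usr/batman.
        Path secret = Files.createTempFile("batman", ".txt");
        try {
            Files.writeString(secret, "Bruce Wayne");
            System.out.println(parseDefault(payloadFor(secret)));
        } finally {
            Files.deleteIfExists(secret);
        }
    }
}
```

Nothing in the payload looks unusual to the parser: it is simply following the SYSTEM identifier it was given.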
Let’s look at how the XML injection attack works.
As you can see, the variable xmlHero holds data in XML format.
A POST request is created to go to the URL webdevfightshacker.bla/login.html.
The login page parses the XML data in xmlHero and uses it to fill in the username and password fields.
The response to the POST request shows success or failure.
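var xmlHero = `<user>
  <username>Batgirl</username>
  <password>ilovebatman</password>
</user>`;

$.ajax({
  type: "POST",
  url: "webdevfightshacker.bla/login.html",
  contentType: "application/xml", // tell the server the body is XML
  data: xmlHero,
  success: function (response) {
    console.log(response);
  }
});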
The code above shows a POST request that sends the XML string data assigned to the variable xmlHero.
The login page that receives this POST request then uses an XML parsing function to read the contents and assign them to the variables that correspond with the username and password on the form.
How can the hacker take advantage of this? An XML external entity injection can read the contents of a file and display it in the browser. This hacker is interested in a special folder called secretpower on Batgirl’s desktop.
Let’s watch what happens to poor Batgirl’s private folder.
The hacker starts by writing a DTD that declares an entity named hax and gives it the file path.
<!DOCTYPE hax [<!ELEMENT hax ANY >
<!ENTITY hax SYSTEM "file:///C:/Users/SBatgirl/Desktop/secretpower">]>
Then the external entity hax is referenced in place of the username, so the full payload assigned to xmlHero looks like this:
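var xmlHero = `<?xml version="1.0"?>
<!DOCTYPE hax [<!ELEMENT hax ANY >
<!ENTITY hax SYSTEM "file:///C:/Users/SBatgirl/Desktop/secretpower">]>
<user>
  <username>&hax;</username>
  <password>ilovebatman</password>
</user>`;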
I hope Batgirl is ready for the world to find out about her secret power because this code may be able to display its contents on the hacker’s browser!
But it’s not too late for you to save the day, and fix the code on the web app!
So what should you do to prevent her from losing her secret power files?
How Can You Stop XML External Entity Injections?
What could you have done with the web application code to keep Batgirl’s folder safe? Most XML parsers let you disable external entities, and it is usually just a single true/false setting.
For example, with PHP’s libxml-based parsers it looks like this: libxml_disable_entity_loader(true); This function is deprecated as of PHP 8.0, because libxml2 2.9 and later do not load external entities by default, as long as you don’t pass the LIBXML_NOENT flag.
And in Java, on a DocumentBuilderFactory, you can reject DOCTYPE declarations altogether:
factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
Other XML parsing APIs may have external entities enabled by default, so check the parser configuration in all your dependencies.
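Putting that Java setting in context, here is a sketch of a DocumentBuilderFactory hardened against XXE. Rejecting DOCTYPE declarations is the main defense; the other settings switch off external entities and external DTD loading as a second layer, in case DOCTYPEs are ever re-enabled. The class name SafeXml is hypothetical, and the apache.org and xml.org feature URIs apply to Xerces-based parsers such as the one bundled with the JDK.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

class SafeXml {

    // Returns a DocumentBuilder that refuses DTDs and external entities.
    static DocumentBuilder newSafeBuilder() throws ParserConfigurationException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Primary defense: any <!DOCTYPE ...> causes a fatal parse error.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        // Defense in depth, in case DOCTYPEs are ever allowed again.
        factory.setFeature("http://xml.org/sax/features/external-general-entities", false);
        factory.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
        factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        factory.setAttribute(XMLConstants.ACCESS_EXTERNAL_DTD, "");
        factory.setXIncludeAware(false);
        factory.setExpandEntityReferences(false);
        return factory.newDocumentBuilder();
    }

    static Document parse(String xml) throws Exception {
        return newSafeBuilder().parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        // Ordinary login XML still parses.
        System.out.println(parse("<user><username>Batgirl</username></user>")
                .getDocumentElement().getTextContent());
        // A document with a DOCTYPE is rejected before any entity is resolved.
        try {
            parse("<!DOCTYPE hax [<!ENTITY hax SYSTEM \"file:///usr/batman\">]><user>&hax;</user>");
        } catch (SAXParseException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```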
What Is Insecure Deserialization?
Let’s talk about serialization first.
To start, you will need an object in an object-oriented programming language such as PHP, JavaScript, Ruby, or Go.
Let’s say you want to take an object holding a username and password and store it in the database. To store it, the object needs to be converted into a byte stream that can travel through the network to the database. When the application later asks for that same object, it must be deserialized (converted from a byte stream back into an object) before it can be used.
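In Java, that round trip can be sketched with the standard ObjectOutputStream and ObjectInputStream classes. The Credentials and SerializationDemo classes are hypothetical, and a byte array stands in for the database column or network connection.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical object to store: a username and password.
class Credentials implements Serializable {
    private static final long serialVersionUID = 1L;
    final String username;
    final String password;

    Credentials(String username, String password) {
        this.username = username;
        this.password = password;
    }
}

class SerializationDemo {

    // Object -> byte stream (what gets written to the database or sent over the network).
    static byte[] serialize(Object obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        return bytes.toByteArray();
    }

    // Byte stream -> object (what happens when the object is read back).
    static Object deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] stored = serialize(new Credentials("Batgirl", "ilovebatman"));
        Credentials restored = (Credentials) deserialize(stored);
        System.out.println(stored.length + " bytes -> " + restored.username);
    }
}
```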
This serialization/deserialization process doesn’t just occur with databases. An object is also converted to a byte stream when it is stored in a file, sent from computer to computer, or moved through the network.
These objects can be cookies, voice streams, tokens, and cache files, to name a few. They are serialized and deserialized at the endpoints.
So what is insecure deserialization? It is a vulnerability in which an application deserializes data it cannot trust, so an attacker (for example, a man-in-the-middle (MITM) or someone injecting code into the byte stream) can change the integrity of the object that gets rebuilt. Java has built-in classes for serialization and deserialization, and YAML libraries in various programming languages also serialize objects.
For example, a cookie with a session ID and credentials is sent from the browser to the web server running Java.
The cookie is serialized using Java’s ObjectOutputStream class.
Malicious code is injected into the byte stream by bad pirates.
The cookie is deserialized using Java’s ObjectInputStream class. Deserialization rebuilds the object without calling its class’s constructor, so any checks in that constructor are skipped, while a readObject() method defined by the class does run. Without verification or input validation during the deserialization process, deserialization attacks are a hacker’s paradise.
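The constructor point is easy to check. In this Java sketch, the hypothetical Session class validates its session ID in its constructor and counts how often the constructor runs. After a serialization round trip the copy has its field restored, but the counter shows the constructor ran only once, for the original, so its validation never saw the deserialized data.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical session cookie payload that validates itself in its constructor.
class Session implements Serializable {
    private static final long serialVersionUID = 1L;
    static int constructorCalls = 0; // static, so it is not part of the serialized state
    final String sessionId;

    Session(String sessionId) {
        constructorCalls++;
        if (sessionId == null || sessionId.isEmpty()) {
            throw new IllegalArgumentException("invalid session ID");
        }
        this.sessionId = sessionId;
    }
}

class ConstructorSkipDemo {

    // Serializes and immediately deserializes an object, as a cookie round trip would.
    static Object roundTrip(Object obj) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Session copy = (Session) roundTrip(new Session("abc123"));
        // The copy has its field, but Session's constructor ran only for the original.
        System.out.println(copy.sessionId + ", constructor calls: " + Session.constructorCalls);
    }
}
```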
A hacker can use Java classes that implement Serializable or Externalizable because such classes are already available in the application’s libraries. Their member values can be manipulated to create functionality different from the original object.
Here is an example with Java:
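import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.Serializable;

public class Hacked implements Serializable {
    private String cmd; // attacker-controlled value restored from the byte stream

    // Called automatically by ObjectInputStream when an object of this class is deserialized
    private void readObject(ObjectInputStream hackedbinary) throws IOException, ClassNotFoundException {
        hackedbinary.defaultReadObject(); // restores cmd from the byte stream
        Runtime.getRuntime().exec(cmd);   // runs the attacker's command
    }
}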
You will see that this class Hacked implements Serializable, an interface from the standard Java library that is already used in the system. Its field is named cmd, matching cmd, the command that opens the Windows command prompt.
The readObject() method is called automatically when the object is deserialized, and its call to defaultReadObject() restores the field values, including cmd, from the hacked byte stream.
The call Runtime.getRuntime().exec(cmd); then runs whatever command the hacker stored in cmd, for example opening a command prompt, which changes the functionality of the original object.
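To see that readObject() really runs on its own during deserialization, here is a harmless Java variant of the Hacked class. The hypothetical Gadget class records cmd in a static list instead of passing it to Runtime.getRuntime().exec(), and the receiving code never calls a single method on the object: deserializing the attacker’s bytes is enough.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Harmless stand-in for the Hacked class: it logs cmd instead of executing it.
class Gadget implements Serializable {
    private static final long serialVersionUID = 1L;
    static final List<String> executed = new ArrayList<>();
    private final String cmd;

    Gadget(String cmd) {
        this.cmd = cmd;
    }

    // Called automatically by ObjectInputStream while rebuilding the object.
    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject(); // restores cmd from the byte stream
        executed.add(cmd);      // the real attack would call Runtime.getRuntime().exec(cmd) here
    }
}

class GadgetDemo {

    // Attacker's side: serialize a Gadget carrying a chosen command string.
    static byte[] craftPayload(String cmd) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(new Gadget(cmd));
        }
        return bytes.toByteArray();
    }

    // Victim's side: a naive endpoint that just deserializes whatever it receives.
    static Object receive(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        receive(craftPayload("calc.exe"));
        System.out.println("Ran during deserialization: " + Gadget.executed);
    }
}
```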
Prevent Insecure Deserialization
To defend against the above example, you can replace Java’s ObjectInputStream with the SerialKiller library, a drop-in replacement for ObjectInputStream that checks each class in the stream against a configurable blacklist and whitelist before deserializing it. SerialKiller was created to offset these issues in deserialization.
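SerialKiller is a third-party library. Since Java 9, the JDK itself offers a comparable mechanism, java.io.ObjectInputFilter (JEP 290), and this sketch uses that instead of SerialKiller. An allowlist pattern accepts only the hypothetical Profile class and rejects every other class, so a class like Hacked would fail with an InvalidClassException before its readObject() could run. Profile, Intruder, and FilteredDeserialization are illustrative names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidClassException;
import java.io.ObjectInputFilter;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical class the application expects to receive.
class Profile implements Serializable {
    private static final long serialVersionUID = 1L;
    final String name;
    Profile(String name) { this.name = name; }
}

// Hypothetical class the application should never deserialize.
class Intruder implements Serializable {
    private static final long serialVersionUID = 1L;
}

class FilteredDeserialization {

    // Allow Profile and String; the trailing "!*" rejects every other class.
    static final ObjectInputFilter ALLOWLIST =
            ObjectInputFilter.Config.createFilter(Profile.class.getName() + ";java.lang.String;!*");

    static byte[] serialize(Object obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        return bytes.toByteArray();
    }

    static Object deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            in.setObjectInputFilter(ALLOWLIST); // must be set before the first readObject()
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Profile ok = (Profile) deserialize(serialize(new Profile("Batgirl")));
        System.out.println("Allowed: " + ok.name);
        try {
            deserialize(serialize(new Intruder()));
        } catch (InvalidClassException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

An allowlist is used here rather than a blocklist because new gadget classes keep being discovered, while the set of classes an application legitimately expects is usually small.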
To prevent unauthorized escalated access through the deserialized object, a defensive move is to validate the incoming data and verify the type and integrity of the object before using it.
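One way to act on the "verification" advice is to sign serialized data when it leaves the server and check the signature before deserializing, so tampered bytes never reach ObjectInputStream. This Java sketch uses HMAC-SHA256 from javax.crypto; the class and method names are hypothetical, and generating a fresh key in memory is a simplification (a real system would load it from secure storage). It complements class filtering rather than replacing it.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Arrays;
import javax.crypto.KeyGenerator;
import javax.crypto.Mac;
import javax.crypto.SecretKey;

class SignedPayload {
    private static final String ALGORITHM = "HmacSHA256";
    private static final int TAG_LENGTH = 32; // HMAC-SHA256 output size in bytes

    // Simplification: a throwaway key; a real server would load a stored secret.
    static SecretKey newKey() throws GeneralSecurityException {
        return KeyGenerator.getInstance(ALGORITHM).generateKey();
    }

    private static byte[] tag(SecretKey key, byte[] data) throws GeneralSecurityException {
        Mac mac = Mac.getInstance(ALGORITHM);
        mac.init(key);
        return mac.doFinal(data);
    }

    // Appends an HMAC tag to the serialized bytes before they leave the server.
    static byte[] sign(SecretKey key, byte[] serialized) throws GeneralSecurityException {
        byte[] t = tag(key, serialized);
        byte[] out = Arrays.copyOf(serialized, serialized.length + TAG_LENGTH);
        System.arraycopy(t, 0, out, serialized.length, TAG_LENGTH);
        return out;
    }

    // Returns the original bytes only if the tag matches; call this before deserializing.
    static byte[] verify(SecretKey key, byte[] signed) throws GeneralSecurityException {
        if (signed.length < TAG_LENGTH) {
            throw new SecurityException("payload too short");
        }
        byte[] data = Arrays.copyOfRange(signed, 0, signed.length - TAG_LENGTH);
        byte[] received = Arrays.copyOfRange(signed, signed.length - TAG_LENGTH, signed.length);
        // MessageDigest.isEqual compares in constant time.
        if (!MessageDigest.isEqual(tag(key, data), received)) {
            throw new SecurityException("signature mismatch: data was tampered with");
        }
        return data;
    }

    public static void main(String[] args) throws Exception {
        SecretKey key = newKey();
        byte[] signed = sign(key, "serialized session bytes".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(verify(key, signed), StandardCharsets.UTF_8));
        signed[0] ^= 1; // simulate a man-in-the-middle flipping one bit
        try {
            verify(key, signed);
        } catch (SecurityException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```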
Let’s Recap!
XML external entities can be used to reveal sensitive data, images, and documents saved on a computer.
External XML entities should be disabled as a best practice.
Insecure deserialization lets an attacker tamper with serialized data so that the object rebuilt from it behaves differently, up to running the attacker’s commands.
Insecure deserialization can be prevented by validating and verifying the data and objects as they are converted back.
Use libraries such as SerialKiller that restrict which classes can be deserialized, so standard Java classes can’t be abused in the attack.