What Is Cross-Site Scripting (XSS)?
Imagine you just received a notification email announcing that your favorite blogger had just put up a new post. So you click the link as you usually do, which navigates you to the post, and you proceed to read. You want to leave a comment congratulating the author on the brilliance of the post, but you need to be logged in to do so. You click the login button, enter your credentials, and see the login screen again. Believing you have mistyped your password, you re-enter your credentials, and this time find yourself redirected to the Post Comment window on the page. Indeed, you must have fat-fingered the J in your password, you know, as you often do, hitting the K by mistake.
The next day you discover that you have been banned from your favorite blogger’s site for posting a plethora of offensive comments in violation of the Terms of Use agreement. Also, you no longer have access to your Facebook, Twitter, or LinkedIn accounts, and, worst of all, your bank account is suspiciously empty.
How could this happen? Did someone at work set up a spy cam that watched you as you typed in these things on your keyboard? Did someone hack into your phone and intercept everything you typed? No. It’s much simpler than that.
You were the unsuspecting victim of a cross-site scripting (XSS) attack. To make matters worse, because you can’t remember multiple passwords and don’t use a password management tool, you naively used the same password for the blogger’s site as you do for Facebook, Twitter, and LinkedIn. You also used that same password for your online banking account, which is now empty.
Cross-site scripting (XSS) is a malicious technique that allows the attacker to execute JavaScript (or other scripting language) code in another user’s browser. It is also one of the most common types of web attack and one that developers often overlook. We’re going to deal with this one first, and teach you how to safeguard your apps against it.
With XSS, the attacker doesn’t directly target any specific victim, but rather, all visitors to a particular website. An attacker exploits security vulnerabilities within a site to deliver malicious code to any visitor. The malicious code appears to be a valid part of the website, and is thus delivered to the user requesting the page. In this way, the website actually acts as an unwitting accomplice to the attacker.
How Does it Happen?
The only way for the attacker to run malicious code in the victim's browser is to inject it into one of the pages that the victim downloads from the website. This can happen if the website directly includes user input in its pages because the attacker can then insert a string that is treated as code by the victim's browser.
A perfect place for XSS attacks to happen is in pages containing blog posts or news articles that allow visitors to comment, such as the one in the example you just read. An attacker visited the site and injected malicious JavaScript code into a posted comment. When you later visited the site, that code was executed as the page, including the attacker's comment, was rendered in your browser. The malicious code modified the login button on the page so that it redirected you to another page, a duplicate of the site's login page. There you entered your login credentials before being redirected back to the original site. All along, you believed you had mistyped your password. This redirection was a second attack, called an open redirect, which was injected into the site using the initial cross-site scripting attack. We'll discuss that more in the next chapter.
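To make the injection step concrete, here is a minimal sketch of the kind of code that lets it happen. The CommentPage class, the RenderComment helper, and the author name are hypothetical and exist only for illustration; the point is that the comment text is pasted into the markup unchanged.

```csharp
using System;

public static class CommentPage
{
    // Hypothetical page builder (illustration only): the comment text is
    // concatenated into the markup unchanged, so any markup inside the
    // comment becomes part of the page.
    public static string RenderComment(string author, string comment)
    {
        return "<div class=\"comment\"><b>" + author + "</b>: " + comment + "</div>";
    }

    public static void Main()
    {
        // A harmless stand-in for the attacker's payload.
        string attackerComment = "Great post!<script>alert('injected')</script>";

        // The script element reaches the output intact, so a browser
        // rendering this markup would execute it.
        Console.WriteLine(RenderComment("mallory", attackerComment));
    }
}
```

Every visitor who loads a page built this way receives the attacker's script as though the site itself had written it.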
What Are the Consequences?
So how is this even possible? Doesn’t JavaScript already run in a tightly restricted environment within browsers with minimal access to a machine’s files and operating system? Browsers have console windows that allow you to execute code directly on a web page. So why is this kind of an attack so bad? Are we exaggerating here? Can this scenario happen?
While it is true that JavaScript does run in a very restricted environment, there are several things it can do within the constraints of a browser.
First, JavaScript has access to cookies stored on a user’s machine that may contain sensitive information. An attacker can use document.cookie to access the victim’s cookies associated with the website, send them to his own server, and extract sensitive information to trick the server into thinking his requests are coming from a valid source – kind of like small scale identity theft.
Second, JavaScript can send HTTP requests to almost any destination with virtually any type of content, and it can register its own mechanisms, such as event listeners, in the page. An attacker might register a keyboard event listener, for example, using addEventListener. This listener then captures all of the user’s keystrokes and sends them to the attacker’s server. This technique is often used to steal passwords and credit card numbers.
Third, using DOM manipulation, JavaScript can modify the HTML of the current page, adding anything the attacker wants to trick the user into providing sensitive data. For example, an attacker can insert a fake login form into the page, where the form’s action attribute targets the attacker’s server, tricking the user into sending the attacker his or her login credentials.
These types of attacks can be tough to spot because the malicious script executes within the context of the original site, not a clone. The script is treated like any other piece of code or data served up by the site: it has access to any cookies associated with the site, and the host name displayed in the URL is that of the original website. The malicious script is considered to be a valid part of the website, and as such, it can do anything the original site can do.
Now, since our purpose here is to teach you how to prevent attacks using .NET Core (and not become an expert hacker), let’s jump right into preventing XSS attacks.
Prevent XSS Attacks
There are two ways you can prevent XSS attacks:
Encode the data before display so that the browser interprets it only as data, not as code.
Validate the data on input so that the stored data contains no malicious commands.
Encoding
As a general rule, you can prevent most of these types of attacks by NEVER putting untrusted data into your HTML. But what is untrusted data? Untrusted data is any data that might be controlled by an attacker. It could include HTML form inputs, query strings, HTTP headers, or even data sourced from a database.
If you must accept untrusted data, there are a few basic rules you should follow when dealing with user input to prevent the introduction of XSS into your sites, and thus render the data trustworthy.
Before putting untrusted data inside an HTML element, make sure all the data is HTML encoded. HTML encoding takes characters such as < and changes them into a safe form, like &lt;.
Before putting untrusted data into an HTML attribute, make sure it too is HTML encoded. Attribute encoding is a superset of HTML encoding and encodes additional characters such as " and '.
Before putting untrusted data into JavaScript, first put the data in an HTML element whose contents you will retrieve at runtime. If this isn't possible, then make sure the data is JavaScript encoded. JavaScript encoding takes dangerous characters (<, for example) and replaces them with their hex equivalent, such as \u003C.
Before putting untrusted data into a URL query string, make sure the data is URL encoded.
In other words, all untrusted data should be encoded according to its destination within the page.
So how do we encode the data?
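In .NET Core, the System.Text.Encodings.Web namespace provides one encoder per destination. The following is a minimal sketch rather than a recommended structure: the EncodingByDestination class and its method names are made up for illustration, and it uses the static Default encoder instances instead of injected ones. It shows how the same untrusted string is encoded differently depending on where it will end up.

```csharp
using System;
using System.Text.Encodings.Web;

public static class EncodingByDestination
{
    // For HTML element content or attribute values.
    public static string ForHtml(string value) => HtmlEncoder.Default.Encode(value);

    // For a value placed inside a JavaScript string literal.
    public static string ForJavaScript(string value) => JavaScriptEncoder.Default.Encode(value);

    // For a value placed inside a URL query string.
    public static string ForUrl(string value) => UrlEncoder.Default.Encode(value);

    public static void Main()
    {
        string untrustedData = "<\"123\">";

        Console.WriteLine(ForHtml(untrustedData));        // &lt;&quot;123&quot;&gt;
        Console.WriteLine(ForJavaScript(untrustedData));  // \u003C\u0022123\u0022\u003E
        Console.WriteLine(ForUrl(untrustedData));         // %3C%22123%22%3E
    }
}
```

None of the three outputs is interchangeable: an HTML-encoded value is not safe inside a script block, and a URL-encoded value is not what you want displayed in a paragraph.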
HTML Encoding Using Razor
Fortunately, the Razor engine used in MVC automatically encodes all output sourced from variables. In fact, you have to work really hard to prevent it from doing so. Whenever you use the @ directive to access data, Razor uses HTML attribute encoding rules. Since attribute encoding is a superset of HTML encoding, there is no need to worry about whether you should use HTML encoding or attribute encoding for a piece of data; Razor handles all that for you.
Consider the following Razor example code:
@{
var untrustedData = "<\"This is untrusted source data\">";
}
@untrustedData
The above code outputs the contents of the untrustedData variable. Notice that the value of the variable contains the <, ", and > characters, all of which are used in XSS attacks. However, if you examine the HTML source once this code is rendered, you see the encoded output:
&lt;&quot;This is untrusted source data&quot;&gt;
JavaScript Encoding Using Razor
Sometimes it’s necessary to insert a piece of data into JavaScript to process in your view. There are two ways to do this:
1. Place the data in a data attribute of an HTML tag, and then retrieve it in your JavaScript.
@{
var untrustedData = "<\"123\">";
}
<div id="injectedData" data-untrustedData="@untrustedData" />
<script>
var injectedData = document.getElementById("injectedData");
var clientSideUntrustedData = injectedData.getAttribute("data-untrusteddata");
document.write(clientSideUntrustedData);
</script>
This will produce the following HTML:
<div id="injectedData" data-untrusteddata="&lt;&quot;123&quot;&gt;" />
<script>
var injectedData = document.getElementById("injectedData");
var clientSideUntrustedData = injectedData.getAttribute("data-untrusteddata");
document.write(clientSideUntrustedData);
</script>
Which, when run, renders the following:
<"123">
2. You can also call the JavaScript encoder directly:
@using System.Text.Encodings.Web;
@inject JavaScriptEncoder encoder;
@{
var untrustedData = "<\"123\">";
}
<script>
document.write("@encoder.Encode(untrustedData)");
</script>
This renders in the browser as follows:
<script>
document.write("\u003C\u0022123\u0022\u003E");
</script>
Encoding URL Parameters
If you want to build a URL query string with untrusted data as a value within the string, use the UrlEncoder to encode the value.
var untrustedData = "\"Untrusted data with spaces and &\"";
var trustedData = _urlEncoder.Encode(untrustedData);
The encoded data contained in the trustedData variable will now be:
%22Untrusted%20data%20with%20spaces%20and%20%26%22
Spaces, quotes, punctuation, and other unsafe characters are percent-encoded to their hexadecimal values; for example, a space character becomes %20.
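As a sketch of how the encoded value fits into a complete URL, consider the hypothetical helper below. The SearchLinks class, the example.com address, and the q parameter name are assumptions for illustration, and UrlEncoder.Default stands in for the injected _urlEncoder used above.

```csharp
using System;
using System.Text.Encodings.Web;

public static class SearchLinks
{
    // Hypothetical helper; the base URL and the parameter name are illustrative.
    public static string BuildSearchUrl(string untrustedTerm)
    {
        // Encode only the value, not the whole URL, so the structural
        // characters we add ourselves (?, =) keep their meaning.
        return "https://example.com/search?q=" + UrlEncoder.Default.Encode(untrustedTerm);
    }

    public static void Main()
    {
        // Without encoding, the & in the value would start a second parameter.
        Console.WriteLine(BuildSearchUrl("cats & dogs"));
        // https://example.com/search?q=cats%20%26%20dogs
    }
}
```

Encoding the value alone keeps the attacker from adding or overriding query parameters through characters such as & and =.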
Validation
Validating user input is an effective means of preventing XSS attacks. For example, a numeric string containing only the characters 0-9 won't trigger an XSS attack. However, validation becomes more complicated when accepting HTML in user input, such as in blog comments or other such input elements. Parsing HTML input can be difficult, if not impossible, considering all the places such an attack might be hidden. Markdown, coupled with a parser that strips embedded HTML, is a safer option for accepting rich text input. As a general rule, never rely on validation as a lone solution for XSS prevention. Always encode untrusted input before output, no matter what validation has been performed on the data.
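The numeric case mentioned above can be sketched as an allow-list check followed by encoding on output. The QuantityInput class, the nine-digit length limit, and the fallback message are assumptions made for this example, not part of any framework.

```csharp
using System;
using System.Text.Encodings.Web;

public static class QuantityInput
{
    // Allow-list validation: accept only a short, non-empty run of the
    // ASCII digits 0-9. char.IsDigit is avoided on purpose because it
    // also accepts non-ASCII Unicode digits.
    public static bool IsValid(string input)
    {
        if (string.IsNullOrEmpty(input) || input.Length > 9) return false;
        foreach (char c in input)
        {
            if (c < '0' || c > '9') return false;
        }
        return true;
    }

    // Validation decides whether the value is accepted; the output is
    // still HTML encoded regardless of the validation result.
    public static string Render(string input)
    {
        string shown = IsValid(input) ? input : "invalid quantity";
        return "<span>" + HtmlEncoder.Default.Encode(shown) + "</span>";
    }

    public static void Main()
    {
        Console.WriteLine(Render("42"));                         // <span>42</span>
        Console.WriteLine(IsValid("<script>alert(1)</script>")); // False
    }
}
```

Rejecting anything outside the expected character set is simpler and more reliable than trying to recognize every dangerous pattern.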
Let’s Recap!
At this point you have learned that:
Cross-site scripting (XSS) occurs when an attacker injects malicious JavaScript (or other scripting language) code into a site by posting it with other data. The malicious code then becomes part of the site and is run whenever the site is rendered in a user’s browser.
As a developer, you can prevent cross-site scripting attacks by:
Encoding any user-supplied data before display so that the browser interprets it as data rather than code.
Validating all user data on input so that the stored data contains no malicious commands.
Next you’re going to learn about cross-site request forgery (XSRF/CSRF), which exploits the trust relationship between a client and server to hijack authenticated user sessions.