No Love for the Null Byte

By | September 22, 2011
Attackers have commonly used the null character to bypass file extension restrictions during the exploitation of local file inclusion vulnerabilities.  rain.forest.puppy outlined this type of attack against Perl-based CGI applications in Phrack issue 55 over ten years ago, but the problem has also affected web applications written in other higher-level languages such as Java, .NET, and PHP.  Consider the following insecure PHP code.
if(file_exists('/tmp/'.$_GET['filename'].'.txt')) {
  include '/tmp/'.$_GET['filename'].'.txt';
}
Clearly, an attacker could abuse the poorly written application to include arbitrary TXT files stored on the web server, which is certainly not good, but until fairly recently an attacker could also include any file regardless of the extension.  Consider the following request an attacker could make to exploit the vulnerability and acquire the server’s password file.
http://www.victim.com/page.php?filename=../../etc/passwd%00
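To make the truncation concrete, here is a minimal sketch of what happens to that path. The helper asSeenByC is hypothetical and not part of PHP; it only models how a C API that expects a null-terminated string would read the bytes PHP hands it, which is an assumption about the pre-5.3.4 behavior described here.

```php
<?php
// Hypothetical helper (not a PHP built-in): models a C-style consumer that
// stops reading at the first null byte.
function asSeenByC($phpString)
{
    $pos = strpos($phpString, "\0");
    return $pos === false ? $phpString : substr($phpString, 0, $pos);
}

// What $_GET['filename'] would hold once PHP has URL-decoded the request.
$filename = urldecode('../../etc/passwd%00');
$path = '/tmp/' . $filename . '.txt';

// PHP strings are length-counted, so the null byte and ".txt" are still there.
echo strlen($path), "\n";

// The C-level view ends at the null byte, so the ".txt" suffix is gone.
echo asSeenByC($path), "\n";
```

The point of the sketch is only that the two layers disagree about where the string ends.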
The developers of PHP addressed this issue in version 5.3.4 late last year, and “paths with NULL in them (foo\0bar.txt) are now considered as invalid.”  Finally, the file_exists function behaves the way most programmers would expect.  File systems, at least NTFS and most Unix file systems, do not allow the null character to appear within a file name, although many other control characters are permitted within file names.
if(file_exists("/etc/passwd\0")) {
  // Dead code in 5.3.4 or later.
  echo 'There isn\'t a file named /etc/passwd\\0 on my server!!!';
}
So can we finally stop worrying about null bytes in PHP?  Not really.  The null byte character can still cause issues in a number of other situations.  Consider an application that performs rudimentary input validation to prevent command injection, but still allows the user to type in something like ../etc/passwd\0.  Same problem, different function.
// Guess what file gets deleted?
exec("rm /tmp/../etc/passwd\0.tmp");
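One way to avoid this class of problem is to validate the value before it gets anywhere near a shell or a file function. The following is a minimal sketch, assuming the application only needs to accept simple file names; isSafeFileName is a hypothetical helper name, and the allowed character set is an assumption you would adapt to your own application.

```php
<?php
// Hypothetical helper (not from the original code): white-list the characters
// a file name may contain instead of trying to black-list dangerous ones.
function isSafeFileName($name)
{
    if (!is_string($name) || $name === '') {
        return false;
    }
    // Reject null bytes explicitly so the intent is obvious to the reader.
    if (strpos($name, "\0") !== false) {
        return false;
    }
    // \A and \z anchor the whole string; unlike $, \z does not tolerate a
    // trailing newline. Slashes and dots are not in the allowed set.
    return preg_match('/\A[A-Za-z0-9_-]+\z/', $name) === 1;
}

var_dump(isSafeFileName('report'));            // accepted
var_dump(isSafeFileName("../etc/passwd\0"));   // rejected
```

A white-list is used here because it fails closed: anything the pattern does not explicitly allow, including the null byte, is refused.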
Attackers could also use null byte characters to bypass black-list filters designed to mitigate the risk of cross-site scripting attacks by blocking specific HTML or JavaScript keywords.  Applications should avoid relying on black-listing to prevent attacks, but Internet Explorer complicates the situation: IE essentially ignores null byte characters in every context while rendering HTML and JavaScript.  We can therefore easily craft a payload like the following to bypass black-list filters that attempt to sanitize input.  Luckily, other browsers do not fall for the null byte shenanigans, and the JavaScript fails to execute.
http://target/lameBlackList.php?data=blah%00<%00s%00c%00r%00i%00p%00t%00/%00a%00=%00"%00<blah>%00"%00>%00a%00l%00e%00r%00t%00(%001%00)%00<%00/s%00c%00r%00i%00p%00t%00>
// lameBlackList.php
$data = $_GET["data"];
$pattern = "/<[[:space:]]*([a-z]|[A-Z]|\/)+/";
$replacement = "denied";
$out = preg_replace($pattern, $replacement, $data);
echo $out;
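The sketch below runs the payload from the request above through the same pattern so the effect is visible. Stripping the null bytes with str_replace is only a stand-in for what IE is described as doing while rendering; it is an assumption for illustration, not a model of the browser's actual parser.

```php
<?php
// Same filter as lameBlackList.php, applied to the decoded query value.
$data = urldecode('blah%00<%00s%00c%00r%00i%00p%00t%00/%00a%00=%00"%00<blah>%00"%00>%00a%00l%00e%00r%00t%00(%001%00)%00<%00/s%00c%00r%00i%00p%00t%00>');
$pattern = "/<[[:space:]]*([a-z]|[A-Z]|\/)+/";
$replacement = "denied";
$out = preg_replace($pattern, $replacement, $data);

// Every "<" of the real payload is followed by a null byte, which is neither
// whitespace nor a letter, so only the decoy "<blah" is rewritten.

// Assumption: approximate IE's rendering by dropping the null bytes.
$asRenderedByIE = str_replace("\0", '', $out);
echo $asRenderedByIE, "\n";
```

Under that assumption the script element survives the filter intact, and the only change is that the decoy inside the bogus attribute now reads "denied".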

This example also illustrates one way of bypassing IE8's XSS filter, since the parameter value in the request will not exactly match the parameter value reflected in the response.  We accomplish this by adding an extraneous <blah> element inside of a bogus HTML attribute, since we know the application will attempt to sanitize the input as opposed to outright blocking the malicious request.  Another way to ensure that the request signature will not match the response signature, and bypass IE's XSS filter, is to abuse applications that perform output encoding on some characters, such as double quotes, but fail to encode all relevant characters, such as single quotes.  But, I digress...
At the end of the day, null byte characters will continue to cause security issues whenever software written in a higher-level language passes unvalidated user input to software written in C/C++ or assembly.  Higher-level languages such as Java, PHP, Perl, and .NET attach no special meaning to the null character, while lower-level languages use it as the string terminator.  This mismatch in how strings are represented will often adversely affect security.  Pascal style strings are sounding like a good idea again 🙂