Timing Attacks Against File Systems

Time is one of the key parameters in a pentester’s work. It can either interfere with security analysis efforts by reminding you about the deadline and an eager client, or help you out when performing an audit. How? Take for example the database data extraction technique based on measuring server reply times that’s used in blind SQL injections. However, this approach isn’t limited to database operations. It can also be applied when working with file systems. Practically every pentest begins with gathering information about the system, which more often than not involves blatant file searching using a dictionary or a simple dictionary attack. But why not simplify this step? To elaborate on this thought, we want to share our research on this topic, which we presented at the recent ZeroNights 2013 conference.

Foreword

The term “timing attack” hails from cryptography, where it refers to a side channel attack on a cryptosystem in which the attacker attempts to compromise the cryptosystem by analyzing the time taken to execute cryptographic algorithms. However, cryptographic algorithms aren’t the only viable target for timing attacks, which substantially widens their spectrum of application. Just in July of this year Paul Stone of Context IS presented a wonderful practical application of a timing attack against browsers. It entailed the use of timing attacks to bypass the SOP (Same Origin Policy) in modern browsers by calculating delays during page content pixel manipulations, which are generally inaccessible via classic methods due to SOP cross-domain limitations.

Our research of timing attacks against file systems started in late 2012 and is based on our experience in web application security audits and penetration tests. In this paper we share the results and lessons learned from this research.

Background information & Direction of Research

When analyzing web application or network service security by means of black box method, we initially gather information about the target system. Information about accessible files and directories or application handlers (in the event of rewrite or other user-friendly URLs techniques) is generally acquired by means of a dictionary attack. This is accomplished by using such tools as OWASP DirBuster.

Our goal was to refine the approach to web application file search by utilizing timing attacks. Ideally, we wanted to shift from classic dirbusting (a direct dictionary attack method) to a method similar to file searching by filename parts. In the simplest case, this will entail timing attacks on the file system, or in the case of application servers, it’ll mean using timing attacks against rewrite mechanisms and other methods of choosing an action for a specific URL.

We were inspired to perform this research by practical observations of TFTP server reply times on a network router (the times between a UDP datagram being sent and a reply being received) when requesting various filenames — part of our information security audit efforts. By statistically resolving relevant errors, we managed to calculate possible firmware file prefixes (the first 3–5 filename bytes on the server) via a dictionary attack. We then continued with the attack by going through the remaining filename parts looking for prefixes that create anomalous reply times.

Timing Attack Basics

Before we move forward, we need to get the theory down. Let’s take an abstract function A with user data and certain private data A (UserData, PrivateData). This function performs certain operations with PrivateData based on UserData, without fully displaying all PrivateData information in its output. For simplicity’s sake, let’s say that function A() doesn’t output anything at all.

Function A() is claimed to be vulnerable to a timing attack, if the result of function A() being executed depends on UserData in such a way that execution time for function A can be used to retrieve some information about PrivateData. A classic example from cryptography lies in restoring a private key (PrivateData) based on the time it takes to decrypt relevant cryptotexts (UserData).

File System Timing Attack

When it comes to file systems, it makes most sense to choose a model in which the function vulnerable to timing attacks will be a file search function that searches a catalog for names. For UNIX systems that would mean checking execution time of STAT() system call. Keep in mind that once we have the directory access permissions we can also check file permissions using data received from calling the same STAT(). As such file access permissions have no effect on the timing attack described here.

Of course, options for performing an attack aren’t limited by checking STAT() execution time. There are other options, but in our case, this particular approach looks as the most general and practically applicable model to research.

Algorithms for File Searching Within a Directory

Throughout our research project we looked into the functionality of file searching algorithms for the following file systems: EXT2, EXT3, EXT4, UFS2, NFS, FAT, and NTFS (chosen based on web server usage trends). Since they will directly affect reply times, all of the file search algorithms can be divided into unoptimized (lists) and optimized (trees and hash tables).

Comparative table of file search algorithms for different file systems

Let’s look at caching and how it influences the possibility of timing attack execution. At first glance it may seem that caching interferes with or even prevents such attacks, but the situation is actually quite the opposite. Caching (by this we mean keeping results of operations in memory) reduces the amount of requests to the data storage device, thus removing excess noise in relevant delay stats when accessing a file.

Ext2

Now let’s move on to specific file search algorithms for each of the aforementioned file systems. We’ll start with ext2. This file system uses a simple list, which means it’s searched by comparing each element with each other element. However, before filename strings are compared, the lengths of these strings are compared byte-by-byte. If the lengths don’t match, the comparison is automatically treated as a non-match.

As you can probably guess, this sort of optimization creates anomalies for filename lengths that don’t match any of the filenames in the catalog. Monitoring timing anomalies related to filename lengths is quite useful when attacking directories with files of only one format, e.g. log-20131113–1122-c4daa58ccb8ee718.dat. We’ll go into more detail about a practical attack against ext2 below.

Ext3/4

These file systems utilize an optimized HTree algorithm, initially developed for ext2 but never implemented in any of its stable branches. During file system creation, HTree directory indexing is enabled by default. However, it can be disabled, which makes everything identical to the aforementioned ext2 scenario.

The HTree algorithm creates hashed filenames and stores results in tree form. A timing attack on such an algorithm allows us to calculate stored filename hashes, which then leads to calculating filenames based on their hash functions.

By default ext3/4 utilize the half_md4 hash function, which uses 24 rounds (3x8). However, for a seed (a cryptologic initialization vector, aka IV) it uses a random number (4 words, 32 bits each) generated by mkfs or some other file system generation utility. Attacking a hash with an unknown IV of such length (128 bits) with a direct dictionary attack is impossible with modern day solutions. Research has yet to be performed to identify the cryptographic integrity of the half_md4 (24 round) algorithm.

Note that the use of IV instead of a classic salt on the left prevents the execution of a Length-Extension attack that’s usually viable for this type of hash. As such, the attacker has no way of creating a hash of a string containing a substring with a known hash. This sort of attack still requires knowing the state of the algorithm at the moment when the hash function is generated, meaning knowing the IV, which is inaccessible. Were the seed a classic salt, such an attack would actually be feasible.

However, despite the use of IVs, the ext3/4 file systems can still be vulnerable to timing attacks in scenarios when the attacker knows the IV value. For example, when publically available device firmware is used.

UFS2/NFS

The popular *BSD file system, UFS2, has two hash function implementations for OpenBSD and FreeBSD. FreeBSD uses an FNV hash, while OpenBSD uses DJB. However, from the perspective of a timing attack, there’s really no difference between the two. Neither hash function uses IV, nor do they use salts therefore fully allowing timing attacks. The hash table search algorithm used in these systems is fully optimized, which provides a solid timing effect.

FAT/NTFS

Windows file systems search algorithms are based on BTree. FAT has been optimized to use BTree starting Windows98. FAT used in DOS or open UNIX systems however uses a simple list.

The BTree is a tree data structure where nodes can contain several children (not to be confused with a binary tree). File systems using BTrees in search are also vulnerable to timing attacks as the file search times directly depend on the structure of a BTree, i.e. files in a directory.

PoC

For demonstration purposes we’ve developed a tool which is publicly available on our Github. It can be used to showcase the timing effects in various file systems. It compares execution times of STAT() system calls for different filenames and different set of filenames in a folder.

The tool implements the following basic checks:
STAT() execution time for a filename of length not present in the directory,
STAT() execution time for a filename of same length as other filenames in the directory but different in the first byte from existing filenames in the directory,
Same check for filenames that differ in 4th, 8th and 100th bytes from existing filenames in the directory.

Let’s take a look at a sample application for ext2. First, we create a new ext2 partition within a loop device (local file) for testing purposes:

$ dd if=/dev/zero of=ext2_example bs=1M count=128
$ mkfs.ext2 -F ext2_example
$ mount -o loop ext2_example /mnt/ext2_example
$ cd /mnt/ext2_example
$ git clone https://github.com/wallarm/researches.git

Then on line 18 of our code we chose the filename prefix of 20 bytes:

#define FILENAME_PREFIX "aaaaaaaaaaaaaaaaaaaa"

Then we modify the code to calculate the search time for target filenames that differs from existing filenames in each of the 20 prefix bytes sequentially (from 0 to 19, since the numbering starts with zero). To do this we replace the line 119 (check_diff_byte call) with the loop below:

int i;
for(i=0;i<20;i++){
    check_diff_byte( timings, rounds, measures, i);
    printf("testing timings on difference in %d byte", i);
    print_timings( "", timings, measures);
}

Now, we recompile and run:

$ gcc fs-timing.c -lm
$ mkdir test
$ ./a.out test/

As you can see below for ext2, if the directory contains a filename of the same length as the target filename, execution time of STAT() increases significantly: from ca. 450ms to ca. 660ms (in case filenames differ in the first byte).

Next we want to test the case when the target filename has the same length as other filenames in the directory. For this we measure execution time of STAT() as a function of byte position in which the target filename differs from existing filenames in directory. We find that that our results can be perfectly approximated with a linear function (see plot). This demonstrates the relevance of our theory to practical applications.

Wrapping Up & The Big Question: What’s next?

We are currently working on a complete framework for timing anomaly monitoring within file systems. It will be applicable to both local attacks and remote service scenarios (FTP, TFTP, HTTP, etc.). In the future, we would like to provide a solution for identifying web server files using these techniques alternative to the classic dirbusting,

However, our efforts in this area aren’t entirely focused on file systems, those will be announced at BlackHat 2014. In our project we look at timing attacks from the perspective of their application in nosql databases and key-value repositories, granting the ability to extract data from tables in a manner similar to SQL injections but without actually using them.