How Internet Explorer stores web history

Internet Explorer stores files downloaded from the internet (e.g. HTML pages, images, CSS files) in a cache called Temporary Internet Files. Each cached file is assigned an alphanumeric cache name. Some index.dat files map each cache name to the original filename and the URL it came from. Other index.dat files store the user's cookies or web browser history (by default 20 days' worth). index.dat files are in binary format and need to be viewed using a hex editor.

There are numerous index.dat files kept on Windows machines. Assuming the computer is running Windows XP, the locations of the main index.dat files are:

C:\Documents and Settings\<UserName>\Local Settings\History\History.IE5\index.dat

(Older history index.dat files can be found in C:\…\History.IE5\MSHist[18digits])

C:\Documents and Settings\<UserName>\Local Settings\Temporary Internet Files\Content.IE5\index.dat

For Windows Vista and Windows 7, the corresponding paths are:

C:\Users\<UserName>\AppData\Local\Microsoft\Windows\History\History.IE5\index.dat

C:\Users\<UserName>\AppData\Local\Microsoft\Windows\Temporary Internet Files\Content.IE5\index.dat

The index.dat files all have the same format and consist of a header followed by a series of records. There are four types of records: HASH, REDR, URL and LEAK. HASH records are indexes to the other three record types, and can be ignored as they are only used internally by Internet Explorer. REDR, URL and LEAK are called activity records, since they each contain information about some sort of online browser activity.

There are a few differences between the various index.dat files. The one stored in the Temporary Internet Files folder (the “cache index.dat” file) is used to relate web files to those cached on the computer, so this additionally stores the names of the cached folders in the file header, and a reference to a corresponding cache folder within each activity record. Other differences will be explained below.

INDEX.DAT FILE HEADERS

The headers contain a small amount of information about the file and, for a cache index.dat, an array of cache folder names.

The image above shows the header of an example cache index.dat file. All index.dat files start with “Client UrlCache MMF” followed by the version number, which is shown in red. Next, in blue, is the size of the file. All numbers are stored little-endian. Following on, in yellow, is a pointer to the start of the first record. In this example the next part of the header names four subfolders where the cached files are located, shown in green. In non-cache index.dat files, these would be 0x00 (null) values.
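As a rough illustration of the layout just described, the header fields could be read in Python as sketched below. The article gives the order of the fields but not their byte offsets, so the offsets used here (file size at 0x1C, first-record pointer at 0x20, folder count at 0x48, 12-byte folder entries from 0x4C) are assumptions based on commonly documented version 5.2 files, and parse_header is a hypothetical helper name.

```python
import struct

SIGNATURE_PREFIX = b"Client UrlCache MMF Ver "

def parse_header(buf: bytes) -> dict:
    """Read the header fields of an index.dat file (offsets are assumptions)."""
    if not buf.startswith(SIGNATURE_PREFIX):
        raise ValueError("not an index.dat file")
    # The signature is a null-terminated ASCII string ending in the version.
    end = buf.index(b"\x00")
    version = buf[len(SIGNATURE_PREFIX):end].decode("ascii")
    # "<II" = two little-endian unsigned 32-bit integers.
    file_size, first_record = struct.unpack_from("<II", buf, 0x1C)
    # Assumed folder table: a count, then 12-byte entries made of a
    # 4-byte file counter followed by an 8-character folder name.
    (folder_count,) = struct.unpack_from("<I", buf, 0x48)
    folders = []
    for i in range(folder_count):
        entry = 0x4C + 12 * i
        name = buf[entry + 4:entry + 12].rstrip(b"\x00")
        folders.append(name.decode("ascii", "replace"))
    return {
        "version": version,
        "file_size": file_size,
        "first_record": first_record,
        "cache_folders": folders,
    }
```

For a history index.dat the folder count would be zero under these assumptions, so cache_folders comes back as an empty list.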

INDEX.DAT FILE CONTENTS

There are three types of activity records. These contain URL information and have the following common structure, illustrated by the image below:

  • TYPE: 4 bytes, either URL, LEAK or REDR. Shown in yellow.
  • LENGTH: 4 bytes, contains the length of the record in 128-byte (0x80) sized blocks.
  • DATA: variable length, the data we are interested in. Shown in grey. The end of every record is given by a 0x00 character, which can be seen in blue. The rest of the record is just filled with junk.
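The TYPE/LENGTH/DATA structure above lends itself to a simple scan. The following is a minimal sketch, not a complete parser: it assumes records begin on 128-byte boundaries relative to a caller-supplied start offset, that the 4-byte URL tag is padded with a trailing space ("URL "), and that blocks which do not start with a known tag can be skipped one at a time. iter_activity_records is a hypothetical name.

```python
import struct

BLOCK_SIZE = 0x80  # record lengths are counted in 128-byte blocks
ACTIVITY_TYPES = {b"URL ", b"LEAK", b"REDR"}

def iter_activity_records(buf: bytes, start: int):
    """Yield (offset, type, data) for each activity record found from start."""
    offset = start
    while offset + 8 <= len(buf):
        rec_type = buf[offset:offset + 4]
        # LENGTH is a little-endian 32-bit count of 128-byte blocks.
        (blocks,) = struct.unpack_from("<I", buf, offset + 4)
        if rec_type in ACTIVITY_TYPES and blocks > 0:
            size = blocks * BLOCK_SIZE
            # DATA is everything after the 8-byte TYPE + LENGTH prefix.
            data = buf[offset + 8:offset + size]
            yield offset, rec_type.decode("ascii").strip(), data
            offset += size
        elif rec_type == b"HASH" and blocks > 0:
            # HASH records are internal indexes, so step over them.
            offset += blocks * BLOCK_SIZE
        else:
            # Unrecognised or unused block: move to the next boundary.
            offset += BLOCK_SIZE
```

A typical call would pass the whole file contents and the first-record pointer taken from the header.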

REDR ACTIVITY RECORDS

REDR records contain just a URL and indicate a redirect to a different location.

URL ACTIVITY RECORDS

These are the important records and an example can be seen in the image below. The information held in the DATA section is dependent on the type of index.dat file. They all start with the last modified time (in blue) followed by the last accessed time (in green). Time is stored in Windows FILETIME format (100-nanosecond intervals since 1st January, 1601 UTC).
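To make the timestamps readable, a FILETIME value can be converted to a normal date. The sketch below assumes each timestamp is an 8-byte little-endian integer and that the two timestamps sit back to back at the very start of DATA, as described above; the function names are hypothetical.

```python
import struct
from datetime import datetime, timedelta, timezone

# FILETIME counts 100-nanosecond intervals from this instant.
FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)

def filetime_to_datetime(raw: bytes) -> datetime:
    """Convert an 8-byte little-endian FILETIME to a UTC datetime."""
    (ticks,) = struct.unpack("<Q", raw)
    # 10 ticks = 1 microsecond; precision finer than that is dropped.
    return FILETIME_EPOCH + timedelta(microseconds=ticks // 10)

def read_url_times(data: bytes):
    """Return (last_modified, last_accessed) from the start of DATA."""
    return (filetime_to_datetime(data[0:8]),
            filetime_to_datetime(data[8:16]))
```

For example, the FILETIME value 116444736000000000 corresponds to 1st January, 1970 UTC.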

If the index.dat is a cache file, like that of the image below, the structure follows that of Table 1. If the index.dat is a history file, the structure follows that of Table 2, and looks like the final image.

Location | Meaning
38 bytes in | Reference to the cache folder the file is located in. This is just one byte long and is an index into the array of cache folders given in the file header. Shown in dark grey (second to last image).
96 bytes in | The URL the file came from (shown in purple). This is followed by the name of the corresponding cached file stored on disk (orange) and finally the HTTP headers (dark blue). Each part starts on a new 16 byte boundary. The Windows username is attached to the end of the HTTP headers.
Table 1 – The DATA structure of a URL activity record in a cache index.dat file

Location | Meaning
96 bytes in | A URL starting with “Visited: <user>@”. This is a URL the user with the login name <user> has visited using their Internet Explorer browser (shown in purple in the last image).
Table 2 – The DATA structure of a URL activity record in a History.IE5 index.dat file
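As an illustration of Table 2, the snippet below pulls the user name and URL out of the DATA section of a history URL record. It is a sketch under assumptions: the string is taken to start 96 bytes into DATA as the table states, to be null-terminated, and to be ASCII-compatible; read_cstring and parse_history_entry are hypothetical helper names.

```python
def read_cstring(data: bytes, offset: int) -> str:
    """Read a null-terminated string starting at offset."""
    end = data.find(b"\x00", offset)
    if end == -1:
        end = len(data)  # no terminator found, so read to the end
    return data[offset:end].decode("ascii", "replace")

def parse_history_entry(data: bytes, url_offset: int = 96):
    """Return (user, url) from a 'Visited: <user>@<url>' entry, else None."""
    text = read_cstring(data, url_offset)
    prefix = "Visited: "
    if not text.startswith(prefix):
        return None
    # Split on the first "@", which separates the login name from the URL.
    user, sep, url = text[len(prefix):].partition("@")
    if not sep:
        return None
    return user, url
```

Anything after the terminating 0x00 is ignored, which matches the earlier note that the rest of a record is filled with junk.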

LEAK ACTIVITY RECORDS

LEAK activity records look the same as URL activity records, and are essentially a Microsoft term for an error.
