Pattern Matching Comparison

Comparing the efficiency of matching Hash Codes or Pattern Signatures
As a comparison test, we scanned the same hard drive twice: once with Hash Code Matching as the primary step, and once with our Pattern Matching as the primary step.

Summary

In each test, the other method was still used as a fallback. When Pattern Matching was the primary step, Hash Code Matching was applied to any file that Pattern Matching failed to identify. When Hash Code Matching was the primary step, Pattern Matching was applied to any file that Hash Code Matching failed to identify.

  • 89 Gigabytes of Files
  • 214,142 Files (+94 more files for Pattern Test due to OS operation)
  • Used NIST NSRL v2.8 Hash Code database (all 4 discs)
  • Used Hashkeeper Win98 with Office 2000+ database
  • Used Hashkeeper WinNT 4 CD database
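The primary/fallback arrangement described in this summary can be sketched as follows. This is a minimal illustration, not the File Investigator implementation: the function names, the tiny signature table, the use of SHA-1, and the sample "known" entry are all assumptions made for the example, and the later file-extension stage is omitted.

```python
import hashlib

# Hypothetical lookup tables; a real tool would load far larger databases.
KNOWN_HASHES = {
    hashlib.sha1(b"known system file").hexdigest(): "Known Good",
}
SIGNATURES = [
    (b"%PDF-", "PDF document"),
    (b"PK\x03\x04", "ZIP archive"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
]


def match_hash(data):
    """Return a label if the SHA-1 of the whole content is in the known-hash table."""
    return KNOWN_HASHES.get(hashlib.sha1(data).hexdigest())


def match_pattern(data):
    """Return a file type if the content starts with a known signature."""
    for magic, file_type in SIGNATURES:
        if data.startswith(magic):
            return file_type
    return None


def identify(data, hash_first=False):
    """Try the primary matcher first, then fall back to the other one."""
    stages = [("hash", match_hash), ("pattern", match_pattern)]
    if not hash_first:
        stages.reverse()  # pattern matching becomes the primary step
    for name, matcher in stages:
        result = matcher(data)
        if result is not None:
            return name, result
    return "none", None
```

Calling `identify(data, hash_first=True)` mirrors the Hash Code Matching (Primary) test, and the default mirrors the Pattern Matching (Primary) test. Note that hashing has to read every byte of a file, while a signature check like the one above looks only at the leading bytes; this may contribute to the timing difference, although the test report does not state the cause.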

Hash Code Matching (Primary) Test:

  • 214,142 Total Files Tested (variation between tests, due to OS operations)
  • 23.53% Identified by Hash Codes (50,392 files)
  • 75.69% Identified by Pattern Signatures (162,077 files)
  • 00.70% Identified by File Extension (1,501 files)
  • 00.08% Unidentifiable (172 files)
  • 07.33% Files with Wrong Extensions (15,686 files)
  • About 4x the time of the Pattern Matching (Primary) test to process all files (11 hours, 52 minutes)
  • 27 files Identified by 'Known Bad' Hash Codes (ID #1369)

Pattern Matching (Primary) Test:

  • 214,236 Total Files Tested (variation between tests, due to OS operations)
  • 98.60% Identified by Pattern Signatures (211,244 files)
  • 00.62% Identified by 'Known Good' Hash Codes (1,318 files; ID #1368)
  • 00.70% Identified by File Extension (1,502 files)
  • 00.08% Unidentifiable (172 files)
  • 03.70% Files with Wrong Extensions (7,925 files)
  • About 1/4 the time of the Hash Code Matching (Primary) test to process all files (2 hours, 48 minutes)
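As a cross-check, the percentages in both result lists can be recomputed from the file counts. The snippet below is only a sanity check on the reported figures; the dictionary keys and the `shares` helper are names chosen for this example.

```python
def shares(counts):
    """Return the total and each category's share of it, in percent."""
    total = sum(counts.values())
    return total, {name: round(100 * n / total, 2) for name, n in counts.items()}


# File counts as reported in the two result lists above.
hash_primary = {"hash": 50392, "pattern": 162077, "extension": 1501, "unidentified": 172}
pattern_primary = {"pattern": 211244, "hash": 1318, "extension": 1502, "unidentified": 172}

hash_total, hash_pct = shares(hash_primary)
pattern_total, pattern_pct = shares(pattern_primary)

print(hash_total, hash_pct)
print(pattern_total, pattern_pct)
```

The category counts add up to the reported totals (214,142 and 214,236), and the difference between the two totals is the 94 extra files attributed to OS operation.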

Conclusions

When searching for 'Known Bad' files by hash code, setting Hash Code Matching as Primary was best, because it found 27 'Known Bad' files and eliminated 23.53% of the files from any further investigation. The Pattern Matching as Primary test did not find any 'Known Bad' files.

When searching for potential evidence files, setting Pattern Matching as Primary appeared to be best, because it took 1/4 the time and cut the number of suspicious Wrong File Extension files in half. However, the extra 22.91% of files not eliminated as 'Known Good' may increase investigation time depending on their file types.
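The figures behind this comparison follow directly from the reported results, as the short calculation below shows. The variable names are chosen for this example only.

```python
from datetime import timedelta

# Processing times reported for the two tests.
hash_time = timedelta(hours=11, minutes=52)
pattern_time = timedelta(hours=2, minutes=48)
time_ratio = hash_time / pattern_time  # dividing two timedeltas gives a float

# Share of files eliminated by hash codes: 23.53% (hash primary) vs 00.62% (pattern primary).
known_good_gap = round(23.53 - 0.62, 2)

# Wrong-extension files: 7,925 (pattern primary) vs 15,686 (hash primary).
wrong_ext_ratio = 7925 / 15686

print(round(time_ratio, 2), known_good_gap, round(wrong_ext_ratio, 2))
```

The time ratio comes out slightly above 4, and the wrong-extension count with Pattern Matching as Primary is just over half of the count with Hash Code Matching as Primary, so "1/4 the time" and "in half" are rounded descriptions.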

When identifying files for other non-criminal purposes, setting Pattern Matching as Primary was best, because it took 1/4 the time. This is the default configuration of our File Investigator products.

Test Results

The test used File Investigator Directory for Windows v2.07.00 in a Windows DOS box.

The following command line was used for the Pattern Matching test:

>fiwdir.exe c:\*.* /D /I /RT /S

/D turns off the adding up of directory file sizes (increases speed).
/I turns on the filter to only show files with the wrong file extension.
/RT turns on the summary report output.
/S turns on recursive directory scanning.

The resulting summary report is available here.

The following command line was used for the Hash Code Matching test:

>fiwdir.exe c:\*.* /D /I /RT /S /ST0

/ST0 moves the legal hash database matching stage before all of the pattern matching stages.

The resulting summary report is available here.
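The two command lines differ only in the /ST0 switch, so a script that repeats both runs could build them from one helper. This is a hypothetical sketch: `build_command` and its parameters are names invented for the example, and only the switches quoted above are used.

```python
def build_command(target=r"c:\*.*", hash_first=False):
    """Build the fiwdir.exe argument list for one of the two test runs."""
    # Switches as documented above: /D /I /RT /S, plus /ST0 for hash-first.
    args = ["fiwdir.exe", target, "/D", "/I", "/RT", "/S"]
    if hash_first:
        args.append("/ST0")  # hash database stage before the pattern stages
    return args


pattern_run = build_command()
hash_run = build_command(hash_first=True)
print(" ".join(pattern_run))
print(" ".join(hash_run))
```

On a machine where fiwdir.exe is installed and on the search path, each list could be passed to `subprocess.run` to start the scan.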

Hardware Details

The test was run on a Dell Dimension 3000, Pentium 4, 3 GHz, 512 MB RAM, with MS Windows XP Home Edition (Service Pack 2) installed on a 145 GB partition of a 160 GB Western Digital hard drive.