File Investigator Identification Stages

The leading engine in file identification
The following stages are used to identify each file:

    1. Match Legal Database(s) Hash Codes (optional)

SHA-1, MD5 and CRC32 hash code values are calculated and matched against the entries in any EnCase, Hashkeeper, NIST NSRL or compatible legal hash code databases that the user has provided. Read the included fihash.txt document for details on using third-party hash databases. This option moves only the third-party databases to the first stage, not the File Investigator Hash Database. By default this stage is included with the later Hash Code Matching stage, but it can be moved to the front when forensic investigators need to eliminate 'Known Good' files before spending time identifying them.
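
As a rough illustration of this stage, the sketch below computes the three hash values named above and checks one of them against a known-good set. It is not File Investigator's implementation: the names hash_codes, is_known_good and KNOWN_GOOD_MD5 are hypothetical, and the in-memory set merely stands in for an imported third-party database.

```python
import hashlib
import zlib


def hash_codes(data: bytes) -> dict:
    """Return the SHA-1, MD5 and CRC32 values of a byte string as hex text."""
    return {
        "sha1": hashlib.sha1(data).hexdigest(),
        "md5": hashlib.md5(data).hexdigest(),
        # zlib.crc32 returns an unsigned 32-bit integer; show it as 8 hex digits.
        "crc32": format(zlib.crc32(data) & 0xFFFFFFFF, "08x"),
    }


# Hypothetical stand-in for a legal hash database: MD5 digests of known-good files.
KNOWN_GOOD_MD5 = {hashlib.md5(b"known good file").hexdigest()}


def is_known_good(data: bytes) -> bool:
    """True when the file's MD5 appears in the known-good set."""
    return hash_codes(data)["md5"] in KNOWN_GOOD_MD5
```

A file for which is_known_good returns True could be skipped before the slower identification stages run.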

    2. Match File Header/Magic #

The first 32 bytes of each file are read and matched against the entries in the File Investigator Pattern Database. This is typically the first stage used.
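
A minimal sketch of header matching, assuming a pattern entry is simply an offset, a byte signature and a description. The table HEADER_PATTERNS and the function match_header are hypothetical; the real Pattern Database format is not described here. The four signatures shown are the well-known magic numbers for those formats.

```python
import io

HEADER_LENGTH = 32  # the text states that the first 32 bytes are read

# Hypothetical pattern entries: (offset within the header, signature, description).
HEADER_PATTERNS = [
    (0, b"\x89PNG\r\n\x1a\n", "PNG image"),
    (0, b"%PDF-", "PDF document"),
    (0, b"GIF89a", "GIF image"),
    (0, b"PK\x03\x04", "ZIP archive"),
]


def match_header(stream):
    """Read the header once and return the first matching description, or None."""
    header = stream.read(HEADER_LENGTH)
    for offset, signature, description in HEADER_PATTERNS:
        if header[offset:offset + len(signature)] == signature:
            return description
    return None
```

Any binary stream works as input, for example match_header(open(path, "rb")) or match_header(io.BytesIO(data)).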

    3. Match Inter-File Pattern/Signature/Magic #

Several methods are used to match patterns from the File Investigator Pattern Database to patterns located deeper inside each file. This stage is used when the Header Matching stage fails.
Methods:
  • Seek offset from end of file
  • Seek offset, read new offset, Seek new offset (LH)
  • Seek offset, read new offset, Seek new offset (HL)
  • Find text string before offset (case sensitive)
  • Find text string before offset (non-case sensitive)
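
The methods listed above might look roughly like the following. This is a sketch under assumptions: "LH" and "HL" are taken to mean low-byte-first (little-endian) and high-byte-first (big-endian) offsets, the stored offset is assumed to be 4 bytes wide, and all function names are hypothetical.

```python
import io
import struct


def match_from_end(stream, offset_from_end, signature):
    """Seek backwards from the end of the file and compare the bytes found there."""
    stream.seek(0, io.SEEK_END)
    size = stream.tell()
    if offset_from_end > size:
        return False
    stream.seek(size - offset_from_end)
    return stream.read(len(signature)) == signature


def match_indirect(stream, pointer_offset, signature, byte_order="<"):
    """Seek to pointer_offset, read a 4-byte offset, seek to it and compare.

    byte_order "<" is little-endian (assumed "LH"), ">" is big-endian (assumed "HL").
    """
    stream.seek(pointer_offset)
    raw = stream.read(4)
    if len(raw) < 4:
        return False
    (target,) = struct.unpack(byte_order + "I", raw)
    stream.seek(target)
    return stream.read(len(signature)) == signature


def find_text_before(stream, text, limit, case_sensitive=True):
    """Look for text anywhere within the first `limit` bytes of the file."""
    stream.seek(0)
    window = stream.read(limit)
    if not case_sensitive:
        window, text = window.lower(), text.lower()
    return text in window
```

For example, a Windows PE executable stores a little-endian offset at 0x3C that points to the bytes "PE\0\0", so match_indirect(stream, 0x3C, b"PE\x00\x00", "<") would confirm it. A ZIP archive without a trailing comment ends with a 22-byte record beginning "PK\x05\x06", which match_from_end(stream, 22, b"PK\x05\x06") would find.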

    4. Match Byte Value Distribution Pattern

The entire file is read, and all of the byte values are tallied. After being run through a normalizing calculation, the resulting pattern is matched against the entries in the File Investigator Byte Value Distribution (BVD) database. This stage is used when the Inter-File Matching stage fails.
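
The normalizing calculation and the matching rule are not specified here, so the sketch below makes two assumptions: counts are normalized by dividing by the file size, and profiles are compared by Euclidean distance against a fixed threshold. The names byte_distribution, match_distribution and the reference dictionary are hypothetical.

```python
import math
from collections import Counter


def byte_distribution(data: bytes) -> list:
    """Tally all 256 byte values and scale the counts so they sum to 1."""
    counts = Counter(data)
    total = len(data) or 1  # avoid dividing by zero for an empty file
    return [counts.get(value, 0) / total for value in range(256)]


def distance(a, b):
    """Euclidean distance between two 256-entry profiles (an assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def match_distribution(data, reference, threshold=0.25):
    """Return the name of the closest reference profile within threshold, else None."""
    profile = byte_distribution(data)
    best_name, best_distance = None, threshold
    for name, reference_profile in reference.items():
        d = distance(profile, reference_profile)
        if d < best_distance:
            best_name, best_distance = name, d
    return best_name
```

Because the whole file must be read and tallied, this is the most expensive of the pattern stages, which fits its position after header and inter-file matching.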


    5. Interpret & Validate Identification

Several methods are used to interpret and validate the results of the previous identification stages. Each time a potential identification is made, this stage is used to decide whether the identification is accurate. If it is found to be inaccurate, then the stage responsible for the pattern match is instructed to continue looking for a better match.
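
One way to picture the "continue looking" behavior is a pattern stage that yields candidates lazily and a validator that accepts or rejects each one. The sketch below is an assumption about the control flow only; identify, header_candidates and VALIDATORS are hypothetical names. The OpenDocument check relies on the fact that such files are ZIP archives whose first entry is an uncompressed file named mimetype.

```python
def header_candidates(data):
    """Hypothetical pattern stage: yield every description whose signature matches."""
    patterns = [
        (b"PK\x03\x04", "OpenDocument text"),  # more specific guess is tried first
        (b"PK\x03\x04", "ZIP archive"),
    ]
    for signature, description in patterns:
        if data.startswith(signature):
            yield description


# Hypothetical validators keyed by description; candidates without one are accepted.
VALIDATORS = {
    "OpenDocument text": lambda data: (
        b"mimetypeapplication/vnd.oasis.opendocument.text" in data[:128]
    ),
}


def identify(data, candidates, validators):
    """Return the first candidate that passes validation, or None."""
    for description in candidates:
        validate = validators.get(description, lambda _data: True)
        if validate(data):
            return description
        # Rejected: the loop asks the pattern stage for its next candidate.
    return None
```

A plain ZIP file first matches the OpenDocument pattern, fails validation, and is then identified as a ZIP archive by the next candidate.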

    6. Match Hash Codes (Our hash DB, then the Legal DB(s))

SHA-1, MD5 and CRC32 hash code values are calculated and matched against the entries in the File Investigator Hash Database (first) and any legal hash code databases that the user has provided. Read the included fihash.txt document for details on using third party hash databases. This option does not use the third party hash databases if they were already used as the first identification stage.
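
The lookup order described above can be sketched as follows, assuming each database is a dictionary keyed by SHA-1 digest. The function match_hash and its legal_already_checked flag are hypothetical; the flag models the rule that third-party databases are skipped here when they already ran as the first stage.

```python
import hashlib


def match_hash(data, own_db, legal_dbs, legal_already_checked=False):
    """Check the product's own hash database first, then any legal databases."""
    digest = hashlib.sha1(data).hexdigest()
    if digest in own_db:
        return own_db[digest]
    if not legal_already_checked:
        for database in legal_dbs:
            if digest in database:
                return database[digest]
    return None
```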

    7. Match File Extension

If a file is not identified by the previous pattern matching and interpretation methods, this option allows the file to be matched against the list of known file extensions. This is the method that MS Windows uses, and it produces a LOW accuracy rating in File Investigator.
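
A small sketch of this fallback, assuming the extension list is a simple lookup table. The table EXTENSIONS and the function match_extension are hypothetical; the "LOW" rating mirrors the accuracy rating mentioned above.

```python
import os

# Hypothetical excerpt of a known-extension list.
EXTENSIONS = {
    ".txt": "Plain text",
    ".jpg": "JPEG image",
    ".pdf": "PDF document",
}


def match_extension(filename):
    """Return (description, accuracy) based only on the file name, or None."""
    _, extension = os.path.splitext(filename)
    description = EXTENSIONS.get(extension.lower())
    if description is None:
        return None
    # The name says nothing about the content, so the result is rated LOW.
    return (description, "LOW")
```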

    8. Floating Header Match (Secondary)

Same as the previous "Match Inter-File Pattern/Signature/Magic #" stage, but records the resulting FI Description Database index value(s) separately in the "Numbers Metadata" field. This optional stage is intended to catch files with floating headers that are stored within a different type of file that looks innocent. This stage will be performed even when the file is already identified by a previous stage. This "Floating Header" support has been added in response to a whitepaper located at http://www.securityelf.org/magicbyte.html.
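
To illustrate the idea of a floating header, the sketch below scans an entire file for known signatures at any offset past the start and records every hit separately from the primary identification. The SIGNATURES table and find_floating_headers are hypothetical, and a plain byte search is an assumed simplification of the pattern methods.

```python
# Hypothetical signature table; both entries are well-known magic numbers.
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "PNG image",
    b"%PDF-": "PDF document",
}


def find_floating_headers(data, start=1):
    """Return sorted (offset, description) pairs for signatures found at or after start.

    start defaults to 1 so that a normal header at offset 0 is not reported again.
    """
    hits = []
    for signature, description in SIGNATURES.items():
        position = data.find(signature, start)
        while position != -1:
            hits.append((position, description))
            position = data.find(signature, position + 1)
    return sorted(hits)
```

A file that begins like a GIF but carries a PDF header 26 bytes in would be reported as [(26, "PDF document")] alongside its primary GIF identification.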


    9. Match Hash Codes (Secondary, Legal DB(s) only)

Same as the previous "Match Hash Codes" stage, but records the resulting FI Description Database index value(s) separately in the "Numbers Metadata" field. This optional stage allows files to be identified by their content as well as by a legal hash code database, rather than by just one method. This stage will be performed even if the file is already identified by a previous stage.

    10. Read Metadata

Once a file is identified, an attempt will be made to read its Metadata values.

Any of the above stages can be disabled when the user wants to speed up the process.
* Forensic Innovations does not guarantee that all of the file's metadata is extracted.
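
The ability to disable stages can be pictured as an ordered list of stage functions with a set of names to skip. This is a simplified sketch, not the product's configuration mechanism: run_stages and STAGES are hypothetical, the two stage functions are toy stand-ins, and the secondary stages that run even after a successful identification are left out.

```python
def run_stages(data, stages, disabled=frozenset()):
    """Run enabled stages in order; return (stage name, result) for the first hit."""
    for name, stage in stages:
        if name in disabled:
            continue  # a disabled stage costs nothing
        result = stage(data)
        if result is not None:
            return name, result
    return None, None


# Hypothetical stage list with toy stand-ins for the real matching stages.
STAGES = [
    ("header", lambda data: "PDF document" if data.startswith(b"%PDF-") else None),
    ("distribution", lambda data: "Zero-filled data" if data and not any(data) else None),
]
```

Disabling a stage trades coverage for speed: with disabled={"header"}, a PDF file is no longer identified by this toy pipeline.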