thash method

The thash method is an improved way of hashing files, designed for integrity applications and optimized for fast hash generation and verification over large volumes of data.

How it Works

Normal Method

The file is treated as a single block: the whole file is read and a single hash is calculated over it.
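As a baseline, the normal method with sha256 is the same computation that sha256sum performs. A minimal bash helper could look like this; the name normal_sha256 is only illustrative:

```shell
#!/bin/bash
# normal_sha256 FILE  (hypothetical helper name)
# Normal method: the whole file is one block, hashed once with sha256.
# Reading from stdin makes sha256sum print "<hash>  -"; cut keeps only the hex hash.
normal_sha256() {
    sha256sum < "$1" | cut -d " " -f 1
}
```

For example, normal_sha256 file_bin prints the same value as the first column of sha256sum file_bin.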

thash method

In the thash method, the file is divided into consecutive blocks of size BlockSize, and the hash of each block is calculated separately. The block hashes are then concatenated in the same order as the blocks, and finally the hash of this concatenated sequence of hashes is calculated. In proof.sh (see Proof), each block hash is concatenated as its hexadecimal text, with no separator.

However, if the whole file fits in a single block (fileSize <= BlockSize), the normal method described above is applied.

Obs. Similar to a blockchain: a hash of hashes.
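The steps above can be sketched in bash as follows. This is not auditor's implementation; it assumes GNU coreutils (split --filter and stat -c), sha256 as the algorithm, BlockSize given as a plain byte count, and block hashes concatenated as hexadecimal text with no separator, which is how proof.sh builds the chain. The function name thash_sha256 is hypothetical.

```shell
#!/bin/bash
# thash_sha256 FILE BLOCK_BYTES  (hypothetical helper; BLOCK_BYTES is a plain byte count)
# thash with sha256: hash each block, concatenate the hex hashes, hash the result.
# Assumes GNU coreutils (split --filter, stat -c).
thash_sha256() {
    local file="$1" block_bytes="$2" size
    size=$(stat -c %s "$file") || return 1

    # fileSize <= BlockSize: the file fits in one block, so use the normal method
    if [ "$size" -le "$block_bytes" ]; then
        sha256sum < "$file" | cut -d " " -f 1
        return
    fi

    # split runs the filter once per block, in block order, with the block on stdin.
    # Each filter prints its block hash as hex without a newline, so the last
    # sha256sum reads the chain hash1hash2...hashN.
    split -b "$block_bytes" --filter='sha256sum | cut -d " " -f 1 | tr -d "\n"' "$file" \
        | sha256sum | cut -d " " -f 1
}
```

For a file larger than 20MB, thash_sha256 file_bin 20000000 should print the same value as ./proof.sh file_bin 20MB, because GNU split reads 20MB as 20,000,000 bytes. Using split --filter streams each block into sha256sum, so no part files are written to disk.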


Advantage

Speed: As the hash of blocks can be done in parallel, the process is much faster.

Level of security: The same level as the normal process, as all original data is hashed and any modification will result in a different hash.
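The speed advantage relies on the block hashes being independent of each other, so they can be computed at the same time. A rough bash sketch of that idea runs up to MAX_JOBS block hashes in the background and then combines them in block order. The function name is hypothetical, and it assumes bash 4.3 or later (for wait -n) and GNU coreutils (stat -c, nproc).

```shell
#!/bin/bash
# thash_sha256_parallel FILE BLOCK_BYTES [MAX_JOBS]  (hypothetical helper)
# Parallel thash sketch (sha256): blocks are hashed by background jobs, then the
# block hashes are concatenated in block order and hashed again.
# Assumes bash >= 4.3 (wait -n) and GNU coreutils (stat -c, nproc).
thash_sha256_parallel() {
    local file="$1" bs="$2" max_jobs="${3:-$(nproc)}"
    local size n i tmp result running=0
    size=$(stat -c %s "$file") || return 1

    # fileSize <= BlockSize: normal method
    if [ "$size" -le "$bs" ]; then
        sha256sum < "$file" | cut -d " " -f 1
        return
    fi

    n=$(( (size + bs - 1) / bs ))    # number of blocks; the last may be shorter
    tmp=$(mktemp -d) || return 1

    for (( i = 0; i < n; i++ )); do
        # keep at most max_jobs block hashes running at the same time
        if (( running >= max_jobs )); then
            wait -n
            running=$(( running - 1 ))
        fi
        # block i starts at byte i*bs (tail -c +K counts from 1) and has up to bs bytes
        tail -c +"$(( i * bs + 1 ))" "$file" | head -c "$bs" \
            | sha256sum | cut -d " " -f 1 | tr -d '\n' > "$tmp/$i" &
        running=$(( running + 1 ))
    done
    wait    # every block hash file is now complete

    # concatenate the block hashes in block order and hash the chain
    result=$(for (( i = 0; i < n; i++ )); do cat "$tmp/$i"; done \
        | sha256sum | cut -d " " -f 1)
    rm -rf "$tmp"
    echo "$result"
}
```

Each job reads only its own block: on a regular file, GNU tail -c +K seeks to the block start, and head -c stops after BlockSize bytes. Whether this is faster in practice depends on the storage; on a single spinning disk, parallel reads at distant offsets may be slower than one sequential read.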

Considerations

Hash Algorithm: any hash algorithm can be used, such as sha256, sha512, whirlpool, etc.

BlockSize: the size of each block, with its unit of measurement (KB, MB, GB, TB, etc.). The last block holds the remainder, unless FileSize is an exact multiple of BlockSize. When processing very large amounts of data, BlockSize can be fixed for all files, or it can be calculated automatically for each file based on its properties (fileSize, for instance), on the hardware being used (data storage, SSD disks, processor architecture, number of CPUs available, memory size, etc.), or on other aspects.
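As an illustration of both options, the bash sketch below converts a BlockSize label into bytes and, alternatively, derives a BlockSize from the file size and the number of CPUs. The decimal interpretation of the units and the auto_block_size heuristic are assumptions made for this example, not rules of the thash method or of auditor; both function names are hypothetical.

```shell
#!/bin/bash
# Assumption: decimal units (1KB = 1000 bytes, 1MB = 1000*1000, ...), which is
# how GNU split -b reads KB/MB/GB/TB; binary sizes would use 1024 instead.

# block_size_to_bytes LABEL  (hypothetical helper), e.g. 50MB -> 50000000
block_size_to_bytes() {
    local mult
    if [[ $1 =~ ^([0-9]+)(KB|MB|GB|TB)$ ]]; then
        case "${BASH_REMATCH[2]}" in
            KB) mult=1000 ;;
            MB) mult=1000000 ;;
            GB) mult=1000000000 ;;
            TB) mult=1000000000000 ;;
        esac
        # 10# forces base 10, so a label such as 050MB is not read as octal
        echo $(( 10#${BASH_REMATCH[1]} * mult ))
    else
        echo "invalid BlockSize: $1" >&2
        return 1
    fi
}

# auto_block_size FILE  (hypothetical heuristic)
# About one block per available CPU (nproc), never below 1MB,
# so small files are not cut into tiny blocks.
auto_block_size() {
    local size cpus bs
    size=$(stat -c %s "$1") || return 1
    cpus=$(nproc)
    bs=$(( (size + cpus - 1) / cpus ))
    if (( bs < 1000000 )); then
        bs=1000000
    fi
    echo "$bs"
}
```

For instance, block_size_to_bytes 50MB prints 50000000.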

thash: to identify that a file was hashed with the thash method, the character 't' can be put in front of the algorithm name, followed by the character '-' and the BlockSize used.

Normal method identification:

sha256 = normal method using hash algorithm: sha256, no blockSize. 

thash identification:

tsha256-50MB = thash method with algorithm: sha256, BlockSize: 50MB. 

tsha512-1GB = thash method with algorithm: sha512, BlockSize: 1GB. 

twhirlpool-15GB = thash method with algorithm: whirlpool, BlockSize: 15GB.
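A verifier has to read these identifiers back. A possible bash parser (parse_hash_id is a hypothetical name) treats a name as thash only when it starts with 't' and ends with '-' plus a BlockSize, so an algorithm whose own name begins with 't' but has no BlockSize suffix is still read as the normal method. This is an assumption of the sketch, not a stated rule of auditor.

```shell
#!/bin/bash
# parse_hash_id ID  (hypothetical helper)
# Prints "<method> <algorithm> <BlockSize>", with "-" as BlockSize for the normal method.
# Assumption: thash identifiers always end in "-<n><KB|MB|GB|TB>".
parse_hash_id() {
    local id="$1"
    if [[ $id =~ ^t(.+)-([0-9]+(KB|MB|GB|TB))$ ]]; then
        echo "thash ${BASH_REMATCH[1]} ${BASH_REMATCH[2]}"
    else
        echo "normal $id -"
    fi
}
```

For example, parse_hash_id tsha512-1GB prints thash sha512 1GB, and parse_hash_id sha256 prints normal sha256 -.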


Attention! For verification to succeed, the same BlockSize used in generation must be used; otherwise, it will fail, because a different BlockSize splits the file into different blocks and produces a different hash. So, just as you need to store which hash algorithm was used, you also need to store the BlockSize used.

auditor

auditor is a forensic tool for fast integrity auditing. It is a command-line tool that implements the thash method. It is similar to other popular tools (fsum, hashdeep, sha256sum, etc.), but faster.

Usage

Example of Forensic Audit Generation

auditor fgen C:\Folder1\Data\ -o -b 10MB

Obs. The parameter -o overwrites existing audit files, and -b 10MB sets a BlockSize of 10 megabytes.

This generates two audit files in the root of C:\Folder1\Data\.


The format of the generated files is described in Output Formats.


Example of Forensic Audit Check

auditor fcheck C:\Folder1\Data\

This checks the files in C:\Folder1\Data\ against the audit files generated there by fgen.



Other parameters

Other parameters can be used on the command line, such as n_workers, buffersize, overwriting results, changing the default names of the audit files, quiet mode, stop on error, etc. They are omitted here for the sake of simplicity.

To check all available parameters, use:

Global Parameters:

auditor --help

Parameters of subcommands:

auditor fgen --help
auditor fcheck --help

Important: global parameters must come before the subcommand (fgen, fcheck). Parameters specific to each subcommand must come after the subcommand.



Audit file Integrity

To ensure that the whole chain of integrity can be securely checked in the future, save all data, including the audit files, and either print the contents of the Audit Stamp file or digitally sign it. When someone performs a check, the content of the Audit Stamp file MUST BE the same as the printed or digitally signed copy. This prevents malicious changes to the audit files from going unnoticed.

If you don't have a digital certificate, you can use a free timestamping authority, such as freetsa.org.

Output Formats

Simple Format: a text file where each line contains the hash value, the method and algorithm used to calculate the hash, the file size, and the file path relative to the original directory:

281d5d93464f1165ea7c403ca99d63ff4bf9a360864f8df4bd0e8e6c03774e98 ?tsha256-50MB|500000*file_hashed.bin
7357b67824d086dc53f5e1ded565f500456bea1812783f1fbcddc08fddc3944c ?sha256|2233*hashes.txt

Obs 1. In this example of the simple format, the first line uses the thash method with algorithm sha256 and BlockSize 50MB, and the second uses the normal method with algorithm sha256.
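Judging from these examples, each line follows the layout hash ?identifier|size*relative path. A bash sketch of a generator for that layout, assuming GNU coreutils and findutils (find -printf) and sha256 only, is shown below. gen_simple is a hypothetical name, sorting by path is a choice of this sketch, and file names containing newlines are not handled.

```shell
#!/bin/bash
# gen_simple DIR BLOCK_BYTES BLOCK_LABEL  (hypothetical helper)
# e.g. gen_simple ./Data 50000000 50MB; sha256 only; GNU coreutils and findutils.
# Output line: <hash> ?<identifier>|<size>*<relative path>
gen_simple() {
    local bs="$2" label="$3"
    (
        cd "$1" || exit 1
        # %P prints the path relative to DIR; sorting gives a stable line order
        find . -type f -printf '%P\n' | LC_ALL=C sort | while IFS= read -r rel; do
            size=$(stat -c %s "./$rel")
            if [ "$size" -le "$bs" ]; then
                # the file fits in one block: normal method
                id="sha256"
                hash=$(sha256sum < "./$rel" | cut -d " " -f 1)
            else
                # thash: hash of the concatenated hex block hashes
                id="tsha256-$label"
                hash=$(split -b "$bs" --filter='sha256sum | cut -d " " -f 1 | tr -d "\n"' "./$rel" \
                    | sha256sum | cut -d " " -f 1)
            fi
            printf '%s ?%s|%s*%s\n' "$hash" "$id" "$size" "$rel"
        done
    )
}
```

For example, gen_simple ./Data 50000000 50MB > ../simple_list.txt writes the list outside the directory being hashed, so the list is not hashed itself.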

Obs 2. The file size is important because it minimizes verification time: there is no reason to hash a large file when its size is already known not to match the original.
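A verification sketch that follows this idea reads one simple-format line, compares the file size first, and hashes only when the size matches. It uses the line layout of the examples above, handles only sha256 and tsha256 identifiers, and assumes decimal BlockSize units and GNU coreutils; check_simple_line is a hypothetical name.

```shell
#!/bin/bash
# check_simple_line LINE BASE_DIR  (hypothetical helper)
# Returns 0 if the file matches, 1 if it does not, 2 if the line is not handled.
# Handles only sha256 and tsha256-<n><KB|MB|GB|TB>; decimal units assumed.
check_simple_line() {
    local re='^([0-9a-f]+) [?]([^|]+)[|]([0-9]+)[*](.+)$'
    [[ $1 =~ $re ]] || return 2
    local want="${BASH_REMATCH[1]}" id="${BASH_REMATCH[2]}"
    local want_size="${BASH_REMATCH[3]}" path="$2/${BASH_REMATCH[4]}"
    local size bs got

    # cheap test first: a missing file or a different size fails without hashing
    [ -f "$path" ] || return 1
    size=$(stat -c %s "$path")
    [ "$size" -eq "$want_size" ] || return 1

    if [ "$id" = "sha256" ]; then
        bs=$size                    # normal method: the whole file is one block
    elif [[ $id =~ ^tsha256-([0-9]+)(KB|MB|GB|TB)$ ]]; then
        case "${BASH_REMATCH[2]}" in
            KB) bs=1000 ;;
            MB) bs=1000000 ;;
            GB) bs=1000000000 ;;
            TB) bs=1000000000000 ;;
        esac
        bs=$(( 10#${BASH_REMATCH[1]} * bs ))
    else
        return 2                    # other algorithms are not covered by this sketch
    fi

    if [ "$size" -le "$bs" ]; then
        got=$(sha256sum < "$path" | cut -d " " -f 1)
    else
        got=$(split -b "$bs" --filter='sha256sum | cut -d " " -f 1 | tr -d "\n"' "$path" \
            | sha256sum | cut -d " " -f 1)
    fi
    [ "$got" = "$want" ]
}
```

A wrong size returns 1 immediately, without reading the file contents.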


Full format: to further minimize verification time, information about each block can be stored. This way, a failed check of a single block is enough to fail the whole verification, without hashing the remaining blocks.
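The Full format is not specified here, so the following bash sketch assumes a hypothetical one: a list with one sha256 block hash per line, in block order. It checks the blocks one by one and stops at the first mismatch, which is where the time saving comes from. check_blocks is a hypothetical name, and GNU coreutils is assumed.

```shell
#!/bin/bash
# check_blocks FILE BLOCK_BYTES BLOCK_HASH_LIST  (hypothetical helper)
# BLOCK_HASH_LIST (hypothetical format): one sha256 hex hash per line, in block order.
# Stops at the first block whose hash differs and reports its index.
check_blocks() {
    local file="$1" bs="$2" list="$3" size n i want got
    size=$(stat -c %s "$file") || return 1
    n=$(( (size + bs - 1) / bs ))
    # the number of stored hashes must match the number of blocks
    [ "$(wc -l < "$list")" -eq "$n" ] || { echo "block count mismatch"; return 1; }

    i=0
    while IFS= read -r want; do
        # hash block i only: bytes i*bs+1 .. (i+1)*bs of the file
        got=$(tail -c +"$(( i * bs + 1 ))" "$file" | head -c "$bs" | sha256sum | cut -d " " -f 1)
        if [ "$got" != "$want" ]; then
            echo "block $i differs"
            return 1    # no need to hash the remaining blocks
        fi
        i=$(( i + 1 ))
    done < "$list"
    echo "all $n blocks match"
}
```

The reported index also tells which part of the file changed.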

JSON Format: may be implemented in the future, for use by other applications.

Download

Latest version of auditor:

Windows 64 bits

Proof

Below is a script that does what auditor (which implements the thash method) does. It can be used as a proof of concept.

proof.sh
#!/bin/bash
# This script simulates the thash method with sha256.
# blockSize is passed to split -b (e.g. 20MB; GNU split reads MB as 1000*1000 bytes).

if [ "$#" -ne 2 ]; then
    echo "Use: $0 file blockSize"
    exit 1
fi

# get arguments
file="$1"
block_str="$2"

# test if file exists
if [ ! -f "$file" ]; then
    echo "File '$file' not found."
    exit 1
fi

# create dir, name files, clean files left by a previous run
dir_name="${file}_dir"
file_chain_hash="${file}_${block_str}_chain.sha256.txt"
file_thash="${file}_${block_str}.thash.txt"
mkdir -p "$dir_name"
rm -f "$dir_name"/part_*

# split file in blocks of blockSize (part_aa, part_ab, ... in block order)
split -b "$block_str" "$file" "$dir_name"/part_ || exit 1

# list the blocks; the glob returns them in block order
shopt -s nullglob
parts=( "$dir_name"/part_* )

if [ "${#parts[@]}" -le 1 ]; then
    # the whole file fits in one block (fileSize <= blockSize): normal method
    sha256sum < "$file" | cut -d " " -f 1 | tr -d '\n' > "$file_thash"
else
    # hash each block; cut keeps just the hash and tr removes the \n
    for part in "${parts[@]}"; do
        sha256sum < "$part" | cut -d " " -f 1 | tr -d '\n' > "$part".sha256.txt
    done

    # chain all hashes in block order, with no separator
    cat "$dir_name"/part_*.sha256.txt > "$file_chain_hash"

    # hash the chain of hashes, keeping just the hash
    sha256sum < "$file_chain_hash" | cut -d " " -f 1 | tr -d '\n' > "$file_thash"
fi

# print just the hash; the last echo ends the line
echo "Thash value with block ${block_str}:"
cat "$file_thash"
echo ""

The results of proof.sh and of auditor, applied to the same file file_bin, are shown below.

Example of Forensic Audit Generation with auditor fgen

Obs 1.: auditor produces the same result as proof.sh, in the Audit FullList file, which lists all hashed files.

Obs 2.: auditor uses sha256 as its default algorithm, and the parameter -b sets the block size to 20MB. These are the same parameters used with proof.sh, so both compute the same thing.


Example of Forensic Audit Generation with auditor fgen --disable-thash-method

Obs 3.: This shows that auditor can also produce the normal sha256 hash, using the parameter -d (--disable-thash-method). This hash matches the one produced by sha256sum and listed in the proof.sh results.

Example of Forensic Audit Check with auditor fcheck

auditor fcheck results: see Usage for details.
Have suggestions or found a bug? Contact us at: contact@thash.org