The thash method is an improved way of hashing files, designed for integrity applications and optimized for fast hash generation and verification over large data volumes.
auditor is the first forensic tool to implement this method. See about auditor.
In the normal method, the file is treated as a single block: all of it is read and a single hash is calculated.
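For comparison with what follows, the normal method can be sketched in a few lines, assuming GNU coreutils sha256sum is available. The function name normal_hash is only illustrative and is not part of auditor.

```shell
# Normal method: hash the whole file as one single block.
# normal_hash is a hypothetical helper name used only for this sketch.
normal_hash() {
    # sha256sum prints "<hash>  <filename>"; keep only the hash field
    sha256sum "$1" | cut -d " " -f 1
}
```

Calling normal_hash with a file name prints just the hash of that file.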
In the thash method, the file is divided into individual blocks of size BlockSize and the hash of each block is calculated separately. The block hashes are then concatenated in the same order in which the blocks were divided, and finally the hash of this concatenated block of hashes is calculated.
However, if the whole file fits inside just one block (fileSize <= blockSize), the normal method above is applied.
Obs. Similar to Blockchain: Hash of hashes.
Speed: As the hashes of the blocks can be calculated in parallel, the process is much faster.
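The parallel hashing of blocks can be sketched as follows. This is only an illustration under some assumptions: the algorithm is sha256, the block size is given in bytes, block hashes are concatenated as hex text without newlines, and the function name thash_parallel and the default of 4 parallel jobs are hypothetical choices. dd reads one whole block into memory, so the sketch suits moderate block sizes only.

```shell
# Sketch of thash with sha256, hashing the blocks in parallel.
# thash_parallel is an illustrative name; block_size is in bytes.
thash_parallel() (
    file="$1"; block_size="$2"; jobs="${3:-4}"
    size=$(wc -c < "$file")
    nblocks=$(( (size + block_size - 1) / block_size ))

    if [ "$nblocks" -le 1 ]; then
        # The file fits in one block: the normal method applies.
        sha256sum "$file" | cut -d " " -f 1
    else
        tmpdir=$(mktemp -d)
        # Hash block i of the file and store the bare hash in $tmpdir/i.
        # xargs -P runs up to $jobs of these hashing jobs at the same time.
        i=0
        while [ "$i" -lt "$nblocks" ]; do echo "$i"; i=$((i + 1)); done |
            xargs -P "$jobs" -I{} sh -c '
                dd if="$1" bs="$2" skip="$3" count=1 2>/dev/null |
                    sha256sum | cut -d " " -f 1 | tr -d "\n" > "$4/$3"
            ' sh "$file" "$block_size" {} "$tmpdir"
        # Concatenate the block hashes in block order, then hash the chain.
        i=0
        while [ "$i" -lt "$nblocks" ]; do cat "$tmpdir/$i"; i=$((i + 1)); done |
            sha256sum | cut -d " " -f 1
        rm -rf "$tmpdir"
    fi
)
```

The blocks may finish hashing in any order; the result stays deterministic because the hashes are concatenated by block index, not by completion time.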
Level of security: The same level as the normal process, as all the original data is hashed and any modification will result in a different hash. The idea of a hash of hashes was inspired by the blockchains of cryptocurrencies, which are widely used.
Hash Algorithm: Any hash algorithm can be used, such as sha256, sha512, whirlpool, etc.
BlockSize: This is the size of each block, given with its unit of measurement (KB, MB, GB, TB, etc.). The last block holds the remainder of the file (unless FileSize is an exact multiple of BlockSize, in which case the last block is a full block). When processing very large amounts of data, the BlockSize can be fixed for all files, or it can be calculated automatically for each file based on its properties (fileSize, for instance), on the type of hardware being used (data storage, SSD disks, processor architecture, number of CPUs available, memory size, etc.) or on other aspects.
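As a worked example of how a file is divided, the sketch below computes the number of blocks and the size of the last block from sizes given in bytes. The function name block_layout is hypothetical, and the example assumes that 50MB means 50,000,000 bytes, which is how GNU split interprets the MB suffix.

```shell
# Print "<number of blocks> <size of last block>" for sizes given in bytes.
# block_layout is an illustrative name, not part of auditor.
block_layout() (
    file_size="$1"; block_size="$2"
    # Ceiling division: a partial last block still counts as one block.
    nblocks=$(( (file_size + block_size - 1) / block_size ))
    last=$(( file_size % block_size ))
    # An exact multiple has no remainder: the last block is a full block.
    if [ "$last" -eq 0 ] && [ "$file_size" -gt 0 ]; then
        last="$block_size"
    fi
    echo "$nblocks $last"
)

# A 120,000,000-byte file with a 50,000,000-byte BlockSize:
block_layout 120000000 50000000    # prints "3 20000000"
```

So a 120 MB file hashed with a 50MB BlockSize yields two full blocks and a final block of 20 MB.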
thash identification: To correctly identify that a file was hashed by the thash method, the character 't' can be put in front of the algorithm name, followed by the character '-' and the BlockSize used.
Normal method identification:
sha256 = normal method using hash algorithm: sha256, no blockSize.
thash identification:
tsha256-50MB = thash method with algorithm: sha256, BlockSize: 50MB.
tsha512-1GB = thash method with algorithm: sha512, BlockSize: 1GB.
twhirlpool-15GB = thash method with algorithm: whirlpool, BlockSize: 15GB.
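The identifiers above can be split back into their parts with plain shell parameter expansion. This is a minimal sketch: the function name parse_hash_id and its output format are hypothetical, and it assumes that any identifier starting with 't' and containing '-' is a thash identifier, with the BlockSize after the last '-'.

```shell
# Split an identifier into "<method> <algorithm> [<BlockSize>]".
# parse_hash_id is an illustrative name, not part of auditor.
parse_hash_id() (
    id="$1"
    case "$id" in
        t*-*)
            rest="${id#t}"                        # drop the leading 't'
            echo "thash ${rest%-*} ${rest##*-}"   # split at the last '-'
            ;;
        *)
            echo "normal $id"                     # normal method, no BlockSize
            ;;
    esac
)

parse_hash_id tsha256-50MB    # prints "thash sha256 50MB"
parse_hash_id sha256          # prints "normal sha256"
```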
Attention! For verification to succeed, it is necessary to use the same BlockSize that was used in generation; otherwise, it will fail! So, just as you need to store which hash algorithm was used, the BlockSize used also needs to be stored.
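The effect of the BlockSize on verification can be illustrated with the sketch below. It is a sequential illustration only, assuming sha256, block sizes in bytes and block hashes concatenated as hex text without newlines; the names thash_sha256 and thash_verify are hypothetical.

```shell
# Sequential sketch of thash with sha256; block_size is in bytes.
thash_sha256() (
    file="$1"; block_size="$2"
    size=$(wc -c < "$file")
    nblocks=$(( (size + block_size - 1) / block_size ))
    if [ "$nblocks" -le 1 ]; then
        # The file fits in one block: the normal method applies.
        sha256sum "$file" | cut -d " " -f 1
    else
        i=0
        while [ "$i" -lt "$nblocks" ]; do
            # Hash block i and emit the bare hash without a newline.
            dd if="$file" bs="$block_size" skip="$i" count=1 2>/dev/null |
                sha256sum | cut -d " " -f 1 | tr -d '\n'
            i=$((i + 1))
        done | sha256sum | cut -d " " -f 1
    fi
)

# Succeeds only if the file matches the stored hash for the given BlockSize.
# Arguments: file, block size in bytes, stored hash.
thash_verify() {
    [ "$(thash_sha256 "$1" "$2")" = "$3" ]
}
```

A hash generated with thash_sha256 file 4 verifies with thash_verify file 4 "$stored", while the same call with a block size of 5 fails on any file of more than 4 bytes, even though the file is unchanged.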
Below is a script that does what the thash method does. It can be used as a proof of concept and also to test the correctness of the auditor.
proof.sh
#!/bin/bash
# This script simulates thash method.
if [ "$#" -ne 2 ]; then
    echo "Use: $0 file blockSize"
    exit 1
fi
# get arguments
file="$1"
block_str="$2"
# test if file exists
if [ ! -f "$file" ]; then
    echo "File '$file' not found."
    exit 1
fi
#create dir, name files, clean eventual files
dir_name="${file}_dir"
file_chain_hash="${file}_${block_str}_chain.sha256.txt"
file_thash="${file}_${block_str}.thash.txt"
mkdir -p "$dir_name"
rm -f "$dir_name"/part_*
#split file in blocks of blockSize
split -b "$block_str" "$file" "$dir_name"/part_
# Hash each split file and save the hash
for part in "$dir_name"/part_*; do
    #cut part is to remove filename and tr is to remove \n, then save just the hash
    sha256sum "$part" | cut -d " " -f 1 | tr -d '\n' > "$part".sha256.txt
done
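#chain all hashes
cat "$dir_name"/part_*.sha256.txt > "$file_chain_hash"
#if the file fit in a single block, the normal method applies:
#the chain then holds just the plain hash of the whole file
hash_files=("$dir_name"/part_*.sha256.txt)
if [ "${#hash_files[@]}" -eq 1 ]; then
    cp "$file_chain_hash" "$file_thash"
else
    #hash the chain of hashes, cutting the hash and tr is to remove \n.
    sha256sum "$file_chain_hash" | cut -d " " -f 1 | tr -d '\n' > "$file_thash"
fi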
#print just the hash and echo to prompt shows in newline
echo "Thash value with block ${block_str}:"
cat "$file_thash"
echo ""
Here are the results of proof.sh applied to the file file_bin:
proof.sh results:
If you have comments or concerns about the security of this method, a public discussion can be found here: crypto.stackexchange.com