The thash method is an improved way of hashing files, designed for integrity applications and optimized for fast hash generation and verification over large data volumes.
auditor is the first forensic tool to implement this method. See about auditor.
In the normal method, which is how hashing tools conventionally work, the content of the file is read and hashed whole, as one single block.
In the thash method, the file is treated as a sequence of blocks of size BlockSize, which are hashed separately and in parallel. The block hashes are then concatenated in the same order as the blocks, and the hash of this concatenation is the final result. It is similar to blockchains: a hash of hashes.
However, if the whole file fits inside a single block (fileSize <= blockSize), the normal method described above is applied.
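The construction described above can be sketched as a small bash function. This is a minimal, sequential sketch and not the auditor implementation: the function name thash_sha256 is hypothetical, the block size is given in bytes, SHA256 is fixed, and, following proof.sh, the block hashes are assumed to be joined as hex text with no separators before the final hash is computed.

```shell
#!/bin/bash
# Sketch of the thash construction (hash of block hashes) with SHA256.
# Assumption, following proof.sh: block hashes are concatenated as hex
# text, without separators, and that text is hashed to get the result.

# thash_sha256 FILE BLOCK_BYTES   (hypothetical helper name)
thash_sha256() {
    local file="$1" block_bytes="$2"
    local size blocks i chain=""
    size=$(( $(wc -c < "$file") ))

    # A file that fits in one block is hashed the normal way.
    if [ "$size" -le "$block_bytes" ]; then
        sha256sum < "$file" | cut -d ' ' -f 1
        return
    fi

    # Number of blocks, rounding up so the remainder gets its own block.
    blocks=$(( (size + block_bytes - 1) / block_bytes ))

    for (( i = 0; i < blocks; i++ )); do
        # Read only block i (tail -c +N is 1-indexed); the last block may be shorter.
        chain+=$(tail -c +"$(( i * block_bytes + 1 ))" "$file" \
                 | head -c "$block_bytes" | sha256sum | cut -d ' ' -f 1)
    done

    # Hash the concatenated hex digests.
    printf '%s' "$chain" | sha256sum | cut -d ' ' -f 1
}
```

For example, calling thash_sha256 ./file 52428800 would hash ./file in blocks of 50 x 1024 x 1024 bytes. The blocks are processed one after another here only to keep the sketch short.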
Speed: Because the blocks can be hashed in parallel, the process can be much faster than hashing the whole file as a single stream.
Level of security: The same level as the normal process, as all original data is hashed and any modification will result in a different hash. The idea of a hash of hashes was inspired by the blockchains widely used in cryptocurrencies. Click here for questions.
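The speed argument rests on each block being hashed independently of the others. The sketch below illustrates one way to do that in bash, under stated assumptions: GNU coreutils and an xargs that supports -P are available, SHA256 is fixed, the block size is given in bytes, and the helper names hash_block and parallel_thash_sha256 are hypothetical. It is not how auditor is implemented.

```shell
#!/bin/bash
# Sketch: hash the blocks of a file in parallel, then combine them in order.
# Assumptions: GNU coreutils (sha256sum, nproc, head -c, tail -c) and an
# xargs with -P; all function names here are hypothetical.

# hash_block FILE BLOCK_BYTES INDEX OUT_DIR: hash one block into its own file.
hash_block() {
    local file="$1" block_bytes="$2" index="$3" out_dir="$4"
    tail -c +"$(( index * block_bytes + 1 ))" "$file" | head -c "$block_bytes" \
        | sha256sum | cut -d ' ' -f 1 | tr -d '\n' \
        > "$out_dir/$(printf '%010d' "$index")"
}
export -f hash_block   # make the function visible to the bash workers

# parallel_thash_sha256 FILE BLOCK_BYTES
parallel_thash_sha256() {
    local file="$1" block_bytes="$2"
    local size blocks out_dir
    size=$(( $(wc -c < "$file") ))

    # A file that fits in one block is hashed the normal way.
    if [ "$size" -le "$block_bytes" ]; then
        sha256sum < "$file" | cut -d ' ' -f 1
        return
    fi

    blocks=$(( (size + block_bytes - 1) / block_bytes ))
    out_dir=$(mktemp -d)

    # Workers may finish in any order; each writes to a zero-padded file
    # name, so the shell glob below restores the block order.
    seq 0 $(( blocks - 1 )) \
        | xargs -P "$(nproc)" -I{} bash -c 'hash_block "$@"' _ \
              "$file" "$block_bytes" {} "$out_dir"

    # Hash the concatenated hex digests, then clean up.
    cat "$out_dir"/* | sha256sum | cut -d ' ' -f 1
    rm -rf "$out_dir"
}
```

Writing each block hash to its own numbered file keeps the final concatenation independent of the order in which the workers finish, which is what makes the result reproducible.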
Hash Algorithm: Any hash algorithm can be used, such as SHA256, SHA512, BLAKE3, etc.
BlockSize: The size of each block, given with its unit of measurement (KB, MB, GB, TB, etc.). The last block holds the remainder, so it is smaller than BlockSize unless FileSize is an exact multiple of BlockSize. When processing very large amounts of data, the BlockSize can be fixed for all files, or it can be calculated automatically for each file based on its properties (fileSize, for instance), on the hardware being used (storage type, SSD disks, processor architecture, number of CPUs available, memory size, etc.), or on other aspects.
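The block layout implied by a given BlockSize is simple arithmetic, sketched below. The helper name block_layout is hypothetical and both sizes are taken in bytes; the sketch only shows how many blocks a file yields and how large the last one is, including the single-block case.

```shell
#!/bin/bash
# block_layout FILE_SIZE BLOCK_SIZE   (both in bytes; hypothetical helper)
# Prints "<number of blocks> <size of last block>".
block_layout() {
    local file_size="$1" block_size="$2"

    # The file fits in one block, so the normal method applies.
    if [ "$file_size" -le "$block_size" ]; then
        echo "1 $file_size"
        return
    fi

    # Round up so that the remainder gets its own (shorter) block.
    local blocks=$(( (file_size + block_size - 1) / block_size ))
    local last=$(( file_size - (blocks - 1) * block_size ))
    echo "$blocks $last"
}

block_layout 1000 300   # prints "4 100": three full blocks plus a remainder
block_layout 900 300    # prints "3 300": size is an exact multiple
block_layout 200 300    # prints "1 200": single block, normal method
```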
Normal method identification:
SHA256 = normal method using hash algorithm: SHA256, no blockSize.
thash identification: To identify that a file was hashed with the thash method, we append the tag <thash-BlockSize> to the algorithm name, as in:
SHA256<thash-50MB> = algorithm: SHA256 using thash method with BlockSize: 50MB.
BLAKE3<thash-1GB> = algorithm: BLAKE3, using thash method with BlockSize: 1GB.
Attention! For verification to succeed, it must use the same BlockSize that was used in generation; otherwise, it will fail. So, just as you need to store which hash algorithm was used, you also need to store the BlockSize.
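Since both the algorithm and the BlockSize have to be stored, one option is to store the identification label itself next to the hash and split it again at verification time. The sketch below assumes the label syntax is exactly the one shown in the examples above, with a unit letter from K, M, G, T, P or E as accepted by proof.sh; the helper name parse_hash_label and its output format are hypothetical.

```shell
#!/bin/bash
# parse_hash_label LABEL   (hypothetical helper)
# Splits a label such as "SHA256<thash-50MB>" into algorithm and BlockSize.
# Prints "<algorithm> <BlockSize>", or "<algorithm> -" for the normal method.
parse_hash_label() {
    local label="$1"
    # Assumed syntax: ALGORITHM<thash-NUMBER UNIT>, e.g. BLAKE3<thash-1GB>
    local re='^([A-Za-z0-9-]+)<thash-([0-9]+[KMGTPE]B)>$'

    if [[ "$label" =~ $re ]]; then
        echo "${BASH_REMATCH[1]} ${BASH_REMATCH[2]}"
    elif [[ "$label" =~ ^[A-Za-z0-9-]+$ ]]; then
        # No tag: the file was hashed with the normal method.
        echo "$label -"
    else
        echo "invalid label: $label" >&2
        return 1
    fi
}

parse_hash_label 'SHA256<thash-50MB>'   # prints "SHA256 50MB"
parse_hash_label 'BLAKE3<thash-1GB>'    # prints "BLAKE3 1GB"
parse_hash_label 'SHA256'               # prints "SHA256 -"
```

A verifier built this way would refuse to proceed on an invalid label instead of guessing a BlockSize, because a wrong guess produces a different hash and a false verification failure.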
Below is a script that does what the thash method does. It can be used as a proof of concept and also to test the correctness of auditor.
proof.sh
#!/bin/bash
# This script simulates thash method with selectable hash algorithms.

if [ "$#" -ne 3 ]; then
    echo "Use: $0 file alg_hash blockSize"
    echo "Ex: $0 ./file sha256 10MB"
    echo "alg_hash can be: sha256, sha512, sha3-256, sha3-512, k12 or blake3."
    echo "blockSize is a number with KB, MB, GB or TB. Ex: 10MB, 5GB, etc"
    echo "alg_hash will use one of these commands to hash: sha256sum, sha512sum, sha3sum, k12sum or b3sum. So, it needs to be in the PATH!"
    echo "If the file is in current dir, use ./file"
    exit 1
fi

# Get arguments
file="$1"
alg_hash="$2"
block_str_auditor="$3"

# Size passed to split: the blockSize without the trailing 'B' (10MB -> 10M),
# which GNU split reads as a power-of-1024 unit
block_str="${block_str_auditor::-1}"

last_char="${block_str_auditor: -1}"
if [[ "$last_char" != "B" ]]; then
    echo "Error: Last character must be 'B'" >&2
    exit 1
fi

unity="${block_str_auditor: -2:1}"
if [[ ! "$unity" =~ ^[KMGTPE]$ ]]; then
    echo "Error: Invalid unit '$unity'" >&2
    exit 1
fi

# Test if file exists
if [ ! -f "$file" ]; then
    echo "File '$file' not found."
    exit 1
fi

# Select hashing utility based on alg_hash
option_cmd=""
case "$alg_hash" in
    sha256)
        hash_cmd="sha256sum"
        ;;
    sha512)
        hash_cmd="sha512sum"
        ;;
    sha3-256)
        hash_cmd="sha3sum"
        option_cmd=" -a 256"
        ;;
    sha3-512)
        hash_cmd="sha3sum"
        option_cmd=" -a 512"
        ;;
    k12)
        hash_cmd="k12sum"
        ;;
    blake3)
        hash_cmd="b3sum"
        ;;
    *)
        echo "Invalid hash algorithm. Choose: sha256, sha512, sha3-256, sha3-512, k12, blake3."
        exit 1
        ;;
esac

if ! command -v "$hash_cmd" &> /dev/null; then
    echo "Error: $hash_cmd is not installed. Install it and try again."
    exit 1
fi
# hash_cmd is expanded unquoted below so that its options are split into words
hash_cmd="$hash_cmd$option_cmd"

auditor_cmd="auditor"
if ! command -v "$auditor_cmd" &> /dev/null; then
    echo "Error: $auditor_cmd is not installed. Install it and try again."
    exit 1
fi

# Create dir, name files, clean eventual files
dir_name="${file}_dir_${block_str}_${alg_hash}"
file_chain_hash="${file}_${block_str}_chain.${alg_hash}.txt"
file_thash="${file}_${block_str}.${alg_hash}.thash.txt"
file_thash_auditor="${file}_${block_str}.${alg_hash}.thash.auditor.txt"
mkdir -p "$dir_name"
rm -f "$dir_name"/part_*

# Split file in blocks of blockSize
split -b "$block_str" "$file" "$dir_name"/part_

# Hash each split file and save the hash
for part_file in "$dir_name"/part_*; do
    $hash_cmd "$part_file" > "$part_file"."${alg_hash}".hash.txt
    # sha3sum prints c2a0 (in hex) after hash value. Need remove it, as well \n and spaces
    cat "$part_file"."${alg_hash}".hash.txt | awk '{gsub(/ *\*.*/, ""); print $1}' | cut -d " " -f 1 | tr -d '\n' | sed $'s/\xc2\xa0//g' > "$part_file"."${alg_hash}".txt
done

# Chain all hashes
cat "$dir_name"/*."${alg_hash}".txt > "$file_chain_hash"

# Hash the chain of hashes, cutting spaces and newline char
$hash_cmd "$file_chain_hash" | cut -d " " -f 1 | tr -d '\n' > "$file_thash"

# Print result
echo "${alg_hash}<thash-${block_str_auditor}> with this script and ${hash_cmd}:"
cat "$file_thash"
echo ""
echo ""
echo "${alg_hash}<thash-${block_str_auditor}> with auditor:"
auditor hash "$file" -a "$alg_hash" -b "$block_str_auditor" -l -q > "$file_thash_auditor" 2> /dev/null
cat "$file_thash_auditor"
echo ""
echo "normal hash with ${hash_cmd}"
$hash_cmd "$file"
echo ""
echo "normal hash with auditor "
auditor hash "$file" -d -a "$alg_hash" -q -l 2> /dev/null
echo ""
Here are the results of proof.sh applied to a random file:
proof.sh results:
If you have comments or concerns about the security of this method, a public discussion can be found here: crypto.stackexchange.com