Hash & Sketching

Three functions for file checksums and sequence sketching, used for data integrity verification and approximate sequence comparison.

verify_checksum

Verify a file's checksum against an expected value. Supports MD5, SHA-1, and SHA-256 algorithms.

verify_checksum(path, expected, algorithm?) -> bool
Parameter   Type                Description
path        string              Path to the file
expected    string              Expected checksum (hex string)
algorithm   string (optional)   "md5", "sha1", or "sha256" (default: "sha256")
# Verify downloaded reference genome
let ok = verify_checksum("GRCh38.fa.gz", "abc123def...", "sha256")
assert(ok, "Checksum mismatch!")

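# Verify with MD5 (d41d8cd98f00b204e9800998ecf8427e is the MD5 of an empty file; substitute the published digest for your file)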
let valid = verify_checksum("data.bam", "d41d8cd98f00b204e9800998ecf8427e", "md5")
println("File integrity:", valid)
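To show what a check like verify_checksum involves, here is a minimal Python sketch built on the standard hashlib module. It is not this library's implementation and is written in Python rather than the scripting language above. Reading in 1 MiB chunks, comparing case-insensitively, and stripping surrounding whitespace from the expected value are assumptions of this sketch.

```python
import hashlib
import os
import tempfile

# Algorithms offered by the documented function; "sha256" is its default.
SUPPORTED_ALGORITHMS = {"md5", "sha1", "sha256"}


def verify_checksum(path: str, expected: str, algorithm: str = "sha256",
                    chunk_size: int = 1 << 20) -> bool:
    """Return True if the file's hex digest equals `expected`.

    Assumption: the comparison ignores case and surrounding whitespace.
    """
    algorithm = algorithm.lower()
    if algorithm not in SUPPORTED_ALGORITHMS:
        raise ValueError(f"unsupported algorithm: {algorithm!r}")
    digest = hashlib.new(algorithm)
    # Read in fixed-size chunks (1 MiB by default) so large files such as a
    # compressed reference genome are never loaded into memory at once.
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected.strip().lower()


if __name__ == "__main__":
    # Demo on an empty temporary file, whose MD5 is d41d8cd98f00b204e9800998ecf8427e.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        pass
    try:
        print("File integrity:", verify_checksum(tmp.name, "d41d8cd98f00b204e9800998ecf8427e", "md5"))
    finally:
        os.remove(tmp.name)
```

Streaming keeps memory use constant regardless of file size, which matters for multi-gigabyte inputs like BAM files or reference genomes.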

sketch

Create a bottom-k sketch (sorted MinHash) of the k-mer hashes in a sequence: the sketch_size smallest hash values, kept in sorted order. The sketch is a compact representation used for approximate sequence comparison.

sketch(sequence, k, sketch_size) -> list
Parameter     Type     Description
sequence      string   DNA/RNA/protein sequence
k             int      K-mer size (e.g., 21 for DNA, 10 for protein)
sketch_size   int      Number of minimum hashes to keep
let s1 = sketch("ATCGATCGATCGATCGATCGATCG", 7, 100)
let s2 = sketch("ATCGATCGATCGATCGATCGATCG", 7, 100)
# s1 == s2 (identical sequences)

# Compare the two sketches with sketch_dist
let d = sketch_dist(s1, s2)
println("Distance:", d)
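The following Python sketch illustrates the bottom-k technique behind sketch and one way to compare two sketches. It is an illustration under stated assumptions, not this library's code: the BLAKE2b-based 64-bit k-mer hash, the absence of reverse-complement canonicalization for DNA, and the use of Jaccard distance (1 - J) are all choices made here. sketch_jaccard is a hypothetical helper, and this sketch_dist takes an extra sketch_size argument that the documented function does not. The real sketch_dist may use a different metric; tools such as Mash, for example, report -ln(2J/(1+J))/k.

```python
import hashlib
import heapq


def _hash_kmer(kmer: str) -> int:
    # Deterministic 64-bit hash (assumed choice). Python's built-in hash() is
    # salted per process, so it would give different sketches on each run.
    return int.from_bytes(hashlib.blake2b(kmer.encode(), digest_size=8).digest(), "big")


def sketch(sequence: str, k: int, sketch_size: int) -> list[int]:
    """Bottom-k sketch: the `sketch_size` smallest distinct k-mer hashes, ascending."""
    seq = sequence.upper()
    # No reverse-complement canonicalization, so the same code works for protein.
    hashes = {_hash_kmer(seq[i:i + k]) for i in range(len(seq) - k + 1)}
    # heapq.nsmallest returns its result already sorted in ascending order.
    return heapq.nsmallest(sketch_size, hashes)


def sketch_jaccard(s1: list[int], s2: list[int], sketch_size: int) -> float:
    """Estimate Jaccard similarity from two sketches built with the same k and hash."""
    # The bottom-k of the union of the two sketches is the bottom-k of the
    # union of the underlying k-mer sets.
    union_bottom = heapq.nsmallest(sketch_size, set(s1) | set(s2))
    if not union_bottom:
        return 0.0  # no k-mers at all: treated here as no similarity
    shared = set(s1) & set(s2)
    # Fraction of the union's smallest hashes that occur in both sketches.
    return sum(h in shared for h in union_bottom) / len(union_bottom)


def sketch_dist(s1: list[int], s2: list[int], sketch_size: int) -> float:
    """Jaccard distance (1 - J); one possible metric, not necessarily the library's."""
    return 1.0 - sketch_jaccard(s1, s2, sketch_size)


if __name__ == "__main__":
    s1 = sketch("ATCGATCGATCGATCGATCGATCG", 7, 100)
    s2 = sketch("ATCGATCGATCGATCGATCGATCG", 7, 100)
    print(s1 == s2)  # True: identical input and a deterministic hash
    print("Distance:", sketch_dist(s1, s2, 100))  # Distance: 0.0
```

Because the hash is deterministic, identical sequences yield identical sketches and a distance of 0.0, consistent with the s1 == s2 comment in the example. Note that this repetitive 24-base sequence has only four distinct 7-mers, so each sketch holds just four hashes even though sketch_size is 100.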