Hash & Sketching
Three functions for checksums and sequence sketching, used to verify data integrity and to compare sequences approximately.
verify_checksum
Verify a file's checksum against an expected value. Supports MD5, SHA-1, and SHA-256 algorithms.
verify_checksum(path, expected, algorithm?) -> bool
| Parameter | Type | Description |
|---|---|---|
| path | string | Path to file |
| expected | string | Expected checksum (hex string) |
| algorithm | string (optional) | "md5", "sha1", or "sha256" (default: "sha256") |
```
# Verify a downloaded reference genome
let ok = verify_checksum("GRCh38.fa.gz", "abc123def...", "sha256")
assert(ok, "Checksum mismatch!")

# Verify with MD5 (the digest shown is the MD5 of an empty file, used here as a placeholder)
let valid = verify_checksum("data.bam", "d41d8cd98f00b204e9800998ecf8427e", "md5")
println("File integrity:", valid)
```
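The check itself is simple: hash the file, then compare its hex digest to the expected string. The Python sketch below illustrates this with the standard `hashlib` module. It is not this library's implementation. It assumes the comparison ignores letter case and surrounding whitespace. It also reads the file in 1 MiB chunks so that large files such as reference genomes are never loaded into memory all at once.

```python
import hashlib


def verify_checksum(path: str, expected: str, algorithm: str = "sha256") -> bool:
    """Return True if the file's hex digest matches `expected`."""
    if algorithm not in ("md5", "sha1", "sha256"):
        raise ValueError(f"unsupported algorithm: {algorithm}")
    h = hashlib.new(algorithm)
    # Stream the file in 1 MiB chunks instead of reading it whole
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    # Assumption: hex digests compare case-insensitively, ignoring stray whitespace
    return h.hexdigest() == expected.strip().lower()
```

Streaming keeps memory use constant regardless of file size. A mismatch is reported as `False` rather than an exception, matching the `-> bool` signature above.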
sketch
Create a bottom-k sketch (sorted MinHash) of the k-mer hashes in a sequence. The sketch keeps only the smallest hash values, giving a compact summary that can be compared with other sketches to estimate sequence similarity.
sketch(sequence, k, sketch_size) -> list
| Parameter | Type | Description |
|---|---|---|
| sequence | string | DNA/RNA/protein sequence |
| k | int | K-mer size (e.g., 21 for DNA, 10 for protein) |
| sketch_size | int | Maximum number of hashes to keep (the smallest values are retained) |
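A bottom-k sketch hashes every k-mer in the sequence and keeps only the smallest distinct hash values, in ascending order. The Python sketch below illustrates that technique under stated assumptions. It is not this function's actual implementation. The `kmer_hash` helper is hypothetical: it takes the first 8 bytes of a BLAKE2b digest, whereas production tools typically use a fast non-cryptographic hash such as MurmurHash3. The sketch also does not canonicalize reverse complements, which DNA sketchers commonly do.

```python
import hashlib
import heapq


def kmer_hash(kmer: str) -> int:
    """Hypothetical 64-bit k-mer hash: first 8 bytes of a BLAKE2b digest."""
    return int.from_bytes(hashlib.blake2b(kmer.encode(), digest_size=8).digest(), "big")


def sketch(sequence: str, k: int, sketch_size: int) -> list[int]:
    """Bottom-k sketch: the `sketch_size` smallest distinct k-mer hashes, ascending."""
    seq = sequence.upper()  # treat lowercase (soft-masked) bases like uppercase
    # Hash every overlapping k-mer; the set drops repeated k-mers
    hashes = {kmer_hash(seq[i:i + k]) for i in range(len(seq) - k + 1)}
    # heapq.nsmallest returns its results already sorted in ascending order
    return heapq.nsmallest(sketch_size, hashes)
```

In this sketch, `sketch_size` is an upper bound rather than a guaranteed length. The repetitive 24-nt sequence in the example below has only four distinct 7-mers, so each sketch holds four hashes.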
```
# A 24-nt sequence has only 18 7-mers, so each sketch holds fewer than 100 hashes
let s1 = sketch("ATCGATCGATCGATCGATCGATCG", 7, 100)
let s2 = sketch("ATCGATCGATCGATCGATCGATCG", 7, 100)
# s1 == s2 (identical sequences)

# Compare the two sketches with sketch_dist
let d = sketch_dist(s1, s2)
println("Distance:", d)
```
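`sketch_dist` is not documented in this section, so its exact formula is not stated here. One common choice for bottom-k sketches is the distance used by the Mash tool. First, estimate the Jaccard index J from the `sketch_size` smallest hashes of the union of both sketches. Then convert it with D = -(1/k) ln(2J / (1 + J)). The Python sketch below implements that estimate. Whether `sketch_dist` returns this value, 1 - J, or something else is an assumption to check against its own documentation. The functions `jaccard_estimate` and `mash_distance` are hypothetical names. Both input sketches must come from the same k, hash function, and sketch size.

```python
import heapq
import math


def jaccard_estimate(a: list[int], b: list[int], sketch_size: int) -> float:
    """Bottom-k estimate of the Jaccard index from two hash sketches."""
    set_a, set_b = set(a), set(b)
    # The sketch_size smallest hashes of the union act as a random sample of the union
    union_bottom = heapq.nsmallest(sketch_size, set_a | set_b)
    if not union_bottom:
        return 0.0
    # Fraction of the sampled hashes present in both sketches
    shared = sum(1 for h in union_bottom if h in set_a and h in set_b)
    return shared / len(union_bottom)


def mash_distance(a: list[int], b: list[int], k: int, sketch_size: int) -> float:
    """Mash-style distance -(1/k) * ln(2J / (1 + J)), computed as ln((1 + J) / (2J)) / k."""
    j = jaccard_estimate(a, b, sketch_size)
    if j == 0.0:
        return 1.0  # the formula is undefined at J = 0; report maximal distance
    return math.log((1 + j) / (2 * j)) / k
```

For example, sketches [1, 2, 3, 4] and [1, 2, 5, 6] with `sketch_size` 4 give J = 0.5. The four smallest hashes of the union are 1, 2, 3, and 4, and two of them appear in both sketches. Identical sketches give J = 1 and a distance of 0.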