Advanced
15 functions for parallelism, data provenance, set operations, genomic intervals, and type guards. Power-user features for high-performance bioinformatics pipelines.
Parallelism
par_map
Parallel version of map. Distributes calls to fn across multiple threads; results are returned in the same order as the input list, regardless of which call finishes first.
par_map(list, fn, threads?) -> list
| Parameter | Type | Description |
|---|---|---|
| list | list | Input list |
| fn | function | Function applied to each element; must be side-effect free, since calls run concurrently in no fixed order |
| threads | int (optional) | Thread count (default: CPU count) |
# Count reads per FASTQ file in parallel
let files = glob("raw_data/*.fastq.gz")
let stats = par_map(files, |f| {
let lines = read_lines(f)
let n_reads = len(lines) / 4   # each FASTQ record spans 4 lines
{"file": f, "reads": n_reads}
})
# Parallel checksums
let checksums = par_map(glob("*.bam"), |f| {
{"file": f, "sha256": sha256(read_text(f))}
})
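The contract described above (a fixed pool of threads, output in input order) can be modeled in Python with concurrent.futures. This is an illustrative sketch of the semantics, not the language's implementation. The Python par_map below is a hypothetical stand-in. Its default of os.cpu_count() threads mirrors the parameter table.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, Optional, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def par_map(items: Iterable[T], fn: Callable[[T], U], threads: Optional[int] = None) -> List[U]:
    """Apply fn to every item on a thread pool and return results in input order."""
    workers = threads or os.cpu_count() or 1  # default: one thread per CPU
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map yields results in the order of the inputs,
        # no matter which worker finishes first.
        return list(pool.map(fn, items))


if __name__ == "__main__":
    import time

    def slow_square(x: int) -> int:
        # Later items sleep less, so they tend to finish first.
        time.sleep(0.01 * (5 - x))
        return x * x

    print(par_map([1, 2, 3, 4], slow_square, threads=4))  # [1, 4, 9, 16]
```

In CPython, threads mainly speed up I/O-bound work such as reading files. For CPU-bound transforms, ProcessPoolExecutor offers the same map interface.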
par_filter
Parallel version of filter. The predicate is evaluated concurrently; elements that pass keep their original order.
par_filter(list, fn) -> list
# Keep sequences of at least 100 bases with no ambiguous bases (N)
let valid = par_filter(sequences, |seq| {
len(seq) >= 100 and not contains(seq, "N")
})
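Order preservation in a parallel filter follows naturally from one approach. The predicate is evaluated in parallel, and the keep/drop decisions are then zipped back onto the original list. Here is a minimal Python sketch of that approach, with par_filter as a hypothetical stand-in for the built-in:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def par_filter(items: Sequence[T], pred: Callable[[T], bool]) -> List[T]:
    """Evaluate pred concurrently, then keep items whose predicate was true, in input order."""
    with ThreadPoolExecutor() as pool:
        keep = list(pool.map(pred, items))  # one bool per item, aligned with items
    return [item for item, ok in zip(items, keep) if ok]


if __name__ == "__main__":
    seqs = ["ACGT" * 30, "ACGN" * 30, "ACGT" * 10]
    valid = par_filter(seqs, lambda s: len(s) >= 100 and "N" not in s)
    print(len(valid))  # 1
```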
Provenance
provenance
Returns the data lineage (provenance chain) of a value: a map describing how the value was created, including its source files, the transformations applied, and timestamps.
provenance(value) -> map
let data = csv("input.csv")
|> filter(|r| r.pvalue < 0.05)
|> mutate("log2fc_abs", |r| abs(r.log2fc))
let prov = provenance(data)
println(json_pretty(prov))
# {
# "source": "input.csv",
# "operations": [
# {"op": "csv", "timestamp": "2026-03-05T14:30:00Z"},
# {"op": "filter", "predicate": "r.pvalue < 0.05", "rows_before": 10000, "rows_after": 523},
# {"op": "mutate", "column": "log2fc_abs"}
# ]
# }
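One way to capture lineage of this kind: each operation returns a new value that carries its parent's operation log plus one new entry. The Python sketch below models that idea for the example above, using toy rows. Several parts are hypothetical:

- Tracked and load are invented names.
- The description argument exists only because Python lambdas cannot report their own source text.
- The real provenance function may record different fields.

```python
import json
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


@dataclass
class Tracked:
    """A list of rows plus a log of the operations that produced it."""
    rows: List[Row]
    source: str
    operations: List[Dict[str, Any]] = field(default_factory=list)

    def filter(self, pred: Callable[[Row], bool], description: str) -> "Tracked":
        kept = [r for r in self.rows if pred(r)]
        op = {"op": "filter", "predicate": description,
              "rows_before": len(self.rows), "rows_after": len(kept)}
        # Return a new value; the parent's log is copied, not shared.
        return Tracked(kept, self.source, self.operations + [op])

    def mutate(self, column: str, fn: Callable[[Row], Any]) -> "Tracked":
        new_rows = [{**r, column: fn(r)} for r in self.rows]
        return Tracked(new_rows, self.source,
                       self.operations + [{"op": "mutate", "column": column}])


def load(rows: List[Row], source: str) -> Tracked:
    """Hypothetical loader: starts the log with a timestamped 'csv' entry."""
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return Tracked(rows, source, [{"op": "csv", "timestamp": stamp}])


def provenance(value: Tracked) -> Dict[str, Any]:
    return {"source": value.source, "operations": value.operations}


if __name__ == "__main__":
    rows = [{"gene": "TP53", "pvalue": 0.01, "log2fc": -2.0},
            {"gene": "MYC", "pvalue": 0.20, "log2fc": 1.5}]
    data = (load(rows, "input.csv")
            .filter(lambda r: r["pvalue"] < 0.05, "r.pvalue < 0.05")
            .mutate("log2fc_abs", lambda r: abs(r["log2fc"])))
    print(json.dumps(provenance(data), indent=2))
```

Because every step builds a new log instead of appending to a shared one, intermediate values keep their own shorter lineage.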
Set Operations
union / intersection / difference
Mathematical set operations on lists. Results are deduplicated. As the examples below show, elements appear in the order they first occur in a, then in b.
union(a, b) -> list
intersection(a, b) -> list
difference(a, b) -> list # elements in a but not in b
let de_genes = ["BRCA1", "TP53", "EGFR", "KRAS", "MYC"]
let pathway_genes = ["TP53", "MDM2", "CDKN2A", "BRCA1", "ATM"]
intersection(de_genes, pathway_genes) # ["BRCA1", "TP53"]
union(de_genes, pathway_genes) # ["BRCA1", "TP53", "EGFR", "KRAS", "MYC", "MDM2", "CDKN2A", "ATM"]
difference(de_genes, pathway_genes) # ["EGFR", "KRAS", "MYC"]
# Venn-style overlap of two gene sets
# (de is a differential-expression table with gene and log2fc columns)
let upregulated = filter(de, |r| r.log2fc > 1) |> map(|r| r.gene)
let chip_targets = read_lines("chip_peaks_genes.txt")
let overlap = intersection(upregulated, chip_targets)
println("Upregulated AND bound:", len(overlap), "genes")
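The outputs above are consistent with an order-preserving implementation, in which results keep the order of first appearance in a, then in b. The Python sketch below reproduces those outputs. The ordering rule is inferred from the examples rather than stated outright, so treat it as an assumption.

```python
from typing import Hashable, Iterable, List, TypeVar

T = TypeVar("T", bound=Hashable)


def _dedup(items: Iterable[T]) -> List[T]:
    # dict preserves insertion order (Python 3.7+), so this keeps first occurrences.
    return list(dict.fromkeys(items))


def union(a: List[T], b: List[T]) -> List[T]:
    return _dedup([*a, *b])


def intersection(a: List[T], b: List[T]) -> List[T]:
    in_b = set(b)  # O(1) membership tests
    return _dedup(x for x in a if x in in_b)


def difference(a: List[T], b: List[T]) -> List[T]:
    in_b = set(b)  # elements of a that are not in b
    return _dedup(x for x in a if x not in in_b)


if __name__ == "__main__":
    de_genes = ["BRCA1", "TP53", "EGFR", "KRAS", "MYC"]
    pathway_genes = ["TP53", "MDM2", "CDKN2A", "BRCA1", "ATM"]
    print(intersection(de_genes, pathway_genes))  # ['BRCA1', 'TP53']
    print(difference(de_genes, pathway_genes))    # ['EGFR', 'KRAS', 'MYC']
```

Converting b to a set keeps each operation roughly linear in len(a) + len(b). This matters for genome-scale gene lists.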
Type Guards
Type Check Functions
Predicate functions that return true if a value matches the specified type. Useful for validating input in generic functions and pipelines.
| Function | Signature | Description |
|---|---|---|
| is_dna | is_dna(s) -> bool | String contains only [ATCGNatcgn] |
| is_rna | is_rna(s) -> bool | String contains only [AUCGNaucgn] |
| is_protein | is_protein(s) -> bool | String contains only valid amino acid letters |
| is_table | is_table(v) -> bool | Value is a table |
| is_list | is_list(v) -> bool | Value is a list |
| is_map | is_map(v) -> bool | Value is a map |
| is_num | is_num(v) -> bool | Value is int or float |
| is_str | is_str(v) -> bool | Value is a string |
is_dna("ATCGATCG") # true
is_dna("ATCXATCG") # false (X is not valid DNA)
is_rna("AUCGAUCG") # true
is_protein("MKLVFG") # true
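# Validate input. Checks run in order, so a string that passes several guards
# (e.g. "ACGACG" is both valid DNA and valid RNA) takes the first matching branch.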
fn process_sequence(seq) {
assert(is_str(seq), "Expected string input")
if is_dna(seq) {
println("Processing DNA sequence")
} else if is_rna(seq) {
println("Processing RNA sequence")
} else if is_protein(seq) {
println("Processing protein sequence")
} else {
println("Unknown sequence type")
}
}
# Guard in pipeline
let inputs = ["ATCGATCG", "not a sequence", "MKLVFG"]
let valid_dna = filter(inputs, is_dna) # ["ATCGATCG"]
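The DNA and RNA guards in the table are character-class checks. Below is a minimal Python sketch using the exact classes listed. Some behavior here is assumed rather than documented:

- Empty strings and non-string values are rejected; the documentation does not say how either is handled.
- is_protein is omitted, because the accepted amino-acid alphabet is not specified.

```python
import re

# Character classes copied from the table above.
_DNA = re.compile(r"[ATCGNatcgn]+")
_RNA = re.compile(r"[AUCGNaucgn]+")


def is_dna(s: object) -> bool:
    # fullmatch requires every character to be in the class; '+' rejects "".
    return isinstance(s, str) and _DNA.fullmatch(s) is not None


def is_rna(s: object) -> bool:
    return isinstance(s, str) and _RNA.fullmatch(s) is not None


def classify(seq: str) -> str:
    # Order matters: "ACGACG" passes both guards, so the DNA branch wins.
    if is_dna(seq):
        return "DNA"
    if is_rna(seq):
        return "RNA"
    return "unknown"


if __name__ == "__main__":
    inputs = ["ATCGATCG", "not a sequence", "MKLVFG"]
    print([s for s in inputs if is_dna(s)])  # ['ATCGATCG']
```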