Advanced

15 functions for parallelism, data provenance, set operations, genomic intervals, and type guards. Power-user features for high-performance bioinformatics pipelines.

Parallelism

par_map

Parallel version of map. Distributes work across multiple threads and returns results in the same order as the input list.

par_map(list, fn, threads?) -> list

Parameter | Type | Description
list | list | Input list
fn | function | Transformation (must be side-effect free)
threads | int (optional) | Thread count (default: CPU count)

# Process FASTQ files in parallel
let files = glob("raw_data/*.fastq.gz")
let stats = par_map(files, |f| {
  let lines = read_lines(f)
  let n_reads = len(lines) / 4   # each FASTQ record spans 4 lines
  {"file": f, "reads": n_reads}
})

# Parallel checksums
let checksums = par_map(glob("*.bam"), |f| {
  {"file": f, "sha256": sha256(read_text(f))}
})
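
The ordering guarantee can be illustrated outside the language. The following is a minimal Python sketch of an order-preserving parallel map, not the implementation behind par_map. The function name and the `threads` keyword mirror the signature above for readability only, and falling back to `os.cpu_count()` is an assumption based on the documented default.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def par_map(items, fn, threads=None):
    """Apply fn to every item on a thread pool; results keep input order."""
    # Assumption: an omitted thread count falls back to the CPU count.
    workers = threads or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map yields results in input order, regardless of
        # which task happens to finish first.
        return list(pool.map(fn, items))


doubled = par_map([3, 1, 2], lambda x: x * 2, threads=2)
print(doubled)  # [6, 2, 4]
```

Because tasks may run in any order, the ordering guarantee only holds for the returned list. That is one reason the transformation should be free of side effects.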

par_filter

Parallel version of filter. Order is preserved.

par_filter(list, fn) -> list

# Filter a large list in parallel
let valid = par_filter(sequences, |seq| {
  len(seq) >= 100 and not contains(seq, "N")
})
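
One way to keep the order stable in a parallel filter is to evaluate the predicate for every element in parallel and then select elements sequentially. The Python sketch below shows that idea under the assumption that the predicate is side-effect free. It is an illustration of the semantics, not the implementation of par_filter.

```python
from concurrent.futures import ThreadPoolExecutor


def par_filter(items, predicate):
    """Keep the items for which predicate is true, in their original order."""
    items = list(items)
    with ThreadPoolExecutor() as pool:
        # One boolean per item, returned in input order.
        keep = list(pool.map(predicate, items))
    # Selection is sequential, so the relative order of survivors is unchanged.
    return [item for item, ok in zip(items, keep) if ok]


sequences = ["A" * 120, "ACGT", "A" * 99 + "N", "C" * 100]
valid = par_filter(sequences, lambda s: len(s) >= 100 and "N" not in s)
print([len(s) for s in valid])  # [120, 100]
```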

Provenance

provenance

Get the data lineage (provenance chain) for a value. Returns a map describing how the value was created: source files, transformations applied, and timestamps.

provenance(value) -> map

let data = csv("input.csv")
  |> filter(|r| r.pvalue < 0.05)
  |> mutate("log2fc_abs", |r| abs(r.log2fc))

let prov = provenance(data)
println(json_pretty(prov))
# {
#   "source": "input.csv",
#   "operations": [
#     {"op": "csv", "timestamp": "2026-03-05T14:30:00Z"},
#     {"op": "filter", "predicate": "r.pvalue < 0.05", "rows_before": 10000, "rows_after": 523},
#     {"op": "mutate", "column": "log2fc_abs"}
#   ]
# }
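
To make the shape of that map concrete, here is a small Python sketch of how lineage could be accumulated as operations are applied. It is a hypothetical model, not a description of how provenance is tracked internally: the `Tracked` class and its methods are invented for illustration, the predicate text is passed in by hand, and timestamps are left out to keep the output deterministic.

```python
from dataclasses import dataclass, field


@dataclass
class Tracked:
    """Hypothetical wrapper pairing rows with the operations that produced them."""
    rows: list
    source: str
    operations: list = field(default_factory=list)

    def filter(self, predicate, label):
        kept = [r for r in self.rows if predicate(r)]
        op = {"op": "filter", "predicate": label,
              "rows_before": len(self.rows), "rows_after": len(kept)}
        # Return a new value so earlier lineage is never mutated.
        return Tracked(kept, self.source, self.operations + [op])

    def mutate(self, column, fn):
        rows = [{**r, column: fn(r)} for r in self.rows]
        op = {"op": "mutate", "column": column}
        return Tracked(rows, self.source, self.operations + [op])


def provenance(value):
    """Return the recorded lineage as a plain dict."""
    return {"source": value.source, "operations": value.operations}


rows = [{"pvalue": 0.01, "log2fc": -2.0}, {"pvalue": 0.20, "log2fc": 1.5}]
data = (Tracked(rows, "input.csv", [{"op": "csv"}])
        .filter(lambda r: r["pvalue"] < 0.05, "r.pvalue < 0.05")
        .mutate("log2fc_abs", lambda r: abs(r["log2fc"])))
prov = provenance(data)
print([op["op"] for op in prov["operations"]])  # ['csv', 'filter', 'mutate']
```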

Set Operations

union / intersection / difference

Mathematical set operations on lists. Results are deduplicated.

union(a, b) -> list
intersection(a, b) -> list
difference(a, b) -> list          # elements in a but not in b

let de_genes = ["BRCA1", "TP53", "EGFR", "KRAS", "MYC"]
let pathway_genes = ["TP53", "MDM2", "CDKN2A", "BRCA1", "ATM"]

intersection(de_genes, pathway_genes)   # ["BRCA1", "TP53"]
union(de_genes, pathway_genes)          # ["BRCA1", "TP53", "EGFR", "KRAS", "MYC", "MDM2", "CDKN2A", "ATM"]
difference(de_genes, pathway_genes)     # ["EGFR", "KRAS", "MYC"]

# Venn diagram overlap
# (assumes `de` is a table of differential-expression results with gene and log2fc columns)
let upregulated = filter(de, |r| r.log2fc > 1) |> map(|r| r.gene)
let chip_targets = read_lines("chip_peaks_genes.txt")
let overlap = intersection(upregulated, chip_targets)
println("Upregulated AND bound:", len(overlap), "genes")
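
The outputs in the first example are consistent with keeping elements in first-seen order and dropping duplicates. A minimal Python sketch of those semantics follows. It assumes elements are hashable and that order follows the first argument, which matches the example above but is not stated as a guarantee.

```python
def _dedupe(items):
    """Drop duplicates while keeping first-seen order."""
    return list(dict.fromkeys(items))


def union(a, b):
    return _dedupe([*a, *b])


def intersection(a, b):
    in_b = set(b)
    return [x for x in _dedupe(a) if x in in_b]


def difference(a, b):
    in_b = set(b)
    return [x for x in _dedupe(a) if x not in in_b]


de_genes = ["BRCA1", "TP53", "EGFR", "KRAS", "MYC"]
pathway_genes = ["TP53", "MDM2", "CDKN2A", "BRCA1", "ATM"]

print(intersection(de_genes, pathway_genes))  # ['BRCA1', 'TP53']
print(difference(de_genes, pathway_genes))    # ['EGFR', 'KRAS', 'MYC']
print(union(["A", "A", "B"], ["B", "C"]))     # ['A', 'B', 'C']
```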

Type Guards

Type Check Functions

Predicate functions that return true if a value matches the specified type. Useful for validating input in generic functions and pipelines.

Function | Signature | Description
is_dna | is_dna(s) -> bool | String contains only [ATCGNatcgn]
is_rna | is_rna(s) -> bool | String contains only [AUCGNaucgn]
is_protein | is_protein(s) -> bool | String contains only valid amino acid letters
is_table | is_table(v) -> bool | Value is a table
is_list | is_list(v) -> bool | Value is a list
is_map | is_map(v) -> bool | Value is a map
is_num | is_num(v) -> bool | Value is int or float
is_str | is_str(v) -> bool | Value is a string

is_dna("ATCGATCG")     # true
is_dna("ATCXATCG")     # false (X is not valid DNA)
is_rna("AUCGAUCG")     # true
is_protein("MKLVFG")   # true

# Validate input
fn process_sequence(seq) {
  assert(is_str(seq), "Expected string input")
  if is_dna(seq) {
    println("Processing DNA sequence")
  } else if is_rna(seq) {
    println("Processing RNA sequence")
  } else if is_protein(seq) {
    println("Processing protein sequence")
  } else {
    println("Unknown sequence type")
  }
}

# Guard in pipeline
let inputs = ["ATCGATCG", "not a sequence", "MKLVFG"]
let valid_dna = filter(inputs, is_dna)   # ["ATCGATCG"]
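
The character classes in the table map directly onto regular expressions. The Python sketch below reproduces the three sequence guards under two assumptions that the table does not state: the protein alphabet is taken to be the 20 standard amino acid letters in either case, and an empty string is rejected. It also shows why process_sequence checks DNA first: every DNA letter except N's neighbours A, T, C and G is itself a valid amino acid code, so a DNA string also passes the protein check.

```python
import re

_DNA = re.compile(r"[ATCGNatcgn]+")
_RNA = re.compile(r"[AUCGNaucgn]+")
# Assumption: "valid amino acid letters" means the 20 standard one-letter codes.
_PROTEIN = re.compile(r"[ACDEFGHIKLMNPQRSTVWY]+", re.IGNORECASE)


def is_dna(s):
    return isinstance(s, str) and _DNA.fullmatch(s) is not None


def is_rna(s):
    return isinstance(s, str) and _RNA.fullmatch(s) is not None


def is_protein(s):
    return isinstance(s, str) and _PROTEIN.fullmatch(s) is not None


def classify(seq):
    """Check the narrowest alphabet first, since the alphabets overlap."""
    if is_dna(seq):
        return "dna"
    if is_rna(seq):
        return "rna"
    if is_protein(seq):
        return "protein"
    return "unknown"


print(is_protein("ATCGATCG"))  # True: A, T, C and G are also amino acid codes
print(classify("ATCGATCG"))    # dna
print(classify("MKLVFG"))      # protein
```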