Appendix A: Builtin Reference

BioLang ships with a standard library of builtins designed for bioinformatics workflows. Every function listed here is part of the language runtime and is available without imports.

Sequence Operations

Builtins that operate on bio-typed sequences (dna, rna, protein).

Builtin	Description
`complement(seq) -> Dna \| Rna`	Watson-Crick complement of a nucleotide sequence
`reverse_complement(seq) -> Dna \| Rna`	Reverse complement – the opposing strand
`transcribe(seq) -> Rna`	Transcribe DNA to RNA (T to U)
`translate(seq) -> Protein`	Translate an RNA or DNA coding sequence to amino acids
`gc_content(seq) -> Float`	GC fraction of a nucleotide sequence (0.0 – 1.0)
`find_motif(seq, pattern) -> List[Int]`	All start positions where `pattern` occurs in `seq`
`hamming_distance(a, b) -> Int`	Number of mismatched positions between equal-length sequences
`edit_distance(a, b) -> Int`	Edit distance between two sequences
`find_orfs(seq, min_len?) -> List[Record]`	Open reading frames with start, stop, and frame fields
`restriction_sites(seq, enzyme?) -> List[Record]`	Recognition sites for restriction enzymes
`tm(seq) -> Float`	Melting temperature estimate for a short oligonucleotide
`slice(seq, start, end) -> Dna \| Rna \| Protein`	Extract a subsequence by 0-based coordinates

# Example: quick primer analysis
let primer = dna"ATCGATCGATCG"
let rc     = reverse_complement(primer)
let temp   = tm(primer)
print("Primer Tm = " + str(temp) + "C, reverse complement = " + str(rc))
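
To make the semantics of a few of these builtins concrete, here is a rough Python sketch of the textbook definitions behind `complement`, `reverse_complement`, `gc_content`, and `hamming_distance`. It is not BioLang's implementation: the plain-string representation, the handling of an empty sequence, and the error raised on unequal lengths are all assumptions made for illustration.

```python
# Python sketch of textbook definitions; NOT BioLang's actual implementation.
# Sequences are plain strings here (an assumption; BioLang uses bio-typed values).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def complement(seq: str) -> str:
    """Watson-Crick complement of a DNA string."""
    return seq.translate(_COMPLEMENT)


def reverse_complement(seq: str) -> str:
    """Complement, then reverse: the opposing strand read 5' to 3'."""
    return complement(seq)[::-1]


def gc_content(seq: str) -> float:
    """Fraction of G/C bases in the range 0.0 to 1.0."""
    if not seq:
        return 0.0  # assumption: an empty sequence yields 0.0
    gc = sum(1 for base in seq.upper() if base in "GC")
    return gc / len(seq)


def hamming_distance(a: str, b: str) -> int:
    """Count mismatched positions between equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


primer = "ATCGATCGATCG"
print(reverse_complement(primer))  # CGATCGATCGAT
print(gc_content(primer))          # 0.5
```

The primer is the same one used in the BioLang example above, so the reverse complement printed here is what one would expect `reverse_complement(primer)` to represent.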

Collection Operations

General-purpose operations on lists, records, and sets.

Builtin	Description
`len(coll) -> Int`	Number of elements in a list, string, or sequence
`push(list, item) -> List`	Append an element, returning a new list
`pop(list) -> List`	Remove the last element, returning a new list
`concat(a, b) -> List`	Concatenate two lists
`flatten(nested) -> List`	Flatten one level of nesting
`reverse(list) -> List`	Reverse element order
`contains(coll, item) -> Bool`	True if `item` is present
`index_of(list, item) -> Int \| Nil`	First index of `item`, or nil
`last(list) -> Any`	Last element
`first(list) -> Any`	First element
`head(list, n) -> List`	First `n` elements
`tail(list, n) -> List`	Last `n` elements
`unique(list) -> List`	Remove duplicates, preserving order
`zip(a, b) -> List`	Pair elements from two lists into a list of tuples
`enumerate(list) -> List`	Pair each element with its 0-based index
`chunk(list, size) -> List[List]`	Split into fixed-size sublists
`window(list, size) -> List[List]`	Sliding window of the given size
`scan(list, init, fn) -> List`	Running accumulation (like reduce but keeps intermediates)
`range(start, end, step?) -> List`	Integer range
`set(list) -> Set`	Convert a list to a deduplicated set
`keys(record) -> List[Str]`	Field names of a record
`values(record) -> List`	Field values of a record
`has_key(record, key) -> Bool`	True if the record contains the named field
`sort_by(list, fn) -> List`	Sort by a key function
`group_by(list, fn) -> Record`	Group elements by a key function into a record of lists
`partition(list, fn) -> [List, List]`	Split into elements that pass and fail a predicate

# Example: enumerate quality-filtered reads
let good_reads = reads
  |> filter(|r| mean_phred(r.quality) > 30)
  |> enumerate()
  |> head(5)
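
The difference between `chunk`, `window`, and `scan` is easiest to see side by side. The Python sketch below models one plausible reading of the table entries. Three details are assumptions, since the table does not specify them: the final short chunk is kept, the accumulator is passed to `fn` as the first argument, and `scan` does not include `init` in its output.

```python
# Python model of chunk / window / scan semantics; assumptions are marked inline.

def chunk(items: list, size: int) -> list:
    """Split into consecutive, non-overlapping sublists of length `size`."""
    if size < 1:
        raise ValueError("size must be positive")
    # Assumption: a trailing chunk shorter than `size` is kept, not dropped.
    return [items[i:i + size] for i in range(0, len(items), size)]


def window(items: list, size: int) -> list:
    """Overlapping sliding windows of length `size`, advancing by one."""
    if size < 1:
        raise ValueError("size must be positive")
    return [items[i:i + size] for i in range(len(items) - size + 1)]


def scan(items: list, init, fn) -> list:
    """Like reduce, but return every intermediate accumulator value."""
    out = []
    acc = init
    for item in items:
        acc = fn(acc, item)  # assumption: fn takes (accumulator, item)
        out.append(acc)      # assumption: `init` itself is not emitted
    return out


print(chunk([1, 2, 3, 4, 5], 2))                   # [[1, 2], [3, 4], [5]]
print(window([1, 2, 3, 4], 2))                     # [[1, 2], [2, 3], [3, 4]]
print(scan([1, 2, 3, 4], 0, lambda a, x: a + x))   # [1, 3, 6, 10]
```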

Higher-Order Functions

Functions that accept other functions as arguments – the backbone of BioLang’s pipeline style.

Builtin	Description
`map(coll, fn) -> List`	Apply `fn` to every element
`filter(coll, fn) -> List`	Keep elements where `fn` returns true
`reduce(coll, init, fn) -> Any`	Fold elements into a single value
`sort(coll, fn?) -> List`	Sort, optionally by comparator
`each(coll, fn) -> Nil`	Execute `fn` for side effects on every element
`flat_map(coll, fn) -> List`	Map then flatten one level
`take_while(coll, fn) -> List`	Take leading elements while predicate holds
`any(coll, fn) -> Bool`	True if `fn` returns true for at least one element
`all(coll, fn) -> Bool`	True if `fn` returns true for every element
`none(coll, fn) -> Bool`	True if `fn` returns true for no elements
`find(coll, fn) -> Any \| Nil`	First element satisfying `fn`
`find_index(coll, fn) -> Int \| Nil`	Index of first element satisfying `fn`
`par_map(coll, fn) -> List`	Parallel map across available cores
`par_filter(coll, fn) -> List`	Parallel filter across available cores
`mat_map(matrix, fn) -> Matrix`	Apply `fn` element-wise to a matrix
`try_call(fn, args) -> Result`	Call `fn` with `args`, capturing errors instead of panicking

# Example: parallel GC content across a genome's chromosomes
let gc_values = chromosomes
  |> par_map(|chr| {name: chr.name, gc: gc_content(chr.seq)})
  |> sort_by(|r| r.gc)

Table Operations

Tabular data manipulation inspired by dataframe semantics – designed for sample sheets, variant tables, and expression matrices.

Builtin	Description
`table(data) -> Table`	Create a table from a list of records
`select(tbl, ...cols) -> Table`	Pick columns by name
`mutate(tbl, name, fn) -> Table`	Add or transform a column
`summarize(tbl, ...aggs) -> Table`	Aggregate columns (sum, mean, etc.)
`group_by(tbl, col) -> GroupedTable`	Group rows by a column value (table variant)
`join(a, b, on, how?) -> Table`	Join two tables on a key column
`csv(path) -> Table`	Read a CSV file into a table
`tsv(path) -> Table`	Read a TSV file into a table
`write_csv(tbl, path) -> Nil`	Write a table to CSV
`write_tsv(tbl, path) -> Nil`	Write a table to TSV
`nrow(tbl) -> Int`	Number of rows
`ncol(tbl) -> Int`	Number of columns
`colnames(tbl) -> List[Str]`	Column name list
`row_names(tbl) -> List[Str]`	Row name list (if set)

# Example: summarize variant counts per chromosome
tsv("variants.tsv")
  |> group_by("chrom")
  |> summarize(count: len, mean_qual: |rows| mean(rows |> map(|r| r.quality)))
  |> write_csv("chrom_summary.csv")

Bio File I/O

Read and write standard bioinformatics file formats. Readers return lazy streams that integrate with pipes; writers flush on completion.

Builtin	Description
`read_fasta(path) -> List[Record]`	Parse FASTA; each record has `id`, `desc`, `seq` fields
`read_fastq(path) -> List[Record]`	Parse FASTQ; each record has `id`, `seq`, `quality`, `length` fields
`read_vcf(path) -> List[Record]`	Parse VCF; each record has `chrom`, `pos`, `ref`, `alt`, `qual`, `info` fields
`read_bed(path) -> List[Record]`	Parse BED; each record has `chrom`, `start`, `end` and optional fields
`read_gff(path) -> List[Record]`	Parse GFF/GTF; each record has `seqid`, `type`, `start`, `end`, `attributes`
`write_fasta(records, path) -> Nil`	Write records to FASTA format
`write_fastq(records, path) -> Nil`	Write records to FASTQ format
`write_bed(records, path) -> Nil`	Write records to BED format

# Example: filter FASTQ reads by quality and write survivors
read_fastq("sample_R1.fastq.gz")
  |> filter(|r| mean_phred(r.quality) > 30)
  |> write_fastq("sample_R1.filtered.fq")
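
The filter above depends on what `mean_phred` computes. As a hedged illustration, the Python sketch below assumes Phred+33 encoding (the Sanger / Illumina 1.8+ convention) and a plain arithmetic mean of the per-base scores. Whether BioLang uses exactly this encoding and averaging method is not stated in this appendix.

```python
# Python sketch of a mean Phred score; encoding and averaging are assumptions.
PHRED_OFFSET = 33  # assumption: Phred+33 (Sanger / Illumina 1.8+) encoding


def mean_phred(quality: str) -> float:
    """Arithmetic mean of per-base Phred scores decoded from a quality string."""
    if not quality:
        raise ValueError("quality string is empty")
    scores = [ord(ch) - PHRED_OFFSET for ch in quality]
    return sum(scores) / len(scores)


print(mean_phred("IIII"))  # 40.0  ('I' is ASCII 73, and 73 - 33 = 40)
print(mean_phred("!!II"))  # 20.0  ('!' is ASCII 33, i.e. Phred 0)
```

Note that some tools average the underlying error probabilities rather than the scores themselves, which gives a lower value for reads of mixed quality.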

Genomic Intervals

Interval arithmetic for coordinate-based genomic analysis. Intervals carry chrom, start, end, and optional strand and data fields.

Builtin	Description
`interval(chrom, start, end, strand?) -> Interval`	Create a genomic interval
`interval_tree(intervals) -> IntervalTree`	Build an index for fast overlap queries
`query_overlaps(tree, query) -> List[Interval]`	All intervals overlapping the query
`query_nearest(tree, query, k?) -> List[Interval]`	K nearest intervals to the query
`coverage(intervals) -> List[Record]`	Per-base or per-region coverage depth
`merge_intervals(intervals, dist?) -> List[Interval]`	Merge overlapping or nearby intervals
`intersect(a, b) -> List[Interval]`	Pairwise intersection of two interval sets
`subtract(a, b) -> List[Interval]`	Regions in `a` not covered by `b`

# Example: find promoter-peak overlaps
let promoters = read_bed("examples/sample-data/promoters.bed") |> map(|r| interval(r.chrom, r.start, r.end))
let peaks     = read_bed("examples/sample-data/chip_peaks.bed") |> map(|r| interval(r.chrom, r.start, r.end))
let tree      = interval_tree(peaks)
let hits      = promoters |> flat_map(|p| query_overlaps(tree, p))
print("Found " + str(len(hits)) + " promoter-peak overlaps")
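
For readers unfamiliar with interval merging, the following Python sketch shows the usual sort-and-sweep approach behind an operation like `merge_intervals`. The tuple representation, the `dist` default of 0, and the choice to merge intervals that merely touch are assumptions for illustration, not a description of BioLang's internals.

```python
# Python sketch of sort-and-sweep interval merging; NOT BioLang's implementation.
# Intervals are (chrom, start, end) tuples here, which is an assumption.

def merge_intervals(intervals: list, dist: int = 0) -> list:
    """Merge intervals on the same chromosome that overlap or lie within `dist`."""
    merged = []
    # Sorting by (chrom, start, end) lets a single pass detect every overlap.
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2] + dist:
            # Assumption: touching intervals (start == previous end) are merged.
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


peaks = [
    ("chr1", 150, 300),
    ("chr1", 100, 200),
    ("chr1", 400, 500),
    ("chr2", 100, 200),
]
print(merge_intervals(peaks))
# [('chr1', 100, 300), ('chr1', 400, 500), ('chr2', 100, 200)]
print(merge_intervals(peaks, dist=100))
# [('chr1', 100, 500), ('chr2', 100, 200)]
```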

Variants

Builtins for working with genetic variant records. Variant objects carry chrom, pos, ref, alt, qual, and info fields.

Builtin	Description
`variant(chrom, pos, ref, alt) -> Variant`	Construct a variant record
`is_snp(v) -> Bool`	True if single-nucleotide polymorphism
`is_indel(v) -> Bool`	True if insertion or deletion
`is_transition(v) -> Bool`	True if purine-purine or pyrimidine-pyrimidine substitution
`is_transversion(v) -> Bool`	True if purine-pyrimidine substitution
`variant_type(v) -> Str`	Classification string: `"snp"`, `"ins"`, `"del"`, `"mnv"`, `"complex"`
`is_het(v) -> Bool`	True if heterozygous genotype
`is_hom_ref(v) -> Bool`	True if homozygous reference
`is_hom_alt(v) -> Bool`	True if homozygous alternate
`is_multiallelic(v) -> Bool`	True if more than one alt allele
`parse_vcf_info(info_str) -> Record`	Parse a VCF INFO field string into a record
`variant_summary(variants) -> Record`	Aggregate counts of SNPs, indels, Ti/Tv ratio, het/hom ratio

# Example: compute Ti/Tv ratio for a VCF
let vars = read_vcf("examples/sample-data/calls.vcf") |> filter(|v| v.qual > 30)
let summary = variant_summary(vars)
print("Ti/Tv = " + str(summary.ti_tv_ratio) + ", SNPs = " + str(summary.snp_count))
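
The Ti/Tv ratio reported above rests on the transition/transversion distinction from the table. The Python sketch below spells out that classification using the standard definitions (purines A and G, pyrimidines C and T). The `(ref, alt)` tuple representation and the `ti_tv_ratio` helper are hypothetical names used only for this illustration.

```python
# Python sketch of transition / transversion classification (standard definitions).
# Variants are (ref, alt) string pairs here, which is an assumption.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES


def is_snp(ref: str, alt: str) -> bool:
    """True for a single-base substitution between two distinct DNA bases."""
    ref, alt = ref.upper(), alt.upper()
    return ref in BASES and alt in BASES and ref != alt


def is_transition(ref: str, alt: str) -> bool:
    """Purine to purine, or pyrimidine to pyrimidine."""
    if not is_snp(ref, alt):
        return False
    pair = {ref.upper(), alt.upper()}
    return pair <= PURINES or pair <= PYRIMIDINES


def is_transversion(ref: str, alt: str) -> bool:
    """Purine to pyrimidine, or the reverse."""
    return is_snp(ref, alt) and not is_transition(ref, alt)


def ti_tv_ratio(calls: list) -> float:
    """Hypothetical helper: transitions divided by transversions."""
    ti = sum(is_transition(r, a) for r, a in calls)
    tv = sum(is_transversion(r, a) for r, a in calls)
    return ti / tv if tv else float("nan")


calls = [("A", "G"), ("C", "T"), ("G", "A"), ("A", "C"), ("AT", "A")]
print(ti_tv_ratio(calls))  # 3.0 (three transitions, one transversion, one indel ignored)
```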

Statistics

Statistical functions for quality control, expression analysis, and hypothesis testing.

Builtin	Description
`mean(xs) -> Float`	Arithmetic mean
`median(xs) -> Float`	Median value
`stdev(xs) -> Float`	Sample standard deviation
`variance(xs) -> Float`	Sample variance
`quantile(xs, q) -> Float`	Quantile at fraction `q` (0.0 – 1.0)
`min(xs) -> Num`	Minimum value
`max(xs) -> Num`	Maximum value
`sum(xs) -> Num`	Sum of all elements
`cor(xs, ys) -> Float`	Pearson correlation coefficient
`ttest(xs, ys) -> Record`	Two-sample t-test; returns `{statistic, pvalue}`
`chi_square(observed, expected) -> Record`	Chi-squared test; returns `{statistic, pvalue, df}`
`wilcoxon(xs, ys) -> Record`	Wilcoxon rank-sum test
`anova(groups) -> Record`	One-way ANOVA across groups
`fisher_exact(table) -> Record`	Fisher’s exact test on a 2x2 contingency table
`p_adjust(pvals, method?) -> List[Float]`	Multiple testing correction (default: Benjamini-Hochberg)
`lm(xs, ys) -> Record`	Simple linear regression; returns `{slope, intercept, r_squared}`
`ks_test(xs, ys) -> Record`	Kolmogorov-Smirnov test
`mean_phred(quals) -> Float`	Mean Phred quality score from a quality string

# Example: differential expression significance
let control   = [5.2, 4.8, 5.1, 4.9]
let treatment = [8.1, 7.5, 8.3, 7.9]
let result    = ttest(control, treatment)
print("p-value = " + str(result.pvalue))
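
When many tests like the one above are run at once (one per gene, for example), the resulting p-values are usually corrected with `p_adjust`, whose default method is Benjamini-Hochberg. The Python sketch below shows the standard step-up procedure for that method. It illustrates the algorithm only, and the function name `p_adjust_bh` is hypothetical.

```python
# Python sketch of the Benjamini-Hochberg step-up adjustment.
# Illustrates the standard procedure; NOT BioLang's implementation.

def p_adjust_bh(pvals: list) -> list:
    """Return BH-adjusted p-values in the same order as the input."""
    n = len(pvals)
    # Indices of the p-values from smallest to largest.
    order = sorted(range(n), key=lambda i: pvals[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity and a cap of 1.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


print(p_adjust_bh([0.01, 0.04, 0.03, 0.005]))
```

For the four p-values shown, the adjusted values come out at approximately 0.02, 0.04, 0.04, and 0.02, in input order.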

Linear Algebra

Matrix operations for expression matrices, PCA, distance calculations, and numerical biology.

Builtin	Description
`matrix(data) -> Matrix`	Create a matrix from a list of lists (row-major)
`transpose(m) -> Matrix`	Transpose rows and columns
`mat_mul(a, b) -> Matrix`	Matrix multiplication
`determinant(m) -> Float`	Determinant of a square matrix
`inverse(m) -> Matrix`	Matrix inverse
`eigenvalues(m) -> List[Float]`	Eigenvalues of a square matrix
`svd(m) -> Record`	Singular value decomposition; returns `{u, s, vt}`
`solve(a, b) -> Matrix`	Solve the linear system Ax = b
`trace(m) -> Float`	Sum of diagonal elements
`norm(m, p?) -> Float`	Matrix or vector norm (default: Frobenius / L2)
`rank(m) -> Int`	Numerical rank
`eye(n) -> Matrix`	n x n identity matrix
`zeros(rows, cols) -> Matrix`	Matrix of zeros
`ones(rows, cols) -> Matrix`	Matrix of ones
`diag(values) -> Matrix`	Diagonal matrix from a list of values
`mat_map(m, fn) -> Matrix`	Apply `fn` element-wise

# Example: SVD of a gene expression matrix (the core step of PCA)
let expr = tsv("examples/sample-data/counts.tsv")
let m    = matrix(expr |> select("gene_a", "gene_b", "gene_c"))
let decomp = svd(m)
print("Top 3 singular values: " + str(head(decomp.s, 3)))

Math

Standard mathematical functions available for scoring, normalization, and modeling.

Builtin	Description
`abs(x) -> Num`	Absolute value
`ceil(x) -> Int`	Round up to nearest integer
`floor(x) -> Int`	Round down to nearest integer
`round(x, digits?) -> Float`	Round to `digits` decimal places (default: 0)
`sqrt(x) -> Float`	Square root
`log(x) -> Float`	Natural logarithm
`log2(x) -> Float`	Base-2 logarithm (common in fold-change analysis)
`log10(x) -> Float`	Base-10 logarithm
`exp(x) -> Float`	Euler’s number raised to `x`
`pow(base, exp) -> Float`	Exponentiation
`sin(x) -> Float`	Sine
`cos(x) -> Float`	Cosine
`tan(x) -> Float`	Tangent
`ode_solve(fn, y0, t_span, dt?) -> List[Record]`	Numerical ODE integration (Runge-Kutta)

# Example: log2 fold-change between conditions
let control   = 12.5
let treatment = 50.0
let lfc = log2(treatment / control)
print("Log2 fold-change = " + str(lfc))

String Operations

Text manipulation for parsing identifiers, annotations, and formatted output.

Builtin	Description
`split(s, delim) -> List[Str]`	Split string on delimiter
`join(list, delim) -> Str`	Join list elements into a string
`trim(s) -> Str`	Strip leading and trailing whitespace
`upper(s) -> Str`	Convert to uppercase
`lower(s) -> Str`	Convert to lowercase
`starts_with(s, prefix) -> Bool`	True if `s` begins with `prefix`
`ends_with(s, suffix) -> Bool`	True if `s` ends with `suffix`
`replace(s, from, to) -> Str`	Replace all occurrences
`regex_match(s, pattern) -> Bool`	True if regex `pattern` matches
`format(template, ...args) -> Str`	Printf-style formatting

BioLang also supports f-strings for inline interpolation.

# Example: parse a FASTA header
let header = ">sp|P12345|MYG_HUMAN Myoglobin OS=Homo sapiens"
let parts  = split(header, "|")
let acc    = parts[1]
print("Accession: " + acc)

Type Operations

Runtime type inspection and conversion – useful for dynamic dispatch in pipelines that handle mixed bio types.

Builtin	Description
`type(val) -> Str`	Runtime type name as a string
`is_dna(val) -> Bool`	True if val is a DNA sequence
`is_rna(val) -> Bool`	True if val is an RNA sequence
`is_protein(val) -> Bool`	True if val is a protein sequence
`is_interval(val) -> Bool`	True if val is a genomic interval
`is_variant(val) -> Bool`	True if val is a variant record
`is_record(val) -> Bool`	True if val is a record
`is_list(val) -> Bool`	True if val is a list
`is_table(val) -> Bool`	True if val is a table
`is_nil(val) -> Bool`	True if val is nil
`is_int(val) -> Bool`	True if val is an integer
`is_float(val) -> Bool`	True if val is a float
`is_str(val) -> Bool`	True if val is a string
`is_bool(val) -> Bool`	True if val is a boolean
`into(val, target_type) -> Any`	Convert between compatible types

# Example: route processing based on sequence type
let seq = read_fasta("input.fa") |> first() |> |r| r.seq
if is_dna(seq) then
  print("DNA, GC = " + str(gc_content(seq)))
else if is_protein(seq) then
  print("Protein, length = " + str(len(seq)))

Bio APIs

Remote database queries for annotation enrichment. All API builtins are async-aware and return structured records.

Builtin	Description
`ncbi_search(db, query, max?) -> List[Str]`	Search NCBI Entrez databases (returns ID list)
`ncbi_gene(symbol, max?) -> Record \| List[Str]`	Gene lookup: Record if single match, else ID list
`ncbi_sequence(acc) -> Str`	Fetch sequence by accession as FASTA text
`ensembl_gene(ensembl_id) -> Record`	Ensembl gene lookup by Ensembl ID
`ensembl_symbol(species, symbol) -> Record`	Ensembl gene lookup by species and symbol
`ensembl_vep(variants) -> List[Record]`	Variant Effect Predictor annotation
`uniprot_search(query, max?) -> List[Record]`	Search UniProt by keyword or accession
`uniprot_entry(acc) -> Record`	Full UniProt entry
`ucsc_sequence(genome, chrom, start, end) -> Dna`	Fetch genomic sequence from UCSC DAS
`kegg_get(entry) -> Record`	Retrieve a KEGG database entry
`kegg_find(db, query) -> List[Record]`	Search KEGG databases
`string_network(proteins, species) -> List[Record]`	STRING interactions: {protein_a, protein_b, score}
`pdb_entry(pdb_id) -> Record`	Fetch PDB structure metadata
`reactome_pathways(gene) -> List[Record]`	Reactome pathway memberships for a gene
`go_term(go_id) -> Record`	Gene Ontology term details
`go_annotations(gene, species?) -> List[Record]`	GO annotations for a gene
`cosmic_gene(symbol) -> Record`	COSMIC cancer gene census entry
`datasets_gene(symbol, taxon?) -> Record`	NCBI Datasets gene data
`biomart_query(dataset, attributes, filters?) -> Table`	BioMart query returning a table
`nfcore_list(sort_by?, limit?) -> List[Record]`	List nf-core pipelines
`nfcore_search(query, limit?) -> List[Record]`	Search nf-core pipelines by name/topic
`nfcore_info(name) -> Record`	Detailed nf-core pipeline metadata
`nfcore_releases(name) -> List[Record]`	Release history for an nf-core pipeline
`nfcore_params(name) -> Record`	Parameter schema for an nf-core pipeline
`biocontainers_search(query, limit?) -> List[Record]`	Search BioContainers registry
`biocontainers_popular(limit?) -> List[Record]`	Popular BioContainers tools
`biocontainers_info(tool) -> Record`	Detailed tool info with versions
`biocontainers_versions(tool) -> List[Record]`	All versions with container image URIs
`galaxy_search(query, limit?) -> List[Record]`	Search Galaxy ToolShed repositories
`galaxy_popular(limit?) -> List[Record]`	Popular Galaxy ToolShed tools
`galaxy_categories() -> List[Record]`	Galaxy ToolShed tool categories
`galaxy_tool(owner, name) -> Record`	Galaxy ToolShed repository details
`nf_parse(path) -> Record`	Parse a Nextflow .nf file into a structured Record
`nf_to_bl(record) -> Str`	Generate BioLang pipeline code from parsed Nextflow
`galaxy_to_bl(record) -> Str`	Generate BioLang pipeline code from Galaxy workflow
`api_endpoints() -> Record`	Show current API endpoint URLs

# Example: annotate a gene list with pathway data
let genes = ["BRCA1", "TP53", "EGFR"]
genes |> each(|g| {
  let pathways = reactome_pathways(g)
  print(g + ": " + str(len(pathways)) + " pathways")
})

Utility

General-purpose helpers for debugging, timing, unit conversion, and serialization.

Builtin	Description
`print(val) -> Nil`	Print a value followed by a newline
`assert(cond, msg?) -> Nil`	Panic with `msg` if `cond` is false
`sleep(ms) -> Nil`	Pause execution for `ms` milliseconds
`now() -> Float`	Current timestamp in seconds since epoch
`now() - start`	Idiom rather than a builtin: seconds elapsed since `start`, where `start` is an earlier `now()` value
`bp(n) -> Int`	Identity; documents that `n` is in base pairs
`kb(n) -> Int`	Convert kilobases to base pairs (`n * 1000`)
`mb(n) -> Int`	Convert megabases to base pairs (`n * 1_000_000`)
`gb(n) -> Int`	Convert gigabases to base pairs (`n * 1_000_000_000`)
`json_stringify(val) -> Str`	Serialize any value to a JSON string
`json_parse(s) -> Any`	Parse a JSON string into a BioLang value
`env(name) -> Str \| Nil`	Read an environment variable
`exit(code?) -> Never`	Terminate the process with an exit code (default: 0)

# Example: time a heavy operation
let t0 = now()
let result = read_fasta("genome.fa")
  |> flat_map(|r| find_orfs(r.seq, 300))
print("Found " + str(len(result)) + " ORFs in " + str(now() - t0) + "s")
