Appendix A: Builtin Reference

BioLang ships with a standard library of builtins designed for bioinformatics workflows. Every function listed here is part of the language runtime and is available without imports.


Sequence Operations

Builtins that operate on bio-typed sequences (dna, rna, protein).

| Builtin | Description |
| --- | --- |
| `complement(seq) -> Dna \| Rna` | Watson-Crick complement of a nucleotide sequence |
| `reverse_complement(seq) -> Dna \| Rna` | Reverse complement – the opposing strand |
| `transcribe(seq) -> Rna` | Transcribe DNA to RNA (T to U) |
| `translate(seq) -> Protein` | Translate an RNA or DNA coding sequence to amino acids |
| `gc_content(seq) -> Float` | GC fraction of a nucleotide sequence (0.0 – 1.0) |
| `find_motif(seq, pattern) -> List[Int]` | All start positions where pattern occurs in seq |
| `hamming_distance(a, b) -> Int` | Number of mismatched positions between equal-length sequences |
| `edit_distance(a, b) -> Int` | Edit distance between two sequences |
| `find_orfs(seq, min_len?) -> List[Record]` | Open reading frames with start, stop, and frame fields |
| `restriction_sites(seq, enzyme?) -> List[Record]` | Recognition sites for restriction enzymes |
| `tm(seq) -> Float` | Melting temperature estimate for a short oligonucleotide |
| `slice(seq, start, end) -> Dna \| Rna \| Protein` | Extract a subsequence by 0-based coordinates |

# Example: quick primer analysis
let primer = dna"ATCGATCGATCG"
let rc     = reverse_complement(primer)
let temp   = tm(primer)
print("Primer Tm = " + str(temp) + "C, reverse complement = " + str(rc))
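
To make the semantics of a few of these builtins concrete, here is a minimal reference sketch in Python rather than BioLang. It is not BioLang's implementation: the function bodies are assumptions, and `wallace_tm` uses the Wallace rule purely for illustration, since this appendix does not say which formula `tm` applies.

```python
# Reference sketch only: plain strings stand in for BioLang's Dna type.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the strand."""
    return seq.translate(COMPLEMENT)[::-1]

def gc_content(seq: str) -> float:
    """Fraction of G and C bases, between 0.0 and 1.0."""
    return sum(base in "GC" for base in seq) / len(seq)

def wallace_tm(seq: str) -> int:
    """Wallace rule (an assumption here): 2 degrees per A/T plus 4 per G/C."""
    at = sum(base in "AT" for base in seq)
    gc = sum(base in "GC" for base in seq)
    return 2 * at + 4 * gc

primer = "ATCGATCGATCG"
print(reverse_complement(primer))  # CGATCGATCGAT
print(gc_content(primer))          # 0.5
print(wallace_tm(primer))          # 36
```

The Wallace rule is only a rough estimate for short oligonucleotides, so the value printed by `tm(primer)` in BioLang may differ.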

Collection Operations

General-purpose operations on lists, records, and sets.

| Builtin | Description |
| --- | --- |
| `len(coll) -> Int` | Number of elements in a list, string, or sequence |
| `push(list, item) -> List` | Append an element, returning a new list |
| `pop(list) -> List` | Remove the last element, returning a new list |
| `concat(a, b) -> List` | Concatenate two lists |
| `flatten(nested) -> List` | Flatten one level of nesting |
| `reverse(list) -> List` | Reverse element order |
| `contains(coll, item) -> Bool` | True if item is present |
| `index_of(list, item) -> Int \| Nil` | First index of item, or nil |
| `last(list) -> Any` | Last element |
| `first(list) -> Any` | First element |
| `head(list, n) -> List` | First n elements |
| `tail(list, n) -> List` | Last n elements |
| `unique(list) -> List` | Remove duplicates, preserving order |
| `zip(a, b) -> List` | Pair elements from two lists into a list of tuples |
| `enumerate(list) -> List` | Pair each element with its 0-based index |
| `chunk(list, size) -> List[List]` | Split into fixed-size sublists |
| `window(list, size) -> List[List]` | Sliding window of the given size |
| `scan(list, init, fn) -> List` | Running accumulation (like reduce but keeps intermediates) |
| `range(start, end, step?) -> List` | Integer range |
| `set(list) -> Set` | Convert a list to a deduplicated set |
| `keys(record) -> List[Str]` | Field names of a record |
| `values(record) -> List` | Field values of a record |
| `has_key(record, key) -> Bool` | True if the record contains the named field |
| `sort_by(list, fn) -> List` | Sort by a key function |
| `group_by(list, fn) -> Record` | Group elements by a key function into a record of lists |
| `partition(list, fn) -> [List, List]` | Split into elements that pass and fail a predicate |

# Example: enumerate quality-filtered reads
let good_reads = reads
  |> filter(|r| mean_phred(r.quality) > 30)
  |> enumerate()
  |> head(5)
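
The difference between `chunk` and `window` is easy to mix up, so the following Python sketch shows one plausible reading of each. It is an illustration, not BioLang's code: in particular, whether the final chunk may be shorter than `size` and whether the window advances by exactly one element are assumptions.

```python
def chunk(items, size):
    """Non-overlapping sublists; the last one may be shorter (assumed)."""
    return [items[i:i + size] for i in range(0, len(items), size)]

def window(items, size):
    """Every run of `size` consecutive elements, advancing one step at a time."""
    return [items[i:i + size] for i in range(len(items) - size + 1)]

bases = ["A", "C", "G", "T", "A"]
print(chunk(bases, 2))   # [['A', 'C'], ['G', 'T'], ['A']]
print(window(bases, 3))  # [['A', 'C', 'G'], ['C', 'G', 'T'], ['G', 'T', 'A']]
```

Chunks partition the input, so each element appears once; windows overlap, which is what k-mer style scans need.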

Higher-Order Functions

Functions that accept other functions as arguments – the backbone of BioLang’s pipeline style.

| Builtin | Description |
| --- | --- |
| `map(coll, fn) -> List` | Apply fn to every element |
| `filter(coll, fn) -> List` | Keep elements where fn returns true |
| `reduce(coll, init, fn) -> Any` | Fold elements into a single value |
| `sort(coll, fn?) -> List` | Sort, optionally by comparator |
| `each(coll, fn) -> Nil` | Execute fn for side effects on every element |
| `flat_map(coll, fn) -> List` | Map then flatten one level |
| `take_while(coll, fn) -> List` | Take leading elements while predicate holds |
| `any(coll, fn) -> Bool` | True if fn returns true for at least one element |
| `all(coll, fn) -> Bool` | True if fn returns true for every element |
| `none(coll, fn) -> Bool` | True if fn returns true for no elements |
| `find(coll, fn) -> Any \| Nil` | First element satisfying fn |
| `find_index(coll, fn) -> Int \| Nil` | Index of first element satisfying fn |
| `par_map(coll, fn) -> List` | Parallel map across available cores |
| `par_filter(coll, fn) -> List` | Parallel filter across available cores |
| `mat_map(matrix, fn) -> Matrix` | Apply fn element-wise to a matrix |
| `try_call(fn, args) -> Result` | Call fn with args, capturing errors instead of panicking |

# Example: parallel GC content across a genome's chromosomes
let gc_values = chromosomes
  |> par_map(|chr| {name: chr.name, gc: gc_content(chr.seq)})
  |> sort_by(|r| r.gc)
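
As a rough analogue of the `par_map` pipeline above, the Python sketch below maps a function over a collection with a worker pool and then sorts the results. The worker count, the order-preserving behavior, and the sample records are assumptions made for illustration; nothing here describes how BioLang schedules work internally.

```python
from concurrent.futures import ThreadPoolExecutor

def par_map(items, fn, workers=4):
    """Apply fn to every item concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fn, items))

def gc_content(seq):
    """Fraction of G and C bases in a sequence string."""
    return sum(base in "GC" for base in seq) / len(seq)

# Hypothetical toy data standing in for real chromosomes.
chromosomes = [
    {"name": "chr1", "seq": "GGCC"},
    {"name": "chr2", "seq": "ATAT"},
    {"name": "chr3", "seq": "ATGC"},
]
gc_values = par_map(chromosomes, lambda c: {"name": c["name"], "gc": gc_content(c["seq"])})
gc_values.sort(key=lambda r: r["gc"])
print([r["name"] for r in gc_values])  # ['chr2', 'chr3', 'chr1']
```

Threads keep the sketch short; CPU-bound work in Python would normally use a process pool instead.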

Table Operations

Tabular data manipulation inspired by dataframe semantics – designed for sample sheets, variant tables, and expression matrices.

| Builtin | Description |
| --- | --- |
| `table(data) -> Table` | Create a table from a list of records |
| `select(tbl, ...cols) -> Table` | Pick columns by name |
| `mutate(tbl, name, fn) -> Table` | Add or transform a column |
| `summarize(tbl, ...aggs) -> Table` | Aggregate columns (sum, mean, etc.) |
| `group_by(tbl, col) -> GroupedTable` | Group rows by a column value (table variant) |
| `join(a, b, on, how?) -> Table` | Join two tables on a key column |
| `csv(path) -> Table` | Read a CSV file into a table |
| `tsv(path) -> Table` | Read a TSV file into a table |
| `write_csv(tbl, path) -> Nil` | Write a table to CSV |
| `write_tsv(tbl, path) -> Nil` | Write a table to TSV |
| `nrow(tbl) -> Int` | Number of rows |
| `ncol(tbl) -> Int` | Number of columns |
| `colnames(tbl) -> List[Str]` | Column name list |
| `row_names(tbl) -> List[Str]` | Row name list (if set) |

# Example: summarize variant counts per chromosome
tsv("variants.tsv")
  |> group_by("chrom")
  |> summarize(count: len, mean_qual: |rows| mean(rows |> map(|r| r.quality)))
  |> write_csv("chrom_summary.csv")
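
The group-then-summarize pattern can be sketched in plain Python to show what kind of result to expect. The helper name `summarize_by`, the output column names, and the sample rows are hypothetical; BioLang's `GroupedTable` may organize its output differently.

```python
from collections import defaultdict
from statistics import mean

def summarize_by(rows, key, value):
    """Group rows by `key`, then report the row count and mean of `value` per group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[value])
    return [
        {key: name, "count": len(vals), "mean_" + value: mean(vals)}
        for name, vals in groups.items()
    ]

# Hypothetical rows standing in for a variant table.
variants = [
    {"chrom": "chr1", "quality": 30.0},
    {"chrom": "chr1", "quality": 50.0},
    {"chrom": "chr2", "quality": 20.0},
]
for row in summarize_by(variants, "chrom", "quality"):
    print(row)
```

Each output row corresponds to one group, which matches the one-row-per-chromosome summary the BioLang example writes to CSV.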

Bio File I/O

Read and write standard bioinformatics file formats. Readers return lazy streams that integrate with pipes; writers flush on completion.

| Builtin | Description |
| --- | --- |
| `read_fasta(path) -> List[Record]` | Parse FASTA; each record has id, desc, seq fields |
| `read_fastq(path) -> List[Record]` | Parse FASTQ; each record has id, seq, quality, length fields |
| `read_vcf(path) -> List[Record]` | Parse VCF; each record has chrom, pos, ref, alt, qual, info fields |
| `read_bed(path) -> List[Record]` | Parse BED; each record has chrom, start, end and optional fields |
| `read_gff(path) -> List[Record]` | Parse GFF/GTF; each record has seqid, type, start, end, attributes |
| `write_fasta(records, path) -> Nil` | Write records to FASTA format |
| `write_fastq(records, path) -> Nil` | Write records to FASTQ format |
| `write_bed(records, path) -> Nil` | Write records to BED format |

# Example: filter FASTQ reads by quality and write survivors
read_fastq("sample_R1.fastq.gz")
  |> filter(|r| mean_phred(r.quality) > 30)
  |> write_fastq("sample_R1.filtered.fq")
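
For readers who want to see what a lazy FASTQ reader and a mean-Phred filter involve, here is a small Python sketch. It assumes strict four-line records and Phred+33 quality encoding; the record fields mirror those listed in the table, but the parsing logic is an illustration and not BioLang's reader.

```python
import io

def read_fastq(handle):
    """Lazily yield one record per four-line FASTQ entry."""
    while True:
        header = handle.readline().strip()
        if not header:
            break
        seq = handle.readline().strip()
        handle.readline()  # '+' separator line, ignored
        quality = handle.readline().strip()
        yield {"id": header[1:], "seq": seq, "quality": quality, "length": len(seq)}

def mean_phred(quality, offset=33):
    """Mean Phred score of a quality string, assuming Phred+33 encoding."""
    return sum(ord(ch) - offset for ch in quality) / len(quality)

# Two toy reads: 'I' encodes Q40 and '!' encodes Q0.
fastq_text = "@read1\nACGT\n+\nIIII\n@read2\nACGT\n+\n!!!!\n"
kept = [r for r in read_fastq(io.StringIO(fastq_text)) if mean_phred(r["quality"]) > 30]
print([r["id"] for r in kept])  # ['read1']
```

Because `read_fastq` is a generator, records are parsed only as the filter consumes them, which is the streaming behavior the section describes.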

Genomic Intervals

Interval arithmetic for coordinate-based genomic analysis. Intervals carry chrom, start, end, and optional strand and data fields.

| Builtin | Description |
| --- | --- |
| `interval(chrom, start, end, strand?) -> Interval` | Create a genomic interval |
| `interval_tree(intervals) -> IntervalTree` | Build an index for fast overlap queries |
| `query_overlaps(tree, query) -> List[Interval]` | All intervals overlapping the query |
| `query_nearest(tree, query, k?) -> List[Interval]` | K nearest intervals to the query |
| `coverage(intervals) -> List[Record]` | Per-base or per-region coverage depth |
| `merge_intervals(intervals, dist?) -> List[Interval]` | Merge overlapping or nearby intervals |
| `intersect(a, b) -> List[Interval]` | Pairwise intersection of two interval sets |
| `subtract(a, b) -> List[Interval]` | Regions in a not covered by b |

# Example: find promoter-peak overlaps
let promoters = read_bed("examples/sample-data/promoters.bed") |> map(|r| interval(r.chrom, r.start, r.end))
let peaks     = read_bed("examples/sample-data/chip_peaks.bed") |> map(|r| interval(r.chrom, r.start, r.end))
let tree      = interval_tree(peaks)
let hits      = promoters |> flat_map(|p| query_overlaps(tree, p))
print("Found " + str(len(hits)) + " promoter-peak overlaps")
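
Merging is the interval operation whose edge cases matter most, so the Python sketch below shows one common way to do it: sort, then extend the previous interval while the next one starts within `dist` bases of its end. Tuples stand in for BioLang intervals, and treating touching intervals as mergeable is an assumption rather than documented behavior.

```python
def merge_intervals(intervals, dist=0):
    """Merge (chrom, start, end) tuples that overlap, touch, or lie within `dist` bases."""
    merged = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2] + dist:
            last = merged[-1]
            merged[-1] = (chrom, last[1], max(last[2], end))
        else:
            merged.append((chrom, start, end))
    return merged

peaks = [("chr1", 20, 30), ("chr1", 1, 5), ("chr1", 4, 10), ("chr2", 3, 8)]
print(merge_intervals(peaks))
# [('chr1', 1, 10), ('chr1', 20, 30), ('chr2', 3, 8)]
print(merge_intervals(peaks, dist=10))
# [('chr1', 1, 30), ('chr2', 3, 8)]
```

Intervals on different chromosomes never merge, whatever the value of `dist`.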

Variants

Builtins for working with genetic variant records. Variant objects carry chrom, pos, ref, alt, qual, and info fields.

| Builtin | Description |
| --- | --- |
| `variant(chrom, pos, ref, alt) -> Variant` | Construct a variant record |
| `is_snp(v) -> Bool` | True if single-nucleotide polymorphism |
| `is_indel(v) -> Bool` | True if insertion or deletion |
| `is_transition(v) -> Bool` | True if purine-purine or pyrimidine-pyrimidine substitution |
| `is_transversion(v) -> Bool` | True if purine-pyrimidine substitution |
| `variant_type(v) -> Str` | Classification string: "snp", "ins", "del", "mnv", "complex" |
| `is_het(v) -> Bool` | True if heterozygous genotype |
| `is_hom_ref(v) -> Bool` | True if homozygous reference |
| `is_hom_alt(v) -> Bool` | True if homozygous alternate |
| `is_multiallelic(v) -> Bool` | True if more than one alt allele |
| `parse_vcf_info(info_str) -> Record` | Parse a VCF INFO field string into a record |
| `variant_summary(variants) -> Record` | Aggregate counts of SNPs, indels, Ti/Tv ratio, het/hom ratio |

# Example: compute Ti/Tv ratio for a VCF
let vars = read_vcf("examples/sample-data/calls.vcf") |> filter(|v| v.qual > 30)
let summary = variant_summary(vars)
print("Ti/Tv = " + str(summary.ti_tv_ratio) + ", SNPs = " + str(summary.snp_count))
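
The transition/transversion distinction behind the Ti/Tv ratio can be sketched in a few lines of Python. The sketch assumes every input is a true SNP given as a `(ref, alt)` pair with `ref != alt`; how `variant_summary` treats indels or multiallelic sites is not covered here.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def is_transition(ref, alt):
    """True when both bases are purines or both are pyrimidines."""
    return ref != alt and ({ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES)

def ti_tv_ratio(snps):
    """Transitions divided by transversions over (ref, alt) SNP pairs."""
    ti = sum(is_transition(ref, alt) for ref, alt in snps)
    tv = len(snps) - ti
    return ti / tv if tv else float("inf")

snps = [("A", "G"), ("C", "T"), ("A", "C"), ("G", "A")]
print(ti_tv_ratio(snps))  # 3.0
```

Three of the four toy SNPs stay within the purine or pyrimidine class, giving a ratio of 3.0.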

Statistics

Statistical functions for quality control, expression analysis, and hypothesis testing.

| Builtin | Description |
| --- | --- |
| `mean(xs) -> Float` | Arithmetic mean |
| `median(xs) -> Float` | Median value |
| `stdev(xs) -> Float` | Sample standard deviation |
| `variance(xs) -> Float` | Sample variance |
| `quantile(xs, q) -> Float` | Quantile at fraction q (0.0 – 1.0) |
| `min(xs) -> Num` | Minimum value |
| `max(xs) -> Num` | Maximum value |
| `sum(xs) -> Num` | Sum of all elements |
| `cor(xs, ys) -> Float` | Pearson correlation coefficient |
| `ttest(xs, ys) -> Record` | Two-sample t-test; returns {statistic, pvalue} |
| `chi_square(observed, expected) -> Record` | Chi-squared test; returns {statistic, pvalue, df} |
| `wilcoxon(xs, ys) -> Record` | Wilcoxon rank-sum test |
| `anova(groups) -> Record` | One-way ANOVA across groups |
| `fisher_exact(table) -> Record` | Fisher’s exact test on a 2x2 contingency table |
| `p_adjust(pvals, method?) -> List[Float]` | Multiple testing correction (default: Benjamini-Hochberg) |
| `lm(xs, ys) -> Record` | Simple linear regression; returns {slope, intercept, r_squared} |
| `ks_test(xs, ys) -> Record` | Kolmogorov-Smirnov test |
| `mean_phred(quals) -> Float` | Mean Phred quality score from a quality string |

# Example: differential expression significance
let control   = [5.2, 4.8, 5.1, 4.9]
let treatment = [8.1, 7.5, 8.3, 7.9]
let result    = ttest(control, treatment)
print("p-value = " + str(result.pvalue))
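
Since `p_adjust` defaults to Benjamini-Hochberg, a short Python sketch of that procedure may help when checking results by hand. The function name `p_adjust_bh` is hypothetical and the code is the textbook step-up procedure, not a copy of BioLang's implementation.

```python
def p_adjust_bh(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so each value is capped by its successors.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * n / rank)
        adjusted[i] = running_min
    return adjusted

# Roughly [0.02, 0.04, 0.04, 0.02], up to floating-point rounding.
print(p_adjust_bh([0.01, 0.04, 0.03, 0.005]))
```

Each raw p-value is scaled by `n / rank`, and the running minimum keeps the adjusted values monotone in the original ranking.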

Linear Algebra

Matrix operations for expression matrices, PCA, distance calculations, and numerical biology.

| Builtin | Description |
| --- | --- |
| `matrix(data) -> Matrix` | Create a matrix from a list of lists (row-major) |
| `transpose(m) -> Matrix` | Transpose rows and columns |
| `mat_mul(a, b) -> Matrix` | Matrix multiplication |
| `determinant(m) -> Float` | Determinant of a square matrix |
| `inverse(m) -> Matrix` | Matrix inverse |
| `eigenvalues(m) -> List[Float]` | Eigenvalues of a square matrix |
| `svd(m) -> Record` | Singular value decomposition; returns {u, s, vt} |
| `solve(a, b) -> Matrix` | Solve the linear system Ax = b |
| `trace(m) -> Float` | Sum of diagonal elements |
| `norm(m, p?) -> Float` | Matrix or vector norm (default: Frobenius / L2) |
| `rank(m) -> Int` | Numerical rank |
| `eye(n) -> Matrix` | n x n identity matrix |
| `zeros(rows, cols) -> Matrix` | Matrix of zeros |
| `ones(rows, cols) -> Matrix` | Matrix of ones |
| `diag(values) -> Matrix` | Diagonal matrix from a list of values |
| `mat_map(m, fn) -> Matrix` | Apply fn element-wise |

# Example: PCA on a gene expression matrix
let expr = tsv("examples/sample-data/counts.tsv")
let m    = matrix(expr |> select("gene_a", "gene_b", "gene_c"))
let decomp = svd(m)
print("Top 3 singular values: " + str(head(decomp.s, 3)))
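
The row-major layout used by `matrix` maps directly onto nested lists, so the basic operations can be sketched in pure Python. The sketch below is an illustration with small dense lists and makes no claim about BioLang's numerical backend.

```python
def transpose(m):
    """Swap rows and columns."""
    return [list(col) for col in zip(*m)]

def mat_mul(a, b):
    """Row-by-column product; a is n x k and b is k x m."""
    if len(a[0]) != len(b):
        raise ValueError("inner dimensions must match")
    cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def eye(n):
    """n x n identity matrix."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

m = [[1.0, 2.0], [3.0, 4.0]]
print(mat_mul(m, eye(2)))        # [[1.0, 2.0], [3.0, 4.0]]
print(mat_mul(m, transpose(m)))  # [[5.0, 11.0], [11.0, 25.0]]
```

Multiplying by the identity returns the matrix unchanged, and `m` times its transpose is symmetric, which makes both lines handy sanity checks.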

Math

Standard mathematical functions available for scoring, normalization, and modeling.

| Builtin | Description |
| --- | --- |
| `abs(x) -> Num` | Absolute value |
| `ceil(x) -> Int` | Round up to nearest integer |
| `floor(x) -> Int` | Round down to nearest integer |
| `round(x, digits?) -> Float` | Round to digits decimal places (default: 0) |
| `sqrt(x) -> Float` | Square root |
| `log(x) -> Float` | Natural logarithm |
| `log2(x) -> Float` | Base-2 logarithm (common in fold-change analysis) |
| `log10(x) -> Float` | Base-10 logarithm |
| `exp(x) -> Float` | Euler’s number raised to x |
| `pow(base, exp) -> Float` | Exponentiation |
| `sin(x) -> Float` | Sine |
| `cos(x) -> Float` | Cosine |
| `tan(x) -> Float` | Tangent |
| `ode_solve(fn, y0, t_span, dt?) -> List[Record]` | Numerical ODE integration (Runge-Kutta) |

# Example: log2 fold-change between conditions
let control   = 12.5
let treatment = 50.0
let lfc = log2(treatment / control)
print("Log2 fold-change = " + str(lfc))
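
The table describes `ode_solve` only as Runge-Kutta integration, so the Python sketch below shows what a classic fourth-order scheme looks like. The callback signature `fn(t, y)`, the default step size, and the `t`/`y` record fields are all assumptions chosen for illustration.

```python
import math

def ode_solve(fn, y0, t_span, dt=0.1):
    """Integrate dy/dt = fn(t, y) with classic fourth-order Runge-Kutta steps."""
    t0, t1 = t_span
    steps = round((t1 - t0) / dt)
    t, y = t0, y0
    trajectory = [{"t": t, "y": y}]
    for step in range(1, steps + 1):
        k1 = fn(t, y)
        k2 = fn(t + dt / 2, y + dt * k1 / 2)
        k3 = fn(t + dt / 2, y + dt * k2 / 2)
        k4 = fn(t + dt, y + dt * k3)
        y += dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t = t0 + step * dt
        trajectory.append({"t": t, "y": y})
    return trajectory

# First-order decay dy/dt = -y, whose exact solution is exp(-t).
result = ode_solve(lambda t, y: -y, 1.0, (0.0, 1.0))
print(abs(result[-1]["y"] - math.exp(-1.0)) < 1e-6)  # True
```

Comparing against a known closed-form solution, as done here, is a quick way to confirm that a chosen `dt` is small enough.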

String Operations

Text manipulation for parsing identifiers, annotations, and formatted output.

| Builtin | Description |
| --- | --- |
| `split(s, delim) -> List[Str]` | Split string on delimiter |
| `join(list, delim) -> Str` | Join list elements into a string |
| `trim(s) -> Str` | Strip leading and trailing whitespace |
| `upper(s) -> Str` | Convert to uppercase |
| `lower(s) -> Str` | Convert to lowercase |
| `starts_with(s, prefix) -> Bool` | True if s begins with prefix |
| `ends_with(s, suffix) -> Bool` | True if s ends with suffix |
| `replace(s, from, to) -> Str` | Replace all occurrences |
| `regex_match(s, pattern) -> Bool` | True if regex pattern matches |
| `format(template, ...args) -> Str` | Printf-style formatting |

BioLang also supports f-strings for inline interpolation. The example below uses plain concatenation.

# Example: parse a FASTA header
let header = ">sp|P12345|MYG_HUMAN Myoglobin OS=Homo sapiens"
let parts  = split(header, "|")
let acc    = parts[1]
print("Accession: " + acc)
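
The same header can be taken apart a little further. The Python sketch below splits a UniProt-style header into named fields; the helper name and the field names are hypothetical, and it assumes the `>db|accession|entry description` layout seen in the example.

```python
def parse_uniprot_header(header):
    """Split a '>db|accession|entry description' FASTA header into named fields."""
    db, accession, rest = header.lstrip(">").split("|", 2)
    entry_name, _, description = rest.partition(" ")
    return {"db": db, "accession": accession, "entry": entry_name, "description": description}

fields = parse_uniprot_header(">sp|P12345|MYG_HUMAN Myoglobin OS=Homo sapiens")
print(fields["accession"])  # P12345
print(fields["entry"])      # MYG_HUMAN
```

Headers from other databases use different layouts, so a real parser would need to check the number of `|`-separated parts before unpacking.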

Type Operations

Runtime type inspection and conversion – useful for dynamic dispatch in pipelines that handle mixed bio types.

| Builtin | Description |
| --- | --- |
| `type(val) -> Str` | Runtime type name as a string |
| `is_dna(val) -> Bool` | True if val is a DNA sequence |
| `is_rna(val) -> Bool` | True if val is an RNA sequence |
| `is_protein(val) -> Bool` | True if val is a protein sequence |
| `is_interval(val) -> Bool` | True if val is a genomic interval |
| `is_variant(val) -> Bool` | True if val is a variant record |
| `is_record(val) -> Bool` | True if val is a record |
| `is_list(val) -> Bool` | True if val is a list |
| `is_table(val) -> Bool` | True if val is a table |
| `is_nil(val) -> Bool` | True if val is nil |
| `is_int(val) -> Bool` | True if val is an integer |
| `is_float(val) -> Bool` | True if val is a float |
| `is_str(val) -> Bool` | True if val is a string |
| `is_bool(val) -> Bool` | True if val is a boolean |
| `into(val, target_type) -> Any` | Convert between compatible types |

# Example: route processing based on sequence type
let record = read_fasta("input.fa") |> first()
let seq    = record.seq
if is_dna(seq) then
  print("DNA, GC = " + str(gc_content(seq)))
else if is_protein(seq) then
  print("Protein, length = " + str(len(seq)))

Bio APIs

Remote database queries for annotation enrichment. All API builtins are async-aware and return structured records.

| Builtin | Description |
| --- | --- |
| `ncbi_search(db, query, max?) -> List[Str]` | Search NCBI Entrez databases (returns ID list) |
| `ncbi_gene(symbol, max?) -> Record or List[Str]` | Gene lookup: Record if single match, else ID list |
| `ncbi_sequence(acc) -> Str` | Fetch sequence by accession as FASTA text |
| `ensembl_gene(ensembl_id) -> Record` | Ensembl gene lookup by Ensembl ID |
| `ensembl_symbol(species, symbol) -> Record` | Ensembl gene lookup by species and symbol |
| `ensembl_vep(variants) -> List[Record]` | Variant Effect Predictor annotation |
| `uniprot_search(query, max?) -> List[Record]` | Search UniProt by keyword or accession |
| `uniprot_entry(acc) -> Record` | Full UniProt entry |
| `ucsc_sequence(genome, chrom, start, end) -> Dna` | Fetch genomic sequence from UCSC DAS |
| `kegg_get(entry) -> Record` | Retrieve a KEGG database entry |
| `kegg_find(db, query) -> List[Record]` | Search KEGG databases |
| `string_network(proteins, species) -> List[Record]` | STRING interactions: {protein_a, protein_b, score} |
| `pdb_entry(pdb_id) -> Record` | Fetch PDB structure metadata |
| `reactome_pathways(gene) -> List[Record]` | Reactome pathway memberships for a gene |
| `go_term(go_id) -> Record` | Gene Ontology term details |
| `go_annotations(gene, species?) -> List[Record]` | GO annotations for a gene |
| `cosmic_gene(symbol) -> Record` | COSMIC cancer gene census entry |
| `datasets_gene(symbol, taxon?) -> Record` | NCBI Datasets gene data |
| `biomart_query(dataset, attributes, filters?) -> Table` | BioMart query returning a table |
| `nfcore_list(sort_by?, limit?) -> List[Record]` | List nf-core pipelines |
| `nfcore_search(query, limit?) -> List[Record]` | Search nf-core pipelines by name/topic |
| `nfcore_info(name) -> Record` | Detailed nf-core pipeline metadata |
| `nfcore_releases(name) -> List[Record]` | Release history for an nf-core pipeline |
| `nfcore_params(name) -> Record` | Parameter schema for an nf-core pipeline |
| `biocontainers_search(query, limit?) -> List[Record]` | Search BioContainers registry |
| `biocontainers_popular(limit?) -> List[Record]` | Popular BioContainers tools |
| `biocontainers_info(tool) -> Record` | Detailed tool info with versions |
| `biocontainers_versions(tool) -> List[Record]` | All versions with container image URIs |
| `galaxy_search(query, limit?) -> List[Record]` | Search Galaxy ToolShed repositories |
| `galaxy_popular(limit?) -> List[Record]` | Popular Galaxy ToolShed tools |
| `galaxy_categories() -> List[Record]` | Galaxy ToolShed tool categories |
| `galaxy_tool(owner, name) -> Record` | Galaxy ToolShed repository details |
| `nf_parse(path) -> Record` | Parse a Nextflow .nf file into a structured Record |
| `nf_to_bl(record) -> Str` | Generate BioLang pipeline code from parsed Nextflow |
| `galaxy_to_bl(record) -> Str` | Generate BioLang pipeline code from Galaxy workflow |
| `api_endpoints() -> Record` | Show current API endpoint URLs |

# Example: annotate a gene list with pathway data
let genes = ["BRCA1", "TP53", "EGFR"]
genes |> each(|g| {
  let pathways = reactome_pathways(g)
  print(g + ": " + str(len(pathways)) + " pathways")
})

Utility

General-purpose helpers for debugging, timing, unit conversion, and serialization.

| Builtin | Description |
| --- | --- |
| `print(val) -> Nil` | Print a value followed by a newline |
| `assert(cond, msg?) -> Nil` | Panic with msg if cond is false |
| `sleep(ms) -> Nil` | Pause execution for ms milliseconds |
| `now() -> Float` | Current timestamp in seconds since epoch |
| `now() - start` | Seconds elapsed since start (use now() for both timestamps) |
| `bp(n) -> Int` | Identity; documents that n is in base pairs |
| `kb(n) -> Int` | Convert kilobases to base pairs (n * 1000) |
| `mb(n) -> Int` | Convert megabases to base pairs (n * 1_000_000) |
| `gb(n) -> Int` | Convert gigabases to base pairs (n * 1_000_000_000) |
| `json_stringify(val) -> Str` | Serialize any value to a JSON string |
| `json_parse(s) -> Any` | Parse a JSON string into a BioLang value |
| `env(name) -> Str \| Nil` | Read an environment variable |
| `exit(code?) -> Never` | Terminate the process with an exit code (default: 0) |

# Example: time a heavy operation
let t0 = now()
let result = read_fasta("genome.fa")
  |> flat_map(|r| find_orfs(r.seq, 300))
print("Found " + str(len(result)) + " ORFs in " + str(now() - t0) + "s")
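
The unit helpers are plain multiplications, which the Python sketch below spells out alongside the `now() - start` timing idiom. Here `time.time()` plays the role of `now()`; the window and genome sizes are hypothetical values chosen only to show the arithmetic.

```python
import time

def kb(n):
    """Kilobases to base pairs."""
    return n * 1_000

def mb(n):
    """Megabases to base pairs."""
    return n * 1_000_000

def gb(n):
    """Gigabases to base pairs."""
    return n * 1_000_000_000

t0 = time.time()                        # stands in for now()
window_size = kb(50)                    # 50 kb windows
genome_size = gb(3)                     # a 3 Gb genome
n_windows = genome_size // window_size
elapsed = time.time() - t0              # seconds elapsed, like now() - t0
print(n_windows)  # 60000
print(f"elapsed: {elapsed:.6f}s")
```

Writing `kb(50)` instead of `50000` keeps the unit visible at the call site, which is the point of the otherwise trivial `bp` helper as well.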