A DSL purpose-built for genomics and bioinformatics — native DNA/RNA/protein types, 290+ built-in functions, streaming I/O, 16 bio API clients, and Rust performance with a clean, pipe-first syntax.
# Load FASTQ, filter, and analyze
let reads = fastq("sample.fq.gz")
reads
|> filter(|r| mean_phred(r.quality) >= 30)
|> map(|r| r.seq)
|> kmer_count(21)
|> sort(|a, b| b.count - a.count)
|> head(10)
|> bar_chart("Top 21-mers")
# Native DNA literal + operations
let seq = dna"ATCGATCGATCG"
let gc = gc_content(seq) # 0.5
let rc = reverse_complement(seq)
# Query NCBI in one line
ncbi_gene("BRCA1")
|> print()
General-purpose languages like Python and R require stitching together dozens of packages to do bioinformatics. BioLang is a DSL — every design decision, from the type system to the syntax, is made for genomics workflows. Here's what that means in practice:
No pip install, no import boilerplate.
dna"ATCG", Interval, Variant, AlignedRead, Quality are first-class values with built-in methods, not strings pretending to be sequences.
reads |> filter |> map |> summarize. Data flows through a pipeline, just like the biological workflow it models.
BioLang is not a general-purpose language, and that's the point. It does one thing — bioinformatics scripting — and does it well.
Everything you need from read to result — batteries included.
Chain operations naturally with |>. No nested function calls, no temp variables. Data flows left to right.
First-class dna"...", rna"...", protein"..." literals with built-in methods for complement, translation, and GC content.
Statistics, tables, matrices, file I/O (FASTA/FASTQ/VCF/BAM/BED/GFF), plotting, k-mers, alignment, motifs — all built in.
Process multi-GB FASTQ/BAM files without loading them into memory. Lazy streams + pipes = constant memory usage.
NCBI, Ensembl, UniProt, UCSC, KEGG, STRING, PDB, Reactome, GO, COSMIC, BioMart, QuickGO, nf-core, BioContainers, Galaxy ToolShed, NCBI Datasets — query any database in one line.
Bytecode compiler + Cranelift JIT. Native Rust I/O via noodles. 5-20x faster than Python for common bioinformatics tasks.
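To make the streaming claim above concrete, here is a minimal Python sketch of the general technique: a generator yields one FASTQ record at a time and a lazy filter consumes it, so memory stays bounded by a single record. This is not BioLang's implementation. The names `Read`, `fastq_records`, and `high_quality` are hypothetical, and the sketch assumes four-line records with Phred+33 quality encoding.

```python
import io
from typing import Iterable, Iterator, NamedTuple, TextIO


class Read(NamedTuple):
    id: str
    seq: str
    quality: str  # raw quality string, assumed Phred+33


def fastq_records(handle: TextIO) -> Iterator[Read]:
    """Yield one four-line FASTQ record at a time; nothing is buffered."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of input
        if not header.strip():
            continue  # skip stray blank lines
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip("\n")
        yield Read(header.rstrip("\n")[1:], seq, qual)


def mean_phred(quality: str, offset: int = 33) -> float:
    """Arithmetic mean of per-base Phred scores (0.0 for an empty string)."""
    if not quality:
        return 0.0
    return sum(ord(c) - offset for c in quality) / len(quality)


def high_quality(reads: Iterable[Read], min_q: float = 30.0) -> Iterator[Read]:
    """Lazily keep reads whose mean Phred score is at least min_q."""
    return (r for r in reads if mean_phred(r.quality) >= min_q)


# Tiny in-memory FASTQ standing in for a multi-GB file.
demo = io.StringIO(
    "@r1\nACGT\n+\nIIII\n"  # 'I' encodes Phred 40
    "@r2\nTTGA\n+\n####\n"  # '#' encodes Phred 2
)
kept = [r.id for r in high_quality(fastq_records(demo))]
print(kept)
```

For a real compressed file, the same functions should work on a handle from `gzip.open(path, "rt")`. Memory stays flat only while every stage remains lazy; materializing the stream into a list, as the demo does for printing, gives that up.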
Real bioinformatics tasks, concise code.
let seq = dna"ATCGATCGATCG"
# GC content, k-mer spectrum
gc_content(seq) # 0.5
kmer_count(seq, 3) # {ATC: 3, ...}
seq |> reverse_complement
|> transcribe
|> translate
# VCF → filter → analyze
vcf("variants.vcf.gz")
|> filter(|v| v.quality >= 30)
|> filter(|v| v.filter == "PASS")
|> collect
|> variant_summary
|> print
# Fetch BRCA1 info from NCBI
let gene = ncbi_gene("BRCA1")
print(gene)
# Get protein from UniProt
let brca1 = uniprot_entry("P38398")
print(brca1.sequence)
print(len(brca1.sequence)) # 1863 aa
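For readers who want to check the numbers in the sequence example above, the following Python sketch reproduces what those built-ins appear to compute. It is an illustration, not BioLang's code: the function names simply mirror the BioLang ones, only uppercase A/C/G/T input is handled, and `transcribe` is assumed to mean a plain T-to-U substitution.

```python
from collections import Counter

# Translation table for uppercase DNA bases only (an assumption of this sketch).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def gc_content(seq: str) -> float:
    """Fraction of bases that are G or C."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def kmer_count(seq: str, k: int) -> Counter:
    """Count every overlapping k-mer in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse the string."""
    return seq.translate(_COMPLEMENT)[::-1]


def transcribe(seq: str) -> str:
    """DNA to RNA by replacing T with U (assumed semantics)."""
    return seq.replace("T", "U")


seq = "ATCGATCGATCG"
print(gc_content(seq))                      # 0.5
print(kmer_count(seq, 3)["ATC"])            # 3
print(reverse_complement(seq))              # CGATCGATCGAT
print(transcribe(reverse_complement(seq)))  # CGAUCGAUCGAU
```

The first two values match the `# 0.5` and `{ATC: 3, ...}` comments in the BioLang snippet.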
Same task, less code, more clarity.
from Bio import SeqIO
import pandas as pd

records = []
for rec in SeqIO.parse("reads.fq", "fastq"):
    quals = rec.letter_annotations["phred_quality"]
    if sum(quals) / len(quals) >= 30:
        gc = (rec.seq.count("G") + rec.seq.count("C")) / len(rec.seq)
        records.append({"id": rec.id, "gc": gc})

df = pd.DataFrame(records)
print(df.describe())
fastq("reads.fq")
|> filter(|r| mean_phred(r.quality) >= 30)
|> map(|r| {id: r.id, gc: gc_content(r.seq)})
|> collect
|> describe
Single binary, no runtime dependencies.