A DSL purpose-built for genomics and bioinformatics — native DNA/RNA/protein types, 750+ built-in functions, streaming I/O, 15 bio API clients, and Rust performance with a clean, pipe-first syntax.
# Stream FASTQ — constant memory
read_fastq("data/reads.fastq")
|> filter(|r| mean_phred(r.quality) >= 30)
|> map(|r| r.id)
|> each(|id| println(id))
# Native DNA literal + operations
let seq = dna"ATCGATCGATCG"
println(gc_content(seq)) # 0.5
println(reverse_complement(seq)) # DNA(CGATCGATCGAT)
# Query NCBI for gene info
println("Fetching BRCA1 from NCBI...")
ncbi_gene("BRCA1") |> println()
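For readers coming from Python, the two sequence operations in the snippet above can be approximated with plain strings. This is only a sketch of what the operations compute, not BioLang's implementation; the Python function names simply mirror the BioLang built-ins, and it assumes an unambiguous A/C/G/T alphabet.

```python
# Hypothetical Python stand-ins for BioLang's gc_content / reverse_complement.
# Assumes sequences contain only A, C, G, T (no IUPAC ambiguity codes).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def gc_content(seq: str) -> float:
    """Fraction of bases that are G or C; 0.0 for an empty sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the string."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


seq = "ATCGATCGATCG"
print(gc_content(seq))          # 0.5
print(reverse_complement(seq))  # CGATCGATCGAT
```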
General-purpose languages like Python and R require stitching together dozens of packages to do bioinformatics. BioLang is a DSL — every design decision, from the type system to the syntax, is made for genomics workflows. Here's what that means in practice:
No pip install, no import boilerplate.
dna"ATCG", Interval, Variant, AlignedRead, and Quality are first-class values with built-in methods, not strings pretending to be sequences.
reads |> filter |> map |> summarize. Data flows through a pipeline, just like the biological workflow it models.
BioLang is not a general-purpose language, and that's the point. It does one thing — bioinformatics scripting — and does it well.
Analyze sequences, variants, and expression data without learning a programming language first.
The fastest on-ramp to bioinformatics. Write your first analysis in minutes instead of spending days on environment setup.
Quick one-off analyses without spinning up a Jupyter notebook. One command, instant results.
No conda environments, no dependency conflicts, no wheel compilation failures. Single binary, works everywhere.
BioLang isn't here to replace your existing tools — it's the fastest path from raw data to first result. Start here, then grow into Python or R when your analysis demands it.
Everything you need from read to result — batteries included.
Chain operations naturally with |>. No nested function calls, no temp variables. Data flows left to right.
First-class dna"...", rna"...", protein"..." literals with built-in methods for complement, translate, GC content.
Statistics, tables, matrices, file I/O (FASTA/FASTQ/VCF/BAM/BED/GFF), plotting, k-mers, alignment, motifs — all built in.
Process multi-GB FASTQ/BAM files without loading them into memory. Lazy streams + pipes = constant memory usage.
NCBI, Ensembl, UniProt, UCSC, KEGG, STRING, PDB, Reactome, GO, COSMIC, BioMart, nf-core, BioContainers, Galaxy ToolShed, NCBI Datasets — query any database in one line.
Native Rust I/O via noodles. Up to 7.1x faster than Python on ENCODE peak overlap, 7.0x on protein k-mers, 6.7x on FASTA parsing, and 3.2x on k-mer counting, with 50–70% fewer lines of code.
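The constant-memory claim for streaming rests on lazy evaluation: each record is read, tested, and discarded before the next one is touched. The Python sketch below illustrates that idea with a generator. It is not how BioLang is implemented; `FastqRecord`, `read_fastq`, and `mean_phred` are illustrative names, and it assumes strict four-line FASTQ records with Phred+33 quality encoding.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    id: str
    seq: str
    quality: str


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield one record at a time (assumes strict 4-line FASTQ records)."""
    while True:
        header = handle.readline()
        if not header:            # end of file
            return
        if not header.strip():    # tolerate blank lines between records
            continue
        seq = handle.readline().rstrip("\n")
        handle.readline()         # '+' separator line, ignored
        quality = handle.readline().rstrip("\n")
        yield FastqRecord(header[1:].split()[0], seq, quality)


def mean_phred(quality: str, offset: int = 33) -> float:
    """Mean Phred score, assuming Phred+33 (Sanger / Illumina 1.8+) encoding."""
    if not quality:
        return 0.0
    return sum(ord(c) - offset for c in quality) / len(quality)


# Two toy reads: 'I' encodes Phred 40, '#' encodes Phred 2.
demo = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nACGT\n+\n####\n")

# Generator expression: nothing is read until the loop pulls a record,
# so only one record is held in memory at a time.
passing = (rec.id for rec in read_fastq(demo) if mean_phred(rec.quality) >= 30)
for read_id in passing:
    print(read_id)  # r1
```

Swapping `io.StringIO` for `open("data/reads.fastq")` gives the same behavior on a real file, since the file object is also consumed line by line.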
BioLang runs right here via WebAssembly. Click Run on any example below — no install needed.
First click downloads the runtime (~4 MB), then it's cached for the session. Every code block across the docs is interactive too.
let seq = dna"ATCGATCGATCG"
println(f"GC content: {gc_content(seq)}")
println(f"Complement: {complement(seq)}")
println(f"Rev-comp: {reverse_complement(seq)}")
println(f"Transcribe: {transcribe(seq)}")
println(f"Length: {seq_len(seq)} bp")
let coding = dna"ATGAAAGCTTTTGACTGA"
let prot = translate(coding)
println(f"Protein: {prot}")
let seq = dna"ATCGATCGATCG"
let kmer_list = kmers(seq, 4)
println(f"4-mers: {kmer_list}")
let normal = [5.2, 4.8, 5.1, 4.9, 5.3]
let tumor = [8.1, 7.9, 8.5, 7.6, 8.3]
let result = ttest(normal, tumor)
println(f"t = {round(result.statistic, 3)}")
println(f"p = {result.p_value}")
println(f"Significant: {result.p_value < 0.05}")
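To make the statistic in the t-test example concrete, here is a standard-library Python sketch that computes a two-sample t statistic for the same data. It assumes Welch's unequal-variance form and the sign convention "first group minus second"; the page does not say which variant BioLang's `ttest` uses. With equal group sizes, as here, Welch's and Student's statistics coincide. The p-value is left out because the standard library has no Student's t distribution; `scipy.stats.ttest_ind` returns both.

```python
import math
import statistics


def welch_t(a: list[float], b: list[float]) -> float:
    """Two-sample t statistic using Welch's unequal-variance standard error."""
    var_a = statistics.variance(a)  # sample variance (n - 1 denominator)
    var_b = statistics.variance(b)
    std_err = math.sqrt(var_a / len(a) + var_b / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / std_err


normal = [5.2, 4.8, 5.1, 4.9, 5.3]
tumor = [8.1, 7.9, 8.5, 7.6, 8.3]

t = welch_t(normal, tumor)
print(f"t = {t:.2f}")  # t = -16.62
```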
# Pipe-first: data flows left to right
let genes = ["BRCA1", "TP53", "EGFR", "KRAS"]
genes
|> filter(|g| len(g) <= 4)
|> map(|g| f"{g} ({len(g)} chars)")
|> each(|g| println(g))
Same task, less code, more clarity.
from Bio import SeqIO
import pandas as pd

records = []
for rec in SeqIO.parse("reads.fq", "fastq"):
    quals = rec.letter_annotations["phred_quality"]
    if sum(quals) / len(quals) >= 30:
        gc = (rec.seq.count("G") + rec.seq.count("C")) / len(rec.seq)
        records.append({"id": rec.id, "gc": gc})

df = pd.DataFrame(records)
print(df.describe())
read_fastq("data/reads.fastq")
|> filter(|r| mean_phred(r.quality) >= 30)
|> each(|r| println(f"{r.id}: len={r.length}"))
30 bioinformatics tasks on real-world data (NCBI, UniProt, ClinVar, ENCODE). Correctness validated on both synthetic and real biological data (E. coli, yeast, ClinVar) against Biopython and Bioconductor.
| Task | BioLang | Python | R | Speedup (vs. Python) |
|---|---|---|---|---|
| ENCODE Peak Overlap | 0.363s | 2.574s | — | 7.1x |
| Protein K-mers | 0.191s | 1.331s | 1.298s | 7.0x |
| FASTA Parse (30 KB) | 0.138s | 0.926s | 1.243s | 6.7x |
| E. coli Genome | 0.176s | 1.081s | 1.354s | 6.1x |
| GC Content (51 MB) | 0.830s | 2.771s | 2.358s | 3.3x |
| K-mer Counting (21-mers) | 6.551s | 21.01s | — | 3.2x |
Linux (WSL2) — Intel i9-12900K, 16 GB RAM. Python wins on VCF/CSV text parsing where C extensions dominate. K-mer counting uses canonical (strand-agnostic) 21-mers — BioLang does strictly more work.
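For context on the canonical k-mer note: under the usual definition, a k-mer and its reverse complement are counted as one key, namely whichever of the two sorts first lexicographically. That costs an extra reverse-complement and comparison per k-mer. The Python sketch below shows that definition only; it is an assumption that the benchmark uses the same one, and the function names are illustrative.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")
_VALID_BASES = set("ACGT")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    rev_comp = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, rev_comp)


def count_canonical_kmers(seq: str, k: int) -> Counter:
    """Count strand-agnostic k-mers; windows containing non-ACGT bases are skipped."""
    seq = seq.upper()
    counts: Counter = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= _VALID_BASES:
            counts[canonical(kmer)] += 1
    return counts


# AAA and TTT collapse to AAA; AAT and ATT collapse to AAT.
print(count_canonical_kmers("AAATTT", 3))  # Counter({'AAA': 2, 'AAT': 2})
```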
No installation, no server, no uploads. BioLang compiles to WebAssembly so you can analyze bioinformatics data entirely client-side. All tools work offline as installable PWAs.
Write and execute BioLang code blocks with persistent state, inline SVG charts, and syntax highlighting. Great for experimenting and learning.
Drop FASTA, FASTQ, VCF, BED, GFF, CSV files for instant parsing, statistics (N50, GC%, Q30, Ti/Tv), sortable tables, column filters, multi-format export, URL loading, and BioLang analysis. Data never leaves your machine.
Auto-detects genes, variants, accessions, and species on any webpage. Click any entity for instant details from NCBI, UniProt, gnomAD, and ClinVar. Chrome sidebar extension + PWA.
Personal literature monitor. Watch genes, drugs, and variants across PubMed and bioRxiv. Signal scoring ranks papers by relevance. Background checks, co-mention detection, and weekly digest. Chrome sidebar extension + PWA.
Open-source books covering the language and applied bioinformatics. Read online or download the PDF.
From zero to bioinformatician. Biology + BioLang + Python + R taught together with real datasets from NCBI, ClinVar, TCGA, and UniProt.
From zero to biostatistician. t-tests, ANOVA, regression, survival analysis, PCA, clustering, Bayesian, and GWAS with BioLang, Python, and R.
Comprehensive reference for every language feature: syntax, types, pipes, tables, streams, file I/O, bio APIs, statistics, and visualization.
Single binary, no runtime dependencies.