v0.3.0 — Now Available

A domain-specific language for bioinformatics

A DSL purpose-built for genomics and bioinformatics — native DNA/RNA/protein types, 750+ built-in functions, streaming I/O, 15 bio API clients, and Rust performance with a clean, pipe-first syntax.

Early Preview: BioLang is a new experimental language under active development. While we strive for stability, you may encounter rough edges. If you find issues, please report them on GitHub — your feedback shapes the language.

Why a domain-specific language?

General-purpose languages like Python and R require stitching together dozens of packages to do bioinformatics. BioLang is a DSL — every design decision, from the type system to the syntax, is made for genomics workflows. Here's what that means in practice:

No import ceremony — DNA types, FASTA readers, GC content, k-mers, interval queries, and 750+ functions are available immediately. No pip install, no import boilerplate.
Types that match the domain: dna"ATCG", Interval, Variant, AlignedRead, and Quality are first-class values with built-in methods, not strings pretending to be sequences.
Pipes match how bioinformatics thinks: reads |> filter |> map |> summarize. Data flows through a pipeline, just like the biological workflow it models.
Safe by default — no null pointer exceptions, no silent type coercions. Errors point to the genomic operation that failed, not a stack trace in pandas internals.
Fast without C extensions — compiled to native bytecode via Rust. No NumPy wheel issues, no Cython compilation step. Single binary, runs everywhere.
Streaming by design — process 100 GB FASTQ files in constant memory. Lazy evaluation is the default, not an afterthought bolted onto eager collections.
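
To make the constant-memory idea concrete, here is a minimal plain-Python sketch of lazy FASTQ streaming. It is not BioLang's implementation. It assumes strict four-line FASTQ records and Phred+33 quality encoding, and the names `stream_fastq` and `mean_phred_py` are hypothetical helpers invented for this illustration.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class Read(NamedTuple):
    id: str
    seq: str
    qual: str


def stream_fastq(handle: TextIO) -> Iterator[Read]:
    """Yield one record at a time (assumes strict 4-line FASTQ records)."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of input
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip("\n")
        yield Read(header.rstrip("\n")[1:], seq, qual)


def mean_phred_py(qual: str) -> float:
    """Mean base quality, assuming Phred+33 encoding."""
    return sum(ord(c) - 33 for c in qual) / len(qual)


# Tiny in-memory stand-in for a large file on disk.
demo = io.StringIO(
    "@r1\nACGT\n+\nIIII\n"  # 'I' encodes Q40
    "@r2\nACGT\n+\n####\n"  # '#' encodes Q2
)

# Generator expression: records are read and filtered one at a time.
passing = (r for r in stream_fastq(demo) if mean_phred_py(r.qual) >= 30)
kept_ids = [r.id for r in passing]
print(kept_ids)
```

Only one record is held in memory at any moment, so peak memory depends on read length rather than file size.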

BioLang is not a general-purpose language, and that's the point. It does one thing — bioinformatics scripting — and does it well.

Who is BioLang for?

Biologists

Analyze sequences, variants, and expression data without learning a general-purpose programming language first.

Students

The fastest on-ramp to bioinformatics. Write your first analysis in minutes, not days of environment setup.

Researchers

Quick one-off analyses without spinning up a Jupyter notebook. One command, instant results.

Anyone tired of setup

No conda environments, no dependency conflicts, no wheel compilation failures. Single binary, works everywhere.

BioLang isn't here to replace your existing tools — it's the fastest path from raw data to first result. Start here, then grow into Python or R when your analysis demands it.

What's included

Everything you need from read to result — batteries included.

Pipe-First Syntax

Chain operations naturally with |>. No nested function calls, no temp variables. Data flows left to right.

Bio-Native Types

First-class dna"...", rna"...", protein"..." literals with built-in methods for complement, translate, GC content.

750+ Builtins

Statistics, tables, matrices, file I/O (FASTA/FASTQ/VCF/BAM/BED/GFF), plotting, k-mers, alignment, motifs — all built in.

Streaming I/O

Process multi-GB FASTQ/BAM files without loading into memory. Lazy streams + pipes = constant memory usage.

15 API Clients

NCBI, Ensembl, UniProt, UCSC, KEGG, STRING, PDB, Reactome, GO, COSMIC, BioMart, nf-core, BioContainers, Galaxy ToolShed, NCBI Datasets — query any database in one line.

Rust Performance

Native Rust I/O via noodles. Up to 7.1x faster than Python on ENCODE overlap, 7.0x on protein k-mers, 6.7x on FASTA parsing, and 3.2x on k-mer counting, with 50–70% fewer lines of code. See the benchmarks below.
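
The headline multipliers can be checked against the benchmark table further down this page. The sketch below assumes that each speedup is the Python time divided by the BioLang time. This is an inference from the published numbers, not something the page states. For the two rows that list only two timings, the second is assumed to be the Python one.

```python
# (BioLang seconds, Python seconds, reported speedup), copied from the benchmark table.
rows = {
    "ENCODE Peak Overlap": (0.363, 2.574, 7.1),
    "Protein K-mers": (0.191, 1.331, 7.0),
    "FASTA Parse (30 KB)": (0.138, 0.926, 6.7),
    "E. coli Genome": (0.176, 1.081, 6.1),
    "GC Content (51 MB)": (0.830, 2.771, 3.3),
    "K-mer Counting (21-mers)": (6.551, 21.01, 3.2),
}

# Assumed definition: speedup = Python time / BioLang time, rounded to one decimal.
computed = {task: round(py / bl, 1) for task, (bl, py, _) in rows.items()}

for task, (_, _, reported) in rows.items():
    print(f"{task}: computed {computed[task]}x, reported {reported}x")
```

Under that assumption every computed ratio matches the reported figure to one decimal place.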

Try it in your browser

BioLang runs right here via WebAssembly. Click Run on any example below — no install needed.

First click downloads the runtime (~4 MB), then it's cached for the session. Every code block across the docs is interactive too.

DNA Operations

let seq = dna"ATCGATCGATCG"
println(f"GC content: {gc_content(seq)}")
println(f"Complement: {complement(seq)}")
println(f"Rev-comp:   {reverse_complement(seq)}")
println(f"Transcribe: {transcribe(seq)}")
println(f"Length:     {seq_len(seq)} bp")
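
For readers who want to cross-check what these operations compute, here is a plain-Python sketch of the same calculations on the same sequence. It assumes GC content is reported as a fraction (consistent with the REPL example on this page) and that transcription means replacing T with U on the given strand. BioLang's exact output formatting may differ.

```python
seq = "ATCGATCGATCG"

# Translation table for base pairing: A<->T, C<->G.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

gc = (seq.count("G") + seq.count("C")) / len(seq)  # fraction of G and C bases
comp = seq.translate(COMPLEMENT)                   # base-by-base complement
revcomp = comp[::-1]                               # complement read in reverse
rna = seq.replace("T", "U")                        # assumed transcription convention

print(f"GC content: {gc}")
print(f"Complement: {comp}")
print(f"Rev-comp:   {revcomp}")
print(f"Transcribe: {rna}")
print(f"Length:     {len(seq)} bp")
```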

Translation & K-mers

let coding = dna"ATGAAAGCTTTTGACTGA"
let prot = translate(coding)
println(f"Protein: {prot}")

let seq = dna"ATCGATCGATCG"
let kmer_list = kmers(seq, 4)
println(f"4-mers: {kmer_list}")
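
As a rough illustration of what translation and k-mer extraction involve, the following plain-Python sketch uses the standard genetic code and a sliding window. The helpers `translate_py` and `kmers_py` are hypothetical names for this example. Whether BioLang's `translate` keeps or drops the stop codon symbol is not stated on this page, so this sketch simply emits `*` for it.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate_py(dna: str) -> str:
    """Translate frame 0 codon by codon; trailing partial codons are ignored."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def kmers_py(seq: str, k: int) -> list[str]:
    """All overlapping substrings of length k, left to right."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


protein = translate_py("ATGAAAGCTTTTGACTGA")
four_mers = kmers_py("ATCGATCGATCG", 4)
print(protein)
print(four_mers)
```

A sequence of length n yields n - k + 1 overlapping k-mers, so the 12-base example produces nine 4-mers.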

Statistics

let normal = [5.2, 4.8, 5.1, 4.9, 5.3]
let tumor = [8.1, 7.9, 8.5, 7.6, 8.3]
let result = ttest(normal, tumor)
println(f"t = {round(result.statistic, 3)}")
println(f"p = {result.p_value}")
println(f"Significant: {result.p_value < 0.05}")
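
The t statistic for these two samples can be reproduced with the Python standard library. This page does not say whether `ttest` is Student's or Welch's test. With equal group sizes both give the same statistic, so the sketch below computes only that value and leaves the p-value out. The sign depends on argument order, which is also an assumption here.

```python
from math import sqrt
from statistics import mean, variance

normal = [5.2, 4.8, 5.1, 4.9, 5.3]
tumor = [8.1, 7.9, 8.5, 7.6, 8.3]

# Standard error of the difference in means (sample variances, n - 1 denominator).
se = sqrt(variance(normal) / len(normal) + variance(tumor) / len(tumor))

# Assumed convention: first sample minus second sample.
t = (mean(normal) - mean(tumor)) / se
print(f"t = {round(t, 3)}")
```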

Pipes & Lambdas

# Pipe-first: data flows left to right
let genes = ["BRCA1", "TP53", "EGFR", "KRAS"]
genes
  |> filter(|g| len(g) <= 4)
  |> map(|g| f"{g} ({len(g)} chars)")
  |> each(|g| println(g))
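
One way to read the pipe operator is as function application from left to right: each step receives the previous step's result. The plain-Python sketch below mimics that with a small hypothetical `pipe` helper. It illustrates the idea only and says nothing about how BioLang evaluates pipelines internally.

```python
from functools import reduce


def pipe(value, *steps):
    """Feed value through each step in order, left to right."""
    return reduce(lambda acc, step: step(acc), steps, value)


genes = ["BRCA1", "TP53", "EGFR", "KRAS"]

labels = pipe(
    genes,
    lambda gs: [g for g in gs if len(g) <= 4],         # filter step
    lambda gs: [f"{g} ({len(g)} chars)" for g in gs],  # map step
)

for label in labels:
    print(label)
```

Without the helper the same logic reads inside-out as nested calls, which is the nesting the pipe syntax avoids.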

BioLang vs Python

Same task, less code, more clarity.

Python + BioPython + pandas (14 lines)
from Bio import SeqIO
import pandas as pd

records = []
for rec in SeqIO.parse("reads.fq", "fastq"):
    quals = rec.letter_annotations[
        "phred_quality"
    ]
    if sum(quals)/len(quals) >= 30:
        gc = (rec.seq.count("G")
              + rec.seq.count("C")) \
              / len(rec.seq)
        records.append({"id": rec.id,
                        "gc": gc})
df = pd.DataFrame(records)
print(df.describe())
BioLang (5 lines)
read_fastq("data/reads.fastq")
  |> filter(|r| mean_phred(r.quality) >= 30)
  |> each(|r| println(f"{r.id}: len={r.length}"))

Benchmarked against BioPython & Bioconductor

30 bioinformatics tasks on real-world data (NCBI, UniProt, ClinVar, ENCODE). Correctness validated on both synthetic and real biological data (E. coli, yeast, ClinVar) against BioPython and Bioconductor.

7.1x ENCODE Overlap
7.0x Protein K-mers
6.7x FASTA Parse
3.2x K-mer Counting

Task                       BioLang   Python    R         Speedup
ENCODE Peak Overlap        0.363s    2.574s    -         7.1x
Protein K-mers             0.191s    1.331s    1.298s    7.0x
FASTA Parse (30 KB)        0.138s    0.926s    1.243s    6.7x
E. coli Genome             0.176s    1.081s    1.354s    6.1x
GC Content (51 MB)         0.830s    2.771s    2.358s    3.3x
K-mer Counting (21-mers)   6.551s    21.01s    -         3.2x

Linux (WSL2) — Intel i9-12900K, 16 GB RAM. Python wins on VCF/CSV text parsing where C extensions dominate. K-mer counting uses canonical (strand-agnostic) 21-mers — BioLang does strictly more work.
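
To show why canonical counting involves extra work per k-mer, here is a plain-Python sketch. It assumes the common convention that the canonical form is the lexicographically smaller of a k-mer and its reverse complement. The page does not say which convention BioLang uses, and the example uses 4-mers rather than 21-mers to stay readable.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Smaller of the k-mer and its reverse complement (assumed convention)."""
    reverse_complement = kmer.translate(COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def count_canonical(seq: str, k: int) -> Counter:
    """Count k-mers so that both strands map to the same key."""
    return Counter(canonical(seq[i:i + k]) for i in range(len(seq) - k + 1))


counts = count_canonical("ATCGATCGATCG", 4)
print(counts)
```

Every window needs a reverse complement and a comparison before it is counted, which is the additional cost relative to counting raw k-mers.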

750+ Built-in Functions
15 Bio API Clients
15 File Formats
42 Bio Plot Types
Rust: Pure Performance

Browser Tools — Installable PWAs

Everything runs in your browser

No installation, no server, no uploads. BioLang compiles to WebAssembly so you can analyze bioinformatics data entirely client-side. All tools work offline as installable PWAs.

Playground

Run code instantly

Write and execute BioLang code blocks with persistent state, inline SVG charts, and syntax highlighting. Great for experimenting and learning.

Viewer

Inspect bio files

Drop FASTA, FASTQ, VCF, BED, GFF, CSV files for instant parsing, statistics (N50, GC%, Q30, Ti/Tv), sortable tables, column filters, multi-format export, URL loading, and BioLang analysis. Data never leaves your machine.

BioGist

Gene intelligence sidebar

Auto-detects genes, variants, accessions, and species on any webpage. Click any entity for instant details from NCBI, UniProt, gnomAD, and ClinVar. Chrome sidebar extension + PWA.

BioKhoj

Research radar

Personal literature monitor. Watch genes, drugs, and variants across PubMed and bioRxiv. Signal scoring ranks papers by relevance. Background checks, co-mention detection, and weekly digest. Chrome sidebar extension + PWA.

Free books

Open-source books covering the language and applied bioinformatics. Read online or download the PDF.

Get started in seconds

Single binary, no runtime dependencies.

$ curl -fsSL https://lang.bio/install.sh | sh
$ bl repl
BioLang v0.3.0 REPL — type :help for commands
bl> dna"ATCG" |> gc_content()
0.5