v0.1.0 — Now Available

A domain-specific language for bioinformatics

A DSL purpose-built for genomics and bioinformatics — native DNA/RNA/protein types, 290+ built-in functions, streaming I/O, 16 bio API clients, and Rust performance with a clean, pipe-first syntax.

Early Preview: BioLang is a new experimental language under active development. While we strive for stability, you may encounter rough edges. If you find issues, please report them on GitHub — your feedback shapes the language.

Why a domain-specific language?

General-purpose languages like Python and R require stitching together dozens of packages to do bioinformatics. BioLang is a DSL — every design decision, from the type system to the syntax, is made for genomics workflows. Here's what that means in practice:

No import ceremony: DNA types, FASTA readers, GC content, k-mers, interval queries, and 290+ functions are available immediately. No pip install, no import boilerplate.
Types that match the domain: dna"ATCG", Interval, Variant, AlignedRead, and Quality are first-class values with built-in methods, not strings pretending to be sequences.
Pipes match how bioinformatics thinks: reads |> filter |> map |> summarize. Data flows through a pipeline, just like the biological workflow it models.
Safe by default: no null pointer exceptions, no silent type coercions. Errors point to the genomic operation that failed, not a stack trace in pandas internals.
Fast without C extensions: the Rust implementation compiles scripts to bytecode and JIT-compiles them with Cranelift. No NumPy wheel issues, no Cython compilation step. Single binary, runs everywhere.
Streaming by design: process 100 GB FASTQ files in constant memory. Lazy evaluation is the default, not an afterthought bolted onto eager collections.

BioLang is not a general-purpose language, and that's the point. It does one thing — bioinformatics scripting — and does it well.

What's included

Everything you need from read to result — batteries included.

Pipe-First Syntax

Chain operations naturally with |>. No nested function calls, no temp variables. Data flows left to right.
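
For readers coming from Python, a pipe chain can be read as left-to-right function composition. The sketch below is only an analogy and does not show how BioLang implements |>; the pipe helper and the two stage functions are hypothetical names introduced for illustration.

```python
from functools import reduce


def pipe(value, *stages):
    """Pass value through each stage in order, left to right."""
    return reduce(lambda acc, stage: stage(acc), stages, value)


# Hypothetical stages standing in for a filter step and a map step.
def keep_long(reads):
    return [r for r in reads if len(r) >= 4]


def to_lengths(reads):
    return [len(r) for r in reads]


reads = ["ATCG", "AT", "GGCCA"]

# Nested call form: must be read inside-out.
nested = to_lengths(keep_long(reads))

# Pipe form: reads left to right, like reads |> keep_long |> to_lengths.
piped = pipe(reads, keep_long, to_lengths)

print(nested, piped)
```

Both forms compute the same result; the pipe form simply lists the stages in the order they run.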

Bio-Native Types

First-class dna"...", rna"...", protein"..." literals with built-in methods for complement, translate, GC content.
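
To make the operations named above concrete, here is a minimal plain-Python sketch of GC content, reverse complement, and transcription on ordinary strings. It illustrates the underlying arithmetic only, not BioLang's built-in methods; the function names are hypothetical and the code assumes unambiguous A/C/G/T bases (no IUPAC codes).

```python
# Complement table for unambiguous DNA bases (assumption: no IUPAC codes).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def gc_fraction(seq: str) -> float:
    """Fraction of bases that are G or C; 0.0 for an empty sequence."""
    if not seq:
        return 0.0
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the strand."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def transcribe(seq: str) -> str:
    """DNA coding strand to RNA: replace T with U."""
    return seq.upper().replace("T", "U")


seq = "ATCGATCGATCG"
print(gc_fraction(seq))
print(reverse_complement(seq))
print(transcribe(seq))
```

With plain strings, nothing stops a caller from passing an RNA or protein string to reverse_complement; a dedicated sequence type can catch that kind of mix-up at the type level.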

290+ Builtins

Statistics, tables, matrices, file I/O (FASTA/FASTQ/VCF/BAM/BED/GFF), plotting, k-mers, alignment, motifs — all built in.

Streaming I/O

Process multi-GB FASTQ/BAM files without loading into memory. Lazy streams + pipes = constant memory usage.
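
The constant-memory idea can be sketched in Python with generators: each record is read, tested, and discarded before the next one is loaded. This is an illustration of lazy streaming in general, not BioLang's reader; read_fastq and mean_phred are hypothetical helpers here, and the parser assumes simple four-line FASTQ records with Phred+33 quality encoding.

```python
import io
from typing import Iterator, TextIO, Tuple


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (id, sequence, quality) one record at a time.

    Assumes four-line records; only one record is held in memory.
    """
    while True:
        header = handle.readline()
        if not header:
            return  # end of input
        if not header.strip():
            continue  # skip blank lines
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header[1:].strip(), seq, qual


def mean_phred(qual: str, offset: int = 33) -> float:
    """Mean Phred score of a quality string (assumes Phred+33)."""
    if not qual:
        return 0.0
    return sum(ord(c) - offset for c in qual) / len(qual)


# Inline demo data: r1 has quality 40 everywhere, r2 has quality 0.
demo = io.StringIO("@r1\nATCG\n+\nIIII\n@r2\nGGCC\n+\n!!!!\n")

# Generator expression: nothing is read until the result is consumed.
high_quality = (rec for rec in read_fastq(demo) if mean_phred(rec[2]) >= 30)
kept = [rec[0] for rec in high_quality]
print(kept)
```

Memory use depends on the size of one record, not on the size of the file, as long as no stage collects the whole stream into a list.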

16 API Clients

NCBI, Ensembl, UniProt, UCSC, KEGG, STRING, PDB, Reactome, GO, COSMIC, BioMart, QuickGO, nf-core, BioContainers, Galaxy ToolShed, NCBI Datasets — query any database in one line.

Rust Performance

Bytecode compiler + Cranelift JIT. Native Rust I/O via noodles. 5-20x faster than Python for common bioinformatics tasks.

See it in action

Real bioinformatics tasks, concise code.

Sequence Analysis

let seq = dna"ATCGATCGATCG"

# GC content, k-mer spectrum
gc_content(seq)            # 0.5
kmer_count(seq, 3)         # {ATC: 3, ...}
seq |> reverse_complement
    |> transcribe
    |> translate
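
The k-mer count in the comment above can be checked by hand with a sliding window. The following Python sketch is an illustration, not BioLang's implementation; count_kmers is a hypothetical helper name.

```python
from collections import Counter


def count_kmers(seq: str, k: int) -> Counter:
    """Count every overlapping substring of length k in seq."""
    if k <= 0:
        raise ValueError("k must be positive")
    # A sequence of length n has n - k + 1 overlapping k-mers.
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


counts = count_kmers("ATCGATCGATCG", 3)
print(counts["ATC"], sum(counts.values()))
```

Sliding a window of width 3 over the 12-base sequence yields 10 overlapping 3-mers, three of which are ATC.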

Data Pipeline

# VCF → filter → analyze
vcf("variants.vcf.gz")
  |> filter(|v| v.quality >= 30)
  |> filter(|v| v.filter == "PASS")
  |> collect
  |> variant_summary
  |> print
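
As a rough point of comparison, the same two filters can be written in plain Python over an uncompressed VCF text stream. This is a sketch under assumptions: tab-separated columns in the standard VCF order (CHROM, POS, ID, REF, ALT, QUAL, FILTER), a hypothetical passing_variants helper, and inline demo data rather than variants.vcf.gz.

```python
import io


def passing_variants(handle, min_qual=30.0):
    """Yield (chrom, pos, ref, alt, qual) for records that pass both filters."""
    for line in handle:
        if line.startswith("#") or not line.strip():
            continue  # skip meta lines, the column header, and blank lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, _id, ref, alt, qual, flt = fields[:7]
        if qual == ".":
            continue  # missing quality cannot meet the threshold
        if float(qual) >= min_qual and flt == "PASS":
            yield chrom, int(pos), ref, alt, float(qual)


# Inline demo data: only the first record meets both conditions.
demo = io.StringIO(
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "chr1\t100\t.\tA\tG\t50\tPASS\t.\n"
    "chr1\t200\t.\tC\tT\t10\tPASS\t.\n"
    "chr2\t300\t.\tG\tA\t60\tLowQual\t.\n"
)

kept = list(passing_variants(demo))
print(kept)
```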

API Query

# Fetch BRCA1 info from NCBI
let gene = ncbi_gene("BRCA1")
print(gene)

# Get protein from UniProt
let brca1 = uniprot_entry("P38398")
print(brca1.sequence)
print(len(brca1.sequence))  # 1863 aa

BioLang vs Python

Same task, less code, more clarity.

Python + Biopython + pandas (14 lines)

from Bio import SeqIO
import pandas as pd

records = []
for rec in SeqIO.parse("reads.fq", "fastq"):
    quals = rec.letter_annotations[
        "phred_quality"
    ]
    if sum(quals)/len(quals) >= 30:
        gc = (rec.seq.count("G")
              + rec.seq.count("C")) \
              / len(rec.seq)
        records.append({"id": rec.id,
                        "gc": gc})
df = pd.DataFrame(records)
print(df.describe())

BioLang (5 lines)

fastq("reads.fq")
  |> filter(|r| mean_phred(r.quality) >= 30)
  |> map(|r| {id: r.id,
              gc: gc_content(r.seq)})
  |> collect
  |> describe

290+ Built-in Functions
16 Bio API Clients
15 File Formats
21 Bio Plot Types
Rust: Pure Performance

Get started in seconds

Single binary, no runtime dependencies.

$ cargo install biolang
$ bl repl
BioLang v0.1.0 REPL — type :help for commands
bl> dna"ATCG" |> gc_content()
0.5

The BioLang Book

A comprehensive 17-chapter guide with real bioinformatics examples.

Read the Book