BioLang at a Glance

A domain-specific language for bioinformatics. Purpose-built for genomics workflows — not another general-purpose language with bio libraries bolted on.

750+ builtins · 15 bio APIs · 15 file formats · 42 plot types · 6 transfer protocols · 3 LLM providers · Rust native
quick_taste.bl
# Native DNA types + pipe-first syntax
let seq = dna"ATCGATCGATCG"
gc_content(seq)                # 0.5
seq |> reverse_complement |> translate

# Stream a 50 GB FASTQ in constant memory
fastq("reads.fq.gz")
  |> filter(|r| mean_phred(r.quality) >= 30)
  |> map(|r| gc_content(r.seq))
  |> mean

# Query NCBI + Ensembl in one line
ncbi_gene("BRCA1") |> print()
ensembl_vep("17:g.43092919G>A") |> print()

# Ask an LLM about your data
chat("What pathways involve BRCA1 and TP53?")

Language

SYNTAX Pipe-First Operator

Data flows left to right via |>. No nested calls, no temp vars. Use |> into name to bind a result mid-chain.

# a |> f(b) = f(a, b)
data |> filter(|x| x > 0) |> mean()
# bind mid-chain
data |> filter(|x| x > 0) |> into clean
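
The rewrite rule in the comment above (a |> f(b) = f(a, b)) can be modeled outside BioLang. The sketch below is Python, not BioLang, and says nothing about how BioLang implements the operator; the `pipe` and `keep` helpers are hypothetical names introduced only to show the piped value becoming the first argument of each stage.

```python
from functools import reduce
from statistics import mean


def pipe(value, *stages):
    """Thread `value` through stages from left to right.

    Each stage is a tuple (function, extra_args...). The running value
    is passed as the first argument, mirroring a |> f(b) == f(a, b).
    """
    def step(acc, stage):
        func, *args = stage
        return func(acc, *args)

    return reduce(step, stages, value)


def keep(items, predicate):
    """Hypothetical stand-in for a filter stage."""
    return [x for x in items if predicate(x)]


data = [-2, 4, 0, 6]
# Roughly: data |> filter(|x| x > 0) |> mean()
result = pipe(data, (keep, lambda x: x > 0), (mean,))
```

Here `result` is the mean of the positive values only, computed without naming the intermediate list.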
BIO Native Bio Types

First-class DNA, RNA, Protein literals — not strings.

dna"ATCGATCG"
rna"AUCGAUCG"
protein"MKTLLILAVS"
TYPES Genomic Value Types

Variant, AlignedRead, Interval, Gene, Quality, Table, Matrix, Stream.

v.is_snp   v.is_transition
read.is_mapped   read.mapq
iv.chrom   iv.start   iv.end
SYNTAX Pattern Matching

Match expressions with destructuring and guards.

match variant_type(v) {
  "missense" => analyze(v),
  "nonsense" => flag_lof(v),
  _ => "benign"
}
TYPES Gradual Typing

Full type inference. Add annotations when you want safety.

x = 42   # inferred Int
s = dna"ATCG"
t = [0.1, 0.9]
ERRORS Result + try/catch

Rust-style Result with ? operator, plus try/catch for scripts.

seq = read_fasta(path)?   # ? propagates the error
try { align(reads, ref) }
catch e { print(e.message) }

Bioinformatics

I/O 15 File Formats

Native readers/writers with auto gzip detection.

read_fasta()   read_fastq()
fasta()   fastq()   vcf()
bed()   gff()   sam()   bam()
csv()   tsv()   json()
+ write_fasta/fastq/vcf/bed/gff
SEQ Sequence Operations

All common sequence ops as builtins.

reverse_complement()   transcribe()
translate()   gc_content()
kmer_count()   hamming_distance()
find_motif()   find_orfs()
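
For readers who want to check results by hand, here is what a few of these operations conventionally compute. This is a plain-Python reference sketch that reuses the builtin names for readability; it is not BioLang's implementation, and details such as case handling and the error raised on unequal lengths are assumptions.

```python
from collections import Counter

# Complement table for unambiguous DNA bases only (assumption: no IUPAC codes).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def gc_content(seq: str) -> float:
    """Fraction of bases that are G or C; 0.0 for an empty sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the string."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def kmer_count(seq: str, k: int) -> Counter:
    """Count every overlapping substring of length k."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def hamming_distance(a: str, b: str) -> int:
    """Number of mismatching positions between equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))
```

With the sequence from quick_taste.bl, `gc_content("ATCGATCGATCG")` gives 0.5, matching the comment shown there.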
GENOMICS Interval Operations

Interval trees with O(log n + k) query performance.

interval_tree(regions)
query_overlaps(tree, chrom, start, end)
query_nearest(tree, chrom, start)
coverage(intervals)
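
To make the overlap query concrete, the following Python sketch builds an implicit augmented interval tree over a sorted list. It is an illustration of the general technique, not BioLang's data structure: the `IntervalIndex` class is hypothetical, it covers a single chromosome, and half-open [start, end) coordinates are an assumption.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end), an assumption


class IntervalIndex:
    """Sorted intervals plus, per subtree, the largest end coordinate."""

    def __init__(self, intervals: List[Interval]) -> None:
        self.ivs = sorted(intervals)
        self.max_end = [0] * len(self.ivs)
        self._build(0, len(self.ivs))

    def _build(self, lo: int, hi: int) -> float:
        # The node for the slice [lo, hi) is its middle element.
        if lo >= hi:
            return float("-inf")
        mid = (lo + hi) // 2
        best = max(self.ivs[mid][1],
                   self._build(lo, mid),
                   self._build(mid + 1, hi))
        self.max_end[mid] = best
        return best

    def query(self, start: int, end: int) -> List[Interval]:
        """Return all stored intervals overlapping [start, end)."""
        hits: List[Interval] = []
        self._query(0, len(self.ivs), start, end, hits)
        return hits

    def _query(self, lo: int, hi: int, qs: int, qe: int,
               hits: List[Interval]) -> None:
        if lo >= hi:
            return
        mid = (lo + hi) // 2
        if self.max_end[mid] <= qs:
            return  # every interval in this subtree ends before the query
        self._query(lo, mid, qs, qe, hits)
        s, e = self.ivs[mid]
        if s < qe:
            if e > qs:
                hits.append((s, e))
            # Only intervals starting before qe can overlap, so the right
            # subtree is visited only when this node starts before qe.
            self._query(mid + 1, hi, qs, qe, hits)


index = IntervalIndex([(20, 30), (1, 5), (10, 12), (3, 8)])
hits = index.query(4, 11)
```

The two pruning rules (subtree max end, and sorted start order) are what let a query skip most of the tree instead of scanning every interval.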
VARIANT Variant Analysis

SNP/indel classification, genotype queries, VCF INFO parsing.

is_snp(v)   is_transition(v)
is_het(v)   variant_type(v)
parse_vcf_info(info_str)
strip_chr()   normalize_chrom()
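
The SNP and transition checks follow standard definitions: a transition swaps purine for purine (A/G) or pyrimidine for pyrimidine (C/T), and any other single-base change is a transversion. The Python sketch below spells that out on bare REF/ALT strings; the function signatures are assumptions for illustration and differ from the BioLang builtins, which take a Variant.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES


def is_snp(ref: str, alt: str) -> bool:
    """True when REF and ALT are different single unambiguous bases."""
    ref, alt = ref.upper(), alt.upper()
    return (len(ref) == 1 and len(alt) == 1
            and ref in BASES and alt in BASES and ref != alt)


def is_transition(ref: str, alt: str) -> bool:
    """True for A<->G or C<->T substitutions."""
    if not is_snp(ref, alt):
        return False
    pair = {ref.upper(), alt.upper()}
    return pair <= PURINES or pair <= PYRIMIDINES


def is_transversion(ref: str, alt: str) -> bool:
    """True for SNPs that swap a purine and a pyrimidine."""
    return is_snp(ref, alt) and not is_transition(ref, alt)
```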
ALIGN Alignment / Reads

AlignedRead type with SAM flag queries and CIGAR parsing.

read.is_mapped   read.mapq
read.is_duplicate
read.aligned_length
flagstat(reads)
COORD Coordinate Handling

Automatic 0-based/1-based conversion. Chromosome normalization.

strip_chr("chr1") # "1"
add_chr("X") # "chrX"
normalize_chrom("chrMT") # "chrM"
# VCF→BED coords auto-handled
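
The VCF to BED conversion mentioned above comes down to one offset: VCF POS is 1-based, while BED uses 0-based, half-open intervals. The Python sketch below shows that arithmetic along with simple prefix handling. It is not BioLang code; `vcf_to_bed` is a hypothetical helper, and the prefix functions only cover the two cases shown in the examples above.

```python
from typing import Tuple


def strip_chr(chrom: str) -> str:
    """Drop a leading 'chr' prefix if present."""
    return chrom[3:] if chrom.startswith("chr") else chrom


def add_chr(chrom: str) -> str:
    """Add a 'chr' prefix unless one is already there."""
    return chrom if chrom.startswith("chr") else "chr" + chrom


def vcf_to_bed(chrom: str, pos: int, ref: str) -> Tuple[str, int, int]:
    """Convert a VCF record's 1-based POS and REF allele to a BED interval.

    BED start is POS - 1; BED end is exclusive, so the interval spans
    exactly len(ref) reference bases.
    """
    start = pos - 1
    return chrom, start, start + len(ref)
```

For the variant used in quick_taste.bl, `vcf_to_bed("17", 43092919, "G")` yields a one-base interval starting at 43092918.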

Data & Tables

TABLE dplyr-Style Verbs

Built-in table type with familiar data manipulation verbs.

select()   filter()   mutate()
group_by()   summarize()
arrange()   inner_join()   pivot_longer()
STREAM Lazy Streams

Process 100 GB files in constant memory. Lazy by default.

fastq("big.fq.gz")
  |> filter(|r| mean_phred(r.quality) >= 30)
  |> take(1_000_000)
  |> collect()
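
The constant-memory claim rests on laziness: each record is parsed, tested, and dropped before the next one is read, and take() stops the upstream reader early. The same pattern can be sketched with Python generators. This is an analogy, not BioLang's stream implementation; it assumes four-line FASTQ records, Phred+33 quality encoding, and non-empty quality strings.

```python
import io
from itertools import islice
from typing import Iterator, TextIO, Tuple

Record = Tuple[str, str, str]  # (read id, sequence, quality string)


def fastq_records(handle: TextIO) -> Iterator[Record]:
    """Yield one record at a time; only the current record is in memory."""
    while True:
        header = handle.readline()
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header.rstrip("\n")[1:], seq, qual


def mean_phred(qual: str, offset: int = 33) -> float:
    """Average Phred score of a quality string (Phred+33 by default)."""
    return sum(ord(c) - offset for c in qual) / len(qual)


# Tiny in-memory stand-in for a FASTQ file.
reads = io.StringIO(
    "@r1\nACGT\n+\nIIII\n"   # 'I' encodes Q40
    "@r2\nACGT\n+\n!!!!\n"   # '!' encodes Q0
    "@r3\nGGCC\n+\nIIII\n"
)
passing = (rec for rec in fastq_records(reads) if mean_phred(rec[2]) >= 30)
first_two = list(islice(passing, 2))  # analogous to take(2) |> collect()
```

For a gzipped file, the same generator works on a handle from `gzip.open(path, "rt")`.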
STATS Statistics

Descriptive stats, hypothesis tests, regression, PCA.

mean()   median()   stdev()
ttest()   cor()   lm()
pca()   p_adjust("BH")
ks_test()   chi_square()
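
As a reference for the "BH" option of p_adjust(), the Benjamini-Hochberg adjustment multiplies each p-value by n / rank and then enforces monotonicity from the largest p-value downward. The Python sketch below implements that standard procedure; `bh_adjust` is a hypothetical name, and this is not BioLang's code.

```python
from typing import List


def bh_adjust(pvals: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i])  # indices by ascending p
    adjusted = [0.0] * n
    running_min = 1.0  # also caps adjusted values at 1.0
    # Walk from the largest p-value (rank n) down to the smallest (rank 1).
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


adj = bh_adjust([0.01, 0.04, 0.03, 0.005])
```

The results should agree with R's `p.adjust(p, method = "BH")` up to floating-point error.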
HOF Higher-Order Functions

Functional operations on collections and streams.

map()   filter()   reduce()
flat_map()   zip()   each()
sort_by()   unique()   chunk()
par_map()   par_filter()
MATRIX Matrix Operations

Dense and sparse matrices for expression data.

matrix()   sparse_matrix()
mat_mul()   transpose()
row_means()   col_sums()
PARALLEL Parallel Processing

Multi-threaded map/filter for CPU-bound tasks.

data |> par_map(|r| heavy_compute(r))
data |> par_filter(|r| expensive(r))
# Auto thread pool sizing

Built-in API Clients

NCBI

E-utilities, Datasets v2

Ensembl

Genes, VEP, orthologs

UniProt

Search, entry, GO, ID mapping

UCSC

Genomes, tracks, sequence

KEGG

Pathways, genes, links

STRING

Networks, enrichment

PDB

Entries, chains, sequences, search

Reactome

Pathways, analysis

QuickGO

GO terms, annotations

BioMart

Bulk queries, datasets

COSMIC

Cancer mutations, census

NCBI Datasets

Genes, taxonomy, genomes

nf-core

Pipeline catalog, params, parsing

BioContainers

9,000+ container images

Galaxy ToolShed

Tool repositories, categories

Knowledge Graphs, Enrichment & Literature

GRAPH Knowledge Graphs

Built-in graph data structure for PPI networks, regulatory networks, and pathways.

let g = graph()
let g = add_edge(g, "BRCA1", "TP53")
neighbors(g, "BRCA1")
shortest_path(g, "A", "B")
connected_components(g)
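
For an unweighted network such as a PPI graph, a shortest path is typically found with breadth-first search. The Python sketch below mirrors the add_edge / neighbors / shortest_path calls above on a dict-of-sets adjacency map. It is an illustration only: treating edges as undirected and returning None when no path exists are assumptions, not documented BioLang behavior.

```python
from collections import deque
from typing import Dict, List, Optional, Set

Graph = Dict[str, Set[str]]


def add_edge(graph: Graph, a: str, b: str) -> Graph:
    """Add an undirected edge and return the graph for chaining."""
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)
    return graph


def neighbors(graph: Graph, node: str) -> Set[str]:
    """Nodes directly connected to `node` (empty set if unknown)."""
    return graph.get(node, set())


def shortest_path(graph: Graph, src: str, dst: str) -> Optional[List[str]]:
    """Breadth-first search; returns the node list or None if unreachable."""
    if src not in graph or dst not in graph:
        return None
    parent: Dict[str, Optional[str]] = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path: List[str] = []
            cur: Optional[str] = node
            while cur is not None:  # walk parent links back to src
                path.append(cur)
                cur = parent[cur]
            return path[::-1]
        for nxt in graph[node]:
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None


g: Graph = {}
for a, b in [("BRCA1", "TP53"), ("TP53", "MDM2"), ("BRCA1", "BARD1")]:
    g = add_edge(g, a, b)
path = shortest_path(g, "BARD1", "MDM2")
```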
ENRICH Enrichment Analysis

ORA (hypergeometric + BH FDR) and GSEA (permutation-based) with GMT file parsing.

let sets = read_gmt("hallmark.gmt")
enrich(genes, sets, 20000)
gsea(ranked, sets)
# Filter by FDR < 0.05
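
The hypergeometric test behind ORA asks how likely an overlap at least this large would be if the query genes were drawn at random from the background. The Python sketch below computes that upper-tail probability exactly with integer arithmetic. It is a reference for the statistic, not BioLang's implementation; `hypergeom_sf` and `ora_pvalue` are hypothetical names, and reading the third argument of enrich() as the background size is an assumption.

```python
from math import comb
from typing import Set


def hypergeom_sf(k: int, universe: int, set_size: int, draws: int) -> float:
    """P(X >= k) when drawing `draws` genes from `universe`,
    of which `set_size` belong to the gene set."""
    upper = min(set_size, draws)
    total = comb(universe, draws)
    favorable = sum(
        comb(set_size, i) * comb(universe - set_size, draws - i)
        for i in range(k, upper + 1)
    )
    return favorable / total


def ora_pvalue(genes: Set[str], gene_set: Set[str], universe: int) -> float:
    """Over-representation p-value for one gene set."""
    overlap = len(genes & gene_set)
    return hypergeom_sf(overlap, universe, len(gene_set), len(genes))


p = ora_pvalue({"BRCA1", "TP53", "ATM"}, {"BRCA1", "TP53", "CHEK2", "RAD51"}, 10)
```

In a full analysis, one p-value is computed per gene set and the collection is then corrected with Benjamini-Hochberg before filtering on FDR.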
PDB PDB & PubMed

Fetch PDB structures, chains, sequences. Search PubMed and retrieve abstracts.

pdb_entry("4HHB")
pdb_chains("4HHB")
pdb_sequence("4HHB", 1)
pubmed_search("CRISPR", 10)
pubmed_abstract(pmid)

Visualization — 42 Plot Types

ASCII Terminal Plots

Instant plots in REPL and terminal — no GUI needed.

sparkline(values)
bar_chart(data, opts)
boxplot(groups)
heatmap_ascii(matrix)
dotplot(x, y)   coverage(depths)
SVG SVG Plots

SVG output for papers and reports. Save with save_svg().

plot(data, opts)
histogram(values, opts)
heatmap(matrix, opts)
volcano(table, opts)
ma_plot(table, opts)
BIO 21 Bio-Specific Plots

Domain-specific visualizations built for genomics data.

manhattan()   qq_plot()
ideogram()   circos()
kaplan_meier()   oncoprint()
sequence_logo()   phylo_tree()
sashimi()   hic_map()
STATS Statistical Plots

Statistical and diagnostic visualizations.

violin(data)   density(values)
pca_plot(table)   roc_curve(data)
forest_plot(table)
clustered_heatmap(matrix)
venn(sets)   upset(sets)
SEQ Sequence Viz

Alignment views, quality plots, genome tracks.

alignment_view(reads)
quality_plot(fastq_stats)
genome_track(intervals)
lollipop(mutations)
rainfall(variants)   cnv_plot(data)
EXPORT SVG Export

All SVG plots can be saved to file.

p = volcano(table, opts)
save_svg(p, "figure1.svg")

# SVG plot functions return SVG
# strings you can save or embed

Data Transfer & Cloud

HTTP HTTP Downloads

Resumable downloads with progress. Proxy support built-in.

download(url, "output.fa.gz")
upload("results.vcf", url)
http_get(url, headers)
http_post(url, body)
FTP FTP & SFTP

Native FTP via suppaftp. SFTP/SCP via system SSH.

ftp_download("ftp://host/file")
ftp_upload("local.fa", "ftp://...")
ftp_list("ftp://host/dir/")
sftp_download("sftp://host/file")
scp("host:file", "local")
S3 AWS S3

Download from, upload to, and list S3 buckets via the AWS CLI.

s3_download("s3://bucket/file")
s3_upload("local.bam", "s3://...")
s3_list("s3://bucket/prefix/")
GCS Google Cloud Storage

Download and upload via gsutil.

gcs_download("gs://bucket/file")
gcs_upload("local.vcf", "gs://...")
ASPERA Aspera & SRA

High-speed Aspera transfers for EBI/NCBI. SRA toolkit integration.

aspera_download("era-fasp@...")
sra_prefetch("SRR12345678")
sra_fastq("SRR12345678")
SYNC Rsync & References

Rsync directories. 15 built-in reference genome sources.

rsync(src, dest, {compress: true})
ref_genome("GRCh38")
bio_sources() # list all 15
bio_fetch("dbSNP")

Export & Output Formats

BIO Bio Format Writers

Write all major bioinformatics formats.

write_fasta(seqs, "out.fa")
write_fastq(reads, "out.fq")
write_vcf(variants, "out.vcf")
write_bed(regions, "out.bed")
write_gff(features, "out.gff")
TABLE Tabular Export

Write tables to CSV, TSV, and JSON.

write_csv(table, "results.csv")
write_tsv(table, "results.tsv")
write_json(data, "results.json")

# JSON supports any value type
SVG Plot Export

Save any SVG plot to a file. All 36 SVG plot functions return SVG strings.

p = volcano(table, opts)
save_svg(p, "figure1.svg")

m = manhattan(gwas, opts)
save_svg(m, "manhattan.svg")
MD Markdown Export

Convert any value to Markdown. Tables, records, lists auto-formatted.

to_markdown(table) # MD string
write_markdown(data, "report.md")
# List[Record] → auto-table
HTML HTML Reports

Self-contained HTML reports with inline CSS. SVG plots embedded.

to_html(table) # full HTML
write_html(results, "report.html")
# Styled tables, zebra stripes

AI & Tooling

LLM Built-in LLM Chat

Ask questions, generate code, analyze data — from your script.

chat("What does BRCA1 do?")
chat_code("filter VCF by qual")
llm_models() # show provider
PROVIDERS Multi-Provider LLM

Auto-detects from env vars. Zero config for most setups.

ANTHROPIC_API_KEY # Claude
OPENAI_API_KEY # GPT
OLLAMA_MODEL # local, free
LLM_BASE_URL # any compatible
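
Auto-detection of this kind usually means checking environment variables in a fixed order and using the first one that is set. The Python sketch below shows the pattern with the four variables listed above. The priority order and the returned provider labels are assumptions made for illustration; the source does not state how BioLang resolves the case where several variables are set.

```python
import os
from typing import Mapping, Optional

# Checked top to bottom; this ordering is an assumption, not documented behavior.
_PROVIDER_VARS = (
    ("ANTHROPIC_API_KEY", "anthropic"),
    ("OPENAI_API_KEY", "openai"),
    ("OLLAMA_MODEL", "ollama"),
    ("LLM_BASE_URL", "custom"),
)


def detect_provider(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the label of the first provider whose variable is non-empty."""
    env = os.environ if env is None else env
    for var, label in _PROVIDER_VARS:
        if env.get(var):
            return label
    return None
```

Passing an explicit mapping, as in `detect_provider({"OLLAMA_MODEL": "llama3"})`, keeps the function easy to test without touching the real environment.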
TOOLS Container Tools

Run samtools, bwa, GATK from BioLang via BioContainers.

tool("samtools", "view -c in.bam")
tool_search("bwa")
tool_pull("samtools")
tool_popular()

Developer Experience

CLI REPL

Interactive shell with :time, :type, :env, :profile, tab completion, history.

IDE LSP Server

Diagnostics, completion, hover — works with VS Code, Neovim, Helix.

DOC Literate Notebooks

.bln format: Markdown + code. Cell directives, HTML export, Jupyter import/export.

PKG Plugin System

Extend with Python, TypeScript, R, or native Rust plugins.

PERF Rust + JIT

Bytecode compiler + Cranelift JIT. Native noodles I/O. Single binary.

How it compares

Feature                 BioLang        Python + Bio*       R + Bioconductor
Native DNA/RNA types    Built-in       Biopython Seq       Biostrings
Pipe syntax             |> native      None                %>% (magrittr), |> (R 4.1+)
Streaming large files   Default        Manual generators   Limited
Bio API clients         15 built-in    Separate packages   Separate packages
LLM integration         Built-in       LangChain etc.      ellmer/tidychat
Setup                   Single binary  pip + conda         R + BiocManager
Performance             Rust-native    C extensions        C/Fortran

Ready to try it?

Single binary. No runtime deps for the core; some transfer commands rely on external tools (system SSH, AWS CLI, gsutil). Works on Linux, macOS, Windows.

$ cargo install biolang
$ bl repl

MIT License · Open Source · Built with Rust