BioLang at a Glance

SYNTAX Pipe-First Operator

Data flows left-to-right via |>. No nested calls, no temp vars. Use |> into to bind results.

# a |> f(b) = f(a, b)
data |> filter(|x| x > 0) |> mean()
# bind mid-chain
data |> filter(|x| x > 0) |> into clean
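
To make the rewrite rule a |> f(b) = f(a, b) concrete, here is a minimal Python sketch of the same desugaring. It is not BioLang's implementation; the helper names pipe and keep are hypothetical and exist only for this illustration.

```python
def pipe(value, *steps):
    """Thread `value` through `steps`, left to right.

    Each step is a tuple (func, *extra_args); the running value is
    always passed as the first argument, mirroring a |> f(b) = f(a, b).
    """
    for func, *args in steps:
        value = func(value, *args)
    return value


def keep(xs, pred):
    # Hypothetical stand-in for a filter builtin.
    return [x for x in xs if pred(x)]


def mean(xs):
    return sum(xs) / len(xs)


data = [-2, 4, 0, 6]
# Equivalent to: data |> filter(|x| x > 0) |> mean()
result = pipe(data, (keep, lambda x: x > 0), (mean,))
```

Read this way, a pipeline is ordinary function application with the left-hand value filled in as the first argument.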

BIO Native Bio Types

First-class DNA, RNA, Protein literals — not strings.

dna"ATCGATCG"
rna"AUCGAUCG"
protein"MKTLLILAVS"

TYPES Genomic Value Types

Variant, AlignedRead, Interval, Gene, Quality, Table, Matrix, Stream.

v.is_snp v.is_transition
read.is_mapped read.mapq
iv.chrom iv.start iv.end

SYNTAX Pattern Matching

Match expressions with destructuring and guards.

match variant_type(v) {
"missense" => analyze(v),
"nonsense" => flag_lof(v),
_ => "benign"
}

TYPES Gradual Typing

Full type inference. Add annotations when you want safety.

x = 42 # inferred Int
s = dna"ATCG"
t = [0.1, 0.9]

ERRORS Result + try/catch

Rust-style Result with ? operator, plus try/catch for scripts.

seq = read_fasta(path)? # ? propagates the error
try { align(reads, ref) }
catch e { print(e.message) }

I/O 15 File Formats

Native readers/writers with auto gzip detection.

read_fasta() read_fastq()
fasta() fastq() vcf()
bed() gff() sam() bam()
csv() tsv() json()
+ write_fasta/fastq/vcf/bed/gff

SEQ Sequence Operations

All common sequence ops as builtins.

reverse_complement() transcribe()
translate() gc_content()
kmer_count() hamming_distance()
find_motif() find_orfs()
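
As a rough guide to what a few of these builtins compute, the Python sketch below gives reference definitions for reverse complement, GC content, Hamming distance, and k-mer counting. The signatures are assumptions for illustration (plain strings instead of BioLang's DNA type), not the BioLang API.

```python
from collections import Counter

# Complement table for unambiguous DNA bases (IUPAC codes are not handled).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Complement every base, then reverse the string."""
    return seq.translate(_COMPLEMENT)[::-1]


def gc_content(seq):
    """Fraction of G and C bases; 0.0 for an empty sequence."""
    if not seq:
        return 0.0
    upper = seq.upper()
    return (upper.count("G") + upper.count("C")) / len(upper)


def hamming_distance(a, b):
    """Number of mismatching positions between equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def kmer_count(seq, k):
    """Count every overlapping substring of length k."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
```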

GENOMICS Interval Operations

Interval trees with O(log n + k) query performance.

interval_tree(regions)
query_overlaps(tree, chrom, start, end)
query_nearest(tree, chrom, start)
coverage(intervals)
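
The overlap test behind such a query can be sketched in a few lines of Python. This version is a plain linear scan, O(n) per query, and deliberately not an interval tree; it assumes 0-based half-open (chrom, start, end) tuples, and the function signature is hypothetical.

```python
def query_overlaps(regions, chrom, start, end):
    """Return regions on `chrom` that overlap the half-open range [start, end).

    Two half-open intervals overlap exactly when each one starts
    before the other ends.
    """
    return [
        (r_chrom, r_start, r_end)
        for r_chrom, r_start, r_end in regions
        if r_chrom == chrom and r_start < end and start < r_end
    ]


regions = [("chr1", 100, 200), ("chr1", 150, 300), ("chr2", 100, 200)]
hits = query_overlaps(regions, "chr1", 200, 250)
```

An interval tree returns the same hits but prunes whole subtrees that cannot overlap, which is where the O(log n + k) bound comes from.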

VARIANT Variant Analysis

SNP/indel classification, genotype queries, VCF INFO parsing.

is_snp(v) is_transition(v)
is_het(v) variant_type(v)
parse_vcf_info(info_str)
strip_chr() normalize_chrom()
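
For readers unfamiliar with the terminology, the following Python sketch shows the standard definitions of a SNP and a transition. It takes REF and ALT alleles as plain strings, which is an assumption made for the example; the BioLang functions listed above take a Variant value instead.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES


def is_snp(ref, alt):
    """A SNP replaces exactly one base with a different single base."""
    return ref in BASES and alt in BASES and ref != alt


def is_transition(ref, alt):
    """Transition: purine <-> purine or pyrimidine <-> pyrimidine."""
    if not is_snp(ref, alt):
        return False
    pair = {ref, alt}
    return pair <= PURINES or pair <= PYRIMIDINES


def is_transversion(ref, alt):
    """Transversion: a SNP that swaps a purine for a pyrimidine or back."""
    return is_snp(ref, alt) and not is_transition(ref, alt)
```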

ALIGN Alignment / Reads

AlignedRead type with SAM flag queries and CIGAR parsing.

read.is_mapped read.mapq
read.is_duplicate
read.aligned_length
flagstat(reads)

COORD Coordinate Handling

Automatic 0-based/1-based conversion. Chromosome normalization.

strip_chr("chr1") # "1"
add_chr("X") # "chrX"
normalize_chrom("chrMT") # "chrM"
# VCF→BED coords auto-handled
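
The conversion being automated here is small but easy to get wrong by one. The Python sketch below shows the arithmetic for turning a 1-based VCF position into a 0-based half-open BED interval, plus simple prefix helpers. The function vcf_to_bed is a hypothetical name used only for this example, and the helpers are illustrative rather than BioLang's own code.

```python
def strip_chr(chrom):
    """Drop a leading 'chr' prefix if present: 'chr1' -> '1'."""
    return chrom[3:] if chrom.startswith("chr") else chrom


def add_chr(chrom):
    """Add a 'chr' prefix unless it is already there: 'X' -> 'chrX'."""
    return chrom if chrom.startswith("chr") else "chr" + chrom


def vcf_to_bed(chrom, pos, ref):
    """Convert a VCF record position to a BED interval.

    VCF POS is 1-based and names the first base of REF.
    BED uses a 0-based start and an exclusive end, so the interval
    covering REF is [pos - 1, pos - 1 + len(ref)).
    """
    start = pos - 1
    return chrom, start, start + len(ref)
```

For example, a single-base variant at VCF position 100 becomes the BED interval 99 to 100.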

TABLE dplyr-Style Verbs

Built-in table type with familiar data manipulation verbs.

select() filter() mutate()
group_by() summarize()
arrange() inner_join() pivot_longer()

STREAM Lazy Streams

Process 100 GB files in constant memory. Lazy by default.
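
The constant-memory idea can be illustrated with a Python generator: records are parsed one at a time and discarded after use, so memory does not grow with file size. This is a sketch of the general technique, not of BioLang's Stream type; it assumes well-formed four-line FASTQ records.

```python
import io
from itertools import islice


def fastq_records(handle):
    """Lazily yield (name, sequence, quality) from a FASTQ line iterator."""
    while True:
        block = list(islice(handle, 4))  # read exactly one record
        if len(block) < 4:
            return  # end of input (or a truncated trailing record)
        header, seq, _plus, qual = (line.rstrip("\n") for line in block)
        yield header[1:], seq, qual


def mean_length(records):
    """Single pass over the stream; only two counters are kept in memory."""
    total = count = 0
    for _name, seq, _qual in records:
        total += len(seq)
        count += 1
    return total / count if count else 0.0


# In-memory stand-in for a file; gzip.open(path, "rt") would work the same way.
demo = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nACGTAC\n+\nIIIIII\n")
avg = mean_length(fastq_records(demo))
```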

STATS Statistics

Descriptive stats, hypothesis tests, regression, PCA.

mean() median() stdev()
ttest() cor() lm()
pca() p_adjust("BH")
ks_test() chi_square()
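
The "BH" option refers to the Benjamini-Hochberg false discovery rate adjustment. A minimal Python sketch of that procedure follows; the function name p_adjust_bh is hypothetical, and the code illustrates the standard algorithm rather than BioLang's implementation.

```python
def p_adjust_bh(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order.

    Each p-value is scaled by n / rank (rank 1 = smallest), then a
    running minimum from the largest rank downward enforces
    monotonicity and caps the result at 1.0.
    """
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


adjusted = p_adjust_bh([0.01, 0.04, 0.03, 0.005])
```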

HOF Higher-Order Functions

Functional operations on collections and streams.

map() filter() reduce()
flat_map() zip() each()
sort_by() unique() chunk()
par_map() par_filter()

MATRIX Matrix Operations

Dense and sparse matrices for expression data.

matrix() sparse_matrix()
mat_mul() transpose()
row_means() col_sums()

PARALLEL Parallel Processing

Multi-threaded map/filter for CPU-bound tasks.

NCBI

E-utilities, Datasets v2

Ensembl

Genes, VEP, orthologs

UniProt

Search, entry, GO, ID mapping

UCSC

Genomes, tracks, sequence

KEGG

Pathways, genes, links

STRING

Networks, enrichment

PDB

Entries, chains, sequences, search

Reactome

Pathways, analysis

QuickGO

GO terms, annotations

BioMart

Bulk queries, datasets

COSMIC

Cancer mutations, census

NCBI Datasets

Genes, taxonomy, genomes

nf-core

Pipeline catalog, params, parsing

BioContainers

9,000+ container images

Galaxy ToolShed

Tool repositories, categories

GRAPH Knowledge Graphs

Built-in graph data structure for PPI networks, regulatory networks, and pathways.

let g = graph()
let g = add_edge(g, "BRCA1", "TP53")
neighbors(g, "BRCA1")
shortest_path(g, "A", "B")
connected_components(g)
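
For an unweighted network, a shortest path is usually found with breadth-first search. The Python sketch below shows that technique on an adjacency-set graph. It assumes edges are undirected and that add_edge returns a new graph, as the let g = add_edge(g, ...) style suggests; neither assumption is confirmed for BioLang.

```python
from collections import deque


def add_edge(graph, a, b):
    """Return a copy of `graph` with an undirected edge a-b added."""
    new_graph = {node: set(nbrs) for node, nbrs in graph.items()}
    new_graph.setdefault(a, set()).add(b)
    new_graph.setdefault(b, set()).add(a)
    return new_graph


def shortest_path(graph, source, target):
    """Breadth-first search; returns a list of nodes, or None if unreachable."""
    if source not in graph or target not in graph:
        return None
    parents = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:  # walk parent links back to the source
                path.append(node)
                node = parents[node]
            return path[::-1]
        for nbr in sorted(graph[node]):
            if nbr not in parents:
                parents[nbr] = node
                queue.append(nbr)
    return None


g = {}
for a, b in [("BRCA1", "TP53"), ("TP53", "MDM2"), ("BRCA1", "BARD1")]:
    g = add_edge(g, a, b)
path = shortest_path(g, "BRCA1", "MDM2")
```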

ENRICH Enrichment Analysis

ORA (hypergeometric + BH FDR) and GSEA (permutation-based) with GMT file parsing.

let sets = read_gmt("hallmark.gmt")
enrich(genes, sets, 20000)
gsea(ranked, sets)
# Filter by FDR < 0.05
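
The hypergeometric test behind ORA asks how likely it is to see at least k set members in a gene list by chance. A minimal Python sketch of that upper-tail probability is shown below; the function name and argument order are hypothetical, and reading the 20000 in the call above as the background gene count is an assumption.

```python
from math import comb


def hypergeom_upper_tail(k, background, set_size, list_size):
    """P(overlap >= k) when drawing `list_size` genes without replacement.

    background: total number of genes considered
    set_size:   genes in the pathway or gene set
    list_size:  genes in the query list
    """
    max_overlap = min(set_size, list_size)
    total = comb(background, list_size)
    favourable = sum(
        comb(set_size, i) * comb(background - set_size, list_size - i)
        for i in range(k, max_overlap + 1)
    )
    return favourable / total


# 10 background genes, 4 in the set, 3 in the query list, 2 overlapping.
p = hypergeom_upper_tail(2, 10, 4, 3)
```

The resulting p-values, one per gene set, are what the BH adjustment is then applied to before filtering by FDR.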

PDB PDB & PubMed

Fetch PDB structures, chains, sequences. Search PubMed and retrieve abstracts.

pdb_entry("4HHB")
pdb_chains("4HHB")
pdb_sequence("4HHB", 1)
pubmed_search("CRISPR", 10)
pubmed_abstract(pmid)

ASCII Terminal Plots

Instant plots in REPL and terminal — no GUI needed.

sparkline(values)
bar_chart(data, opts)
boxplot(groups)
heatmap_ascii(matrix)
dotplot(x, y) coverage(depths)

SVG SVG Plots

SVG output for papers and reports. Save with save_svg().

plot(data, opts)
histogram(values, opts)
heatmap(matrix, opts)
volcano(table, opts)
ma_plot(table, opts)

BIO 21 Bio-Specific Plots

Domain-specific visualizations built for genomics data.

manhattan() qq_plot()
ideogram() circos()
kaplan_meier() oncoprint()
sequence_logo() phylo_tree()
sashimi() hic_map()

STATS Statistical Plots

Statistical and diagnostic visualizations.

violin(data) density(values)
pca_plot(table) roc_curve(data)
forest_plot(table)
clustered_heatmap(matrix)
venn(sets) upset(sets)

SEQ Sequence Viz

Alignment views, quality plots, genome tracks.

alignment_view(reads)
quality_plot(fastq_stats)
genome_track(intervals)
lollipop(mutations)
rainfall(variants) cnv_plot(data)

EXPORT SVG Export

All SVG plots can be saved to file.

p = volcano(table, opts)
save_svg(p, "figure1.svg")

# All plot functions return SVG
# strings you can save or embed

HTTP HTTP Downloads

Resumable downloads with progress. Proxy support built-in.

download(url, "output.fa.gz")
upload("results.vcf", url)
http_get(url, headers)
http_post(url, body)

FTP FTP & SFTP

Native FTP via suppaftp. SFTP/SCP via system SSH.

ftp_download("ftp://host/file")
ftp_upload("local.fa", "ftp://...")
ftp_list("ftp://host/dir/")
sftp_download("sftp://host/file")
scp("host:file", "local")

S3 AWS S3

Download, upload, list S3 buckets via AWS CLI.

s3_download("s3://bucket/file")
s3_upload("local.bam", "s3://...")
s3_list("s3://bucket/prefix/")

GCS Google Cloud Storage

Download and upload via gsutil.

gcs_download("gs://bucket/file")
gcs_upload("local.vcf", "gs://...")

ASPERA Aspera & SRA

High-speed Aspera transfers for EBI/NCBI. SRA toolkit integration.

aspera_download("era-fasp@...")
sra_prefetch("SRR12345678")
sra_fastq("SRR12345678")

SYNC Rsync & References

Rsync directories. 15 built-in reference genome sources.

rsync(src, dest, {compress: true})
ref_genome("GRCh38")
bio_sources() # list all 15
bio_fetch("dbSNP")

BIO Bio Format Writers

Write all major bioinformatics formats.

write_fasta(seqs, "out.fa")
write_fastq(reads, "out.fq")
write_vcf(variants, "out.vcf")
write_bed(regions, "out.bed")
write_gff(features, "out.gff")

TABLE Tabular Export

Write tables to CSV, TSV, and JSON.

write_csv(table, "results.csv")
write_tsv(table, "results.tsv")
write_json(data, "results.json")

# JSON supports any value type

SVG Plot Export

Save any plot to SVG file. All 36 plot functions return SVG strings.

p = volcano(table, opts)
save_svg(p, "figure1.svg")

m = manhattan(gwas, opts)
save_svg(m, "manhattan.svg")

MD Markdown Export

Convert any value to Markdown. Tables, records, lists auto-formatted.

to_markdown(table) # MD string
write_markdown(data, "report.md")
# List[Record] → auto-table

HTML HTML Reports

Self-contained HTML reports with inline CSS. SVG plots embedded.

to_html(table) # full HTML
write_html(results, "report.html")
# Styled tables, zebra stripes

LLM Built-in LLM Chat

Ask questions, generate code, analyze data — from your script.

chat("What does BRCA1 do?")
chat_code("filter VCF by qual")
llm_models() # show provider

PROVIDERS Multi-Provider LLM

Auto-detects from env vars. Zero config for most setups.

ANTHROPIC_API_KEY # Claude
OPENAI_API_KEY # GPT
OLLAMA_MODEL # local, free
LLM_BASE_URL # any compatible

TOOLS Container Tools

Run samtools, bwa, GATK from BioLang via BioContainers.

tool("samtools", "view -c in.bam")
tool_search("bwa")
tool_pull("samtools")
tool_popular()

CLI REPL

Interactive shell with :time, :type, :env, :profile, tab completion, history.

IDE LSP Server

Diagnostics, completion, hover — works with VS Code, Neovim, Helix.

DOC Literate Notebooks

.bln format: Markdown + code. Cell directives, HTML export, Jupyter import/export.

PKG Plugin System

Extend with Python, TypeScript, R, or native Rust plugins.

PERF Rust + JIT

Bytecode compiler + Cranelift JIT. Native noodles I/O. Single binary.

How it compares

Feature	BioLang	Python + Bio*	R + Bioconductor
Native DNA/RNA types	Built-in	BioPython Seq	Biostrings
Pipe syntax	|> native	None	%>% (magrittr), |> (R 4.1+)
Streaming large files	Default	Manual generators	Limited
Bio API clients	16 built-in	Separate packages	Separate packages
LLM integration	Built-in	LangChain etc.	ellmer/tidychat
Setup	Single binary	pip + conda	R + BiocManager
Performance	Rust-native	C extensions	C/Fortran

Sections: Language; Bioinformatics; Data & Tables; Built-in API Clients; Knowledge Graphs, Enrichment & Literature; Visualization (42 Plot Types); Data Transfer & Cloud; Export & Output Formats; AI & Tooling; Developer Experience; How it compares.