From Python/R
If you are coming from Python (BioPython, pandas) or R (Bioconductor, tidyverse), this guide maps familiar patterns to their BioLang equivalents. BioLang is not a replacement for Python or R, but for bioinformatics-specific pipelines it offers a more concise syntax, built-in sequence types, and native performance without a runtime dependency.
Key Differences at a Glance
| Feature | Python | R | BioLang |
|---|---|---|---|
| Type system | Dynamic, duck typing | Dynamic, vector-first | Static inference, gradual typing |
| Pipe operator | None (method chains) | \|> (R 4.1+) / %>% | \|> (first-class) |
| Sequence types | Seq object (BioPython) | DNAStringSet | dna"..." literal |
| Package manager | pip / conda | install.packages / BiocManager | Built-in stdlib, no package manager needed |
| Concurrency | GIL, multiprocessing | Single-threaded (parallel pkg) | Native async, parallel iterators |
| Performance | Interpreted (NumPy for speed) | Interpreted (Rcpp for speed) | Compiled, native speed |
| Null handling | None | NA / NULL | None with ?? operator |
| Error handling | try/except | tryCatch | try/catch + Result type |
Reading a FASTQ File
Python (BioPython)
from Bio import SeqIO
records = list(SeqIO.parse("reads.fastq", "fastq"))
for record in records[:5]:
    quals = record.letter_annotations["phred_quality"]
    mean_q = sum(quals) / len(quals)
    print(f"{record.id}: mean Q={mean_q:.1f}, len={len(record.seq)}")
R (ShortRead)
library(ShortRead)
reads <- readFastq("reads.fastq")
quals <- quality(reads)
avg_qual <- rowMeans(as(quals, "matrix"))
data.frame(
  id = as.character(id(reads))[1:5],
  mean_q = round(avg_qual[1:5], 1),
  len = width(reads)[1:5]
)
BioLang
read_fastq("reads.fastq")
  |> take(5)
  |> map(|r| {
    let mean_q = mean_phred(r.quality)
    print("{r.id}: mean Q={mean_q:.1}, len={r.length}")
  })
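For readers wondering what a per-read mean quality involves, here is a minimal Python sketch. It assumes the common Phred+33 ASCII encoding of FASTQ quality strings; the name `mean_phred_33` is hypothetical, and whether BioLang's `mean_phred` performs exactly this calculation is an assumption.

```python
def mean_phred_33(quality: str) -> float:
    """Mean Phred score of a FASTQ quality string, assuming Phred+33 encoding."""
    if not quality:
        raise ValueError("quality string is empty")
    # Each character encodes one score as (ASCII code - 33).
    scores = [ord(ch) - 33 for ch in quality]
    return sum(scores) / len(scores)


# 'I' is ASCII 73, so every base in this read has score 40.
print(mean_phred_33("IIII"))  # 40.0
```

Note that this is the arithmetic mean of the scores, which is not the same as the Phred score of the mean error probability.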
GC Content Calculation
Python
from Bio.SeqUtils import gc_fraction
seq = "ATCGATCGATCGATCG"
gc = gc_fraction(seq)
print(f"GC content: {gc:.3f}")
R
library(Biostrings)
seq <- DNAString("ATCGATCGATCGATCG")
gc <- letterFrequency(seq, letters = c("G", "C"), as.prob = TRUE)
cat(sprintf("GC content: %.3f\n", sum(gc)))
BioLang
dna"ATCGATCGATCGATCG" |> gc_content() |> print()
# 0.500
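If you want to sanity-check a GC value without any library, the calculation can be sketched in plain Python. The helper name `gc_fraction_plain` is hypothetical, and the handling of empty input (returning 0.0) is an assumption rather than documented behavior of any of the tools above.

```python
def gc_fraction_plain(seq: str) -> float:
    """Fraction of G and C bases in seq (case-insensitive); 0.0 for empty input."""
    if not seq:
        return 0.0
    upper = seq.upper()
    return (upper.count("G") + upper.count("C")) / len(upper)


print(f"GC content: {gc_fraction_plain('ATCGATCGATCGATCG'):.3f}")  # GC content: 0.500
```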
Filtering and Transforming Data
Python (pandas)
import pandas as pd
df = pd.read_csv("samples.csv")
result = (
    df[df["reads"] >= 1_000_000]
    .assign(million_reads=lambda x: x["reads"] / 1e6)
    .sort_values("reads", ascending=False)
    [["sample_id", "million_reads"]]
)
print(result.to_string(index=False))
R (dplyr)
library(dplyr)
read.csv("samples.csv") |>
  filter(reads >= 1e6) |>
  mutate(million_reads = reads / 1e6) |>
  arrange(desc(reads)) |>
  select(sample_id, million_reads) |>
  print()
BioLang
csv("samples.csv")
  |> filter(|row| row["reads"] >= 1_000_000)
  |> mutate(|row| { "million_reads": row["reads"] / 1_000_000.0 })
  |> select(["sample_id", "million_reads"])
  |> arrange("million_reads", descending: true)
  |> print()
Reverse Complement
Python
from Bio.Seq import Seq
seq = Seq("ATCGATCG")
print(seq.reverse_complement()) # CGATCGAT
R
library(Biostrings)
reverseComplement(DNAString("ATCGATCG"))
# [1] "CGATCGAT"
BioLang
dna"ATCGATCG" |> reverse_complement() |> print()
# CGATCGAT
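For comparison, the operation itself is small enough to write without a library. This is a minimal plain-Python sketch, not how BioPython, Biostrings, or BioLang implement it; characters outside A, C, G, T (for example N) are passed through unchanged, which is a simplification.

```python
# Translation table mapping each base to its complement (both cases).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    # Complement every base, then reverse the whole string.
    return seq.translate(_COMPLEMENT)[::-1]


print(reverse_complement("ATCGATCG"))  # CGATCGAT
```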
Statistical Summary
Python (NumPy)
import numpy as np
data = [38.5, 40.2, 35.1, 42.0, 37.8, 39.1, 41.3]
print(f"Mean: {np.mean(data):.2f}")
print(f"Median: {np.median(data):.2f}")
print(f"Std: {np.std(data):.2f}")
print(f"Min: {np.min(data):.2f}")
print(f"Max: {np.max(data):.2f}")
R
data <- c(38.5, 40.2, 35.1, 42.0, 37.8, 39.1, 41.3)
summary(data)
BioLang
let data = [38.5, 40.2, 35.1, 42.0, 37.8, 39.1, 41.3]
data |> describe() |> print()
# { mean: 39.14, median: 39.10, std_dev: 2.27, min: 35.10, max: 42.00, count: 7 }
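One detail to watch when comparing summaries across languages is the standard deviation convention. NumPy's `np.std` defaults to the population formula (dividing by n), while R's `sd()` uses the sample formula (dividing by n - 1). The sketch below computes both with Python's standard library; which convention BioLang's `describe()` follows is not stated in this guide, so check it before comparing numbers.

```python
import statistics

data = [38.5, 40.2, 35.1, 42.0, 37.8, 39.1, 41.3]

population_sd = statistics.pstdev(data)  # divides by n, like np.std(data)
sample_sd = statistics.stdev(data)       # divides by n - 1, like R's sd(data)

print(f"population: {population_sd:.2f}")  # population: 2.15
print(f"sample:     {sample_sd:.2f}")      # sample:     2.33
```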
K-mer Counting
Python
from collections import Counter
seq = "ATCGATCGATCG"
k = 3
kmers = [seq[i:i+k] for i in range(len(seq) - k + 1)]
counts = Counter(kmers)
for kmer, count in counts.most_common(5):
    print(f"{kmer}: {count}")
R
library(Biostrings)
seq <- DNAString("ATCGATCGATCG")
freqs <- oligonucleotideFrequency(seq, width = 3)
head(sort(freqs, decreasing = TRUE), 5)
BioLang
let counts = kmer_count(dna"ATCGATCGATCG", 3)
counts
  |> sort(|a, b| b["count"] - a["count"])
  |> take(5)
  |> map(|kv| print("{kv['kmer']}: {kv['count']}"))
Writing Output Files
Python
import json
results = {"total_reads": 1500000, "pass_rate": 0.95, "mean_quality": 35.2}
with open("results.json", "w") as f:
    json.dump(results, f, indent=2)
R
library(jsonlite)
results <- list(total_reads = 1500000, pass_rate = 0.95, mean_quality = 35.2)
write_json(results, "results.json", pretty = TRUE)
BioLang
let results = {
  "total_reads": 1_500_000,
  "pass_rate": 0.95,
  "mean_quality": 35.2
}
write_text("results.json", results)
Pattern Matching
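BioLang's match expression replaces chains of if/elif/else in Python or switch/case patterns in R. The direct translation below uses if/else if as an expression: each branch evaluates to a value, so no explicit return is needed.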
Python
def classify_quality(q):
    if q >= 35:
        return "excellent"
    elif q >= 25:
        return "good"
    elif q >= 20:
        return "acceptable"
    else:
        return "poor"
BioLang
fn classify_quality(q: Float) -> String {
  if q >= 35.0 {
    "excellent"
  } else if q >= 25.0 {
    "good"
  } else if q >= 20.0 {
    "acceptable"
  } else {
    "poor"
  }
}
Parallel Processing
Python
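from multiprocessing import Pool

def process_file(path):
    # ... analysis logic ...
    return result

# file_list is a list of input paths defined elsewhere.
# The __main__ guard is needed on platforms that spawn worker processes.
if __name__ == "__main__":
    with Pool(4) as pool:
        results = pool.map(process_file, file_list)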
R
library(parallel)
results <- mclapply(file_list, process_file, mc.cores = 4)
BioLang
let results = file_list
  |> map(|path| {
    read_fastq(path) |> map(|r| analyze(r))
  })
Migration Tips
- Start with the REPL. Port small snippets one at a time. The REPL's `:type` command helps when you are unsure what types BioLang infers.
- Think in pipes. If you write Python method chains or R pipe chains, BioLang pipes will feel natural. The key difference is that BioLang pipes pass the value as the first argument, not via a `self`/`.` receiver.
- Use sequence literals. Stop treating DNA as plain strings. The `dna"..."` type catches errors at compile time and provides domain-specific functions like `reverse_complement()` and `gc_content()`.
- Leverage type inference. You rarely need type annotations. BioLang infers types through the entire pipe chain and will tell you at compile time if something does not match.
- No package ecosystem needed. BioLang's standard library includes FASTQ/FASTA/BAM parsing, statistics, CSV/JSON/TSV I/O, and HTTP. You do not need to install external packages for common bioinformatics tasks.
- Think in data flow. Pipes naturally encourage a clean data flow style where each step transforms data and passes it to the next.
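To make the "value as the first argument" rule from the pipes tip concrete, here is a rough Python emulation. The `pipe`, `take`, and `scale` helpers are hypothetical names invented for this sketch; it illustrates the calling convention only and says nothing about how BioLang implements pipes.

```python
from functools import reduce


def pipe(value, *steps):
    """Thread value through steps; each step is a tuple (func, *extra_args)."""
    def apply(acc, step):
        func, *extra = step
        # The piped value always becomes the first positional argument.
        return func(acc, *extra)
    return reduce(apply, steps, value)


def take(items, n):
    return list(items)[:n]


def scale(items, factor):
    return [x * factor for x in items]


# Reads like: [3, 1, 2] |> sorted() |> take(2) |> scale(10)
result = pipe([3, 1, 2], (sorted,), (take, 2), (scale, 10))
print(result)  # [10, 20]
```

Because plain functions are chained instead of methods, any function whose first parameter accepts the incoming value can take part in the chain.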
Common Gotchas
- No implicit type coercion. Unlike Python, `1 + 1.0` requires an explicit `float(1) + 1.0`. The compiler error message tells you exactly what conversion is needed.
- Indexing is zero-based. Like Python, unlike R. `list[0]` is the first element.
- No `NULL`/`NA` propagation. BioLang uses `Option<T>` (values are either `Some(value)` or `None`). The `??` operator provides defaults. No silent `NA` propagation surprises.
- String formatting uses `{}` not `%` or `f""`. BioLang strings support interpolation with curly braces: `"value: {x}"`.
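Python users can approximate the defaulting behavior described in the null-handling gotcha with a small helper. This sketch assumes `??` substitutes the default only when the value is `None`; the `coalesce` name is hypothetical. It also shows two Python habits that behave differently: `or` replaces every falsy value, and NaN propagates silently through arithmetic.

```python
import math


def coalesce(value, default):
    """Return default only when value is None (a rough analogue of `??`)."""
    return default if value is None else value


reads = {"sample_a": 0, "sample_b": None}

print(coalesce(reads["sample_a"], -1))  # 0   (a real zero is kept)
print(coalesce(reads["sample_b"], -1))  # -1  (only None is replaced)
print(reads["sample_a"] or -1)          # -1  (`or` also replaces 0)

# Silent propagation: arithmetic on NaN yields NaN without raising an error.
print(math.isnan(float("nan") + 1))     # True
```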
Next Steps
You now have a solid map between Python/R and BioLang. Explore the Language Reference for the full specification, or dive into the Standard Library documentation to see all available modules and functions.