From Python/R

If you are coming from Python (BioPython, pandas) or R (Bioconductor, tidyverse), this guide maps familiar patterns to their BioLang equivalents. BioLang is not a replacement for Python or R, but for bioinformatics-specific pipelines it offers a more concise syntax, built-in sequence types, and native performance without a runtime dependency.

Key Differences at a Glance

Feature	Python	R	BioLang
Type system	Dynamic, duck typing	Dynamic, vector-first	Static inference, gradual typing
Pipe operator	n/a (method chains)	`|>` (R 4.1+) / `%>%`	`|>` (first-class)
Sequence types	`Seq` object (BioPython)	`DNAStringSet`	`dna"..."` literal
Package manager	pip / conda	install.packages / BiocManager	Built-in stdlib, no package manager needed
Concurrency	GIL, multiprocessing	Single-threaded (parallel pkg)	Native async, parallel iterators
Performance	Interpreted (NumPy for speed)	Interpreted (Rcpp for speed)	Compiled, native speed
Null handling	`None`	`NA` / `NULL`	`nil` with `??` operator
Error handling	try/except	tryCatch	`try/catch` + `error()`

Reading a FASTQ File

Python (BioPython)

from itertools import islice

from Bio import SeqIO

# Stream the file and stop after five records instead of loading every read into memory
for record in islice(SeqIO.parse("reads.fastq", "fastq"), 5):
    quals = record.letter_annotations["phred_quality"]
    mean_q = sum(quals) / len(quals)
    print(f"{record.id}: mean Q={mean_q:.1f}, len={len(record.seq)}")

R (ShortRead)

library(ShortRead)

reads <- readFastq("reads.fastq")
quals <- quality(reads)
avg_qual <- rowMeans(as(quals, "matrix"), na.rm = TRUE)  # shorter reads are NA-padded
data.frame(
  id = as.character(id(reads))[1:5],
  mean_q = round(avg_qual[1:5], 1),
  len = width(reads)[1:5]
)

BioLang

read_fastq("reads.fastq")
  |> take(5)
  |> map(|r| {
    let mean_q = mean_phred(r.quality)
    print("{r.id}: mean Q={mean_q:.1}, len={r.length}")
  })

GC Content Calculation

Python

from Bio.SeqUtils import gc_fraction

seq = "ATCGATCGATCGATCG"
gc = gc_fraction(seq)
print(f"GC content: {gc:.3f}")

R

library(Biostrings)

seq <- DNAString("ATCGATCGATCGATCG")
gc <- letterFrequency(seq, letters = c("G", "C"), as.prob = TRUE)
cat(sprintf("GC content: %.3f\n", sum(gc)))

BioLang

dna"ATCGATCGATCGATCG" |> gc_content() |> print()
# 0.500

Filtering and Transforming Data

Python (pandas)

import pandas as pd

df = pd.read_csv("samples.csv")
result = (
    df[df["reads"] >= 1_000_000]
    .assign(million_reads=lambda x: x["reads"] / 1e6)
    .sort_values("reads", ascending=False)
    [["sample_id", "million_reads"]]
)
print(result.to_string(index=False))

R (dplyr)

library(dplyr)

read.csv("samples.csv") |>
  filter(reads >= 1e6) |>
  mutate(million_reads = reads / 1e6) |>
  arrange(desc(reads)) |>
  select(sample_id, million_reads) |>
  print()

BioLang

read_csv("samples.csv")
  |> filter(|row| row["reads"] >= 1_000_000)
  |> mutate("million_reads", |row| row["reads"] / 1_000_000.0)
  |> select("sample_id", "million_reads")
  |> arrange("-million_reads")
  |> print()

Reverse Complement

Python

from Bio.Seq import Seq

seq = Seq("ATCGATCG")
print(seq.reverse_complement())  # CGATCGAT

R

library(Biostrings)
reverseComplement(DNAString("ATCGATCG"))
# 8-letter DNAString object
# seq: CGATCGAT

BioLang

dna"ATCGATCG" |> reverse_complement() |> print()
# CGATCGAT

Statistical Summary

Python (NumPy)

import numpy as np

data = [38.5, 40.2, 35.1, 42.0, 37.8, 39.1, 41.3]
print(f"Mean:   {np.mean(data):.2f}")
print(f"Median: {np.median(data):.2f}")
print(f"Std:    {np.std(data):.2f}")
print(f"Min:    {np.min(data):.2f}")
print(f"Max:    {np.max(data):.2f}")
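
One detail to check when comparing numbers across languages: `np.std` defaults to the population standard deviation (`ddof=0`), whereas R's `sd()` computes the sample form (dividing by n - 1). The standard-library sketch below shows how far the two diverge on the same seven values; it is an illustration of the two conventions, not a statement about which one any particular tool reports.

```python
import statistics

data = [38.5, 40.2, 35.1, 42.0, 37.8, 39.1, 41.3]

# Population standard deviation: divides by n (same as np.std(data))
population_sd = statistics.pstdev(data)

# Sample standard deviation: divides by n - 1 (same as np.std(data, ddof=1) and R's sd())
sample_sd = statistics.stdev(data)

print(f"Population std: {population_sd:.2f}")  # 2.15
print(f"Sample std:     {sample_sd:.2f}")      # 2.33
```

With small samples like this one the gap is visible in the second decimal place, so it is worth confirming the convention before comparing outputs.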

R

data <- c(38.5, 40.2, 35.1, 42.0, 37.8, 39.1, 41.3)
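summary(data)  # min, quartiles, median, mean, max
sd(data)       # sample standard deviation, which summary() does not report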

BioLang

let data = [38.5, 40.2, 35.1, 42.0, 37.8, 39.1, 41.3]
data |> describe() |> print()
# { mean: 39.14, median: 39.10, std_dev: 2.33, min: 35.10, max: 42.00, count: 7 }

K-mer Counting

Python

from collections import Counter

seq = "ATCGATCGATCG"
k = 3
kmers = [seq[i:i+k] for i in range(len(seq) - k + 1)]
counts = Counter(kmers)
for kmer, count in counts.most_common(5):
    print(f"{kmer}: {count}")

R

library(Biostrings)

seq <- DNAString("ATCGATCGATCG")
freqs <- oligonucleotideFrequency(seq, width = 3)
head(sort(freqs, decreasing = TRUE), 5)

BioLang

let counts = kmer_count(dna"ATCGATCGATCG", 3)
counts
  |> sort(|a, b| b["count"] - a["count"])
  |> take(5)
  |> map(|kv| print("{kv['kmer']}: {kv['count']}"))

Writing Output Files

Python

import json

results = {"total_reads": 1500000, "pass_rate": 0.95, "mean_quality": 35.2}
with open("results.json", "w") as f:
    json.dump(results, f, indent=2)

R

library(jsonlite)

results <- list(total_reads = 1500000, pass_rate = 0.95, mean_quality = 35.2)
write_json(results, "results.json", pretty = TRUE)

BioLang

let results = {
  "total_reads": 1_500_000,
  "pass_rate": 0.95,
  "mean_quality": 35.2
}
write_text("results.json", results)

Pattern Matching

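BioLang's match expression can replace chains of if/elif/else in Python or switch() calls in R. For simple threshold checks like the one below, an if/else if chain does the same job: it is an expression, so each branch yields the function's result without an explicit return.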

Python

def classify_quality(q):
    if q >= 35:
        return "excellent"
    elif q >= 25:
        return "good"
    elif q >= 20:
        return "acceptable"
    else:
        return "poor"

BioLang

fn classify_quality(q: Float) -> String {
  if q >= 35.0 {
    "excellent"
  } else if q >= 25.0 {
    "good"
  } else if q >= 20.0 {
    "acceptable"
  } else {
    "poor"
  }
}

Parallel Processing

Python

from multiprocessing import Pool

def process_file(path):
    # ... analysis logic ...
    return result

if __name__ == "__main__":  # guard needed where workers are spawned (Windows, macOS)
    with Pool(4) as pool:
        results = pool.map(process_file, file_list)
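
The Python snippet above leaves the analysis logic and `file_list` undefined. A self-contained variant might look like the following sketch, which uses `concurrent.futures` from the standard library rather than `multiprocessing.Pool`. The `count_records` and `make_fastq` functions and the temporary FASTQ files are hypothetical stand-ins for real analysis code and input data. Threads are used because the work here is I/O-bound; for CPU-bound analysis, `ProcessPoolExecutor` would be the closer match to `Pool`.

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def count_records(path: Path) -> int:
    """Hypothetical analysis step: count FASTQ records (4 lines each)."""
    with open(path) as handle:
        return sum(1 for _ in handle) // 4


def make_fastq(path: Path, n_records: int) -> None:
    """Write a tiny synthetic FASTQ file so the example runs anywhere."""
    record = "@read\nACGT\n+\nIIII\n"
    path.write_text(record * n_records)


with tempfile.TemporaryDirectory() as tmp:
    file_list = [Path(tmp) / f"sample_{i}.fastq" for i in range(1, 4)]
    for i, path in enumerate(file_list, start=1):
        make_fastq(path, n_records=i * 10)

    # executor.map preserves input order, like Pool.map
    with ThreadPoolExecutor(max_workers=4) as executor:
        results = list(executor.map(count_records, file_list))

print(results)  # [10, 20, 30]
```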

R

library(parallel)

results <- mclapply(file_list, process_file, mc.cores = 4)

BioLang

let results = file_list
  |> map(|path| {
    read_fastq(path) |> map(|r| analyze(r))
  })

Migration Tips

Start with the REPL. Port small snippets one at a time. The REPL's :type command helps when you are unsure what types BioLang infers.
Think in pipes. If you write Python method chains or R pipe chains, BioLang pipes will feel natural. The key difference from Python is that a BioLang pipe passes the value as the first argument of the next function rather than calling a method on a self receiver.
Use sequence literals. Stop treating DNA as plain strings. The dna"..." type catches errors at compile time and provides domain-specific functions like reverse_complement() and gc_content().
Leverage type inference. You rarely need type annotations. BioLang infers types through the entire pipe chain and will tell you at compile time if something does not match.
No package ecosystem needed. BioLang's standard library includes FASTQ/FASTA/BAM parsing, statistics, CSV/JSON/TSV I/O, and HTTP. You do not need to install external packages for common bioinformatics tasks.
Think in data flow. Pipes naturally encourage a clean data flow style where each step transforms data and passes it to the next.
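
The first-argument rule from the "Think in pipes" tip can be mimicked in Python to build intuition. The `pipe`, `take`, and `gc_content` helpers below are hypothetical illustrations written for this sketch, not part of BioLang or of any Python library: each step is a function plus optional extra arguments, and the running value is always inserted as the first argument.

```python
from functools import reduce


def pipe(value, *steps):
    """Thread `value` through `steps`, passing it as the first argument of each.

    A step is either a callable or a tuple of (callable, extra_arg, ...).
    """
    def apply(current, step):
        if callable(step):
            return step(current)
        func, *extra = step
        return func(current, *extra)

    return reduce(apply, steps, value)


def take(items, n):
    """Return the first n items."""
    return list(items)[:n]


def gc_content(seq):
    """Fraction of G and C bases in a plain-string sequence."""
    return (seq.count("G") + seq.count("C")) / len(seq)


reads = ["ATCG", "GGCC", "ATAT", "GCGC"]

# Roughly the shape of: reads |> take(3) |> map(|s| gc_content(s))
result = pipe(
    reads,
    (take, 3),
    lambda xs: [gc_content(x) for x in xs],
)
print(result)  # [0.5, 1.0, 0.0]
```

Reading the steps top to bottom mirrors the order in which the data is transformed, which is the main ergonomic point of a pipe.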

Common Gotchas

No implicit type coercion. In Python, 1 + 1.0 silently promotes the integer; BioLang requires an explicit conversion such as float(1) + 1.0. The compiler error message tells you exactly what conversion is needed.
Indexing is zero-based. Like Python, unlike R. list[0] is the first element.
No NULL/NA propagation. BioLang uses nil for absent values, and the ?? operator supplies a default, so a missing value does not propagate silently through a computation the way NA can in R.
String formatting uses {} not % or f"". BioLang strings support interpolation with curly braces: "value: {x}".
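
For Python users, the closest everyday analogue to `nil` with `??` is an explicit `None` check. The sketch below assumes `??` behaves like the null-coalescing operator in other languages, using the right-hand side only when the left-hand side is absent; the `coalesce` helper is hypothetical. It also shows why Python's `or` is not a safe substitute: `or` replaces every falsy value, including a legitimate `0`.

```python
def coalesce(value, default):
    """Return `default` only when `value` is None (absent)."""
    return default if value is None else value


sample = {"sample_id": "S1", "reads": 0, "lane": None}

# A real zero is preserved; only the missing value gets the default.
reads = coalesce(sample["reads"], -1)
lane = coalesce(sample["lane"], "unknown")

# `or` treats 0 as missing, which silently changes the data.
reads_with_or = sample["reads"] or -1

print(reads, lane, reads_with_or)  # 0 unknown -1
```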

Next Steps

You now have a solid map between Python/R and BioLang. Explore the Language Reference for the full specification, or dive into the Standard Library documentation to see all available modules and functions.