From Python/R

If you are coming from Python (BioPython, pandas) or R (Bioconductor, tidyverse), this guide maps familiar patterns to their BioLang equivalents. BioLang is not a replacement for Python or R, but for bioinformatics-specific pipelines it offers a more concise syntax, built-in sequence types, and native performance without a runtime dependency.

Key Differences at a Glance

Feature          Python                         R                               BioLang
Type system      Dynamic, duck typing           Dynamic, vector-first           Static inference, gradual typing
Pipe operator    None (method chains)           |> (R 4.1+) / %>%               |> (first-class)
Sequence types   Seq object (BioPython)         DNAStringSet                    dna"..." literal
Package manager  pip / conda                    install.packages / BiocManager  Built-in stdlib, no package manager needed
Concurrency      GIL, multiprocessing           Single-threaded (parallel pkg)  Native async, parallel iterators
Performance      Interpreted (NumPy for speed)  Interpreted (Rcpp for speed)    Compiled, native speed
Null handling    None                           NA / NULL                       None with ?? operator
Error handling   try/except                     tryCatch                        try/catch + Result type

Reading a FASTQ File

Python (BioPython)

from Bio import SeqIO

records = list(SeqIO.parse("reads.fastq", "fastq"))
for record in records[:5]:
    quals = record.letter_annotations["phred_quality"]
    mean_q = sum(quals) / len(quals)
    print(f"{record.id}: mean Q={mean_q:.1f}, len={len(record.seq)}")

R (ShortRead)

library(ShortRead)

reads <- readFastq("reads.fastq")
quals <- quality(reads)
# na.rm = TRUE skips the NA padding that appears when reads differ in length
avg_qual <- rowMeans(as(quals, "matrix"), na.rm = TRUE)
data.frame(
  id = as.character(id(reads))[1:5],
  mean_q = round(avg_qual[1:5], 1),
  len = width(reads)[1:5]
)

BioLang

read_fastq("reads.fastq")
  |> take(5)
  |> map(|r| {
    let mean_q = mean_phred(r.quality)
    print("{r.id}: mean Q={mean_q:.1}, len={r.length}")
  })
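
For readers curious about what a helper such as mean_phred has to compute, here is a minimal pure-Python sketch. It assumes Phred+33 encoding (the Sanger / Illumina 1.8+ convention), and mean_phred_from_string is a hypothetical name chosen for illustration; it is not BioLang's implementation.

```python
def mean_phred_from_string(quality: str, offset: int = 33) -> float:
    """Return the mean Phred score of an ASCII-encoded quality string.

    Hypothetical helper for illustration; assumes Phred+33 by default.
    """
    if not quality:
        raise ValueError("quality string is empty")
    # Each character's code point minus the offset is one Phred score.
    scores = [ord(ch) - offset for ch in quality]
    return sum(scores) / len(scores)


# Under Phred+33, 'I' encodes Q40 and '5' encodes Q20.
example = mean_phred_from_string("II55")
print(f"mean Q={example:.1f}")
```

Older Illumina pipelines used an offset of 64, so the offset is left as a parameter rather than hard-coded.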

GC Content Calculation

Python

from Bio.SeqUtils import gc_fraction

seq = "ATCGATCGATCGATCG"
gc = gc_fraction(seq)
print(f"GC content: {gc:.3f}")

R

library(Biostrings)

seq <- DNAString("ATCGATCGATCGATCG")
gc <- letterFrequency(seq, letters = c("G", "C"), as.prob = TRUE)
cat(sprintf("GC content: %.3f\n", sum(gc)))

BioLang

dna"ATCGATCGATCGATCG" |> gc_content() |> print()
# 0.500

Filtering and Transforming Data

Python (pandas)

import pandas as pd

df = pd.read_csv("samples.csv")
result = (
    df[df["reads"] >= 1_000_000]
    .assign(million_reads=lambda x: x["reads"] / 1e6)
    .sort_values("reads", ascending=False)
    [["sample_id", "million_reads"]]
)
print(result.to_string(index=False))

R (dplyr)

library(dplyr)

read.csv("samples.csv") |>
  filter(reads >= 1e6) |>
  mutate(million_reads = reads / 1e6) |>
  arrange(desc(reads)) |>
  select(sample_id, million_reads) |>
  print()

BioLang

csv("samples.csv")
  |> filter(|row| row["reads"] >= 1_000_000)
  |> mutate(|row| { "million_reads": row["reads"] / 1_000_000.0 })
  |> select(["sample_id", "million_reads"])
  |> arrange("million_reads", descending: true)
  |> print()

Reverse Complement

Python

from Bio.Seq import Seq

seq = Seq("ATCGATCG")
print(seq.reverse_complement())  # CGATCGAT

R

library(Biostrings)
reverseComplement(DNAString("ATCGATCG"))
# 8-letter DNAString object
# seq: CGATCGAT

BioLang

dna"ATCGATCG" |> reverse_complement() |> print()
# CGATCGAT
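
To make the operation itself concrete, the following is a small sketch of reverse complement on a plain Python string, without BioPython. It is only an illustration of the technique (complement each base, then reverse); the table covers A, C, G, T, and N in both cases and is an assumption of this sketch, not a statement about what any of the libraries above accept.

```python
# Translation table mapping each base to its complement (N stays N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the string."""
    return seq.translate(_COMPLEMENT)[::-1]


print(reverse_complement("ATCGATCG"))  # CGATCGAT
```

Because this works on plain strings, a character outside the table (for example "X") passes through unchanged instead of raising an error, which is the kind of silent mistake a dedicated sequence type is meant to prevent.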

Statistical Summary

Python (NumPy)

import numpy as np

data = [38.5, 40.2, 35.1, 42.0, 37.8, 39.1, 41.3]
print(f"Mean:   {np.mean(data):.2f}")
print(f"Median: {np.median(data):.2f}")
print(f"Std:    {np.std(data):.2f}")  # population std (ddof=0); pass ddof=1 for the sample std
print(f"Min:    {np.min(data):.2f}")
print(f"Max:    {np.max(data):.2f}")

R

data <- c(38.5, 40.2, 35.1, 42.0, 37.8, 39.1, 41.3)
summary(data)
sd(data)  # summary() omits the standard deviation; sd() uses the sample (n - 1) formula

BioLang

let data = [38.5, 40.2, 35.1, 42.0, 37.8, 39.1, 41.3]
data |> describe() |> print()
# { mean: 39.14, median: 39.10, std_dev: 2.33, min: 35.10, max: 42.00, count: 7 }
# std_dev above assumes the sample (n - 1) formula; the population value for this data is 2.15
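
One thing to check when porting summary statistics is which standard deviation convention each tool uses: np.std defaults to the population formula (divide by n), while R's sd() uses the sample formula (divide by n - 1). Which convention describe() follows is not stated in this guide, so the sketch below simply computes both for the same data using Python's standard library.

```python
import statistics

data = [38.5, 40.2, 35.1, 42.0, 37.8, 39.1, 41.3]

population_sd = statistics.pstdev(data)  # divides by n, like np.std(data)
sample_sd = statistics.stdev(data)       # divides by n - 1, like R's sd(data)

print(f"population: {population_sd:.2f}")  # population: 2.15
print(f"sample:     {sample_sd:.2f}")      # sample:     2.33
```

With only seven values the two results differ noticeably, so it is worth confirming the convention before comparing numbers across languages.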

K-mer Counting

Python

from collections import Counter

seq = "ATCGATCGATCG"
k = 3
kmers = [seq[i:i+k] for i in range(len(seq) - k + 1)]
counts = Counter(kmers)
for kmer, count in counts.most_common(5):
    print(f"{kmer}: {count}")

R

library(Biostrings)

seq <- DNAString("ATCGATCGATCG")
freqs <- oligonucleotideFrequency(seq, width = 3)
head(sort(freqs, decreasing = TRUE), 5)

BioLang

let counts = kmer_count(dna"ATCGATCGATCG", 3)
counts
  |> sort(|a, b| b["count"] - a["count"])
  |> take(5)
  |> map(|kv| print("{kv['kmer']}: {kv['count']}"))

Writing Output Files

Python

import json

results = {"total_reads": 1500000, "pass_rate": 0.95, "mean_quality": 35.2}
with open("results.json", "w") as f:
    json.dump(results, f, indent=2)

R

library(jsonlite)

results <- list(total_reads = 1500000, pass_rate = 0.95, mean_quality = 35.2)
write_json(results, "results.json", pretty = TRUE)

BioLang

let results = {
  "total_reads": 1_500_000,
  "pass_rate": 0.95,
  "mean_quality": 35.2
}
write_text("results.json", results)

Pattern Matching

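BioLang's match expression can replace chains of if/elif/else in Python or switch() calls in R. For a simple threshold ladder like the one below, an if/else chain works just as well. Note that in BioLang the chain is an expression, so the last value in each branch becomes the function's result without an explicit return: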

Python

def classify_quality(q):
    if q >= 35:
        return "excellent"
    elif q >= 25:
        return "good"
    elif q >= 20:
        return "acceptable"
    else:
        return "poor"

BioLang

fn classify_quality(q: Float) -> String {
  if q >= 35.0 {
    "excellent"
  } else if q >= 25.0 {
    "good"
  } else if q >= 20.0 {
    "acceptable"
  } else {
    "poor"
  }
}

Parallel Processing

Python

from multiprocessing import Pool

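def process_file(path):
    # ... analysis logic ...
    return result

# The guard keeps worker processes from re-running this block on platforms
# that start workers with "spawn" (Windows, macOS).
if __name__ == "__main__":
    with Pool(4) as pool:
        results = pool.map(process_file, file_list)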

R

library(parallel)

results <- mclapply(file_list, process_file, mc.cores = 4)

BioLang

let results = file_list
  |> map(|path| {
    read_fastq(path) |> map(|r| analyze(r))
  })

Migration Tips

  1. Start with the REPL. Port small snippets one at a time. The REPL's :type command helps when you are unsure what types BioLang infers.
  2. Think in pipes. If you write Python method chains or R pipe chains, BioLang pipes will feel natural. The key difference from Python is that a BioLang pipe passes the value as the first argument of the next function, rather than calling a method on a receiver object (Python's self).
  3. Use sequence literals. Stop treating DNA as plain strings. The dna"..." type catches errors at compile time and provides domain-specific functions like reverse_complement() and gc_content().
  4. Leverage type inference. You rarely need type annotations. BioLang infers types through the entire pipe chain and will tell you at compile time if something does not match.
  5. No package ecosystem needed. BioLang's standard library includes FASTQ/FASTA/BAM parsing, statistics, CSV/JSON/TSV I/O, and HTTP. You do not need to install external packages for common bioinformatics tasks.
  6. Think in data flow. Pipes naturally encourage a clean data flow style where each step transforms data and passes it to the next.
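
If it helps to see the pipe idea from tip 2 in familiar terms, the sketch below emulates it in Python. The pipe helper is hypothetical (Python has no pipe operator); it only illustrates how a value is handed to each function in turn, which is what a chain such as value |> f |> g expresses.

```python
from functools import reduce
from typing import Any, Callable


def pipe(value: Any, *steps: Callable[[Any], Any]) -> Any:
    """Feed value through each step in order, left to right."""
    return reduce(lambda acc, step: step(acc), steps, value)


reads = [1_500_000, 800_000, 2_300_000]

result = pipe(
    reads,
    lambda xs: [x for x in xs if x >= 1_000_000],  # filter
    lambda xs: [x / 1e6 for x in xs],              # transform to millions
    lambda xs: sorted(xs, reverse=True),           # sort descending
)
print(result)
```

Each step is an ordinary function that takes the previous result as its argument, so steps can be reordered or reused without belonging to any particular class.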

Common Gotchas

  • No implicit type coercion. Unlike Python, BioLang does not promote the integer in 1 + 1.0 automatically; write float(1) + 1.0 instead. The compiler error message tells you exactly what conversion is needed.
  • Indexing is zero-based. Like Python, unlike R. list[0] is the first element.
  • No NULL/NA propagation. BioLang uses Option<T>: a value is either Some(value) or None, and the ?? operator supplies a default. Missing values do not spread silently through a calculation the way NA does in R.
  • String formatting uses {} not % or f"". BioLang strings support interpolation with curly braces: "value: {x}".
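
For Python users, the closest everyday analogue to a default-supplying operator like ?? is an explicit None check. The sketch below uses a hypothetical coalesce helper to show the intended behaviour; it is an illustration of the idea, not a description of how BioLang implements Option<T>.

```python
from typing import Optional, TypeVar

T = TypeVar("T")


def coalesce(value: Optional[T], default: T) -> T:
    """Return value unless it is None, in which case return default."""
    return default if value is None else value


sample = {"sample_id": "S1", "reads": None}

reads = coalesce(sample["reads"], 0)
print(reads)  # 0

# 0 and "" are real values, not missing ones, so they are kept as-is.
# Python's `or` would wrongly replace them with the default.
print(coalesce(0, 99))  # 0
```

The distinction between "missing" and "falsy" is the main reason to prefer an explicit None check over `value or default` when porting code in either direction.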

Next Steps

You now have a solid map between Python/R and BioLang. Explore the Language Reference for the full specification, or dive into the Standard Library documentation to see all available modules and functions.