Pipes
The pipe operator |> is the defining feature of BioLang. It passes the
value on the left as the first argument to the function on the right,
enabling clean, readable data transformation chains that mirror the conceptual flow of
bioinformatics pipelines.
Basic Syntax
The expression a |> f(b, c) is syntactic sugar for f(a, b, c).
The left-hand value is always inserted as the first argument:
# These two lines are equivalent:
let result = upper("hello")
let result = "hello" |> upper()
# With additional arguments:
let result = slice("hello world", 0, 5)
let result = "hello world" |> slice(0, 5)
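The rewrite rule fits in a few lines of Python. This sketch models the semantics only and is not BioLang's implementation. `pipe`, `upper`, and `slice_` are hypothetical helpers, and `slice_` assumes a half-open `[start, end)` range.

```python
def pipe(value, func, *args, **kwargs):
    """Model of `value |> func(args...)`: value becomes the first argument."""
    return func(value, *args, **kwargs)


def upper(s: str) -> str:
    return s.upper()


def slice_(s: str, start: int, end: int) -> str:
    # Assumed half-open range [start, end)
    return s[start:end]


# upper("hello")  vs  "hello" |> upper()
direct = upper("hello")
piped = pipe("hello", upper)

# slice("hello world", 0, 5)  vs  "hello world" |> slice(0, 5)
direct_slice = slice_("hello world", 0, 5)
piped_slice = pipe("hello world", slice_, 0, 5)

print(direct, piped)              # HELLO HELLO
print(direct_slice, piped_slice)  # hello hello
```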
Chaining Pipes
The real power of pipes emerges when chaining multiple transformations. Each step feeds its result into the next:
# Read a FASTA file, keep sequences of at least 1000 bp, compute GC content, sort descending
let gc_values = read_fasta("genome.fa")
|> filter(|seq| seq_len(seq) >= 1000)
|> map(|seq| gc_content(seq))
|> sort()
|> reverse()
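A chain applies the same rule repeatedly, from left to right. The Python sketch below models the FASTA example with `functools.reduce`. The in-memory sequences stand in for `read_fasta("genome.fa")`, and `chain` and this `gc_content` are illustrative stand-ins, not BioLang built-ins.

```python
from functools import reduce


def chain(value, *stages):
    """Model of `value |> s1 |> s2 ...`: feed each result into the next stage."""
    return reduce(lambda acc, stage: stage(acc), stages, value)


def gc_content(seq: str) -> float:
    # Fraction of G/C bases; 0.0 for an empty sequence
    return sum(b in "GCgc" for b in seq) / len(seq) if seq else 0.0


# Toy sequences standing in for read_fasta("genome.fa")
seqs = ["GCGC" * 300, "ATAT" * 300, "GGAT" * 100, "GCAT" * 400]

gc_values = chain(
    seqs,
    lambda xs: [s for s in xs if len(s) >= 1000],  # filter by length
    lambda xs: [gc_content(s) for s in xs],        # map to GC content
    sorted,                                        # sort()
    lambda xs: list(reversed(xs)),                 # reverse()
)
print(gc_values)  # [1.0, 0.5, 0.0]
```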
# Variant calling pipeline
let variants = read_bam("sample.bam")
|> filter(|r| r.mapq >= 30)
|> pileup("hg38.fa")
|> call_variants(min_depth = 10)
|> filter(|v| v.qual >= 20.0)
|> annotate("clinvar.vcf")
Pipes with Lambdas
Lambdas are frequently used with pipes to define inline transformation logic:
# Transform sample names
let clean_names = sample_names
|> map(|s| s |> lower() |> trim())
|> filter(|s| len(s) > 0)
|> unique()
# Nested pipes inside a lambda
let results = genes
|> map(|gene| {
let seq = fetch_sequence(gene)
seq
|> transcribe()
|> translate()
|> len()
})
Pipe-Friendly Function Design
BioLang's built-in functions are designed pipe-first: the primary data argument is always the first parameter. If you follow the same convention in your own functions, they will also work naturally with pipes:
# The data argument comes first
fn normalize(values: List[Float], method: String) -> List[Float] {
match method {
"zscore" => {
let mu = mean(values)
let sd = stdev(values) # assumes sd != 0 (values not all identical)
values |> map(|x| (x - mu) / sd)
},
"minmax" => {
let lo = min(values)
let hi = max(values)
values |> map(|x| (x - lo) / (hi - lo)) # assumes hi != lo
},
_ => values
}
}
# Now it works naturally with pipes:
let normed = raw_scores |> normalize("zscore")
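For comparison, here is the same pipe-first `normalize` as a Python sketch. Unlike the BioLang version, it guards against zero spread, which would otherwise cause a division by zero when every value is identical. It uses `statistics.stdev`, which computes the sample standard deviation and needs at least two values. Whether BioLang's `stdev` uses the sample or the population formula is an assumption here.

```python
from statistics import mean, stdev


def normalize(values: list[float], method: str) -> list[float]:
    """Pipe-first: the data argument comes before the options."""
    if method == "zscore":
        mu, sd = mean(values), stdev(values)  # sample SD, needs >= 2 values
        if sd == 0:
            return [0.0] * len(values)        # all values identical
        return [(x - mu) / sd for x in values]
    if method == "minmax":
        lo, hi = min(values), max(values)
        if hi == lo:
            return [0.0] * len(values)        # avoid dividing by zero
        return [(x - lo) / (hi - lo) for x in values]
    return list(values)                       # unknown method, like `_ => values`


raw_scores = [2.0, 4.0, 6.0]
print(normalize(raw_scores, "zscore"))  # [-1.0, 0.0, 1.0]
print(normalize(raw_scores, "minmax"))  # [0.0, 0.5, 1.0]
```

Returning zeros for constant input is one design choice among several. Raising an error is equally defensible if a constant column signals bad data.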
Partial Application
You can create partially applied functions to use in pipe chains. The underscore
_ marks the argument slot that is filled when the resulting function is called, for example with the piped value:
# Using partial application with _
let add_ten = _ + 10
let scores = [1, 2, 3] |> map(add_ten) # [11, 12, 13]
# Useful for reusable pipeline stages
let high_quality = filter(_, |r| mean_phred(r.quality) >= 30)
let long_reads = filter(_, |r| r.length >= 150)
reads |> high_quality |> long_reads
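One way to model the `_` placeholder is a sentinel object that marks the open slot. In this Python sketch, `HOLE` stands in for `_`, and `partial` and `filter_` are hypothetical helpers. The `mean_q` field stands in for `mean_phred(r.quality)`.

```python
import operator

HOLE = object()  # sentinel standing in for BioLang's `_`


def partial(func, *args):
    """Return a one-argument function that fills every HOLE slot with its argument."""
    def applied(value):
        return func(*[value if a is HOLE else a for a in args])
    return applied


def filter_(items, pred):
    return [x for x in items if pred(x)]


# let add_ten = _ + 10
add_ten = partial(operator.add, HOLE, 10)
scores = list(map(add_ten, [1, 2, 3]))  # [11, 12, 13]

# Reusable pipeline stages
high_quality = partial(filter_, HOLE, lambda r: r["mean_q"] >= 30)
long_reads = partial(filter_, HOLE, lambda r: r["length"] >= 150)

reads = [
    {"id": "r1", "mean_q": 35, "length": 150},
    {"id": "r2", "mean_q": 25, "length": 200},
    {"id": "r3", "mean_q": 32, "length": 100},
]
# reads |> high_quality |> long_reads
kept = long_reads(high_quality(reads))
print([r["id"] for r in kept])  # ['r1']
```

Python's `functools.partial` binds positional arguments from the left. A sentinel is therefore a simple way to leave the first slot open while fixing the later ones.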
Operator Precedence
Arithmetic binds more tightly than the pipe, so an arithmetic expression on the left of |> is fully evaluated before piping. A comparison after a pipe call applies to the call's result:
# Arithmetic is evaluated before piping
let result = 2 + 3 |> str() # str(5), not 2 + str(3)
# The pipe call is evaluated before the comparison
let x = items |> len() > 0 # len(items) > 0
# Use parentheses when you need different grouping
let y = 1 + (items |> len()) # 1 + len(items), not len(1 + items)
Multiline Pipes
BioLang does not treat a line break next to the |> operator as the end of a statement, so pipe chains can
span multiple lines without any special continuation syntax. By convention, each stage starts a new line with |>:
# Each pipe stage on its own line for readability
let summary = csv("experiment.csv")
|> filter(|row| row.p_value < 0.05)
|> select("gene", "log2fc", "p_value")
|> arrange(desc(log2fc))
|> head(20)
|> str()
Method-Style Calls
For functions that take a single argument (the piped value), you can omit the parentheses entirely:
# With parentheses (always works)
let rc = dna"ATCG" |> reverse_complement()
# Without parentheses (zero-arg pipe target)
let rc = dna"ATCG" |> reverse_complement
# Chained
let result = dna"ATCGATCG"
|> reverse_complement
|> gc_content
|> str
Pipes with Tables
Table operations are designed specifically for pipe chains, following the select-filter-arrange-summarize pattern:
let report = csv("variants.csv")
|> select("gene", "variant", "impact", "frequency")
|> filter(|r| r.impact == "HIGH")
|> arrange(desc(frequency))
|> group_by("gene")
|> summarize(
variant_count = n(),
max_freq = max(frequency)
)
|> filter(|r| r.variant_count >= 3)
|> arrange(desc(variant_count))
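The Python sketch below shows what `group_by` followed by `summarize` computes on a few made-up rows. It assumes dplyr-like semantics, with one output row per group. For brevity, the sketch omits `select` and the first `arrange` and keeps only the filter, grouping, summary, and final ordering steps.

```python
from collections import defaultdict

# Made-up variant rows: (gene, variant, impact, frequency)
raw = [
    ("TP53", "v1", "HIGH", 0.10), ("TP53", "v2", "HIGH", 0.30),
    ("TP53", "v3", "HIGH", 0.05), ("TP53", "v4", "HIGH", 0.20),
    ("BRCA1", "v5", "HIGH", 0.02), ("BRCA1", "v6", "HIGH", 0.08),
    ("BRCA1", "v7", "HIGH", 0.04), ("BRCA1", "v8", "LOW", 0.50),
    ("EGFR", "v9", "HIGH", 0.15), ("EGFR", "v10", "HIGH", 0.12),
]
rows = [dict(zip(("gene", "variant", "impact", "frequency"), t)) for t in raw]

# filter(|r| r.impact == "HIGH") then group_by("gene")
groups = defaultdict(list)
for r in rows:
    if r["impact"] == "HIGH":
        groups[r["gene"]].append(r)

# summarize(variant_count = n(), max_freq = max(frequency))
summary = [
    {"gene": g, "variant_count": len(rs), "max_freq": max(r["frequency"] for r in rs)}
    for g, rs in groups.items()
]

# filter(|r| r.variant_count >= 3) then arrange(desc(variant_count))
report = sorted(
    (s for s in summary if s["variant_count"] >= 3),
    key=lambda s: s["variant_count"],
    reverse=True,
)
for s in report:
    print(s)
```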
Pipes with Streams
Pipes compose naturally with lazy streams. The stream is only consumed when a terminal
operation (like collect or write) is reached:
# Nothing happens until collect() — all lazy
let results = stream_fastq("huge.fq.gz")
|> filter(|r| mean_phred(r.quality) >= 30)
|> filter(|r| r.length >= 50)
|> take(10_000)
|> collect()
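Python generators are lazy in the same way, so they make a convenient model. In this sketch, `stream_reads` is a hypothetical stand-in for `stream_fastq`, and `mean_q` stands in for `mean_phred(r.quality)`. `itertools.islice` plays the role of `take`, and `list` plays the role of `collect`. The counter shows two things: nothing is read until the terminal step, and reading stops once enough records have passed the filters.

```python
from itertools import islice

pulled = {"count": 0}  # how many records the source has produced


def stream_reads(n):
    """Hypothetical lazy source: yields one synthetic record at a time."""
    for i in range(n):
        pulled["count"] += 1
        yield {"id": f"r{i}", "mean_q": 20 + i % 20, "length": 40 + i % 30}


# Building the chain does no work yet
passing = (r for r in stream_reads(1_000_000) if r["mean_q"] >= 30)
passing = (r for r in passing if r["length"] >= 50)
first_ten = islice(passing, 10)
assert pulled["count"] == 0

# The terminal step drives the whole chain
results = list(first_ten)
print(len(results), pulled["count"])  # 10 20
```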
Debugging Pipes
Use tap to inspect intermediate values without breaking the chain:
let result = data
|> filter(|r| r.score > 0.5)
|> tap(|d| print(f"After filter: {len(d)} rows"))
|> arrange(desc(score))
|> tap(|d| print(f"Top score: {d[0].score}"))
|> head(10)
Tap Pipe
The tap-pipe operator |>> evaluates a side effect without
breaking the data flow. The original value passes through unchanged — useful
for logging, debugging, or writing intermediate results:
# Log intermediate values without breaking the chain
let result = read_fastq("sample.fq")
|> filter(|r| mean_phred(r.quality) > 20)
|>> print(f"After QC: {len(_)} reads")
|> map(|r| gc_content(r.seq))
|>> write_csv(_, "gc_values.csv")
|> mean()
# _ binds to the piped value inside tap
let data = [3, 1, 4, 1, 5]
|> sort()
|>> print(f"Sorted: {_}")
|> reverse()
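Both `tap(...)` and `|>>` follow the same contract: run a side effect, discard its result, and pass the original value on. The Python sketch below models that contract. The lambda's parameter plays the role of `_`, and a list captures the log line instead of printing it.

```python
def tap(value, side_effect):
    """Call side_effect(value) for its effect, ignore its result, return value."""
    side_effect(value)
    return value


log = []

# [3, 1, 4, 1, 5] |> sort() |>> print(f"Sorted: {_}") |> reverse()
data = sorted([3, 1, 4, 1, 5])
data = tap(data, lambda v: log.append(f"Sorted: {v}"))
data = list(reversed(data))

print(log)   # ['Sorted: [1, 1, 3, 4, 5]']
print(data)  # [5, 4, 3, 1, 1]
```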
Common Patterns
# ETL: Extract, Transform, Load
let cleaned = csv("raw_data.csv")
|> filter(|r| r.raw_score != None) # drop missing scores before dividing
|> mutate(score = |r| r.raw_score / 100.0)
|> select(new_col = "old_col", score = "score") # rename old_col, keep score
|> write_csv("cleaned.csv")
# Pipeline with error handling
let safe_result = raw_data
|> map(|item| try { process(item) } catch _ { None })
|> filter(|r| r != None)
|> map(|r| unwrap(r))
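The try-or-None pattern maps directly onto Python exception handling. In this sketch, `process` is a hypothetical step that fails on non-numeric input. The model does not wrap values in an option type, so the final `unwrap` step has no counterpart here.

```python
def try_or_none(func, item):
    """Models `try { func(item) } catch _ { None }`."""
    try:
        return func(item)
    except Exception:
        return None


def process(item: str) -> int:
    # Hypothetical step: raises ValueError for non-numeric strings
    return int(item) * 2


raw_data = ["1", "x", "3", ""]
safe_result = [r for r in (try_or_none(process, i) for i in raw_data) if r is not None]
print(safe_result)  # [2, 6]
```

One caveat applies to the pattern in general: a step that legitimately returns None cannot be told apart from a failure, so its result is dropped as well.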
# Forking a pipeline
let base = read_fastq("sample.fq") |> filter(|r| mean_phred(r.quality) >= 20)
let short = base |> filter(|r| r.length < 150) |> collect()
let long = base |> filter(|r| r.length >= 150) |> collect()
Uniform Function Call Syntax (UFCS)
When x.f(args) is called and x has no method named f, BioLang automatically rewrites it to f(x, args). This lets you use dot-call syntax with any function.
# These are equivalent:
seq |> gc_content()
gc_content(seq)
seq.gc_content()
# Chain built-in functions with dot syntax
let result = dna"ATCGATCG"
.reverse_complement()
.gc_content()
# Works with any function, including user-defined ones
fn double(x) { x * 2 }
let n = 5.double() # 10
# Mix dot and pipe styles
let mean_len = reads
.filter(|r| mean_phred(r.quality) > 20)
.map(|r| r.length)
|> mean()
UFCS is purely syntactic sugar. A native method always takes priority, and x.f(a, b) becomes f(x, a, b) only when x has no method named f.
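Python's `__getattr__` hook is a compact way to model the UFCS rule. Python calls it only when normal attribute lookup fails, which corresponds to the object having no method of that name. In this sketch, the `Value` wrapper and the `ns` dictionary of free functions are assumptions that stand in for BioLang's function scope. The `length` method shows that a native method wins over a free function of the same name.

```python
class Value:
    """Wraps a value; unknown method calls fall back to free functions (UFCS model)."""

    def __init__(self, inner, ns):
        self.inner = inner
        self._ns = ns

    def length(self):
        # A "native" method: takes priority over ns["length"]
        return len(self.inner)

    def __getattr__(self, name):
        # Reached only when normal lookup fails: rewrite x.f(args) to f(x, args)
        if name.startswith("_") or name not in self._ns:
            raise AttributeError(name)
        func = self._ns[name]
        return lambda *args: Value(func(self.inner, *args), self._ns)


def double(x):
    return x * 2


def gc_content(seq):
    return sum(b in "GC" for b in seq) / len(seq) if seq else 0.0


def reverse_complement(seq):
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


ns = {
    "double": double,
    "gc_content": gc_content,
    "reverse_complement": reverse_complement,
    "length": lambda x: -1,  # shadowed by the native method
}

n = Value(5, ns).double().inner                                      # 10
gc = Value("ATCGATCG", ns).reverse_complement().gc_content().inner   # 0.5
native = Value("ACGT", ns).length()                                  # 4
```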