Pipes & Transforms

The pipe operator |> is the defining feature of BioLang. It passes the value on the left as the first argument to the function on the right, enabling clean, readable data transformation chains.

Basic Syntax

The expression a |> f(b) is syntactic sugar for f(a, b). The left-hand value is always inserted as the first argument:

# These two lines are equivalent:
let r1 = upper("hello")
let r2 = "hello" |> upper()
println(r1)   # HELLO
println(r2)   # HELLO

# With additional arguments:
let s1 = slice("hello world", 0, 5)
let s2 = "hello world" |> slice(0, 5)
println(s1)   # hello
println(s2)   # hello

Chaining Pipes

The real power of pipes emerges when chaining multiple transformations. Each step feeds its result into the next:

# Chain string transforms
let result = "  Hello, World!  "
  |> trim()
  |> lower()
  |> split(" ")
  |> join("-")
println(result)   # hello,-world!

# Chain list transforms
let nums = [5, 3, 8, 1, 9, 2, 7]
  |> sort()
  |> reverse()
  |> take(3)
println(nums)   # [9, 8, 7]
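
Applying the desugaring rule from Basic Syntax to every step shows what a chain replaces. The sketch below uses only functions already shown in this section and writes the list chain once as nested calls and once as pipes. It assumes the nested form is accepted, which follows from the a |> f(b) rule; the variable names are illustrative.

```biolang
let nums = [5, 3, 8, 1, 9, 2, 7]

# Nested calls: read from the inside out
let top_nested = take(reverse(sort(nums)), 3)

# The same three steps as a pipe chain: read from top to bottom
let top_piped = nums
  |> sort()
  |> reverse()
  |> take(3)

println(top_nested)   # [9, 8, 7]
println(top_piped)    # [9, 8, 7]
```

The pipe form lists the steps in the order they run, which is the main readability gain over nesting.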

Pipes with Lambdas

Lambdas, written as |params| expression, are frequently used with pipes to define transformation logic inline:

# Filter and transform with lambdas
let scores = [85, 92, 45, 78, 95, 33, 88]
let high = scores
  |> filter(|s| s >= 80)
  |> map(|s| s / 100.0)
  |> sort()
println(high)   # [0.85, 0.88, 0.92, 0.95]

# Nested pipes inside a lambda
let words = [" BRCA1", "tp53 ", "Egfr"]
let cleaned = words
  |> map(|w| w |> upper() |> trim())
println(cleaned)   # [BRCA1, TP53, EGFR]

Pipe-Friendly Function Design

BioLang functions are designed pipe-first: the primary data argument is always the first parameter. This convention ensures every function works naturally with pipes:

# Data argument comes first so pipes work naturally
fn scale(values, factor) {
  values |> map(|x| x * factor)
}

# Now it chains with pipes:
let result = [1, 2, 3, 4, 5]
  |> scale(10)
  |> filter(|x| x > 20)
println(result)   # [30, 40, 50]
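
The same convention lets user-defined functions chain with each other. The following is a minimal sketch, assuming two hypothetical helpers (keep_above and add_up, which are not part of BioLang) written in the same data-first style as scale:

```biolang
fn scale(values, factor) {
  values |> map(|x| x * factor)
}

# Hypothetical helper: keep values strictly greater than cutoff
fn keep_above(values, cutoff) {
  values |> filter(|x| x > cutoff)
}

# Hypothetical helper: add up all values in a list
fn add_up(values) {
  values |> reduce(|a, b| a + b)
}

let answer = [1, 2, 3, 4, 5]
  |> scale(10)
  |> keep_above(20)
  |> add_up()
println(answer)   # 120
```

Because every helper takes the list first, each stage needs only its extra arguments at the call site.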

Multiline Pipes

BioLang suppresses newlines after the |> operator, and the examples on this page also start continuation lines with |>, so pipe chains can span multiple lines without any special continuation syntax:

# Each pipe stage on its own line for readability
let genes = ["BRCA1", "TP53", "EGFR", "MYC", "KRAS"]
let report = genes
  |> filter(|g| len(g) <= 4)
  |> map(|g| f"Gene: {g}")
  |> join(", ")
println(report)   # Gene: TP53, Gene: EGFR, Gene: MYC, Gene: KRAS

Higher-Order Functions

BioLang provides a set of higher-order functions that take the list as their first argument, so each one slots directly into a pipe chain. Here are the most commonly used:

map

Transform each element:

let doubled = [1, 2, 3] |> map(|x| x * 2)
println(doubled)   # [2, 4, 6]

filter

Keep elements matching a predicate:

let evens = [1, 2, 3, 4, 5, 6] |> filter(|x| x % 2 == 0)
println(evens)   # [2, 4, 6]

reduce

Fold a list into a single value:

let total = [1, 2, 3, 4, 5] |> reduce(|a, b| a + b)
println(total)   # 15
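
The combining lambda receives the running result and the next element. The sketch below traces the sum in comments and then reuses reduce to find a maximum. It assumes the first element seeds the running result and that elements are visited left to right; the example above does not state either explicitly.

```biolang
# Assumed evaluation order for [1, 2, 3, 4, 5] with |a, b| a + b:
#   1 + 2 = 3
#   3 + 3 = 6
#   6 + 4 = 10
#   10 + 5 = 15

# Same shape, different combining step: keep the larger of the two
let largest = [3, 9, 4, 7] |> reduce(|a, b| if a > b then a else b)
println(largest)   # 9
```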

each

Execute a side effect for every element (returns the original list):

[10, 20, 30] |> each(|x| println(f"Value: {x}"))
# Value: 10
# Value: 20
# Value: 30
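
Because each returns the original list, it can sit in the middle of a chain to inspect intermediate values. This is a small sketch of that pattern, relying only on the return behavior described above:

```biolang
let result = [1, 2, 3, 4]
  |> map(|x| x * x)
  |> each(|x| println(f"squared: {x}"))
  |> filter(|x| x > 4)
# squared: 1
# squared: 4
# squared: 9
# squared: 16
println(result)   # [9, 16]
```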

sort_by

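Sort with a custom comparator. Judging by the examples on this page, the comparator receives two elements and returns a negative number when the first should come first and a positive number when it should come second: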

# Sort strings by length
let sorted = ["BRCA1", "TP53", "EGFR", "A", "MYC"]
  |> sort_by(|a, b| len(a) - len(b))
println(sorted)   # [A, MYC, TP53, EGFR, BRCA1]
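
Swapping the operands in the comparator reverses the order. The sketch below assumes the convention implied by the example above (a negative result places a before b) and uses strings of distinct lengths so the result does not depend on how ties are handled:

```biolang
# Sort strings by length, longest first
let longest_first = ["BRCA1", "TP53", "A", "MYC"]
  |> sort_by(|a, b| len(b) - len(a))
println(longest_first)   # [BRCA1, TP53, MYC, A]
```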

flat_map

Map and flatten the results:

let result = ["hello world", "foo bar"]
  |> flat_map(|s| split(s, " "))
println(result)   # [hello, world, foo, bar]

take and skip

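Take elements from the front with take, or drop leading elements. The second example below drops the first two elements by calling slice with explicit start and end indices: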

let first_three = [10, 20, 30, 40, 50] |> take(3)
println(first_three)   # [10, 20, 30]

let rest = slice([10, 20, 30, 40, 50], 2, 5)
println(rest)   # [30, 40, 50]

zip

Combine two lists into pairs:

let names = ["BRCA1", "TP53", "EGFR"]
let scores = [0.95, 0.87, 0.72]
let paired = zip(names, scores)
println(paired)   # [[BRCA1, 0.95], [TP53, 0.87], [EGFR, 0.72]]
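
By the rule from Basic Syntax, zip(names, scores) can also be written with the first list on the left of a pipe. A minimal sketch of that equivalent form:

```biolang
let names = ["BRCA1", "TP53", "EGFR"]
let scores = [0.95, 0.87, 0.72]

# names becomes the first argument to zip
let paired = names |> zip(scores)
println(paired)   # [[BRCA1, 0.95], [TP53, 0.87], [EGFR, 0.72]]
```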

enumerate

Pair each element with its index:

let indexed = ["A", "B", "C"] |> enumerate()
println(indexed)   # [[0, A], [1, B], [2, C]]

Combining Multiple HOFs

Chaining these functions turns a few short stages into a complete data processing pipeline:

# A complete data pipeline
let data = [
  {gene: "BRCA1", log2fc: 3.2, pvalue: 0.001},
  {gene: "TP53", log2fc: -1.5, pvalue: 0.04},
  {gene: "EGFR", log2fc: 0.3, pvalue: 0.5},
  {gene: "MYC", log2fc: 2.8, pvalue: 0.01},
  {gene: "KRAS", log2fc: -0.1, pvalue: 0.8}
]

# Filter significant, sort by fold change (descending), report
data
  |> filter(|r| r.pvalue < 0.05)
  |> sort_by(|a, b| if b.log2fc > a.log2fc then 1 else -1)
  |> each(|r| println(f"{r.gene}: log2FC={r.log2fc}, p={r.pvalue}"))
# BRCA1: log2FC=3.2, p=0.001
# MYC: log2FC=2.8, p=0.01
# TP53: log2FC=-1.5, p=0.04
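
A pipeline can also end in a single value instead of side effects. The sketch below repeats the same records and combines filter, map, and join to produce a comma-separated list of significant, upregulated genes. The cutoffs are illustrative, not recommended thresholds.

```biolang
let data = [
  {gene: "BRCA1", log2fc: 3.2, pvalue: 0.001},
  {gene: "TP53", log2fc: -1.5, pvalue: 0.04},
  {gene: "EGFR", log2fc: 0.3, pvalue: 0.5},
  {gene: "MYC", log2fc: 2.8, pvalue: 0.01},
  {gene: "KRAS", log2fc: -0.1, pvalue: 0.8}
]

# Keep significant records, then those with positive fold change,
# then reduce each record to its gene name
let up_genes = data
  |> filter(|r| r.pvalue < 0.05)
  |> filter(|r| r.log2fc > 0)
  |> map(|r| r.gene)
  |> join(", ")
println(up_genes)   # BRCA1, MYC
```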

Bio-Specific Pipe Chains

Pipes are especially natural for biological sequence analysis:

# DNA sequence analysis pipeline
let seq = dna"ATCGATCGATCGATCG"
let rc = seq |> reverse_complement()
let gc = seq |> gc_content()
println(f"Sequence: {seq}")
println(f"Rev comp: {rc}")
println(f"GC content: {gc}")

# Analyze multiple sequences
let seqs = [dna"ATCGATCG", dna"GCGCGCGC", dna"AAATTTAA"]
let gc_values = seqs |> map(|s| gc_content(s))
println(f"GC values: {gc_values}")

Common Patterns

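Two patterns recur in pipe-heavy code: removing problem values before a step that could fail, and feeding one source list into several independent chains.

# Guard against division by zero: drop zeros before dividing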
let raw = [1, 0, 2, 0, 3]
let safe = raw
  |> filter(|x| x != 0)
  |> map(|x| 100 / x)
println(safe)   # [100, 50, 33]

# Forking a pipeline
let numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
let small = numbers |> filter(|x| x <= 5)
let large = numbers |> filter(|x| x > 5)
println(f"Small: {small}")
println(f"Large: {large}")