Hello Genomics
Your first steps with BioLang. In this tutorial you will create DNA and RNA sequences, perform basic operations like GC content and reverse complement, and build a small sequence-analysis script.
What you will learn
- Creating DNA and RNA sequences with the dna"..." and rna"..." literals
- Computing GC content, length, and base composition
- Reverse complement, transcription, and translation
- Using the pipe operator |> to chain operations
- Variables with let and functions with fn
You can also run the complete example script for this tutorial:
bl run examples/tutorials/hello-genomics.bl
Prerequisites
Make sure BioLang is installed. Open a terminal and check:
bl --version
# bl 0.1.0
If you do not have it yet, follow the Installation Guide.
Step 1 — Create Your First DNA Sequence
BioLang has first-class support for biological sequences. The dna"..." literal creates a DNA sequence that is validated when the program is parsed, so an invalid base is reported as a compile-time error rather than a runtime error.
# hello.bio — your first BioLang program
let seq = dna"ATGCGATCGATCGATCG"
print(seq) # ATGCGATCGATCGATCG
print(len(seq)) # 17
print(type(seq)) # DNA
Save the file as hello.bio and run it:
bl run hello.bio
The dna"..." literal accepts uppercase A, T, G, C and the IUPAC
ambiguity codes (N, R, Y, etc.). Lowercase letters are automatically uppercased.
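The following is a minimal plain Python sketch of what a validating DNA literal checks. It uppercases the text, then rejects any character outside the IUPAC DNA alphabet. The function name parse_dna is hypothetical, and the exact alphabet is an assumption based on the codes listed above. BioLang itself performs this check when the program is parsed, not when a function is called.

```python
# Hypothetical sketch of literal validation; not BioLang's implementation.
# Assumption: the accepted alphabet is A, C, G, T plus the IUPAC ambiguity codes.
IUPAC_DNA = set("ACGTRYSWKMBDHVN")

def parse_dna(text: str) -> str:
    """Uppercase the input and reject any character outside the IUPAC DNA alphabet."""
    seq = text.upper()
    bad = sorted(set(seq) - IUPAC_DNA)
    if bad:
        raise ValueError(f"invalid DNA base(s): {', '.join(bad)}")
    return seq

print(parse_dna("atgcNRY"))  # ATGCNRY
try:
    parse_dna("ATGX")
except ValueError as err:
    print(err)  # invalid DNA base(s): X
```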
Step 2 — Base Composition and GC Content
BioLang provides free functions for composition analysis. No imports needed.
let seq = dna"ATGCGATCGATCGATCG"
# Base counts
let counts = base_counts(seq)
print(counts)
# {A: 4, T: 4, G: 5, C: 4}
# GC content as a fraction 0..1
let gc = gc_content(seq)
print(gc) # 0.5294... (9 of the 17 bases are G or C)
print(gc * 100.0) # 52.94...
# AT content is the complement
let at = 1.0 - gc
print(f"AT content: {round(at * 100.0, 2)}%") # AT content: 47.06%
The gc_content() function returns a Float. BioLang uses
f-string interpolation with f"...{expr}..." syntax.
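Both composition functions are easy to state precisely. This Python sketch counts bases, then divides the number of G and C bases by the sequence length. It is not BioLang's implementation. It returns 0.0 for an empty sequence. As a simplifying assumption, it does not count ambiguity codes such as S (G or C) toward GC content.

```python
from collections import Counter

def base_counts(seq: str) -> dict[str, int]:
    """Count how many times each base occurs."""
    return dict(Counter(seq))

def gc_content(seq: str) -> float:
    """Fraction (0..1) of bases that are G or C; ambiguity codes are not counted."""
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)

seq = "ATGCGATCGATCGATCG"
print(base_counts(seq))                        # {'A': 4, 'T': 4, 'G': 5, 'C': 4}
print(round(gc_content(seq), 4))               # 0.5294
print(round((1 - gc_content(seq)) * 100, 2))   # 47.06
```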
Step 3 — Reverse Complement
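The reverse complement is fundamental in molecular biology. The two strands of a double helix are complementary and antiparallel, so the reverse complement of a sequence is the opposite strand read in its own 5' to 3' direction. BioLang makes it a single function call.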
let forward = dna"ATGCGATCG"
let revcomp = reverse_complement(forward)
print(forward) # ATGCGATCG
print(revcomp) # CGATCGCAT
# Verify: reverse complement of reverse complement is the original
assert(reverse_complement(revcomp) == forward)
The reverse_complement() function handles IUPAC ambiguity codes
correctly: R becomes Y, S stays S, and so on.
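The reverse complement is a per-base lookup followed by a reversal. In the Python sketch below, each IUPAC code maps to the code for the complementary set of bases. For example, R (A/G) pairs with Y (C/T), while S (G/C) is its own complement. The sketch illustrates the rule described above; it is not BioLang's source.

```python
# Complement table including IUPAC ambiguity codes, e.g. R (A/G) <-> Y (C/T),
# B (not A) <-> V (not T), D (not C) <-> H (not G); S, W and N map to themselves.
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")

def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the result."""
    return seq.translate(COMPLEMENT)[::-1]

print(reverse_complement("ATGCGATCG"))  # CGATCGCAT
print(reverse_complement("ARS"))        # SYT
```

Complementing and reversing are each their own inverse, so applying the function twice returns the original sequence. The assert in the BioLang example checks exactly this property.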
Step 4 — Transcription: DNA to RNA
Transcription converts a DNA sequence to its RNA equivalent, replacing T with U.
let dna_seq = dna"ATGCGATCG"
# Transcribe DNA -> RNA
let rna_seq = transcribe(dna_seq)
print(rna_seq) # AUGCGAUCG
print(type(rna_seq)) # RNA
# You can also create RNA directly
let direct_rna = rna"AUGCGAUCG"
assert(rna_seq == direct_rna)
Notice that the type changes from DNA to RNA. BioLang's type system tracks molecule types, so you cannot accidentally pass an RNA sequence to an operation that only accepts DNA, or vice versa.
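Python has no built-in molecule types, so this sketch mimics the idea with two hypothetical wrapper classes, Dna and Rna. Here transcribe() accepts only a Dna value and returns an Rna value, and values of different types never compare equal. Unlike BioLang, Python enforces this with a runtime isinstance check rather than through a type system.

```python
from dataclasses import dataclass

# Hypothetical wrapper types that tag a string with its molecule type.
@dataclass(frozen=True)
class Dna:
    bases: str

@dataclass(frozen=True)
class Rna:
    bases: str

def transcribe(seq: Dna) -> Rna:
    """Replace every T with U and return the result as an Rna value."""
    if not isinstance(seq, Dna):
        raise TypeError(f"transcribe expects Dna, got {type(seq).__name__}")
    return Rna(seq.bases.replace("T", "U"))

rna_seq = transcribe(Dna("ATGCGATCG"))
print(rna_seq)                       # Rna(bases='AUGCGAUCG')
print(rna_seq == Rna("AUGCGAUCG"))   # True
print(Dna("ACG") == Rna("ACG"))      # False: different types never compare equal
```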
Step 5 — Translation: RNA to Protein
Translation converts an RNA sequence into a protein sequence using the standard genetic code. BioLang reads codons (three bases at a time) and maps each to an amino acid.
# A classic ORF: ATG ... stop
let gene = dna"ATGGCTAGCAAATGA"
# Transcribe then translate
let protein = gene
|> transcribe()
|> translate()
print(protein) # MASK*
print(type(protein)) # Protein
Here we introduced the pipe operator |>. It passes the result of its left side as the first argument of the function call on its right. As a result, gene |> transcribe() |> translate() is equivalent to translate(transcribe(gene)). You can chain as many pipes as you like.
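Translation is a table lookup over consecutive, non-overlapping codons. The Python sketch below does the following:

- It builds the 64-entry standard genetic code.
- It translates codon by codon without stopping at stop codons, matching the MASK* output above.
- It ignores an incomplete trailing codon. This is an assumption about edge-case behavior.

Codons containing ambiguity codes are not handled. The pipe helper is hypothetical and only imitates |> by feeding each function's result into the next.

```python
from functools import reduce

# Standard genetic code: the first, second and third codon base each run over
# U, C, A, G in that order; "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def transcribe(dna: str) -> str:
    """DNA -> RNA: replace every T with U."""
    return dna.replace("T", "U")

def translate(rna: str) -> str:
    """Read non-overlapping codons from position 0; an incomplete trailing codon is ignored."""
    return "".join(CODON_TABLE[rna[i:i + 3]] for i in range(0, len(rna) - 2, 3))

def pipe(value, *funcs):
    """Hypothetical helper: pipe(x, f, g) computes g(f(x)), like x |> f() |> g()."""
    return reduce(lambda acc, f: f(acc), funcs, value)

print(pipe("ATGGCTAGCAAATGA", transcribe, translate))  # MASK*
```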
Step 6 — Subsequences and Slicing
Extract portions of a sequence with slice syntax. Indices are zero-based and the end is exclusive, matching most programming languages.
let seq = dna"ATGCGATCGATCGATCG"
# Slice: [start..end)
let first_five = seq[0..5]
print(first_five) # ATGCG
# From position 5 to the end
let rest = seq[5..]
print(rest) # ATCGATCGATCG
# Last 4 bases
let tail = seq[-4..]
print(tail) # ATCG
# You can also use substr() for named clarity
let codon = substr(seq, 0, 3)
print(codon) # ATG
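Python uses the same zero-based, end-exclusive convention, so BioLang's seq[a..b] corresponds to Python's seq[a:b]. The half-open convention keeps codon splitting tidy. The slices [0, 3), [3, 6), ... tile a reading frame with no overlaps or gaps, and any leftover bases at the end are easy to spot. This Python sketch shows the correspondence.

```python
seq = "ATGCGATCGATCGATCG"

print(seq[0:5])   # ATGCG
print(seq[5:])    # ATCGATCGATCG
print(seq[-4:])   # ATCG

# Consecutive half-open slices [i, i + 3) give the codons of reading frame 0.
codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
print(codons)     # ['ATG', 'CGA', 'TCG', 'ATC', 'GAT']
print(seq[15:])   # CG  (2 leftover bases: 17 is not a multiple of 3)
```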
Step 7 — Pattern Matching on Sequences
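BioLang's match expression supports guard conditions (if ...) and sequence literal patterns, which makes it easy to classify sequences. Arms are tried from top to bottom, and the first matching arm wins.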
let seq = dna"ATGCGATCG"
# Classify by GC content
let gc_class = match gc_content(seq) {
gc if gc > 0.6 => "high GC",
gc if gc > 0.4 => "medium GC",
_ => "low GC",
}
print(gc_class) # medium GC
# Check for a start codon
let has_start = match seq[0..3] {
dna"ATG" => true,
_ => false,
}
print(f"Starts with ATG: {has_start}") # Starts with ATG: true
Step 8 — Putting It All Together
Let us write a small analysis script that takes a DNA sequence, computes several properties, and prints a summary report.
# analysis.bio — basic sequence analysis report
let seq = dna"ATGGCTAGCAAATTTCCCGGGATCGATCGATCGATGA"
# Compute properties using pipes
let gc = seq |> gc_content()
let length = seq |> len()
let protein = seq |> transcribe() |> translate()
# Find all ATG positions (potential start codons)
let starts = find_motif(seq, dna"ATG")
# Print report
print("=== Sequence Analysis Report ===")
print(f"Length: {length} bp")
print(f"GC content: {round(gc * 100.0, 2)}%")
print(f"Protein: {protein}")
print(f"Protein len: {len(protein)} aa")
print(f"ATG sites: {starts}")
print("")
# Base composition bar chart
let counts = base_counts(seq)
for base in ["A", "T", "G", "C"] {
let count = counts[base]
let bar = repeat("#", count)
print(f"{base}: {bar} ({count})")
}
Run it:
bl run analysis.bio
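Expected output:
=== Sequence Analysis Report ===
Length: 37 bp
GC content: 48.65%
Protein: MASKFPGIDRSM
Protein len: 12 aa
ATG sites: [0, 33]

A: ########## (10)
T: ######### (9)
G: ########## (10)
C: ######## (8)
The 37 bp sequence holds 12 complete codons plus one leftover base. The protein therefore has 12 residues, assuming translate() ignores an incomplete trailing codon. It also has no stop codon, because the final TGA starts at position 34, outside the reading frame that begins at 0. The second ATG site, at position 33, is the last complete codon and becomes the final M.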
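The report figures can be cross-checked with a few lines of Python. The find_motif below is a hypothetical stand-in that reports every zero-based start position, including overlapping matches. Whether BioLang's find_motif also reports overlaps is an assumption.

```python
def find_motif(seq: str, motif: str) -> list[int]:
    """Every zero-based start position of motif in seq, overlaps included."""
    k = len(motif)
    return [i for i in range(len(seq) - k + 1) if seq[i:i + k] == motif]

seq = "ATGGCTAGCAAATTTCCCGGGATCGATCGATCGATGA"
gc_percent = 100 * (seq.count("G") + seq.count("C")) / len(seq)

print(len(seq))                            # 37
print(round(gc_percent, 2))                # 48.65
print(find_motif(seq, "ATG"))              # [0, 33]
print({b: seq.count(b) for b in "ATGC"})   # {'A': 10, 'T': 9, 'G': 10, 'C': 8}
print(divmod(len(seq), 3))                 # (12, 1): 12 full codons, 1 leftover base
```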
Step 9 — Reassigning Variables
Variables declared with let can be reassigned with a plain = (without repeating let).
let seq = dna"ATG"
print(seq) # ATG
seq = concat(seq, dna"CCC")
print(seq) # ATGCCC
seq = concat(seq, dna"GGG")
print(seq) # ATGCCCGGG
# Build a sequence in a loop
let result = dna""
for codon in [dna"ATG", dna"GCT", dna"TAA"] {
result = concat(result, codon)
}
print(result) # ATGGCTTAA
Step 10 — Writing Functions
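You can define reusable functions with the fn keyword. Parameter and return types are annotated (seq: Dna, -> String), and a function returns the value of the last expression in its body.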
# Define a function to classify GC content
fn classify_gc(seq: Dna) -> String {
let gc = gc_content(seq)
match gc {
g if g >= 0.6 => "high",
g if g >= 0.4 => "medium",
_ => "low",
}
}
# Define a function to summarize a sequence
fn summarize(seq: Dna) -> Map {
{
length: len(seq),
gc_content: gc_content(seq),
gc_class: classify_gc(seq),
has_start: seq[0..3] == dna"ATG",
}
}
# Use the functions
let sequences = [
dna"ATGCCCGGGAAATTT",
dna"GGCGCGCGCGCGCGC",
dna"AAATTTAAATTTAAAT",
]
for seq in sequences {
let info = summarize(seq)
print(f"{seq[0..6]}... => GC: {info.gc_class}, len: {info.length}")
}
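For comparison, here is the same pair of functions as a Python sketch. A dict stands in for BioLang's Map, and an if chain stands in for match. Python needs explicit return statements, whereas the BioLang versions simply end with the expression whose value they return.

```python
def gc_content(seq: str) -> float:
    """Fraction of bases that are G or C."""
    return (seq.count("G") + seq.count("C")) / len(seq)

def classify_gc(seq: str) -> str:
    gc = gc_content(seq)
    if gc >= 0.6:
        return "high"
    if gc >= 0.4:
        return "medium"
    return "low"

def summarize(seq: str) -> dict:
    """Collect several properties of a sequence into one dict."""
    return {
        "length": len(seq),
        "gc_content": gc_content(seq),
        "gc_class": classify_gc(seq),
        "has_start": seq[0:3] == "ATG",
    }

for seq in ["ATGCCCGGGAAATTT", "GGCGCGCGCGCGCGC", "AAATTTAAATTTAAAT"]:
    info = summarize(seq)
    print(f"{seq[0:6]}... => GC: {info['gc_class']}, len: {info['length']}")
# ATGCCC... => GC: medium, len: 15
# GGCGCG... => GC: high, len: 15
# AAATTT... => GC: low, len: 16
```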
Next Steps
Now that you are comfortable with sequences and basic operations, move on to the FASTQ QC Pipeline tutorial to learn how to read files and build a real analysis pipeline.