Beginner ~15 minutes

Hello Genomics

Your first steps with BioLang. In this tutorial you will create DNA and RNA sequences, compute basic properties such as GC content, take the reverse complement, and build a small sequence-analysis script.

What you will learn

Creating DNA and RNA sequences with the dna"..." and rna"..." literals
Computing GC content, length, and base composition
Reverse complement, transcription, and translation
Using the pipe operator |> to chain operations
Declaring and reassigning variables with let, and defining functions with fn

Run this tutorial: Download hello-genomics.bl and run it with bl run examples/tutorials/hello-genomics.bl

Prerequisites

Make sure BioLang is installed. Open a terminal and check:

bl --version
# bl 0.1.0

If you do not have it yet, follow the Installation Guide.

Step 1 — Create Your First DNA Sequence

BioLang has first-class support for biological sequences. The dna"..." literal creates a DNA sequence that is validated when the file is parsed, so an invalid base causes a compile error rather than a runtime error.

# hello.bio — your first BioLang program

let seq = dna"ATGCGATCGATCGATCG"
print(seq)               # ATGCGATCGATCGATCG
print(len(seq))          # 17
print(type(seq))         # DNA

Save the file as hello.bio and run it:

bl run hello.bio

The dna"..." literal accepts the bases A, T, G and C plus the IUPAC ambiguity codes (N, R, Y, etc.). Lowercase letters are also accepted and are uppercased automatically.
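The check that the literal performs can be sketched in Python. This illustrates the rule described above and is not BioLang's implementation. `parse_dna` and `IUPAC_DNA` are hypothetical names. The exact alphabet is also an assumption: here it is A, C, G, T plus the 11 IUPAC ambiguity codes, with no gap character.

```python
# Hypothetical sketch: accepted alphabet assumed to be A, C, G, T plus the
# IUPAC ambiguity codes R, Y, S, W, K, M, B, D, H, V, N.
IUPAC_DNA = set("ACGTRYSWKMBDHVN")


def parse_dna(text: str) -> str:
    """Uppercase the input and reject any character outside the DNA alphabet."""
    seq = text.upper()
    bad = sorted(set(seq) - IUPAC_DNA)
    if bad:
        raise ValueError(f"invalid DNA bases: {''.join(bad)}")
    return seq


print(parse_dna("atgcgatcg"))                # ATGCGATCG
print(len(parse_dna("ATGCGATCGATCGATCG")))   # 17
```

In BioLang the equivalent error is reported before the program runs, whereas this Python version can only raise `ValueError` at runtime.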

Step 2 — Base Composition and GC Content

BioLang provides built-in free functions for composition analysis; no imports are needed.

let seq = dna"ATGCGATCGATCGATCG"

# Base counts
let counts = base_counts(seq)
print(counts)
# {A: 4, T: 4, G: 5, C: 4}

# GC content as a fraction 0..1
let gc = gc_content(seq)
print(round(gc, 4))          # 0.5294
print(round(gc * 100.0, 2))  # 52.94

# AT content is the complement
let at = 1.0 - gc
print(f"AT content: {round(at * 100.0, 2)}%")  # AT content: 47.06%

The gc_content() function returns a Float between 0 and 1; multiply by 100.0 to get a percentage. Strings are interpolated with f"...{expr}..." syntax.
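The arithmetic behind these two functions can be sketched in Python. The function names mirror BioLang's built-ins, but the bodies are assumptions. This version counts only A, T, G and C and divides by the full sequence length, which may differ from how BioLang treats ambiguity codes such as N.

```python
from collections import Counter


def base_counts(seq: str) -> dict[str, int]:
    """Count A, T, G and C; ambiguity codes are ignored in this sketch."""
    counts = Counter(seq)
    return {base: counts[base] for base in "ATGC"}


def gc_content(seq: str) -> float:
    """Fraction of positions that are G or C, in the range 0..1."""
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


seq = "ATGCGATCGATCGATCG"
print(base_counts(seq))                           # {'A': 4, 'T': 4, 'G': 5, 'C': 4}
print(round(gc_content(seq), 4))                  # 0.5294  (9 of 17 bases)
print(round((1.0 - gc_content(seq)) * 100.0, 2))  # 47.06
```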

Step 3 — Reverse Complement

The reverse complement is fundamental in molecular biology. BioLang makes it a single function call.

let forward = dna"ATGCGATCG"
let revcomp = reverse_complement(forward)

print(forward)  # ATGCGATCG
print(revcomp)  # CGATCGCAT

# Verify: reverse complement of reverse complement is the original
assert(reverse_complement(revcomp) == forward)

The reverse_complement() function handles IUPAC ambiguity codes correctly: R becomes Y, S stays S, and so on.
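A reverse complement complements each base and then reverses the result. The Python sketch below does this with `str.maketrans` and the IUPAC complement pairs: A/T, C/G, R/Y, K/M, B/V and D/H, with S, W and N mapping to themselves. It illustrates the operation and is not BioLang's own code.

```python
# IUPAC complement pairs: A/T, C/G, R/Y, K/M, B/V, D/H; S, W and N are
# their own complements.
COMPLEMENT = str.maketrans("ACGTRYKMBVDHSWN", "TGCAYRMKVBHDSWN")


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the result."""
    return seq.translate(COMPLEMENT)[::-1]


forward = "ATGCGATCG"
revcomp = reverse_complement(forward)
print(revcomp)                                 # CGATCGCAT
print(reverse_complement(revcomp) == forward)  # True
print(reverse_complement("RSN"))               # NSY
```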

Step 4 — Transcription: DNA to RNA

Transcription converts a DNA sequence to its RNA equivalent, replacing T with U.

let dna_seq = dna"ATGCGATCG"

# Transcribe DNA -> RNA
let rna_seq = transcribe(dna_seq)
print(rna_seq)          # AUGCGAUCG
print(type(rna_seq))    # RNA

# You can also create RNA directly
let direct_rna = rna"AUGCGAUCG"
assert(rna_seq == direct_rna)

Notice how the return type changes from Dna to Rna. BioLang's type system tracks molecule types so you cannot accidentally mix them in operations that only accept one kind.
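The DNA/RNA distinction can be mimicked in Python with two small wrapper types. This is a hypothetical sketch in which `Dna`, `Rna` and `transcribe` are illustrative names. Python only tells the wrappers apart at runtime, whereas BioLang tracks the molecule type in its type system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dna:
    bases: str


@dataclass(frozen=True)
class Rna:
    bases: str


def transcribe(seq: Dna) -> Rna:
    """Return the RNA equivalent of a DNA sequence by replacing T with U."""
    return Rna(seq.bases.replace("T", "U"))


rna_seq = transcribe(Dna("ATGCGATCG"))
print(rna_seq)                      # Rna(bases='AUGCGAUCG')
print(type(rna_seq).__name__)       # Rna
# Dataclass equality also compares the class, so the same letters wrapped
# as Dna are not equal to the Rna value.
print(rna_seq == Dna("AUGCGAUCG"))  # False
```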

Step 5 — Translation: RNA to Protein

Translation converts an RNA sequence into a protein sequence using the standard genetic code. BioLang reads codons (three bases at a time) and maps each to an amino acid.

# A classic ORF: ATG ... stop
let gene = dna"ATGGCTAGCAAATGA"

# Transcribe then translate
let protein = gene
  |> transcribe()
  |> translate()

print(protein)          # MASK*
print(type(protein))    # Protein

Here we introduced the pipe operator |>. It takes the result of the left side and passes it as the first argument of the right-side function call. You can chain as many pipes as you like.
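Translation looks up each three-base codon in the genetic-code table. The Python sketch below builds the standard code from its common compact 64-letter form, with codons enumerated in U, C, A, G order, and chains the calls the way |> does. Ignoring a trailing partial codon is an assumption of this sketch and is not documented BioLang behavior.

```python
from itertools import product

# Standard genetic code in compact form: the 64 codons are enumerated with
# bases in U, C, A, G order (first base varies slowest); "*" marks a stop.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(dna: str) -> str:
    """DNA -> RNA: replace every T with U."""
    return dna.replace("T", "U")


def translate(rna: str) -> str:
    """Read codons left to right; a trailing partial codon is ignored here."""
    return "".join(CODON_TABLE[rna[i:i + 3]] for i in range(0, len(rna) - 2, 3))


# Nested calls read inside-out; BioLang's |> writes the same chain left to right.
protein = translate(transcribe("ATGGCTAGCAAATGA"))
print(protein)  # MASK*
```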

Step 6 — Subsequences and Slicing

Extract portions of a sequence with the slice() function. Indices are zero-based and the end is exclusive, as in Python and most other programming languages.

let seq = dna"ATGCGATCGATCGATCG"

# slice(sequence, start, end)
let first_five = slice(seq, 0, 5)
println(first_five)  # ATGCG

# From position 5 to the end
let rest = slice(seq, 5, seq_len(seq))
println(rest)        # ATCGATCGATCG

# First codon
let codon = slice(seq, 0, 3)
println(codon)       # ATG

println(f"Length: {seq_len(seq)}")
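With zero-based, end-exclusive indices, the ranges [0, k) and [k, n) split a sequence of length n into two parts with no overlap. The hypothetical Python helper `slice_seq` below demonstrates this. Python's built-in slicing silently clamps out-of-range indices, but this helper raises an error instead. Whether BioLang's slice() also raises an error is an assumption.

```python
def slice_seq(seq: str, start: int, end: int) -> str:
    """Zero-based, end-exclusive slice that rejects out-of-range bounds."""
    if not 0 <= start <= end <= len(seq):
        raise IndexError(f"slice [{start}, {end}) out of range for length {len(seq)}")
    return seq[start:end]


seq = "ATGCGATCGATCGATCG"
print(slice_seq(seq, 0, 5))         # ATGCG
print(slice_seq(seq, 5, len(seq)))  # ATCGATCGATCG
# The two half-open pieces join back into the original sequence.
print(slice_seq(seq, 0, 5) + slice_seq(seq, 5, len(seq)) == seq)  # True
```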

Step 7 — Pattern Matching on Sequences

Conditional expressions and sequence equality make it easy to classify sequences by simple rules. Step 10 shows the same GC classification written with a match expression.

let seq = dna"ATGCGATCG"

# Classify by GC content
let gc = gc_content(seq)
let gc_class = if gc > 0.6 then "high GC"
  else if gc > 0.4 then "medium GC"
  else "low GC"
println(gc_class)  # medium GC

# Check for a start codon
let first_three = slice(seq, 0, 3)
let has_start = first_three == dna"ATG"
println(f"Starts with ATG: {has_start}")  # Starts with ATG: true

Step 8 — Putting It All Together

Let us write a small analysis script that takes a DNA sequence, computes several properties, and prints a summary report.

# analysis.bio — basic sequence analysis report

let seq = dna"ATGGCTAGCAAATTTCCCGGGATCGATCGATCGATGTGA"
# 39 bp = 13 codons; the last codon, TGA, is an in-frame stop

# Compute properties using pipes
let gc      = seq |> gc_content()
let length  = seq |> len()
let protein = seq |> transcribe() |> translate()

# Find all ATG positions (potential start codons)
let starts = find_motif(seq, dna"ATG")

# Print report
print("=== Sequence Analysis Report ===")
print(f"Length:      {length} bp")
print(f"GC content:  {round(gc * 100.0, 2)}%")
print(f"Protein:     {protein}")
print(f"Protein len: {len(protein)} aa")
print(f"ATG sites:   {starts}")
print("")

# Base composition bar chart
let counts = base_counts(seq)
for base in ["A", "T", "G", "C"] {
  let count = counts[base]
  let bar = repeat("#", count)
  print(f"{base}: {bar} ({count})")
}

Run it:

bl run analysis.bio

Expected output:

=== Sequence Analysis Report ===
Length:      39 bp
GC content:  48.72%
Protein:     MASKFPGIDRSM*
Protein len: 13 aa
ATG sites:   [0, 33]

A: ########## (10)
T: ########## (10)
G: ########### (11)
C: ######## (8)
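find_motif reports every position where the motif starts. A plausible Python equivalent scans with `str.find` and advances one base after each hit, so overlapping matches are kept. Both the overlapping behavior and the zero-based positions are assumptions about BioLang's function. They are consistent with the report above but not documented.

```python
def find_motif(seq: str, motif: str) -> list[int]:
    """Return every zero-based start position of motif in seq, overlaps included."""
    positions = []
    start = seq.find(motif)
    while start != -1:
        positions.append(start)
        # Resume one base after the last hit so overlapping matches are not skipped.
        start = seq.find(motif, start + 1)
    return positions


print(find_motif("ATGAAATGCCATG", "ATG"))  # [0, 5, 10]
print(find_motif("AAAA", "AA"))            # [0, 1, 2]
```

Advancing by `len(motif)` instead of 1 would report only non-overlapping hits. For "AAAA" it would return [0, 2].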

Step 9 — Reassigning Variables

Variables declared with let can later be reassigned with a plain = (no second let).

let seq = dna"ATG"
println(seq)  # ATG

seq = seq ++ dna"CCC"
println(seq)  # ATGCCC

seq = seq ++ dna"GGG"
println(seq)  # ATGCCCGGG

# Build a sequence step by step
let result = dna"ATG"
result = result ++ dna"GCT"
result = result ++ dna"TAA"
println(result)  # ATGGCTTAA

Step 10 — Writing Functions

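You can define reusable functions with the fn keyword. Parameters and return values carry type annotations (seq: Dna, -> String), the last expression in the body is the function's value, and match arms can use guards such as g if g >= 0.6.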

# Define a function to classify GC content
fn classify_gc(seq: Dna) -> String {
  let gc = gc_content(seq)
  match gc {
    g if g >= 0.6  => "high",
    g if g >= 0.4  => "medium",
    _ => "low",
  }
}

# Define a function to summarize a sequence
fn summarize(seq: Dna) -> Map {
  {
    length:     len(seq),
    gc_content: gc_content(seq),
    gc_class:   classify_gc(seq),
    has_start:  slice(seq, 0, 3) == dna"ATG",
  }
}

# Use the functions
let sequences = [
  dna"ATGCCCGGGAAATTT",
  dna"GGCGCGCGCGCGCGC",
  dna"AAATTTAAATTTAAAT",
]

for seq in sequences {
  let info = summarize(seq)
  println(f"len={info.length}, GC={info.gc_class}, start={info.has_start}")
}

Next Steps

Now that you are comfortable with sequences and basic operations, move on to the FASTQ QC Pipeline tutorial to learn how to read files and build a real analysis pipeline.