Beginner ~15 minutes

Hello Genomics

Your first steps with BioLang. In this tutorial you will create DNA and RNA sequences, perform basic operations like GC content and reverse complement, and build a small sequence-analysis script.

What you will learn

  • Creating DNA and RNA sequences with the dna"..." and rna"..." literals
  • Computing GC content, length, and base composition
  • Reverse complement, transcription, and translation
  • Using the pipe operator |> to chain operations
  • Variables with let

Run this tutorial: Download hello-genomics.bl and run it with bl run examples/tutorials/hello-genomics.bl

Prerequisites

Make sure BioLang is installed. Open a terminal and check:

bl --version
# bl 0.1.0

If you do not have it yet, follow the Installation Guide.

Step 1 — Create Your First DNA Sequence

BioLang has first-class support for biological sequences. The dna"..." literal creates a DNA sequence that is validated at parse time, so an invalid base is reported as a compile error before any code runs rather than as a runtime error.

# hello.bio — your first BioLang program

let seq = dna"ATGCGATCGATCGATCG"
print(seq)               # ATGCGATCGATCGATCG
print(len(seq))          # 17
print(type(seq))         # DNA

Save the file as hello.bio and run it:

bl run hello.bio

The dna"..." literal accepts uppercase A, T, G, C and the IUPAC ambiguity codes (N, R, Y, etc.). Lowercase letters are automatically uppercased.
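The validation rule described above can be pictured with a short sketch in plain Python (not BioLang). The validate_dna helper and its error message are hypothetical, introduced only for illustration; the sketch uppercases the input and rejects any letter outside the IUPAC nucleotide alphabet. It runs at call time, so it does not reproduce BioLang's compile-time check.

```python
# Plain-Python illustration of the literal's validation rule (not BioLang).
# IUPAC nucleotide alphabet: the four bases plus the ambiguity codes.
IUPAC_DNA = set("ACGTRYSWKMBDHVN")

def validate_dna(text: str) -> str:
    """Hypothetical helper: uppercase the input, reject non-IUPAC letters."""
    seq = text.upper()
    bad = sorted(set(seq) - IUPAC_DNA)
    if bad:
        raise ValueError(f"invalid DNA bases: {', '.join(bad)}")
    return seq

print(validate_dna("atgcgatcgatcgatcg"))  # ATGCGATCGATCGATCG
```

Calling validate_dna("ATGX") raises ValueError because X is not an IUPAC nucleotide code.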

Step 2 — Base Composition and GC Content

BioLang provides free functions for composition analysis. No imports needed.

let seq = dna"ATGCGATCGATCGATCG"

# Base counts
let counts = base_counts(seq)
print(counts)
# {A: 4, T: 4, G: 5, C: 4}

# GC content as a fraction 0..1
let gc = gc_content(seq)
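print(gc)          # 0.5294 (9 of 17 bases, shown to 4 decimal places)
print(gc * 100.0)  # 52.94 (rounded)

# AT content is the complement (this sequence has no ambiguity codes)
let at = 1.0 - gc
print(f"AT content: {round(at * 100.0, 2)}%")  # AT content: 47.06%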

The gc_content() function returns a Float. BioLang uses f-string interpolation with f"...{expr}..." syntax.
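The figures in this step can be cross-checked by hand or with a few lines of plain Python (not BioLang). The sketch assumes GC content is the number of G and C bases divided by the sequence length, which matches the values shown above.

```python
from collections import Counter

seq = "ATGCGATCGATCGATCG"

# Count each base, then take the G + C share of the total length.
counts = Counter(seq)
gc = (counts["G"] + counts["C"]) / len(seq)

print(dict(counts))               # {'A': 4, 'T': 4, 'G': 5, 'C': 4}
print(round(gc, 4))               # 0.5294
print(round((1 - gc) * 100, 2))   # 47.06
```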

Step 3 — Reverse Complement

The reverse complement is fundamental in molecular biology. BioLang makes it a single function call.

let forward = dna"ATGCGATCG"
let revcomp = reverse_complement(forward)

print(forward)  # ATGCGATCG
print(revcomp)  # CGATCGCAT

# Verify: reverse complement of reverse complement is the original
assert(reverse_complement(revcomp) == forward)

The reverse_complement() function handles IUPAC ambiguity codes correctly: R becomes Y, S stays S, and so on.
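How the ambiguity codes pair up is easiest to see in a lookup table. The following is a plain-Python sketch (not BioLang's implementation) that complements each base, including the IUPAC codes, and then reverses the result.

```python
# Complement table covering the four bases and the IUPAC ambiguity codes.
# Each code maps to the code for the complementary set of bases,
# e.g. R (A or G) -> Y (T or C); S (G or C) maps to itself.
_SRC = "ACGTRYSWKMBDHVN"
_DST = "TGCAYRSWMKVHDBN"
COMPLEMENT = str.maketrans(_SRC, _DST)

def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the string."""
    return seq.translate(COMPLEMENT)[::-1]

print(reverse_complement("ATGCGATCG"))  # CGATCGCAT
print(reverse_complement("ARS"))        # SYT
```

Applying the function twice returns the original sequence, because the table is its own inverse.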

Step 4 — Transcription: DNA to RNA

Transcription converts a DNA sequence to its RNA equivalent, replacing T with U.

let dna_seq = dna"ATGCGATCG"

# Transcribe DNA -> RNA
let rna_seq = transcribe(dna_seq)
print(rna_seq)          # AUGCGAUCG
print(type(rna_seq))    # RNA

# You can also create RNA directly
let direct_rna = rna"AUGCGAUCG"
assert(rna_seq == direct_rna)

Notice how the return type changes from Dna to Rna. BioLang's type system tracks molecule types so you cannot accidentally mix them in operations that only accept one kind.

Step 5 — Translation: RNA to Protein

Translation converts an RNA sequence into a protein sequence using the standard genetic code. BioLang reads codons (three bases at a time) and maps each to an amino acid. In the output, a stop codon appears as *.

# A classic ORF: ATG ... stop
let gene = dna"ATGGCTAGCAAATGA"

# Transcribe then translate
let protein = gene
  |> transcribe()
  |> translate()

print(protein)          # MASK*
print(type(protein))    # Protein

Here we introduced the pipe operator |>. It takes the result of the left side and passes it as the first argument of the right-side function call, so gene |> transcribe() |> translate() is equivalent to translate(transcribe(gene)). You can chain as many pipes as you like.
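To make the codon lookup concrete, here is a plain-Python sketch of transcription followed by translation with the standard genetic code (NCBI translation table 1). The function names mirror the tutorial's but are independent re-implementations, not BioLang internals. Dropping a trailing incomplete codon is an assumption of this sketch.

```python
from itertools import product

# Standard genetic code, listed in U, C, A, G order for each codon position.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}

def transcribe(dna: str) -> str:
    """DNA -> RNA: replace every T with U."""
    return dna.replace("T", "U")

def translate(rna: str) -> str:
    """Map each complete codon to an amino acid ('*' marks a stop codon)."""
    usable = len(rna) - len(rna) % 3  # assumption: ignore a partial last codon
    return "".join(CODON_TABLE[rna[i:i + 3]] for i in range(0, usable, 3))

# Nested calls play the role of the pipe chain in the tutorial.
protein = translate(transcribe("ATGGCTAGCAAATGA"))
print(protein)  # MASK*
```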

Step 6 — Subsequences and Slicing

Extract portions of a sequence with slice syntax. Indices are zero-based and the end is exclusive, matching most programming languages.

let seq = dna"ATGCGATCGATCGATCG"

# Slice: [start..end)
let first_five = seq[0..5]
print(first_five)  # ATGCG

# From position 5 to the end
let rest = seq[5..]
print(rest)        # ATCGATCGATCG

# Last 4 bases
let tail = seq[-4..]
print(tail)        # ATCG

# substr() expresses the same kind of extraction as a function call
let codon = substr(seq, 0, 3)
print(codon)       # ATG
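The zero-based, end-exclusive convention can be checked against plain Python slices, which follow the same rule (Python writes the range with a colon rather than two dots). This is only an analogy in Python, not BioLang code. Note that the substr(seq, 0, 3) call above does not show whether the third argument is a length or an end index, since both readings give ATG when the start is 0.

```python
seq = "ATGCGATCGATCGATCG"

print(seq[0:5])   # ATGCG
print(seq[5:])    # ATCGATCGATCG
print(seq[-4:])   # ATCG

# With an exclusive end, a slice's length is end - start, and adjacent
# slices cover the sequence with no overlap and no gap.
print(len(seq[0:5]))               # 5
print(seq[0:5] + seq[5:] == seq)   # True
```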

Step 7 — Pattern Matching on Sequences

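BioLang's match expression accepts guard clauses (a binding followed by if) and sequence literals as patterns, making it easy to classify sequences. The first example below relies on the arms being tried from top to bottom, with _ as the catch-all.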

let seq = dna"ATGCGATCG"

# Classify by GC content
let gc_class = match gc_content(seq) {
  gc if gc > 0.6  => "high GC",
  gc if gc > 0.4  => "medium GC",
  _ => "low GC",
}
print(gc_class)  # medium GC

# Check for a start codon
let has_start = match seq[0..3] {
  dna"ATG" => true,
  _        => false,
}
print(f"Starts with ATG: {has_start}")  # Starts with ATG: true
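The order of the guards matters, and so does the strict comparison. The following plain-Python sketch (an analogy, not BioLang) reproduces the classification with ordered conditions and shows what happens at the 0.6 boundary.

```python
def gc_class(gc: float) -> str:
    # Conditions are checked in order, like the arms of the match expression.
    if gc > 0.6:
        return "high GC"
    if gc > 0.4:
        return "medium GC"
    return "low GC"  # plays the role of the `_` wildcard arm

seq = "ATGCGATCG"
gc = (seq.count("G") + seq.count("C")) / len(seq)

print(gc_class(gc))       # medium GC
print(gc_class(0.6))      # medium GC (0.6 is not strictly greater than 0.6)
print(seq[0:3] == "ATG")  # True
```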

Step 8 — Putting It All Together

Let us write a small analysis script that takes a DNA sequence, computes several properties, and prints a summary report.

# analysis.bio — basic sequence analysis report

let seq = dna"ATGGCTAGCAAATTTCCCGGGATCGATCGATCGATGA"

# Compute properties using pipes
let gc      = seq |> gc_content()
let length  = seq |> len()
let protein = seq |> transcribe() |> translate()

# Find all ATG positions (potential start codons)
let starts = find_motif(seq, dna"ATG")

# Print report
print("=== Sequence Analysis Report ===")
print(f"Length:      {length} bp")
print(f"GC content:  {round(gc * 100.0, 2)}%")
print(f"Protein:     {protein}")
print(f"Protein len: {len(protein)} aa")
print(f"ATG sites:   {starts}")
print("")

# Base composition bar chart
let counts = base_counts(seq)
for base in ["A", "T", "G", "C"] {
  let count = counts[base]
  let bar = repeat("#", count)
  print(f"{base}: {bar} ({count})")
}

Run it:

bl run analysis.bio

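Expected output (the sequence is 37 bases long, so the final A does not complete a codon and no stop codon falls in frame; this listing assumes translate() ignores the incomplete codon and find_motif() reports zero-based positions):

=== Sequence Analysis Report ===
Length:      37 bp
GC content:  48.65%
Protein:     MASKFPGIDRSM
Protein len: 12 aa
ATG sites:   [0, 33]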

A: ########## (10)
T: ######### (9)
G: ########## (10)
C: ######## (8)
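The length, GC content, and base composition in the report can be cross-checked independently. The sketch below is plain Python, not BioLang. Its find_motif is a hypothetical stand-in that scans for every occurrence of a motif and returns zero-based start positions, which is an assumption about how BioLang's find_motif() reports matches.

```python
def find_motif(seq: str, motif: str) -> list[int]:
    """Return the zero-based start of every match, overlapping ones included."""
    positions = []
    start = seq.find(motif)
    while start != -1:
        positions.append(start)
        start = seq.find(motif, start + 1)  # step by 1 to allow overlaps
    return positions

seq = "ATGGCTAGCAAATTTCCCGGGATCGATCGATCGATGA"
gc = (seq.count("G") + seq.count("C")) / len(seq)

print(len(seq))               # 37
print(round(gc * 100, 2))     # 48.65
print(find_motif(seq, "ATG"))

# Base composition bar chart, one '#' per occurrence.
for base in "ATGC":
    n = seq.count(base)
    print(f"{base}: {'#' * n} ({n})")
```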

Step 9 — Reassigning Variables

Variables declared with let can be reassigned freely. Reassignment uses a bare = without repeating let.

let seq = dna"ATG"
print(seq)  # ATG

seq = concat(seq, dna"CCC")
print(seq)  # ATGCCC

seq = concat(seq, dna"GGG")
print(seq)  # ATGCCCGGG

# Build a sequence in a loop
let result = dna""
for codon in [dna"ATG", dna"GCT", dna"TAA"] {
  result = concat(result, codon)
}
print(result)  # ATGGCTTAA

Step 10 — Writing Functions

You can define reusable functions with the fn keyword.

# Define a function to classify GC content
fn classify_gc(seq: Dna) -> String {
  let gc = gc_content(seq)
  match gc {
    g if g >= 0.6  => "high",
    g if g >= 0.4  => "medium",
    _ => "low",
  }
}

# Define a function to summarize a sequence
fn summarize(seq: Dna) -> Map {
  {
    length:     len(seq),
    gc_content: gc_content(seq),
    gc_class:   classify_gc(seq),
    has_start:  seq[0..3] == dna"ATG",
  }
}

# Use the functions
let sequences = [
  dna"ATGCCCGGGAAATTT",
  dna"GGCGCGCGCGCGCGC",
  dna"AAATTTAAATTTAAAT",
]

for seq in sequences {
  let info = summarize(seq)
  print(f"{seq[0..6]}... => GC: {info.gc_class}, len: {info.length}")
}
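The tutorial does not list the output of this loop. A plain-Python re-implementation of the same logic (an illustrative sketch, not BioLang; the map is modelled as a dict) shows what each sequence should produce under the thresholds defined in classify_gc.

```python
def gc_content(seq: str) -> float:
    return (seq.count("G") + seq.count("C")) / len(seq)

def classify_gc(seq: str) -> str:
    gc = gc_content(seq)
    if gc >= 0.6:
        return "high"
    if gc >= 0.4:
        return "medium"
    return "low"

def summarize(seq: str) -> dict:
    return {
        "length": len(seq),
        "gc_content": gc_content(seq),
        "gc_class": classify_gc(seq),
        "has_start": seq[0:3] == "ATG",
    }

sequences = ["ATGCCCGGGAAATTT", "GGCGCGCGCGCGCGC", "AAATTTAAATTTAAAT"]
for seq in sequences:
    info = summarize(seq)
    print(f"{seq[0:6]}... => GC: {info['gc_class']}, len: {info['length']}")
# ATGCCC... => GC: medium, len: 15
# GGCGCG... => GC: high, len: 15
# AAATTT... => GC: low, len: 16
```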

Next Steps

Now that you are comfortable with sequences and basic operations, move on to the FASTQ QC Pipeline tutorial to learn how to read files and build a real analysis pipeline.