Beginner ~20 min

Tutorial: Literate Notebooks

In this tutorial you'll create a BioLang notebook (.bln) that documents a complete GC content analysis. You'll learn the notebook format, cell directives, HTML export, and Jupyter conversion.

Prerequisites: BioLang installed (bl --version). Familiarity with basic BioLang syntax from Hello Genomics.

What you'll build

A reproducible analysis notebook that loads FASTA sequences, computes GC content statistics, identifies outlier contigs, and exports a shareable HTML report. The final notebook will use cell directives to keep the narrative clean.

Run this tutorial: Download notebooks.bl and run it with bl run examples/tutorials/notebooks.bl

Step 1: Create a basic notebook

Create a file called gc_analysis.bln. A .bln file is just text — Markdown prose interleaved with BioLang code blocks.
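To make the format concrete, here is a rough Python sketch of how such a file could be split into prose cells and code cells. It is an illustration only, not BioLang's actual parser: the name split_cells is hypothetical, and the sketch assumes a code cell opens with a line holding only a biolang fence and closes with a bare fence.

```python
FENCE = "`" * 3  # built indirectly so this example can itself sit inside a fenced block


def split_cells(text):
    """Split notebook text into ("prose", body) and ("code", body) cells."""
    cells = []
    kind, buffer = "prose", []

    def flush():
        # Store the lines collected so far as one cell, skipping empty cells.
        body = "\n".join(buffer).strip("\n")
        if body:
            cells.append((kind, body))
        buffer.clear()

    for line in text.splitlines():
        stripped = line.strip()
        if kind == "prose" and stripped == FENCE + "biolang":
            flush()
            kind = "code"
        elif kind == "code" and stripped == FENCE:
            flush()
            kind = "prose"
        else:
            buffer.append(line)
    flush()
    return cells


notebook = "\n".join([
    "# GC Content Analysis",
    "",
    "This notebook analyzes per-contig GC content from a FASTA file.",
    "",
    FENCE + "biolang",
    'let seq = dna"ATCGATCGATCG"',
    'print(f"GC content: {gc_content(seq)}")',
    FENCE,
])

cells = split_cells(notebook)
for kind, body in cells:
    print(kind, "->", body.splitlines()[0])
```

The sample splits into one prose cell (the heading and the sentence below it) and one code cell.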

Start with a title and a single code block:

# GC Content Analysis

This notebook analyzes per-contig GC content from a FASTA file.

```biolang
let seq = dna"ATCGATCGATCG"
print(f"GC content: {gc_content(seq)}")
```

Run it:

bl notebook gc_analysis.bln

You'll see the heading rendered in bold, followed by the prose text and then the output of the code block. The terminal renderer uses ANSI colors to set headings, prose, and output apart.
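As a sanity check on the number the cell prints, GC content can be worked out by hand. The Python sketch below assumes gc_content reports the fraction of G and C bases (a value between 0 and 1) rather than a percentage; that is an assumption about BioLang, and gc_fraction is a hypothetical stand-in, not the built-in itself.

```python
def gc_fraction(seq):
    """Return the fraction of G and C bases in a DNA string (0.0 for empty input)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    # Count bases that are G or C, then divide by the sequence length.
    return sum(base in "GC" for base in seq) / len(seq)


example = gc_fraction("ATCGATCGATCG")
print(example)  # 0.5: six of the twelve bases are G or C
```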

Step 2: Add multiple cells

Notebooks carry state between code blocks. Variables defined in one cell are available in all later cells. Expand the notebook:

# GC Content Analysis

This notebook analyzes per-contig GC content from a FASTA file.

## Load Data

Read the sample FASTA file shipped with BioLang.

```biolang
let seqs = read_fasta("examples/sample-data/contigs.fa")
print(f"Loaded {len(seqs)} sequences")
```

## Compute Statistics

Calculate GC content for each sequence and summarize.

```biolang
let gc_values = seqs |> map(|s| gc_content(s.seq))
let mu = mean(gc_values)
let sigma = stdev(gc_values)
print(f"Mean GC: {mu:.3f} +/- {sigma:.4f}")
print(f"Range: {min(gc_values):.3f} - {max(gc_values):.3f}")
```

## Find Outliers

Flag contigs with GC more than 2 standard deviations from the mean.
These may indicate contamination or horizontal gene transfer.

```biolang
let outliers = seqs
  |> filter(|s| abs(gc_content(s.seq) - mu) > 2.0 * sigma)
print(f"Found {len(outliers)} outlier contigs")
outliers |> each(|s| print(f"  {s.id}: GC={gc_content(s.seq):.3f}"))
```

Run it again. Each section prints its heading, prose, and code output in sequence — a narrative that tells the story of your analysis.
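The outlier rule in the last cell can be checked independently. The Python sketch below applies the same test (absolute deviation from the mean greater than 2 standard deviations) to made-up GC values. It uses the sample standard deviation from Python's statistics module; whether BioLang's stdev uses the sample or population formula is not stated here, so treat that choice as an assumption. The contig names and values are invented for illustration.

```python
from statistics import mean, stdev

# Hypothetical per-contig GC fractions; contig_10 is deliberately unusual.
gc_by_contig = {
    "contig_1": 0.50, "contig_2": 0.51, "contig_3": 0.49, "contig_4": 0.50,
    "contig_5": 0.52, "contig_6": 0.48, "contig_7": 0.50, "contig_8": 0.51,
    "contig_9": 0.49, "contig_10": 0.70,
}

values = list(gc_by_contig.values())
mu = mean(values)
sigma = stdev(values)  # sample standard deviation (n - 1 in the denominator)

# Keep contigs whose GC deviates from the mean by more than 2 sigma.
outliers = [
    name for name, gc in gc_by_contig.items()
    if abs(gc - mu) > 2.0 * sigma
]
print(outliers)  # ['contig_10']
```

One consequence worth knowing: with the sample standard deviation, no value in a set of n values can lie more than (n - 1) / sqrt(n) standard deviations from the mean, so a FASTA file with fewer than six contigs can never produce a 2-sigma outlier under this rule.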

Step 3: Use cell directives

Cell directives are special comments at the top of a code block that control how it behaves. Add them to clean up the notebook.

# @hide — silent setup

Use this directive for configuration code that readers don't need to see. The cell still runs, but it doesn't appear in the output:

## Setup

```biolang
# @hide
let threshold = 2.0
let min_length = 100
```

# @echo — show your work

For key analysis steps, show both the code and its output:

## Analysis

```biolang
# @echo
let gc_values = seqs |> map(|s| gc_content(s.seq))
let mu = mean(gc_values)
print(f"Mean GC: {mu:.3f}")
```

This prints the code first (dimmed), then executes it and shows the output. Readers see exactly what ran.

# @skip — draft cells

Temporarily disable a cell without deleting it. Useful during development:

```biolang
# @skip
# TODO: add k-mer analysis once data is ready
let kmers = seqs |> map(|s| kmer_count(s.seq, 21))
```

# @hide-output — quiet execution

Show the code but suppress printed output. Good for assignments that produce verbose intermediate results:

```biolang
# @hide-output
let gc_table = seqs
  |> map(|s| {id: s.id, length: seq_len(s.seq), gc: gc_content(s.seq)})
  |> table()
```
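The four directives share one shape: a comment on the first line of the cell that starts with @. The Python sketch below shows one way a tool could recognize them and decide whether a cell should run. It is an illustration under that assumption, not BioLang's implementation; cell_directive, should_run, and the regular expression are hypothetical.

```python
import re

# A directive line looks like "# @name", alone on the first line of the cell.
DIRECTIVE = re.compile(r"^#\s*@([a-z-]+)\s*$")
KNOWN = {"hide", "echo", "skip", "hide-output"}


def cell_directive(source):
    """Return the directive named on the first line of a cell, or None."""
    lines = source.splitlines()
    if not lines:
        return None
    match = DIRECTIVE.match(lines[0].strip())
    if match and match.group(1) in KNOWN:
        return match.group(1)
    return None


def should_run(source):
    """Only cells marked @skip are left unexecuted."""
    return cell_directive(source) != "skip"


setup_cell = "# @hide\nlet threshold = 2.0"
draft_cell = "# @skip\n# TODO: add k-mer analysis once data is ready"
plain_cell = "# TODO: tidy this up\nlet x = 1"

print(cell_directive(setup_cell), should_run(setup_cell))
print(cell_directive(draft_cell), should_run(draft_cell))
print(cell_directive(plain_cell), should_run(plain_cell))
```

An ordinary comment such as the TODO line is not treated as a directive because it has no @ prefix.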

Step 4: Export to HTML

Generate a self-contained HTML report with syntax highlighting:

bl notebook gc_analysis.bln --export html > gc_report.html

Open gc_report.html in a browser. You'll see:

  • Rendered Markdown headings and prose
  • Syntax-highlighted code blocks (keywords in purple, strings in green, pipes in cyan)
  • Code output in a separate block
  • A dark-themed design with no external dependencies

The HTML is a single file — share it via email, put it on a web server, or include it in a lab notebook. No BioLang installation needed to view it.

Step 5: Jupyter interop

If you have existing Jupyter notebooks, convert them to .bln:

# Import: .ipynb to .bln
bl notebook experiment.ipynb --from-ipynb > experiment.bln

Markdown cells become prose sections. Code cells become fenced ```biolang blocks. Outputs are discarded (they'll regenerate when you run the notebook).

Going the other way:

# Export: .bln to .ipynb
bl notebook gc_analysis.bln --to-ipynb > gc_analysis.ipynb

The resulting .ipynb uses nbformat v4 and opens in JupyterLab, VS Code, or any other tool that reads nbformat v4 files. Code cells are tagged with "language": "biolang".
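For orientation, an nbformat v4 file is a JSON document with a list of cells plus top-level metadata. The Python sketch below builds a minimal one from prose and code cells. The field names (cell_type, source, outputs, execution_count, nbformat, nbformat_minor) come from the nbformat v4 schema; placing the "language" tag in each code cell's metadata is a guess, since the exact location BioLang uses is not specified here, and to_ipynb is a hypothetical helper.

```python
import json


def to_ipynb(cells):
    """Build a minimal nbformat v4 notebook dict from (kind, body) cells."""
    nb_cells = []
    for kind, body in cells:
        if kind == "prose":
            nb_cells.append({"cell_type": "markdown", "metadata": {}, "source": body})
        else:
            nb_cells.append({
                "cell_type": "code",
                "execution_count": None,  # the cell has not been run yet
                "metadata": {"language": "biolang"},  # assumed location of the tag
                "outputs": [],  # outputs regenerate when the notebook runs
                "source": body,
            })
    return {"cells": nb_cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 4}


nb = to_ipynb([
    ("prose", "# GC Content Analysis"),
    ("code", 'let seq = dna"ATCGATCGATCG"'),
])
print(json.dumps(nb, indent=2))
```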

Step 6: Dash delimiters (alternative syntax)

Instead of fenced code blocks, you can use --- on its own line to delimit code. This is the original BioLang notebook format:

## Load Data
---
let seqs = read_fasta("contigs.fa")
print(f"Loaded {len(seqs)} sequences")
---
## Results
The output above shows the sequence count.

Both styles can be mixed in the same file. Fenced blocks are recommended for new notebooks since they're compatible with standard Markdown renderers.
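If you want to move an older notebook to fenced blocks, the rewrite is mechanical. The Python sketch below is a hypothetical helper, not a BioLang command: it assumes every line consisting only of --- is a cell delimiter and that delimiters come in balanced open/close pairs.

```python
FENCE = "`" * 3  # built indirectly so this example can itself sit inside a fenced block


def dashes_to_fences(text):
    """Rewrite --- delimited code cells as fenced biolang blocks."""
    out, in_code = [], False
    for line in text.splitlines():
        if line.strip() == "---":
            # An opening delimiter becomes a biolang fence, a closing one a bare fence.
            out.append(FENCE if in_code else FENCE + "biolang")
            in_code = not in_code
        else:
            out.append(line)
    return "\n".join(out)


old_style = "\n".join([
    "## Load Data",
    "---",
    'let seqs = read_fasta("contigs.fa")',
    'print(f"Loaded {len(seqs)} sequences")',
    "---",
    "## Results",
    "The output above shows the sequence count.",
])

converted = dashes_to_fences(old_style)
print(converted)
```

Prose that uses --- as a Markdown horizontal rule would be misread by this sketch, so check the result before replacing the original file.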

Complete notebook

Here's the final version with all features combined:

# GC Content Analysis

A reproducible analysis of per-contig GC content.
Outlier contigs may indicate contamination or HGT events.

## Setup

```biolang
# @hide
let threshold = 2.0
```

## Load Data

Read the FASTA file and report basic counts.

```biolang
let seqs = read_fasta("examples/sample-data/contigs.fa")
print(f"Loaded {len(seqs)} sequences")
```

## Compute GC Statistics

```biolang
# @echo
let gc_values = seqs |> map(|s| gc_content(s.seq))
let mu = mean(gc_values)
let sigma = stdev(gc_values)
print(f"Mean GC: {mu:.3f} +/- {sigma:.4f}")
```

## Build Results Table

```biolang
# @hide-output
let gc_table = seqs
  |> map(|s| {id: s.id, length: seq_len(s.seq), gc: gc_content(s.seq)})
  |> table()
```

## Identify Outliers

Contigs more than **2 standard deviations** from the mean
may represent contamination or horizontal gene transfer.

```biolang
let outliers = gc_table
  |> filter(|row| abs(row.gc - mu) > threshold * sigma)
  |> arrange("-gc")

print(f"Found {len(outliers)} outlier contigs:")
outliers |> each(|row| print(f"  {row.id}: GC={row.gc:.3f}, length={row.length}"))
```

## Summary

> Review flagged contigs before downstream assembly.
> Consider BLAST against nt database to confirm contamination.

Run, export, or convert:

# Terminal
bl notebook gc_analysis.bln

# HTML report
bl notebook gc_analysis.bln --export html > report.html

# Jupyter
bl notebook gc_analysis.bln --to-ipynb > gc_analysis.ipynb

Tips and best practices

| Practice | Why |
| --- | --- |
| Use # @hide for setup | Keeps the narrative focused on the science, not boilerplate |
| Use # @echo for key steps | Readers see exactly what code produced each result |
| Use # @skip during development | Disable expensive cells without deleting them |
| Prefer fenced blocks over --- | Standard Markdown: GitHub, editors, and viewers render them correctly |
| One concept per cell | Easier to understand, debug, and reorder |
| Version control .bln files | They're plain text, so git diff works on them |
| Export HTML for sharing | Self-contained file, no BioLang needed to view |

What's next