Tutorials

Step-by-step guides to help you learn BioLang from the ground up. Each tutorial builds a real project, so you write meaningful code from the very first lesson. Start with the beginner track and work your way through intermediate and advanced topics.

Sample data included: BioLang ships with sample FASTA, FASTQ, VCF, BED, GFF, CSV, TSV, and SAM files in examples/sample-data/, and the tutorials reference these files where applicable. To verify your setup, run: bl run examples/quickstart.bl

Beginner

Beginner ~15 min

Hello Genomics

Your first BioLang program. Create DNA/RNA sequences, compute GC content, reverse complement, transcribe, and translate.
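
For readers who want a preview of the operations this tutorial covers, here is a minimal sketch in plain Python rather than BioLang. The helper names (gc_content, reverse_complement, transcribe) are hypothetical and chosen only for illustration; BioLang's own built-ins and literal syntax may differ, and the sketch assumes unambiguous uppercase or lowercase A/C/G/T input.

```python
# Plain-Python illustration only; these helpers are hypothetical, not BioLang built-ins.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def gc_content(seq: str) -> float:
    """Return the fraction of bases that are G or C (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse the strand."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def transcribe(seq: str) -> str:
    """Convert coding-strand DNA to mRNA by replacing T with U."""
    return seq.upper().replace("T", "U")


dna = "ATGCGTAA"
print(gc_content(dna))          # 0.375 (3 of 8 bases are G or C)
print(reverse_complement(dna))  # TTACGCAT
print(transcribe(dna))          # AUGCGUAA
```

Translation to protein is left out of the sketch because it needs a full codon table.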

Beginner ~25 min

FASTQ QC Pipeline

Build a quality-control pipeline for FASTQ files. Filter reads, compute statistics, count k-mers, and generate a report.

Beginner ~20 min

Working with Tables

Master BioLang's dataframe type. Read CSV, select, filter, group, summarize, join, pivot, and write results.

Beginner ~20 min

Literate Notebooks

Write reproducible analyses in .bln notebooks. Combine Markdown prose with code, use cell directives, export to HTML, and convert to/from Jupyter.

Intermediate

Intermediate ~30 min

Variant Analysis

Read a VCF file, filter variants by quality, annotate with gene information, and summarize variants per chromosome.

Intermediate ~35 min

RNA-seq Differential Expression

Load a count matrix, normalize, find differentially expressed genes, create volcano plots, and run pathway enrichment.

Intermediate ~30 min

Protein Analysis

Fetch sequences from UniProt, analyze properties, search for motifs, explore PDB structures, and visualize results.

Intermediate ~25 min

Streaming Large Files

Process multi-gigabyte files in constant memory using lazy streams, parallel mapping, and chunked output.
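
The idea of constant-memory processing can be sketched with a lazy generator. The example below is plain Python, not BioLang, and the function names (read_fastq, mean_read_length) are hypothetical. It assumes strict four-line FASTQ records and stops at the first blank header line; only one record is held in memory at a time.

```python
import io
from typing import Iterator, TextIO, Tuple


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Lazily yield (name, sequence, quality) from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline()
        if not header.strip():
            return  # end of file (or a blank line, which this sketch treats as the end)
        seq = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header.rstrip("\n")[1:], seq, qual


def mean_read_length(handle: TextIO) -> float:
    """Accumulate running totals so memory use does not grow with file size."""
    total = 0
    count = 0
    for _, seq, _ in read_fastq(handle):
        total += len(seq)
        count += 1
    return total / count if count else 0.0


# An in-memory stand-in for a large file; a real run would pass an open file handle.
demo = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nACGTAC\n+\nIIIIII\n")
print(mean_read_length(demo))  # 5.0
```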

Intermediate ~30 min

Querying Databases

Connect to NCBI, Ensembl, UniProt, and KEGG. Run cross-database queries to enrich your analysis.

Intermediate ~30 min

Statistical Analysis

Run t-tests, ANOVA, correlation, PCA, and clustering on experimental data with built-in stats functions.

Intermediate ~25 min

Visualization

Create Manhattan plots, volcano plots, heatmaps, and more. Render ASCII output for the terminal or SVG for publication-quality figures.

Intermediate ~30 min

Knowledge Graphs

Build protein interaction networks, find shortest paths, extract subgraphs, and combine with STRING API data.

Intermediate ~30 min

Enrichment Analysis

Run ORA and GSEA on gene lists. Load GMT files, apply BH correction, and combine results with knowledge graphs.
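
As background for the BH correction step, here is a minimal plain-Python sketch of the Benjamini-Hochberg adjustment. It is not BioLang code, and the function name bh_adjust is hypothetical. Each p-value is scaled by m/rank (m tests, rank in ascending order), and a running minimum from the largest p-value downward keeps the adjusted values monotone.

```python
from typing import List, Sequence


def bh_adjust(pvalues: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the original input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


pvals = [0.01, 0.04, 0.03, 0.005]
print([round(q, 4) for q in bh_adjust(pvals)])  # [0.02, 0.04, 0.04, 0.02]
```

Gene sets whose adjusted value falls below the chosen false discovery rate (0.05 is a common choice) would be reported as enriched.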

Intermediate ~20 min

LLM Chat

Use chat() and chat_code() with Anthropic, OpenAI, or Ollama. Pass data context, generate code, and interpret results with AI.

Advanced

Advanced ~40 min

Multi-species Comparative Genomics

Fetch orthologs, align sequences across species, build phylogenetic trees, and perform synteny analysis.

Advanced ~45 min

Building Custom Plugins

Write a Python plugin from scratch. Learn the plugin.json format, testing strategies, and how to distribute your work.

Suggested Learning Path

1. Hello Genomics: Get comfortable with DNA/RNA literals and core operations.
2. FASTQ QC Pipeline: Learn the pipe operator and build your first analysis pipeline.
3. Working with Tables: Master the dataframe for structured data analysis.
4. Variant Analysis + RNA-seq Differential Expression: Apply your skills to real-world genomics workflows.
5. Streaming Large Files + Querying Databases: Scale up to production-sized datasets.
6. Multi-species Comparative Genomics + Building Custom Plugins: Tackle advanced workflows and extend BioLang.