# Compiler Internals
BioLang uses a multi-stage execution pipeline. By default, code runs through a tree-walking interpreter for maximum flexibility and debuggability. For performance-critical workloads, BioLang includes a bytecode compiler and an experimental Cranelift JIT that can deliver significant speedups on numeric and k-mer-heavy computations.
## Execution Modes
| Mode | Flag | Description | Use Case |
|---|---|---|---|
| Interpreter | (default) | Tree-walking AST interpreter | Development, debugging, REPL |
| Bytecode | --compile | Compile to bytecode, then execute on VM | Production scripts, moderate speedup |
| JIT | --jit | Cranelift JIT compilation to native code | Hot loops, numeric-heavy workloads |
```bash
# Default: tree-walking interpreter
bl run analysis.bl

# Bytecode compilation
bl run --compile analysis.bl

# JIT compilation (experimental)
bl run --jit analysis.bl
```
## Compilation Pipeline
The compilation pipeline processes source code through several stages:
```text
Source (.bl)
     │
     ▼
┌──────────┐  Tokens   ┌───────────┐   AST    ┌──────────────┐
│  Lexer   │ ────────▶ │  Parser   │ ───────▶ │  Type Check  │
│ bl-lexer │           │ bl-parser │          │  (optional)  │
└──────────┘           └───────────┘          └──────────────┘
                             │
        ┌────────────────────┼────────────────────┐
        ▼                    ▼                    ▼
┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│ Interpreter  │     │   Bytecode   │     │  Cranelift   │
│  bl-runtime  │     │ bl-compiler  │     │    bl-jit    │
└──────────────┘     └──────────────┘     └──────────────┘
        │                    │                    │
        ▼                    ▼                    ▼
     Result              Bytecode            Native Code
                             │                    │
                             ▼                    ▼
                      ┌──────────────┐     ┌──────────────┐
                      │      VM      │     │   Execute    │
                      │ bl-compiler  │     │   directly   │
                      └──────────────┘     └──────────────┘
```
## Bytecode Compiler (bl-compiler)
The bytecode compiler translates the AST into a compact instruction set that runs on BioLang's stack-based virtual machine. This eliminates the overhead of AST traversal and typically provides a 2-5x speedup for most workloads:
```bash
# Compile and run
bl run --compile analysis.bl

# Compile to bytecode file (for inspection)
bl compile analysis.bl -o analysis.blc

# Inspect bytecode
bl compile --disassemble analysis.bl
```
### Bytecode Instruction Set
The VM uses a stack-based instruction set. Key instruction categories:
```bash
# Example disassembly
bl compile --disassemble example.bl
```

```text
0000 CONST        0             # Push constant "ATCGATCG"
0002 MAKE_DNA                   # Create DNA value
0003 STORE        'seq'         # Store in variable 'seq'
0005 LOAD         'seq'         # Push seq onto stack
0007 CALL_BUILTIN gc_content 1  # Call gc_content(seq)
0010 STORE        'gc'          # Store result
0012 LOAD         'gc'          # Push gc
0014 CALL_BUILTIN print 1       # Call print(gc)
```
| Category | Instructions | Description |
|---|---|---|
| Stack | CONST, POP, DUP, SWAP | Stack manipulation |
| Variables | LOAD, STORE, LOAD_GLOBAL | Variable access |
| Arithmetic | ADD, SUB, MUL, DIV, MOD, NEG | Numeric operations |
| Comparison | EQ, NE, LT, GT, LE, GE | Comparison operators |
| Logic | AND, OR, NOT | Boolean logic |
| Control | JMP, JMP_IF, JMP_IFNOT, LOOP | Control flow |
| Functions | CALL, CALL_BUILTIN, RETURN, CLOSURE | Function calls |
| Bio | MAKE_DNA, MAKE_RNA, MAKE_PROTEIN, PIPE | Bio type construction |
| Data | MAKE_LIST, MAKE_MAP, MAKE_TABLE, INDEX | Data structure ops |
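The instruction categories above all funnel through a dispatch loop. A minimal sketch of such a stack-machine loop in Rust (the language the toolchain is built in); the `Op` enum and operand encoding here are illustrative, not bl-compiler's actual representation:

```rust
// Toy stack-based VM loop in the spirit of BioLang's bytecode VM.
// Values are f64 for simplicity; the real VM handles bio types too.
#[derive(Clone, Copy)]
enum Op {
    Const(f64),  // CONST: push a constant
    Load(usize), // LOAD: push a local variable
    Store(usize),// STORE: pop into a local variable
    Add,         // ADD: pop two operands, push their sum
    Mul,         // MUL: pop two operands, push their product
    Return,      // RETURN: stop, result is top of stack
}

fn run(code: &[Op], locals: &mut Vec<f64>) -> f64 {
    let mut stack: Vec<f64> = Vec::new();
    let mut pc = 0;
    loop {
        match code[pc] {
            Op::Const(v) => stack.push(v),
            Op::Load(i) => stack.push(locals[i]),
            Op::Store(i) => {
                let v = stack.pop().unwrap();
                locals[i] = v;
            }
            Op::Add => {
                let (b, a) = (stack.pop().unwrap(), stack.pop().unwrap());
                stack.push(a + b);
            }
            Op::Mul => {
                let (b, a) = (stack.pop().unwrap(), stack.pop().unwrap());
                stack.push(a * b);
            }
            Op::Return => return stack.pop().unwrap(),
        }
        pc += 1;
    }
}

fn main() {
    // Compute (2 + 3) * 4 on the VM.
    let code = [
        Op::Const(2.0),
        Op::Const(3.0),
        Op::Add,
        Op::Const(4.0),
        Op::Mul,
        Op::Return,
    ];
    let result = run(&code, &mut vec![]);
    assert_eq!(result, 20.0);
    println!("{result}");
}
```

The win over tree-walking comes from replacing recursive AST traversal with this flat fetch-decode-execute loop over a linear instruction array.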
## Cranelift JIT (bl-jit)
The JIT compiler uses Cranelift to generate native machine code at runtime. It is
feature-gated behind --features jit at build time and activated with
the --jit flag at runtime. The JIT targets hot loops and numeric
functions, falling back to the interpreter for dynamic operations:
```bash
# Run with JIT enabled
bl run --jit kmer_analysis.bl

# JIT with verbose output (shows what gets compiled)
bl run --jit --verbose kmer_analysis.bl
```
### What Gets JIT-Compiled
The JIT identifies and compiles these patterns:
- Numeric loops and arithmetic expressions
- K-mer encoding and counting inner loops
- Sequence iteration and character comparisons
- GC content computation
- Edit distance and Hamming distance calculations
- Array/list map and filter with simple closures
Operations that remain interpreted (too dynamic for efficient JIT):
- API calls (network I/O)
- File I/O operations
- Table operations (dplyr-style verbs)
- String interpolation
- Dynamic dispatch through plugins
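To make the first category concrete, a GC-content computation is exactly the kind of branchy byte loop the JIT can lower to native code. A Rust sketch of roughly what the compiled kernel computes; the function name and signature are ours for illustration, not a BioLang API:

```rust
// Illustrative GC-content inner loop: once JIT-compiled, the per-base
// work is a compare-and-increment with no interpreter dispatch.
fn gc_content(seq: &[u8]) -> f64 {
    if seq.is_empty() {
        return 0.0; // avoid dividing by zero on an empty sequence
    }
    let mut gc = 0usize;
    for &b in seq {
        // Count G/C in either case; other bases (A, T, N, ...) are skipped.
        if matches!(b, b'G' | b'C' | b'g' | b'c') {
            gc += 1;
        }
    }
    gc as f64 / seq.len() as f64
}

fn main() {
    let frac = gc_content(b"ATCGATCG");
    assert!((frac - 0.5).abs() < 1e-9);
    println!("{frac}");
}
```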
## Optimization Levels
Both the bytecode compiler and JIT support optimization levels:
```bash
# No optimization (fastest compile, slowest run)
bl run --compile -O0 analysis.bl

# Default optimization
bl run --compile -O1 analysis.bl

# Aggressive optimization (slowest compile, fastest run)
bl run --compile -O2 analysis.bl
```
Optimization passes include:
- O0: No optimization. Direct AST-to-bytecode translation.
- O1: Constant folding, dead code elimination, common subexpression elimination.
- O2: All of O1 plus loop unrolling, function inlining, and escape analysis for heap allocation avoidance.
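As an illustration of the O1 constant-folding pass, here is a sketch over a toy expression AST in Rust; bl-compiler's real IR is richer, and the node names here are ours:

```rust
// Constant folding: recursively rewrite Add/Mul nodes whose children
// are both literals into a single literal, leaving everything else alone.
#[derive(Debug, PartialEq)]
enum Expr {
    Num(f64),
    Var(String),
    Add(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
}

fn fold(e: Expr) -> Expr {
    match e {
        Expr::Add(a, b) => match (fold(*a), fold(*b)) {
            (Expr::Num(x), Expr::Num(y)) => Expr::Num(x + y),
            (a, b) => Expr::Add(Box::new(a), Box::new(b)),
        },
        Expr::Mul(a, b) => match (fold(*a), fold(*b)) {
            (Expr::Num(x), Expr::Num(y)) => Expr::Num(x * y),
            (a, b) => Expr::Mul(Box::new(a), Box::new(b)),
        },
        other => other, // Num and Var are already as simple as possible
    }
}

fn main() {
    // (2 + 3) * k folds the left subtree to 5 at compile time.
    let e = Expr::Mul(
        Box::new(Expr::Add(Box::new(Expr::Num(2.0)), Box::new(Expr::Num(3.0)))),
        Box::new(Expr::Var("k".into())),
    );
    let folded = fold(e);
    assert_eq!(
        folded,
        Expr::Mul(Box::new(Expr::Num(5.0)), Box::new(Expr::Var("k".into())))
    );
    println!("{folded:?}");
}
```

The same bottom-up rewrite shape generalizes to the other O1 passes: dead code elimination and common subexpression elimination are also tree (or basic-block) transformations that run before bytecode emission.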
## Benchmarks
Performance comparison across execution modes on representative bioinformatics workloads (lower is better):
| Benchmark | Interpreter | Bytecode -O1 | JIT -O2 | Speedup (JIT) |
|---|---|---|---|---|
| GC content (1M bases) | 45ms | 18ms | 3ms | 15x |
| K-mer count (21-mer, 1M bases) | 320ms | 120ms | 28ms | 11x |
| Edit distance (10K pairs) | 890ms | 340ms | 95ms | 9.4x |
| FASTQ filter (1M reads) | 2.1s | 1.3s | 0.8s | 2.6x |
| Table groupby+summarize (100K rows) | 180ms | 95ms | 90ms | 2.0x |
| API calls (50 NCBI queries) | 12.5s | 12.4s | 12.4s | 1.0x (I/O bound) |
Key takeaway: the JIT provides the largest speedups for CPU-bound numerical operations (k-mer counting, distance computation). For I/O-bound workloads (file parsing, API calls), the execution mode makes little difference since the bottleneck is not computation.
## Building with JIT Support
The JIT is an optional feature that requires Cranelift at build time:
```bash
# Build with JIT support
cargo build --release --features jit

# Build without JIT (default, smaller binary)
cargo build --release

# Check if your bl binary has JIT support
bl --version
# BioLang v0.1.0 (with JIT)
```