# Compiler Internals
BioLang uses a multi-stage execution pipeline. By default, code runs through a tree-walking interpreter for maximum flexibility and debuggability. For performance-critical workloads, BioLang includes a bytecode compiler and an experimental Cranelift JIT that can deliver significant speedups on numeric and k-mer-heavy computations.
## Execution Modes
| Mode | Flag | Description | Use Case |
|---|---|---|---|
| Interpreter | (default) | Tree-walking AST interpreter | Development, debugging, REPL |
| Bytecode | --compile | Compile to bytecode, then execute on VM | Production scripts, moderate speedup |
| JIT | --jit | Cranelift JIT compilation to native code | Hot loops, numeric-heavy workloads |
```bash
# Default: tree-walking interpreter
bl run analysis.bl

# Bytecode compilation
bl run --compile analysis.bl

# JIT compilation (experimental)
bl run --jit analysis.bl
```
## Compilation Pipeline
The compilation pipeline processes source code through several stages:
```text
Source (.bl)
     │
     ▼
┌──────────┐  Tokens   ┌───────────┐   AST    ┌──────────────┐
│  Lexer   │ ────────▶ │  Parser   │ ───────▶ │  Type Check  │
│ bl-lexer │           │ bl-parser │          │  (optional)  │
└──────────┘           └───────────┘          └──────────────┘
                             │
        ┌────────────────────┼────────────────────┐
        ▼                    ▼                    ▼
┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│ Interpreter  │     │   Bytecode   │     │  Cranelift   │
│  bl-runtime  │     │ bl-compiler  │     │    bl-jit    │
└──────────────┘     └──────────────┘     └──────────────┘
        │                    │                    │
        ▼                    ▼                    ▼
     Result              Bytecode            Native Code
                             │                    │
                             ▼                    ▼
                      ┌──────────────┐     ┌──────────────┐
                      │      VM      │     │   Execute    │
                      │ bl-compiler  │     │   directly   │
                      └──────────────┘     └──────────────┘
```
## Bytecode Compiler (bl-compiler)
The bytecode compiler translates the AST into a compact instruction set that runs on BioLang's stack-based virtual machine. This eliminates the overhead of AST traversal and typically provides a 2-5x speedup for most workloads:
```bash
# Compile and run
bl run --compile analysis.bl

# Compile to bytecode file (for inspection)
bl compile analysis.bl -o analysis.blc

# Inspect bytecode
bl compile --disassemble analysis.bl
```
### Bytecode Instruction Set
The VM uses a stack-based instruction set. Key instruction categories:
```bash
# Example disassembly
bl compile --disassemble example.bl
```

```text
0000 CONST        0             # Push constant "ATCGATCG"
0002 MAKE_DNA                   # Create DNA value
0003 STORE        'seq'         # Store in variable 'seq'
0005 LOAD         'seq'         # Push seq onto stack
0007 CALL_BUILTIN gc_content 1  # Call gc_content(seq)
0010 STORE        'gc'          # Store result
0012 LOAD         'gc'          # Push gc
0014 CALL_BUILTIN print 1       # Call print(gc)
```
| Category | Instructions | Description |
|---|---|---|
| Stack | CONST, POP, DUP, SWAP | Stack manipulation |
| Variables | LOAD, STORE, LOAD_GLOBAL | Variable access |
| Arithmetic | ADD, SUB, MUL, DIV, MOD, NEG | Numeric operations |
| Comparison | EQ, NE, LT, GT, LE, GE | Comparison operators |
| Logic | AND, OR, NOT | Boolean logic |
| Control | JMP, JMP_IF, JMP_IFNOT, LOOP | Control flow |
| Functions | CALL, CALL_BUILTIN, RETURN, CLOSURE | Function calls |
| Bio | MAKE_DNA, MAKE_RNA, MAKE_PROTEIN, PIPE | Bio type construction |
| Data | MAKE_LIST, MAKE_MAP, MAKE_TABLE, INDEX | Data structure ops |
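The instruction categories above all funnel through a dispatch loop. A minimal sketch of such a stack-machine loop in Rust (the language the toolchain is built in); the `Op` enum and operand encoding here are illustrative, not bl-compiler's actual representation:

```rust
// Toy stack-based VM loop in the spirit of BioLang's bytecode VM.
// Values are f64 for simplicity; the real VM handles bio types too.
#[derive(Clone, Copy)]
enum Op {
    Const(f64),  // CONST: push a constant
    Load(usize), // LOAD: push a local variable
    Store(usize),// STORE: pop into a local variable
    Add,         // ADD: pop two operands, push their sum
    Mul,         // MUL: pop two operands, push their product
    Return,      // RETURN: stop, result is top of stack
}

fn run(code: &[Op], locals: &mut Vec<f64>) -> f64 {
    let mut stack: Vec<f64> = Vec::new();
    let mut pc = 0;
    loop {
        match code[pc] {
            Op::Const(v) => stack.push(v),
            Op::Load(i) => stack.push(locals[i]),
            Op::Store(i) => {
                let v = stack.pop().unwrap();
                locals[i] = v;
            }
            Op::Add => {
                let (b, a) = (stack.pop().unwrap(), stack.pop().unwrap());
                stack.push(a + b);
            }
            Op::Mul => {
                let (b, a) = (stack.pop().unwrap(), stack.pop().unwrap());
                stack.push(a * b);
            }
            Op::Return => return stack.pop().unwrap(),
        }
        pc += 1;
    }
}

fn main() {
    // Compute (2 + 3) * 4 on the VM.
    let code = [
        Op::Const(2.0),
        Op::Const(3.0),
        Op::Add,
        Op::Const(4.0),
        Op::Mul,
        Op::Return,
    ];
    let result = run(&code, &mut vec![]);
    assert_eq!(result, 20.0);
    println!("{result}");
}
```

The win over tree-walking comes from replacing recursive AST traversal with this flat fetch-decode-execute loop over a linear instruction array.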
## Cranelift JIT (bl-jit)
The JIT compiler uses Cranelift to generate native machine code at runtime. It is
feature-gated behind --features jit at build time and activated with
the --jit flag at runtime. The JIT targets hot loops and numeric
functions, falling back to the interpreter for dynamic operations:
```bash
# Run with JIT enabled
bl run --jit kmer_analysis.bl

# JIT with verbose output (shows what gets compiled)
bl run --jit --verbose kmer_analysis.bl
```
### What Gets JIT-Compiled
The JIT identifies and compiles these patterns:
- Numeric loops and arithmetic expressions
- K-mer encoding and counting inner loops
- Sequence iteration and character comparisons
- GC content computation
- Edit distance and Hamming distance calculations
- Array/list map and filter with simple closures
Operations that remain interpreted (too dynamic for efficient JIT):
- API calls (network I/O)
- File I/O operations
- Table operations (dplyr-style verbs)
- String interpolation
- Dynamic dispatch through plugins
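To make the first category concrete, a GC-content computation is exactly the kind of branchy byte loop the JIT can lower to native code. A Rust sketch of roughly what the compiled kernel computes; the function name and signature are ours for illustration, not a BioLang API:

```rust
// Illustrative GC-content inner loop: once JIT-compiled, the per-base
// work is a compare-and-increment with no interpreter dispatch.
fn gc_content(seq: &[u8]) -> f64 {
    if seq.is_empty() {
        return 0.0; // avoid dividing by zero on an empty sequence
    }
    let mut gc = 0usize;
    for &b in seq {
        // Count G/C in either case; other bases (A, T, N, ...) are skipped.
        if matches!(b, b'G' | b'C' | b'g' | b'c') {
            gc += 1;
        }
    }
    gc as f64 / seq.len() as f64
}

fn main() {
    let frac = gc_content(b"ATCGATCG");
    assert!((frac - 0.5).abs() < 1e-9);
    println!("{frac}");
}
```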
## Optimization Levels
Both the bytecode compiler and JIT support optimization levels:
```bash
# No optimization (fastest compile, slowest run)
bl run --compile -O0 analysis.bl

# Default optimization
bl run --compile -O1 analysis.bl

# Aggressive optimization (slowest compile, fastest run)
bl run --compile -O2 analysis.bl
```
Optimization passes include:
- O0: No optimization. Direct AST-to-bytecode translation.
- O1: Constant folding, dead code elimination, common subexpression elimination.
- O2: All of O1 plus loop unrolling, function inlining, and escape analysis for heap allocation avoidance.
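As an illustration of the O1 constant-folding pass, here is a sketch over a toy expression AST in Rust; bl-compiler's real IR is richer, and the node names here are ours:

```rust
// Constant folding: recursively rewrite Add/Mul nodes whose children
// are both literals into a single literal, leaving everything else alone.
#[derive(Debug, PartialEq)]
enum Expr {
    Num(f64),
    Var(String),
    Add(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
}

fn fold(e: Expr) -> Expr {
    match e {
        Expr::Add(a, b) => match (fold(*a), fold(*b)) {
            (Expr::Num(x), Expr::Num(y)) => Expr::Num(x + y),
            (a, b) => Expr::Add(Box::new(a), Box::new(b)),
        },
        Expr::Mul(a, b) => match (fold(*a), fold(*b)) {
            (Expr::Num(x), Expr::Num(y)) => Expr::Num(x * y),
            (a, b) => Expr::Mul(Box::new(a), Box::new(b)),
        },
        other => other, // Num and Var are already as simple as possible
    }
}

fn main() {
    // (2 + 3) * k folds the left subtree to 5 at compile time.
    let e = Expr::Mul(
        Box::new(Expr::Add(Box::new(Expr::Num(2.0)), Box::new(Expr::Num(3.0)))),
        Box::new(Expr::Var("k".into())),
    );
    let folded = fold(e);
    assert_eq!(
        folded,
        Expr::Mul(Box::new(Expr::Num(5.0)), Box::new(Expr::Var("k".into())))
    );
    println!("{folded:?}");
}
```

The same bottom-up rewrite shape generalizes to the other O1 passes: dead code elimination and common subexpression elimination are also tree (or basic-block) transformations that run before bytecode emission.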
## Benchmarks
Performance comparison across execution modes on representative bioinformatics workloads (lower is better):
| Benchmark | Interpreter | Bytecode -O1 | JIT -O2 | Speedup (JIT) |
|---|---|---|---|---|
| GC content (1M bases) | 45ms | 18ms | 3ms | 15x |
| K-mer count (21-mer, 1M bases) | 320ms | 120ms | 28ms | 11x |
| Edit distance (10K pairs) | 890ms | 340ms | 95ms | 9.4x |
| FASTQ filter (1M reads) | 2.1s | 1.3s | 0.8s | 2.6x |
| Table groupby+summarize (100K rows) | 180ms | 95ms | 90ms | 2.0x |
| API calls (50 NCBI queries) | 12.5s | 12.4s | 12.4s | 1.0x (I/O bound) |
Key takeaway: the JIT provides the largest speedups for CPU-bound numerical operations (k-mer counting, distance computation). For I/O-bound workloads (file parsing, API calls), the execution mode makes little difference since the bottleneck is not computation.
## Building with JIT Support
The JIT is an optional feature that requires Cranelift at build time:
```bash
# Build with JIT support
cargo build --release --features jit

# Build without JIT (default, smaller binary)
cargo build --release

# Check if your bl binary has JIT support
bl --version
# BioLang v0.1.0 (with JIT)
```