Compiler Internals

BioLang uses a multi-stage execution pipeline. By default, code runs through a tree-walking interpreter, which is the most flexible mode and the easiest to debug. For performance-critical workloads, BioLang also includes a bytecode compiler and an experimental Cranelift JIT. In the benchmarks below, the JIT runs numeric and k-mer-heavy computations roughly 9-15x faster than the interpreter.

Execution Modes

Mode         Flag        Description                                Use Case
Interpreter  (default)   Tree-walking AST interpreter               Development, debugging, REPL
Bytecode     --compile   Compile to bytecode, then execute on VM    Production scripts, moderate speedup
JIT          --jit       Cranelift JIT compilation to native code   Hot loops, numeric-heavy workloads

# Default: tree-walking interpreter
bl run analysis.bl

# Bytecode compilation
bl run --compile analysis.bl

# JIT compilation (experimental)
bl run --jit analysis.bl

Compilation Pipeline

Source code is lexed into tokens, parsed into an AST, optionally type-checked, and then handed to one of three backends:

Source (.bl)
  │
  ▼
┌──────────┐   Tokens    ┌──────────┐    AST     ┌──────────────┐
│  Lexer   │ ──────────▶ │  Parser  │ ─────────▶ │  Type Check  │
│ bl-lexer │             │ bl-parser│            │  (optional)  │
└──────────┘             └──────────┘            └──────────────┘
                                                       │
                                    ┌──────────────────┼──────────────────┐
                                    ▼                  ▼                  ▼
                             ┌────────────┐    ┌────────────┐    ┌────────────┐
                             │ Interpreter│    │  Bytecode  │    │  Cranelift │
                             │ bl-runtime │    │ bl-compiler│    │   bl-jit   │
                             └────────────┘    └────────────┘    └────────────┘
                                    │                  │                  │
                                    ▼                  ▼                  ▼
                                 Result            Bytecode          Native Code
                                                      │                  │
                                                      ▼                  ▼
                                                ┌─────────────┐    ┌────────────┐
                                                │     VM      │    │  Execute   │
                                                │ bl-compiler │    │  directly  │
                                                └─────────────┘    └────────────┘

Bytecode Compiler (bl-compiler)

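The bytecode compiler translates the AST into a compact instruction set that runs on BioLang's stack-based virtual machine. This removes the overhead of walking the AST at run time. In the benchmarks below, bytecode at -O1 runs about 2.5x faster than the interpreter on compute-bound workloads and less on I/O-heavy ones. To use it: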

# Compile and run
bl run --compile analysis.bl

# Compile to bytecode file (for inspection)
bl compile analysis.bl -o analysis.blc

# Inspect bytecode
bl compile --disassemble analysis.bl
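
To make the AST-to-bytecode step concrete, here is a minimal Python sketch that compiles an arithmetic expression tree into stack instructions by post-order traversal. It is an illustrative model, not bl-compiler's code: the tuple-based AST, the `compile_expr` helper, and the tuple instruction format are assumptions, and constants are stored inline rather than as constant-pool indices. Only the opcode names (CONST, LOAD, ADD, SUB, MUL, DIV) come from BioLang's instruction set.

```python
# Illustrative model only: the AST shape and function name are hypothetical,
# not taken from bl-compiler.
BINARY_OPS = {"+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV"}


def compile_expr(node, code=None):
    """Emit stack-machine instructions for an expression tree in post-order."""
    if code is None:
        code = []
    kind = node[0]
    if kind == "num":                    # ("num", 3): push a constant
        code.append(("CONST", node[1]))
    elif kind == "var":                  # ("var", "x"): push a variable's value
        code.append(("LOAD", node[1]))
    elif kind == "binop":                # ("binop", "+", left, right)
        _, op, left, right = node
        compile_expr(left, code)         # operands are emitted first,
        compile_expr(right, code)
        code.append((BINARY_OPS[op],))   # then the operator that consumes them
    else:
        raise ValueError(f"unknown node kind: {kind}")
    return code


# (x + 2) * 3
ast = ("binop", "*", ("binop", "+", ("var", "x"), ("num", 2)), ("num", 3))
program = compile_expr(ast)
for instr in program:
    print(*instr)
```

Because operands are emitted before their operator, the VM never needs to look at the tree again: each instruction finds its inputs on top of the stack.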

Bytecode Instruction Set

The VM uses a stack-based instruction set. The disassembly below shows a short example, and the table after it lists the key instruction categories:

# Example disassembly
bl compile --disassemble example.bl

0000  CONST       0        # Push constant "ATCGATCG"
0002  MAKE_DNA              # Create DNA value
0003  STORE       'seq'     # Store in variable 'seq'
0005  LOAD        'seq'     # Push seq onto stack
0007  CALL_BUILTIN gc_content 0  # Call gc_content()
0010  STORE       'gc'      # Store result
0012  LOAD        'gc'      # Push gc
0014  CALL_BUILTIN print 1  # Call print(gc)

Category    Instructions                             Description
Stack       CONST, POP, DUP, SWAP                    Stack manipulation
Variables   LOAD, STORE, LOAD_GLOBAL                 Variable access
Arithmetic  ADD, SUB, MUL, DIV, MOD, NEG             Numeric operations
Comparison  EQ, NE, LT, GT, LE, GE                   Comparison operators
Logic       AND, OR, NOT                             Boolean logic
Control     JMP, JMP_IF, JMP_IFNOT, LOOP             Control flow
Functions   CALL, CALL_BUILTIN, RETURN, CLOSURE      Function calls
Bio         MAKE_DNA, MAKE_RNA, MAKE_PROTEIN, PIPE   Bio type construction
Data        MAKE_LIST, MAKE_MAP, MAKE_TABLE, INDEX   Data structure ops
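
As a rough picture of how a stack-based VM executes such instructions, the Python sketch below runs a small loop that uses a handful of the opcodes listed above. It is a toy model under stated assumptions, not BioLang's VM: the tuple instruction encoding, the `run` function, and the convention that jump operands are absolute instruction indices are all hypothetical.

```python
import operator

# Illustrative model only: instruction encoding and VM layout are hypothetical.
BINARY = {
    "ADD": operator.add,
    "SUB": operator.sub,
    "MUL": operator.mul,
    "LT": operator.lt,
}


def run(program, variables=None):
    """Execute a list of (opcode, *operands) tuples on a value stack."""
    stack = []
    env = dict(variables or {})
    pc = 0  # program counter: index of the next instruction
    while pc < len(program):
        op, *args = program[pc]
        pc += 1
        if op == "CONST":
            stack.append(args[0])
        elif op == "LOAD":
            stack.append(env[args[0]])
        elif op == "STORE":
            env[args[0]] = stack.pop()
        elif op in BINARY:
            right = stack.pop()          # right operand was pushed last
            left = stack.pop()
            stack.append(BINARY[op](left, right))
        elif op == "JMP":
            pc = args[0]                 # assumed: absolute instruction index
        elif op == "JMP_IFNOT":
            if not stack.pop():
                pc = args[0]
        else:
            raise ValueError(f"unknown opcode: {op}")
    return env


# total = 0; i = 0; while i < 5: total = total + i; i = i + 1
program = [
    ("CONST", 0), ("STORE", "total"),                                # 0-1
    ("CONST", 0), ("STORE", "i"),                                    # 2-3
    ("LOAD", "i"), ("CONST", 5), ("LT",),                            # 4-6
    ("JMP_IFNOT", 17),                                               # 7: leave the loop
    ("LOAD", "total"), ("LOAD", "i"), ("ADD",), ("STORE", "total"),  # 8-11
    ("LOAD", "i"), ("CONST", 1), ("ADD",), ("STORE", "i"),           # 12-15
    ("JMP", 4),                                                      # 16: back to the test
]
result = run(program)
print(result["total"])
```

The dispatch loop does one table lookup per instruction instead of a recursive visit per AST node, which is where a bytecode VM typically gains over a tree-walking interpreter.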

Cranelift JIT (bl-jit)

The JIT compiler uses Cranelift to generate native machine code at runtime. It is feature-gated behind --features jit at build time and activated with the --jit flag at runtime. The JIT targets hot loops and numeric functions, falling back to the interpreter for dynamic operations:

# Run with JIT enabled
bl run --jit kmer_analysis.bl

# JIT with verbose output (shows what gets compiled)
bl run --jit --verbose kmer_analysis.bl

What Gets JIT-Compiled

The JIT identifies and compiles these patterns:

  • Numeric loops and arithmetic expressions
  • K-mer encoding and counting inner loops
  • Sequence iteration and character comparisons
  • GC content computation
  • Edit distance and Hamming distance calculations
  • Array/list map and filter with simple closures

Operations that remain interpreted (too dynamic for efficient JIT):

  • API calls (network I/O)
  • File I/O operations
  • Table operations (dplyr-style verbs)
  • String interpolation
  • Dynamic dispatch through plugins
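
To illustrate why k-mer inner loops are good JIT candidates, the Python sketch below counts k-mers with a rolling 2-bit integer encoding. The loop body is pure integer shifts, masks, and a counter update, with no I/O or dynamic dispatch. The 2-bit encoding is a common technique and is assumed here; the source does not say which encoding BioLang uses, and `count_kmers` and `decode` are hypothetical names.

```python
from collections import Counter

# Common 2-bit nucleotide encoding; assumed here, not confirmed for BioLang.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def count_kmers(seq, k):
    """Count k-mers using a rolling 2-bit integer encoding."""
    mask = (1 << (2 * k)) - 1    # keeps only the low 2*k bits
    counts = Counter()
    value = 0
    valid = 0                    # consecutive unambiguous bases seen so far
    for base in seq:
        code = CODE.get(base)
        if code is None:         # ambiguous base (e.g. N): restart the window
            value, valid = 0, 0
            continue
        value = ((value << 2) | code) & mask   # shift in the new base
        valid += 1
        if valid >= k:
            counts[value] += 1
    return counts


def decode(value, k):
    """Turn an encoded k-mer back into its string form."""
    return "".join("ACGT"[(value >> (2 * (k - 1 - i))) & 3] for i in range(k))


counts = count_kmers("ATCGATCG", 3)
for value, n in sorted(counts.items()):
    print(decode(value, 3), n)
```

Each step updates the encoded k-mer in constant time instead of re-slicing the sequence, so the hot path is a short sequence of integer operations that maps directly to native instructions.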

Optimization Levels

Both the bytecode compiler and JIT support optimization levels:

# No optimization (fastest compile, slowest run)
bl run --compile -O0 analysis.bl

# Default optimization
bl run --compile -O1 analysis.bl

# Aggressive optimization (slowest compile, fastest run)
bl run --compile -O2 analysis.bl

Each level applies the following passes:

  • O0: No optimization. Direct AST-to-bytecode translation.
  • O1: Constant folding, dead code elimination, common subexpression elimination.
  • O2: All of O1 plus loop unrolling, function inlining, and escape analysis for heap allocation avoidance.
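
As an example of what an O1 pass does, the Python sketch below implements constant folding over a small expression tree. It shows the general technique only: the tuple AST and the `fold_constants` helper are hypothetical and do not reflect how bl-compiler represents or optimizes code. Division is deliberately left unfolded so the sketch never divides by zero at compile time.

```python
import operator

# Hypothetical tuple AST: ("num", v), ("var", name), ("binop", op, left, right)
FOLDABLE = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def fold_constants(node):
    """Return an equivalent tree with constant subexpressions pre-computed."""
    if node[0] != "binop":
        return node                      # leaves are already as simple as possible
    _, op, left, right = node
    left = fold_constants(left)          # fold children first (bottom-up)
    right = fold_constants(right)
    if left[0] == "num" and right[0] == "num" and op in FOLDABLE:
        return ("num", FOLDABLE[op](left[1], right[1]))
    return ("binop", op, left, right)


# x * (60 * 60) becomes x * 3600
tree = ("binop", "*", ("var", "x"), ("binop", "*", ("num", 60), ("num", 60)))
folded = fold_constants(tree)
print(folded)
```

After folding, the multiplication 60 * 60 is done once at compile time rather than on every evaluation, which matters most when the expression sits inside a loop.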

Benchmarks

Performance comparison across execution modes on representative bioinformatics workloads (lower is better):

Benchmark                            Interpreter  Bytecode -O1  JIT -O2  Speedup (JIT vs. interpreter)
GC content (1M bases)                45ms         18ms          3ms      15x
K-mer count (21-mer, 1M bases)       320ms        120ms         28ms     11x
Edit distance (10K pairs)            890ms        340ms         95ms     9.4x
FASTQ filter (1M reads)              2.1s         1.3s          0.8s     2.6x
Table groupby+summarize (100K rows)  180ms        95ms          90ms     2.0x
API calls (50 NCBI queries)          12.5s        12.4s         12.4s    1.0x (I/O bound)

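Key takeaway: the JIT provides the largest speedups (roughly 9-15x) for CPU-bound numerical operations such as GC content, k-mer counting, and distance computation. Gains shrink as I/O and dynamic operations take over: FASTQ filtering improves by 2.6x, table operations gain almost nothing from the JIT over bytecode (90ms vs. 95ms) because they remain interpreted, and API calls see no improvement because the bottleneck is network latency, not computation.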
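
The speedup column is the interpreter time divided by the JIT time. The short Python sketch below recomputes it from the benchmark table and derives the bytecode speedup the same way. Timings are copied from the table and converted to milliseconds; the variable names are illustrative.

```python
# Timings in milliseconds, copied from the benchmark table:
# (interpreter, bytecode -O1, JIT -O2)
timings = {
    "GC content": (45, 18, 3),
    "K-mer count": (320, 120, 28),
    "Edit distance": (890, 340, 95),
    "FASTQ filter": (2100, 1300, 800),
    "Table groupby+summarize": (180, 95, 90),
    "API calls": (12500, 12400, 12400),
}

speedups = {}
for name, (interp, bytecode, jit) in timings.items():
    # Speedup = baseline time / optimized time, so higher is better.
    speedups[name] = (interp / bytecode, interp / jit)
    print(f"{name:<26} bytecode {interp / bytecode:4.1f}x   JIT {interp / jit:4.1f}x")
```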

Building with JIT Support

The JIT is an optional feature that requires Cranelift at build time:

# Build with JIT support
cargo build --release --features jit

# Build without JIT (default, smaller binary)
cargo build --release

# Check if your bl binary has JIT support
bl --version
# BioLang v0.1.0 (with JIT)