Troubleshooting
Common issues and their solutions. If you don't find your answer here, please open an issue.
Installation
bl: command not found
The BioLang binary is not on your PATH. After installing, ensure the install
directory is in your shell's path:
# Check where bl was installed
which bl # macOS/Linux
where.exe bl # Windows (cmd.exe or PowerShell; in PowerShell, plain "where" is an alias for Where-Object)
# If installed via cargo:
export PATH="$HOME/.cargo/bin:$PATH" # add to ~/.bashrc or ~/.zshrc
# If installed to a custom location:
export PATH="/path/to/biolang/bin:$PATH"
After editing your shell config, restart the terminal or run source ~/.bashrc (or source ~/.zshrc for zsh).
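The same lookup can be scripted, for example in a setup check. This is a minimal Python sketch, not part of BioLang; find_executable is a hypothetical helper, and the only assumption carried over from above is that the binary is named bl.

```python
import os
import shutil


def find_executable(name, extra_dirs=()):
    """Return the full path of `name` if it is on PATH or in extra_dirs, else None."""
    found = shutil.which(name)
    if found:
        return found
    # Fall back to directories that may not be on PATH yet.
    for directory in extra_dirs:
        candidate = shutil.which(name, path=os.path.expanduser(directory))
        if candidate:
            return candidate
    return None


if __name__ == "__main__":
    # ~/.cargo/bin is the default target of `cargo install`.
    location = find_executable("bl", extra_dirs=["~/.cargo/bin"])
    print(location or "bl not found; add its install directory to PATH")
```

If the helper only finds bl through extra_dirs, the binary is installed but its directory is still missing from PATH.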
Build from source fails with rustc version error
BioLang requires Rust 1.75 or later. Check your version and update:
rustc --version
rustup update stable
Build fails with missing system libraries
On Linux and macOS, you may need pkg-config and the OpenSSL development headers:
# Debian/Ubuntu
sudo apt install pkg-config libssl-dev
# Fedora/RHEL
sudo dnf install openssl-devel pkg-config
# macOS (via Homebrew)
brew install openssl pkg-config
REPL Issues
REPL won't start / crashes on launch
If the REPL crashes immediately, try running with verbose output to see the error:
bl repl --verbose
On Windows, if you see errors about terminal capabilities, ensure you are using
Windows Terminal or PowerShell 7+, not the legacy cmd.exe console.
Arrow keys print escape codes instead of navigating
This happens in terminals that don't support ANSI escape sequences. Use a modern terminal emulator (Windows Terminal, iTerm2, Alacritty, kitty).
REPL history not persisting
History is saved to ~/.biolang/history. Ensure the directory exists and
is writable:
mkdir -p ~/.biolang
ls -la ~/.biolang/
File I/O
File not found when the file exists
BioLang resolves paths relative to the current working directory, not the script
location. Either use absolute paths or cd to the data directory before running:
# Relative path — depends on where you run bl from
let reads = read_fastq("data/sample.fastq")
# Absolute path — always works
let reads = read_fastq("/home/user/data/sample.fastq")
Reading large files runs out of memory
Use streaming instead of loading the entire file into memory:
# Bad: loads entire file into a list
let all = read_fastq("big.fastq")
# Good: stream and filter (fastq() returns a stream, read_fastq() a list)
let filtered = fastq("big.fastq") |> filter_reads(30)
Streams process one record at a time, so memory usage stays constant regardless of
file size. Avoid collect() on large datasets unless you need random access.
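To illustrate why streaming keeps memory flat, here is a minimal Python sketch (not BioLang's implementation). It assumes four-line FASTQ records, non-empty reads, and Phred+33 qualities; stream_fastq, mean_phred, and filter_by_quality are hypothetical Python stand-ins used only for this illustration.

```python
import io


def stream_fastq(handle):
    """Yield (name, sequence, quality) one record at a time from a FASTQ handle."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        quality = handle.readline().rstrip("\n")
        yield header.rstrip("\n")[1:], sequence, quality


def mean_phred(quality, offset=33):
    """Mean Phred score of a non-empty quality string (Phred+33 by default)."""
    return sum(ord(c) - offset for c in quality) / len(quality)


def filter_by_quality(records, min_mean):
    """Lazily keep records whose mean quality is at least min_mean."""
    return (r for r in records if mean_phred(r[2]) >= min_mean)


if __name__ == "__main__":
    demo = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nACGT\n+\n!!!!\n")
    # Only one record is held in memory at any time.
    for name, _seq, _qual in filter_by_quality(stream_fastq(demo), 30):
        print(name)
```

Wrapping the generator in list(...) would be the equivalent of collect(): every record would then be resident at once.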
Gzipped file not detected
BioLang auto-detects .gz extensions. If your file is gzipped but has a
non-standard extension, use the explicit decompression:
# Auto-detected
let reads = read_fastq("reads.fastq.gz")
# Manual decompression for non-standard names
let reads = read_fastq("reads.fq.compressed")
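When the extension is unreliable, the file content can be checked directly: every gzip stream starts with the two bytes 0x1f 0x8b. The following Python sketch shows that check; is_gzipped and open_maybe_gzipped are hypothetical helpers and say nothing about how BioLang performs its own detection.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"


def is_gzipped(path):
    """Return True if the file starts with the two-byte gzip magic number."""
    with open(path, "rb") as handle:
        return handle.read(2) == GZIP_MAGIC


def open_maybe_gzipped(path):
    """Open a text file, transparently decompressing it if it is gzipped."""
    if is_gzipped(path):
        return gzip.open(path, "rt")
    return open(path, "r")
```

If is_gzipped() reports True for a file with an unusual name, renaming it (or symlinking it) to end in .gz lets extension-based detection work.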
API Clients
NCBI queries are slow or return 429 Too Many Requests
Without an API key, NCBI limits you to 3 requests per second. Set an API key to raise the limit to 10 requests per second:
# Get a free key at https://www.ncbi.nlm.nih.gov/account/settings/
export NCBI_API_KEY="your-key-here"
Add this to your ~/.bashrc or ~/.zshrc to persist it.
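If you call NCBI from your own helper scripts, spacing requests on the client side avoids 429 responses in the first place. A minimal Python sketch under the limits stated above; Throttle and ncbi_rate are hypothetical names, and the clock and sleep parameters exist only to make the class testable.

```python
import time


class Throttle:
    """Space calls so that at most `rate` of them start per second."""

    def __init__(self, rate, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / rate
        self.clock = clock
        self.sleep = sleep
        self.next_allowed = 0.0

    def wait(self):
        """Block until the next call is allowed, then reserve the slot."""
        now = self.clock()
        if now < self.next_allowed:
            self.sleep(self.next_allowed - now)
            now = self.next_allowed
        self.next_allowed = now + self.min_interval


def ncbi_rate(api_key):
    """Requests per second NCBI permits: 10 with an API key, 3 without."""
    return 10 if api_key else 3
```

Create one Throttle(ncbi_rate(os.environ.get("NCBI_API_KEY"))) per process and call wait() immediately before each request.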
COSMIC queries fail with unauthorized
COSMIC requires a separate API key. Register at COSMIC and set:
export COSMIC_API_KEY="your-cosmic-key"
Ensembl / UniProt queries time out
Public APIs can be slow during peak hours. BioLang uses a default 30-second timeout. If you're behind a corporate or institutional proxy, ensure the proxy environment variables are set correctly.
HTTP proxy configuration (with authentication)
University, hospital, and corporate networks often require an authenticated proxy. All BioLang API clients (NCBI, Ensembl, UniProt, UCSC, etc.) respect standard proxy environment variables:
# Basic proxy (no auth)
export HTTP_PROXY="http://proxy.example.com:8080"
export HTTPS_PROXY="http://proxy.example.com:8080"
# Authenticated proxy (username:password)
export HTTP_PROXY="http://username:password@proxy.example.com:8080"
export HTTPS_PROXY="http://username:password@proxy.example.com:8080"
# Skip proxy for local addresses (system convention, used by curl/wget/etc.)
export NO_PROXY="localhost,127.0.0.1,.internal.university.edu"
Add these to your ~/.bashrc, ~/.zshrc, or
~/.profile to persist across sessions. On HPC clusters, you may also
need to include them in your SLURM job scripts.
Common proxy pitfalls:
- HTTPS_PROXY value still uses http://: this is correct. The variable name says which traffic to proxy; the value is the proxy's own address (which is usually plain HTTP).
- Special characters in password: URL-encode them. @ becomes %40, # becomes %23, : becomes %3A.
- SSL certificate errors behind proxy: institutional proxies often use a custom CA certificate. BioLang uses compiled-in Mozilla root certificates, so custom CAs need to be added to the system trust store:
# Linux (Debian/Ubuntu): add your institution's CA
sudo cp institution-ca.crt /usr/local/share/ca-certificates/
sudo update-ca-certificates
# macOS: add to system keychain
sudo security add-trusted-cert -d -r trustRoot -k /Library/Keychains/System.keychain institution-ca.crt
- Proxy blocks certain ports: some proxies only allow ports 80 and 443. If an API uses a non-standard port, it may be silently blocked.
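For the password pitfall in the list above, percent-encoding by hand is error-prone. This Python sketch builds the proxy URL with the standard library; proxy_url is a hypothetical helper, and the host and credentials are placeholders.

```python
from urllib.parse import quote


def proxy_url(host, port, username=None, password=None, scheme="http"):
    """Build a proxy URL, percent-encoding the credentials."""
    if username is None:
        return f"{scheme}://{host}:{port}"
    # safe="" makes quote() encode every reserved character, including '/'.
    user = quote(username, safe="")
    secret = quote(password or "", safe="")
    return f"{scheme}://{user}:{secret}@{host}:{port}"


if __name__ == "__main__":
    # Paste the printed value into HTTP_PROXY / HTTPS_PROXY.
    print(proxy_url("proxy.example.com", 8080, "alice", "p@ss#1:x"))
```

For the placeholder password p@ss#1:x, the encoded form is p%40ss%231%3Ax.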
Common Type Errors
expected DNA, got String
Plain strings and DNA sequences are different types. Use a DNA literal or the
dna() constructor:
# Wrong: plain string
let seq = "ATCG"
reverse_complement(seq) # Error: expected DNA, got String
# Right: DNA literal
let seq = dna"ATCG"
reverse_complement(seq) # Works
# Right: runtime conversion
let seq = dna("ATCG")
reverse_complement(seq) # Works
cannot add Int and Float
BioLang does not implicitly coerce numeric types. Convert explicitly:
# Error
let x = 1 + 1.5
# Fix
let x = float(1) + 1.5 # 2.5
let y = 1 + int(1.5) # 2 (truncates)
stream already consumed
Streams can only be iterated once. Use read_fasta() / read_fastq()
for a reusable list, or collect() the stream:
# Error: second use fails
let s = fastq("data.fq")
let count = len(s)
let avg_q = s |> map(|r| mean_phred(r.quality)) |> mean() # Error!
# Fix: use read_fastq() for a reusable list
let records = read_fastq("data.fq")
let count = len(records)
let avg_q = records |> map(|r| mean_phred(r.quality)) |> mean()
Pipes
Pipe result is unexpected
Remember: |> inserts the left side as the first argument.
If the function you're calling expects the data in a different position, use a lambda:
# |> inserts as first arg
[1, 2, 3] |> map(|x| x * 2) # map([1,2,3], |x| x * 2)
# If you need it in a different position, use a lambda
"h" |> (|c| replace("hello", c, "H")) # replace("hello", "h", "H")
Newline breaks a pipe chain
BioLang uses newlines as statement terminators. The lexer automatically suppresses
newlines after |>, so this works:
# Works: newline after |>
data |>
  filter(|x| x > 0) |>
  map(|x| x * 2)
# Breaks: newline before |>
data
|> filter(|x| x > 0) # Error: |> at start of line
Always put |> at the end of the line, not the beginning.
Plugins
Plugin not found after installation
Plugins must be in ~/.biolang/plugins/<name>/ with a valid
plugin.json manifest. Verify the structure:
ls ~/.biolang/plugins/my-plugin/
# Should contain: plugin.json, and the executable/script
# Check if BioLang sees it
bl plugins
Python plugin fails with ModuleNotFoundError
Ensure the Python environment used by BioLang has the required packages. BioLang
uses the system python3 by default. If you use virtual environments:
# Activate your venv first, then run bl
source myenv/bin/activate
bl run script.bl
Performance
Script is slower than expected
Common performance pitfalls:
- Unnecessary collect(): Converting streams to lists forces all data into memory. Keep data in streams as long as possible.
- Repeated file reads: Store the result in a variable instead of reading the same file multiple times.
- Large string concatenation in loops: Use join() on a list instead of repeatedly appending strings.
Use the REPL's :time command or :profile to identify bottlenecks:
# In the REPL
:time read_fastq("big.fq") |> filter(|r| mean_phred(r.quality) >= 30) |> len()
:profile my_function(data)
Parallel operations not faster
par_map and par_filter have overhead from thread spawning.
They're only beneficial when:
- The per-item computation is non-trivial (more than a simple field access)
- The dataset has enough items to amortize thread overhead (typically 1000+)
- The operation is CPU-bound, not I/O-bound
Coordinate Systems
Off-by-one errors between BED, VCF, and GFF
This is one of the most common sources of bugs in bioinformatics. Different formats use different coordinate conventions:
| Format | System | Example (first 10 bases) |
|---|---|---|
| BED, BAM | 0-based, half-open | chr1 0 10 |
| VCF, GFF, SAM | 1-based, inclusive | chr1 1 10 |
BioLang's GenomicInterval uses 0-based half-open internally
(matching BED/BAM). When reading VCF or GFF, coordinates are automatically converted.
Be careful when constructing intervals manually:
# From a VCF position (1-based) — subtract 1 for 0-based
let vcf_pos = 12345
let start = vcf_pos - 1
# Built-in readers handle this for you
let variants = read_vcf("data.vcf") # positions auto-converted
let regions = read_bed("regions.bed") # already 0-based
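The arithmetic behind these conversions is small but easy to get wrong, so it helps to keep it in one place. A Python sketch of the two conventions from the table above; the function names are hypothetical and this is not BioLang's internal code.

```python
def one_based_to_zero_based(start, end):
    """1-based inclusive (VCF/GFF/SAM) -> 0-based half-open (BED/BAM)."""
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    return start - 1, end


def zero_based_to_one_based(start, end):
    """0-based half-open (BED/BAM) -> 1-based inclusive (VCF/GFF/SAM)."""
    if start < 0 or end <= start:
        raise ValueError("expected 0 <= start < end")
    return start + 1, end


def vcf_record_to_interval(pos, ref):
    """0-based half-open interval covered by the REF allele of a VCF record."""
    return pos - 1, pos - 1 + len(ref)
```

Only the start shifts by one; the end keeps its numeric value because an inclusive 1-based end and an exclusive 0-based end name the same boundary.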
Strand-unaware interval operations
By default, query_overlaps and query_nearest ignore strand.
If your analysis is strand-specific, filter afterwards:
# Strand-aware filtering
let hits = regions |> filter(|iv| iv.strand == my_interval.strand)
For genes on the minus strand, remember that the "start" is still the leftmost
(smallest) coordinate — start < end always holds, regardless of strand.
Genome Builds
Mixing GRCh37/hg19 and GRCh38/hg38 data
Combining data from different genome builds silently produces wrong results — coordinates don't match, variants map to wrong genes, intervals don't overlap when they should. Common symptoms:
- Zero overlaps when you expect many
- Variants not found in annotation databases
- Chromosome names don't match (chr1 vs 1)
Always verify your genome build before combining datasets:
# Check VCF header for assembly
# ##reference=GRCh38 or ##assembly=hg38
# Check BAM header
# @SQ SN:chr1 LN:248956422 ← GRCh38
# @SQ SN:chr1 LN:249250621 ← GRCh37
# Chromosome 1 length is a reliable indicator
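The chromosome 1 check can be automated against SAM/BAM header text (for example the output of samtools view -H). A minimal Python sketch using only the two lengths listed above; guess_build is a hypothetical helper and returns None for anything it does not recognize.

```python
CHR1_LENGTHS = {
    248956422: "GRCh38",
    249250621: "GRCh37",
}


def guess_build(sam_header_lines):
    """Guess the human genome build from the chromosome 1 @SQ line, or None."""
    for line in sam_header_lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] != "@SQ":
            continue
        # Header fields look like TAG:value, e.g. SN:chr1 and LN:248956422.
        tags = dict(f.split(":", 1) for f in fields[1:] if ":" in f)
        if tags.get("SN") in ("chr1", "1"):
            return CHR1_LENGTHS.get(int(tags.get("LN", 0)))
    return None
```

Run it on the headers of every input before joining datasets; two different answers mean one dataset needs a liftover first.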
Chromosome naming: chr1 vs 1
UCSC names chromosomes with a chr prefix (chr1); Ensembl and NCBI use bare names (1). Mixing them
causes zero matches in joins and overlaps. Normalize before combining:
# Strip "chr" prefix
let normalized = intervals |> map(|iv| replace(iv.chrom, "chr", ""))
replace("chr1", "chr", "") # "1"
replace("chrX", "chr", "") # "X"
# Add "chr" prefix
let ucsc_style = intervals |> map(|iv| "chr" + iv.chrom)
# Normalize chromosome names with string functions
lower("CHR1") |> replace("chr", "") |> (|c| "chr" + c) # "chr1"
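Two details are easy to miss when normalizing: only a leading "chr" should be stripped, and the mitochondrial chromosome is named chrM in UCSC style but MT in Ensembl style. The Python sketch below handles both; to_ensembl_style and to_ucsc_style are hypothetical helpers, and other contig names are passed through unchanged.

```python
def to_ensembl_style(chrom):
    """'chr1' -> '1', 'chrX' -> 'X', 'chrM' -> 'MT'; other names pass through."""
    # Strip the prefix only at the start of the name.
    name = chrom[3:] if chrom.lower().startswith("chr") else chrom
    return "MT" if name.upper() in ("M", "MT") else name


def to_ucsc_style(chrom):
    """'1' -> 'chr1', 'X' -> 'chrX', 'MT' -> 'chrM'; idempotent on 'chr1'."""
    name = to_ensembl_style(chrom)
    return "chrM" if name == "MT" else "chr" + name
```

Both functions are idempotent, so they can be applied to every dataset without first checking which style it already uses.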
Missing Index Files
BAM operations fail with "index not found"
Many BAM operations (region queries, random access) require an index file. The index must be in the same directory with the correct name:
| Data file | Index file | Created by |
|---|---|---|
| sample.bam | sample.bam.bai or sample.bai | samtools index sample.bam |
| ref.fa | ref.fa.fai | samtools faidx ref.fa |
| variants.vcf.gz | variants.vcf.gz.tbi | tabix -p vcf variants.vcf.gz |
If you see index errors, check that the index exists and was built for the current version of the data file. Rebuilding the data file without re-indexing is a common mistake.
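Both conditions (index present, index not older than the data) can be checked before a pipeline starts. A Python sketch based on the naming rules in the table above; find_index and index_is_stale are hypothetical helpers, and the staleness test relies on file modification times, which copying or syncing files can disturb.

```python
import os

# Data-file suffix -> accepted index suffixes, as listed in the table.
INDEX_SUFFIXES = {
    ".bam": (".bam.bai", ".bai"),
    ".fa": (".fa.fai",),
    ".vcf.gz": (".vcf.gz.tbi",),
}


def find_index(data_path):
    """Return the path of an existing index for data_path, or None."""
    for data_suffix, index_suffixes in INDEX_SUFFIXES.items():
        if data_path.endswith(data_suffix):
            stem = data_path[: -len(data_suffix)]
            for suffix in index_suffixes:
                if os.path.exists(stem + suffix):
                    return stem + suffix
    return None


def index_is_stale(data_path, index_path):
    """True if the data file was modified after its index was written."""
    return os.path.getmtime(data_path) > os.path.getmtime(index_path)
```

A None result means the index has to be created; a stale index should be rebuilt with the command from the table.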
BAM must be coordinate-sorted
Index-based access and many downstream tools require coordinate-sorted BAM. Name-sorted or unsorted BAM (typical raw aligner output) won't work:
# Check sort order — look for SO:coordinate in header
# @HD VN:1.6 SO:coordinate ← good
# @HD VN:1.6 SO:queryname ← needs re-sorting
# Sort and index
samtools sort input.bam -o sorted.bam
samtools index sorted.bam
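The header check can be scripted so a pipeline fails early instead of at the first region query. A small Python sketch that reads the SO tag from header text (for example the output of samtools view -H); sort_order is a hypothetical helper.

```python
def sort_order(sam_header_lines):
    """Return the SO value from the @HD header line, or 'unknown' if absent."""
    for line in sam_header_lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "@HD":
            for field in fields[1:]:
                if field.startswith("SO:"):
                    return field[3:]
    return "unknown"
```

Anything other than "coordinate" means the file should be sorted and re-indexed before index-based access.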
Quality Score Encodings
Quality scores look wrong or out of range
FASTQ quality scores use ASCII encoding. Modern Illumina (1.8+) uses Phred+33 (ASCII 33–126, scores 0–93). Older Illumina (1.3–1.7) used Phred+64 (ASCII 64–126, scores 0–62).
BioLang assumes Phred+33 by default. If you see suspiciously high quality scores or encounter old data:
# Check the ASCII range in your FASTQ
read_fastq("old_data.fastq") |>
  take(100) |>
  map(|r| r.quality) |>
  map(|q| print(q))
# Phred+33 quality chars start at '!' (ASCII 33)
# Phred+64 quality chars start at '@' (ASCII 64)
# If you see mostly lowercase letters and no '!', '#', or digits, it's likely Phred+64
Almost all data generated after ~2011 uses Phred+33. If you're working with archival SRA data, check the original publication date.
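The visual check can be turned into a heuristic over the ASCII range of a sample of quality strings. The Python sketch below is an assumption-laden guess, not a definitive detector: it treats any character below ';' (ASCII 59) as proof of Phred+33, treats characters above 'K' (ASCII 75, which would mean Q42+ under Phred+33) as a sign of Phred+64, and otherwise reports the sample as ambiguous. guess_phred_offset is a hypothetical helper.

```python
def guess_phred_offset(quality_strings):
    """Guess 33 or 64 from the ASCII range seen; None if ambiguous or empty."""
    codes = [ord(c) for q in quality_strings for c in q]
    if not codes:
        return None
    if min(codes) < 59:
        return 33  # characters this low cannot occur in +64 encodings
    if max(codes) > 75:
        return 64  # would be implausibly high scores under Phred+33
    return None  # range fits both encodings; sample more reads
```

Feed it the quality strings of a few thousand reads rather than a handful, since short samples of uniformly good reads often fall into the ambiguous range.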
VCF Gotchas
Multi-allelic records give unexpected results
A single VCF line can contain multiple alternate alleles (ALT=G,T).
This affects how you interpret genotypes, allele frequencies, and variant classification:
# A multi-allelic variant
# chr1 100 . A G,T . PASS AF=0.3,0.1
# Check before processing
let variants = read_vcf("data.vcf")
variants |>
  filter(|v| len(split(v["alt"], ",")) > 1) |>
  map(|v| print("Multi-allelic: " + v["chrom"] + ":" + v["pos"]))
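One common way to handle such sites is to split them into one record per alternate allele, keeping each allele's own frequency. A Python sketch of that idea on plain values; split_multiallelic is a hypothetical helper, and unlike a dedicated tool such as bcftools norm it does not rewrite genotypes or left-align alleles.

```python
def split_multiallelic(chrom, pos, ref, alt, af=None):
    """Split one multi-allelic site into a list of biallelic records."""
    alts = alt.split(",")
    # AF carries one comma-separated value per ALT allele.
    freqs = af.split(",") if af else [None] * len(alts)
    if len(freqs) != len(alts):
        raise ValueError("AF must have one value per ALT allele")
    return [
        {"chrom": chrom, "pos": pos, "ref": ref, "alt": a,
         "af": float(f) if f is not None else None}
        for a, f in zip(alts, freqs)
    ]
```

For the example line above, split_multiallelic("chr1", 100, "A", "G,T", "0.3,0.1") yields one record for A>G with AF 0.3 and one for A>T with AF 0.1.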
INFO field parsing surprises
VCF INFO fields have multiple value types. Use parse_vcf_info() for
reliable parsing:
# VCF records from read_vcf have parsed INFO fields
let variants = read_vcf("data.vcf")
let v = first(variants)
# v["info"]["DP"] → 30 (Int)
# v["info"]["AF"] → 0.5 (Float)
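The surprises come from the three shapes an INFO entry can take: a flag with no value, a single value, or a comma-separated list. The Python sketch below shows a best-effort parser for a raw INFO string; parse_info_field is a hypothetical helper (not BioLang's parse_vcf_info), and it guesses types from the text, whereas the authoritative types are declared in the VCF header's ##INFO lines.

```python
def parse_info_field(info):
    """Parse a raw VCF INFO string into a dict with best-effort typed values."""

    def convert(text):
        # Try the narrowest numeric type first, then fall back to text.
        for cast in (int, float):
            try:
                return cast(text)
            except ValueError:
                pass
        return text

    result = {}
    if info in ("", "."):
        return result  # '.' marks an empty INFO column
    for entry in info.split(";"):
        if not entry:
            continue
        if "=" not in entry:
            result[entry] = True  # flag field: presence means true
            continue
        key, value = entry.split("=", 1)
        items = [convert(v) for v in value.split(",")]
        result[key] = items[0] if len(items) == 1 else items
    return result
```

Note that a key such as AF comes back as a float for biallelic sites and as a list for multi-allelic ones, so downstream code should handle both.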
Conda & HPC Clusters
Conda environment conflicts with BioLang
Conda can override system libraries (especially libssl, libz,
libcurl) causing unexpected link errors or crashes. If BioLang works
outside conda but fails inside:
# Test outside conda
conda deactivate
bl --version # works?
# If yes, the conda env is overriding a system library.
# Option 1: Install BioLang into the conda env
conda activate myenv
cargo install --path . --root "$CONDA_PREFIX"
# Option 2: Use a dedicated env
conda create -n biolang
conda activate biolang
# install bl here
Running BioLang on SLURM / PBS clusters
HPC job scripts need the correct PATH and environment. A minimal SLURM script:
#!/bin/bash
#SBATCH --job-name=biolang-qc
#SBATCH --cpus-per-task=4
#SBATCH --mem=8G
#SBATCH --time=01:00:00
# Load modules or set PATH
export PATH="$HOME/.cargo/bin:$PATH"
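# If using conda (batch shells need conda initialized before activation)
source "$(conda info --base)/etc/profile.d/conda.sh"
conda activate biolang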
# Run your script
bl run qc_pipeline.bl
For interactive development on a compute node:
# Request an interactive session
srun --pty --cpus-per-task=4 --mem=8G --time=02:00:00 bash
# Then start the REPL
bl repl
Headless terminal / no color support
Some HPC login nodes and screen/tmux sessions don't
advertise color support. If the REPL output looks garbled:
# Force TERM type
export TERM=xterm-256color
# Or disable colors entirely
bl repl --no-color
Containers & External Tools
docker: command not found / podman: command not found
Container builtins require Docker or Podman installed and on your PATH.
BioLang auto-detects the available runtime. Verify:
docker --version # or podman --version
BioContainers image pull fails
Check your network connection and that you can reach quay.io:
curl -I https://quay.io/v2/
If you're behind a firewall, you may need to configure your container runtime's proxy settings separately from shell environment variables.
Environment Variables Reference
| Variable | Purpose | Required |
|---|---|---|
| BIOLANG_PATH | Additional module search paths (colon-separated) | No |
| BIOLANG_DATA_DIR | Default directory for file I/O (reads fall back here, writes go here) | No |
| NCBI_API_KEY | NCBI E-utilities API key (higher rate limits) | No |
| COSMIC_API_KEY | COSMIC database access | For COSMIC queries |
| HTTP_PROXY / HTTPS_PROXY | Proxy for API calls (supports user:pass@host auth) | If behind proxy |
| NO_PROXY | Comma-separated hosts to bypass proxy (system convention, used by external tools) | If behind proxy |
| ANTHROPIC_API_KEY | Anthropic (Claude) for chat() / chat_code() | For LLM features |
| OPENAI_API_KEY | OpenAI (GPT) for chat() / chat_code() | For LLM features |
| OLLAMA_MODEL | Ollama local model name (no API key needed) | For LLM features |
Still Stuck?
- Search existing issues: GitHub Issues
- Ask in discussions: GitHub Discussions
- Check the REPL: Use :type expr to inspect types and :env to see all bindings in scope.