Troubleshooting

Common issues and their solutions. If you don't find your answer here, please open an issue.

Installation

`bl: command not found`

The BioLang binary is not on your PATH. After installing, ensure the install directory is in your shell's path:

# Check where bl was installed
which bl    # macOS/Linux
where.exe bl    # Windows (PowerShell; plain "where" is an alias for Where-Object)

# If installed via cargo:
export PATH="$HOME/.cargo/bin:$PATH"    # add to ~/.bashrc or ~/.zshrc

# If installed to a custom location:
export PATH="/path/to/biolang/bin:$PATH"

After editing your shell config, restart the terminal or run source ~/.bashrc (or source ~/.zshrc).

Build from source fails with `rustc` version error

BioLang requires Rust 1.75 or later. Check your version and update:

rustc --version
rustup update stable

Build fails with missing system libraries

You may need pkg-config and the OpenSSL development headers:

# Debian/Ubuntu
sudo apt install pkg-config libssl-dev

# Fedora/RHEL
sudo dnf install openssl-devel pkg-config

# macOS (via Homebrew)
brew install openssl pkg-config

REPL Issues

REPL won't start / crashes on launch

If the REPL crashes immediately, try running with verbose output to see the error:

bl repl --verbose

On Windows, if you see errors about terminal capabilities, ensure you are using Windows Terminal or PowerShell 7+, not the legacy cmd.exe console.

Arrow keys print escape codes instead of navigating

This happens in terminals that don't support ANSI escape sequences. Use a modern terminal emulator (Windows Terminal, iTerm2, Alacritty, kitty).

REPL history not persisting

History is saved to ~/.biolang/history. Ensure the directory exists and is writable:

mkdir -p ~/.biolang
ls -la ~/.biolang/

File I/O

`File not found` when the file exists

BioLang resolves paths relative to the current working directory, not the script location. Either use absolute paths or cd to the data directory before running:

# Relative path — depends on where you run bl from
let reads = read_fastq("data/sample.fastq")

# Absolute path — always works
let reads = read_fastq("/home/user/data/sample.fastq")

Windows paths with backslashes

Backslashes in regular strings are escape characters (\n = newline, \t = tab, \r = carriage return). A path like "C:\Users\rajba\data" breaks because \r becomes a carriage return. Use raw strings with the r"..." prefix — backslashes are treated literally:

# Wrong — \r becomes carriage return, \n becomes newline
let reads = read_fastq("C:\Users\rajba\new_data\reads.fq.gz")

# Correct — raw string, backslashes are literal
let reads = read_fastq(r"C:\Users\rajba\new_data\reads.fq.gz")

# Also correct — forward slashes work on all platforms
let reads = read_fastq("C:/Users/rajba/new_data/reads.fq.gz")

Reading large files runs out of memory

Use streaming instead of loading the entire file into memory:

# Bad: loads entire file into a list
let all = read_fastq("big.fastq")

# Good: stream and filter
let filtered = read_fastq("big.fastq")
  |> filter_reads(30)

Streams process one record at a time, so memory usage stays constant regardless of file size. Avoid collect() on large datasets unless you need random access.
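The streaming idea is language-independent. As an illustration outside BioLang, here is a minimal Python sketch of a constant-memory FASTQ filter, assuming well-formed four-line records with Phred+33 qualities. stream_fastq, mean_phred and filter_by_quality are helpers written for this example, not BioLang builtins.

```python
from typing import Iterator, TextIO, Tuple

Record = Tuple[str, str, str]  # (header, sequence, quality)


def stream_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield one FASTQ record at a time; only the current record is held in memory."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file
        seq = handle.readline().rstrip("\n")
        handle.readline()  # the '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header, seq, qual


def mean_phred(qual: str, offset: int = 33) -> float:
    """Average quality score of one read, decoding Phred+offset characters."""
    if not qual:
        return 0.0
    return sum(ord(c) - offset for c in qual) / len(qual)


def filter_by_quality(records: Iterator[Record], min_q: float) -> Iterator[Record]:
    # A generator expression keeps the pipeline lazy: nothing is read until consumed.
    return (r for r in records if mean_phred(r[2]) >= min_q)
```

With a real file, open it with `with open("big.fastq") as fh:` and iterate over `filter_by_quality(stream_fastq(fh), 30)`. Only one record is materialized at a time, so memory use does not grow with file size.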

Gzipped file not detected

BioLang detects gzip compression from the .gz extension. If your file is gzipped but has a non-standard extension, rename it so it ends in .gz, or decompress it first:

# Auto-detected
let reads = read_fastq("reads.fastq.gz")

# Non-standard name: rename it to end in .gz (shell)
mv reads.fq.compressed reads.fq.gz

# ...or decompress it first (shell)
gunzip -c < reads.fq.compressed > reads.fq
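To confirm that a file with an unusual extension really is gzip-compressed, check its first two bytes: every gzip member starts with 0x1f 0x8b (RFC 1952). A minimal Python sketch, independent of BioLang; is_gzipped is a helper written for this example. BGZF files such as .vcf.gz and BAM start with the same bytes, because BGZF is a series of gzip members.

```python
GZIP_MAGIC = b"\x1f\x8b"


def is_gzipped(path: str) -> bool:
    """Return True if the file starts with the gzip magic bytes, whatever its extension."""
    with open(path, "rb") as fh:
        return fh.read(2) == GZIP_MAGIC
```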

API Clients

NCBI queries are slow or return `429 Too Many Requests`

Without an API key, NCBI limits you to 3 requests per second. Set an API key to raise the limit to 10 requests per second:

# Get a free key at https://www.ncbi.nlm.nih.gov/account/settings/
export NCBI_API_KEY="your-key-here"

Add this to your ~/.bashrc or ~/.zshrc to persist it.
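If you also query E-utilities from your own scripts (for example, while debugging outside BioLang), space the requests to stay under these limits. A minimal Python sketch; ncbi_min_interval and Throttle are hypothetical helpers, and the 3 and 10 requests-per-second figures are the limits quoted above.

```python
import os
import time


def ncbi_min_interval() -> float:
    """Seconds between requests: 3 req/s without NCBI_API_KEY, 10 req/s with it."""
    return 1 / 10 if os.environ.get("NCBI_API_KEY") else 1 / 3


class Throttle:
    """Blocks so that successive calls to wait() are at least `interval` seconds apart."""

    def __init__(self, interval: float) -> None:
        self.interval = interval
        self._last = float("-inf")  # first call never sleeps

    def wait(self) -> None:
        delay = self._last + self.interval - time.monotonic()
        if delay > 0:
            time.sleep(delay)
        self._last = time.monotonic()
```

Call `throttle.wait()` immediately before each HTTP request, with `throttle = Throttle(ncbi_min_interval())` created once per session.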

COSMIC queries fail with `unauthorized`

COSMIC requires a separate API key. Register at COSMIC and set:

export COSMIC_API_KEY="your-cosmic-key"

Ensembl / UniProt queries timeout

Public APIs can be slow during peak hours. BioLang uses a default 30-second timeout. If you're behind a corporate or institutional proxy, ensure the proxy environment variables are set correctly.

HTTP proxy configuration (with authentication)

University, hospital, and corporate networks often require an authenticated proxy. All BioLang API clients (NCBI, Ensembl, UniProt, UCSC, etc.) respect standard proxy environment variables:

# Basic proxy (no auth)
export HTTP_PROXY="http://proxy.example.com:8080"
export HTTPS_PROXY="http://proxy.example.com:8080"

# Authenticated proxy (username:password)
export HTTP_PROXY="http://username:password@proxy.example.com:8080"
export HTTPS_PROXY="http://username:password@proxy.example.com:8080"

# Skip proxy for local addresses (system convention, used by curl/wget/etc.)
export NO_PROXY="localhost,127.0.0.1,.internal.university.edu"

Add these to your ~/.bashrc, ~/.zshrc, or ~/.profile to persist across sessions. On HPC clusters, you may also need to include them in your SLURM job scripts.

Common proxy pitfalls:

HTTPS_PROXY value still uses http:// — this is correct. The variable name says which traffic to proxy; the value is the proxy's own address (which is usually plain HTTP).
Special characters in password — URL-encode them. @ becomes %40, # becomes %23, : becomes %3A.

SSL certificate errors behind proxy — institutional proxies often use a custom CA certificate. BioLang uses compiled-in Mozilla root certificates, so custom CAs need to be added to the system trust store:

# Linux (Debian/Ubuntu) — add your institution's CA
sudo cp institution-ca.crt /usr/local/share/ca-certificates/
sudo update-ca-certificates

# macOS — add to system keychain
sudo security add-trusted-cert -d -r trustRoot -k /Library/Keychains/System.keychain institution-ca.crt

Proxy blocks certain ports — some proxies only allow ports 80 and 443. If an API uses a non-standard port, it may be silently blocked.
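For the password pitfall, let a library do the percent-encoding rather than editing the string by hand. A minimal Python sketch using the standard urllib.parse.quote; proxy_url is a hypothetical helper. Passing safe='' makes quote encode '/' as well, which it leaves alone by default.

```python
from urllib.parse import quote


def proxy_url(user: str, password: str, host: str, port: int) -> str:
    """Build an authenticated proxy URL with percent-encoded credentials."""
    return f"http://{quote(user, safe='')}:{quote(password, safe='')}@{host}:{port}"
```

Print the result of proxy_url("alice", "p@ss#1:x", "proxy.example.com", 8080) and paste it into your HTTP_PROXY and HTTPS_PROXY exports.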

Common Type Errors

`expected DNA, got String`

Plain strings and DNA sequences are different types. Use a DNA literal or the dna() constructor:

# Wrong: plain string
let seq = "ATCG"
reverse_complement(seq)  # Error: expected DNA, got String

# Right: DNA literal
let seq = dna"ATCG"
reverse_complement(seq)  # Works

# Right: runtime conversion
let seq = dna("ATCG")
reverse_complement(seq)  # Works

`cannot add Int and Float`

BioLang does not implicitly coerce numeric types. Convert explicitly:

# Error
let x = 1 + 1.5

# Fix
let x = float(1) + 1.5   # 2.5
let y = 1 + int(1.5)     # 2 (truncates)

`stream already consumed`

Streams can only be iterated once. Either load the file with read_fasta() / read_fastq(), which return a reusable list, or collect() the stream into a list before using it more than once:

# Error: second use fails
let s = read_fastq("data/reads.fastq")
let count = len(s)
let avg_q = s |> map(|r| mean_phred(r.quality)) |> mean()  # Error!

# Fix: use read_fastq() for a reusable list
let records = read_fastq("data/reads.fastq")
let count = len(records)
let avg_q = records |> map(|r| mean_phred(r.quality)) |> mean()
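A Python generator behaves the same way and makes a convenient mental model. This Python sketch (not BioLang) shows an exhausted stream and the list-based fix. Note that Python silently yields nothing on the second pass, whereas BioLang reports stream already consumed.

```python
def numbers():
    # A generator produces its values lazily, and only once.
    yield from [1, 2, 3]


stream = numbers()
first_pass = sum(stream)     # 6: this consumes the generator
second_pass = sum(stream)    # 0: the generator is already exhausted

records = list(numbers())    # materialize once, like collect()
total_a = sum(records)       # 6
total_b = sum(records)       # 6: a list can be iterated any number of times
```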

Pipes

Pipe result is unexpected

Remember: |> inserts the left side as the first argument. If the function you're calling expects the data in a different position, use a lambda:

# |> inserts as first arg
[1, 2, 3] |> map(|x| x * 2)    # map([1,2,3], |x| x * 2)

# If you need it in a different position, use a lambda
"h" |> (|pat| replace("hello", pat, "H"))   # replace("hello", "h", "H")

Newline breaks a pipe chain

BioLang uses newlines as statement terminators. The lexer automatically suppresses newlines after |>, so this works:

# Works: newline after |>
data
  |> filter(|x| x > 0)
  |> map(|x| x * 2)

# Breaks: newline before |>
data
|> filter(|x| x > 0)    # Error: |> at start of line

Always put |> at the end of the line, not the beginning.

Plugins

Plugin not found after installation

Plugins must be in ~/.biolang/plugins/<name>/ with a valid plugin.json manifest. Verify the structure:

ls ~/.biolang/plugins/my-plugin/
# Should contain: plugin.json, and the executable/script

# Check if BioLang sees it
bl plugins

Python plugin fails with `ModuleNotFoundError`

Ensure the Python environment used by BioLang has the required packages. BioLang invokes python3 by default, so if your packages live in a virtual environment, activate it before running bl:

# Activate your venv first, then run bl
source myenv/bin/activate
bl run script.bl

Performance

Script is slower than expected

Common performance pitfalls:

Unnecessary collect(): Converting streams to lists forces all data into memory. Keep data in streams as long as possible.
Repeated file reads: Store the result in a variable instead of reading the same file multiple times.
Large string concatenation in loops: Use join() on a list instead of repeatedly appending strings.

Use the REPL's :time and :profile commands to identify bottlenecks:

# In the REPL
:time read_fastq("big.fq") |> filter(|r| mean_phred(r.quality) >= 30) |> len()

:profile my_function(data)

Parallel operations not faster

par_map and par_filter have overhead from thread spawning. They're only beneficial when:

The per-item computation is non-trivial (more than a simple field access)
The dataset has enough items to amortize thread overhead (typically 1000+)
The operation is CPU-bound, not I/O-bound

Coordinate Systems

Off-by-one errors between BED, VCF, and GFF

This is one of the most common sources of bugs in bioinformatics. Different formats use different coordinate conventions:

Format	System	Example (first 10 bases)
BED, BAM	0-based, half-open	`chr1 0 10`
VCF, GFF, SAM	1-based, inclusive	`chr1 1 10`

BioLang's GenomicInterval uses 0-based half-open internally (matching BED/BAM). When reading VCF or GFF, coordinates are automatically converted. Be careful when constructing intervals manually:

# From a VCF position (1-based) — subtract 1 for 0-based
let vcf_pos = 12345
let start = vcf_pos - 1

# Built-in readers handle this for you
let variants = read_vcf("data.vcf")  # positions auto-converted
let regions = read_bed("regions.bed") # already 0-based
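The conversions are small enough to write out. A minimal Python sketch of the rules in the table above, assuming VCF POS is the 1-based position of the first REF base and GFF start/end are 1-based inclusive; the function names are illustrative, not BioLang API.

```python
from typing import Tuple


def vcf_to_half_open(pos: int, ref: str) -> Tuple[int, int]:
    """VCF POS is 1-based; the REF allele covers len(ref) bases starting at POS."""
    start = pos - 1
    return start, start + len(ref)


def gff_to_half_open(start: int, end: int) -> Tuple[int, int]:
    """GFF start/end are 1-based inclusive; only the start shifts."""
    return start - 1, end


def half_open_to_one_based(start: int, end: int) -> Tuple[int, int]:
    """Inverse of gff_to_half_open: back to 1-based inclusive."""
    return start + 1, end
```

The length of a half-open interval is simply end - start, which is one reason BED-style coordinates are convenient internally.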

Strand-unaware interval operations

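By default, query_overlaps and query_nearest ignore strand. If your analysis is strand-specific, filter their results afterwards:

# Strand-aware filtering of overlap results
# (hits = the intervals returned by query_overlaps / query_nearest)
let same_strand = hits
  |> filter(|iv| iv.strand == my_interval.strand)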

For genes on the minus strand, remember that the "start" is still the leftmost (smallest) coordinate — start < end always holds, regardless of strand.
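Strand-aware overlap is the usual half-open overlap test plus a strand comparison. A minimal Python sketch, assuming 0-based half-open coordinates with start < end on both strands; Interval and overlaps are illustrative stand-ins, not BioLang's GenomicInterval or query_overlaps.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    chrom: str
    start: int        # 0-based, inclusive
    end: int          # 0-based, exclusive; start < end on either strand
    strand: str = "."


def overlaps(a: Interval, b: Interval, same_strand: bool = False) -> bool:
    if a.chrom != b.chrom:
        return False
    if same_strand and a.strand != b.strand:
        return False
    # Half-open intervals overlap when each one starts before the other ends.
    return a.start < b.end and b.start < a.end
```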

Genome Builds

Mixing GRCh37/hg19 and GRCh38/hg38 data

Combining data from different genome builds silently produces wrong results — coordinates don't match, variants map to wrong genes, intervals don't overlap when they should. Common symptoms:

Zero overlaps when you expect many
Variants not found in annotation databases
Chromosome names don't match (chr1 vs 1)

Always verify your genome build before combining datasets:

# Check VCF header for assembly
# ##reference=GRCh38 or ##assembly=hg38

# Check BAM header
# @SQ SN:chr1 LN:248956422  ← GRCh38
# @SQ SN:chr1 LN:249250621  ← GRCh37
# Chromosome 1 length is a reliable indicator
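The chromosome 1 check is easy to automate. A minimal Python sketch that reads an @SQ header line (for example from samtools view -H sample.bam) and matches the chr1 length against the two values above; build_from_sq_line is a hypothetical helper that returns None for anything it does not recognize.

```python
from typing import Optional

CHR1_LENGTHS = {
    248956422: "GRCh38",
    249250621: "GRCh37",
}


def build_from_sq_line(line: str) -> Optional[str]:
    """Guess the genome build from the @SQ header line for chromosome 1."""
    # Header fields are TAG:VALUE pairs; split() accepts tabs or spaces.
    fields = dict(f.split(":", 1) for f in line.split()[1:] if ":" in f)
    if fields.get("SN") not in ("chr1", "1"):
        return None
    return CHR1_LENGTHS.get(int(fields.get("LN", "0")))
```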

Chromosome naming: `chr1` vs `1`

UCSC uses chr1 prefix, Ensembl/NCBI uses 1. Mixing them causes zero matches in joins and overlaps. Normalize before combining:

# Strip "chr" prefix
let normalized_chroms = intervals |> map(|iv| replace(iv.chrom, "chr", ""))
replace("chr1", "chr", "")   # "1"
replace("chrX", "chr", "")   # "X"

# Add "chr" prefix (only to names that don't already have it)
let ucsc_chroms = intervals |> map(|iv| "chr" + iv.chrom)

# Normalize chromosome names with string functions
lower("CHR1") |> replace("chr", "") |> (|c| "chr" + c)  # "chr1"
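Plain replace() covers the common cases, but two details are worth handling explicitly: strip only a leading prefix, and map the mitochondrial chromosome, which UCSC calls chrM and Ensembl calls MT. A minimal Python sketch; to_ensembl and to_ucsc are hypothetical helpers. On hg19, chrM is a different mitochondrial sequence from GRCh37's MT, so renaming alone is not enough there, and unplaced or alternate contigs need a proper mapping table, which this sketch does not attempt.

```python
def to_ensembl(chrom: str) -> str:
    """'chr1' -> '1', 'chrX' -> 'X', 'chrM' -> 'MT'; other names pass through."""
    name = chrom[3:] if chrom.lower().startswith("chr") else chrom
    return "MT" if name.upper() == "M" else name


def to_ucsc(chrom: str) -> str:
    """'1' -> 'chr1', 'MT' -> 'chrM'; names already starting with 'chr' are left alone."""
    if chrom.lower().startswith("chr"):
        return chrom
    return "chrM" if chrom.upper() == "MT" else "chr" + chrom
```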

Missing Index Files

BAM operations fail with "index not found"

Many BAM operations (region queries, random access) require an index file. The index must be in the same directory with the correct name:

Data file	Index file	Created by
`sample.bam`	`sample.bam.bai` or `sample.bai`	`samtools index sample.bam`
`ref.fa`	`ref.fa.fai`	`samtools faidx ref.fa`
`variants.vcf.gz`	`variants.vcf.gz.tbi`	`tabix -p vcf variants.vcf.gz`

If you see index errors, check that the index exists and was built for the current version of the data file. Rebuilding the data file without re-indexing is a common mistake.
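A quick way to catch a missing or stale index is to compare modification times. A minimal Python sketch following the naming table above; index_candidates and index_status are hypothetical helpers, and the mtime test is only a heuristic, since copying or syncing files can reset timestamps.

```python
import os
from typing import List


def index_candidates(data_path: str) -> List[str]:
    """Index file names to look for, in the order listed in the table above."""
    if data_path.endswith(".bam"):
        return [data_path + ".bai", data_path[:-4] + ".bai"]
    if data_path.endswith(".vcf.gz"):
        return [data_path + ".tbi"]
    if data_path.endswith((".fa", ".fasta")):
        return [data_path + ".fai"]
    return []


def index_status(data_path: str) -> str:
    for idx in index_candidates(data_path):
        if os.path.exists(idx):
            # An index older than its data file was probably built for a previous version.
            if os.path.getmtime(idx) < os.path.getmtime(data_path):
                return f"stale: {idx}"
            return f"ok: {idx}"
    return "missing"
```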

BAM must be coordinate-sorted

Index-based access and many downstream tools require coordinate-sorted BAM. Aligners usually emit reads in input order (unsorted), and name-sorted BAM won't work either:

# Check sort order: look for SO:coordinate in the @HD header line
# @HD VN:1.6 SO:coordinate  ← good
# @HD VN:1.6 SO:queryname   ← needs re-sorting
# @HD VN:1.6 SO:unsorted    ← needs re-sorting

# Sort and index
samtools sort input.bam -o sorted.bam
samtools index sorted.bam
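To check many files at once, the @HD line is easy to parse from the output of samtools view -H. A minimal Python sketch; sort_order is a hypothetical helper that returns the SO value (the SAM specification allows unknown, unsorted, queryname, and coordinate) or None when the header carries no SO tag, in which case treat the file as unsorted.

```python
from typing import Optional


def sort_order(header_text: str) -> Optional[str]:
    """Return the SO tag from the @HD header line, or None if it is absent."""
    for line in header_text.splitlines():
        if line.startswith("@HD"):
            for field in line.split()[1:]:
                if field.startswith("SO:"):
                    return field[3:]
    return None
```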

Quality Score Encodings

Quality scores look wrong or out of range

FASTQ quality scores use ASCII encoding. Modern Illumina (1.8+) uses Phred+33 (ASCII 33–126, scores 0–93). Older Illumina (1.3–1.7) used Phred+64 (ASCII 64–126, scores 0–62).

BioLang assumes Phred+33 by default. If you see suspiciously high quality scores or encounter old data:

# Check the ASCII range in your FASTQ
read_fastq("old_data.fastq")
  |> take(100)
  |> map(|r| r.quality)
  |> map(|q| print(q))

# Phred+33 quality chars start at '!' (ASCII 33)
# Phred+64 quality chars start at '@' (ASCII 64)
# Any char below '@', such as '#' or a digit, rules out Phred+64
# If every char is '@' or higher and lowercase letters are common, it's likely Phred+64

Almost all data generated after ~2011 uses Phred+33. If you're working with archival SRA data, check the original publication date.
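The check above can be automated by looking at the lowest quality character across a sample of reads. A minimal Python sketch; guess_phred_offset is a hypothetical helper that expects raw quality strings. It is a heuristic: a tiny sample, or data in which every base is Q31 or higher, can look like Phred+64.

```python
from typing import Iterable


def guess_phred_offset(qualities: Iterable[str]) -> int:
    """Guess 33 or 64 from the smallest quality character seen.

    Raises ValueError if no non-empty quality strings are given.
    """
    lowest = min(min(q) for q in qualities if q)
    # Phred+64 never uses characters below '@' (ASCII 64).
    return 33 if ord(lowest) < 64 else 64
```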

VCF Gotchas

Multi-allelic records give unexpected results

A single VCF line can contain multiple alternate alleles (ALT=G,T). This affects how you interpret genotypes, allele frequencies, and variant classification:

# A multi-allelic variant
# chr1  100  .  A  G,T  .  PASS  AF=0.3,0.1

# Check before processing
let variants = read_vcf("data.vcf")
variants
  |> filter(|v| len(split(v["alt"], ",")) > 1)
  |> map(|v| print("Multi-allelic: " + v["chrom"] + ":" + v["pos"]))
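When downstream code expects one ALT allele per record, split multi-allelic records first. The standard tool for this is bcftools norm -m -any. The Python sketch below only illustrates the idea on a hypothetical record dict with raw string fields, splitting ALT together with per-allele (Number=A) INFO values such as AF. Genotypes and Number=R fields such as AD are not handled.

```python
from typing import Dict, List, Set


def split_multiallelic(record: Dict[str, str], per_allele_keys: Set[str]) -> List[Dict[str, str]]:
    """Return one record per ALT allele, keeping only that allele's Number=A values."""
    out = []
    for i, alt in enumerate(record["alt"].split(",")):
        fields = []
        for item in record["info"].split(";"):
            key, sep, value = item.partition("=")
            if sep and key in per_allele_keys:
                value = value.split(",")[i]  # this allele's value only
            fields.append(key + sep + value)  # flags (no '=') pass through unchanged
        out.append({**record, "alt": alt, "info": ";".join(fields)})
    return out
```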

INFO field parsing surprises

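VCF INFO fields mix value types (integers, floats, strings, flags, and comma-separated lists). Records from read_vcf() already have INFO parsed into typed values; use parse_vcf_info() when you need to parse an INFO string yourself: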

# VCF records from read_vcf have parsed INFO fields
let variants = read_vcf("data.vcf")
let v = first(variants)
# v["info"]["DP"] → 30 (Int)
# v["info"]["AF"] → 0.5 (Float)
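For intuition about what typed INFO parsing involves, here is a minimal Python sketch that parses a raw INFO string. It guesses types from the values, whereas a proper parser should take them from the Type and Number declarations in the ##INFO header lines; parse_info is a hypothetical helper, not parse_vcf_info.

```python
from typing import Dict, List, Union

Scalar = Union[int, float, str]
Value = Union[bool, Scalar, List[Scalar]]


def _convert(text: str) -> Scalar:
    """Try int, then float, else keep the string."""
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            pass
    return text


def parse_info(info: str) -> Dict[str, Value]:
    parsed: Dict[str, Value] = {}
    if info == ".":
        return parsed  # '.' means no INFO data
    for item in info.split(";"):
        key, sep, raw = item.partition("=")
        if not sep:
            parsed[key] = True  # Flag: presence means true
        elif "," in raw:
            parsed[key] = [_convert(v) for v in raw.split(",")]
        else:
            parsed[key] = _convert(raw)
    return parsed
```

Because types are guessed, a Number=A field with a single ALT allele comes back as a scalar rather than a one-element list; reading the header avoids that ambiguity.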

Conda & HPC Clusters

Conda environment conflicts with BioLang

Conda can override system libraries (especially libssl, libz, libcurl) causing unexpected link errors or crashes. If BioLang works outside conda but fails inside:

# Test outside conda
conda deactivate
bl --version  # works?

# If yes, the conda env is overriding a system library.
# Option 1: Install BioLang into the conda env
conda activate myenv
cargo install --path . --root "$CONDA_PREFIX"   # run inside the BioLang source checkout

# Option 2: Use a dedicated env
conda create -n biolang
conda activate biolang
# install bl here

Running BioLang on SLURM / PBS clusters

HPC job scripts need the correct PATH and environment. A minimal SLURM script:

#!/bin/bash
#SBATCH --job-name=biolang-qc
#SBATCH --cpus-per-task=4
#SBATCH --mem=8G
#SBATCH --time=01:00:00

# Load modules or set PATH
export PATH="$HOME/.cargo/bin:$PATH"

# If using conda (batch shells need conda's shell hook before "conda activate")
eval "$(conda shell.bash hook)"
conda activate biolang

# Run your script
bl run qc_pipeline.bl

For interactive development on a compute node:

# Request an interactive session
srun --pty --cpus-per-task=4 --mem=8G --time=02:00:00 bash

# Then start the REPL
bl repl

Headless terminal / no color support

Some HPC login nodes and screen/tmux sessions don't advertise color support. If the REPL output looks garbled:

# Force TERM type
export TERM=xterm-256color

# Or disable colors entirely
bl repl --no-color

Containers & External Tools

`docker: command not found` / `podman: command not found`

Container builtins require Docker or Podman installed and on your PATH. BioLang auto-detects the available runtime. Verify:

docker --version   # or podman --version

BioContainers image pull fails

Check your network connection and that you can reach quay.io:

curl -I https://quay.io/v2/

Any HTTP response, even 401 Unauthorized, means quay.io is reachable. If you're behind a firewall, you may need to configure your container runtime's proxy settings separately from shell environment variables.

Environment Variables Reference

Variable	Purpose	Required
`BIOLANG_PATH`	Additional module search paths (colon-separated)	No
`BIOLANG_DATA_DIR`	Default directory for file I/O (reads fall back here, writes go here)	No
`NCBI_API_KEY`	NCBI E-utilities API key (higher rate limits)	No
`COSMIC_API_KEY`	COSMIC database access	For COSMIC queries
`HTTP_PROXY` / `HTTPS_PROXY`	Proxy for API calls (supports `user:pass@host` auth)	If behind proxy
`NO_PROXY`	Comma-separated hosts to bypass proxy (system convention, used by external tools)	If behind proxy
`ANTHROPIC_API_KEY`	Anthropic (Claude) for `chat()` / `chat_code()`	For LLM features
`OPENAI_API_KEY`	OpenAI (GPT) for `chat()` / `chat_code()`	For LLM features
`OLLAMA_MODEL`	Ollama local model name (no API key needed)	For LLM features

Still Stuck?

Search existing issues: GitHub Issues
Ask in discussions: GitHub Discussions
Check the REPL: Use :type expr to inspect types and :env to see all bindings in scope.