Tutorial: LLM Chat
BioLang has built-in LLM integration via chat() and chat_code().
Ask questions about your data, generate BioLang code from natural language, and
use AI to interpret analysis results — all from within your scripts.
Prerequisites: An API key for at least one supported provider: Anthropic (Claude), OpenAI, or a local Ollama instance.
What you'll learn
- Configuring LLM providers via environment variables
- Using chat() for conversational questions
- Using chat_code() to generate BioLang code
- Passing analysis context to the LLM
- Building AI-assisted analysis workflows
Run the complete example:
bl run examples/tutorials/llm-chat.bl
Step 1: Configure a provider
BioLang auto-detects your LLM provider from environment variables. Set one of these before running your script:
# Option A: Anthropic (Claude)
export ANTHROPIC_API_KEY="sk-ant-..."
# Option B: OpenAI
export OPENAI_API_KEY="sk-..."
# Option C: Ollama (local, no API key needed)
export OLLAMA_MODEL="llama3"
# Option D: Any OpenAI-compatible API
export LLM_BASE_URL="http://localhost:1234"
export LLM_MODEL="mistral"
Detection priority: Anthropic → OpenAI → Ollama → OpenAI-compatible. You can also override the model:
# Use a specific Anthropic model
export ANTHROPIC_API_KEY="sk-ant-..."
export ANTHROPIC_MODEL="claude-sonnet-4-20250514"
# Use a specific OpenAI model
export OPENAI_API_KEY="sk-..."
export OPENAI_MODEL="gpt-4o"
Step 2: Ask a question with chat()
The chat() function sends a message to the LLM and returns the
response as a string. It uses a bioinformatics-aware system prompt, so it
understands BioLang syntax and genomics concepts.
# requires: LLM API key (ANTHROPIC_API_KEY, OPENAI_API_KEY, or OLLAMA_MODEL)
# Simple question
let answer = chat("What is the significance of a Ti/Tv ratio above 2.0 in human WGS?")
print(answer)
# requires: LLM API key (ANTHROPIC_API_KEY, OPENAI_API_KEY, or OLLAMA_MODEL)
# Ask about a specific analysis approach
let advice = chat("How should I filter somatic variants from a tumor-normal pair VCF?")
print(advice)
Step 3: Pass context to the LLM
The real power of chat() is the optional second argument — you can
pass any BioLang value as context. The LLM sees it alongside your question.
# requires: LLM API key (ANTHROPIC_API_KEY, OPENAI_API_KEY, or OLLAMA_MODEL)
# Pass a data summary as context
let stats = {
total_reads: 48_000_000,
passing_q30: 42_500_000,
mean_gc: 0.412,
duplicate_rate: 0.18,
mean_coverage: 32.5,
}
let interpretation = chat("Interpret these QC metrics. Any red flags?", stats)
print(interpretation)
# requires: LLM API key (ANTHROPIC_API_KEY, OPENAI_API_KEY, or OLLAMA_MODEL)
# Pass a table of results
let de_results = [
{gene: "TP53", log2fc: 3.2, pvalue: 1.2e-8},
{gene: "BRCA1", log2fc: 2.8, pvalue: 4.5e-7},
{gene: "MYC", log2fc: -2.1, pvalue: 3.3e-6},
{gene: "CDK4", log2fc: 1.5, pvalue: 0.002},
]
let analysis = chat("What biological processes are suggested by these DE genes?", de_results)
print(analysis)
# requires: LLM API key (ANTHROPIC_API_KEY, OPENAI_API_KEY, or OLLAMA_MODEL)
# Pass an error message for debugging help
let error_msg = "Error at line 42: type mismatch: expected DNA, got Str"
let fix = chat("How do I fix this BioLang error?", error_msg)
print(fix)
Step 4: Generate code with chat_code()
chat_code() returns pure BioLang code — no explanations, no markdown
fences. It's designed for code generation that you can evaluate or save.
# requires: LLM API key (ANTHROPIC_API_KEY, OPENAI_API_KEY, or OLLAMA_MODEL)
# Generate a script from a description
let code = chat_code("Read a FASTQ file, filter reads with mean Phred > 30, and print the count")
print(code)
# Output (pure BioLang code):
# let reads = read_fastq("input.fastq")
# |> filter(|r| mean_phred(r.quality) > 30)
# print(f"Passing reads: {len(reads)}")
# requires: LLM API key (ANTHROPIC_API_KEY, OPENAI_API_KEY, or OLLAMA_MODEL)
# Generate code with context about your data
let schema = {
file: "samples.tsv",
columns: ["sample_id", "condition", "replicate", "fastq_r1", "fastq_r2"],
rows: 24,
}
let pipeline_code = chat_code(
"Write a pipeline that processes each sample: filter reads, compute GC, and summarize by condition",
schema
)
print(pipeline_code)
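Because chat_code() returns plain code with no markdown fences, you can also persist it for review before running it. The write_file call below is a hypothetical builtin used for illustration; substitute whatever file-writing function your BioLang build provides.

```biolang
# requires: LLM API key (ANTHROPIC_API_KEY, OPENAI_API_KEY, or OLLAMA_MODEL)
# Save generated code to a script for review before executing it
# NOTE: write_file is a hypothetical builtin, shown for illustration only
let code = chat_code("Compute GC content for every record in input.fasta")
write_file("generated_pipeline.bl", code)
print("Saved generated_pipeline.bl; review it, then run: bl run generated_pipeline.bl")
```

Reviewing generated code before execution is good practice regardless of how capable the model is.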
Step 5: Check available models
Use llm_models() to see which provider and model are configured:
# requires: LLM API key (ANTHROPIC_API_KEY, OPENAI_API_KEY, or OLLAMA_MODEL)
let models = llm_models()
print(models)
# {provider: "anthropic", model: "claude-sonnet-4-20250514"}
Step 6: Real-world patterns
Interpret enrichment results
# requires: LLM API key (ANTHROPIC_API_KEY, OPENAI_API_KEY, or OLLAMA_MODEL)
let gene_sets = [
{name: "P53_PATHWAY", genes: ["TP53", "CDKN1A", "BAX", "MDM2"]},
{name: "DNA_REPAIR", genes: ["BRCA1", "RAD51", "ATM", "CHEK2"]},
]
let de_genes = ["TP53", "CDKN1A", "BAX", "BRCA1", "ATM", "RAD51"]
let ora = enrich(de_genes, gene_sets, 20000)
let summary = chat("Summarize these enrichment results in biological terms", ora)
print(summary)
Explain a variant
# requires: internet connection, LLM API key (ANTHROPIC_API_KEY, OPENAI_API_KEY, or OLLAMA_MODEL)
let v = variant("chr17", 7674220, "G", "A")
let vep = ensembl_vep("17:7674220:G:A")
let explanation = chat("What is the clinical significance of this TP53 variant?", vep)
print(explanation)
Generate a report summary
# requires: LLM API key (ANTHROPIC_API_KEY, OPENAI_API_KEY, or OLLAMA_MODEL)
# After running your analysis, ask the LLM to write a summary
let results = {
samples: 24,
de_genes: 342,
enriched_pathways: 8,
top_pathway: "P53_SIGNALING",
top_gene: "TP53",
species: "human",
tissue: "breast tumor",
}
let report = chat(
"Write a one-paragraph methods and results summary for a paper, based on these analysis results",
results
)
print(report)
Debug a pipeline
# requires: LLM API key (ANTHROPIC_API_KEY, OPENAI_API_KEY, or OLLAMA_MODEL)
# Capture an error and ask the LLM for help
let result = try_call(|| {
let reads = read_fastq("missing_file.fq.gz")
reads |> filter(|r| mean_phred(r.quality) > 30)
})
if result == nil {
let help = chat("I got an error reading a FASTQ file in BioLang. How do I debug missing file issues?")
print(help)
}
Context types
The second argument to chat() and chat_code() accepts
any BioLang value. Here's how each type is formatted for the LLM:
| Type | Format sent to LLM |
|---|---|
| Str | Passed as-is |
| Record / Map | Key-value lines (key: value) |
| List | One item per line |
| Table | TSV format (header + rows) |
| Other | String representation |
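To see the difference in practice, here is the same kind of question asked with a string, a record, and a list as context (a sketch; the field names and values are illustrative):

```biolang
# requires: LLM API key (ANTHROPIC_API_KEY, OPENAI_API_KEY, or OLLAMA_MODEL)
# A Str is passed to the LLM as-is
let a = chat("Is this coverage acceptable for germline calling?", "mean coverage 32.5x")

# A Record is flattened to key: value lines
let b = chat("Is this coverage acceptable for germline calling?",
    {mean_coverage: 32.5, min_coverage: 11.0})

# A List is sent one item per line
let c = chat("Which of these chromosomes look under-covered?",
    ["chr1: 33.1x", "chr21: 18.4x", "chrY: 9.7x"])
```

Structured context (records, lists, tables) generally produces more grounded answers than prose descriptions of the same data.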
Builtin reference
| Function | Description |
|---|---|
| chat(message, context?) | Send a question to the LLM, get a natural language response |
| chat_code(message, context?) | Generate pure BioLang code from a description |
| llm_models() | Show configured provider and model |
Environment variables
| Variable | Provider | Description |
|---|---|---|
| ANTHROPIC_API_KEY | Anthropic | API key (highest priority) |
| ANTHROPIC_MODEL | Anthropic | Model override (default: claude-sonnet-4-20250514) |
| OPENAI_API_KEY | OpenAI | API key |
| OPENAI_MODEL | OpenAI | Model override (default: gpt-4o) |
| OLLAMA_MODEL | Ollama | Model name (e.g. llama3, mistral) |
| LLM_BASE_URL | OpenAI-compatible | API base URL for any compatible provider |
| LLM_MODEL | OpenAI-compatible | Model name |
| LLM_API_KEY | OpenAI-compatible | API key (optional for local servers) |
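For a hosted OpenAI-compatible endpoint that requires authentication, the three LLM_* variables can be combined. The URL and key below are placeholders:

```shell
# Hosted OpenAI-compatible provider (placeholder values)
export LLM_BASE_URL="https://api.example.com/v1"
export LLM_MODEL="mistral"
export LLM_API_KEY="your-key-here"
```

For local servers such as Ollama-compatible gateways, LLM_API_KEY can usually be omitted.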
Tips
- Be specific in your prompts — "Explain this VCF" is worse than "Are any of these variants likely pathogenic? Focus on TP53 and BRCA1."
- Pass structured context — Records and Tables give the LLM more to work with than plain strings
- Use chat_code() for automation — generate repetitive boilerplate or scaffold scripts from descriptions
- Ollama is free — for local development, install Ollama and use models like llama3 or mistral with no API costs
- Context is appended, not replaced — the bioinformatics system prompt is always present, so the LLM understands BioLang by default
What's next
- LLM Chat reference — full builtin documentation
- Notebooks — combine LLM-generated insights with literate analysis
- Enrichment Analysis — run enrichment and interpret results with LLM