Tutorial: LLM Chat
BioLang has built-in LLM integration via chat() and chat_code().
Ask questions about your data, generate BioLang code from natural language, and
use AI to interpret analysis results — all from within your scripts.
Prerequisites: An API key for at least one supported provider: Anthropic (Claude), OpenAI, or a local Ollama instance.
What you'll learn
- Configuring LLM providers via environment variables
- Using chat() for conversational questions
- Using chat_code() to generate BioLang code
- Passing analysis context to the LLM
- Building AI-assisted analysis workflows
Run the complete example:
bl run examples/tutorials/llm-chat.bl
Step 1: Configure a provider
BioLang auto-detects your LLM provider from environment variables. Set one of these before running your script:
# Option A: Anthropic (Claude)
export ANTHROPIC_API_KEY="sk-ant-..."
# Option B: OpenAI
export OPENAI_API_KEY="sk-..."
# Option C: Ollama (local, no API key needed)
export OLLAMA_MODEL="llama3"
# Option D: Any OpenAI-compatible API
export LLM_BASE_URL="http://localhost:1234"
export LLM_MODEL="mistral"
Detection priority: Anthropic → OpenAI → Ollama → OpenAI-compatible. You can also override the model:
# Use a specific Anthropic model
export ANTHROPIC_API_KEY="sk-ant-..."
export ANTHROPIC_MODEL="claude-sonnet-4-20250514"
# Use a specific OpenAI model
export OPENAI_API_KEY="sk-..."
export OPENAI_MODEL="gpt-4o"
Step 2: Ask a question with chat()
The chat() function sends a message to the LLM and returns the
response as a string. It uses a bioinformatics-aware system prompt, so it
understands BioLang syntax and genomics concepts.
# requires: LLM API key (ANTHROPIC_API_KEY, OPENAI_API_KEY, or OLLAMA_MODEL)
# Simple question
let answer = chat("What is the significance of a Ti/Tv ratio above 2.0 in human WGS?")
print(answer)
# requires: LLM API key (ANTHROPIC_API_KEY, OPENAI_API_KEY, or OLLAMA_MODEL)
# Ask about a specific analysis approach
let advice = chat("How should I filter somatic variants from a tumor-normal pair VCF?")
print(advice)
Step 3: Pass context to the LLM
The real power of chat() is the optional second argument — you can
pass any BioLang value as context. The LLM sees it alongside your question.
# requires: LLM API key (ANTHROPIC_API_KEY, OPENAI_API_KEY, or OLLAMA_MODEL)
# Pass a data summary as context
let stats = {
total_reads: 48_000_000,
passing_q30: 42_500_000,
mean_gc: 0.412,
duplicate_rate: 0.18,
mean_coverage: 32.5,
}
let interpretation = chat("Interpret these QC metrics. Any red flags?", stats)
print(interpretation)
# requires: LLM API key (ANTHROPIC_API_KEY, OPENAI_API_KEY, or OLLAMA_MODEL)
# Pass a table of results
let de_results = [
{gene: "TP53", log2fc: 3.2, pvalue: 1.2e-8},
{gene: "BRCA1", log2fc: 2.8, pvalue: 4.5e-7},
{gene: "MYC", log2fc: -2.1, pvalue: 3.3e-6},
{gene: "CDK4", log2fc: 1.5, pvalue: 0.002},
]
let analysis = chat("What biological processes are suggested by these DE genes?", de_results)
print(analysis)
# requires: LLM API key (ANTHROPIC_API_KEY, OPENAI_API_KEY, or OLLAMA_MODEL)
# Pass an error message for debugging help
let error_msg = "Error at line 42: type mismatch: expected DNA, got Str"
let fix = chat("How do I fix this BioLang error?", error_msg)
print(fix)
Step 4: Generate code with chat_code()
chat_code() returns pure BioLang code — no explanations, no markdown
fences. It's designed for code generation that you can evaluate or save.
# requires: LLM API key (ANTHROPIC_API_KEY, OPENAI_API_KEY, or OLLAMA_MODEL)
# Generate a script from a description
let code = chat_code("Read a FASTQ file, filter reads with mean Phred > 30, and print the count")
print(code)
# Output (pure BioLang code):
# let reads = read_fastq("input.fastq")
# |> filter(|r| mean_phred(r.quality) > 30)
# print(f"Passing reads: {len(reads)}")
# requires: LLM API key (ANTHROPIC_API_KEY, OPENAI_API_KEY, or OLLAMA_MODEL)
# Generate code with context about your data
let schema = {
file: "samples.tsv",
columns: ["sample_id", "condition", "replicate", "fastq_r1", "fastq_r2"],
rows: 24,
}
let pipeline_code = chat_code(
"Write a pipeline that processes each sample: filter reads, compute GC, and summarize by condition",
schema
)
print(pipeline_code)
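Because chat_code() returns plain code with no markdown fences, you can also persist it for review before running it. The write_file call below is a hypothetical builtin used for illustration; substitute whatever file-writing function your BioLang build provides.

```biolang
# requires: LLM API key (ANTHROPIC_API_KEY, OPENAI_API_KEY, or OLLAMA_MODEL)
# Save generated code to a script for review before executing it
# NOTE: write_file is a hypothetical builtin, shown for illustration only
let code = chat_code("Compute GC content for every record in input.fasta")
write_file("generated_pipeline.bl", code)
print("Saved generated_pipeline.bl; review it, then run: bl run generated_pipeline.bl")
```

Reviewing generated code before execution is good practice regardless of how capable the model is.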
Step 5: Check available models
Use llm_models() to see which provider and model are configured:
# requires: LLM API key (ANTHROPIC_API_KEY, OPENAI_API_KEY, or OLLAMA_MODEL)
let models = llm_models()
print(models)
# {provider: "anthropic", model: "claude-sonnet-4-20250514"}
Step 6: Real-world patterns
Interpret enrichment results
# requires: LLM API key (ANTHROPIC_API_KEY, OPENAI_API_KEY, or OLLAMA_MODEL)
let gene_sets = [
{name: "P53_PATHWAY", genes: ["TP53", "CDKN1A", "BAX", "MDM2"]},
{name: "DNA_REPAIR", genes: ["BRCA1", "RAD51", "ATM", "CHEK2"]},
]
let de_genes = ["TP53", "CDKN1A", "BAX", "BRCA1", "ATM", "RAD51"]
let ora = enrich(de_genes, gene_sets, 20000)
let summary = chat("Summarize these enrichment results in biological terms", ora)
print(summary)
Explain a variant
# requires: internet connection, LLM API key (ANTHROPIC_API_KEY, OPENAI_API_KEY, or OLLAMA_MODEL)
let v = variant("chr17", 7674220, "G", "A")
let vep = ensembl_vep("17:7674220:G:A")
let explanation = chat("What is the clinical significance of this TP53 variant?", vep)
print(explanation)
Generate a report summary
# requires: LLM API key (ANTHROPIC_API_KEY, OPENAI_API_KEY, or OLLAMA_MODEL)
# After running your analysis, ask the LLM to write a summary
let results = {
samples: 24,
de_genes: 342,
enriched_pathways: 8,
top_pathway: "P53_SIGNALING",
top_gene: "TP53",
species: "human",
tissue: "breast tumor",
}
let report = chat(
"Write a one-paragraph methods and results summary for a paper, based on these analysis results",
results
)
print(report)
Debug a pipeline
# requires: LLM API key (ANTHROPIC_API_KEY, OPENAI_API_KEY, or OLLAMA_MODEL)
# Capture an error and ask the LLM for help
let result = try_call(|| {
let reads = read_fastq("missing_file.fq.gz")
reads |> filter(|r| mean_phred(r.quality) > 30)
})
if result == nil {
let help = chat("I got an error reading a FASTQ file in BioLang. How do I debug missing file issues?")
print(help)
}
Context types
The second argument to chat() and chat_code() accepts
any BioLang value. Here's how each type is formatted for the LLM:
| Type | Format sent to LLM |
|---|---|
| Str | Passed as-is |
| Record / Map | Key-value lines (key: value) |
| List | One item per line |
| Table | TSV format (header + rows) |
| Other | String representation |
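To see the difference in practice, here is the same kind of question asked with a string, a record, and a list as context (a sketch; the field names and values are illustrative):

```biolang
# requires: LLM API key (ANTHROPIC_API_KEY, OPENAI_API_KEY, or OLLAMA_MODEL)
# A Str is passed to the LLM as-is
let a = chat("Is this coverage acceptable for germline calling?", "mean coverage 32.5x")

# A Record is flattened to key: value lines
let b = chat("Is this coverage acceptable for germline calling?",
    {mean_coverage: 32.5, min_coverage: 11.0})

# A List is sent one item per line
let c = chat("Which of these chromosomes look under-covered?",
    ["chr1: 33.1x", "chr21: 18.4x", "chrY: 9.7x"])
```

Structured context (records, lists, tables) generally produces more grounded answers than prose descriptions of the same data.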
Builtin reference
| Function | Description |
|---|---|
| chat(message, context?) | Send a question to the LLM, get a natural language response |
| chat_code(message, context?) | Generate pure BioLang code from a description |
| llm_models() | Show configured provider and model |
Environment variables
| Variable | Provider | Description |
|---|---|---|
| ANTHROPIC_API_KEY | Anthropic | API key (highest priority) |
| ANTHROPIC_MODEL | Anthropic | Model override (default: claude-sonnet-4-20250514) |
| OPENAI_API_KEY | OpenAI | API key |
| OPENAI_MODEL | OpenAI | Model override (default: gpt-4o) |
| OLLAMA_MODEL | Ollama | Model name (e.g. llama3, mistral) |
| LLM_BASE_URL | OpenAI-compatible | API base URL for any compatible provider |
| LLM_MODEL | OpenAI-compatible | Model name |
| LLM_API_KEY | OpenAI-compatible | API key (optional for local servers) |
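For a hosted OpenAI-compatible endpoint that requires authentication, the three LLM_* variables can be combined. The URL and key below are placeholders:

```shell
# Hosted OpenAI-compatible provider (placeholder values)
export LLM_BASE_URL="https://api.example.com/v1"
export LLM_MODEL="mistral"
export LLM_API_KEY="your-key-here"
```

For local servers such as Ollama-compatible gateways, LLM_API_KEY can usually be omitted.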
Tips
- Be specific in your prompts — "Explain this VCF" is worse than "Are any of these variants likely pathogenic? Focus on TP53 and BRCA1."
- Pass structured context — Records and Tables give the LLM more to work with than plain strings
- Use chat_code() for automation — generate repetitive boilerplate or scaffold scripts from descriptions
- Ollama is free — for local development, install Ollama and use models like llama3 or mistral with no API costs
- Context is appended, not replaced — the bioinformatics system prompt is always present, so the LLM understands BioLang by default
What's next
- LLM Chat reference — full builtin documentation
- Notebooks — combine LLM-generated insights with literate analysis
- Enrichment Analysis — run enrichment and interpret results with LLM