API Clients Overview
BioLang ships with built-in clients for 15 major bioinformatics databases, registries, and web services. These clients are available as global functions — no imports needed. They handle authentication, rate limiting, response parsing, and caching automatically, so you can focus on the biology rather than HTTP plumbing.
Available Clients
| Client | Service | Key Functions |
|---|---|---|
| NCBI | NCBI E-utilities | ncbi_search, ncbi_fetch, ncbi_summary, ncbi_gene, ncbi_pubmed, ncbi_sequence |
| Ensembl | Ensembl REST API | ensembl_gene, ensembl_symbol, ensembl_sequence, ensembl_vep |
| UniProt | UniProt REST API | uniprot_search, uniprot_entry, uniprot_fasta, uniprot_features, uniprot_go |
| UCSC | UCSC Genome Browser | ucsc_genomes, ucsc_sequence, ucsc_tracks |
| KEGG | KEGG Pathway Database | kegg_find, kegg_get, kegg_link |
| STRING | STRING Protein Network | string_network, string_enrichment |
| PDB | RCSB Protein Data Bank | pdb_entry, pdb_search |
| Reactome | Reactome Pathway DB | reactome_search, reactome_pathways |
| GO | Gene Ontology (QuickGO) | go_term, go_annotations, go_children, go_parents |
| COSMIC | COSMIC Somatic Mutations | cosmic_gene |
| BioMart | Ensembl BioMart | biomart_query |
| NCBI Datasets | NCBI Datasets v2 | datasets_gene |
| nf-core | nf-core Pipeline Catalog | nfcore_list, nfcore_search, nfcore_info, nfcore_params, nfcore_releases |
| BioContainers | BioContainers Registry | biocontainers_search, biocontainers_info, biocontainers_versions, biocontainers_popular |
| Galaxy ToolShed | Galaxy ToolShed Registry | galaxy_search, galaxy_popular, galaxy_categories, galaxy_tool |
Configuration
Most API clients work without any configuration. NCBI accepts an optional API key that raises its rate limit, and COSMIC requires a key. Provide keys through environment variables:
# NCBI: optional API key for higher rate limits (10 req/s vs 3 req/s)
export NCBI_API_KEY="your-ncbi-api-key"
# COSMIC: API key required
export COSMIC_API_KEY="your-cosmic-api-key"
# No configuration needed for:
# Ensembl, UniProt, UCSC, KEGG, STRING, PDB, Reactome, GO, BioMart
Environment Variables
| Variable | Service | Required | Effect |
|---|---|---|---|
| NCBI_API_KEY | NCBI | Optional | Increases rate limit from 3 to 10 requests/second |
| COSMIC_API_KEY | COSMIC | Required | Authentication for COSMIC database access |
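The table above boils down to two rules: a missing NCBI key lowers the request ceiling, while a missing COSMIC key is an error. The Python sketch below illustrates that logic in a language-neutral way; the function names are hypothetical and this is not BioLang's actual implementation. Passing the key as an api_key query parameter follows the NCBI E-utilities convention; how BioLang authenticates to COSMIC is not documented here, so the sketch only checks that the key is present.

```python
import os
from typing import Mapping

# Hypothetical helpers illustrating the environment-variable table;
# not BioLang's real implementation.

def ncbi_rate_limit(env: Mapping[str, str] = os.environ) -> int:
    """Requests per second allowed by NCBI: 10 with an API key, 3 without."""
    return 10 if env.get("NCBI_API_KEY") else 3

def ncbi_params(query: dict, env: Mapping[str, str] = os.environ) -> dict:
    """Attach the optional NCBI key as the E-utilities 'api_key' parameter."""
    params = dict(query)
    key = env.get("NCBI_API_KEY")
    if key:
        params["api_key"] = key
    return params

def cosmic_key(env: Mapping[str, str] = os.environ) -> str:
    """COSMIC requires a key, so fail early with a clear message."""
    key = env.get("COSMIC_API_KEY")
    if not key:
        raise RuntimeError("COSMIC_API_KEY is not set; COSMIC requires an API key")
    return key
```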
Rate Limiting
BioLang automatically respects rate limits for each service. If a rate limit is exceeded, the client transparently retries with exponential backoff. You do not need to add any delay between calls:
# 6 genes x 3 services = 18 requests with no manual delays:
# BioLang throttles and retries them internally
let symbols = ["BRCA1", "TP53", "EGFR", "MYC", "KRAS", "PIK3CA"]
let results = symbols
|> map(|id| {
gene: id,
ncbi: ncbi_search("gene", id),
ensembl: ensembl_symbol("human", id),
uniprot: uniprot_search("gene:{id} AND organism_id:9606")
})
|> to_table()
|> print()
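To make "retries with exponential backoff" concrete, here is a minimal Python sketch of the general technique, in which each retry waits twice as long as the previous one. The RateLimited exception, the 0.5 s base delay, and the retry cap are illustrative assumptions, not BioLang's actual values. The sleep function is injectable so the behavior can be checked without real waiting.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

class RateLimited(Exception):
    """Hypothetical signal that a service answered 'too many requests' (HTTP 429)."""

def with_backoff(call: Callable[[], T],
                 max_retries: int = 5,
                 base_delay: float = 0.5,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Run call(); on RateLimited, wait base_delay * 2**attempt and try again."""
    attempt = 0
    while True:
        try:
            return call()
        except RateLimited:
            if attempt >= max_retries:
                raise  # out of retries: surface the error to the caller
            sleep(base_delay * 2 ** attempt)  # 0.5 s, 1 s, 2 s, 4 s, ...
            attempt += 1
```

A production client would typically also add random jitter to the delays and honor a Retry-After header when the service sends one.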
Caching
API responses are cached in memory for the duration of a script execution, so an identical request is answered from the cache without a network call. For persistent caching across runs, use the cache parameter. The example below shows only the in-memory behavior:
# The first call queries NCBI; the identical second call is served from the cache
let gene = ncbi_search("gene", "BRCA1")
let gene2 = ncbi_search("gene", "BRCA1") # cached, no network request
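Per-run caching of identical requests amounts to memoization keyed on the function name and its arguments. The Python sketch below shows the idea with a plain dictionary. The CachedClient class and its fetch callback are hypothetical stand-ins rather than BioLang internals, and the sketch does not model the persistent cache.

```python
from typing import Any, Callable, Dict, Tuple

class CachedClient:
    """Memoizes API calls for the lifetime of one object (one 'script run')."""

    def __init__(self, fetch: Callable[..., Any]):
        self._fetch = fetch                   # performs the real network request
        self._cache: Dict[Tuple, Any] = {}
        self.network_calls = 0                # counts cache misses, for illustration

    def call(self, endpoint: str, *args: Any) -> Any:
        key = (endpoint, args)                # same endpoint + arguments -> same key
        if key not in self._cache:
            self.network_calls += 1
            self._cache[key] = self._fetch(endpoint, *args)
        return self._cache[key]
```

Because the dictionary lives only as long as the client object, the cache disappears when the run ends, which matches the in-memory behavior described above.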
Error Handling
API functions return structured results. Network errors and API errors are raised as BioLang errors, which you can catch with try/catch:
# Without try/catch, a failed request raises an error that propagates to the caller
let result = ncbi_search("gene", "BRCA1")
# With try/catch, report the failure and fall back to nil
let safe_result = try {
  ncbi_search("gene", "BRCA1")
} catch err {
  print("API error:", err)
  nil
}
# Batch lookups that tolerate failures: a lookup that raises
# (e.g. for INVALID_GENE) yields nil and is filtered out
let genes = ["BRCA1", "TP53", "INVALID_GENE", "EGFR"]
let results = genes
  |> map(|g| {
    gene: g,
    result: try { ncbi_gene(g) } catch _ { nil }
  })
  |> filter(|r| r.result != nil)
Quick Examples
# Search NCBI Gene for BRCA1
let gene = ncbi_search("gene", "BRCA1")
print(gene)
# Get protein info from UniProt
let protein = uniprot_entry("P38398")
print(protein.name, protein.length)
# Predict variant effects with Ensembl VEP (HGVS genomic notation)
let vep = ensembl_vep("1:g.230710048A>G")
vep |> first() |> print()
# Get KEGG pathway (returns raw text)
let pathway = kegg_get("hsa04110") # Cell cycle
print(pathway)
# Fetch a protein structure entry from RCSB PDB
let structure = pdb_entry("4HHB") # Hemoglobin
print(structure.title, structure.resolution)
Explore Each Client
- NCBI — E-utilities, BLAST, Datasets
- Ensembl — Genes, variants, sequences, homology
- UniProt — Protein knowledge base
- UCSC — Genome browser, tracks, BLAT
- Others — KEGG, STRING, PDB, Reactome, GO, COSMIC, BioMart, NCBI Datasets
- nf-core — Browse, search, and inspect 100+ curated bioinformatics pipelines
- BioContainers — Search and discover 9,000+ containerized bioinformatics tools
- Galaxy ToolShed — Browse and search Galaxy ToolShed repositories
Workflow Parsing & Code Generation
BioLang can parse external workflow files and generate BioLang pipeline code.
See the nf-core page for details on browsing the pipeline catalog.
Configurable Endpoints
All API base URLs can be overridden for mirrors, proxies, or internal deployments:
- Environment variables: BIOLANG_NCBI_URL, BIOLANG_ENSEMBL_URL, etc.
- Config file: ~/.biolang/apis.yaml
- Inspect current config: api_endpoints()
# Check current API endpoints
let endpoints = api_endpoints()
print(endpoints.ncbi)
print(endpoints.biocontainers)
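One way to picture endpoint overrides is a lookup that checks the environment variable first, then the config file, then a built-in default. The Python sketch below assumes that precedence, which this page does not specify, and takes the config file as an already-parsed mapping of service names to URLs; the real layout of ~/.biolang/apis.yaml may differ. The default URLs are the public NCBI E-utilities and Ensembl REST base URLs, used for illustration only, and resolve_endpoint is a hypothetical name.

```python
import os
from typing import Mapping

# Illustrative defaults (public base URLs); BioLang's built-in values may differ.
DEFAULTS = {
    "ncbi": "https://eutils.ncbi.nlm.nih.gov/entrez/eutils",
    "ensembl": "https://rest.ensembl.org",
}

def resolve_endpoint(service: str,
                     config: Mapping[str, str],
                     env: Mapping[str, str] = os.environ) -> str:
    """Return a service base URL, assuming env var > config file > default."""
    env_value = env.get(f"BIOLANG_{service.upper()}_URL")  # e.g. BIOLANG_NCBI_URL
    if env_value:
        return env_value
    if service in config:
        return config[service]
    return DEFAULTS[service]
```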