API Clients Overview

BioLang ships with built-in clients for 15 major bioinformatics databases, registries, and web services. These clients are available as global functions — no imports needed. They handle authentication, rate limiting, response parsing, and caching automatically, so you can focus on the biology rather than HTTP plumbing.

Available Clients

Client           Service                   Key Functions
NCBI             NCBI E-utilities          ncbi_search, ncbi_fetch, ncbi_summary, ncbi_gene, ncbi_pubmed, ncbi_sequence
Ensembl          Ensembl REST API          ensembl_gene, ensembl_symbol, ensembl_sequence, ensembl_vep
UniProt          UniProt REST API          uniprot_search, uniprot_entry, uniprot_fasta, uniprot_features, uniprot_go
UCSC             UCSC Genome Browser       ucsc_genomes, ucsc_sequence, ucsc_tracks
KEGG             KEGG Pathway Database     kegg_find, kegg_get, kegg_link
STRING           STRING Protein Network    string_network, string_enrichment
PDB              RCSB Protein Data Bank    pdb_entry, pdb_search
Reactome         Reactome Pathway DB       reactome_search, reactome_pathways
GO               Gene Ontology (QuickGO)   go_term, go_annotations, go_children, go_parents
COSMIC           COSMIC Somatic Mutations  cosmic_gene
BioMart          Ensembl BioMart           biomart_query
NCBI Datasets    NCBI Datasets v2          datasets_gene
nf-core          nf-core Pipeline Catalog  nfcore_list, nfcore_search, nfcore_info, nfcore_params, nfcore_releases
BioContainers    BioContainers Registry    biocontainers_search, biocontainers_info, biocontainers_versions, biocontainers_popular
Galaxy ToolShed  Galaxy ToolShed Registry  galaxy_search, galaxy_popular, galaxy_categories, galaxy_tool

Configuration

Most API clients work without any configuration. For services that offer higher rate limits with an API key, set the corresponding environment variable:

# NCBI: optional API key for higher rate limits (10 req/s vs 3 req/s)
# Set NCBI_API_KEY environment variable

# COSMIC: API key required
# Set COSMIC_API_KEY environment variable

# No configuration needed for:
# Ensembl, UniProt, UCSC, KEGG, STRING, PDB, Reactome, GO, BioMart

Environment Variables

Variable        Service  Required  Effect
NCBI_API_KEY    NCBI     Optional  Increases rate limit from 3 to 10 requests/second
COSMIC_API_KEY  COSMIC   Required  Authentication for COSMIC database access
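
To make the effect of these two variables concrete, here is a minimal sketch in Python (not BioLang, and not BioLang's actual implementation) of how a client could interpret them. The request rates come from the table above; the function names `ncbi_rate_limit` and `cosmic_api_key` are hypothetical and exist only for illustration.

```python
import os

# Request rates taken from the table above; everything else is illustrative.
NCBI_RATE_WITHOUT_KEY = 3   # requests per second
NCBI_RATE_WITH_KEY = 10     # requests per second


def ncbi_rate_limit(env=None):
    """Return the NCBI request rate implied by the environment (hypothetical helper)."""
    env = os.environ if env is None else env
    return NCBI_RATE_WITH_KEY if env.get("NCBI_API_KEY") else NCBI_RATE_WITHOUT_KEY


def cosmic_api_key(env=None):
    """Return the COSMIC key, or raise because COSMIC access requires one."""
    env = os.environ if env is None else env
    key = env.get("COSMIC_API_KEY")
    if not key:
        raise RuntimeError("COSMIC_API_KEY is required for COSMIC access")
    return key
```

The sketch treats a missing NCBI key as a soft fallback to the lower rate and a missing COSMIC key as a hard error, mirroring the Optional and Required columns.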

Rate Limiting

BioLang automatically respects rate limits for each service. If a rate limit is exceeded, the client transparently retries with exponential backoff. You do not need to add any delay between calls:

# This works fine even though it makes many requests —
# BioLang handles rate limiting internally
let gene_ids = ["BRCA1", "TP53", "EGFR", "MYC", "KRAS", "PIK3CA"]

let results = gene_ids
  |> map(|id| {
    gene: id,
    ncbi: ncbi_search("gene", id),
    ensembl: ensembl_symbol("human", id),
    uniprot: uniprot_search("gene:{id} AND organism_id:9606")
  })
  |> to_table()
  |> print()
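
For readers curious what "retries with exponential backoff" involves, the following is a generic sketch in Python rather than BioLang's internal code. The exception type `RateLimited`, the helper `call_with_backoff`, and the default retry count and delays are all assumptions chosen for illustration.

```python
import random
import time


class RateLimited(Exception):
    """Raised by a request function when the service signals a rate limit (e.g. HTTP 429)."""


def call_with_backoff(request, max_retries=5, base_delay=0.5, sleep=time.sleep):
    """Call request(); on RateLimited, wait and retry with a doubling delay.

    The wait before retry number n (counting from 0) is base_delay * 2**n,
    plus up to 10% random jitter.
    """
    for attempt in range(max_retries + 1):
        try:
            return request()
        except RateLimited:
            if attempt == max_retries:
                raise  # give up after the final retry
            delay = base_delay * (2 ** attempt)
            # Jitter spreads out retries from concurrent callers
            sleep(delay + random.uniform(0, delay * 0.1))
```

Because each wait doubles, a briefly throttled request recovers quickly, while a persistently throttled one backs off instead of hammering the service. The `sleep` parameter is injectable only so the behavior can be exercised without real delays.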

Caching

API responses are cached in memory for the duration of a script execution, so an identical request returns the cached result without another network call. Persistent caching across runs is available through the cache parameter; the example here shows the default in-memory behavior:

# API responses are cached in memory for the duration of a script.
# Identical calls return cached results instantly.
let gene = ncbi_search("gene", "BRCA1")
let gene2 = ncbi_search("gene", "BRCA1")   # cached — no network request
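
The in-memory behavior described above amounts to memoizing each call by its function name and arguments. A minimal Python sketch of that idea follows; it is not BioLang's implementation, and the class `ResponseCache` and its `get_or_fetch` method are hypothetical names.

```python
class ResponseCache:
    """In-memory cache keyed by function name and arguments (illustrative only)."""

    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    def get_or_fetch(self, name, args, fetch):
        """Return the cached value for (name, args), calling fetch(*args) on a miss."""
        key = (name, tuple(args))
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        value = fetch(*args)  # the only place a network request would happen
        self._store[key] = value
        return value
```

Such a cache lives only as long as the object holding it, which matches the "duration of a script" scope described above.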

Error Handling

API functions return structured results. Network errors and API errors are propagated as BioLang errors that can be caught:

# Errors propagate naturally
let result = ncbi_search("gene", "BRCA1")

# Use try/catch for graceful error handling
result = try {
  ncbi_search("gene", "BRCA1")
} catch err {
  print("API error:", err)
  nil
}

# Batch operations with error tolerance
let genes = ["BRCA1", "TP53", "INVALID_GENE", "EGFR"]
let results = genes |> map(|g| {
  gene: g,
  result: try { ncbi_gene(g) } catch _ { nil }
}) |> filter(|r| r.result != nil)

Quick Examples

# Search for a gene in NCBI
let gene = ncbi_search("gene", "BRCA1")
print(gene)

# Get protein info from UniProt
let protein = uniprot_entry("P38398")
print(protein.name, protein.length)

# Query Ensembl for variant effects
let vep = ensembl_vep("1:g.230710048A>G")
vep |> first() |> print()

# Get KEGG pathway (returns raw text)
let pathway = kegg_get("hsa04110")    # Cell cycle
print(pathway)

# Fetch a protein structure entry from PDB
let structure = pdb_entry("4HHB")    # Hemoglobin
print(structure.title, structure.resolution)

Explore Each Client

  • NCBI — E-utilities, BLAST, Datasets
  • Ensembl — Genes, variants, sequences, homology
  • UniProt — Protein knowledge base
  • UCSC — Genome browser, tracks, BLAT
  • Others — KEGG, STRING, PDB, Reactome, GO, COSMIC, BioMart, NCBI Datasets
  • nf-core — Browse, search, and inspect 100+ curated bioinformatics pipelines
  • BioContainers — Search and discover 9,000+ containerized bioinformatics tools
  • Galaxy ToolShed — Browse and search Galaxy ToolShed repositories

Workflow Parsing & Code Generation

BioLang can parse external workflow files and generate BioLang pipeline code.

See the nf-core page for details on browsing the pipeline catalog.

Configurable Endpoints

All API base URLs can be overridden for mirrors, proxies, or internal deployments:

  • Environment variable: BIOLANG_NCBI_URL, BIOLANG_ENSEMBL_URL, etc.
  • Config file: ~/.biolang/apis.yaml
  • Inspect current config: api_endpoints()

# Check current API endpoints
let endpoints = api_endpoints()
print(endpoints.ncbi)
print(endpoints.biocontainers)
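
The override sources listed above imply some lookup order. The Python sketch below shows one plausible resolution scheme, assuming the environment variable wins over the config file, which in turn wins over a built-in default. That precedence is an assumption, not something this page states. The helper `resolve_endpoint` is hypothetical, the config file is represented as an already-parsed dictionary, and the two default URLs are the services' public base URLs rather than confirmed BioLang defaults.

```python
import os

# Public base URLs for two services; BioLang's actual defaults are assumed, not confirmed.
DEFAULT_ENDPOINTS = {
    "ncbi": "https://eutils.ncbi.nlm.nih.gov/entrez/eutils",
    "ensembl": "https://rest.ensembl.org",
}


def resolve_endpoint(service, file_config=None, env=None):
    """Pick a base URL: environment variable, then config file, then default.

    file_config stands in for the parsed contents of ~/.biolang/apis.yaml.
    """
    env = os.environ if env is None else env
    file_config = file_config or {}
    env_name = f"BIOLANG_{service.upper()}_URL"  # e.g. BIOLANG_NCBI_URL
    if env.get(env_name):
        return env[env_name]
    if service in file_config:
        return file_config[service]
    return DEFAULT_ENDPOINTS[service]
```

With this scheme, a single exported variable can redirect one service to a mirror for a session without editing the shared config file.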