UniProt

The UniProt client provides access to the Universal Protein Resource (UniProt), a comprehensive, freely accessible knowledge base of protein sequences and functional annotation. It supports protein entry retrieval, full-text search, sequence BLAST, feature annotations, GO terms, and ID mapping between databases. No API key is required.

uniprot_entry

Fetch a UniProt entry by accession. Returns core protein information such as the entry name, organism, gene names, sequence length, and function annotation. Feature annotations are fetched separately with uniprot_features:

# Fetch BRCA1 protein
let entry = uniprot_entry("P38398")

print(entry.accession)        # => "P38398"
print(entry.name)             # => "BRCA1_HUMAN"
print(entry.organism)         # => "Homo sapiens"
print(entry.sequence_length)  # => 1863
print(entry.gene_names)       # => ["BRCA1"]

# Function annotation
print(entry.function)

# Features are fetched separately with uniprot_features()
let features = uniprot_features("P38398")
features
  |> filter(|f| f.type == "Domain")
  |> map(|f| { type: f.type, location: f.location, description: f.description })
  |> to_table()
  |> print()
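
Because uniprot_entry expects an accession such as "P38398" rather than an entry name such as "BRCA1_HUMAN", it can help to check the input shape before making a request. The following is a minimal Python sketch, not part of this client: the helper name is_uniprot_accession is hypothetical, and the pattern is the accession format published in the UniProt help pages.

```python
import re

# Accession format as documented by UniProt (6 or 10 characters).
UNIPROT_ACCESSION = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2}"
)


def is_uniprot_accession(text: str) -> bool:
    """Return True if the whole string is shaped like a UniProtKB accession."""
    return UNIPROT_ACCESSION.fullmatch(text) is not None


print(is_uniprot_accession("P38398"))       # True
print(is_uniprot_accession("BRCA1_HUMAN"))  # False (entry name, not an accession)
```

A match only means the string is well formed; it does not guarantee that the accession exists in UniProt.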

uniprot_search

Search UniProt using free text or structured queries. Takes a query string and an optional max results limit. Supports the full UniProt query syntax:

# Simple text search (query, [max_results])
let results = uniprot_search("BRCA1 AND organism_id:9606")
results |> map(|r| print(r.accession, r.name))

# Structured query with limit
results = uniprot_search("kinase AND reviewed:true AND organism_id:9606", 50)

# Search by gene name
results = uniprot_search("gene:TP53 AND organism_id:9606 AND reviewed:true")

# Search by GO term — Apoptosis
results = uniprot_search("go:0006915 AND organism_id:9606")
results
  |> map(|r| { accession: r.accession, name: r.name, length: r.sequence_length })
  |> to_table()
  |> print()

# Broad kinase search
results = uniprot_search("kinase AND organism_id:9606 AND reviewed:true", 100)
results |> to_table() |> print()
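
The queries above all follow the same shape: optional free-text terms plus field:value filters joined with AND. As an illustration of that syntax only, here is a small Python sketch that assembles such strings. The function build_query is a hypothetical helper, not part of the UniProt client, and it covers only AND-joined terms, not the full query grammar.

```python
def build_query(*terms, **fields) -> str:
    """Join free-text terms and field:value filters with AND."""
    parts = [str(term) for term in terms]
    for name, value in fields.items():
        if isinstance(value, bool):
            # UniProt expects lowercase true/false, e.g. reviewed:true
            value = str(value).lower()
        elif isinstance(value, str) and " " in value:
            # Multi-word values are quoted so they are treated as one phrase
            value = f'"{value}"'
        parts.append(f"{name}:{value}")
    return " AND ".join(parts)


print(build_query("kinase", reviewed=True, organism_id=9606))
# kinase AND reviewed:true AND organism_id:9606
print(build_query(gene="TP53", organism_id=9606, reviewed=True))
# gene:TP53 AND organism_id:9606 AND reviewed:true
```

The resulting string is what would be passed as the first argument to uniprot_search.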

Features and Annotations

UniProt entries contain rich feature annotations including domains, active sites, binding sites, variants, and post-translational modifications:

# Get all features for a protein with uniprot_features(accession)
let features = uniprot_features("P38398")

# Filter by feature type — each feature has: type, location, description
let domains = features |> filter(|f| f.type == "Domain")
domains |> map(|d| print(d.description, ":", d.location))

# Variants
let variants = features |> filter(|f| f.type == "Natural variant")
variants |> map(|v| print(v.description, "@", v.location))

# Post-translational modifications
let ptms = features |> filter(|f| f.type == "Modified residue")
ptms |> map(|p| print(p.type, "@", p.location, ":", p.description))

# GO terms are fetched with uniprot_go(accession)
let go = uniprot_go("P38398")
go
  |> filter(|g| g.aspect == "biological_process")
  |> map(|g| print(g.id, g.term))

Practical Example: Protein Comparison

# Compare properties of cancer-related proteins
let cancer_genes = ["BRCA1", "TP53", "EGFR", "KRAS", "PIK3CA", "PTEN"]

let comparison = cancer_genes |> map(|gene| {
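  # Look up the reviewed human entry for this gene symbol
  let results = uniprot_search("gene:{gene} AND organism_id:9606 AND reviewed:true")
  # Assumes at least one hit; results[0] fails if the search returns nothing
  let entry = uniprot_entry(results[0].accession)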

  {
    gene: gene,
    accession: entry.accession,
    length: entry.sequence_length,
    domains: uniprot_features(entry.accession) |> filter(|f| f.type == "Domain") |> len(),
    go_terms: uniprot_go(entry.accession) |> len()
  }
})

comparison |> to_table() |> print()
comparison |> to_table() |> write_csv("protein_comparison.csv")

ID Mapping

# Resolve UniProt accessions to entry names and gene names
# by fetching each entry and collecting the fields side by side
let accessions = ["P38398", "P04637", "P00533"]
accessions |> map(|acc| {
  let entry = uniprot_entry(acc)
  {
    uniprot: acc,
    name: entry.name,
    genes: entry.gene_names,
    length: entry.sequence_length
  }
}) |> to_table() |> print()
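
The example above looks up entries one accession at a time. Translating accessions into another database's identifiers, for example Ensembl, is handled by UniProt's separate ID mapping service, which runs as an asynchronous job: a request is submitted, a job ID comes back, and results are fetched once the job finishes. The Python sketch below only builds the form fields for the submission step and makes no network call. The helper id_mapping_form is hypothetical, and the endpoint and database names ("UniProtKB_AC-ID", "Ensembl") are taken from the public UniProt REST documentation as I understand it, so verify them before relying on them. How this client exposes ID mapping is not shown in this section.

```python
# Submission endpoint of the UniProt ID mapping service (assumed from the REST docs).
ID_MAPPING_RUN_URL = "https://rest.uniprot.org/idmapping/run"


def id_mapping_form(accessions, to_db="Ensembl", from_db="UniProtKB_AC-ID"):
    """Build the form fields for an ID mapping job; IDs are comma-separated."""
    return {
        "from": from_db,
        "to": to_db,
        "ids": ",".join(accessions),
    }


form = id_mapping_form(["P38398", "P04637", "P00533"])
print(form["ids"])  # P38398,P04637,P00533
```

These fields would be sent as a POST to ID_MAPPING_RUN_URL, after which the returned job ID is polled until results are available.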