Chapter 13: BioContainers Integration

Reproducible bioinformatics demands pinned software versions. A variant calling pipeline that works today should produce identical results next year, on a different machine, as long as it runs the same tool versions. The BioContainers project addresses this by packaging over 9,000 bioinformatics tools as container images hosted on registries like quay.io and Docker Hub.

BioLang gives you native access to the BioContainers registry. Four built-in functions let you search for tools, discover popular packages, inspect version histories, and retrieve exact container image URIs – all without leaving your script. No imports are needed.

Searching for Tools

biocontainers_search queries the BioContainers registry by name or keyword. It returns a list of records, each describing a matching tool.

# Find all samtools-related containers
let results = biocontainers_search("samtools")
# results => [
#   {name: "samtools", description: "Tools for manipulating NGS alignments...",
#    organization: "biocontainers", version_count: 42,
#    latest_version: "1.19--h50ea8bc_0",
#    latest_image: "quay.io/biocontainers/samtools:1.19--h50ea8bc_0"},
#   {name: "htslib", description: "C library for high-throughput sequencing...", ...},
#   ...
# ]

results |> each(|tool| {
  print(tool.name + " (" + str(tool.version_count) + " versions)")
  print("  Latest: " + tool.latest_version)
})

The default limit is 25 results. Pass a second argument to change how many results are returned.

# Top 5 matches for short-read aligners
let aligners = biocontainers_search("bwa", 5)

aligners |> each(|a| print(a.name + " => " + a.latest_image))

Search terms are matched against tool names and descriptions, so broader queries work too.

# Find tools related to RNA-seq quantification
let quant_tools = biocontainers_search("salmon rna-seq")

quant_tools
  |> filter(|t| t.version_count > 5)
  |> each(|t| print(t.name + ": " + t.description))

Popular Tools

biocontainers_popular returns the most-pulled tools in the registry. This is useful for discovering which tools the community relies on and for checking whether your pipeline depends on widely used software.

let top20 = biocontainers_popular()

print("Top 20 BioContainers tools:")
top20 |> each(|t| print("  " + t.name + " - " + t.latest_version))

Called without arguments, as above, it returns 20 tools. Pass a limit to retrieve more.

# Top 50 most popular bioinformatics containers
let top50 = biocontainers_popular(50)

# Which of our pipeline tools are in the top 50?
let our_tools = ["samtools", "bcftools", "bwa-mem2", "gatk4", "picard"]
let popular_names = top50 |> map(|t| t.name)

our_tools |> each(|tool| {
  let found = popular_names |> find(|n| n == tool)
  if found != None then
    print(tool + " is in the top 50")
  else
    print(tool + " is NOT in the top 50")
})

Tool Details

biocontainers_info returns a detailed record for a single tool, including its full version history with per-version container images.

let info = biocontainers_info("samtools")
# info => {
#   name: "samtools",
#   description: "Tools for manipulating NGS alignments...",
#   organization: "biocontainers",
#   aliases: ["samtools"],
#   versions: [
#     {version: "1.19--h50ea8bc_0",
#      images: [{registry: "quay.io", image: "quay.io/biocontainers/samtools:1.19--h50ea8bc_0",
#                type: "Docker", size: 14250000}]},
#     {version: "1.18--h50ea8bc_0", images: [...]},
#     ...
#   ]
# }

print(info.name + " by " + info.organization)
print(str(len(info.versions)) + " versions available")

# List the 5 most recent versions
info.versions
  |> take(5)
  |> each(|v| {
    let image = v.images |> first()
    print("  " + v.version + " (" + image.registry + ", "
          + str(image.size / 1000000) + " MB)")
  })

The images list for each version may contain entries from multiple registries or image types (Docker, Singularity). Filter by registry or type if your infrastructure requires a specific format.

# Find Singularity images for deepvariant
let dv = biocontainers_info("deepvariant")

dv.versions |> each(|v| {
  let singularity = v.images |> filter(|img| img.type == "Singularity")
  if len(singularity) > 0 then
    print(v.version + " has Singularity image")
})
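
When a pipeline can run more than one image format, the filtering step above generalizes to a preference order with a fallback. The following is a minimal sketch of that selection logic in Python, chosen only so it can be run offline; `pick_image` is a hypothetical helper, not a BioLang builtin, and the sample records are hand-written to match the shape shown in this chapter. The Singularity entry's registry, URI, and size are illustrative assumptions rather than registry output.

```python
from typing import Optional


def pick_image(images: list, preferred_types: list) -> Optional[dict]:
    """Return the first image matching the earliest type in preferred_types."""
    for wanted in preferred_types:
        for img in images:
            if img["type"] == wanted:
                return img
    return None  # no image of any preferred type


# Sample shaped like one version's `images` list from biocontainers_info.
# The Singularity entry below is hypothetical, added for illustration.
images = [
    {"registry": "quay.io",
     "image": "quay.io/biocontainers/samtools:1.19--h50ea8bc_0",
     "type": "Docker", "size": 14250000},
    {"registry": "depot.galaxyproject.org",
     "image": "https://depot.galaxyproject.org/singularity/samtools:1.19--h50ea8bc_0",
     "type": "Singularity", "size": 13000000},
]

chosen = pick_image(images, ["Singularity", "Docker"])
print(chosen["image"])
```

Listing the types in order of preference keeps the fallback explicit: an HPC cluster might pass `["Singularity", "Docker"]`, while a cloud runner might pass only `["Docker"]`.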

Version Management

biocontainers_versions returns a flat list of all versions for a tool, each with a list of full image URI strings. This is the function to use when you need to pin a specific version in a pipeline manifest.

let versions = biocontainers_versions("gatk4")
# versions => [
#   {version: "4.5.0.0--py310hdfd78af_0",
#    images: ["quay.io/biocontainers/gatk4:4.5.0.0--py310hdfd78af_0"]},
#   {version: "4.4.0.0--py310hdfd78af_0",
#    images: ["quay.io/biocontainers/gatk4:4.4.0.0--py310hdfd78af_0"]},
#   ...
# ]

# Find the latest GATK 4.4.x release
let gatk44 = versions
  |> filter(|v| starts_with(v.version, "4.4"))
  |> first()

print("Pinning GATK to: " + gatk44.images[0])

You can use this to check whether a specific version exists before committing to it in a pipeline definition.

# Verify that bcftools 1.18 is available
let bc_versions = biocontainers_versions("bcftools")
let target = bc_versions |> find(|v| starts_with(v.version, "1.18"))

if target != None then
  print("bcftools 1.18 available: " + target.images[0])
else
  print("bcftools 1.18 not found in BioContainers")
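
The prefix matches above work because BioContainers tags built from Bioconda packages have two parts: the upstream tool version, then `--`, then a build string. As a rough illustration of that structure, here is a small Python sketch, used only so the logic can be run outside BioLang. `split_tag` is a hypothetical helper and not part of BioLang.

```python
def split_tag(tag: str) -> tuple:
    """Split a tag such as '1.19--h50ea8bc_0' into (upstream_version, build)."""
    version, _sep, build = tag.partition("--")
    return version, build  # build is "" when the tag has no "--" separator


version, build = split_tag("4.5.0.0--py310hdfd78af_0")
print(version)  # upstream GATK version
print(build)    # build string identifying this particular rebuild
```

Two tags with the same upstream version but different build strings are different images, so a pinned manifest should record the full tag rather than the upstream version alone.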

Example: Building a Reproducible Tool Manifest

A variant calling pipeline needs exact container images for every tool. Use the BioContainers builtins to resolve each tool to a pinned image URI and export the manifest.

# Tools required for a germline variant calling pipeline.
# Each entry names the release series to pin, matched as a version prefix.
let required = [
  {name: "bwa-mem2",  series: "2.2"},
  {name: "samtools",  series: "1.18"},
  {name: "gatk4",     series: "4.4"},
  {name: "bcftools",  series: "1.18"},
]

let manifest = required |> map(|req| {
  let versions = biocontainers_versions(req.name)

  # Keep only versions in the requested series. This is a prefix match,
  # not a >= comparison: "1.18" does not match a 1.19 release.
  let matching = versions
    |> filter(|v| starts_with(v.version, req.series))

  if len(matching) == 0 then {
    print("WARNING: no " + req.name + " version starting with " + req.series + " found")
    {tool: req.name, version: "MISSING", image: "MISSING"}
  } else {
    # Versions are listed newest first, so first() is the newest match
    let chosen = matching |> first()
    {tool: req.name, version: chosen.version, image: chosen.images[0]}
  }
})

# Print the resolved manifest
print("Variant Calling Pipeline - Tool Manifest")
print("=========================================")
manifest |> each(|m| {
  print(m.tool + ":")
  print("  version: " + m.version)
  print("  image:   " + m.image)
})

# Export as structured data
manifest |> write_json("pipeline_manifest.json")

This script produces a lockfile-style manifest that can be checked into version control alongside the pipeline definition.
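
The manifest script selects versions with starts_with, which is a prefix match: a "1.18" prefix accepts only 1.18 releases and rejects 1.19. If a true minimum-version check is wanted instead, the upstream part of each tag has to be compared numerically, because plain string comparison would rank "1.9" above "1.18". The sketch below shows one way to do that in Python, used only so it can be run offline. `version_key`, `satisfies_minimum`, and `newest_at_least` are hypothetical helpers, and the bare "1.9" tag is a made-up sample.

```python
import re


def version_key(tag):
    """Numeric key for the upstream part of a tag: '1.19--h50ea8bc_0' -> (1, 19)."""
    upstream = tag.split("--", 1)[0]
    key = []
    for field in upstream.split("."):
        match = re.match(r"\d+", field)
        if match is None:
            break  # stop at the first non-numeric field
        key.append(int(match.group()))
    return tuple(key)


def satisfies_minimum(tag, minimum):
    """True if the tag's upstream version is >= minimum, compared numerically."""
    a, b = version_key(tag), version_key(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that "1.18" and "1.18.0" compare as equal
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def newest_at_least(tags, minimum):
    """Highest-versioned tag meeting the minimum, or None if there is none."""
    candidates = [t for t in tags if satisfies_minimum(t, minimum)]
    return max(candidates, key=version_key) if candidates else None


tags = ["1.19--h50ea8bc_0", "1.18--h50ea8bc_0", "1.9"]
print(newest_at_least(tags, "1.18"))
print(satisfies_minimum("1.9", "1.18"))
```

Whether to pin a series or accept any newer release is a policy choice. Prefix pinning keeps results stable across runs, while a minimum-version rule picks up new releases automatically and so should be paired with a committed manifest.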

Example: Tool Discovery for a New Analysis

When starting a new analysis type, you need to survey what tools are available. Here we explore the methylation analysis landscape.

# What methylation tools exist in BioContainers?
let methyl_tools = biocontainers_search("methylation", 50)

print(str(len(methyl_tools)) + " methylation-related tools found")
print("")

# Group by version count to find well-maintained tools
let mature = methyl_tools
  |> filter(|t| t.version_count >= 5)
  |> sort_by(|t| -t.version_count)

let new_tools = methyl_tools
  |> filter(|t| t.version_count < 3)

print("Mature tools (" + str(len(mature)) + "):")
mature |> each(|t| {
  print("  " + t.name + " - " + str(t.version_count) + " versions"
        + " (latest: " + t.latest_version + ")")
  print("    " + t.description)
})

print("")
print("Newer tools (" + str(len(new_tools)) + "):")
new_tools |> take(10) |> each(|t| {
  print("  " + t.name + " - " + t.latest_version)
})

# Deep dive into the top candidate
let bismark = biocontainers_info("bismark")
print("")
print("Bismark detail:")
print("  " + bismark.description)
print("  " + str(len(bismark.versions)) + " releases")
print("  Aliases: " + join(bismark.aliases, ", "))

# Check image sizes across versions
bismark.versions |> take(5) |> each(|v| {
  let docker = v.images |> filter(|img| img.type == "Docker") |> first()
  if docker != None then
    print("  " + v.version + ": " + str(docker.size / 1000000) + " MB")
})

Example: Container Image Audit

For an existing pipeline, verify that every tool has a valid BioContainers image and flag any that are outdated.

# Current pipeline tools and their pinned versions
let pinned = [
  {tool: "bwa-mem2",  version: "2.2.1--hd03093a_2"},
  {tool: "samtools",  version: "1.17--h50ea8bc_0"},
  {tool: "gatk4",     version: "4.3.0.0--py310hdfd78af_0"},
  {tool: "bcftools",  version: "1.17--h3cc50cf_1"},
  {tool: "multiqc",   version: "1.14--pyhdfd78af_0"},
  {tool: "fastp",     version: "0.23.2--hb7a2d85_2"},
]

let audit = pinned |> map(|entry| {
  let info = biocontainers_info(entry.tool)
  let all_versions = info.versions |> map(|v| v.version)

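  # Check if pinned version still exists
  # (parenthesized so the comparison applies to the result of the pipe)
  let exists = (all_versions |> find(|v| v == entry.version)) != None

  # Check if there is a newer version
  let latest = info.versions |> first()
  let is_latest = latest.version == entry.version

  # Count how many versions behind: the pinned version's index in the
  # newest-first list, or -1 if it is no longer in the registry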
  let versions_behind = if is_latest then
    0
  else {
    let idx = all_versions
      |> enumerate()
      |> find(|pair| pair.value == entry.version)
    if idx != None then idx.index else -1
  }

  {
    tool: entry.tool,
    pinned: entry.version,
    latest: latest.version,
    exists: exists,
    is_latest: is_latest,
    versions_behind: versions_behind,
  }
})

# Report
print("Pipeline Container Audit")
print("========================")

let missing = audit |> filter(|a| not a.exists)
let outdated = audit |> filter(|a| a.exists and not a.is_latest)
let current = audit |> filter(|a| a.is_latest)

if len(missing) > 0 then {
  print("")
  print("MISSING (pinned version no longer in registry):")
  missing |> each(|a| print("  " + a.tool + " " + a.pinned
                             + " => latest: " + a.latest))
}

if len(outdated) > 0 then {
  print("")
  print("OUTDATED:")
  outdated |> each(|a| print("  " + a.tool + " " + a.pinned
                              + " => " + a.latest
                              + " (" + str(a.versions_behind) + " versions behind)"))
}

if len(current) > 0 then {
  print("")
  print("CURRENT:")
  current |> each(|a| print("  " + a.tool + " " + a.pinned))
}

print("")
print(str(len(current)) + " current, "
      + str(len(outdated)) + " outdated, "
      + str(len(missing)) + " missing")

audit |> write_json("container_audit.json")
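
The classification at the heart of the audit can be checked offline, without calling the registry. The sketch below restates it in Python under the same assumption the script makes, namely that versions are listed newest first. `audit_entry` is a hypothetical helper, and the tag list is a hand-written sample assembled from tags that appear in this chapter, not live registry output.

```python
def audit_entry(pinned, versions_newest_first):
    """Classify a pinned tag against a newest-first list of registry tags."""
    exists = pinned in versions_newest_first
    latest = versions_newest_first[0] if versions_newest_first else None
    # Position in a newest-first list equals the number of newer entries
    behind = versions_newest_first.index(pinned) if exists else -1
    if not exists:
        status = "missing"
    elif behind == 0:
        status = "current"
    else:
        status = "outdated"
    return {"pinned": pinned, "latest": latest,
            "status": status, "versions_behind": behind}


samtools_tags = ["1.19--h50ea8bc_0", "1.18--h50ea8bc_0", "1.17--h50ea8bc_0"]
print(audit_entry("1.17--h50ea8bc_0", samtools_tags))
```

Because the count is a list position, each registry entry counts as one step. If several builds of the same upstream release are listed, each build adds to the total.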

Summary

BioLang’s four BioContainers builtins – biocontainers_search, biocontainers_popular, biocontainers_info, and biocontainers_versions – bring the full BioContainers registry into your scripts as native data. Use them to discover tools, pin container images for reproducibility, audit existing pipelines, and explore new analysis domains. Combined with BioLang’s pipes and collection operations, a few lines of code replace manual registry browsing and ad hoc version tracking.