nf-core Integration
Why nf-core in a DSL
nf-core is the largest curated collection of bioinformatics pipelines, maintained by a global community and covering over 100 workflows – from bulk RNA-seq to rare-disease variant calling. Every pipeline follows strict standards: peer review, continuous integration, container packaging, and a machine-readable parameter schema.
BioLang exposes nf-core as a set of built-in functions. There is nothing to import
and nothing to install. You can browse, search, inspect, and compare nf-core
pipelines from the same scripts that process your FASTQ files and call variants.
The five builtins – nfcore_list, nfcore_search, nfcore_info, nfcore_releases,
and nfcore_params – turn the nf-core catalog into a queryable data source that
fits naturally into pipes, filters, and record operations.
Browsing the Catalog
nfcore_list() returns every pipeline in the nf-core organization. Each entry is
a record with fields like name, description, stars, and topics.
# List all nf-core pipelines
nfcore_list()
The result is a list of records. You can pipe it through any BioLang operation:
# Show names and star counts, sorted by popularity
nfcore_list(sort_by: "stars")
|> each |p| { name: p.name, stars: p.stars }
To limit the result set, pass limit. This is useful in exploratory sessions
where you want a quick overview rather than the full catalog:
# Top 10 pipelines by most recent release date
nfcore_list(sort_by: "release", limit: 10)
|> each |p| { name: p.name, updated: p.updated }
Sorting accepts three values:
"stars"– community popularity (default)"name"– alphabetical"release"– most recently updated first
Searching Pipelines
When you know what kind of analysis you need but not which pipeline implements it,
use nfcore_search. It matches against pipeline names, descriptions, and topic tags.
# Find RNA-seq related pipelines
nfcore_search("rnaseq")
Search accepts an optional limit to cap results:
# Find variant-calling pipelines, show top 5
nfcore_search("variant", 5)
The returned records have the same shape as nfcore_list entries, so you can
chain the same downstream operations:
# Search for methylation pipelines and extract their topics
nfcore_search("methylation")
|> each |p| { name: p.name, topics: p.topics }
Pipeline Details
nfcore_info returns a single record with comprehensive metadata for a named
pipeline. The returned record contains:
| Field | Type | Description |
|---|---|---|
name | String | Short pipeline name |
full_name | String | GitHub-qualified name (nf-core/…) |
description | String | One-line summary |
stars | Int | GitHub star count |
url | String | Repository URL |
license | String | License identifier (usually MIT) |
topics | List | GitHub topic tags |
open_issues | Int | Current open issue count |
created | String | Repository creation date |
updated | String | Last push date |
# Inspect the Sarek germline/somatic variant calling pipeline
let sarek = nfcore_info("sarek")
print("Pipeline: " + sarek.full_name)
print("License: " + sarek.license)
print("Stars: " + to_string(sarek.stars))
print("Topics: " + join(sarek.topics, ", "))
You can use the metadata to make decisions in scripts. For example, checking whether a pipeline is actively maintained before committing to it:
# Only proceed if the pipeline has been updated recently
let info = nfcore_info("taxprofiler")
if info.open_issues < 50 then
print(info.name + " looks actively maintained")
Releases and Versions
Production workflows should pin to a specific release rather than tracking the
development branch. nfcore_releases returns the full release history as a list
of records, each with tag and published_at fields.
# List all rnaseq releases
nfcore_releases("rnaseq")
The list is ordered newest-first, so the head element is the latest stable version:
# Get the latest stable release tag for rnaseq
let latest = nfcore_releases("rnaseq")
|> first
print("Latest release: " + latest.tag + " (" + latest.published_at + ")")
You can also search for a specific version range or find how many releases a pipeline has had – a rough proxy for maturity:
# Count total releases for sarek
let release_count = nfcore_releases("sarek") |> len
print("sarek has " + to_string(release_count) + " releases")
# Find all 3.x releases of rnaseq
nfcore_releases("rnaseq")
|> filter |r| starts_with(r.tag, "3.")
|> each |r| r.tag
Parameter Schemas
Every nf-core pipeline publishes a JSON schema describing its configurable
parameters – input paths, reference genomes, trimming options, resource limits,
and more. nfcore_params fetches and parses this schema, returning a record
whose keys are parameter group names and whose values are records of individual
parameters.
# Fetch the full parameter schema for rnaseq
let params = nfcore_params("rnaseq")
The top-level keys are group names such as "input_output_options",
"reference_genome_options", or "trimming_options". Each group is a record
of parameter entries:
# List all parameter groups
nfcore_params("rnaseq")
|> keys
# Inspect the reference genome options
nfcore_params("rnaseq").reference_genome_options
|> keys
This is particularly useful for validating that your configuration covers the required parameters before submitting a long-running pipeline:
# Check whether a specific parameter exists
let params = nfcore_params("sarek")
let genome_opts = params.reference_genome_options
print(keys(genome_opts))
Example: Finding the Right Pipeline
A common task: you need to process single-cell RNA-seq data but are unsure which nf-core pipeline to use. Search the catalog, compare candidates, and pick the best fit.
# Step 1: Search for single-cell pipelines
let candidates = nfcore_search("single-cell")
# Step 2: Show names, stars, and descriptions side by side
candidates
|> each |p| {
name: p.name,
stars: p.stars,
description: p.description
}
|> sort_by |a, b| b.stars - a.stars
|>> each |p| print(p.name + " (" + to_string(p.stars) + " stars): " + p.description)
# Step 3: Get detailed info on the top candidate
let top = candidates
|> sort_by |a, b| b.stars - a.stars
|> first
let info = nfcore_info(top.name)
print("Selected: " + info.full_name)
print("License: " + info.license)
print("Topics: " + join(info.topics, ", "))
# Step 4: Check the latest release
let latest = nfcore_releases(top.name) |> first
print("Latest release: " + latest.tag + " published " + latest.published_at)
# Step 5: Preview the parameter groups
nfcore_params(top.name)
|> keys
|>> each |group| print(" - " + group)
This entire exploration runs in a single script – no switching between a browser, the nf-core CLI tool, and your pipeline configuration files.
Example: Auditing Pipeline Parameters
Before launching a whole-genome sequencing analysis with Sarek, you want to confirm that your reference genome configuration covers every required parameter and that no deprecated options have crept into your config.
# Fetch the Sarek parameter schema
let params = nfcore_params("sarek")
# Extract all reference genome parameters
let ref_params = params.reference_genome_options
# Your current configuration
let my_config = {
genome: "GRCh38",
igenomes_base: "s3://ngi-igenomes/igenomes",
fasta: nil,
fasta_fai: nil
}
# Check for parameters in the schema that you have not configured
let schema_keys = keys(ref_params)
let config_keys = keys(my_config)
let missing = schema_keys
|> filter |k| not(contains(config_keys, k))
if len(missing) > 0 then
print("WARNING: unset reference genome parameters:")
missing |>> each |k| print(" - " + k)
else
print("All reference genome parameters are covered.")
# Check for keys in your config that are not in the schema (possibly deprecated)
let extra = config_keys
|> filter |k| not(contains(schema_keys, k))
if len(extra) > 0 then
print("WARNING: config keys not in current schema (possibly deprecated):")
extra |>> each |k| print(" - " + k)
This catches configuration drift early – before you discover it four hours into a whole-genome analysis run.
Example: Building a Pipeline Registry
For a core facility managing dozens of active projects, it helps to maintain a catalog of available pipelines organized by topic. This script builds a topic index from the full nf-core catalog.
# Fetch all pipelines
let all_pipelines = nfcore_list(sort_by: "stars")
# Build a topic-to-pipeline mapping
# Flatten: for each pipeline, emit one record per topic tag
let entries = all_pipelines
|> flat_map |p| p.topics |> each |t| { topic: t, name: p.name, stars: p.stars }
# Group by topic
let topics = entries
|> group_by |e| e.topic
# Print a summary: topics with 3 or more pipelines
keys(topics)
|> filter |t| len(topics[t]) >= 3
|> sort
|>> each |t| {
let pipelines = topics[t]
|> sort_by |a, b| b.stars - a.stars
|> each |p| p.name
print(t + " (" + to_string(len(pipelines)) + " pipelines): " + join(pipelines, ", "))
}
You can extend this to generate a Markdown report, push to a shared wiki, or feed into a lab notebook:
# Export the top 20 pipelines as a tab-separated table
let header = "name\tstars\ttopics\tlatest_release"
print(header)
nfcore_list(sort_by: "stars", limit: 20)
|>> each |p| {
let latest = nfcore_releases(p.name) |> first
let tag = if latest != nil then latest.tag else "unreleased"
let topic_str = join(p.topics, ";")
print(p.name + "\t" + to_string(p.stars) + "\t" + topic_str + "\t" + tag)
}
Redirect stdout to a file and you have a pipeline inventory that updates every time you run the script – no manual curation needed.
Parsing Nextflow Files
BioLang can parse Nextflow .nf files directly with nf_parse(). This function
reads the file and extracts its structure without requiring a Nextflow runtime.
# Parse a Nextflow pipeline file
let parsed = nf_parse("main.nf")
The result is a record with five fields:
| Field | Type | Content |
|---|---|---|
params | Record | Parameter name-value pairs |
processes | List[Record] | Process name, inputs, outputs, script, container, cpus, memory |
includes | List[Record] | Include name, alias, source path |
workflow | List[String] | Workflow block lines |
dsl | String | DSL version (“DSL1” or “DSL2”) |
You can pipe the parsed structure through any BioLang operation:
# List all processes and their containers
let parsed = nf_parse("variant_calling.nf")
parsed.processes
|> each |p| {
print(p.name + ": " + if p.container != nil then p.container else "no container")
}
# Extract all parameters with their default values
let parsed = nf_parse("rnaseq.nf")
keys(parsed.params)
|> sort
|>> each |k| print(k + " = " + to_string(parsed.params[k]))
Generating BioLang Code
The nf_to_bl function takes a parsed Nextflow record and generates BioLang
pipeline code as a string:
# Parse and convert a Nextflow pipeline
let parsed = nf_parse("rnaseq.nf")
let bl_code = nf_to_bl(parsed)
# Print the generated code
print(bl_code)
# Save to a file
write_text("rnaseq.bl", bl_code)
The generated code maps Nextflow constructs to BioLang equivalents:
params.*become top-level variablesprocessblocks becomepipeline { stage ... }blocks- Container, CPU, and memory directives are preserved as stage annotations
- Workflow call ordering is preserved as comments
This is a starting point – edit the generated skeleton to add BioLang-specific features like pipe chains, error handling, and parallel execution.
# Full workflow: parse, generate, and review
let parsed = nf_parse("sarek_main.nf")
# Show what was extracted
print("Found " + to_string(len(parsed.processes)) + " processes")
print("Found " + to_string(len(keys(parsed.params))) + " parameters")
# Generate BioLang code
let bl_code = nf_to_bl(parsed)
write_text("sarek.bl", bl_code)
print("Generated sarek.bl")