First Script
In this guide you will build a complete BioLang script that reads a FASTQ file, computes quality statistics, filters reads, and writes a summary report. This covers variables, functions, pipes, file I/O, and error handling.
The Script Structure
BioLang scripts are plain text files with the .bl extension. A script is a
sequence of expressions evaluated top-to-bottom. There is no main function
required.
# qc_report.bl — FASTQ quality report generator
# Usage: bl run qc_report.bl -- --input reads.fastq
BioLang's standard library functions are available globally — there are no module imports needed. File I/O, FASTQ parsing, and statistical functions are all built-in.
Reading Command-Line Arguments
Scripts can accept arguments using the built-in args module. Arguments after
-- on the command line are passed to the script.
# Parse command-line arguments
let input_path = args.get("--input") ?? error("Missing --input argument")
let min_quality = float(args.get("--min-quality") ?? "20.0")
let output_path = args.get("--output") ?? "qc_report.txt"
print("Processing: {input_path}")
print("Quality threshold: {min_quality}")
The ?? operator provides a default value when the left side is None.
String interpolation uses curly braces inside double-quoted strings.
Reading a FASTQ File
The built-in read_fastq function reads FASTQ records from a file.
# Open and read the FASTQ file
let records = read_fastq(input_path)
# Count total records
let total = len(records)
print("Total reads: {total}")
For large files, you can process records lazily using pipes. BioLang's pipe chains are evaluated lazily by default when reading from streams, meaning the file is read only as needed.
Computing Quality Statistics
Each FASTQ record has a quality field containing Phred scores. BioLang
makes it straightforward to compute statistics across all reads.
# Compute per-read average quality scores
let qualities = read_fastq(input_path)
|> map(|record| {
"name": record.id,
"length": record.length,
"mean_quality": mean_phred(record.quality),
"gc_content": gc_content(record.seq)
})
# Overall statistics
let mean_qual = qualities |> map(|r| r["mean_quality"]) |> mean()
let mean_len = qualities |> map(|r| r["length"]) |> mean()
let mean_gc = qualities |> map(|r| r["gc_content"]) |> mean()
print("Mean quality: {mean_qual:.2}")
print("Mean length: {mean_len:.0}")
print("Mean GC: {mean_gc:.3}")
Filtering Reads
Use filter to select reads meeting quality criteria. The pipe chain reads
naturally from top to bottom:
# Filter reads by quality and length
let passing_reads = qualities
|> filter(|r| r["mean_quality"] >= min_quality)
|> filter(|r| r["length"] >= 50)
let pass_count = len(passing_reads)
let pass_rate = pass_count / total * 100.0
print("Passing reads: {pass_count} ({pass_rate:.1}%)")
Defining Helper Functions
Functions are defined with fn. They can be placed anywhere in the file, but
by convention go after imports and before the main logic.
# Classify a read's quality
fn quality_tier(mean_q: Float) -> String {
if mean_q >= 35.0 {
"excellent"
} else if mean_q >= 25.0 {
"good"
} else if mean_q >= 20.0 {
"acceptable"
} else {
"poor"
}
}
# Count reads by quality tier
let tier_counts = passing_reads
|> map(|r| quality_tier(r["mean_quality"]))
|> group_by(|t| t)
|> map(|group| { "key": group["key"], "count": len(group["values"]) })
print("Quality tiers:")
for tier, count in tier_counts {
print(" {tier}: {count}")
}
Writing Output
BioLang can write to text files, TSV, CSV, and JSON. Here we generate a tab-delimited summary report:
# Build the report lines
let header = "read_name\tlength\tmean_quality\tgc_content\ttier"
let lines = passing_reads
|> map(|r| {
let tier = quality_tier(r["mean_quality"])
"{r['name']}\t{r['length']}\t{r['mean_quality']:.2}\t{r['gc_content']:.4}\t{tier}"
})
# Write to file
write_text(output_path, join([header] + lines, "\n"))
print("Report written to {output_path}")
The Complete Script
Here is the full script assembled together. Save it as qc_report.bl and
run it against any FASTQ file:
# qc_report.bl — FASTQ quality report generator
# Arguments
let input_path = args.get("--input") ?? error("Missing --input")
let min_quality = float(args.get("--min-quality") ?? "20.0")
let output_path = args.get("--output") ?? "qc_report.txt"
# Helper
fn quality_tier(mean_q: Float) -> String {
if mean_q >= 35.0 { "excellent" }
else if mean_q >= 25.0 { "good" }
else if mean_q >= 20.0 { "acceptable" }
else { "poor" }
}
# Read and compute stats
let records = read_fastq(input_path)
let total = len(records)
let qualities = read_fastq(input_path)
|> map(|record| {
{
"name": record.id,
"length": record.length,
"mean_quality": mean_phred(record.quality),
"gc_content": gc_content(record.seq)
}
})
# Filter
let passing = qualities
|> filter(|r| r["mean_quality"] >= min_quality)
|> filter(|r| r["length"] >= 50)
# Report
let header = "read_name\tlength\tmean_quality\tgc_content\ttier"
let lines = passing |> map(|r| {
let tier = quality_tier(r["mean_quality"])
"{r['name']}\t{r['length']}\t{r['mean_quality']:.2}\t{r['gc_content']:.4}\t{tier}"
})
write_text(output_path, join([header] + lines, "\n"))
print("Done. {len(passing)}/{total} reads passed. Report: {output_path}")
Running It
bl run qc_report.bl -- --input sample_R1.fastq --min-quality 25 --output report.tsv
Next Steps
You've built a realistic data processing script. Continue to Editor Setup to configure syntax highlighting and language server support in your editor.