First Script

In this guide you will build a complete BioLang script that reads a FASTQ file, computes quality statistics, filters reads, and writes a summary report. This covers variables, functions, pipes, file I/O, and error handling.

The Script Structure

BioLang scripts are plain text files with the .bl extension. A script is a sequence of expressions evaluated top-to-bottom. There is no main function required.

# qc_report.bl — FASTQ quality report generator
# Usage: bl run qc_report.bl -- --input reads.fastq

BioLang's standard library functions are available globally — there are no module imports needed. File I/O, FASTQ parsing, and statistical functions are all built-in.

Reading Command-Line Arguments

Scripts can accept arguments using the built-in args module. Arguments after -- on the command line are passed to the script.

# Parse command-line arguments
let input_path = args.get("--input") ?? error("Missing --input argument")
let min_quality = float(args.get("--min-quality") ?? "20.0")
let output_path = args.get("--output") ?? "qc_report.txt"

print("Processing: {input_path}")
print("Quality threshold: {min_quality}")

The ?? operator provides a default value when the left side is None. String interpolation uses curly braces inside double-quoted strings.

Reading a FASTQ File

The built-in read_fastq function reads FASTQ records from a file.

# Open and read the FASTQ file
let records = read_fastq(input_path)

# Count total records
let total = len(records)
print("Total reads: {total}")

For large files, you can process records lazily using pipes. BioLang's pipe chains are evaluated lazily by default when reading from streams, meaning the file is read only as needed.

Computing Quality Statistics

Each FASTQ record has a quality field containing Phred scores. BioLang makes it straightforward to compute statistics across all reads.

# Compute per-read average quality scores
let qualities = read_fastq(input_path)
  |> map(|record| {
    "name": record.id,
    "length": record.length,
    "mean_quality": mean_phred(record.quality),
    "gc_content": gc_content(record.seq)
  })

# Overall statistics
let mean_qual = qualities |> map(|r| r["mean_quality"]) |> mean()
let mean_len = qualities |> map(|r| r["length"]) |> mean()
let mean_gc = qualities |> map(|r| r["gc_content"]) |> mean()

print("Mean quality: {mean_qual:.2}")
print("Mean length:  {mean_len:.0}")
print("Mean GC:      {mean_gc:.3}")

Filtering Reads

Use filter to select reads meeting quality criteria. The pipe chain reads naturally from top to bottom:

# Filter reads by quality and length
let passing_reads = qualities
  |> filter(|r| r["mean_quality"] >= min_quality)
  |> filter(|r| r["length"] >= 50)

let pass_count = len(passing_reads)
let pass_rate = pass_count / total * 100.0

print("Passing reads: {pass_count} ({pass_rate:.1}%)")

Defining Helper Functions

Functions are defined with fn. They can be placed anywhere in the file, but by convention go after imports and before the main logic.

# Classify a read's quality
fn quality_tier(mean_q: Float) -> String {
  if mean_q >= 35.0 {
    "excellent"
  } else if mean_q >= 25.0 {
    "good"
  } else if mean_q >= 20.0 {
    "acceptable"
  } else {
    "poor"
  }
}

# Count reads by quality tier
let tier_counts = passing_reads
  |> map(|r| quality_tier(r["mean_quality"]))
  |> group_by(|t| t)
  |> map(|group| { "key": group["key"], "count": len(group["values"]) })

print("Quality tiers:")
for tier, count in tier_counts {
  print("  {tier}: {count}")
}

Writing Output

BioLang can write to text files, TSV, CSV, and JSON. Here we generate a tab-delimited summary report:

# Build the report lines
let header = "read_name\tlength\tmean_quality\tgc_content\ttier"
let lines = passing_reads
  |> map(|r| {
    let tier = quality_tier(r["mean_quality"])
    "{r['name']}\t{r['length']}\t{r['mean_quality']:.2}\t{r['gc_content']:.4}\t{tier}"
  })

# Write to file
write_text(output_path, join([header] + lines, "\n"))
print("Report written to {output_path}")

The Complete Script

Here is the full script assembled together. Save it as qc_report.bl and run it against any FASTQ file:

# qc_report.bl — FASTQ quality report generator

# Arguments
let input_path = args.get("--input") ?? error("Missing --input")
let min_quality = float(args.get("--min-quality") ?? "20.0")
let output_path = args.get("--output") ?? "qc_report.txt"

# Helper
fn quality_tier(mean_q: Float) -> String {
  if mean_q >= 35.0 { "excellent" }
  else if mean_q >= 25.0 { "good" }
  else if mean_q >= 20.0 { "acceptable" }
  else { "poor" }
}

# Read and compute stats
let records = read_fastq(input_path)
let total = len(records)

let qualities = read_fastq(input_path)
  |> map(|record| {
    {
      "name": record.id,
      "length": record.length,
      "mean_quality": mean_phred(record.quality),
      "gc_content": gc_content(record.seq)
    }
  })

# Filter
let passing = qualities
  |> filter(|r| r["mean_quality"] >= min_quality)
  |> filter(|r| r["length"] >= 50)

# Report
let header = "read_name\tlength\tmean_quality\tgc_content\ttier"
let lines = passing |> map(|r| {
  let tier = quality_tier(r["mean_quality"])
  "{r['name']}\t{r['length']}\t{r['mean_quality']:.2}\t{r['gc_content']:.4}\t{tier}"
})
write_text(output_path, join([header] + lines, "\n"))
print("Done. {len(passing)}/{total} reads passed. Report: {output_path}")

Running It

bl run qc_report.bl -- --input sample_R1.fastq --min-quality 25 --output report.tsv

Next Steps

You've built a realistic data processing script. Continue to Editor Setup to configure syntax highlighting and language server support in your editor.