SQLite

7 builtins for querying and writing SQLite databases. Annotation databases, sample tracking, result caching — all via embedded SQLite with zero server setup. Query results return as Tables, ready for piping through BioLang’s table ops.

Opening a Database

sqlite(path?) String? → DbHandle

Opens (or creates) a SQLite database. Returns a handle you pass to sql() and other builtins. Call with no arguments for an in-memory database.

# Open a file-based database
let db = sqlite("annotations.db")

# In-memory database (for temporary work)
let scratch = sqlite()

Querying

sql(db, query, params?) DbHandle, String, List? → Table | Int

Executes a SQL statement. SELECT / PRAGMA / WITH queries return a Table. Write statements (INSERT, UPDATE, DELETE, CREATE TABLE) return the number of affected rows as an integer. Use ? placeholders for parameterized queries.

let db = sqlite("clinvar.db")

# Parameterized SELECT — returns a Table
let genes = sql(db,
  "SELECT symbol, chrom, start, end FROM genes WHERE chrom = ? AND biotype = ?",
  ["chr17", "protein_coding"])

# Pipe into BioLang table ops
genes
  |> filter(|g| g.start > 40000000)
  |> sort_by(|g| g.start)
  |> print()

# CREATE TABLE
sql(db, "CREATE TABLE IF NOT EXISTS results (
  sample TEXT, snps INTEGER, indels INTEGER, ti_tv REAL
)")

# INSERT with parameters
sql(db, "INSERT INTO results VALUES (?, ?, ?, ?)",
    ["tumor_01", 1234, 456, 2.08])

# UPDATE — returns rows affected
let changed = sql(db, "UPDATE results SET snps = ? WHERE sample = ?",
                  [1300, "tumor_01"])
print(str(changed) + " rows updated")

Bulk Insert

sql_insert(db, table_name, data) DbHandle, String, Table | List → Int

Bulk inserts data into a table. Accepts a Table or a list of records. Uses a transaction for performance. Returns the number of rows inserted.

let db = sqlite("results.db")

# Insert from a Table (e.g., from read_tsv or sql query)
let variant_stats = read_tsv("variant_stats.tsv")
sql_insert(db, "variants", variant_stats)

# Insert from a list of records
let samples = [
  {id: "S1", depth: 42.3, mapped_pct: 98.1},
  {id: "S2", depth: 38.7, mapped_pct: 97.5},
  {id: "S3", depth: 45.1, mapped_pct: 99.0},
]
let inserted = sql_insert(db, "qc_results", samples)
print(str(inserted) + " rows inserted")

Metadata

sql_tables(db) DbHandle → List

Lists all table names in the database.

let db = sqlite("annotations.db")
let tables = sql_tables(db)
# ["genes", "transcripts", "exons", "variants"]
sql_schema(db, table_name) DbHandle, String → Table

Returns column metadata (name, type, not-null, primary key) as a Table.

let schema = sql_schema(db, "genes")
print(schema)
#  cid | name   | type    | notnull | pk
# -----+--------+---------+---------+-----
#  0   | symbol | TEXT    | false   | false
#  1   | chrom  | TEXT    | false   | false
#  2   | start  | INTEGER | false   | false
#  3   | end    | INTEGER | false   | false

Utility

sql_close(db) DbHandle → Nil

Hint to close the database connection. Connections auto-close when the handle goes out of scope, so this is optional.

is_db(value) Any → Bool

Returns true if the value is a SQLite database handle.

let db = sqlite("test.db")
is_db(db)     # true
is_db("hello") # false

Pipeline Example

Store pipeline results in SQLite for downstream analysis or cross-run comparison.

let db = sqlite("project_results.db")

sql(db, "CREATE TABLE IF NOT EXISTS qc (
  sample TEXT PRIMARY KEY,
  total_reads INTEGER,
  pass_reads INTEGER,
  pass_rate REAL,
  mean_quality REAL
)")

pipeline fastq_qc(sample_id, fastq_path) {
  stage stats {
    let reads = read_fastq(fastq_path) |> collect()
    let total = len(reads)
    let passing = reads |> filter(|r| mean_phred(r.quality) >= 25)
    let pass_count = len(passing)
    let mean_q = reads |> map(|r| mean_phred(r.quality)) |> mean()

    sql(db, "INSERT OR REPLACE INTO qc VALUES (?, ?, ?, ?, ?)",
        [sample_id, total, pass_count, pass_count / total * 100.0, mean_q])

    {total: total, passing: pass_count, mean_quality: mean_q}
  }
}

# Run for each sample
let samples = [
  {id: "S1", path: "S1_R1.fq.gz"},
  {id: "S2", path: "S2_R1.fq.gz"},
  {id: "S3", path: "S3_R1.fq.gz"},
]
samples |> each(|s| fastq_qc(s.id, s.path))

# Query aggregated results
let summary = sql(db, "SELECT * FROM qc ORDER BY pass_rate DESC")
print(summary)

Summary

BuiltinReturnsDescription
sqlite(path?)DbHandleOpen/create SQLite database
sql(db, query, params?)Table | IntExecute SQL; SELECT returns Table, writes return row count
sql_insert(db, table, data)IntBulk insert from Table or list of records
sql_tables(db)ListList table names
sql_schema(db, table)TableColumn metadata (name, type, constraints)
sql_close(db)NilClose connection (optional, auto-closes)
is_db(value)BoolType check for database handle