Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Appendix E: BioLang Statistics Quick Reference

Every statistical function in BioLang, organized for quick lookup.

This appendix lists BioLang’s statistical builtins by category. For each function, you will find the signature, a brief description, and a one-liner example. Functions are listed within each category in roughly the order you would learn them.

Descriptive Statistics

FunctionDescriptionExample
mean(x)Arithmetic meanmean([2, 4, 6])4.0
median(x)Middle valuemedian([1, 3, 7])3.0
sd(x)Sample standard deviationsd([2, 4, 6])2.0
var(x)Sample variancevar([2, 4, 6])4.0
min(x)Minimum valuemin([3, 1, 4])1
max(x)Maximum valuemax([3, 1, 4])4
sum(x)Sum of all valuessum([1, 2, 3])6
len(x)Number of elementslen([1, 2, 3])3
quantile(x, p)p-th quantilequantile([1,2,3,4,5], 0.75)4.0
summary(x)Summary statisticssummary(data) → record with min, Q1, median, Q3, max, mean
round(x, digits)Round to n decimal placesround(3.14159, 2)3.14
abs(x)Absolute valueabs(-3.5)3.5
sqrt(x)Square rootsqrt(16)4.0
log2(x)Base-2 logarithmlog2(8)3.0
log10(x)Base-10 logarithmlog10(1000)3.0

Probability Distributions

Each distribution has four functions following the d/p/q/r convention: d (density/mass), p (cumulative probability), q (quantile/inverse CDF), r (random samples). All parameters are positional.

Continuous

# Normal: dnorm(x, mu, sigma), pnorm(x, mu, sigma), qnorm(p, mu, sigma), rnorm(n, mu, sigma)
dnorm(0, 0, 1)        # Density at x=0 for standard normal
pnorm(1.96, 0, 1)     # P(X <= 1.96) ≈ 0.975
qnorm(0.975, 0, 1)    # Quantile at p=0.975 ≈ 1.96
rnorm(100, 0, 1)      # Generate 100 standard normal values

# Student's t: dt(x, df), pt(x, df), qt(p, df), rt(n, df)
dt(0, 10)
pt(2.228, 10)
qt(0.975, 10)
rt(100, 10)

# F: df(x, df1, df2), pf(x, df1, df2), qf(p, df1, df2), rf(n, df1, df2)
df(1.5, 5, 20)
pf(3.0, 5, 20)
qf(0.95, 5, 20)
rf(100, 5, 20)

# Chi-square: dchisq(x, df), pchisq(x, df), qchisq(p, df), rchisq(n, df)
dchisq(5.0, 5)
pchisq(11.07, 5)
qchisq(0.95, 5)
rchisq(100, 5)

# Beta: dbeta(x, alpha, beta), pbeta(x, alpha, beta), qbeta(p, alpha, beta), rbeta(n, alpha, beta)
dbeta(0.3, 2, 5)
pbeta(0.5, 2, 5)
qbeta(0.95, 2, 5)
rbeta(100, 2, 5)

# Gamma: dgamma(x, shape, rate), pgamma(x, shape, rate), qgamma(p, shape, rate), rgamma(n, shape, rate)
dgamma(1.0, 2, 1)
pgamma(3.0, 2, 1)
qgamma(0.95, 2, 1)
rgamma(100, 2, 1)

# Exponential: dexp(x, rate), pexp(x, rate), qexp(p, rate), rexp(n, rate)
dexp(1.0, 0.5)
pexp(2.0, 0.5)
qexp(0.95, 0.5)
rexp(100, 0.5)

# Log-Normal: dlnorm(x, mu, sigma), plnorm(x, mu, sigma), qlnorm(p, mu, sigma), rlnorm(n, mu, sigma)
dlnorm(1.0, 0, 1)
plnorm(2.0, 0, 1)
qlnorm(0.95, 0, 1)
rlnorm(100, 0, 1)

# Uniform: dunif(x, min, max), punif(x, min, max), qunif(p, min, max), runif(n, min, max)
dunif(0.5, 0, 1)
punif(0.7, 0, 1)
qunif(0.5, 0, 1)
runif(100, 0, 1)

Discrete

# Binomial: dbinom(k, n, p), pbinom(k, n, p), qbinom(q, n, p), rbinom(size, n, p)
dbinom(10, 20, 0.5)       # P(X = 10)
pbinom(10, 20, 0.5)       # P(X <= 10)
qbinom(0.5, 20, 0.5)      # Smallest k with P(X <= k) >= 0.5
rbinom(100, 20, 0.5)      # Generate 100 random values

# Poisson: dpois(k, lambda), ppois(k, lambda), qpois(q, lambda), rpois(n, lambda)
dpois(5, 5.0)
ppois(7, 5.0)
qpois(0.95, 5.0)
rpois(100, 5.0)

# Negative Binomial: dnbinom(k, mu, size), pnbinom(k, mu, size), qnbinom(q, mu, size), rnbinom(n, mu, size)
dnbinom(8, 10, 5)
pnbinom(15, 10, 5)
qnbinom(0.95, 10, 5)
rnbinom(100, 10, 5)

# Hypergeometric: dhyper(k, K, N_minus_K, n), phyper(k, K, N_minus_K, n), qhyper(q, K, N_minus_K, n), rhyper(size, K, N_minus_K, n)
dhyper(5, 200, 19800, 500)
phyper(5, 200, 19800, 500)
qhyper(0.95, 200, 19800, 500)
rhyper(100, 200, 19800, 500)

Hypothesis Tests

Comparing Two Groups

FunctionDescriptionExample
ttest(a, b)Welch’s two-sample t-testttest(ctrl, treat)
ttest_paired(a, b)Paired t-testttest_paired(before, after)
ttest_one(x, mu)One-sample t-testttest_one(diffs, 0)
wilcoxon(a, b)Wilcoxon rank-sum / signed-rank testwilcoxon(ctrl, treat)

Comparing Multiple Groups

FunctionDescriptionExample
anova(groups)One-way ANOVA (auto-detects Welch/Kruskal-Wallis)anova([g1, g2, g3])

Post-hoc comparisons: Follow a significant ANOVA with pairwise tests and p-value correction:

# Pairwise t-tests with Bonferroni correction
let pvals = []
for i in 0..len(groups) {
  for j in (i+1)..len(groups) {
    let result = ttest(groups[i], groups[j])
    pvals = pvals + [result.p_value]
  }
}
let adjusted = p_adjust(pvals, "bonferroni")

Categorical Data

FunctionDescriptionExample
chi_square(observed, expected)Chi-square testchi_square(observed, expected)
fisher_exact(a, b, c, d)Fisher’s exact test (2x2)fisher_exact(10, 5, 3, 12)

Effect sizes for categorical data are computed inline:

# Odds ratio
let or = (a * d) / (b * c)

# Relative risk
let rr = (a / (a + b)) / (c / (c + d))

Correlation

FunctionDescriptionExample
cor(x, y)Pearson correlationcor(expr, meth)
spearman(x, y)Spearman rank correlationspearman(expr, meth)
kendall(x, y)Kendall tau correlationkendall(expr, meth)

Regression

FunctionDescriptionExample
lm(y, x)Simple linear regressionlm(expression, age)
lm(y, [x1, x2, ...])Multiple linear regressionlm(expression, [age, sex, batch])
glm(formula, table, family)Generalized linear modelglm("y ~ x", data, "binomial")

Supported GLM families: "binomial" (logistic), "poisson", "negbin" (negative binomial).

Access model results:

let model = lm(expression, [age, sex, batch])
print("R-squared: " + str(round(model.r_squared, 3)))
let residuals = model.residuals
qq_plot(residuals, {title: "Residual Normality Check"})

Multiple Testing Correction

FunctionDescriptionExample
p_adjust(pvals, method)Adjust p-valuesp_adjust(pvals, "BH")

Supported methods: "bonferroni", "holm", "BH" (Benjamini-Hochberg), "BY" (Benjamini-Yekutieli).

# Typical genomics workflow: test all genes, then correct
let pvals = genes |> map(|g| ttest(g.ctrl, g.treat).p_value)
let padj = p_adjust(pvals, "BH")
let sig_count = padj |> filter(|p| p < 0.05) |> len()
print("Significant genes (FDR < 0.05): " + str(sig_count))

Dimensionality Reduction and Clustering

FunctionDescriptionExample
pca(data)Principal Component Analysispca(expr_matrix)
kmeans(data, k)k-means clusteringkmeans(data, 3)
hclust(data, method)Hierarchical clusteringhclust(data, "ward")
dbscan(data, eps, min_pts)DBSCAN clusteringdbscan(data, 0.5, 5)
# PCA then clustering
let result = pca(expr_matrix)
pca_plot(result, {title: "Sample PCA"})

# Estimate optimal k via silhouette
let scores = range(2, 10) |> map(|k| {
  let km = kmeans(data, k)
  km.silhouette
})

Statistical Visualization

SVG Plots (file output)

FunctionDescriptionExample
histogram(data, options)Histogramhistogram(data, {bins: 30, title: "Distribution"})
density(data, options)Kernel density estimatedensity(data, {title: "Density"})
violin(data, options)Violin plotviolin([g1, g2], {labels: ["A", "B"], title: "Groups"})
heatmap(table, options)Heatmap with optional clusteringheatmap(matrix, {cluster_rows: true, title: "Expression"})
volcano(table, options)Volcano plot for DE resultsvolcano(de_results, {fc_threshold: 1.0, title: "DE"})
manhattan(table, options)Manhattan plot for GWASmanhattan(gwas_results, {significance_line: 5e-8, title: "GWAS"})
qq_plot(data, options)Q-Q plot (normality check)qq_plot(residuals, {title: "Normality Check"})
forest_plot(table, options)Forest plot for meta-analysisforest_plot(meta_tbl, {null_value: 0, title: "Meta-analysis"})
roc_curve(table, options)ROC curveroc_curve(roc_tbl, {title: "Classifier ROC"})
pca_plot(result, options)PCA scatter plotpca_plot(result, {title: "PCA"})
plot(table, options)General line/scatter plotplot(tbl, {type: "line", title: "Trend"})

ASCII Plots (terminal output)

FunctionDescriptionExample
scatter(x, y, options)ASCII scatter plotscatter(age, expr, {xlabel: "Age", ylabel: "Expr"})
boxplot(data, options)ASCII box plotboxplot(table({"Ctrl": g1, "Treat": g2}), {title: "Comparison"})
bar_chart(labels, values, options)ASCII bar chartbar_chart(names, counts, {title: "Counts"})
sparkline(data)Inline sparklinesparkline(timeseries)
hist(data, options)ASCII histogramhist(data, {bins: 20})

Note: All visualization options are passed as a record (second argument): fn(data, {key: value, ...}). Options are always optional — you can call any plot function with just the data argument.

Resampling and Simulation

BioLang provides building blocks for resampling methods rather than dedicated functions:

# Bootstrap confidence interval for the median
let data = [2.3, 4.1, 3.7, 5.2, 4.8, 3.1, 6.0, 4.4]
let n_boot = 10000
let boot_medians = range(0, n_boot) |> map(|i| {
  let resample = range(0, len(data)) |> map(|j| data[random_int(0, len(data) - 1)])
  median(resample)
})
let sorted = sort(boot_medians)
let lo = sorted[round(n_boot * 0.025, 0)]
let hi = sorted[round(n_boot * 0.975, 0)]
print("95% CI: [" + str(lo) + ", " + str(hi) + "]")

# Permutation test
let observed_diff = abs(mean(treated) - mean(control))
let combined = treated + control
let n_perm = 10000
let null_diffs = range(0, n_perm) |> map(|i| {
  let shuffled = shuffle(combined)
  let perm_a = shuffled[0..len(treated)]
  let perm_b = shuffled[len(treated)..len(combined)]
  abs(mean(perm_a) - mean(perm_b))
})
let p_value = len(null_diffs |> filter(|d| d >= observed_diff)) / n_perm

Utility Functions

FunctionDescriptionExample
shuffle(x)Random permutationshuffle(labels)
random_int(a, b)Random integer in [a, b]random_int(0, 99)
sort(x)Sort values ascendingsort([3, 1, 2])[1, 2, 3]
range(a, b)Integer sequence [a, b)range(0, 10)[0, 1, ..., 9]
len(x)Number of elementslen([1, 2, 3])3
str(x)Convert to stringstr(42)"42"
table(rows, cols, fill)Create a tabletable(10, 3, 0)

Power Analysis

BioLang does not have dedicated power analysis functions. Compute sample sizes using distribution quantiles and effect size formulas:

# Sample size for two-sample t-test
# H0: mu1 = mu2, H1: mu1 != mu2
let effect_size = 0.5     # Cohen's d
let alpha = 0.05
let power = 0.80
let z_alpha = qnorm(1 - alpha / 2, 0, 1)    # 1.96
let z_beta = qnorm(power, 0, 1)              # 0.842
let n_per_group = round(2 * ((z_alpha + z_beta) / effect_size) ** 2, 0)
print("Required n per group: " + str(n_per_group))

# Cohen's d (inline)
let d = abs(mean(a) - mean(b)) / sqrt(((len(a) - 1) * var(a) + (len(b) - 1) * var(b)) / (len(a) + len(b) - 2))

Bayesian Methods

BioLang supports Bayesian analysis through conjugate update formulas computed inline:

# Beta-Binomial conjugate update
# Prior: Beta(alpha, beta); Data: k successes in n trials
# Posterior: Beta(alpha + k, beta + n - k)
let prior_a = 1.0
let prior_b = 1.0
let k = 15
let n = 20
let post_a = prior_a + k
let post_b = prior_b + (n - k)
let post_mean = post_a / (post_a + post_b)
print("Posterior mean: " + str(round(post_mean, 3)))

# 95% credible interval via Beta quantiles
let ci_lo = qbeta(0.025, post_a, post_b)
let ci_hi = qbeta(0.975, post_a, post_b)
print("95% CI: [" + str(round(ci_lo, 3)) + ", " + str(round(ci_hi, 3)) + "]")

# Normal-Normal conjugate update
# Prior: N(mu0, sigma0^2); Data: n observations with mean x_bar and known sigma
let prior_mu = 0.0
let prior_prec = 1.0 / (10.0 ** 2)   # prior precision = 1/sigma0^2
let data_prec = len(data) / (sd(data) ** 2)
let post_prec = prior_prec + data_prec
let post_mu = (prior_prec * prior_mu + data_prec * mean(data)) / post_prec
let post_sd = sqrt(1.0 / post_prec)

Survival Analysis

BioLang provides basic building blocks for survival analysis. For simple comparisons:

# Compare survival times between two groups
let result = ttest(arm1_times, arm2_times)
print("p-value: " + str(round(result.p_value, 4)))

# Median survival
let med_a = sort(arm1_times)[len(arm1_times) / 2]
let med_b = sort(arm2_times)[len(arm2_times) / 2]
print("Median survival — Arm A: " + str(med_a) + ", Arm B: " + str(med_b))

# Approximate hazard ratio from median ratio
let hr = med_b / med_a

# Regression on survival times
let model = lm(survival_time, [age, stage, treatment])