Appendix B: Statistical Decision Flowchart

The hardest part of statistics is choosing the right test. This appendix is your map.

When you have data and a question, the path to the correct statistical test follows a decision tree based on three things: what kind of data you have, how many groups you are comparing, and what assumptions your data meets. This appendix lays out that tree in a series of tables you can consult whenever you are unsure.

The Master Decision Guide

Start here. Find your question type, then follow the table to the right test.

| What are you asking? | Go to section |
|---|---|
| Are two groups different? | Comparing Two Groups |
| Are three or more groups different? | Comparing Multiple Groups |
| Are two variables related? | Associations and Correlations |
| Does one variable predict another? | Regression |
| Is there a relationship in categorical data? | Categorical Data |
| How long until an event occurs? | Time-to-Event Analysis |
| Do I need to reduce dimensionality? | Dimensionality Reduction |
| Do I need to group similar observations? | Clustering |

Comparing Two Groups

Use this when you have one outcome variable and two groups (e.g., control vs. treated, male vs. female, wildtype vs. knockout).

Step 1: What type is your outcome variable?

| Outcome type | Next step |
|---|---|
| Continuous (expression level, concentration, weight) | Step 2 |
| Counts (number of mutations, colony counts) | Consider Poisson or negative binomial regression (see Regression) |
| Binary (alive/dead, present/absent) | See Categorical Data |
| Ordinal (severity scale, Likert scores) | Use a non-parametric test (Mann-Whitney U or Wilcoxon signed-rank, Step 3) |

Step 2: Are the observations paired or independent?

| Design | Paired? | Example |
|---|---|---|
| Same subjects measured before and after treatment | Yes | Pre/post drug expression |
| Different subjects in each group | No | Treated vs. control mice |
| Matched pairs (e.g., tumor vs. adjacent normal from same patient) | Yes | Tumor/normal tissue pairs |

Step 3: Choose your test

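| Paired? | Normal distribution? | Equal variance? | Test | BioLang |
|---|---|---|---|---|
| No | Yes | Yes | Student’s t-test | ttest(a, b) (runs Welch’s by default) |
| No | Yes | No | Welch’s t-test | ttest(a, b) |
| No | No | Not required | Mann-Whitney U | wilcoxon(a, b) |
| Yes | Yes | Not required | Paired t-test | ttest_paired(a, b) |
| Yes | No | Not required | Wilcoxon signed-rank | wilcoxon(a, b) |

For paired designs, the normality question applies to the within-pair differences, not to each group separately.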

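Key insight: Welch’s t-test is almost always preferred over Student’s t-test because it does not assume equal variances. When variances are actually equal, Welch’s test gives nearly identical results. When they are not, especially with unequal group sizes, Student’s test can give misleading p-values: too many false positives when the smaller group has the larger variance, and too few when the larger group does. BioLang uses Welch’s by default.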
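To make the difference between the two t-tests concrete, here is a minimal plain-Python sketch (standard library only, not BioLang) of both test statistics. The names welch_t and student_t are illustrative helpers, and the sketch stops at the statistic and degrees of freedom because the Python standard library has no t-distribution CDF for the p-value.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    se2_a = variance(a) / len(a)  # squared standard error of mean(a)
    se2_b = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1)
    )
    return t, df


def student_t(a, b):
    """Student's t statistic with a pooled variance, and its degrees of freedom."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2


# Made-up example: the second group is far more variable than the first
a = [1, 2, 3]
b = [10, 20, 30, 40, 50, 60]
print(welch_t(a, b))    # Welch's df is noticeably smaller than Student's
print(student_t(a, b))  # df = 3 + 6 - 2 = 7
```

With equal group sizes the two t statistics coincide and only the degrees of freedom differ; Welch's df shrinks as the variances diverge. For a p-value in Python, SciPy's `scipy.stats.ttest_ind(a, b, equal_var=False)` runs Welch's test.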

How to check normality

```
let data = [2.3, 4.1, 3.7, 5.2, 4.8, 3.1, 6.0, 4.4]

# Visual check: Q-Q plot (best for small samples)
qq_plot(data, {title: "Normality Check"})
```

Common pitfall: With small samples (n < 30), normality tests have low power and may fail to reject normality even when the data is non-normal. With large samples (n > 5000), normality tests reject normality for trivially small deviations. Use Q-Q plots as a visual supplement.
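A Q-Q plot pairs each sorted observation with the quantile a standard normal distribution would predict at that position. A minimal plain-Python sketch of those coordinates, assuming the common (i - 0.5)/n plotting positions (qq_plot's own convention is not documented here, and qq_points is a hypothetical helper):

```python
from statistics import NormalDist


def qq_points(data):
    """Pair each sorted observation with its expected standard-normal quantile."""
    n = len(data)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # (i - 0.5) / n is one common plotting-position convention; others exist
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, sorted(data)))


data = [2.3, 4.1, 3.7, 5.2, 4.8, 3.1, 6.0, 4.4]
for expected, observed in qq_points(data):
    print(f"{expected:6.2f}  {observed:4.1f}")
```

If the pairs fall roughly on a straight line, the data are consistent with normality; systematic curvature at the ends points to skew or heavy tails.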

Comparing Multiple Groups

Use this when you have three or more groups (e.g., three drug doses, four tissue types, five time points).

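| Normal? | Equal variance? | Design | Test | BioLang |
|---|---|---|---|---|
| Yes | Yes | Independent groups | One-way ANOVA | anova(groups) |
| Yes | No | Independent groups | Welch’s ANOVA | anova(groups) |
| No | Not required | Independent groups | Kruskal-Wallis | anova(groups) |
| Yes | Sphericity assumed | Repeated measures | Repeated-measures ANOVA | anova(groups) |
| No | Not required | Repeated measures | Friedman test | anova(groups) |
| Yes | Yes | Two factors | Two-way ANOVA | anova(groups) |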
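One-way ANOVA compares the variation between group means with the variation within groups, and the ratio is the F statistic. A minimal plain-Python sketch of that computation (one_way_anova_f is a hypothetical helper, not BioLang's anova):

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA on a list of groups."""
    values = [x for g in groups for x in g]
    grand_mean = mean(values)
    k, n = len(groups), len(values)
    # Between-group sum of squares: how far each group mean sits from the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: scatter of observations around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


print(one_way_anova_f([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))  # (27.0, 2, 6)
```

The p-value comes from the F distribution with (df_between, df_within) degrees of freedom; in Python, `scipy.stats.f_oneway` performs the full test.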

Post-hoc Tests

When ANOVA is significant, you know some groups differ but not which ones. Use post-hoc tests:

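| Test | When to use | BioLang |
|---|---|---|
| Tukey HSD | All pairwise comparisons | Pairwise ttest() + p_adjust(pvals, "bonferroni") |
| Dunnett | Compare all groups to a single control | Pairwise ttest() vs control + p_adjust() |
| Dunn test | Post-hoc for Kruskal-Wallis | Pairwise wilcoxon() + p_adjust() |
| Bonferroni-corrected pairwise | Conservative, any design | Pairwise ttest() + p_adjust(pvals, "bonferroni") |

The BioLang entries for Tukey HSD, Dunnett, and Dunn are approximations built from pairwise tests plus p-value adjustment, not the exact procedures. With Bonferroni adjustment they are typically more conservative than true Tukey or Dunnett comparisons.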

Key insight: ANOVA is an omnibus test — it tells you that at least one group differs, but not which one. Always follow a significant ANOVA with post-hoc comparisons. Reporting only the ANOVA p-value is incomplete.

Associations and Correlations

Use this when you have two continuous variables and want to know if they are related (e.g., gene expression vs. methylation, age vs. telomere length).

| Data characteristics | Test | BioLang |
|---|---|---|
| Both variables roughly normal, linear relationship | Pearson correlation | cor(x, y) |
| Non-normal or ordinal data, monotonic relationship | Spearman correlation | spearman(x, y) |
| Ordinal data with ties | Kendall tau | kendall(x, y) |
| Partial correlation (controlling for a third variable) | Partial correlation | cor(x, y) after residualizing on z |
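The partial-correlation row works by removing the linear effect of z from both x and y, then correlating what is left. A plain-Python sketch of that residualizing approach; the helper names are hypothetical and the data are made up for illustration:

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def residualize(v, z):
    """Residuals of v after a least-squares fit v ~ intercept + slope * z."""
    mz, mv = sum(z) / len(z), sum(v) / len(v)
    slope = sum((zi - mz) * (vi - mv) for zi, vi in zip(z, v)) / sum(
        (zi - mz) ** 2 for zi in z
    )
    intercept = mv - slope * mz
    return [vi - (intercept + slope * zi) for vi, zi in zip(v, z)]


def partial_corr(x, y, z):
    """Correlation of x and y after removing the linear effect of z from both."""
    return pearson(residualize(x, z), residualize(y, z))


# Made-up values in which x and y both track z
z = [1, 2, 3, 4, 5, 6]
x = [1.2, 1.9, 3.4, 3.8, 5.1, 6.3]
y = [0.8, 2.3, 2.9, 4.4, 4.9, 6.1]
print(round(pearson(x, y), 3), round(partial_corr(x, y, z), 3))
```

The raw correlation is high because both variables follow z; the partial correlation shows how much association remains once that shared trend is removed.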

Interpreting Correlation Strength

| Absolute value of r | Interpretation |
|---|---|
| 0.0 to 0.1 | Negligible |
| 0.1 to 0.3 | Weak |
| 0.3 to 0.5 | Moderate |
| 0.5 to 0.7 | Strong |
| 0.7 to 1.0 | Very strong |

These cutoffs are rules of thumb; conventions differ between fields.

Common pitfall: Correlation does not imply causation, but more subtly, absence of Pearson correlation does not imply absence of relationship. Pearson only detects linear associations. Two variables can have a perfect quadratic relationship and still have r = 0 (for example, y = x² with x values symmetric around zero). Always plot your data.
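A short plain-Python check of this pitfall with made-up values: Pearson's r is exactly zero for y = x² on a symmetric range, and a curved but monotonic relationship (y = e^x) gets a Pearson r well below 1 while Spearman's rank correlation is exactly 1. The helper names are illustrative, not BioLang functions.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


x = [-3, -2, -1, 0, 1, 2, 3]
quadratic = [v ** 2 for v in x]    # fully determined by x, but not monotonic
curved = [math.exp(v) for v in x]  # monotonic, but not linear
print(pearson(x, quadratic))       # 0.0
print(pearson(x, curved), spearman(x, curved))
```

Spearman's rho is also zero for the symmetric quadratic, because rank methods detect only monotonic trends; a scatter plot is still the only reliable check.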

Categorical Data

Use this when both your variables are categorical (e.g., mutation status vs. disease outcome, genotype vs. phenotype).

| Design | Expected cell counts | Test | BioLang |
|---|---|---|---|
| 2x2 table, large samples | All expected >= 5 | Chi-square test | chi_square(observed, expected) |
| 2x2 table, small samples | Any expected < 5 | Fisher’s exact test | fisher_exact(a, b, c, d) |
| Larger than 2x2 | All expected >= 5 | Chi-square test | chi_square(observed, expected) |
| Larger than 2x2, small samples | Any expected < 5 | Fisher-Freeman-Halton | fisher_exact(a, b, c, d) |
| Paired categorical data | n/a | McNemar’s test | chi_square(observed, expected) |
| Trend across ordered categories | n/a | Cochran-Armitage trend test | chi_square(observed, expected) |
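Fisher's exact test holds the table margins fixed and sums hypergeometric probabilities over every table at least as extreme as the one observed. A plain-Python sketch of the two-sided 2x2 version, assuming the cells are read row by row as [[a, b], [c, d]] (the function name is hypothetical):

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def table_prob(x):
        # Hypergeometric probability that the top-left cell equals x,
        # with all row and column totals held fixed
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_observed = table_prob(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    # Two-sided p: total probability of all tables no more likely than the observed
    # one (the small relative tolerance absorbs floating-point ties)
    return sum(
        p for p in (table_prob(x) for x in range(lo, hi + 1))
        if p <= p_observed * (1 + 1e-7)
    )


print(round(fisher_exact_2x2(3, 1, 1, 3), 4))  # 0.4857
```

The example is the classic "lady tasting tea" table; `scipy.stats.fisher_exact` in Python gives the same two-sided p-value.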

Measures of Association for Categorical Data

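| Measure | Use case | BioLang |
|---|---|---|
| Odds ratio | 2x2 tables, case-control studies | (a*d) / (b*c) (inline) |
| Relative risk | 2x2 tables, cohort studies | (a/(a+b)) / (c/(c+d)) (inline) |
| Cramér’s V | Any size contingency table | Compute from the chi-square statistic: sqrt(chi2 / (n * (min(rows, cols) - 1))) |

In these formulas the 2x2 table has exposure groups as rows and outcome as columns: a = exposed with the outcome, b = exposed without it, c = unexposed with the outcome, d = unexposed without it.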
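All three measures follow directly from the counts. A plain-Python sketch with made-up counts, assuming exposure groups as rows and outcome as columns (a = exposed with outcome, b = exposed without, c = unexposed with, d = unexposed without); the helper names are hypothetical:

```python
import math


def odds_ratio(a, b, c, d):
    """Odds of the outcome in the exposed row divided by odds in the unexposed row."""
    return (a * d) / (b * c)


def relative_risk(a, b, c, d):
    """Risk of the outcome in the exposed row divided by risk in the unexposed row."""
    return (a / (a + b)) / (c / (c + d))


def chi_square_stat(table):
    """Pearson chi-square statistic (no continuity correction) for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


def cramers_v(table):
    """Cramér's V = sqrt(chi2 / (n * (min(rows, cols) - 1)))."""
    n = sum(sum(row) for row in table)
    k = min(len(table), len(table[0]))
    return math.sqrt(chi_square_stat(table) / (n * (k - 1)))


a, b, c, d = 10, 20, 30, 40  # made-up counts
print(odds_ratio(a, b, c, d), relative_risk(a, b, c, d))
print(chi_square_stat([[a, b], [c, d]]), cramers_v([[a, b], [c, d]]))
```

This chi-square statistic omits the Yates continuity correction that some tools apply to 2x2 tables by default (SciPy's `scipy.stats.chi2_contingency` does), so values can differ slightly from such tools.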

Regression

Use this when you want to predict an outcome from one or more predictor variables.

| Outcome type | Number of predictors | Test | BioLang |
|---|---|---|---|
| Continuous | 1 | Simple linear regression | lm(y, x) |
| Continuous | Multiple | Multiple linear regression | lm(y, [x1, x2, x3]) |
| Binary (0/1) | Any | Logistic regression | glm("y ~ x", table, "binomial") |
| Count | Any | Poisson regression | glm("y ~ x", table, "poisson") |
| Count, overdispersed | Any | Negative binomial regression | glm("y ~ x", table, "negbin") |
| Continuous, clustered data | Any | Mixed-effects model | lm(y, x) per group (a rough substitute; it does not estimate random effects) |

Checking Regression Assumptions

```
let model = lm(expression, [age, sex, batch])

# Check residuals with Q-Q plot
let residuals = model.residuals
qq_plot(residuals, {title: "Residual Normality Check"})
print("R-squared: " + str(round(model.r_squared, 3)))
```

Common pitfall: In ordinary least squares, adding predictors never decreases R-squared, even when the new predictors are pure noise. Use adjusted R-squared or AIC/BIC for model comparison, and report both R-squared and adjusted R-squared.
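A minimal sketch of the adjustment this pitfall recommends: fit a one-predictor least-squares line, compute R-squared, then apply 1 - (1 - R²)(n - 1)/(n - p - 1), where p counts predictors excluding the intercept. Plain Python with made-up data; the helper names are hypothetical.

```python
def r_squared_simple(x, y):
    """R-squared of a least-squares line y ~ intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    intercept = my - slope * mx
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))  # unexplained
    ss_tot = sum((b - my) ** 2 for b in y)                                  # total
    return 1 - ss_res / ss_tot


def adjusted_r_squared(r2, n, p):
    """Penalize R-squared for p predictors (intercept not counted) and n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


x = [1, 2, 3, 4, 5]  # made-up data
y = [2, 4, 5, 4, 5]
r2 = r_squared_simple(x, y)
print(round(r2, 3), round(adjusted_r_squared(r2, n=len(x), p=1), 3))  # 0.6 0.467
```

The penalty grows with p, so a noise predictor that nudges R-squared up slightly tends to pull adjusted R-squared down.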

Time-to-Event Analysis

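Use this when your outcome is the time until something happens (death, relapse, response) and some observations are censored: the event had not occurred by the end of follow-up, or the subject was lost to follow-up, so you know only that the true time is longer than the recorded one.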

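| Question | Method | BioLang |
|---|---|---|
| Estimate survival curve | Kaplan-Meier | Sort event times, compute stepwise survival |
| Compare survival between two groups | Log-rank test | ttest(times_a, times_b) as a rough proxy only (ignores censoring) |
| Compare survival, multiple groups | Log-rank test | anova([group1_times, group2_times, ...]) as a rough proxy only (ignores censoring) |
| Adjust for covariates | Cox proportional hazards | lm(time, [covariates]) as a rough proxy only (ignores censoring; not a hazard model) |
| Estimate median survival | From the Kaplan-Meier curve (first time S(t) ≤ 0.5) | sort(times)[len(times) / 2] only when no observation is censored |

The proxies above treat censored times as if they were event times, which biases results toward shorter survival; use a dedicated survival implementation for anything you intend to report.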
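The Kaplan-Meier estimator multiplies, at each distinct event time, the fraction of at-risk subjects who did not have the event; censored subjects leave the risk set without counting as events. A plain-Python sketch that also reads the median off the curve instead of from raw sorted times (function names hypothetical, data made up):

```python
def kaplan_meier(times, events):
    """Kaplan-Meier curve as [(t, S(t))] at each distinct event time.

    events[i] is 1 if the event was observed at times[i] and 0 if that
    observation was right-censored at times[i].
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival, curve, i = 1.0, [], 0
    while i < len(data):
        t = data[i][0]
        deaths = leaving = 0
        # Everyone with time t is still at risk at t, censored or not
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # events and censored subjects both leave the risk set
    return curve


def km_median(curve):
    """First event time at which S(t) falls to 0.5 or below; None if it never does."""
    return next((t for t, s in curve if s <= 0.5), None)


times = [1, 2, 3, 4, 5]   # made-up follow-up times
events = [1, 1, 0, 1, 1]  # the subject at time 3 was censored
curve = kaplan_meier(times, events)
print(curve)
print(km_median(curve))   # 4
```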

Clinical relevance: In clinical trials with a time-to-event endpoint (such as overall or progression-free survival), the hazard ratio from a Cox model is usually the primary effect estimate. Under the proportional hazards assumption, a hazard ratio of 0.65 means the treatment group has a 35% lower instantaneous risk (hazard) of the event at any time point. Always report the 95% confidence interval alongside the point estimate.
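Because the log-rank test accounts for censoring, it is what the survival rows of the decision table call for. A minimal plain-Python sketch of the two-group version: at each event time it compares the observed events in group A with the number expected if both groups shared one hazard, then refers the summed statistic to a chi-square distribution with 1 degree of freedom. The function name and data are illustrative; dedicated survival libraries (for example R's survival package or Python's lifelines) also provide Kaplan-Meier and Cox models.

```python
import math


def log_rank(times_a, events_a, times_b, events_b):
    """Two-group log-rank test. Returns (chi-square statistic, p-value), 1 df.

    events_* hold 1 for an observed event and 0 for a right-censored time.
    """
    pooled = [(t, e, "a") for t, e in zip(times_a, events_a)]
    pooled += [(t, e, "b") for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in pooled if e})
    obs_minus_exp = variance = 0.0
    for t in event_times:
        at_risk = [(s, e, g) for s, e, g in pooled if s >= t]
        n = len(at_risk)
        n_a = sum(1 for _, _, g in at_risk if g == "a")
        d = sum(1 for s, e, _ in at_risk if s == t and e)
        d_a = sum(1 for s, e, g in at_risk if s == t and e and g == "a")
        # Observed minus expected events in group A under a common hazard
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:  # hypergeometric variance; the term is zero with one subject at risk
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    stat = obs_minus_exp ** 2 / variance  # assumes variance > 0
    # Upper tail of a chi-square with 1 df: P(X > s) = erfc(sqrt(s / 2))
    return stat, math.erfc(math.sqrt(stat / 2))


a_times, a_events = [1, 2, 3, 4, 5], [1, 1, 1, 1, 1]  # made-up data
b_times, b_events = [10, 11, 12, 13, 14], [1, 1, 1, 1, 1]
print(log_rank(a_times, a_events, b_times, b_events))
```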

Dimensionality Reduction

Use this when you have many variables (genes, proteins, metabolites) and want to find the main patterns.

| Goal | Method | BioLang |
|---|---|---|
| Find linear combinations that maximize variance | PCA | pca(data) |
| Visualize PCA results | PCA plot | pca_plot(result, {title: "PCA"}) |

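Key insight: PCA is deterministic: the same data give the same components every time, up to the sign of each component (which can differ between implementations). Nonlinear embeddings such as t-SNE and UMAP are stochastic, so different runs give different layouts. Always set a random seed before running stochastic methods for reproducibility.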
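A minimal sketch of what PCA computes: the first principal component is the leading eigenvector of the covariance matrix. This plain-Python version uses power iteration and fixes the sign so the largest loading is positive, which illustrates the "up to sign" caveat. Production PCA implementations typically use a singular value decomposition, so treat this as an illustration only; the function name and data are hypothetical.

```python
import math


def first_principal_component(rows, iters=200):
    """Loadings of PC1 and the fraction of total variance it explains.

    Uses power iteration on the sample covariance matrix.
    """
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
           for i in range(p)]
    # Arbitrary non-symmetric start vector; power iteration fails only if it is
    # exactly orthogonal to the leading eigenvector
    v = [1.0 + j for j in range(p)]
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient gives the eigenvalue (the variance along PC1)
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(p)) for i in range(p))
    total_variance = sum(cov[i][i] for i in range(p))
    # Eigenvectors are defined only up to sign: make the largest loading positive
    if max(v, key=abs) < 0:
        v = [-x for x in v]
    return v, eigenvalue / total_variance


rows = [(1, 1.1), (2, 1.9), (3, 3.2), (4, 3.9), (5, 5.1)]  # made-up 2-variable data
loadings, explained = first_principal_component(rows)
print([round(x, 3) for x in loadings], round(explained, 3))
```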

Clustering

Use this when you want to group similar observations (samples, genes, cells) together.

| What you know | Method | BioLang |
|---|---|---|
| Number of clusters (k) | k-means | kmeans(data, 3) |
| Want a hierarchy of clusters | Hierarchical clustering | hclust(data, "ward") |
| Irregular cluster shapes | DBSCAN | dbscan(data, 0.5, 5) |
| Want to estimate k | Silhouette / Elbow | Loop over k, compute kmeans(data, k).silhouette |
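Lloyd's k-means algorithm alternates two steps: assign each point to its nearest centroid, then move each centroid to the mean of its assigned points. A plain-Python sketch for 2-D points (function names hypothetical, data made up); the fixed seed makes the random initialization reproducible, in line with the seeding advice for stochastic methods.

```python
import random


def squared_distance(p, q):
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def kmeans(points, k, seed=0, max_iter=100):
    """Lloyd's k-means on 2-D points. Returns (labels, centroids)."""
    rng = random.Random(seed)          # fixed seed -> reproducible start
    centroids = rng.sample(points, k)  # k distinct input points as initial centroids
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid
        labels = [min(range(k), key=lambda c: squared_distance(p, centroids[c]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append((sum(m[0] for m in members) / len(members),
                                      sum(m[1] for m in members) / len(members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:  # converged: no centroid moved
            break
        centroids = new_centroids
    return labels, centroids


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]  # made-up data
labels, centroids = kmeans(points, k=2)
print(labels, centroids)
```

Lloyd's algorithm only finds a local optimum, which is why real implementations usually run several initializations and keep the best result.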

Multiple Testing Correction

Use this whenever you perform more than one statistical test on the same dataset.

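| Method | Controls | Strictness | BioLang |
|---|---|---|---|
| Bonferroni | Family-wise error rate | Most conservative | p_adjust(pvals, "bonferroni") |
| Holm | Family-wise error rate | Less conservative than Bonferroni | p_adjust(pvals, "holm") |
| Benjamini-Hochberg | False discovery rate | Moderate | p_adjust(pvals, "BH") |
| Benjamini-Yekutieli | FDR under arbitrary dependence | Conservative FDR | p_adjust(pvals, "BY") |
| Permutation | Error rate against an empirical null distribution | Fewest distributional assumptions; computationally costly | Inline loop with shuffle() |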
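The permutation row's "inline loop with shuffle()" idea, sketched in plain Python for a difference in means: shuffle the group labels many times and count how often the shuffled difference is at least as large as the observed one. The function name, seed, and permutation count are illustrative choices.

```python
import random


def permutation_test(a, b, n_perm=10_000, seed=0):
    """Two-sided permutation p-value for a difference in group means."""
    rng = random.Random(seed)  # seeded so the p-value is reproducible
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabeling, valid if labels are exchangeable under H0
        first, second = pooled[:len(a)], pooled[len(a):]
        if abs(sum(first) / len(first) - sum(second) / len(second)) >= observed:
            hits += 1
    # Counting the observed labeling as one permutation keeps p above zero
    return (hits + 1) / (n_perm + 1)


print(permutation_test([1, 2, 3, 4, 5], [11, 12, 13, 14, 15]))  # made-up groups
```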

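Key insight: For differential expression and other genomics screens that test thousands of features, Benjamini-Hochberg FDR correction at q = 0.05 is the standard. Bonferroni is usually too conservative for these screens: it controls the family-wise error rate (the chance of even one false positive), which is the wrong quantity when you expect hundreds of true positives. GWAS is the notable exception, where a Bonferroni-style genome-wide threshold of p < 5 × 10^-8 is conventional.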
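Both corrections are short enough to write out. A plain-Python sketch of the standard Bonferroni and Benjamini-Hochberg adjustments, the methods that p_adjust's "bonferroni" and "BH" options are named after; the helper names and p-values are made up for illustration.

```python
def bonferroni(pvals):
    """Bonferroni: multiply each p-value by the number of tests, capped at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Step up from the largest p-value: p * m / rank, kept monotone by a running minimum
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


pvals = [0.01, 0.04, 0.03, 0.005]  # made-up p-values from four tests
print(bonferroni(pvals))
print(benjamini_hochberg(pvals))
```

Here Bonferroni leaves only the smallest p-value below 0.05, while Benjamini-Hochberg keeps all four at or below q = 0.05. In Python, statsmodels' `multipletests(pvals, method="fdr_bh")` performs the same BH adjustment.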

Quick Reference: Common Biological Scenarios

| Scenario | Recommended test | BioLang |
|---|---|---|
| Gene expression, treated vs. control | Welch’s t-test | ttest(treated, control) |
| Gene expression across 4 tissues | One-way ANOVA | anova([tissue1, tissue2, tissue3, tissue4]) |
| Mutation frequency in cases vs. controls | Fisher’s exact test | fisher_exact(a, b, c, d) |
| Survival by treatment arm | Log-rank test | ttest(arm1_times, arm2_times) as a rough proxy only (ignores censoring) |
| 20,000 gene differential expression | t-test + BH correction | p_adjust(pvals, "BH") |
| Sample clustering from RNA-seq | PCA + hierarchical clustering | pca(data) then hclust(scores) |
| Correlation: expression vs. methylation | Spearman (relationship often non-linear) | spearman(expr, meth) |
| GWAS: genotype vs. phenotype | Logistic regression per variant, genome-wide threshold p < 5 × 10^-8 | glm("pheno ~ geno", tbl, "binomial") |
| Clinical outcome predictors | Regression model | lm(outcome, [age, stage, treatment]) |
| Sample size for planned experiment | Power analysis | Compute with qnorm() and effect size |
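The power-analysis row points at the usual normal-approximation formula for comparing two means: n per group = 2(z_(1-α/2) + z_(1-β))² / d², where d is the standardized effect size (Cohen's d). A plain-Python sketch, with NormalDist().inv_cdf standing in for qnorm() (the function name is hypothetical):

```python
import math
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-sided two-sample test of means.

    effect_size is Cohen's d (difference in means divided by the common SD).
    Uses the normal approximation, so it slightly underestimates the t-test n.
    """
    z = NormalDist().inv_cdf    # plays the role of qnorm()
    z_alpha = z(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = z(power)           # about 0.84 for 80% power
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / effect_size ** 2)


for d in (0.2, 0.5, 0.8):
    print(d, n_per_group(d))
```

For d = 0.5, α = 0.05 and 80% power this gives 63 per group; a calculation based on the t distribution gives a slightly larger n, so round up generously when planning.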