Appendix B: Statistical Decision Flowchart

The hardest part of statistics is choosing the right test. This appendix is your map.

When you have data and a question, the path to the correct statistical test follows a decision tree based on three things: what kind of data you have, how many groups you are comparing, and what assumptions your data meets. This appendix lays out that tree in a series of tables you can consult whenever you are unsure.

The Master Decision Guide

Start here. Find your question type, then follow the table to the right test.

What are you asking?	Go to section
Are two groups different?	Comparing Two Groups
Are three or more groups different?	Comparing Multiple Groups
Are two variables related?	Associations and Correlations
Does one variable predict another?	Regression
Is there a relationship in categorical data?	Categorical Data
How long until an event occurs?	Time-to-Event Analysis
Do I need to reduce dimensionality?	Dimensionality Reduction
Do I need to group similar observations?	Clustering

Comparing Two Groups

Use this when you have one outcome variable and two groups (e.g., control vs. treated, male vs. female, wildtype vs. knockout).

Step 1: What type is your outcome variable?

Outcome type	Next step
Continuous (expression level, concentration, weight)	Step 2
Counts (number of mutations, colony counts)	Consider Poisson or negative binomial test
Binary (alive/dead, present/absent)	See Categorical Data
Ordinal (severity scale, Likert scores)	Use non-parametric test

Step 2: Are the observations paired or independent?

Design	Paired?	Example
Same subjects measured before and after treatment	Yes	Pre/post drug expression
Different subjects in each group	No	Treated vs. control mice
Matched pairs (e.g., tumor vs. adjacent normal from same patient)	Yes	Tumor/normal tissue pairs

Step 3: Choose your test

Paired?	Normal distribution?	Equal variance?	Test	BioLang
No	Yes	Yes	Student’s t-test	`ttest(a, b)`
No	Yes	No	Welch’s t-test	`ttest(a, b)`
No	No	—	Mann-Whitney U	`wilcoxon(a, b)`
Yes	Yes	—	Paired t-test	`ttest_paired(a, b)`
Yes	No	—	Wilcoxon signed-rank	`wilcoxon(a, b)`

Key insight: Welch’s t-test is almost always preferred over Student’s t-test because it does not assume equal variances. When variances are actually equal, Welch’s test gives nearly identical results. When they are not, Student’s test can be dangerously wrong. BioLang uses Welch’s by default.

How to check normality

let data = [2.3, 4.1, 3.7, 5.2, 4.8, 3.1, 6.0, 4.4]

# Visual check — Q-Q plot (best for small samples)
qq_plot(data, {title: "Normality Check"})

Common pitfall: With small samples (n < 30), normality tests have low power and may fail to reject normality even when the data is non-normal. With large samples (n > 5000), normality tests reject normality for trivially small deviations. Use Q-Q plots as a visual supplement.

Comparing Multiple Groups

Use this when you have three or more groups (e.g., three drug doses, four tissue types, five time points).

Normal?	Equal variance?	Design	Test	BioLang
Yes	Yes	Independent groups	One-way ANOVA	`anova(groups)`
Yes	No	Independent groups	Welch’s ANOVA	`anova(groups)`
No	—	Independent groups	Kruskal-Wallis	`anova(groups)`
Yes	—	Repeated measures	Repeated-measures ANOVA	`anova(groups)`
No	—	Repeated measures	Friedman test	`anova(groups)`
Yes	—	Two factors	Two-way ANOVA	`anova(groups)`

Post-hoc Tests

When ANOVA is significant, you know some groups differ but not which ones. Use post-hoc tests:

Test	When to use	BioLang
Tukey HSD	All pairwise comparisons	Pairwise `ttest()` + `p_adjust(pvals, "bonferroni")`
Dunnett	Compare all groups to a single control	Pairwise `ttest()` vs control + `p_adjust()`
Dunn test	Post-hoc for Kruskal-Wallis	Pairwise `wilcoxon()` + `p_adjust()`
Bonferroni-corrected pairwise	Conservative, any design	Pairwise `ttest()` + `p_adjust(pvals, "bonferroni")`

Key insight: ANOVA is an omnibus test — it tells you that at least one group differs, but not which one. Always follow a significant ANOVA with post-hoc comparisons. Reporting only the ANOVA p-value is incomplete.

Associations and Correlations

Use this when you have two continuous variables and want to know if they are related (e.g., gene expression vs. methylation, age vs. telomere length).

Data characteristics	Test	BioLang
Both variables roughly normal, linear relationship	Pearson correlation	`cor(x, y)`
Non-normal or ordinal data, monotonic relationship	Spearman correlation	`spearman(x, y)`
Ordinal data with ties	Kendall tau	`kendall(x, y)`
Partial correlation (controlling for a third variable)	Partial correlation	`cor(x, y)` after residualizing on z

Interpreting Correlation Strength

| |r| value | Interpretation | |—|—| | 0.0 - 0.1 | Negligible | | 0.1 - 0.3 | Weak | | 0.3 - 0.5 | Moderate | | 0.5 - 0.7 | Strong | | 0.7 - 1.0 | Very strong |

Common pitfall: Correlation does not imply causation, but more subtly, absence of Pearson correlation does not imply absence of relationship. Pearson only detects linear associations. Two variables can have a perfect quadratic relationship with r = 0. Always plot your data.

Categorical Data

Use this when both your variables are categorical (e.g., mutation status vs. disease outcome, genotype vs. phenotype).

Design	Expected cell counts	Test	BioLang
2x2 table, large samples	All expected >= 5	Chi-square test	`chi_square(observed, expected)`
2x2 table, small samples	Any expected < 5	Fisher’s exact test	`fisher_exact(a, b, c, d)`
Larger than 2x2	All expected >= 5	Chi-square test	`chi_square(observed, expected)`
Larger than 2x2, small samples	Any expected < 5	Fisher-Freeman-Halton	`fisher_exact(a, b, c, d)`
Paired categorical data	—	McNemar’s test	`chi_square(observed, expected)`
Trend across ordered categories	—	Cochran-Armitage trend test	`chi_square(observed, expected)`

Measures of Association for Categorical Data

Measure	Use case	BioLang
Odds ratio	2x2 tables, case-control studies	`(ad) / (bc)` (inline)
Relative risk	2x2 tables, cohort studies	`(a/(a+b)) / (c/(c+d))` (inline)
Cramer’s V	Any size contingency table	Compute from chi-square statistic

Regression

Use this when you want to predict an outcome from one or more predictor variables.

Outcome type	Number of predictors	Test	BioLang
Continuous	1	Simple linear regression	`lm(y, x)`
Continuous	Multiple	Multiple linear regression	`lm(y, [x1, x2, x3])`
Binary (0/1)	Any	Logistic regression	`glm("y ~ x", table, "binomial")`
Count	Any	Poisson regression	`glm("y ~ x", table, "poisson")`
Count, overdispersed	Any	Negative binomial regression	`glm("y ~ x", table, "negbin")`
Continuous, clustered data	Any	Mixed-effects model	`lm(y, x)` (per group)

Checking Regression Assumptions

let model = lm(expression, [age, sex, batch])

# Check residuals with Q-Q plot
let residuals = model.residuals
qq_plot(residuals, {title: "Residual Normality Check"})
print("R-squared: " + str(round(model.r_squared, 3)))

Common pitfall: Adding more predictors always improves R-squared, even if the predictors are noise. Use adjusted R-squared or AIC/BIC for model comparison. Report both R-squared and adjusted R-squared.

Time-to-Event Analysis

Use this when your outcome is the time until something happens (death, relapse, response) and some observations are censored (the event has not yet occurred).

Question	Method	BioLang
Estimate survival curve	Kaplan-Meier	Sort event times, compute stepwise survival
Compare survival between two groups	Log-rank test	`ttest(times_a, times_b)` as proxy
Compare survival, multiple groups	Log-rank test	`anova([group1_times, group2_times, ...])`
Adjust for covariates	Cox proportional hazards	`lm(time, [covariates])`
Estimate median survival	From sorted times	`sort(times)[len(times) / 2]`

Clinical relevance: In clinical trials, the hazard ratio from a Cox model is the primary efficacy endpoint. A hazard ratio of 0.65 means the treatment group has a 35% lower instantaneous risk of the event at any time point. Always report the 95% confidence interval alongside the point estimate.

Dimensionality Reduction

Use this when you have many variables (genes, proteins, metabolites) and want to find the main patterns.

Goal	Method	BioLang
Find linear combinations that maximize variance	PCA	`pca(data)`
Visualize PCA results	PCA plot	`pca_plot(result, {title: "PCA"})`

Key insight: PCA is deterministic — you get the same answer every time. t-SNE and UMAP are stochastic — different runs give different layouts. Always set a random seed before running stochastic methods for reproducibility.

Clustering

Use this when you want to group similar observations (samples, genes, cells) together.

What you know	Method	BioLang
Number of clusters (k)	k-means	`kmeans(data, 3)`
Want a hierarchy of clusters	Hierarchical clustering	`hclust(data, "ward")`
Irregular cluster shapes	DBSCAN	`dbscan(data, 0.5, 5)`
Want to estimate k	Silhouette / Elbow	Loop over k, compute `kmeans(data, k).silhouette`

Multiple Testing Correction

Use this whenever you perform more than one statistical test on the same dataset.

Method	Controls	Strictness	BioLang
Bonferroni	Family-wise error rate	Most conservative	`p_adjust(pvals, "bonferroni")`
Holm	Family-wise error rate	Less conservative	`p_adjust(pvals, "holm")`
Benjamini-Hochberg	False discovery rate	Moderate	`p_adjust(pvals, "BH")`
Benjamini-Yekutieli	FDR under dependence	Conservative FDR	`p_adjust(pvals, "BY")`
Permutation	Empirical null	Gold standard	Inline loop with `shuffle()`

Key insight: For genomics (testing thousands of genes), Benjamini-Hochberg FDR correction at q = 0.05 is the standard. Bonferroni is too conservative for genome-wide studies — it controls the family-wise error rate, which is the wrong quantity when you expect hundreds of true positives.

Quick Reference: Common Biological Scenarios

Scenario	Recommended test	BioLang
Gene expression, treated vs. control	Welch’s t-test	`ttest(treated, control)`
Gene expression across 4 tissues	One-way ANOVA	`anova([tissue1, tissue2, tissue3, tissue4])`
Mutation frequency in cases vs. controls	Fisher’s exact test	`fisher_exact(a, b, c, d)`
Survival by treatment arm	Compare survival times	`ttest(arm1_times, arm2_times)`
20,000 gene differential expression	t-test + BH correction	`p_adjust(pvals, "BH")`
Sample clustering from RNA-seq	PCA + hierarchical clustering	`pca(data)` then `hclust(scores)`
Correlation: expression vs. methylation	Spearman (often non-linear)	`spearman(expr, meth)`
GWAS: genotype vs. phenotype	Logistic regression + BH	`glm("pheno ~ geno", tbl, "binomial")`
Clinical outcome predictors	Regression model	`lm(outcome, [age, stage, treatment])`
Sample size for planned experiment	Power analysis	Compute with `qnorm()` and effect size

Keyboard shortcuts

Practical Biostatistics in 30 Days