Day 6: Confidence Intervals — The Range of Truth
The Problem
Dr. Amara Chen’s pharmacology team has spent six months developing a novel kinase inhibitor for triple-negative breast cancer. After extensive optimization, they measure the half-maximal inhibitory concentration (IC50) across eight independent replicates: 11.2, 13.1, 12.8, 10.9, 14.2, 12.0, 11.7, and 12.5 nanomolar. The mean is 12.3 nM — an excellent result that would place their compound among the most potent in its class.
But when Dr. Chen presents these results to the medicinal chemistry team, the lead chemist asks the uncomfortable question: “If you ran the experiment again tomorrow, would you get 12.3 nM? Or could it be 15? Or 9?” The point estimate of 12.3 nM tells them where the center of their data is, but it says nothing about how confident they should be in that number. They need a range — a confidence interval — that captures the uncertainty inherent in measuring anything biological.
This chapter introduces the confidence interval: a range of plausible values for a population parameter, built from sample data. It is one of the most important and most misunderstood tools in all of biostatistics.
What Is a Confidence Interval?
Imagine you are trying to measure the height of a building, but your measuring tape is slightly stretchy. Each time you measure, you get a slightly different answer. A confidence interval is like saying: “Based on my eight measurements, I am 95% confident the true height is somewhere between 48.2 and 52.1 meters.”
More precisely: if you repeated your experiment 100 times and computed a 95% confidence interval each time, about 95 of those 100 intervals would contain the true population parameter. The remaining 5 would miss it entirely.
Common pitfall: A 95% CI does NOT mean “there is a 95% probability the true value is in this interval.” Once you compute a specific interval, the true value is either in it or it isn’t. The 95% refers to the procedure’s long-run success rate, not the probability for any single interval.
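The long-run reading can be checked by simulation: draw many samples from a population whose mean we choose in advance, build a 95% CI from each, and count how often the interval captures that known mean. A minimal Python sketch (the population mean and SD here are invented for illustration):

```python
import random
import math

random.seed(42)

TRUE_MEAN = 12.3   # population mean we pretend to know
TRUE_SD = 1.1      # population SD (illustrative)
N = 8              # replicates per experiment
T_CRIT = 2.365     # t critical value for a 95% CI, df = 7

covered = 0
trials = 2000
for _ in range(trials):
    sample = [random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(N)]
    m = sum(sample) / N
    s = math.sqrt(sum((x - m) ** 2 for x in sample) / (N - 1))
    se = s / math.sqrt(N)
    # Does this interval contain the true mean?
    if m - T_CRIT * se <= TRUE_MEAN <= m + T_CRIT * se:
        covered += 1

coverage = covered / trials
print(f"Empirical coverage: {coverage:.1%}")  # hovers near 95%
```

Each individual interval either contains 12.3 or it does not; only the long-run fraction is 95%.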
Point Estimates Are Not Enough
A point estimate is a single number — a sample mean, a proportion, a median. It is our best guess, but it carries no information about precision.
| Scenario | Point Estimate | What’s Missing? |
|---|---|---|
| Drug IC50 from 8 replicates | 12.3 nM | Could be 8-16 nM or 11.9-12.7 nM |
| Mutation frequency in 50 patients | 34% | Could be 21-47% or 30-38% |
| Mean tumor volume after treatment | 180 mm³ | How variable was the response? |
The confidence interval supplements the point estimate with a measure of uncertainty. Narrow intervals mean precise estimates; wide intervals mean the data leaves much room for doubt.
CI for a Mean: x-bar plus-or-minus t times SE
The most common confidence interval is for a population mean. The formula is:
CI = x-bar +/- t(alpha/2, df) x SE
Where:
- x-bar is the sample mean
- SE = s / sqrt(n) is the standard error of the mean
- t(alpha/2, df) is the critical value from the t-distribution with df = n - 1
- alpha = 1 - confidence level (for 95% CI, alpha = 0.05)
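Applying the formula to Dr. Chen's eight IC50 replicates, with the tabulated t critical value for df = 7, gives a quick Python check:

```python
import math

ic50 = [11.2, 13.1, 12.8, 10.9, 14.2, 12.0, 11.7, 12.5]
n = len(ic50)
x_bar = sum(ic50) / n                                        # 12.3 nM
s = math.sqrt(sum((x - x_bar) ** 2 for x in ic50) / (n - 1)) # sample SD
se = s / math.sqrt(n)                                        # standard error
t_crit = 2.365                                               # t(alpha/2 = 0.025, df = 7)
lower, upper = x_bar - t_crit * se, x_bar + t_crit * se
print(f"95% CI: [{lower:.2f}, {upper:.2f}] nM")  # → [11.40, 13.20] nM
```

So the team can report 12.3 nM with a 95% CI of roughly 11.4 to 13.2 nM, answering the chemist's question directly.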
Why the t-Distribution for Small Samples?
When n is large (say, n > 30), the t-distribution closely resembles the normal distribution. But for small n — common in biology where each replicate is expensive — the t-distribution has heavier tails, producing wider intervals that honestly reflect our greater uncertainty.
| Sample Size (n) | t-critical (95%) | z-critical (95%) | Difference |
|---|---|---|---|
| 5 | 2.776 | 1.960 | 42% wider |
| 10 | 2.262 | 1.960 | 15% wider |
| 30 | 2.045 | 1.960 | 4% wider |
| 100 | 1.984 | 1.960 | ~1% wider |
| 1000 | 1.962 | 1.960 | Negligible |
Key insight: For biological experiments with n < 30, always use the t-distribution. Using z would give falsely narrow intervals that overstate your precision.
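The critical values in the table come straight from the quantile functions of the two distributions, and can be reproduced with scipy:

```python
from scipy import stats

z = stats.norm.ppf(0.975)                # 1.960 for any n
for n in [5, 10, 30, 100, 1000]:
    t = stats.t.ppf(0.975, df=n - 1)     # t critical value, df = n - 1
    print(f"n={n:5d}  t={t:.3f}  interval {100 * (t / z - 1):.0f}% wider than z")
```

As n grows, t converges to z and the penalty for small samples disappears.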
CI for a Proportion
When the variable is binary — mutation present/absent, responder/non-responder — we need a CI for a proportion p-hat = x/n.
Wald Interval (Simple but Flawed)
CI = p-hat +/- z x sqrt(p-hat(1 - p-hat) / n)
This is the textbook formula, but it performs poorly when p is near 0 or 1, or when n is small. It can even produce intervals that extend below 0 or above 1.
Wilson Interval (Preferred)
The Wilson score interval adjusts the center and width, and is recommended for most biological applications:
CI = (p-hat + z²/(2n) +/- z x sqrt(p-hat(1-p-hat)/n + z²/(4n²))) / (1 + z²/n)
Clinical relevance: When reporting mutation carrier frequencies, drug response rates, or diagnostic sensitivity/specificity, always use Wilson intervals. Regulatory agencies expect intervals that behave properly even at extreme proportions.
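The Wilson formula above is short enough to implement directly. A sketch in plain Python, applied to a hypothetical count of 17 carriers among 50 patients (the 34% mutation-frequency scenario from the earlier table):

```python
import math

def wilson_ci(x, n, z=1.96):
    """Wilson score interval for a proportion x/n (z = 1.96 for 95%)."""
    p = x / n
    center = p + z * z / (2 * n)
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    denom = 1 + z * z / n
    return (center - half) / denom, (center + half) / denom

lo, hi = wilson_ci(17, 50)   # 17 mutation carriers among 50 patients
print(f"Wilson 95% CI: [{lo:.3f}, {hi:.3f}]")  # → [0.224, 0.478]
```

Unlike the Wald interval, the Wilson bounds can never escape the [0, 1] range, even when x = 0 or x = n.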
CI for the Difference Between Two Means
Often the real question is not “what is the mean?” but “how much do two groups differ?” The CI for the difference between two independent means is:
CI = (x-bar1 - x-bar2) +/- t x SE_diff
Where SE_diff = sqrt(s1²/n1 + s2²/n2) for Welch’s approach.
The critical interpretation: If the CI for the difference includes zero, the data are consistent with no difference between the groups. If it excludes zero, the difference is statistically significant at the matching level (p < 0.05 for a 95% CI).
| CI for Difference | Interpretation |
|---|---|
| [1.2, 4.8] | Groups differ; difference is between 1.2 and 4.8 units |
| [-0.5, 3.1] | Includes zero; cannot rule out no difference |
| [-4.2, -1.1] | Groups differ; group 2 is higher by 1.1 to 4.2 units |
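The Welch SE above pairs with the Welch-Satterthwaite degrees of freedom, which the formula omits for brevity. A Python sketch that computes both, using the tumor-volume numbers that reappear in the BioLang examples later in this chapter:

```python
import math
from scipy import stats

treated = [180, 210, 165, 225, 195, 172, 218, 198]
control = [485, 512, 468, 530, 495, 478, 521, 503]

def mean_var(xs):
    m = sum(xs) / len(xs)
    return m, sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

m1, v1 = mean_var(treated)
m2, v2 = mean_var(control)
n1, n2 = len(treated), len(control)

se_diff = math.sqrt(v1 / n1 + v2 / n2)
# Welch-Satterthwaite degrees of freedom (generally non-integer)
df = (v1 / n1 + v2 / n2) ** 2 / (
    (v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1)
)
t_crit = stats.t.ppf(0.975, df)
diff = m1 - m2
print(f"Difference: {diff:.1f} mm^3, df = {df:.1f}")
print(f"95% CI: [{diff - t_crit * se_diff:.1f}, {diff + t_crit * se_diff:.1f}] mm^3")
```

Here the interval sits entirely below zero, so the data exclude "no difference" at the 5% level.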
Bootstrap Confidence Intervals
What if your statistic is a median, a ratio, or something with no tidy formula? The bootstrap is a computer-intensive method that works for any statistic:
- Resample your data with replacement, same size as original
- Compute the statistic on the resample
- Repeat 10,000 times
- Take the 2.5th and 97.5th percentiles of the bootstrap distribution
This is called the percentile method. No assumptions about normality or distribution shape are required.
Key insight: Bootstrap CIs are the Swiss army knife of interval estimation. When in doubt, bootstrap it.
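The four steps above translate almost line-for-line into numpy. This sketch bootstraps the median of the skewed gene-expression values that the BioLang example below uses:

```python
import numpy as np

rng = np.random.default_rng(42)

# Skewed gene-expression values (FPKM); the median has no tidy CI formula
expression = np.array([0.1, 0.3, 0.8, 1.2, 1.5, 2.1, 3.4, 8.7, 12.1, 45.6])

# Steps 1-3: resample with replacement 10,000 times, recording the median
boot = np.array([
    np.median(rng.choice(expression, size=expression.size, replace=True))
    for _ in range(10_000)
])

# Step 4: take the 2.5th and 97.5th percentiles (the percentile method)
lower, upper = np.percentile(boot, [2.5, 97.5])
print(f"Median: {np.median(expression):.2f} FPKM, "
      f"bootstrap 95% CI: [{lower:.2f}, {upper:.2f}] FPKM")
```

Swapping `np.median` for any other statistic (a ratio, a trimmed mean, a fold change) is the only change needed.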
What Controls CI Width?
Three factors determine how wide or narrow your confidence interval will be:
| Factor | Effect on Width | Biological Implication |
|---|---|---|
| Sample size (n) | Width ~ 1/sqrt(n) | Doubling n cuts width by ~30% |
| Variability (s) | Width ~ s | High biological variability = wider CIs |
| Confidence level | 99% > 95% > 90% | Higher confidence = wider interval |
This is why power calculations matter: before an experiment, you choose n to achieve a CI narrow enough to be scientifically useful.
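The 1/sqrt(n) scaling in the table is easy to verify numerically. A small sketch (the SD of 1.08 is roughly that of the IC50 data, used here for illustration, with the large-sample z critical value for simplicity):

```python
import math

s = 1.08   # illustrative sample SD (nM)
z = 1.96   # large-sample 95% critical value
widths = []
for n in [8, 16, 32, 64]:
    w = 2 * z * s / math.sqrt(n)   # full CI width
    widths.append(w)
    print(f"n={n:3d}  CI width = {w:.2f} nM")
```

Each doubling of n multiplies the width by 1/sqrt(2), a shrink of about 29%, which is why quadrupling the sample size is needed to halve the interval.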
Confidence Intervals in BioLang
IC50 Confidence Interval — Parametric
# IC50 measurements (nM) from 8 replicates
let ic50 = [11.2, 13.1, 12.8, 10.9, 14.2, 12.0, 11.7, 12.5]
let n = len(ic50)
let x_bar = mean(ic50)
let se = stdev(ic50) / sqrt(n)
# 95% CI using the t-distribution (df = n - 1 = 7)
let t_crit = 2.365 # t critical value for a 95% CI with df = 7
let ci_lower = x_bar - t_crit * se
let ci_upper = x_bar + t_crit * se
print("IC50 mean: {x_bar:.2} nM")
print("95% CI: [{ci_lower:.2}, {ci_upper:.2}] nM")
print("Standard error: {se:.3} nM")
print("Critical value: {t_crit:.3}")
IC50 Confidence Interval — Bootstrap
set_seed(42)
# Bootstrap CI: no distributional assumptions
let ic50 = [11.2, 13.1, 12.8, 10.9, 14.2, 12.0, 11.7, 12.5]
# Bootstrap: resample 10,000 times, compute mean each time
let n_boot = 10000
let boot_means = []
for i in range(0, n_boot) {
let resample = []
for j in range(0, len(ic50)) {
resample = append(resample, ic50[random_int(0, len(ic50) - 1)])
}
boot_means = append(boot_means, mean(resample))
}
# Percentile method
let boot_lower = quantile(boot_means, 0.025)
let boot_upper = quantile(boot_means, 0.975)
print("Bootstrap 95% CI: [{boot_lower:.2}, {boot_upper:.2}] nM")
# Visualize the bootstrap distribution
histogram(boot_means, {bins: 50, title: "Bootstrap Distribution of IC50 Mean", x_label: "Mean IC50 (nM)"})
Bootstrap CI for Median (No Tidy Formula)
set_seed(42)
# Gene expression values (FPKM) — skewed distribution
let expression = [0.1, 0.3, 0.8, 1.2, 1.5, 2.1, 3.4, 8.7, 12.1, 45.6]
let obs_median = median(expression)
# Bootstrap the median
let n_boot = 10000
let boot_medians = []
for i in range(0, n_boot) {
let resample = []
for j in range(0, len(expression)) {
resample = append(resample, expression[random_int(0, len(expression) - 1)])
}
boot_medians = append(boot_medians, median(resample))
}
let ci_lower = quantile(boot_medians, 0.025)
let ci_upper = quantile(boot_medians, 0.975)
print("Observed median: {obs_median:.2} FPKM")
print("Bootstrap 95% CI for median: [{ci_lower:.2}, {ci_upper:.2}] FPKM")
Error Bar Plot: Comparing Drug Concentrations
# IC50 values for three drug candidates
let drug_a = [12.3, 11.8, 13.1, 12.0, 11.5, 12.7, 13.4, 12.1]
let drug_b = [25.1, 28.3, 22.7, 26.9, 24.5, 27.1, 23.8, 25.6]
let drug_c = [8.2, 9.1, 7.5, 8.8, 10.2, 8.0, 9.5, 7.8]
let drugs = ["Drug A", "Drug B", "Drug C"]
let means = [mean(drug_a), mean(drug_b), mean(drug_c)]
# Compute 95% CIs for each
let compute_ci = |data| {
let n = len(data)
let se = stdev(data) / sqrt(n)
let t_crit = 2.365 # t critical for df = 7 (each group has n = 8)
[mean(data) - t_crit * se, mean(data) + t_crit * se]
}
let ci_a = compute_ci(drug_a)
let ci_b = compute_ci(drug_b)
let ci_c = compute_ci(drug_c)
print("Drug A: {means[0]:.1} nM, 95% CI [{ci_a[0]:.1}, {ci_a[1]:.1}]")
print("Drug B: {means[1]:.1} nM, 95% CI [{ci_b[0]:.1}, {ci_b[1]:.1}]")
print("Drug C: {means[2]:.1} nM, 95% CI [{ci_c[0]:.1}, {ci_c[1]:.1}]")
# Bar chart with error bars
bar_chart(drugs, means, {title: "IC50 Comparison with 95% CIs", y_label: "IC50 (nM)", error_bars: [ci_a, ci_b, ci_c]})
CI for Difference Between Two Means
# Compare tumor volume between treated and control mice
let treated = [180, 210, 165, 225, 195, 172, 218, 198]
let control = [485, 512, 468, 530, 495, 478, 521, 503]
let diff = mean(treated) - mean(control)
let se_diff = sqrt(variance(treated) / len(treated) + variance(control) / len(control))
let df = len(treated) + len(control) - 2
let t_crit = 2.145 # t critical for 95% CI, df = 14; Welch df is similar here since group sizes are equal
let ci_lower = diff - t_crit * se_diff
let ci_upper = diff + t_crit * se_diff
print("Mean difference: {diff:.1} mm^3")
print("95% CI for difference: [{ci_lower:.1}, {ci_upper:.1}] mm^3")
if ci_upper < 0 {
print("CI excludes zero: treatment significantly reduces tumor volume")
} else {
print("CI includes zero: cannot rule out no difference")
}
Vaccine Efficacy CI (Proportion)
set_seed(42)
# Clinical trial: 15 of 200 vaccinated got infected vs 60 of 200 placebo
let p_vacc = 15 / 200
let p_plac = 60 / 200
let efficacy = 1.0 - (p_vacc / p_plac)
print("Vaccine efficacy: {efficacy * 100:.1}%")
# CI for proportion (vaccinated group infection rate)
let n = 200
let z = 1.96
let se_p = sqrt(p_vacc * (1.0 - p_vacc) / n)
let ci_lower_p = p_vacc - z * se_p
let ci_upper_p = p_vacc + z * se_p
print("Infection rate (vaccinated): {p_vacc*100:.1}%")
print("95% CI for infection rate: [{ci_lower_p*100:.1}%, {ci_upper_p*100:.1}%]")
# Bootstrap CI for vaccine efficacy itself
let vacc_outcomes = flatten([repeat(1, 15), repeat(0, 185)])
let plac_outcomes = flatten([repeat(1, 60), repeat(0, 140)])
let n_boot = 10000
let boot_eff = []
for i in range(0, n_boot) {
let v_resample = []
let p_resample = []
for j in range(0, len(vacc_outcomes)) {
v_resample = append(v_resample, vacc_outcomes[random_int(0, len(vacc_outcomes) - 1)])
p_resample = append(p_resample, plac_outcomes[random_int(0, len(plac_outcomes) - 1)])
}
let pv = mean(v_resample)
let pp = mean(p_resample)
let eff = if pp == 0.0 then 0.0 else 1.0 - (pv / pp)
boot_eff = append(boot_eff, eff)
}
let eff_ci = [quantile(boot_eff, 0.025), quantile(boot_eff, 0.975)]
print("Bootstrap 95% CI for efficacy: [{eff_ci[0]*100:.1}%, {eff_ci[1]*100:.1}%]")
Python:
import numpy as np
from scipy import stats
ic50 = [11.2, 13.1, 12.8, 10.9, 14.2, 12.0, 11.7, 12.5]
ci = stats.t.interval(0.95, df=len(ic50)-1,
                      loc=np.mean(ic50),
                      scale=stats.sem(ic50))
print(f"95% CI: [{ci[0]:.2f}, {ci[1]:.2f}]")
# Bootstrap
boot = [np.mean(np.random.choice(ic50, len(ic50))) for _ in range(10000)]
print(f"Bootstrap CI: [{np.percentile(boot, 2.5):.2f}, {np.percentile(boot, 97.5):.2f}]")
R:
ic50 <- c(11.2, 13.1, 12.8, 10.9, 14.2, 12.0, 11.7, 12.5)
t.test(ic50)$conf.int
# Bootstrap
library(boot)
boot_fn <- function(data, i) mean(data[i])
b <- boot(ic50, boot_fn, R = 10000)
boot.ci(b, type = "perc")
Exercises
Exercise 1: Compute a CI by Hand
Eight mice on a high-fat diet had cholesterol levels: 215, 228, 197, 241, 209, 233, 220, 212 mg/dL. Compute the 95% CI for the mean cholesterol.
let cholesterol = [215, 228, 197, 241, 209, 233, 220, 212]
# TODO: Compute mean, SE, critical value, and the 95% CI
# Hint: df = n - 1 = 7; the 95% t critical value for df = 7 is 2.365
Exercise 2: Bootstrap a Ratio
Gene A has FPKM values [2.1, 3.4, 1.8, 4.2, 2.9] in tumor and [1.0, 1.2, 0.9, 1.5, 1.1] in normal. Bootstrap a 95% CI for the tumor/normal fold change of medians.
let tumor = [2.1, 3.4, 1.8, 4.2, 2.9]
let normal = [1.0, 1.2, 0.9, 1.5, 1.1]
# TODO: Bootstrap the ratio median(tumor) / median(normal)
# Use n_boot = 10000, then extract 2.5th and 97.5th percentiles with quantile()
Exercise 3: Overlapping CIs
Compute 95% CIs for Drug X (IC50: [5.2, 6.1, 4.8, 5.5, 6.3, 5.0]) and Drug Y (IC50: [5.8, 6.5, 7.2, 6.0, 5.9, 6.8]). Do the CIs overlap? What does this suggest?
let drug_x = [5.2, 6.1, 4.8, 5.5, 6.3, 5.0]
let drug_y = [5.8, 6.5, 7.2, 6.0, 5.9, 6.8]
# TODO: Compute CIs for both, then compute CI for the difference
# Note: overlapping CIs do NOT necessarily mean non-significant difference
Exercise 4: Effect of Sample Size
Starting with n = 5 replicates drawn from the IC50 data, increase to n = 10, 20, 50, and 100 (use bootstrap resampling to simulate larger samples). Plot CI width vs sample size.
let ic50 = [11.2, 13.1, 12.8, 10.9, 14.2, 12.0, 11.7, 12.5]
# TODO: For each sample size, bootstrap to simulate, compute CI width
# Plot sample size vs CI width using line_plot
Key Takeaways
- A confidence interval gives a range of plausible values for a population parameter, not just a point estimate
- The 95% in “95% CI” refers to the long-run coverage rate of the procedure, not the probability for a specific interval
- For small samples (n < 30), always use the t-distribution — it accounts for extra uncertainty
- Bootstrap CIs work for any statistic (median, ratio, fold change) without distributional assumptions
- CI width shrinks with larger n, lower variability, and lower confidence level
- A CI for the difference that includes zero means the data are consistent with no difference
- CIs are more informative than p-values alone: they tell you both significance AND the plausible magnitude of an effect
What’s Next
Tomorrow we formalize the logic behind “ruling out chance” with hypothesis testing. You will learn to frame biological questions as null and alternative hypotheses, compute p-values, and understand the courtroom analogy that makes the whole framework click. Confidence intervals and hypothesis tests are two sides of the same coin — a 95% CI that excludes zero corresponds exactly to a p-value less than 0.05.