
Day 6: Confidence Intervals — The Range of Truth

The Problem

Dr. Amara Chen’s pharmacology team has spent six months developing a novel kinase inhibitor for triple-negative breast cancer. After extensive optimization, they measure the half-maximal inhibitory concentration (IC50) across eight independent replicates: 11.2, 13.1, 12.8, 10.9, 14.2, 12.0, 11.7, and 12.5 nanomolar. The mean is 12.3 nM — an excellent result that would place their compound among the most potent in its class.

But when Dr. Chen presents these results to the medicinal chemistry team, the lead chemist asks the uncomfortable question: “If you ran the experiment again tomorrow, would you get 12.3 nM? Or could it be 15? Or 9?” The point estimate of 12.3 nM tells them where the center of their data is, but it says nothing about how confident they should be in that number. They need a range — a confidence interval — that captures the uncertainty inherent in measuring anything biological.

This chapter introduces the confidence interval: a range of plausible values for a population parameter, built from sample data. It is one of the most important and most misunderstood tools in all of biostatistics.

What Is a Confidence Interval?

Imagine you are trying to measure the height of a building, but your measuring tape is slightly stretchy. Each time you measure, you get a slightly different answer. A confidence interval is like saying: “Based on my eight measurements, I am 95% confident the true height is somewhere between 48.2 and 52.1 meters.”

More precisely: if you repeated your experiment 100 times and computed a 95% confidence interval each time, about 95 of those 100 intervals would contain the true population parameter. The remaining 5 would miss it entirely.

[Figure: 20 confidence intervals from repeated experiments. About 19 of 20 (blue) contain the true parameter mu; about 1 of 20 (red) misses it.]

Common pitfall: A 95% CI does NOT mean “there is a 95% probability the true value is in this interval.” Once you compute a specific interval, the true value is either in it or it isn’t. The 95% refers to the procedure’s long-run success rate, not the probability for any single interval.
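This long-run interpretation can be checked by simulation. The sketch below (Python, assuming NumPy and SciPy are available; the true mean, spread, and seed are illustrative choices, not from the text) draws many samples from a known population and counts how often the 95% t-interval captures the true mean:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)          # fixed seed for reproducibility
true_mu, sigma, n, n_experiments = 12.3, 1.1, 8, 10_000

t_crit = stats.t.ppf(0.975, df=n - 1)   # 95% critical value, df = 7
covered = 0
for _ in range(n_experiments):
    sample = rng.normal(true_mu, sigma, size=n)
    se = sample.std(ddof=1) / np.sqrt(n)
    lo, hi = sample.mean() - t_crit * se, sample.mean() + t_crit * se
    covered += lo <= true_mu <= hi      # does this interval contain mu?

coverage = covered / n_experiments
print(f"Empirical coverage: {coverage:.3f}")  # close to the nominal 0.95
```

About 5% of the simulated intervals miss the true mean, exactly as the long-run interpretation promises.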

Point Estimates Are Not Enough

A point estimate is a single number — a sample mean, a proportion, a median. It is our best guess, but it carries no information about precision.

| Scenario | Point Estimate | What's Missing? |
|----------|----------------|-----------------|
| Drug IC50 from 8 replicates | 12.3 nM | Could be 8-16 nM or 11.9-12.7 nM |
| Mutation frequency in 50 patients | 34% | Could be 21-47% or 30-38% |
| Mean tumor volume after treatment | 180 mm³ | How variable was the response? |

The confidence interval supplements the point estimate with a measure of uncertainty. Narrow intervals mean precise estimates; wide intervals mean the data leaves much room for doubt.

CI for a Mean: x̄ ± t × SE

The most common confidence interval is for a population mean. The formula is:

CI = x̄ ± t(α/2, df) × SE

Where:

  • x̄ is the sample mean
  • SE = s / √n is the standard error of the mean
  • t(α/2, df) is the critical value from the t-distribution with df = n − 1
  • α = 1 − confidence level (for a 95% CI, α = 0.05)
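Translated into Python (a sketch assuming NumPy and SciPy are available; the data are the chapter's eight IC50 replicates), the formula reads:

```python
import numpy as np
from scipy import stats

ic50 = np.array([11.2, 13.1, 12.8, 10.9, 14.2, 12.0, 11.7, 12.5])
n = len(ic50)
x_bar = ic50.mean()                      # sample mean
se = ic50.std(ddof=1) / np.sqrt(n)       # SE = s / sqrt(n)
t_crit = stats.t.ppf(0.975, df=n - 1)    # t(alpha/2, df) with alpha = 0.05

lo, hi = x_bar - t_crit * se, x_bar + t_crit * se
print(f"mean = {x_bar:.2f} nM, 95% CI = [{lo:.2f}, {hi:.2f}] nM")
```

This gives roughly [11.4, 13.2] nM: the mean of 12.3 nM comes with an honest margin of about ±0.9 nM.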

Why the t-Distribution for Small Samples?

When n is large (say, n > 30), the t-distribution closely resembles the normal distribution. But for small n — common in biology where each replicate is expensive — the t-distribution has heavier tails, producing wider intervals that honestly reflect our greater uncertainty.

| Sample Size (n) | t-critical (95%) | z-critical (95%) | Difference |
|-----------------|------------------|------------------|------------|
| 5 | 2.776 | 1.960 | 42% wider |
| 10 | 2.262 | 1.960 | 15% wider |
| 30 | 2.045 | 1.960 | 4% wider |
| 100 | 1.984 | 1.960 | ~1% wider |
| 1000 | 1.962 | 1.960 | Negligible |

Key insight: For biological experiments with n < 30, always use the t-distribution. Using z would give falsely narrow intervals that overstate your precision.
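The critical values in the table above can be reproduced with SciPy (a sketch; `stats.t.ppf` and `stats.norm.ppf` return the quantiles):

```python
from scipy import stats

z = stats.norm.ppf(0.975)                 # 1.960, regardless of n
for n in [5, 10, 30, 100, 1000]:
    t = stats.t.ppf(0.975, df=n - 1)      # heavier tails for small df
    print(f"n = {n:4d}: t = {t:.3f}, z = {z:.3f}, {100 * (t / z - 1):4.0f}% wider")
```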

CI for a Proportion

When the variable is binary — mutation present/absent, responder/non-responder — we need a CI for a proportion p̂ = x/n.

Wald Interval (Simple but Flawed)

CI = p̂ ± z × √(p̂(1 − p̂) / n)

This is the textbook formula, but it performs poorly when p is near 0 or 1, or when n is small. It can even produce intervals that extend below 0 or above 1.

Wilson Interval (Preferred)

The Wilson score interval adjusts the center and width, and is recommended for most biological applications:

CI = (p̂ + z²/(2n) ± z × √(p̂(1 − p̂)/n + z²/(4n²))) / (1 + z²/n)

Clinical relevance: When reporting mutation carrier frequencies, drug response rates, or diagnostic sensitivity/specificity, always use Wilson intervals. Regulatory agencies expect intervals that behave properly even at extreme proportions.
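Both formulas are short enough to code directly. The sketch below (Python, standard library only) contrasts them on a hypothetical small sample — 1 mutation carrier among 10 patients — where the Wald interval dips below zero:

```python
import math

def wald_ci(x, n, z=1.96):
    """Wald interval: p-hat +/- z * sqrt(p-hat * (1 - p-hat) / n)."""
    p = x / n
    half = z * math.sqrt(p * (1 - p) / n)
    return p - half, p + half

def wilson_ci(x, n, z=1.96):
    """Wilson score interval: shifted center and widened tail term."""
    p = x / n
    denom = 1 + z**2 / n
    center = p + z**2 / (2 * n)
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return (center - half) / denom, (center + half) / denom

wald = wald_ci(1, 10)      # lower bound is negative: an impossible proportion
wilson = wilson_ci(1, 10)  # stays inside [0, 1]
print(f"Wald:   [{wald[0]:.3f}, {wald[1]:.3f}]")
print(f"Wilson: [{wilson[0]:.3f}, {wilson[1]:.3f}]")
```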

CI for the Difference Between Two Means

Often the real question is not “what is the mean?” but “how much do two groups differ?” The CI for the difference between two independent means is:

CI = (x̄₁ − x̄₂) ± t × SE_diff

Where SE_diff = √(s₁²/n₁ + s₂²/n₂) for Welch's approach.

The critical interpretation: If the CI for the difference includes zero, the data are consistent with no difference between the groups. If it excludes zero, the difference is statistically significant at the corresponding level (5% for a 95% CI).

| CI for Difference | Interpretation |
|-------------------|----------------|
| [1.2, 4.8] | Groups differ; difference is between 1.2 and 4.8 units |
| [-0.5, 3.1] | Includes zero; cannot rule out no difference |
| [-4.2, -1.1] | Groups differ; group 2 is higher by 1.1 to 4.2 units |
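A sketch of the Welch interval in Python (NumPy and SciPy assumed), using the tumor-volume data from the code section later in this chapter — including the Welch-Satterthwaite degrees of freedom, which the formula leaves implicit:

```python
import numpy as np
from scipy import stats

treated = np.array([180, 210, 165, 225, 195, 172, 218, 198])
control = np.array([485, 512, 468, 530, 495, 478, 521, 503])

diff = treated.mean() - control.mean()
v1 = treated.var(ddof=1) / len(treated)   # s1^2 / n1
v2 = control.var(ddof=1) / len(control)   # s2^2 / n2
se_diff = np.sqrt(v1 + v2)

# Welch-Satterthwaite approximation for the degrees of freedom
df = (v1 + v2) ** 2 / (v1**2 / (len(treated) - 1) + v2**2 / (len(control) - 1))
t_crit = stats.t.ppf(0.975, df)

lo, hi = diff - t_crit * se_diff, diff + t_crit * se_diff
print(f"difference = {diff:.1f} mm^3, 95% CI = [{lo:.1f}, {hi:.1f}] mm^3")
```

The interval lies entirely below zero, so the data exclude "no difference" between treated and control mice.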

Bootstrap Confidence Intervals

What if your statistic is a median, a ratio, or something with no tidy formula? The bootstrap is a computer-intensive method that works for any statistic:

  1. Resample your data with replacement, same size as original
  2. Compute the statistic on the resample
  3. Repeat 10,000 times
  4. Take the 2.5th and 97.5th percentiles of the bootstrap distribution

This is called the percentile method. No assumptions about normality or distribution shape are required.
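The four steps map almost line-for-line onto Python (a sketch assuming NumPy; the skewed FPKM values are the ones used later in this chapter):

```python
import numpy as np

rng = np.random.default_rng(42)
expression = np.array([0.1, 0.3, 0.8, 1.2, 1.5, 2.1, 3.4, 8.7, 12.1, 45.6])

# Steps 1-3: resample with replacement, compute the statistic, repeat 10,000 times
boot_medians = np.array([
    np.median(rng.choice(expression, size=len(expression), replace=True))
    for _ in range(10_000)
])

# Step 4: the 2.5th and 97.5th percentiles bound the 95% interval
lo, hi = np.percentile(boot_medians, [2.5, 97.5])
print(f"Bootstrap 95% CI for the median: [{lo:.2f}, {hi:.2f}] FPKM")
```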

Key insight: Bootstrap CIs are the Swiss army knife of interval estimation. When in doubt, bootstrap it.

What Controls CI Width?

Three factors determine how wide or narrow your confidence interval will be:

| Factor | Effect on Width | Biological Implication |
|--------|-----------------|------------------------|
| Sample size (n) | Width ∝ 1/√n | Doubling n cuts width by ~30% |
| Variability (s) | Width ∝ s | High biological variability = wider CIs |
| Confidence level | 99% > 95% > 90% | Higher confidence = wider interval |

This is why power calculations matter: before an experiment, you choose n to achieve a CI narrow enough to be scientifically useful.
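The 1/√n scaling is easy to verify numerically (a sketch; the spread and confidence level are illustrative constants):

```python
import math

s, z = 1.1, 1.96   # fixed sample SD and 95% z-value; only n varies
for n in [10, 50, 200]:
    half_width = z * s / math.sqrt(n)   # half-width of the interval
    print(f"n = {n:3d}: half-width = {half_width:.2f}")

shrink = 1 / math.sqrt(2)
print(f"Doubling n multiplies the width by {shrink:.2f} (a ~29% cut)")
```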

[Figure: CI width narrows with larger sample size (same population, same true mean; only n changes). n = 10: ±2.1 nM (wide); n = 50: ±0.9 nM; n = 200: ±0.5 nM (narrow). Width ∝ 1/√n: doubling n cuts width by ~30%, quadrupling by ~50%.]

Confidence Intervals in BioLang

IC50 Confidence Interval — Parametric

# IC50 measurements (nM) from 8 replicates
let ic50 = [11.2, 13.1, 12.8, 10.9, 14.2, 12.0, 11.7, 12.5]

let n = len(ic50)
let x_bar = mean(ic50)
let se = stdev(ic50) / sqrt(n)

# 95% CI using the t-distribution (n = 8, so df = 7; qnorm(0.975) = 1.96 would be too narrow)
let t_crit = 2.365  # t-critical value for a 95% CI with df = 7
let ci_lower = x_bar - t_crit * se
let ci_upper = x_bar + t_crit * se

print("IC50 mean: {x_bar:.2} nM")
print("95% CI: [{ci_lower:.2}, {ci_upper:.2}] nM")
print("Standard error: {se:.3} nM")
print("Critical value: {t_crit:.3}")

IC50 Confidence Interval — Bootstrap

set_seed(42)
# Bootstrap CI: no distributional assumptions

let ic50 = [11.2, 13.1, 12.8, 10.9, 14.2, 12.0, 11.7, 12.5]

# Bootstrap: resample 10,000 times, compute mean each time
let n_boot = 10000
let boot_means = []
for i in range(0, n_boot) {
    let resample = []
    for j in range(0, len(ic50)) {
        resample = append(resample, ic50[random_int(0, len(ic50) - 1)])
    }
    boot_means = append(boot_means, mean(resample))
}

# Percentile method
let boot_lower = quantile(boot_means, 0.025)
let boot_upper = quantile(boot_means, 0.975)

print("Bootstrap 95% CI: [{boot_lower:.2}, {boot_upper:.2}] nM")

# Visualize the bootstrap distribution
histogram(boot_means, {bins: 50, title: "Bootstrap Distribution of IC50 Mean", x_label: "Mean IC50 (nM)"})

Bootstrap CI for Median (No Parametric Formula Exists)

set_seed(42)
# Gene expression values (FPKM) — skewed distribution
let expression = [0.1, 0.3, 0.8, 1.2, 1.5, 2.1, 3.4, 8.7, 12.1, 45.6]

let obs_median = median(expression)

# Bootstrap the median
let n_boot = 10000
let boot_medians = []
for i in range(0, n_boot) {
    let resample = []
    for j in range(0, len(expression)) {
        resample = append(resample, expression[random_int(0, len(expression) - 1)])
    }
    boot_medians = append(boot_medians, median(resample))
}
let ci_lower = quantile(boot_medians, 0.025)
let ci_upper = quantile(boot_medians, 0.975)

print("Observed median: {obs_median:.2} FPKM")
print("Bootstrap 95% CI for median: [{ci_lower:.2}, {ci_upper:.2}] FPKM")

Error Bar Plot: Comparing Drug Concentrations

[Figure: error bar plot of three drug candidates with 95% CIs — Drug A 12.3 nM, Drug B 25.5 nM, Drug C 8.6 nM (y-axis: IC50, 0-25 nM). Lower IC50 = more potent; non-overlapping CIs suggest significant differences.]
# IC50 values for three drug candidates
let drug_a = [12.3, 11.8, 13.1, 12.0, 11.5, 12.7, 13.4, 12.1]
let drug_b = [25.1, 28.3, 22.7, 26.9, 24.5, 27.1, 23.8, 25.6]
let drug_c = [8.2, 9.1, 7.5, 8.8, 10.2, 8.0, 9.5, 7.8]

let drugs = ["Drug A", "Drug B", "Drug C"]
let means = [mean(drug_a), mean(drug_b), mean(drug_c)]

# Compute 95% CIs for each
let compute_ci = |data| {
  let n = len(data)
  let se = stdev(data) / sqrt(n)
  let t_crit = 2.365  # t-critical for a 95% CI with df = 7 (n = 8 per drug)
  [mean(data) - t_crit * se, mean(data) + t_crit * se]
}

let ci_a = compute_ci(drug_a)
let ci_b = compute_ci(drug_b)
let ci_c = compute_ci(drug_c)

print("Drug A: {means[0]:.1} nM, 95% CI [{ci_a[0]:.1}, {ci_a[1]:.1}]")
print("Drug B: {means[1]:.1} nM, 95% CI [{ci_b[0]:.1}, {ci_b[1]:.1}]")
print("Drug C: {means[2]:.1} nM, 95% CI [{ci_c[0]:.1}, {ci_c[1]:.1}]")

# Bar chart with error bars
bar_chart(drugs, means, {title: "IC50 Comparison with 95% CIs", y_label: "IC50 (nM)", error_bars: [ci_a, ci_b, ci_c]})

CI for Difference Between Two Means

# Compare tumor volume between treated and control mice
let treated = [180, 210, 165, 225, 195, 172, 218, 198]
let control = [485, 512, 468, 530, 495, 478, 521, 503]

let diff = mean(treated) - mean(control)
let se_diff = sqrt(variance(treated) / len(treated) + variance(control) / len(control))
let df = len(treated) + len(control) - 2  # 14 here; the Welch df would be slightly smaller
let t_crit = 2.145  # t-critical for a 95% CI with df = 14

let ci_lower = diff - t_crit * se_diff
let ci_upper = diff + t_crit * se_diff

print("Mean difference: {diff:.1} mm^3")
print("95% CI for difference: [{ci_lower:.1}, {ci_upper:.1}] mm^3")

# diff = treated - control, so an entirely negative CI means a reduction
if ci_upper < 0 {
  print("CI excludes zero: treatment significantly reduces tumor volume")
} else {
  print("CI includes zero (or lies wholly above it): cannot conclude a reduction")
}

Vaccine Efficacy CI (Proportion)

set_seed(42)
# Clinical trial: 15 of 200 vaccinated got infected vs 60 of 200 placebo
let p_vacc = 15 / 200
let p_plac = 60 / 200
let efficacy = 1.0 - (p_vacc / p_plac)

print("Vaccine efficacy: {efficacy * 100:.1}%")

# CI for proportion (vaccinated group infection rate)
let n = 200
let z = 1.96
let se_p = sqrt(p_vacc * (1.0 - p_vacc) / n)
let ci_lower_p = p_vacc - z * se_p
let ci_upper_p = p_vacc + z * se_p

print("Infection rate (vaccinated): {p_vacc*100:.1}%")
print("95% CI for infection rate: [{ci_lower_p*100:.1}%, {ci_upper_p*100:.1}%]")

# Bootstrap CI for vaccine efficacy itself
let vacc_outcomes = flatten([repeat(1, 15), repeat(0, 185)])
let plac_outcomes = flatten([repeat(1, 60), repeat(0, 140)])

let n_boot = 10000
let boot_eff = []
for i in range(0, n_boot) {
    let v_resample = []
    let p_resample = []
    for j in range(0, len(vacc_outcomes)) {
        v_resample = append(v_resample, vacc_outcomes[random_int(0, len(vacc_outcomes) - 1)])
        p_resample = append(p_resample, plac_outcomes[random_int(0, len(plac_outcomes) - 1)])
    }
    let pv = mean(v_resample)
    let pp = mean(p_resample)
    let eff = if pp == 0.0 then 0.0 else 1.0 - (pv / pp)
    boot_eff = append(boot_eff, eff)
}

let eff_ci = [quantile(boot_eff, 0.025), quantile(boot_eff, 0.975)]
print("Bootstrap 95% CI for efficacy: [{eff_ci[0]*100:.1}%, {eff_ci[1]*100:.1}%]")

Python:

import numpy as np
from scipy import stats

ic50 = [11.2, 13.1, 12.8, 10.9, 14.2, 12.0, 11.7, 12.5]
ci = stats.t.interval(0.95, df=len(ic50)-1,
                       loc=np.mean(ic50),
                       scale=stats.sem(ic50))
print(f"95% CI: [{ci[0]:.2f}, {ci[1]:.2f}]")

# Bootstrap
boot = [np.mean(np.random.choice(ic50, len(ic50))) for _ in range(10000)]
print(f"Bootstrap CI: [{np.percentile(boot, 2.5):.2f}, {np.percentile(boot, 97.5):.2f}]")

R:

ic50 <- c(11.2, 13.1, 12.8, 10.9, 14.2, 12.0, 11.7, 12.5)
t.test(ic50)$conf.int

# Bootstrap
library(boot)
boot_fn <- function(data, i) mean(data[i])
b <- boot(ic50, boot_fn, R = 10000)
boot.ci(b, type = "perc")

Exercises

Exercise 1: Compute a CI by Hand

Eight mice on a high-fat diet had cholesterol levels: 215, 228, 197, 241, 209, 233, 220, 212 mg/dL. Compute the 95% CI for the mean cholesterol.

let cholesterol = [215, 228, 197, 241, 209, 233, 220, 212]

# TODO: Compute mean, SE, critical value, and the 95% CI
# Hint: df = n - 1 = 7; the t-critical value for a 95% CI is 2.365

Exercise 2: Bootstrap a Ratio

Gene A has FPKM values [2.1, 3.4, 1.8, 4.2, 2.9] in tumor and [1.0, 1.2, 0.9, 1.5, 1.1] in normal. Bootstrap a 95% CI for the tumor/normal fold change of medians.

let tumor = [2.1, 3.4, 1.8, 4.2, 2.9]
let normal = [1.0, 1.2, 0.9, 1.5, 1.1]

# TODO: Bootstrap the ratio median(tumor) / median(normal)
# Use n_boot = 10000, then extract 2.5th and 97.5th percentiles with quantile()

Exercise 3: Overlapping CIs

Compute 95% CIs for Drug X (IC50: [5.2, 6.1, 4.8, 5.5, 6.3, 5.0]) and Drug Y (IC50: [5.8, 6.5, 7.2, 6.0, 5.9, 6.8]). Do the CIs overlap? What does this suggest?

let drug_x = [5.2, 6.1, 4.8, 5.5, 6.3, 5.0]
let drug_y = [5.8, 6.5, 7.2, 6.0, 5.9, 6.8]

# TODO: Compute CIs for both, then compute CI for the difference
# Note: overlapping CIs do NOT necessarily mean non-significant difference

Exercise 4: Effect of Sample Size

Starting with n = 5 replicates drawn from the IC50 data, increase to n = 10, 20, 50, and 100 (use bootstrap resampling to simulate larger samples). Plot CI width vs sample size.

let ic50 = [11.2, 13.1, 12.8, 10.9, 14.2, 12.0, 11.7, 12.5]

# TODO: For each sample size, bootstrap to simulate, compute CI width
# Plot sample size vs CI width using line_plot

Key Takeaways

  • A confidence interval gives a range of plausible values for a population parameter, not just a point estimate
  • The 95% in “95% CI” refers to the long-run coverage rate of the procedure, not the probability for a specific interval
  • For small samples (n < 30), always use the t-distribution — it accounts for extra uncertainty
  • Bootstrap CIs work for any statistic (median, ratio, fold change) without distributional assumptions
  • CI width shrinks with larger n, lower variability, and lower confidence level
  • A CI for the difference that includes zero means the data are consistent with no difference
  • CIs are more informative than p-values alone: they tell you both significance AND the plausible magnitude of an effect

What’s Next

Tomorrow we formalize the logic behind “ruling out chance” with hypothesis testing. You will learn to frame biological questions as null and alternative hypotheses, compute p-values, and understand the courtroom analogy that makes the whole framework click. Confidence intervals and hypothesis tests are two sides of the same coin — a 95% CI that excludes zero corresponds exactly to a p-value less than 0.05.