
Day 26: Meta-Analysis — Combining Studies

Day 26 of 30 · Prerequisites: Days 6-8, 14, 19 · ~60 min reading · Evidence Synthesis

The Problem

Three independent clinical trials have tested PCSK9 inhibitors for lowering LDL cholesterol in patients with familial hypercholesterolemia:

Study | N | Mean LDL Reduction (mg/dL) | SE
Trial A (Europe, 2019) | 250 | -52.3 | 4.1
Trial B (USA, 2020) | 180 | -48.7 | 5.3
Trial C (Asia, 2021) | 320 | -55.1 | 3.8

Each study alone has a confidence interval that overlaps with the others. None is definitive. But together, the evidence is overwhelming. The question is: how do you formally combine them? You cannot just average the means — the studies have different sample sizes, different precisions, and were conducted in different populations. You need a method that respects these differences.

Meta-analysis is that method. It provides a rigorous framework for pooling results across studies, weighting each by its precision, quantifying heterogeneity, and assessing whether the pooled estimate is trustworthy. It sits at the top of the evidence hierarchy — above individual RCTs — because it synthesizes all available evidence.

What Is Meta-Analysis?

Meta-analysis is the statistical combination of results from two or more separate studies to produce a single, more precise estimate of an effect. It is not simply “averaging” — it is a weighted combination that accounts for each study’s precision.

Think of it as a vote among experts. If three experts estimate a quantity, you would give more weight to the expert who measured most precisely (smallest uncertainty), and less weight to the expert whose estimate is vague. Meta-analysis formalizes this intuition.

Key insight: Meta-analysis does not combine raw data — it combines summary statistics (effect sizes and their standard errors). This means you can conduct a meta-analysis using published results alone, without accessing any original data.

Why Combine Studies?

  1. Increased precision: Pooling 750 patients across three trials gives a tighter CI than any single trial.
  2. Resolving contradictions: If Study A finds an effect and Study B does not, meta-analysis can determine whether this reflects true heterogeneity or sampling variability.
  3. Generalizability: Studies from Europe, USA, and Asia together provide evidence across populations.
  4. Detecting small effects: An individual study may be underpowered; the pooled analysis may cross the significance threshold.

Fixed-Effects Model

The fixed-effects model assumes that all studies estimate the same true effect. Differences between study results are due to sampling variability alone.

Weighting

Each study is weighted by the inverse of its variance:

w_i = 1 / SE_i^2

The pooled estimate is the weighted mean:

Pooled = Sum(w_i x estimate_i) / Sum(w_i)

The pooled SE is:

SE_pooled = 1 / sqrt(Sum(w_i))

When to Use Fixed Effects

Use fixed effects when you believe the true effect is the same across all studies — for instance, highly standardized lab assays or studies using identical protocols. In practice, this assumption is often too strong.

# Fixed-effects meta-analysis
let studies = ["Trial A", "Trial B", "Trial C"]
let effects = [-52.3, -48.7, -55.1]
let se = [4.1, 5.3, 3.8]

# Manual calculation
let weights = se |> map(|s| 1.0 / (s * s))
let total_weight = sum(weights)
let pooled_effect = sum(zip(weights, effects) |> map(|we| we[0] * we[1])) / total_weight
let pooled_se = 1.0 / sqrt(total_weight)

print("=== Fixed-Effects Meta-Analysis ===")
print("Study weights: " + str(weights |> map(|w| round(w, 2))))
print("Pooled effect: " + str(round(pooled_effect, 2)) + " mg/dL")
print("Pooled SE: " + str(round(pooled_se, 2)))
print("95% CI: [" +
  str(round(pooled_effect - 1.96 * pooled_se, 2)) + ", " +
  str(round(pooled_effect + 1.96 * pooled_se, 2)) + "]")
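The same fixed-effects calculation can be checked in Python, which this chapter also uses for its cross-language examples. A minimal sketch with the three trials from the opening table (NumPy assumed available):

```python
import numpy as np

# The three trials from the opening problem
effects = np.array([-52.3, -48.7, -55.1])  # mean LDL reduction, mg/dL
se = np.array([4.1, 5.3, 3.8])

# Inverse-variance weights: w_i = 1 / SE_i^2
w = 1 / se**2

# Pooled estimate and its standard error
pooled = np.sum(w * effects) / np.sum(w)
se_pooled = 1 / np.sqrt(np.sum(w))
ci = (pooled - 1.96 * se_pooled, pooled + 1.96 * se_pooled)

print(round(pooled, 2))     # -52.7
print(round(se_pooled, 2))  # 2.47
```

Note that the pooled SE (2.47) is smaller than any single trial's SE (3.8 at best): this is the precision gain from pooling.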

Random-Effects Model

The random-effects model assumes that studies estimate different but related true effects. Each study’s true effect is drawn from a distribution of effects. The between-study variance (tau-squared) captures how much the true effects vary.

When to Use Random Effects

Use random effects when studies differ in population, dosing, protocol, or outcome definition — which is almost always the case in biomedical research. Random-effects models produce wider CIs than fixed-effects models, reflecting the additional uncertainty from between-study variability.

Model | Assumption | CIs | When to use
Fixed-effects | Same true effect across studies | Narrower | Identical protocols, homogeneous studies
Random-effects | True effects vary across studies | Wider | Different populations, protocols, settings

Common pitfall: Some researchers choose between fixed and random effects based on which gives a smaller p-value. This is a form of p-hacking. Choose the model before seeing the results, based on the study designs and populations.
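The between-study variance tau-squared is commonly estimated with the DerSimonian-Laird method, which is also what the complete pipeline later in this chapter uses. It is built from the heterogeneity statistic Q defined in the next section. A minimal Python sketch on the chapter's five-study data (NumPy assumed available):

```python
import numpy as np

# Five PCSK9 studies (same data as the complete pipeline below)
effects = np.array([-52.3, -48.7, -55.1, -50.2, -53.8])
se = np.array([4.1, 5.3, 3.8, 3.7, 3.9])

# Fixed-effects weights and pooled estimate (needed to compute Q)
w = 1 / se**2
pooled_fe = np.sum(w * effects) / np.sum(w)

# DerSimonian-Laird: estimate tau^2 from Cochran's Q
Q = np.sum(w * (effects - pooled_fe)**2)
df = len(effects) - 1
C = np.sum(w) - np.sum(w**2) / np.sum(w)
tau2 = max(0.0, (Q - df) / C)

# Random-effects weights add tau^2 to each study's variance
w_re = 1 / (se**2 + tau2)
pooled_re = np.sum(w_re * effects) / np.sum(w_re)
se_re = 1 / np.sqrt(np.sum(w_re))

print(round(tau2, 2), round(pooled_re, 2), round(se_re, 2))
```

For these five consistent studies Q falls below its degrees of freedom, so tau-squared truncates to 0 and the random-effects result coincides with the fixed-effects one; with genuinely heterogeneous data, tau-squared would be positive and the random-effects CI wider.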

Heterogeneity: Q and I-Squared

Heterogeneity quantifies how much the studies disagree beyond what sampling variability would explain.

Cochran’s Q Statistic

Q = Sum(w_i x (estimate_i - pooled)^2)

Under the null hypothesis of no heterogeneity, Q follows a chi-square distribution with k-1 degrees of freedom (where k is the number of studies). A significant Q (p < 0.10, using a lenient threshold because the test has low power) suggests heterogeneity.

I-Squared

I-squared quantifies the proportion of total variation due to between-study heterogeneity:

I^2 = max(0, (Q - df) / Q) x 100%

I^2 | Heterogeneity
0-25% | Low — studies are consistent
25-50% | Moderate — some inconsistency
50-75% | Substantial — investigate sources
75-100% | Considerable — pooling may be inappropriate
Figure: I-squared, low vs. high heterogeneity. Left panel: low I² (10%), five studies whose CIs overlap tightly around a pooled estimate of -52.5. Right panel: high I² (82%), five studies whose CIs are spread widely around a pooled estimate of -48.0.
# Heterogeneity assessment
let Q = sum(zip(weights, effects) |> map(|we| we[0] * (we[1] - pooled_effect) * (we[1] - pooled_effect)))
let df = len(studies) - 1
let I_squared = max(0, (Q - df) / Q) * 100

print("=== Heterogeneity ===")
print("Q statistic: " + str(round(Q, 2)) + " (df=" + str(df) + ")")
print("I-squared: " + str(round(I_squared, 1)) + "%")
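The same heterogeneity assessment in Python, applied to the three trials from the opening problem (SciPy assumed available for the chi-square p-value):

```python
import numpy as np
from scipy.stats import chi2

# The three trials from the opening problem
effects = np.array([-52.3, -48.7, -55.1])
se = np.array([4.1, 5.3, 3.8])

# Fixed-effects pooled estimate (inverse-variance weights)
w = 1 / se**2
pooled = np.sum(w * effects) / np.sum(w)

# Cochran's Q, its chi-square p-value, and I-squared
Q = np.sum(w * (effects - pooled)**2)
df = len(effects) - 1
p_value = chi2.sf(Q, df)
I2 = max(0.0, (Q - df) / Q) * 100

print(round(Q, 2), round(p_value, 2), round(I2, 1))  # 0.98 0.61 0.0
```

Here Q is well below its degrees of freedom and I-squared is 0%: the three trials are mutually consistent, so even the lenient p < 0.10 threshold gives no evidence of heterogeneity.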

The Forest Plot

The forest plot is the signature visualization of meta-analysis. Each study is a row showing its point estimate (square, sized by weight) and confidence interval (horizontal line). The pooled estimate is a diamond at the bottom. A vertical line at the null (0 for mean differences, 1 for ratios) allows quick assessment of significance.

Figure: Forest plot for the PCSK9 inhibitor meta-analysis. Five studies (Hoffmann 2019 at 24% weight, Martinez 2020 at 14%, Chen 2020 at 28%, Kumar 2021 at 18%, Larsson 2021 at 16%) with effects from -48.7 to -55.1 mg/dL; pooled random-effects estimate -52.5 mg/dL at 100% weight. A vertical line marks no effect (0); estimates left of the line favor the drug, right of it favor placebo.
let studies = ["Hoffmann 2019", "Martinez 2020", "Chen 2020",
               "Kumar 2021", "Larsson 2021", "Pooled (RE)"]
let effects = [-52.3, -48.7, -55.1, -50.2, -53.8, -52.5]
let ci_lower = [-60.3, -59.1, -62.5, -57.4, -61.2, -55.8]
let ci_upper = [-44.3, -38.3, -47.7, -43.0, -46.4, -49.2]
let weights = [24, 14, 28, 18, 16, 100]

let forest_tbl = zip(studies, effects, ci_lower, ci_upper, weights) |> map(|r| {
  study: r[0], estimate: r[1], ci_lower: r[2], ci_upper: r[3], weight: r[4]
}) |> to_table()

forest_plot(forest_tbl,
  {null_value: 0,
  title: "PCSK9 Inhibitor — LDL Reduction (mg/dL)",
  xlabel: "Mean LDL Reduction (95% CI)"})

Reading the forest plot:

  • If a study’s CI does not cross the null line, that study alone is significant.
  • If the pooled diamond does not cross the null, the combined evidence is significant.
  • Study squares vary in size — larger squares mean more weight (more precise studies).
  • The diamond width shows the CI of the pooled estimate.

Publication Bias and the Funnel Plot

Publication bias occurs when studies with significant results are more likely to be published than studies with null results. This biases meta-analyses toward overestimating effects.

The funnel plot detects this. It plots each study’s effect size (x-axis) against its precision (y-axis, typically 1/SE or sample size). In the absence of bias, the plot should look like an inverted funnel — symmetric around the pooled estimate, with more scatter at the bottom (less precise studies).

Asymmetry suggests bias. If small studies with negative or null results are missing (they were not published), the funnel will be asymmetric — missing studies from the lower-left.
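Visual inspection of a funnel plot is subjective. A common quantitative check, not part of this chapter's pipeline but worth knowing, is Egger's regression test: regress the standardized effect (effect / SE) on precision (1 / SE); an intercept far from zero relative to its standard error suggests asymmetry. A minimal sketch on this chapter's five studies, assuming SciPy 1.6+ (for the `intercept_stderr` attribute):

```python
import numpy as np
from scipy.stats import linregress

# Five studies (effects and standard errors used throughout this chapter)
effects = np.array([-52.3, -48.7, -55.1, -50.2, -53.8])
se = np.array([4.1, 5.3, 3.8, 3.7, 3.9])

# Egger's test: regress standardized effect on precision;
# funnel asymmetry shows up as an intercept far from zero
result = linregress(1 / se, effects / se)

print(round(result.intercept, 2))         # regression intercept
print(round(result.intercept_stderr, 2))  # its standard error
```

With only five studies this test has very low power; it is commonly recommended only when a meta-analysis includes roughly ten or more studies.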

Figure: Two funnel plots of effect size (x-axis) against standard error (y-axis). Left panel: no bias, a symmetric funnel. Right panel: publication bias, an asymmetric funnel with small null or negative studies missing from one side.
let effect_sizes = [-52.3, -48.7, -55.1, -50.2, -53.8]
let standard_errors = [4.1, 5.3, 3.8, 3.7, 3.9]

# Funnel plot: scatter of effect size vs SE
scatter(effect_sizes, standard_errors,
  {title: "Funnel Plot — Publication Bias Assessment",
  xlabel: "LDL Reduction (mg/dL)",
  ylabel: "Standard Error"})

Clinical relevance: Publication bias is a serious concern in pharmaceutical research. A meta-analysis of published antidepressant trials found a pooled effect size of 0.37 (moderate). When unpublished trials obtained through FDA records were included, the effect dropped to 0.15 (small). Publication bias had more than doubled the apparent efficacy.

When Meta-Analysis Is Inappropriate

Meta-analysis is not appropriate when:

  1. Studies measure fundamentally different things: Combining a study of aspirin with a study of statins because both are “cardiovascular interventions” is meaningless.
  2. Heterogeneity is too high (I^2 > 75%): If studies genuinely disagree, pooling them hides important differences. Investigate subgroups instead.
  3. Studies are not independent: If three papers report on overlapping patient cohorts, they are not independent studies.
  4. Publication bias is severe: A pooled estimate from biased studies is itself biased — garbage in, garbage out.
  5. Too few studies: Meta-analysis of 2 studies with opposite results tells you very little. At minimum, 3-5 studies are needed.

Common pitfall: “Combining apples and oranges” is the classic criticism. Meta-analysis is appropriate when studies address the same question with similar methods. If studies differ fundamentally, no amount of statistical sophistication makes the pooled result meaningful.

Meta-Analysis in BioLang — Complete Pipeline


# ============================================
# Five studies of PCSK9 inhibitor effect on LDL
# ============================================

let studies = ["Hoffmann 2019", "Martinez 2020", "Chen 2020",
               "Kumar 2021", "Larsson 2021"]
let effects = [-52.3, -48.7, -55.1, -50.2, -53.8]
let se = [4.1, 5.3, 3.8, 3.7, 3.9]
let n_patients = [250, 180, 320, 290, 260]

# ============================================
# 1. Fixed-effects meta-analysis
# ============================================
let weights_fe = se |> map(|s| 1.0 / (s * s))
let total_w = sum(weights_fe)
let pooled_fe = sum(zip(weights_fe, effects) |> map(|p| p[0] * p[1])) / total_w
let se_fe = 1.0 / sqrt(total_w)

print("=== Fixed-Effects ===")
print("Pooled: " + str(round(pooled_fe, 2)) + " [" +
  str(round(pooled_fe - 1.96 * se_fe, 2)) + ", " +
  str(round(pooled_fe + 1.96 * se_fe, 2)) + "]")

# ============================================
# 2. Heterogeneity
# ============================================
let Q = sum(zip(weights_fe, effects) |>
  map(|p| p[0] * (p[1] - pooled_fe) * (p[1] - pooled_fe)))
let df = len(studies) - 1
let I_sq = max(0, (Q - df) / Q) * 100

print("\n=== Heterogeneity ===")
print("Q = " + str(round(Q, 2)) + ", df = " + str(df))
print("I-squared = " + str(round(I_sq, 1)) + "%")

# Estimate tau-squared (between-study variance)
let C = total_w - sum(weights_fe |> map(|w| w * w)) / total_w
let tau_sq = max(0, (Q - df) / C)
print("tau-squared = " + str(round(tau_sq, 2)))

# ============================================
# 3. Random-effects meta-analysis
# ============================================
let weights_re = se |> map(|s| 1.0 / (s * s + tau_sq))
let total_w_re = sum(weights_re)
let pooled_re = sum(zip(weights_re, effects) |> map(|p| p[0] * p[1])) / total_w_re
let se_re = 1.0 / sqrt(total_w_re)

print("\n=== Random-Effects ===")
print("Pooled: " + str(round(pooled_re, 2)) + " [" +
  str(round(pooled_re - 1.96 * se_re, 2)) + ", " +
  str(round(pooled_re + 1.96 * se_re, 2)) + "]")

# ============================================
# 4. Forest plot
# ============================================
let all_ci_lo = range(0, len(effects)) |> map(|i| effects[i] - 1.96 * se[i])
let all_ci_hi = range(0, len(effects)) |> map(|i| effects[i] + 1.96 * se[i])
let all_w_pct = weights_re |> map(|w| round(w / total_w_re * 100, 1))

# Build forest plot table
let rows = range(0, len(studies)) |> map(|i| {
  study: studies[i], estimate: effects[i],
  ci_lower: all_ci_lo[i], ci_upper: all_ci_hi[i], weight: all_w_pct[i]
})
let pooled_row = [{study: "Pooled (RE)", estimate: pooled_re,
  ci_lower: pooled_re - 1.96 * se_re,
  ci_upper: pooled_re + 1.96 * se_re, weight: 100}]
let forest_tbl = concat(rows, pooled_row) |> to_table()

forest_plot(forest_tbl,
  {null_value: 0,
  title: "PCSK9 Inhibitor Meta-Analysis — LDL Reduction",
  xlabel: "Mean LDL Reduction, mg/dL (95% CI)"})

# ============================================
# 5. Funnel plot
# ============================================
# Funnel plot: scatter of effect vs SE
scatter(effects, se,
  {title: "Funnel Plot — Publication Bias",
  xlabel: "LDL Reduction (mg/dL)",
  ylabel: "Standard Error"})

# ============================================
# 6. Study-level summary
# ============================================
print("\n=== Study Summary ===")
print("Study                | Effect  | SE   | Weight(RE)")
print("---------------------|---------|------|----------")
for i in 0..len(studies) {
  let w_pct = round(weights_re[i] / total_w_re * 100, 1)
  print(studies[i] + " | " + str(effects[i]) + " | " +
    str(se[i]) + " | " + str(w_pct) + "%")
}

# ============================================
# 7. Interpretation
# ============================================
print("\n=== Interpretation ===")
if I_sq < 25 {
  print("Heterogeneity is low (I^2 = " + str(round(I_sq, 1)) + "%).")
  print("Studies are consistent. Fixed and random effects agree.")
} else if I_sq < 50 {
  print("Moderate heterogeneity (I^2 = " + str(round(I_sq, 1)) + "%).")
  print("Random-effects model is preferred.")
} else {
  print("Substantial heterogeneity (I^2 = " + str(round(I_sq, 1)) + "%).")
  print("Investigate sources of heterogeneity before trusting the pooled estimate.")
}

Python:

import numpy as np
import matplotlib.pyplot as plt

effects = np.array([-52.3, -48.7, -55.1, -50.2, -53.8])
se = np.array([4.1, 5.3, 3.8, 3.7, 3.9])

# Fixed effects
w = 1 / se**2
pooled_fe = np.average(effects, weights=w)
se_fe = 1 / np.sqrt(w.sum())

# Heterogeneity
Q = np.sum(w * (effects - pooled_fe)**2)
df = len(effects) - 1
I2 = max(0, (Q - df) / Q) * 100

# Random effects (DerSimonian-Laird)
C = w.sum() - (w**2).sum() / w.sum()
tau2 = max(0, (Q - df) / C)
w_re = 1 / (se**2 + tau2)
pooled_re = np.average(effects, weights=w_re)
se_re = 1 / np.sqrt(w_re.sum())

print(f"Fixed: {pooled_fe:.1f} [{pooled_fe-1.96*se_fe:.1f}, {pooled_fe+1.96*se_fe:.1f}]")
print(f"Random: {pooled_re:.1f} [{pooled_re-1.96*se_re:.1f}, {pooled_re+1.96*se_re:.1f}]")
print(f"I²: {I2:.1f}%")

R:

library(meta)

m <- metagen(TE = c(-52.3, -48.7, -55.1, -50.2, -53.8),
             seTE = c(4.1, 5.3, 3.8, 3.7, 3.9),
             studlab = c("Hoffmann", "Martinez", "Chen", "Kumar", "Larsson"),
             sm = "MD")
summary(m)
forest(m)
funnel(m)

# Alternative with metafor
library(metafor)
effects <- c(-52.3, -48.7, -55.1, -50.2, -53.8)
se <- c(4.1, 5.3, 3.8, 3.7, 3.9)
res <- rma(yi = effects, sei = se, method = "DL")
summary(res)
forest(res)
funnel(res)

Exercises

  1. Fixed vs random. Given the five PCSK9 studies above, compute both fixed-effects and random-effects pooled estimates. How different are they? Based on I-squared, which model is more appropriate?
# Your code: both models, compare, interpret I-squared
  2. Adding a contradictory study. A sixth study (Nakamura 2022, N=150) finds a much smaller effect: -30.5 mg/dL, SE=6.2. Add it to the meta-analysis. How do the pooled estimate, CI width, and I-squared change? Create the updated forest plot.
# Your code: add study, re-run meta-analysis, compare
  3. Publication bias simulation. Simulate 20 studies: true effect = -50, SE drawn from Uniform(3, 8). Then “suppress” all studies with p > 0.05 (simulating publication bias). Run meta-analysis on the remaining studies. Is the pooled estimate biased? Check with a funnel plot.
# Your code: simulate, suppress, meta-analyze, funnel plot
  4. Subgroup analysis. The five studies come from different continents (Europe, USA, Asia). Compute pooled estimates separately for Western (Trials A, B) and Asian (Trials C, D, E) studies. Is there a meaningful difference?
# Your code: subgroup meta-analysis, compare pooled estimates
  5. Hazard ratio meta-analysis. Five survival studies report hazard ratios (log scale) for a new chemotherapy vs standard-of-care. Combine them using random effects and create a forest plot.
let log_hr = [-0.33, -0.22, -0.41, -0.28, -0.35]
let se_log_hr = [0.12, 0.15, 0.10, 0.11, 0.13]
# Your code: meta-analysis on log(HR), forest plot, back-transform to HR

Key Takeaways

  • Meta-analysis formally combines results across studies to produce a more precise pooled estimate, weighted by each study’s precision.
  • Fixed-effects models assume one true effect across studies; random-effects models allow the true effect to vary. Random effects is almost always more appropriate in biomedical research.
  • Cochran’s Q tests for heterogeneity; I-squared quantifies its magnitude. I-squared above 50% warrants investigation before pooling.
  • The forest plot is the standard meta-analysis visualization: study estimates with CIs arranged vertically, pooled estimate as a diamond.
  • The funnel plot assesses publication bias: asymmetry suggests that small negative studies are missing from the literature.
  • Meta-analysis is inappropriate when studies measure different things, heterogeneity is extreme, studies are not independent, or publication bias is severe.
  • Meta-analysis sits at the top of the evidence hierarchy because it synthesizes all available evidence — but it is only as good as the studies it includes.

What’s Next

We have learned to analyze data, but can we trust our analysis? Tomorrow we confront the reproducibility crisis head-on: how to structure your statistical analysis so that it can be re-run perfectly by anyone, at any time. Random seeds, modular scripts, parameter files, and version tracking — the practices that separate publishable science from a pile of scattered scripts.