Day 17: Survival Analysis — Time-to-Event Data
The Problem
Dr. Elena Volkov is a cancer genomicist analyzing overall survival in 250 lung adenocarcinoma patients. She has tumor sequencing data and wants to answer: Do TP53-mutant patients survive longer than TP53 wild-type patients?
Her first attempt: compute the mean survival time for each group and run a t-test. But she immediately hits a wall. 40% of patients are still alive at the end of the study. Their survival time is at least 36 months — but she doesn’t know their actual survival time. Dropping these patients biases the analysis (the longest survivors are removed). Using 36 months as their survival time underestimates it.
This is the problem of censoring, and it requires survival analysis.
What Is Censoring?
Right-censoring occurs when the event of interest (death, relapse, progression) has not yet happened at the time of observation. The patient is lost to follow-up, the study ends, or they die of an unrelated cause.
| Patient | Follow-up | Status | What We Know |
|---|---|---|---|
| A | 24 months | Dead | Survived exactly 24 months |
| B | 36 months | Alive | Survived at least 36 months |
| C | 12 months | Lost | Survived at least 12 months |
| D | 48 months | Dead | Survived exactly 48 months |
Patients B and C are censored — we know a lower bound on their survival, but not the actual value.
Key insight: Censored observations are NOT missing data. They contain real information (“this patient survived at least X months”). Throwing them away wastes data and biases results. Including them as if the event occurred underestimates survival.
Why Standard Methods Fail
| Method | Problem with Censored Data |
|---|---|
| Mean survival | Can’t compute — don’t know censored patients’ true times |
| t-test | Assumes complete observations |
| Linear regression | Can’t handle “at least” values |
| Simple proportions | Ignores timing of events |
The Kaplan-Meier Estimator
The Kaplan-Meier (KM) estimator is the workhorse of survival analysis. It estimates the survival function S(t) = P(survival > t) as a step function that drops at each observed event.
How it works: At each event time tⱼ:
$$\hat{S}(t) = \prod_{t_j \leq t} \left(1 - \frac{d_j}{n_j}\right)$$
Where:
- dⱼ = number of events at time tⱼ
- nⱼ = number at risk just before tⱼ (alive and not yet censored)
Reading a KM curve:
- Y-axis: proportion surviving (starts at 1.0 = 100%)
- X-axis: time
- Steps down: events (deaths)
- Tick marks: censored observations (patients lost but still contribute until that point)
- Steeper drops: periods of high event rate
- Flat plateaus: stable periods
Median Survival
The median survival is the time at which the KM curve crosses 0.50 — the point where half the patients have experienced the event. It is more robust than the mean because it is less affected by a few extreme values.
Clinical relevance: “Median overall survival was 18 months in the treatment arm versus 12 months in the control arm” is the standard way clinical trials report survival data. It’s the single most important number in oncology clinical trials.
The Log-Rank Test
The log-rank test compares survival curves between groups. It asks: “Is the survival experience significantly different between these groups?”
- H₀: The survival curves are identical
- H₁: The survival curves differ
The test compares observed events to expected events (under H₀) at each time point across the entire follow-up period. It gives most weight to later time points.
| Consideration | Detail |
|---|---|
| Assumptions | Proportional hazards (constant HR over time) |
| Power | Best when hazard ratio is constant |
| Limitation | Only tests equality — doesn’t estimate HOW different |
| Multiple groups | Can compare 3+ groups simultaneously |
Common pitfall: The log-rank test can be non-significant even when curves look different, if the difference is early (then converges) or if curves cross. If hazards are not proportional, consider alternatives like the Wilcoxon (Breslow) test, which gives more weight to early events.
Cox Proportional Hazards Model
The Cox PH model is the regression analog for survival data. It models the hazard (instantaneous event rate) as:
$$h(t|X) = h_0(t) \cdot \exp(\beta_1 X_1 + \beta_2 X_2 + \cdots)$$
Where h₀(t) is the baseline hazard and the exponential term scales it by covariates.
The Hazard Ratio
The key output is the hazard ratio (HR):
$$HR = e^{\beta}$$
| HR | Interpretation |
|---|---|
| 1.0 | No difference in hazard |
| 2.0 | Twice the hazard (worse survival) |
| 0.5 | Half the hazard (better survival) |
Key insight: HR = 2.0 does NOT mean “dies twice as fast” or “survives half as long.” It means that at any given time point, the hazard (instantaneous risk of the event) is 2x higher. The relationship between HR and median survival depends on the shape of the baseline hazard.
Proportional Hazards Assumption
The Cox model assumes that the ratio of hazards between groups is constant over time. If TP53-mutant patients have HR = 1.5, this ratio should hold at 6 months, 12 months, and 24 months.
Violations:
- Curves that cross (treatment effect reverses over time)
- HR that changes with time (e.g., surgery risk is high early, then protective later)
- Delayed treatment effects (immunotherapy often shows late separation)
Adjusting for Confounders
Like multiple regression, Cox models can include multiple predictors:
$$h(t) = h_0(t) \cdot \exp(\beta_1 \cdot \text{TP53} + \beta_2 \cdot \text{Age} + \beta_3 \cdot \text{Stage})$$
This gives the HR for TP53 mutation adjusted for age and stage — a cleaner estimate of its independent effect.
Clinical relevance: A univariate HR for TP53 might be 1.8 (worse survival). But if TP53-mutant tumors also tend to be higher stage, the adjusted HR might drop to 1.3 after controlling for stage. The adjusted HR is what matters for understanding TP53’s independent prognostic value.
Survival Analysis in BioLang
Kaplan-Meier Curves
set_seed(42)
# Lung adenocarcinoma survival by TP53 status
let n = 250
# Simulate TP53 status (40% mutant)
let tp53_mut = rnorm(n, 0, 1) |> map(|x| if x > 0.25 { 0 } else { 1 })
# Simulate survival times (exponential with TP53 effect)
let base_hazard = 0.03 # monthly hazard rate
let survival_time = []
let status = [] # 1 = dead, 0 = censored
for i in 0..n {
let hazard = base_hazard * if tp53_mut[i] == 1 { 1.6 } else { 1.0 }
let true_time = -log(rnorm(1, 0.5, 0.2)[0] |> max(0.01)) / hazard
let censor_time = rnorm(1, 42, 10)[0] |> max(24)
if true_time < censor_time {
survival_time = survival_time + [true_time]
status = status + [1] # event observed
} else {
survival_time = survival_time + [censor_time]
status = status + [0] # censored
}
}
print("Events: {status |> sum} / {n} ({(status |> sum) / n * 100 |> round(1)}%)")
print("Censored: {n - (status |> sum)}")
# Fit Kaplan-Meier for each group
let wt_times = []
let wt_status = []
let mut_times = []
let mut_status = []
for i in 0..n {
if tp53_mut[i] == 0 {
wt_times = wt_times + [survival_time[i]]
wt_status = wt_status + [status[i]]
} else {
mut_times = mut_times + [survival_time[i]]
mut_status = mut_status + [status[i]]
}
}
let km_wt = kaplan_meier(wt_times, wt_status)
let km_mut = kaplan_meier(mut_times, mut_status)
print("Median survival (TP53 WT): {km_wt.median |> round(1)} months")
print("Median survival (TP53 mut): {km_mut.median |> round(1)} months")
Kaplan-Meier Plot
# Publication-quality survival curves
# Plot KM data using plot()
let km_table = table({
"time": km_wt.times ++ km_mut.times,
"survival": km_wt.survival ++ km_mut.survival,
"group": km_wt.times |> map(|t| "TP53 WT") ++ km_mut.times |> map(|t| "TP53 Mut")
})
plot(km_table, {type: "line", x: "time", y: "survival", color: "group",
title: "Overall Survival by TP53 Status",
x_label: "Time (months)", y_label: "Survival Probability"})
Log-Rank Test
# Compare survival curves statistically
# Compute log-rank test from KM outputs
# The test compares observed vs expected events across groups
let km_all = kaplan_meier(survival_time, status)
# Use Cox PH as a proxy for log-rank (equivalent for single binary predictor)
let cox_lr = cox_ph(survival_time, status, [tp53_mut])
print("=== Log-Rank Test (via Cox) ===")
print("p-value: {cox_lr.p_value |> round(4)}")
if cox_lr.p_value < 0.05 {
print("Significant difference in survival between TP53 groups")
} else {
print("No significant difference detected (p > 0.05)")
}
Cox Proportional Hazards Model
set_seed(42)
# Simulate additional covariates
let age = rnorm(n, 65, 10)
let stage = rnorm(n, 2.5, 0.8) |> map(|x| max(1, min(4, round(x))))
# Univariate Cox model
let cox_simple = cox_ph(survival_time, status, [tp53_mut])
print("=== Univariate Cox Model ===")
print("TP53 Mutation HR: {cox_simple.hazard_ratio |> round(2)}")
print(" p-value: {cox_simple.p_value |> round(4)}")
# Multivariable Cox model — adjust for age and stage
let cox_adjusted = cox_ph(survival_time, status, [tp53_mut, age, stage])
print("\n=== Multivariable Cox Model ===")
print("Hazard ratios: {cox_adjusted.hazard_ratios}")
print("p-values: {cox_adjusted.p_values}")
# Compare: does TP53 HR change after adjusting for stage?
print("\nTP53 HR unadjusted: {cox_simple.hazard_ratio |> round(2)}")
print("TP53 HR adjusted: {cox_adjusted.hazard_ratios[0] |> round(2)}")
Forest Plot of Hazard Ratios
# Visualize all HRs from the multivariable model
let hr_data = table({
"predictor": ["TP53 Mutation", "Age", "Stage"],
"hr": cox_adjusted.hazard_ratios,
"p_value": cox_adjusted.p_values
})
forest_plot(hr_data)
# Left of 1 = protective, Right of 1 = harmful
Survival Curve from Cox Model
# Plot adjusted survival curves from KM estimates
let surv_table = table({
"time": km_wt.times ++ km_mut.times,
"survival": km_wt.survival ++ km_mut.survival,
"group": km_wt.times |> map(|t| "TP53 WT") ++ km_mut.times |> map(|t| "TP53 Mut")
})
plot(surv_table, {type: "line", x: "time", y: "survival", color: "group",
title: "Adjusted Survival Curves from Cox Model",
x_label: "Time (months)", y_label: "Survival Probability"})
Checking Proportional Hazards
# Check proportional hazards assumption
# If hazard ratios change over time, PH is violated
# Visual check: compare early vs late HR
let early_times = []
let early_status = []
let early_tp53 = []
let late_times = []
let late_status = []
let late_tp53 = []
let midpoint = 24 # months
for i in 0..n {
if survival_time[i] <= midpoint {
early_times = early_times + [survival_time[i]]
early_status = early_status + [status[i]]
early_tp53 = early_tp53 + [tp53_mut[i]]
} else {
late_times = late_times + [survival_time[i]]
late_status = late_status + [status[i]]
late_tp53 = late_tp53 + [tp53_mut[i]]
}
}
print("=== Proportional Hazards Check ===")
print("If HRs differ substantially, PH assumption may be violated")
# If p < 0.05, consider: stratified Cox model, time-varying coefficients,
# or restricted mean survival time (RMST)
Python:
from lifelines import KaplanMeierFitter, CoxPHFitter
from lifelines.statistics import logrank_test
# Kaplan-Meier
kmf = KaplanMeierFitter()
kmf.fit(time_wt, event_wt, label='TP53 WT')
kmf.plot()
kmf.fit(time_mut, event_mut, label='TP53 Mut')
kmf.plot()
# Log-rank test
results = logrank_test(time_wt, time_mut, event_wt, event_mut)
print(f"p = {results.p_value:.4f}")
# Cox model
cph = CoxPHFitter()
cph.fit(df, duration_col='time', event_col='status')
cph.print_summary()
cph.plot() # forest plot
# Check PH assumption
cph.check_assumptions(df, show_plots=True)
R:
library(survival)
library(survminer)
# Kaplan-Meier
km_fit <- survfit(Surv(time, status) ~ tp53, data = df)
# Publication-quality KM plot
ggsurvplot(km_fit,
pval = TRUE, risk.table = TRUE,
palette = c("#2196F3", "#F44336"))
# Log-rank test
survdiff(Surv(time, status) ~ tp53, data = df)
# Cox model
cox_model <- coxph(Surv(time, status) ~ tp53 + age + stage, data = df)
summary(cox_model)
# Forest plot
ggforest(cox_model)
# Proportional hazards test
cox.zph(cox_model)
Exercises
Exercise 1: Kaplan-Meier and Median Survival
A clinical trial follows 200 breast cancer patients treated with either chemotherapy or chemotherapy + targeted therapy. Compute KM curves and median survival for both arms.
set_seed(42)
let n = 200
let treatment = rnorm(n, 0, 1) |> map(|x| if x > 0 { 1 } else { 0 }) # 0=chemo, 1=chemo+targeted
# Simulate survival data
# Targeted therapy reduces hazard by 30%
# Generate survival times and censoring
# 1. Compute kaplan_meier() for each treatment arm
# 2. Plot KM curves with plot()
# 3. Report median survival for each arm
# 4. What is the absolute improvement in median survival?
Exercise 2: Log-Rank Test with Multiple Groups
Compare survival across four molecular subtypes (Luminal A, Luminal B, HER2+, Triple-negative). Which pairs differ significantly?
set_seed(42)
let n = 300
# Assign subtypes: 0=LumA, 1=LumB, 2=HER2+, 3=TNBC
let subtype = rnorm(n, 0, 1) |> map(|x|
if x < -0.39 { 0 }
else if x < 0.25 { 1 }
else if x < 0.64 { 2 }
else { 3 })
# Simulate different hazard rates by subtype
# Luminal A: lowest hazard, Triple-neg: highest
# 1. Compute kaplan_meier() for all 4 subtypes
# 2. Fit cox_ph() with subtype as predictor
# 3. Which subtypes have the best and worst prognosis?
Exercise 3: Multivariable Cox Model
Build a Cox model with TP53 status, age, stage, and smoking status. Determine which factors are independently prognostic after adjustment.
let n = 300
# Simulate covariates and survival
# Fit cox_ph() with all 4 predictors
# 1. Report hazard ratios for each predictor
# 2. Which predictors are significant after adjustment?
# 3. Create a forest_plot()
# 4. Does the TP53 hazard ratio change from univariate to multivariable?
Exercise 4: Checking the PH Assumption
Simulate a scenario where the proportional hazards assumption is violated — an immunotherapy that has no effect in the first 3 months but a strong effect afterward (delayed separation). Show that the PH test flags this.
# Simulate two groups:
# Control: constant hazard
# Treatment: same hazard for months 0-3, then 50% reduced hazard
# 1. Plot KM curves — do they cross or show delayed separation?
# 2. Fit cox_ph() and check if HR changes over time
# 3. Is the PH assumption violated?
# 4. What would you do in practice?
Exercise 5: Complete Survival Analysis Pipeline
Perform a full survival analysis: KM curves, log-rank test, univariate Cox for each predictor, multivariable Cox, forest plot. Write a one-paragraph summary of findings.
# Use 400 patients with 5 clinical/genomic variables
# Run the complete pipeline from raw data to clinical interpretation
Key Takeaways
- Censored observations contain real information — survival analysis methods handle them correctly while standard methods cannot
- Kaplan-Meier estimates survival probabilities as a step function; median survival is the primary summary measure
- The log-rank test compares survival curves between groups (the “t-test” of survival analysis)
- Cox proportional hazards regression models the effect of multiple covariates on hazard; the hazard ratio is the key output
- HR = 2.0 means twice the instantaneous risk, NOT twice as fast to die or half the survival time
- Always check the proportional hazards assumption — violations (crossing curves, delayed effects) invalidate the standard Cox model
- Forest plots are the standard visualization for hazard ratios from Cox models
- Adjusted HRs (controlling for confounders) are more clinically meaningful than unadjusted ones
What’s Next
We’ve been analyzing data. But before collecting data, there’s a critical question: How many samples do you need? Day 18 covers experimental design and statistical power — the science of planning studies that can actually detect the effects you’re looking for.