Kaplan–Meier estimates the survival function \(S(t)\), the probability that the event has not yet happened by time \(t\). It uses only the observed event times and the number of subjects still at risk at each of those times. The result is a step function: it stays flat between event times and drops at each time at which one or more subjects have the event.
Each subject contributes two values:
| What | Meaning |
|---|---|
| Time | Time from start until the event or until follow-up ended (censoring). |
| Event | 1 = event happened at that time; 0 = censored (event not seen). |
Censored means we don’t know what happened after that time (e.g. the subject was lost to follow-up or the study ended). KM still counts a censored subject in the at-risk set up to and including their censoring time; after that, they drop out.
Suppose we have 8 people. Time is in months; 1 = event (e.g. relapse), 0 = censored.
| id | time | event |
|---|---|---|
| 1 | 2 | 1 |
| 2 | 3 | 0 |
| 3 | 4 | 1 |
| 4 | 4 | 1 |
| 5 | 5 | 0 |
| 6 | 6 | 1 |
| 7 | 8 | 1 |
| 8 | 10 | 0 |
Event times are 2, 4, 6, and 8 months, with two people having the event at 4. One person is censored at each of 3, 5, and 10 months; censoring times do not create a “step” in the curve.
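A minimal sketch of how the at-risk counts follow from the data, in plain Python. It assumes the usual rule that a subject is at risk at time \(t\) if their recorded time (event or censoring) is at or after \(t\); the helper names `at_risk` and `events_at` are illustrative and not part of any library.

```python
# Example data from the table above: (id, time in months, event flag).
# event = 1 means the event was observed; 0 means censored.
subjects = [
    (1, 2, 1), (2, 3, 0), (3, 4, 1), (4, 4, 1),
    (5, 5, 0), (6, 6, 1), (7, 8, 1), (8, 10, 0),
]


def at_risk(t, data):
    """Number of subjects still under observation just before time t.

    Anyone whose recorded time is earlier than t has already had the
    event or been censored, so they are no longer in the risk set.
    """
    return sum(1 for _, time, _ in data if time >= t)


def events_at(t, data):
    """Number of observed events (event == 1) exactly at time t."""
    return sum(1 for _, time, event in data if time == t and event == 1)


event_times = sorted({time for _, time, event in subjects if event == 1})
censor_times = sorted(time for _, time, event in subjects if event == 0)

for t in event_times:
    print(t, at_risk(t, subjects), events_at(t, subjects))
```

Only the event times (2, 4, 6, 8) get a row; the censoring times (3, 5, 10) matter only because they shrink the risk set for later rows.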
At each event time \(t_i\), let \(d_i\) be the number of events at that time and \(n_i\) the number still at risk just before it. Among those at risk, the estimated probability of surviving past \(t_i\) is \(1 - d_i/n_i\). The Kaplan–Meier estimator is the product of these conditional probabilities over all event times up to \(t\):
\[ \hat{S}(t) = \prod_{t_i \leq t} \left( 1 - \frac{d_i}{n_i} \right) \]
So we start at 1 (100% “surviving”) and multiply by \((1 - d_i/n_i)\) at each event time. Censored subjects stay in \(n_i\) until their censoring time, then leave the at-risk set.
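The product-limit formula can be sketched in a few lines of standard-library Python. This is an illustrative implementation, not the one used by trcpetc; the function name `kaplan_meier` is hypothetical. It assumes the common tie convention that a subject censored at the same time as an event is still counted in \(n_i\) at that time.

```python
from itertools import groupby


def kaplan_meier(times, events):
    """Product-limit estimate of S(t).

    times  -- follow-up time for each subject (event or censoring time)
    events -- 1 if the event was observed at that time, 0 if censored
    Returns one (t_i, n_i, d_i, S_hat) tuple per distinct event time.
    """
    # Sort by time so the risk set only shrinks as we move forward.
    pairs = sorted(zip(times, events))
    n_at_risk = len(pairs)
    surv = 1.0
    rows = []
    for t, group in groupby(pairs, key=lambda p: p[0]):
        group = list(group)
        d = sum(e for _, e in group)      # events observed exactly at t
        if d > 0:
            surv *= 1 - d / n_at_risk     # multiply by (1 - d_i / n_i)
            rows.append((t, n_at_risk, d, surv))
        # Everyone recorded at t (events and censored) leaves the risk set
        # afterwards; a subject censored at t still counts in n_i at t.
        n_at_risk -= len(group)
    return rows


times = [2, 3, 4, 4, 5, 6, 8, 10]
events = [1, 0, 1, 1, 0, 1, 1, 0]
for t, n, d, s in kaplan_meier(times, events):
    print(f"t={t}  n={n}  d={d}  S={s:.3f}")
```

Walking forward through sorted times and decrementing a single counter avoids recounting the risk set at every event time.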
We order by time and at each event time compute \(n_i\) (at risk), \(d_i\) (events), and the running product \(\hat{S}(t)\):
| Event time t (months) | At risk n | Events d | S(t) | S(t) % |
|---|---|---|---|---|
| 2 | 8 | 1 | 0.875 | 87.5% |
| 4 | 6 | 2 | 0.583 | 58.3% |
| 6 | 3 | 1 | 0.389 | 38.9% |
| 8 | 2 | 1 | 0.194 | 19.4% |
Interpretation: just before the first event (t = 2), all 8 are at risk. One has the event, so the estimated probability of surviving past that moment is \(1 - 1/8 = 0.875\), giving \(\hat{S}(2) = 0.875\). Just before t = 4, two people have left the risk set: the one who had the event at 2 and the one censored at 3, so 6 are at risk. Two have the event at 4, so we multiply by \(1 - 2/6 = 2/3\), giving \(\hat{S}(4) = 0.875 \times 2/3 \approx 0.583\). The remaining rows follow the same way: 3 are at risk at t = 6 (ids 6, 7, 8) and 2 at t = 8 (ids 7, 8).
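To check the arithmetic without rounding, Python's `fractions` module can carry each factor exactly. The `(t_i, n_i, d_i)` triples are copied from the table; the variable names are illustrative.

```python
from fractions import Fraction

# (event time t_i, at risk n_i, events d_i) from the worked table
steps = [(2, 8, 1), (4, 6, 2), (6, 3, 1), (8, 2, 1)]

s = Fraction(1)            # S(t) starts at 1 before any event
curve = {}
for t, n, d in steps:
    s *= 1 - Fraction(d, n)    # conditional survival past t_i
    curve[t] = s

for t, value in curve.items():
    print(t, value, f"{float(value):.3f}")
```

The exact values are 7/8, 7/12, 7/18, and 7/36, which round to the 0.875, 0.583, 0.389, and 0.194 shown in the table.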
The survival curve is a step function, so the numbers in the table are exactly the heights of the curve at and after each event time. Here we produce the curve with trcpetc: `estimate_cif_km()` does the same calculation, and `show_surv()` draws it.
Kaplan–Meier survival curve (trcpetc). Steps at event times (2, 4, 6, 8); censored times (3, 5, 10) do not create steps.
The steps down at 2, 4, 6, and 8 match the calculation table above.
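Reading a value off the step curve at an arbitrary time means finding the last event time at or before \(t\). A small sketch using the standard `bisect` module, with the rounded table values hard-coded; `surv_at` is a hypothetical helper, not part of trcpetc.

```python
import bisect

# Event times and (rounded) KM estimates from the calculation table
event_times = [2, 4, 6, 8]
surv_values = [0.875, 0.583, 0.389, 0.194]


def surv_at(t):
    """Height of the KM step curve at time t (right-continuous).

    Returns 1.0 before the first event; otherwise the estimate at the
    last event time <= t, so the curve stays flat between events.
    Beyond the largest follow-up time (10 months here) the estimate is
    usually treated as undefined; this sketch just keeps the last value.
    """
    k = bisect.bisect_right(event_times, t)   # count of event times <= t
    return 1.0 if k == 0 else surv_values[k - 1]


print([surv_at(t) for t in (1, 2, 3, 5, 9)])
```

The censoring times 3 and 5 fall on flat parts of the curve, which is why they do not produce steps.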