Tutorial: Survival Analysis with the Kaplan-Meier Method
This tutorial walks through a Kaplan-Meier survival analysis from start to finish, using heart failure clinical records included as sample data in MIDAS. No installation or coding is needed — everything runs in your browser. Open the sample data in MIDAS to follow along.
You have clinical records for 299 patients diagnosed with heart failure, tracking their survival status over a follow-up period. In this tutorial, you will estimate survival curves using the Kaplan-Meier method and compare survival curves grouped by patient characteristics (anaemia, high blood pressure) using RMST (Restricted Mean Survival Time).
- Load the sample data and examine its structure
- Understand the key feature of survival data: censoring
- Estimate the overall survival curve
- Compare survival curves between patients with and without anaemia
- Interpret the RMST results
- Explore other grouping variables
Load the data
On the launcher screen, click Heart Failure in the Sample Data section. A project is created and the data is loaded.
This dataset contains clinical records of heart failure patients collected at the Faisalabad Institute of Cardiology (Pakistan) in 2015 (Chicco & Jurman, 2020).
Examine the data structure
Open the Data Table tab. You will see 299 rows and 13 columns.

The key columns for survival analysis fall into three categories.
Time and event variables
| Column | Description |
|---|---|
time | Follow-up period in days. The number of days from diagnosis to the last observation (death or censoring) |
DEATH_EVENT | Whether the patient died during follow-up. 1 = death, 0 = alive (censored) |
Patient characteristics (used for grouping)
| Column | Description |
|---|---|
age | Age in years |
anaemia | Presence of anaemia (0: No, 1: Yes) |
diabetes | Presence of diabetes (0: No, 1: Yes) |
high_blood_pressure | Presence of hypertension (0: No, 1: Yes) |
sex | Sex (0: Female, 1: Male) |
smoking | Smoking status (0: No, 1: Yes) |
Laboratory values
The remaining 5 columns (creatinine_phosphokinase, ejection_fraction, platelets, serum_creatinine, serum_sodium) are blood test results. They are not used in this tutorial but can serve as covariates in Cox regression.
What is censoring?
Of the 299 patients, some died during follow-up (DEATH_EVENT = 1) and others were still alive when follow-up ended (DEATH_EVENT = 0). The latter are called censored observations.
A censored patient's survival time is at least time days, but when the event will eventually occur is unknown.
If you simply excluded censored patients, you would lose the information from patients who survived for long periods without an event, estimating survival times only from those who died. This underestimates survival. The Kaplan-Meier method accounts for censoring by incorporating the "survived at least this long" information into the risk set calculation.
For this estimation to be valid, censoring must be independent of the likelihood of experiencing the event (non-informative censoring). For the mathematical treatment of censoring, see Survival Analysis Fundamentals.
Estimate the overall survival curve
Select Analysis > Survival Analysis > Kaplan-Meier... from the menu bar. The Kaplan-Meier tab opens.
Set variables
- Time Variable: select
time - Event Variable: select
DEATH_EVENT
Leave Group Variable (Optional) empty.

Click Run Analysis.

Read the survival curve
The horizontal axis shows follow-up time (days) and the vertical axis shows survival probability . The Kaplan-Meier method does not assume a distributional form — it estimates survival probability directly at each event time, producing a step function that drops when a death occurs. The + marks on the curve indicate censoring times — points where subjects were lost to follow-up. The shaded band around the curve is the 95% confidence interval for the estimated survival probability at each time point (a pointwise interval constructed independently at each time, not a simultaneous band covering the entire curve).
Check Summary Statistics
| Item | Meaning |
|---|---|
| n | Number of subjects (299) |
| Events | Number of deaths |
| Median | Median survival time |
| 95% CI | Confidence interval for the median |
The median is the time point where the survival curve crosses the line. It represents when half of the subjects have experienced the event, and is widely used as a summary measure of survival. If does not fall below 0.5 within the observation period, the median is displayed as NR (Not Reached).
Compare survival curves by anaemia status
Next, examine whether survival differs between patients with and without anaemia.
Set the Group Variable
Select anaemia from the Group Variable (Optional) dropdown and click Run Analysis.

Two survival curves appear: anaemia = 0 (no anaemia) and anaemia = 1 (anaemia present).
Read the curves
The gap between the two curves at each time point is the estimated difference in survival probability between groups. The confidence bands represent the estimation precision of each group's survival function; overlap between bands does not indicate whether the groups differ, because each band is constructed independently for that group and has a different structure from a confidence interval for the difference. Use RMST to compare groups quantitatively.
Interpret the RMST results
Below the curves, the RMST (Restricted Mean Survival Time) results appear for each group.

RMST is the area under the Kaplan-Meier curve from 0 to a restriction time , estimating the average survival time up to . RMST does not require the proportional hazards assumption and remains interpretable even when survival curves cross (details).
| Column | Description |
|---|---|
| Group | Group name |
| RMST | Restricted mean survival time estimate. Computed as the area under the KM curve up to |
| SE | Standard error, based on the Greenwood variance |
| 95% CI | 95% confidence interval for RMST |
When there are two or more groups, an RMST Difference table appears below. It shows the pairwise difference in RMST, its SE, and confidence interval. Read the magnitude of the difference and its uncertainty from the point estimate and the width of the confidence interval. For example, if the estimated difference is 15 days with a 95% CI of [3, 27], the average survival time up to is estimated to differ by approximately 15 days, and the range from 3 to 27 days represents the uncertainty of that estimate. For three or more groups, the table heading changes to RMST Difference (Unadjusted) and the per-pair confidence intervals are unadjusted for multiplicity.
Number at Risk table
The Number at Risk table below the curve shows how many patients remain in the risk set (neither dead nor censored) at each time point.
The numbers decrease over time as patients leave the risk set through both death and censoring. At time points where few patients remain, the survival estimate becomes less precise and the confidence band widens.
Compare by other variables
Follow the same steps to compare by high blood pressure (high_blood_pressure) or smoking (smoking).

You can compute RMST with different group variables, but trying multiple grouping variables is hypothesis generation. Report findings from exploration as exploratory analysis.
Kaplan-Meier can only handle one grouping variable at a time. To consider multiple factors simultaneously, use the Cox proportional hazards model. For example, you can assess the effect of anaemia on survival while accounting for differences in age. See Survival Analysis for instructions.
Add results to a report
To save the survival curve for a paper or presentation, click the Add to Report button. In the dialog that appears, select an existing report or create a new one, and the survival curve is added to that report.
See Reports for details on working with reports.
Summary
- Survival data structure: You need a time variable (follow-up period) and an event variable (death/censoring)
- Censoring: Patients alive at the end of follow-up are included in the analysis as "survived at least this long"
- Survival curve estimation: The Kaplan-Meier method estimates the survival curve directly from observed data without assuming a distribution
- Group comparison: Setting a Group Variable produces group-specific survival curves and enables comparison via RMST differences
For the mathematical background of survival analysis, see Survival Analysis Fundamentals.
References
- Chicco, D., & Jurman, G. (2020). Machine learning can predict survival of patients with heart failure from serum creatinine and ejection fraction alone. BMC Medical Informatics and Decision Making, 20, 16. https://doi.org/10.1186/s12911-020-1023-5
- Kaplan, E. L., & Meier, P. (1958). Nonparametric estimation from incomplete observations. Journal of the American Statistical Association, 53(282), 457-481. https://www.jstor.org/stable/2281868
Also available as a Markdown file.