Tutorial: Kaplan-Meier Survival Curves with Heart Failure Data
You have clinical records for 299 patients diagnosed with heart failure, tracking their survival status over a follow-up period. In this tutorial, you will estimate survival curves using the Kaplan-Meier method and compare survival curves grouped by patient characteristics (anaemia, high blood pressure) using the Log-rank test.
- Load the sample data and examine its structure
- Understand the key feature of survival data: censoring
- Estimate the overall survival curve
- Compare survival curves between patients with and without anaemia
- Interpret the Log-rank test results
- Explore other grouping variables
Load the data
On the launcher screen, click Heart Failure in the Sample Data section. A project is created and the data is loaded.
This dataset contains clinical records of heart failure patients collected at the Faisalabad Institute of Cardiology (Pakistan) in 2015 (Chicco & Jurman, 2020).
Examine the data structure
Open the Data Table tab. You will see 299 rows and 13 columns.

The key columns for survival analysis fall into three categories.
Time and event variables
| Column | Description |
|---|---|
| time | Follow-up period in days. The number of days from diagnosis to the last observation (death or censoring) |
| DEATH_EVENT | Whether the patient died during follow-up. 1 = death, 0 = censored (alive at the end of follow-up) |
Patient characteristics (used for grouping)
| Column | Description |
|---|---|
| age | Age in years |
| anaemia | Presence of anaemia (0: No, 1: Yes) |
| diabetes | Presence of diabetes (0: No, 1: Yes) |
| high_blood_pressure | Presence of hypertension (0: No, 1: Yes) |
| sex | Sex (0: Female, 1: Male) |
| smoking | Smoking status (0: No, 1: Yes) |
Laboratory values
The remaining 5 columns (creatinine_phosphokinase, ejection_fraction, platelets, serum_creatinine, serum_sodium) are blood test results. They are not used in this tutorial but can serve as covariates in Cox regression.
What is censoring?
Of the 299 patients, some died during follow-up (DEATH_EVENT = 1) and others were still alive when follow-up ended (DEATH_EVENT = 0). The latter are called censored observations.
For a censored patient, the survival time is known to be at least the value in the time column (in days), but when the event will eventually occur is unknown.
If you simply excluded censored patients, you would discard the information from patients who survived for long periods without an event and estimate survival only from those who died. This underestimates survival. The Kaplan-Meier method accounts for censoring by incorporating the "survived at least this long" information into the risk set calculation.
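The effect of dropping censored patients can be seen in a minimal sketch. The six patients below are invented for illustration and are not taken from the Heart Failure dataset. The example is deliberately simple: nobody is censored before the day of interest, so the censoring-aware answer is a plain proportion. When censoring happens earlier, the Kaplan-Meier method is needed.

```python
# Toy cohort (invented): (time in days, event) with 1 = death, 0 = censored.
patients = [(10, 1), (20, 1), (30, 1), (40, 0), (50, 0), (60, 0)]

t = 35  # question: what fraction of patients survived beyond day 35?

# Naive approach: drop censored patients and look only at those who died.
death_times = [time for time, event in patients if event == 1]
naive = sum(time > t for time in death_times) / len(death_times)

# Censoring-aware approach: a patient censored after day t is known to be
# alive at day t, so they count as a survivor at that point.
aware = sum(time > t for time, _ in patients) / len(patients)

print(f"naive: {naive:.2f}, censoring-aware: {aware:.2f}")
# prints "naive: 0.00, censoring-aware: 0.50"
```

The naive estimate says nobody survived past day 35, although half of the cohort was still alive then.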
For the mathematical treatment of censoring, see Survival Analysis Fundamentals.
Estimate the overall survival curve
Select Analysis > Survival Analysis > Kaplan-Meier... from the menu bar. The Kaplan-Meier tab opens.
Set variables
- Time Variable: select time
- Event Variable: select DEATH_EVENT
Leave Group Variable empty.

Click Run Analysis.

Read the survival curve
The horizontal axis shows follow-up time (days) and the vertical axis shows survival probability. The Kaplan-Meier method does not assume a distributional form: it estimates survival probability directly at each event time, producing a step function that drops when a death occurs. The + marks on the curve indicate censoring times, the points at which a patient's follow-up ended without a death being observed. The shaded band around the curve is the 95% confidence interval.
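To see where the steps come from, here is a minimal sketch of the product-limit calculation in plain Python. It is an illustration under stated assumptions, not the application's implementation: the function name `kaplan_meier` and the toy data are hypothetical, and confidence intervals are omitted. At each death time the running survival probability is multiplied by (1 - deaths / number at risk). Censored patients do not cause a drop, but they leave the risk set.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct death time (product-limit estimate)."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Collect everyone whose follow-up ends exactly at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            # The curve drops only at times where at least one death occurs.
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        # Deaths and censored patients both leave the risk set.
        n_at_risk -= leaving
    return curve


# Toy data (invented): 1 = death, 0 = censored.
times = [5, 8, 8, 12, 15, 20]
events = [1, 1, 0, 1, 0, 1]

curve = kaplan_meier(times, events)
for t, s in curve:
    print(f"day {t}: S(t) = {s:.3f}")
```

With this toy data the curve drops at days 5, 8, 12, and 20. The patient censored at day 15 produces no step but reduces the number at risk for the final drop.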
Check Summary Statistics
| Item | Meaning |
|---|---|
| n | Number of subjects (299) |
| Events | Number of deaths |
| Median | Median survival time |
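The median is the time point where the survival curve crosses the 0.5 (50%) survival probability line. It is the estimated time by which half of the subjects have experienced the event, and is widely used as a summary measure of survival. If the curve never falls to 0.5 within the follow-up period, the median cannot be estimated.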
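Reading the median off a step function can be sketched as follows. The function `median_survival` and both example curves are hypothetical. The convention used here is the first time at which the estimated survival is at or below 0.5, and the application may handle ties at exactly 0.5 differently.

```python
def median_survival(curve):
    """curve: [(time, survival)] sorted by time, as a Kaplan-Meier step function.

    Return the first time at which survival <= 0.5, or None if the curve
    never falls that far.
    """
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


# Two invented step functions.
reaches_half = [(5, 0.83), (8, 0.67), (12, 0.44), (20, 0.0)]
stays_above = [(5, 0.90), (8, 0.75), (12, 0.60)]

print(median_survival(reaches_half))  # prints 12
print(median_survival(stays_above))   # prints None
```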
Compare survival curves by anaemia status
Next, examine whether survival differs between patients with and without anaemia.
Set the Group Variable
Select anaemia from the Group Variable dropdown and click Run Analysis.

Two survival curves appear: anaemia = 0 (no anaemia) and anaemia = 1 (anaemia present).
Read the curves
The gap between the two curves at each time point is the estimated difference in survival probability between groups. The confidence bands represent the estimation precision of each group's survival function; overlap between bands does not indicate whether the groups differ. Use the Log-rank test to compare groups.
Interpret the Log-rank test
When a Group Variable is specified, the Log-rank test results appear below the curves.
The null hypothesis of the Log-rank test is that the two survival curves are the same.
| Item | Description |
|---|---|
| Chi-squared | Test statistic. Computed by aggregating the differences between observed and expected deaths in each group at each event time. Approximately follows a chi-squared distribution with df degrees of freedom under the null hypothesis |
| df | Degrees of freedom of the chi-squared distribution. Equal to the number of groups minus 1 (1 for two groups) |
| p-value | The probability of observing a test statistic as extreme as or more extreme than the one computed from the data, assuming the null hypothesis is true. Compare against a pre-specified significance level to decide whether to reject the null hypothesis |

The detailed table for each group shows Observed (actual deaths) and Expected (deaths expected under the null hypothesis).
- O/E > 1: more deaths than expected (lower survival)
- O/E < 1: fewer deaths than expected (higher survival)
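
The Observed and Expected columns and the chi-squared statistic can be reproduced by hand for two groups. The sketch below uses the standard two-group log-rank formula with a hypergeometric variance. Whether the application computes the statistic in exactly this form is an assumption, and the function name and toy data are hypothetical. The sketch does not guard against a zero variance.

```python
import math


def logrank_two_groups(times, events, groups):
    """Two-group log-rank test; groups holds 0 or 1 for each subject.

    Returns (observed, expected, chi_squared, p_value), where observed and
    expected refer to deaths in group 1.
    """
    death_times = sorted({t for t, e in zip(times, events) if e == 1})
    observed = expected = variance = 0.0
    for t in death_times:
        # Risk set at t: everyone whose follow-up time is at least t.
        n = sum(1 for ti in times if ti >= t)
        n1 = sum(1 for ti, g in zip(times, groups) if ti >= t and g == 1)
        d = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        d1 = sum(1 for ti, e, g in zip(times, events, groups)
                 if ti == t and e == 1 and g == 1)
        observed += d1
        expected += d * n1 / n  # deaths expected in group 1 under the null
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = (observed - expected) ** 2 / variance
    # Upper-tail probability of a chi-squared distribution with df = 1.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return observed, expected, chi2, p_value


# Toy data (invented): group 1 dies early, group 0 later.
times = [3, 5, 7, 8, 10, 12]
events = [1, 1, 1, 1, 0, 1]
groups = [1, 1, 1, 0, 0, 0]

obs, exp, chi2, p = logrank_two_groups(times, events, groups)
print(f"O = {obs:.0f}, E = {exp:.2f}, O/E = {obs / exp:.2f}")
print(f"chi-squared = {chi2:.2f}, p = {p:.3f}")
```

In this toy example group 1 has 3 observed deaths against 1.15 expected, so O/E is greater than 1, which corresponds to lower survival in that group.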
Number at Risk table
The Number at Risk table below the curve shows how many patients remain in the risk set (neither dead nor censored) at each time point.
The numbers decrease over time as patients leave the risk set through both death and censoring. At time points where few patients remain, the survival estimate becomes less precise and the confidence band widens.
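The counts in such a table can be sketched in a few lines. The function name `number_at_risk`, the checkpoints, and the toy follow-up times are hypothetical. A patient is counted at a checkpoint if their follow-up time is at least that checkpoint, regardless of whether they later died or were censored.

```python
def number_at_risk(times, checkpoints):
    """Count patients still under observation at each checkpoint (time >= checkpoint)."""
    return {c: sum(1 for t in times if t >= c) for c in checkpoints}


# Toy follow-up times in days (invented). The event indicator is not needed,
# because death and censoring both remove a patient from the risk set.
times = [5, 8, 8, 12, 15, 20]

table = number_at_risk(times, [0, 10, 20, 30])
print(table)  # prints {0: 6, 10: 3, 20: 1, 30: 0}
```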
Compare by other variables
Follow the same steps to compare by high blood pressure (high_blood_pressure) or smoking (smoking).

You can run the Log-rank test with different group variables, but trying multiple grouping variables is hypothesis generation, not testing. Repeating the test introduces a multiple testing problem. Report findings from exploration as exploratory analysis; to test those hypotheses, use independently collected data.
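A rough sense of the size of the multiple testing problem comes from the following arithmetic. It assumes the tests are independent and that every null hypothesis is true. Tests on different grouping variables in the same patients are not independent, so treat the numbers as a guide rather than an exact rate. The function name is hypothetical.

```python
def family_wise_error(alpha, k):
    """Probability of at least one false positive among k independent tests."""
    return 1 - (1 - alpha) ** k


for k in (1, 3, 5):
    print(f"{k} tests at alpha = 0.05: {family_wise_error(0.05, k):.3f}")
# prints 0.050, 0.143, 0.226
```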
Kaplan-Meier can only handle one grouping variable at a time. To consider multiple factors simultaneously (for example, to assess the effect of anaemia adjusted for age), use the Cox proportional hazards model. See Survival Analysis for instructions.
Add results to a report
To save the survival curve for a paper or presentation, click the Add to Report button. The curve is added to the report.
See Reports for details on working with reports.
Summary
- Survival data structure: You need a time variable (follow-up period) and an event variable (death/censoring)
- Censoring: Patients alive at the end of follow-up are included in the analysis as "survived at least this long"
- Survival curve estimation: The Kaplan-Meier method estimates the survival curve directly from observed data without assuming a distribution
- Group comparison: Setting a Group Variable produces group-specific survival curves and a Log-rank test
For the mathematical background of survival analysis, see Survival Analysis Fundamentals.
References
- Chicco, D., & Jurman, G. (2020). Machine learning can predict survival of patients with heart failure from serum creatinine and ejection fraction alone. BMC Medical Informatics and Decision Making, 20, 16.
- Kaplan, E. L., & Meier, P. (1958). Nonparametric estimation from incomplete observations. Journal of the American Statistical Association, 53(282), 457-481.