---
title: Survival Analysis Fundamentals
description: Time-to-event data and censoring, Kaplan-Meier estimator, Cox proportional hazards model and partial likelihood, and the proportional hazards assumption.
priority: 0.5
---

# Survival Analysis Fundamentals {#survival-analysis-fundamentals}

This page covers the statistical theory behind the [Survival Analysis](survival-analysis) tabs. See that page for usage instructions.

## Time-to-Event Data and Censoring {#time-to-event-data-and-censoring}

Survival analysis is a set of methods for analyzing time until an event occurs. Despite the name "survival," the event need not be death — it can be machine failure, customer churn, time to recidivism, or any event of interest.

The defining feature of survival data is **censoring**. Subjects who did not experience the event during the observation period (e.g., patients still alive at the end of a clinical trial, or patients lost to follow-up) carry only incomplete information: "the event had not occurred by at least this time."

Simply excluding censored observations biases the analysis toward subjects who experienced the event sooner, underestimating survival times. Treating censored subjects as if they never experience the event overstates survival times, because the event may still occur after observation ends.

Survival analysis methods are designed to handle censoring properly, provided that censoring is independent of event occurrence (independent censoring). Under independent censoring, the censoring carries no additional information about the event (it is non-informative), so the KM estimator and Cox model estimates remain consistent. When censoring is related to the likelihood of the event (for example, when patients drop out due to worsening side effects), this assumption breaks down and the estimates become biased.

MIDAS handles right censoring only (censoring due to end of observation or loss to follow-up). Left censoring (when the event had already occurred by the start of observation but its exact time is unknown) and interval censoring (when the event time can only be placed between two observation times) are not supported.

### Why Ordinary Regression Fails {#why-ordinary-regression-fails}

Without censoring, survival times could be analyzed as a response variable in ordinary regression. But censored data provides inequality information — "the true value is at least as large as the observed value" — and the usual residual ($y_i - \hat{y}_i$) cannot be defined. Survival analysis incorporates this inequality into the likelihood function, correctly accounting for censoring.
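
To make the likelihood idea concrete, here is a minimal sketch that assumes a constant hazard $\lambda$ (an exponential model, chosen only for simplicity; it is not a model used elsewhere on this page). An observed event contributes the density $\lambda e^{-\lambda t}$, while a censored observation contributes only $S(t) = e^{-\lambda t}$. The function name and the toy data are illustrative.

```python
import math

def exponential_log_likelihood(rate, times, events):
    """Log-likelihood of a constant hazard `rate` under right censoring."""
    total = 0.0
    for t, event in zip(times, events):
        if event:
            # Observed event: log of the density rate * exp(-rate * t).
            total += math.log(rate) - rate * t
        else:
            # Censored: we only know T > t, so use log S(t) = -rate * t.
            total += -rate * t
    return total

# Toy data (hypothetical): follow-up times and indicators (1 = event, 0 = censored).
times = [1, 2, 2, 3, 4, 5]
events = [1, 1, 0, 1, 1, 0]

# Maximising the log-likelihood above gives events / total follow-up time.
rate_censoring_aware = sum(events) / sum(times)

# Dropping censored subjects discards their event-free follow-up time,
# which inflates the estimated rate (and so shortens estimated survival).
event_times = [t for t, event in zip(times, events) if event]
rate_dropping_censored = len(event_times) / sum(event_times)
```

In this toy data the censoring-aware rate is 4/17 per unit time, while dropping the two censored subjects gives 4/10.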

## Survival Function and Hazard Function {#survival-function-and-hazard-function}

The distribution of survival time $T$ is characterized by two functions.

The **survival function** $S(t) = P(T > t)$ is the probability of not having experienced the event by time $t$. It starts at $S(0) = 1$ and never increases over time.

The **hazard function** $h(t)$ is the instantaneous rate of event occurrence at time $t$, given survival up to that point:

$$
h(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t}
$$

The hazard is a rate (per unit time), not a probability, so it can exceed 1. The survival and hazard functions are related by $S(t) = \exp\left(-\int_0^t h(u)\,du\right)$; knowing one determines the other.
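
The relationship between the two functions can be checked numerically. The sketch below assumes an example hazard $h(t) = 2t$ (a hypothetical choice, for which $S(t) = e^{-t^2}$ in closed form) and integrates it with the trapezoidal rule.

```python
import math

def hazard(t):
    """Assumed example hazard h(t) = 2t; it exceeds 1 once t > 0.5."""
    return 2.0 * t

def survival_from_hazard(t, steps=1000):
    """S(t) = exp(-integral of h from 0 to t), using the trapezoidal rule."""
    width = t / steps
    cumulative_hazard = 0.0
    for i in range(steps):
        left = i * width
        right = left + width
        cumulative_hazard += 0.5 * (hazard(left) + hazard(right)) * width
    return math.exp(-cumulative_hazard)

# Compare the numerical result with the closed form exp(-t^2) at t = 1.5.
numeric = survival_from_hazard(1.5)
closed_form = math.exp(-1.5 ** 2)
```

At $t = 1.5$ the hazard is 3, well above 1, while the survival probability stays between 0 and 1.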

## Kaplan-Meier Estimator {#kaplan-meier-estimator}

The Kaplan-Meier estimator ([Kaplan & Meier, 1958](#ref-kaplan-meier-1958)) is a nonparametric estimator of the survival function. It makes no distributional assumptions, estimating $S(t)$ directly from observed event times.

Let the distinct event times be $t_1 < t_2 < \cdots < t_k$, with $n_i$ subjects at risk and $d_i$ events at each time $t_i$:

$$
\hat{S}(t) = \prod_{t_i \le t} \left(1 - \frac{d_i}{n_i}\right)
$$

This cumulatively multiplies the "survival fraction" at each event time. Censoring is reflected through changes in the risk set: when a subject is censored, they leave the risk set but are not counted as an event.

When multiple events occur at the same time (ties), $d_i$ is the total number of events at that time and $n_i$ is the size of the risk set immediately before that time.
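
The product formula can be sketched in a few lines of plain Python. This is an illustration, not the MIDAS implementation. The function name and toy data are hypothetical, and subjects censored exactly at an event time are treated as still at risk at that time (a common convention, assumed here).

```python
def kaplan_meier(times, events):
    """Return (t_i, n_i, d_i, S(t_i)) for each distinct event time."""
    rows = []
    survival = 1.0
    for t in sorted({u for u, event in zip(times, events) if event}):
        # Risk set just before t: everyone whose observed time is >= t.
        n_at_risk = sum(1 for u in times if u >= t)
        # Tied events at t are pooled into a single d_i.
        d_events = sum(1 for u, event in zip(times, events) if u == t and event)
        survival *= 1.0 - d_events / n_at_risk
        rows.append((t, n_at_risk, d_events, survival))
    return rows

# Toy data (hypothetical): 1 = event, 0 = censored.
times = [1, 2, 2, 3, 4, 5]
events = [1, 1, 0, 1, 1, 0]
km_table = kaplan_meier(times, events)
```

The subject censored at time 2 is counted in $n_i$ at $t = 2$ but not at $t = 3$, so the risk set drops from 5 to 3 even though only one event occurred in between.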

Under non-informative censoring, the KM estimator is a consistent estimator of $S(t)$. However, near the end of follow-up, censoring reduces the risk set, increasing variance and making the estimate unstable. The variance is estimated using Greenwood's formula, derived via the delta method:

$$
\widehat{\operatorname{Var}}[\hat{S}(t)] = \hat{S}(t)^2 \sum_{t_i \le t} \frac{d_i}{n_i(n_i - d_i)}
$$

Confidence intervals are constructed from this variance. Writing $\text{SE}$ for the standard error of $\hat{S}(t)$ from Greenwood's formula, the standard error of $\log \hat{S}(t)$ is $\text{SE}/\hat{S}(t)$ by the delta method. MIDAS uses the log transformation method, computing $\exp\!\bigl(\log \hat{S}(t) \pm z \cdot \text{SE}/\hat{S}(t)\bigr)$. The log transformation keeps the lower limit above 0, unlike the untransformed interval $\hat{S}(t) \pm z \cdot \text{SE}$. The upper limit can still exceed 1, so it is conventionally capped at 1.
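
A minimal sketch of Greenwood's formula and the log-transformed interval follows. The function name and toy data are hypothetical, and capping the upper limit at 1 is an assumption of this sketch rather than documented MIDAS behavior.

```python
import math
from statistics import NormalDist

def km_log_ci(times, events, level=0.95):
    """Return (t, S, SE, lower, upper) at each event time, using the log transform."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    survival, greenwood_sum, rows = 1.0, 0.0, []
    for t in sorted({u for u, event in zip(times, events) if event}):
        n = sum(1 for u in times if u >= t)
        d = sum(1 for u, event in zip(times, events) if u == t and event)
        if n == d:
            # S(t) drops to 0 here; the Greenwood term and log S(t) are undefined.
            rows.append((t, 0.0, None, None, None))
            break
        survival *= 1.0 - d / n
        greenwood_sum += d / (n * (n - d))
        se = survival * math.sqrt(greenwood_sum)   # Greenwood standard error of S(t)
        half_width = z * se / survival             # z times the SE of log S(t)
        lower = math.exp(math.log(survival) - half_width)
        # Assumed convention: a probability cannot exceed 1, so cap the upper limit.
        upper = min(1.0, math.exp(math.log(survival) + half_width))
        rows.append((t, survival, se, lower, upper))
    return rows

# Toy data (hypothetical): 1 = event, 0 = censored.
ci_rows = km_log_ci([1, 2, 2, 3, 4, 5], [1, 1, 0, 1, 1, 0])
```

With only six subjects the intervals are very wide, which is the instability near the end of follow-up described above.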

## RMST (Restricted Mean Survival Time) {#rmst}

RMST is the area under the survival function up to a restriction time $\tau$.

$$
\text{RMST}(\tau) = \int_0^\tau S(t)\,dt
$$

It equals $E[\min(T, \tau)]$, the expected survival time when follow-up is restricted to $\tau$. Unlike the hazard ratio, RMST does not require the proportional hazards assumption.

Since the KM estimator $\hat{S}(t)$ is a step function, the integral reduces to a sum of rectangles. Let $t_1 < t_2 < \cdots < t_m$ be the event times at or before $\tau$, with $t_0 = 0$, $\hat{S}(t_0) = 1$, and $t_{m+1} = \tau$:

$$
\widehat{\text{RMST}}(\tau) = \sum_{i=0}^{m} \hat{S}(t_i)\bigl(\min(t_{i+1}, \tau) - t_i\bigr)
$$
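
The rectangle sum can be sketched as follows. The function name and toy data are hypothetical, and the sketch assumes $\tau$ does not exceed the longest observed follow-up time.

```python
def km_rmst(times, events, tau):
    """Area under the Kaplan-Meier step function from 0 to tau."""
    survival, previous_time, area = 1.0, 0.0, 0.0
    for t in sorted({u for u, event in zip(times, events) if event and u <= tau}):
        # Rectangle of height S(t_i) between the previous step and this event time.
        area += survival * (t - previous_time)
        n = sum(1 for u in times if u >= t)
        d = sum(1 for u, event in zip(times, events) if u == t and event)
        survival *= 1.0 - d / n
        previous_time = t
    # Final rectangle from the last event time up to tau.
    area += survival * (tau - previous_time)
    return area

# Toy data (hypothetical): 1 = event, 0 = censored.
times = [1, 2, 2, 3, 4, 5]
events = [1, 1, 0, 1, 1, 0]
rmst_at_4 = km_rmst(times, events, tau=4)
rmst_at_4_5 = km_rmst(times, events, tau=4.5)
```

When $\tau$ coincides with an event time, the final rectangle has zero width, so the drop at $\tau$ does not affect the area.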

### Variance {#rmst-variance}

Variation in the KM estimator at each event time propagates to RMST through the remaining area under the curve. Let $A(t_i) = \int_{t_i}^{\tau} \hat{S}(t)\,dt$ (the area under the KM curve from $t_i$ to $\tau$):

$$
\widehat{\text{Var}}\bigl[\widehat{\text{RMST}}(\tau)\bigr] = \sum_{t_i \le \tau} A(t_i)^2 \frac{d_i}{n_i(n_i - d_i)}
$$

The term $d_i / \bigl(n_i(n_i - d_i)\bigr)$ is the same as in Greenwood's formula and captures the variation of the KM estimator at time $t_i$. Each time point's contribution is weighted by $A(t_i)^2$, which is largest at early event times because more of the curve lies ahead of them. Variation early in follow-up therefore carries the largest weight in the RMST variance.

The RMST estimator is asymptotically normal as $n \to \infty$. The confidence interval is therefore Wald-type: $\widehat{\text{RMST}} \pm z_{\alpha/2} \cdot \text{SE}$.
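
The variance formula and the Wald interval can be sketched together. This is an illustrative implementation with hypothetical names and toy data. It assumes $\tau$ does not exceed the longest observed follow-up time.

```python
import math
from statistics import NormalDist

def rmst_with_ci(times, events, tau, level=0.95):
    """Return (RMST, variance, lower, upper) from the Kaplan-Meier curve up to tau."""
    steps, survival = [], 1.0  # steps holds (t_i, n_i, d_i, S(t_i))
    for t in sorted({u for u, event in zip(times, events) if event and u <= tau}):
        n = sum(1 for u in times if u >= t)
        d = sum(1 for u, event in zip(times, events) if u == t and event)
        survival *= 1.0 - d / n
        steps.append((t, n, d, survival))

    # Rectangle areas between consecutive grid points 0, t_1, ..., t_m, tau.
    grid = [0.0] + [t for t, _, _, _ in steps] + [tau]
    heights = [1.0] + [s for _, _, _, s in steps]
    pieces = [h * (grid[i + 1] - grid[i]) for i, h in enumerate(heights)]
    rmst = sum(pieces)

    variance = 0.0
    for i, (_, n, d, _) in enumerate(steps):
        if n == d:
            continue  # S(t) is 0 afterwards, so the remaining area A(t_i) is 0
        area_after = sum(pieces[i + 1:])  # A(t_i): area from t_i to tau
        variance += area_after ** 2 * d / (n * (n - d))

    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    se = math.sqrt(variance)
    return rmst, variance, rmst - z * se, rmst + z * se

# Toy data (hypothetical): 1 = event, 0 = censored.
rmst, variance, lower, upper = rmst_with_ci(
    [1, 2, 2, 3, 4, 5], [1, 1, 0, 1, 1, 0], tau=4.5
)
```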

### Group Differences {#rmst-difference}

For the RMST difference between two groups, $\Delta = \text{RMST}_1 - \text{RMST}_2$, the groups are independent, so the variance is the sum of each group's variance.

$$
\text{Var}(\hat\Delta) = \text{Var}(\widehat{\text{RMST}}_1) + \text{Var}(\widehat{\text{RMST}}_2)
$$

The confidence interval is Wald-type. For three or more groups, pairwise differences and confidence intervals are computed without multiplicity adjustment.
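
As a small illustration, the difference and its Wald interval can be computed from each group's RMST and variance. The function name and the input values are hypothetical.

```python
import math
from statistics import NormalDist

def rmst_difference(rmst_1, var_1, rmst_2, var_2, level=0.95):
    """Return (difference, SE, lower, upper) for two independent groups."""
    difference = rmst_1 - rmst_2
    # Independent groups: the variances add.
    se = math.sqrt(var_1 + var_2)
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return difference, se, difference - z * se, difference + z * se

# Hypothetical group summaries.
difference, se, lower, upper = rmst_difference(3.0, 0.04, 2.5, 0.05)
```

Here the interval contains 0, so these hypothetical data would not show a clear RMST difference at the 95% level.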

## Cox Proportional Hazards Model {#cox-proportional-hazards-model}

### Model Formulation {#model-formulation}

The [Cox (1972)](#ref-cox-1972) proportional hazards model is a semiparametric model that estimates the effect of covariates on hazard:

$$
h(t \mid X) = h_0(t) \exp(\beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p)
$$

$h_0(t)$ is the baseline hazard (the hazard when all covariates are zero), and $\exp(\beta_j)$ is the hazard ratio for a one-unit increase in covariate $X_j$.

It is called "semiparametric" because $\beta$ is estimated parametrically, but no functional form is specified for $h_0(t)$. This removes the need to assume a distribution for the baseline hazard. After estimating $\beta$, the cumulative baseline hazard $H_0(t)$ can be estimated nonparametrically, which in turn allows computing the survival function $S(t|X)$ for specific covariate values:

$$
S(t \mid X) = \exp\!\bigl(-H_0(t) \cdot \exp(\beta' X)\bigr)
$$

where $H_0(t) = \int_0^t h_0(u)\,du$. The estimator of $H_0(t)$ handles ties with the same Efron method used to estimate the coefficients. MIDAS outputs $H_0(t)$ and $S(t|X)$ at user-specified covariate values.
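
The survival formula is a direct transformation of the cumulative baseline hazard. The sketch below takes $H_0(t)$ and $\beta$ as given (how they are estimated is not shown); the function name and the numbers are hypothetical.

```python
import math

def survival_given_covariates(cumulative_baseline_hazard, beta, x):
    """S(t | X) = exp(-H0(t) * exp(beta'X)) for one time point."""
    linear_predictor = sum(b * v for b, v in zip(beta, x))
    return math.exp(-cumulative_baseline_hazard * math.exp(linear_predictor))

# Hypothetical values: H0(t) = 0.5 and a single covariate with hazard ratio 2.
beta = [math.log(2.0)]
baseline_survival = survival_given_covariates(0.5, beta, [0.0])
exposed_survival = survival_given_covariates(0.5, beta, [1.0])
```

Equivalently, $S(t \mid X) = S_0(t)^{\exp(\beta' X)}$, so a hazard ratio of 2 squares the baseline survival probability.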

The covariates $X$ in this model are fixed values for each subject throughout the observation period. Handling covariates that change over time (time-varying covariates) requires extensions that MIDAS does not currently support.

### Proportional Hazards Assumption {#proportional-hazards-assumption}

The core assumption is that covariate effects are constant over time. That is, the hazard ratio $h(t \mid X_1) / h(t \mid X_2) = \exp(\beta'(X_1 - X_2))$ does not depend on $t$.

When this assumption is violated (e.g., a treatment effect that fades over time), the estimated $\beta$ represents a weighted average of the time-varying effect. The weights depend on the risk set composition and the shape of the baseline hazard, and they cannot be determined in advance, so it is difficult to say what quantity the estimate actually targets ([Struthers & Kalbfleisch, 1986](#ref-struthers-1986)).

#### Schoenfeld Residuals {#schoenfeld-residuals}

Schoenfeld residuals are used to diagnose potential violations of the proportional hazards assumption. They are defined at each event time $t_{(i)}$ for each covariate $j$:

$$
r_{ij} = X_{ij} - \bar{X}_j(t_{(i)})
$$

$X_{ij}$ is the value of covariate $j$ for the subject who experienced the event. $\bar{X}_j(t_{(i)})$ is the weighted mean of covariate $j$ over the risk set, defined as:

$$
\bar{X}_j(t_{(i)}) = \frac{\sum_{k \in \mathcal{R}(t_{(i)})} X_{kj} \exp(X_k'\hat\beta)}{\sum_{k \in \mathcal{R}(t_{(i)})} \exp(X_k'\hat\beta)}
$$

Here $k$ ranges over subjects in the risk set $\mathcal{R}(t_{(i)})$, $X_k$ is the covariate vector for subject $k$, and $X_k'\hat\beta$ is the linear predictor. The weight $\exp(X_k'\hat\beta)$ is subject-specific — subjects with higher hazard contribute more — and is the same for all covariates $j$.

Under the Breslow approximation for ties, the sum of Schoenfeld residuals equals the score function (the gradient of the log partial likelihood with respect to $\beta$) ([Schoenfeld, 1982](#ref-schoenfeld-1982)). At the maximum partial likelihood estimate $\hat\beta$ the score function is zero, so the sum of residuals is also zero. MIDAS uses the Efron method for ties, so this equality does not hold exactly, but the sum is close to zero within convergence tolerance.
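
The zero-sum property can be checked on a toy example. The sketch below is restricted to a single covariate and distinct event times (so Breslow and Efron coincide), and it fits $\hat\beta$ with a plain Newton-Raphson loop. All names and data are hypothetical, and this is not how MIDAS is implemented.

```python
import math

def risk_set_moments(t, times, x, beta):
    """Weighted mean and variance of x over subjects with time >= t."""
    pairs = [(xk, math.exp(beta * xk)) for tk, xk in zip(times, x) if tk >= t]
    s0 = sum(w for _, w in pairs)
    mean = sum(xk * w for xk, w in pairs) / s0
    var = sum(xk * xk * w for xk, w in pairs) / s0 - mean ** 2
    return mean, var

def fit_beta(times, events, x, iterations=25):
    """Newton-Raphson on the log partial likelihood (one covariate, no ties)."""
    beta = 0.0
    for _ in range(iterations):
        score = info = 0.0
        for t, event, xi in zip(times, events, x):
            if event:
                mean, var = risk_set_moments(t, times, x, beta)
                score += xi - mean   # this event's Schoenfeld residual
                info += var          # this event's contribution to the information
        beta += score / info
    return beta

def schoenfeld_residuals(times, events, x, beta):
    """(event time, x_i minus the weighted risk-set mean) for each event."""
    return [
        (t, xi - risk_set_moments(t, times, x, beta)[0])
        for t, event, xi in zip(times, events, x)
        if event
    ]

# Toy data (hypothetical): distinct times, binary covariate.
times = [1, 2, 3, 4, 5, 6]
events = [1, 1, 0, 1, 1, 0]
x = [1, 0, 1, 0, 1, 0]

beta_hat = fit_beta(times, events, x)
residuals = schoenfeld_residuals(times, events, x, beta_hat)
residual_sum = sum(r for _, r in residuals)
```

The score accumulated inside `fit_beta` is exactly the sum of the residuals, which is why `residual_sum` is numerically zero at `beta_hat`.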

Scaled Schoenfeld residuals adjust the raw residuals by the variance-covariance matrix so that they can be interpreted asymptotically as estimates of $\beta_j(t)$:

$$
r^*_{ij} = d \sum_{k=1}^{p} \hat{V}_{jk} \, r_{ik} + \hat\beta_j
$$

where $d$ is the total number of events, and $\hat{V}_{jk}$ is the $(j,k)$ entry of $\hat{V} = \hat{I}(\hat\beta)^{-1}$ (the estimated $p \times p$ variance-covariance matrix). The factor $d \cdot \hat{V}$ adjusts the scale of the asymptotic covariance matrix. $r^*_{ij}$ can be interpreted as an estimate of $\beta_j(t_{(i)})$. Under proportional hazards, $r^*_{ij}$ shows no systematic trend over time. Individual values have high variance, so the residuals are plotted against time and smoothed (e.g., with LOESS) to assess trends ([Grambsch & Therneau, 1994](#ref-grambsch-1994)).

MIDAS displays the following diagnostics ([usage](survival-analysis#ph-diagnostics)):

- **Proportional Hazards Diagnostics**: Displays the Pearson correlation (rho) between scaled Schoenfeld residuals and a KM-based time transform for each covariate
- **Scaled Schoenfeld residual plots**: Plots $r^*_{ij}$ against time with a LOESS smooth
- **log(-log(S(t))) plot**: Plots Kaplan-Meier estimates as $\log(-\log(S(t)))$ versus $\log(t)$ by group. Under proportional hazards, the curves should be approximately parallel

The rho uses the KM-based time transform $g(t)=1-\hat{S}(t^-)$ as its time axis, while the residual plots use raw time on the horizontal axis. Since $g(t)$ is a monotonically increasing transform of time, the direction of any trend agrees, but the axis scales differ.

### Partial Likelihood {#partial-likelihood}

Cox model parameters are estimated using partial likelihood. For subject $i$ who experienced an event at time $t_{(i)}$, consider the conditional probability that subject $i$, among all subjects still at risk at that time $\mathcal{R}(t_{(i)})$, is the one who experiences the event. The partial likelihood is the product of these conditional probabilities over all observed events:

$$
L(\beta) = \prod_{i:\text{event}} \frac{\exp(X_i'\beta)}{\sum_{j \in \mathcal{R}(t_{(i)})} \exp(X_j'\beta)}
$$

Each factor corresponds to the conditional probability $h(t_{(i)}|X_i) / \sum_j h(t_{(i)}|X_j)$ within the risk set at time $t_{(i)}$. Substituting $h(t|X) = h_0(t)\exp(X'\beta)$, the $h_0(t_{(i)})$ terms cancel between numerator and denominator, so estimating $\beta$ does not require knowing $h_0(t)$.

The formula above assumes distinct event times. When multiple events occur at the same time (ties), the conditional probability is not uniquely defined, requiring an approximation. The Breslow method applies the same risk set to each tied event; the Efron method progressively reduces the risk set among tied events and is generally more accurate. MIDAS uses the Efron method.
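
The two tie-handling approximations differ only in the denominator. The following sketch evaluates the log partial likelihood for a single covariate under either method. The function name, the `ties` argument, and the data are hypothetical and do not reflect the MIDAS API.

```python
import math

def log_partial_likelihood(beta, times, events, x, ties="efron"):
    """Log partial likelihood for one covariate with Breslow or Efron ties."""
    if ties not in ("efron", "breslow"):
        raise ValueError("ties must be 'efron' or 'breslow'")
    total = 0.0
    for t in sorted({u for u, event in zip(times, events) if event}):
        # Sum of exp(beta * x) over the risk set at t.
        risk_sum = sum(math.exp(beta * xk) for tk, xk in zip(times, x) if tk >= t)
        # Covariates of the subjects with an event exactly at t.
        tied = [xk for tk, ek, xk in zip(times, events, x) if tk == t and ek]
        tied_sum = sum(math.exp(beta * xk) for xk in tied)
        d = len(tied)
        total += beta * sum(tied)
        for position in range(d):
            # Efron removes a growing fraction of the tied subjects' weight;
            # Breslow keeps the full risk set for every tied event.
            shrink = position / d if ties == "efron" else 0.0
            total -= math.log(risk_sum - shrink * tied_sum)
    return total

# Hypothetical data with two events tied at time 1.
tied_times, tied_events, tied_x = [1, 1, 2], [1, 1, 1], [0, 1, 0]
efron_value = log_partial_likelihood(0.0, tied_times, tied_events, tied_x, "efron")
breslow_value = log_partial_likelihood(0.0, tied_times, tied_events, tied_x, "breslow")
```

With no tied event times the inner loop runs once per event with `shrink = 0`, so both methods return the same value.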

Although the partial likelihood is not a full likelihood, it has been shown to yield estimators with the same asymptotic properties as maximum likelihood — consistency and asymptotic normality ([Cox, 1975](#ref-cox-1975); [Andersen & Gill, 1982](#ref-andersen-gill-1982)).

### Interpreting Hazard Ratios {#interpreting-hazard-ratios}

Holding other covariates constant, $\exp(\beta_j)$ is interpreted as the hazard ratio (HR):

- HR > 1: A one-unit increase in $X_j$ increases the hazard by $(\text{HR} - 1) \times 100\%$
- HR < 1: The hazard decreases by $(1 - \text{HR}) \times 100\%$
- HR = 1: $X_j$ has no effect on the hazard

The width of the confidence interval reflects estimation precision: a narrow interval indicates a more precise estimate, while a wide interval indicates limited information from the data. Reported with its confidence interval, a hazard ratio conveys both the direction and the magnitude of the effect, which makes it more informative than the p-value alone.
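
As a worked illustration, a coefficient and its standard error can be converted to a hazard ratio, a percent change, and a confidence interval. The interval below is a Wald interval built on the log scale and then exponentiated, which is a common choice assumed here rather than a documented MIDAS detail. The function name and inputs are hypothetical.

```python
import math
from statistics import NormalDist

def hazard_ratio_summary(beta, se, level=0.95):
    """Hazard ratio, percent change in hazard, and exponentiated Wald interval."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    hr = math.exp(beta)
    return {
        "hr": hr,
        # Positive values are increases in hazard, negative values are decreases.
        "percent_change": (hr - 1.0) * 100.0,
        "ci": (math.exp(beta - z * se), math.exp(beta + z * se)),
    }

# Hypothetical coefficients: one harmful, one protective.
harmful = hazard_ratio_summary(math.log(1.5), se=0.2)
protective = hazard_ratio_summary(math.log(0.8), se=0.1)
```

Because the interval is symmetric on the log scale, it is asymmetric around the hazard ratio itself.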

### Model Fit Metrics {#model-fit-metrics}

Cox regression reports discrimination and model comparison metrics.

#### Concordance Index {#concordance-index}

The concordance index (Harrell's C) measures how well the model predicts the ordering of survival times.

Among comparable pairs (where one subject experienced an event while the other was still at risk), concordance is the proportion in which the subject with the higher risk score $\hat\eta_i = X_i'\hat\beta$ is the one who experienced the event first. A value of 0.5 indicates no discrimination (equivalent to random prediction); 1.0 indicates perfect discrimination.

The standard error is estimated via the influence function (infinitesimal jackknife). For each observation $i$, the influence is $\delta_i = (c_i - C \cdot n_i) / N$, where $c_i$ is the number of concordant pairs involving observation $i$, $n_i$ is the number of comparable pairs involving observation $i$ (not the risk set size used in earlier sections), $C$ is the concordance, and $N$ is the total number of comparable pairs. Then $SE = \sqrt{\sum \delta_i^2}$.
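
The pair-counting definition of the concordance index can be sketched as follows. The sketch covers only the point estimate, not the standard error. Counting tied risk scores as half-concordant and skipping pairs with equal times are assumptions of this example, and the function name and data are hypothetical.

```python
def harrell_c(times, events, risk_scores):
    """Proportion of comparable pairs ordered correctly by the risk score."""
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a pair is anchored on the subject with the earlier event
        for j in range(len(times)):
            if times[j] > times[i]:  # subject j was still at risk when i failed
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # assumed handling of tied scores
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Toy data (hypothetical): 1 = event, 0 = censored; higher score = higher risk.
times = [1, 2, 3, 4]
events = [1, 1, 0, 1]
risk_scores = [2.0, 1.0, 1.5, 0.5]
c_index = harrell_c(times, events, risk_scores)
```

Of the five comparable pairs in this toy data, four are ordered correctly. The one discordant pair is the event at time 2, whose score is lower than that of the subject censored at time 3.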

#### AIC {#cox-aic}

$$
\text{AIC} = -2\ell(\hat\beta) + 2p
$$

where $\ell(\hat\beta)$ is the partial log-likelihood and $p$ is the number of covariates. Lower values indicate a better balance between fit and parsimony. Use AIC to compare models with different covariate sets fitted to the same data.
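
A short numerical illustration of the comparison, using hypothetical partial log-likelihood values for two models fitted to the same data:

```python
def cox_aic(partial_log_likelihood, n_covariates):
    """AIC = -2 * partial log-likelihood + 2 * number of covariates."""
    return -2.0 * partial_log_likelihood + 2 * n_covariates

# Hypothetical fits: the larger model improves the fit only slightly.
aic_small = cox_aic(-120.4, n_covariates=2)
aic_large = cox_aic(-119.9, n_covariates=4)
preferred = "small" if aic_small < aic_large else "large"
```

The two extra covariates raise the partial log-likelihood by only 0.5, which does not offset the added penalty of 4, so the smaller model has the lower AIC.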

## See also {#see-also}

- **[Survival Analysis](survival-analysis)** - Usage instructions and interpreting results
- **[Tutorial: Kaplan-Meier Analysis](tutorial-kaplan-meier)** - A practical example with sample data

## References {#references}

- <span id="ref-cox-1972">Cox, D. R. (1972). Regression models and life-tables. *Journal of the Royal Statistical Society: Series B*, 34(2), 187-220. https://www.jstor.org/stable/2985181</span>
- <span id="ref-kaplan-meier-1958">Kaplan, E. L., & Meier, P. (1958). Nonparametric estimation from incomplete observations. *Journal of the American Statistical Association*, 53(282), 457-481. https://www.jstor.org/stable/2281868</span>
- <span id="ref-cox-1975">Cox, D. R. (1975). Partial likelihood. *Biometrika*, 62(2), 269-276. https://www.jstor.org/stable/2335362</span>
- <span id="ref-andersen-gill-1982">Andersen, P. K., & Gill, R. D. (1982). Cox's regression model for counting processes: A large sample study. *The Annals of Statistics*, 10(4), 1100-1120. https://www.jstor.org/stable/2240714</span>
- <span id="ref-struthers-1986">Struthers, C. A., & Kalbfleisch, J. D. (1986). Misspecified proportional hazard models. *Biometrika*, 73(2), 363-369. https://www.jstor.org/stable/2336212</span>
- <span id="ref-schoenfeld-1982">Schoenfeld, D. (1982). Partial residuals for the proportional hazards regression model. *Biometrika*, 69(1), 239-241. https://www.jstor.org/stable/2335876</span>
- <span id="ref-grambsch-1994">Grambsch, P. M., & Therneau, T. M. (1994). Proportional hazards tests and diagnostics based on weighted residuals. *Biometrika*, 81(3), 515-526. https://www.jstor.org/stable/2337123</span>