Understanding the hazard function
Suppose you have fit a distribution to your censored survival times, as in the previous post, and now want to quantify the intuition of an event being imminent. For example, being able to characterise precisely when your customers are about to churn can help identify problem areas to improve on. This notion is called the hazard. We’ll take a look at some of its main properties and how it relates to survival analysis.
Definition
Start with a small \(\delta\)-interval around \(t\) and consider the average probability density of an event occurring in that interval, given that it occurs after \(t\): \(\frac{\mathbb P(t \le T < t + \delta \mid t \le T)}{\delta}\), where \(\mathbb P\) denotes probability. To get rid of the arbitrary choice of \(\delta\), we take the limit as \(\delta\) goes to zero.
\[ h(t) := \lim_{\delta \rightarrow 0} \frac{\mathbb P(t \le T < t + \delta \mid t \le T)}{\delta} . \]
This is well-defined whenever the CDF is differentiable. Although it is defined in terms of probabilities, we will show below that the hazard is not itself a probability density function.
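To make this concrete, here is a quick numerical check of the limit: a minimal sketch using a Weibull distribution from scipy as a stand-in for a fitted survival model (the shape parameter and the evaluation point are arbitrary choices for illustration).

```python
from scipy import stats

# Stand-in for a fitted survival-time distribution (shape chosen arbitrarily).
dist = stats.weibull_min(c=1.5)
t = 2.0

for delta in [1e-1, 1e-2, 1e-3, 1e-4]:
    # P(t <= T < t + delta | t <= T) / delta, where sf(t) = P(T >= t)
    p_interval = dist.cdf(t + delta) - dist.cdf(t)
    print(f"delta={delta:.0e}: {p_interval / dist.sf(t) / delta:.6f}")
```

The printed values settle towards a single number as \(\delta\) shrinks; that number is the hazard at \(t = 2\).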
Identities
There are a number of useful properties of the hazard function that make it convenient to work with in survival analysis.
Equivalent definition
The above definition helps us understand the intuition behind the hazard function but there’s an equivalent formulation that can be easier to work with. Using
- the definition of conditional probabilities,
- the definition of a derivative, and
- that \(F'(t) = f(t)\), where \(F\) is the CDF and \(f\) the probability density function,
we can show that
\[ \begin{align} h(t) &= \lim_{\delta \rightarrow 0} \frac{\mathbb P(t \le T < t + \delta \mid t \le T)}{\delta} \\ &=\lim_{\delta \rightarrow 0} \frac{\mathbb P(t \le T < t + \delta)}{\mathbb P(t \le T)\delta} \\ &=\lim_{\delta \rightarrow 0} \frac{F(t + \delta) - F(t)}{S(t)\delta} \\ &=\lim_{\delta \rightarrow 0} \frac{F(t + \delta) - F(t)}{\delta} \frac{1}{S(t)} \\ &= F'(t) \frac{1}{S(t)} \\ &= \frac{f(t)}{S(t)}. \end{align} \]
We will show below how to use this to simplify the likelihood in the case of censored observations for measuring time to an event of interest.
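In the meantime, the identity gives a one-line numerical hazard for any distribution object. A minimal sketch with scipy, again using an arbitrary Weibull; for the unit-scale Weibull with shape \(c\) the closed form is \(h(t) = c t^{c-1}\), which we can compare against.

```python
import numpy as np
from scipy import stats

def hazard(dist, t):
    """Hazard h(t) = f(t) / S(t) of a frozen scipy distribution."""
    return dist.pdf(t) / dist.sf(t)

c = 1.5
dist = stats.weibull_min(c=c)
ts = np.array([0.5, 1.0, 2.0])
print(hazard(dist, ts))    # numerical hazard via f(t) / S(t)
print(c * ts ** (c - 1))   # closed form for the unit-scale Weibull
```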
Relation with the survival function
Using the identity above and the fact that \(S'(t) = -f(t)\), we can rewrite the hazard as a derivative of the survival function:
\[ h(t) = \frac{f(t)}{S(t)} = -\frac{S'(t)}{S(t)} = -\frac{d}{dt} \log S(t) . \]
Integrating both sides from \(0\) to \(t\) and using \(S(0) = 1\), it then follows from the fundamental theorem of calculus that
\[ S(t) = e^{-\int_0^t h(s) ds}. \]
In other words, the hazard function completely determines the survival function (and therefore also the mass/density function).
Since the integral of the hazard appears in the above equation, it is worth naming for easier reference. We define the cumulative hazard as
\[ H(t) := \int_0^t h(s) ds . \]
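We can sanity-check the relation \(S(t) = e^{-H(t)}\) by integrating the hazard numerically; a sketch under the same arbitrary Weibull assumption as above.

```python
import numpy as np
from scipy import stats
from scipy.integrate import quad

dist = stats.weibull_min(c=1.5)
hazard = lambda s: dist.pdf(s) / dist.sf(s)

t = 2.0
H, _ = quad(hazard, 0, t)       # cumulative hazard H(t) by numerical integration
print(np.exp(-H), dist.sf(t))   # both print the same survival probability
```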
Since \(\lim_{t \rightarrow \infty} S(t) = 0\), it follows that \(\lim_{t \rightarrow \infty} H(t) = -\lim_{t \rightarrow \infty} \log S(t) = \infty\). In particular, the hazard integrates to infinity rather than to one, so it is NOT a probability density function!
Example
The exponential distribution has constant hazard. To see this, suppose \(h(t) = \lambda\). Then \(S(t) = \exp(-\int_0^t \lambda \, ds) = \exp(-\lambda t)\), so that \(f(t) = -S'(t) = \lambda \exp(-\lambda t)\), which is the density function of the exponential distribution.
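A quick check with scipy confirms the flat hazard; note that scipy parameterises the exponential by \(\text{scale} = 1/\lambda\), and the rate \(0.7\) here is arbitrary.

```python
from scipy import stats

lam = 0.7
dist = stats.expon(scale=1 / lam)  # scipy uses scale = 1 / lambda
for t in [0.1, 1.0, 5.0, 10.0]:
    print(t, dist.pdf(t) / dist.sf(t))  # prints lam = 0.7 at every t
```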
Hazard in censored survival analysis
In the previous post, we motivated the following likelihood in the case of censored survival times:
\[ L(\theta) := \prod_{i = 1}^N f(t_i \mid \theta)^{\delta_i} \, S(t_i \mid \theta)^{1 - \delta_i} \]
where \(\delta_i\) is 1 if the event is observed and 0 if it is censored, and \(\theta\) is the vector of parameters of the distribution of survival times. Since \(f(t) = h(t) S(t)\), we can rewrite this as
\[ L(\theta) = \prod_{i = 1}^N h(t_i \mid \theta)^{\delta_i} S(t_i \mid \theta). \]
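In code, the hazard form translates directly into a log-likelihood. Here is a minimal sketch with hypothetical data and a constant-hazard (exponential) model plugged in; `log_hazard` and `log_survival` are placeholder callables for illustration, not part of any library.

```python
import numpy as np

def censored_loglik(t, delta, log_hazard, log_survival):
    """log L = sum_i [delta_i * log h(t_i) + log S(t_i)].

    t: event/censoring times; delta: 1 if event observed, 0 if censored.
    """
    return np.sum(delta * log_hazard(t) + log_survival(t))

# Hypothetical data: two observed events, two censored observations.
t = np.array([2.0, 3.5, 1.2, 7.0])
delta = np.array([1, 0, 1, 0])

# Exponential model: log h(t) = log(lam), log S(t) = -lam * t.
lam = 0.3
print(censored_loglik(t, delta,
                      log_hazard=lambda t: np.full_like(t, np.log(lam)),
                      log_survival=lambda t: -lam * t))
```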
Example
Assuming that survival times follow an exponential distribution, the hazard \(h(t) = \lambda\) is constant, and the likelihood is
\[ L(\lambda) = \prod_{i = 1}^N \lambda^{\delta_i} e^{-\lambda t_i} = \lambda^D e^{-\lambda T} \]
where \(D := \sum_1^N \delta_i\) is the total number of observed events and \(T := \sum_1^N t_i\) is the total observation time. This expression has an interesting interpretation as a Poisson likelihood. To see this, first note that \(T\) and \(D\) are functions of the data only, so they can be treated as constants in the likelihood: they don’t depend on our only parameter \(\lambda\). We can then consider \(D\) as a Poisson variable with rate \(\lambda\) and exposure \(T\):
\[ D \mid \lambda \sim \text{Poisson}(\lambda T), \]
which gives the probability of observing \(D\) events as
\[ \text{Poisson}(D \mid \lambda T) = \frac{(\lambda T)^D e^{-\lambda T}}{D!} = \frac{T^D}{D!} L(\lambda) . \]
However, likelihoods are equivalent up to a multiplicative constant. Since \(T^D / D!\) does not depend on \(\lambda\), we can treat our likelihood, \(L\), as Poisson.
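To see the equivalence in practice, we can maximise both likelihoods numerically and compare them with the closed-form MLE \(\hat\lambda = D / T\); the data here are the same hypothetical values as in the sketch above.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import poisson

t = np.array([2.0, 3.5, 1.2, 7.0])   # hypothetical survival times
delta = np.array([1, 0, 1, 0])       # event indicators
D, T = delta.sum(), t.sum()

# Censored exponential negative log-likelihood: -(D log(lam) - lam * T)
nll_exp = lambda lam: -(D * np.log(lam) - lam * T)
# Poisson negative log-likelihood in lam, with exposure T
nll_pois = lambda lam: -poisson.logpmf(D, lam * T)

fit_exp = minimize_scalar(nll_exp, bounds=(1e-6, 10), method="bounded")
fit_pois = minimize_scalar(nll_pois, bounds=(1e-6, 10), method="bounded")
print(fit_exp.x, fit_pois.x, D / T)  # all three agree
```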