If the populations from which data for a survival test were sampled violate one or more of the survival test assumptions, the results of the analysis may be incorrect or misleading. For example, if the assumption of independence of censoring times is violated, then the results for the survival test may be biased and unreliable. If there are factors unaccounted for in the analysis that affect survival and/or censoring times, then the survival test may give inappropriate results unless the data are stratified to reflect the factor(s).
Some small violations may have little practical effect on the analysis, while other violations may leave the survival test results so incorrect or uninterpretable as to be useless. In particular, small sample sizes may magnify the effect of assumption violations. Heavy censoring, or hazard or survival functions that cross, may also reduce the reliability of the survival tests.
Potential assumption violations include:
- Implicit factors: lack of independence within the sample
- Lack of independence of censoring: censoring times that depend on survival times
- Many censored values: problems caused by a large number of censored values
- Special problems with small sample sizes
- Special problems with crossing hazard or survival functions
- Patterns in plots of data: detecting violations of assumptions graphically
- Implicit factors:
- Lack of independence within a sample is often caused by the existence of an implicit factor in the data. For example, if we are measuring survival times for cancer patients, diet may be correlated with survival times. If we do not collect data on the implicit factor(s) (diet in this case), and the implicit factor has an effect on survival times, then we in effect no longer have a sample from a single population, but a sample that is a mixture drawn from several populations, one for each level of the implicit factor, each with a different survival distribution. Implicit factors can also affect censoring times, by affecting the probability that a subject will be withdrawn from the study or lost to follow-up. For example, younger subjects may tend to move away (and be lost to follow-up) more frequently than older subjects, so that age (an implicit factor) is related to censoring. If the sample under study contains many younger people, the results of the study may be substantially biased because of the different patterns of censoring. This violates the assumption that the censored values and the noncensored values all come from the same survival distribution. Stratification can be used to control for an implicit factor. For example, age groups (such as 50 or younger, 51-60, 61-70, and 71 or older) can be used as strata to control for age. This is similar to using blocking in analysis of variance. The goal is for the subjects within each group/stratum combination to share the same survival distribution.
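As a rough illustration of stratifying on age, the sketch below assigns each subject to one of four age groups and splits the records by stratum, so that a survival test could then be run within each stratum. The function names and the record layout (age, time, event flag, group) are hypothetical, the data are made up, and placing age 50 itself in the youngest stratum is an assumption.

```python
from collections import defaultdict


def age_stratum(age):
    """Map an age in years to one of four illustrative strata.

    Assumption: age 50 itself falls in the youngest stratum.
    """
    if age <= 50:
        return "50 or under"
    if age <= 60:
        return "51-60"
    if age <= 70:
        return "61-70"
    return "71 or older"


def split_by_stratum(records):
    """Group (age, time, event, group) records by age stratum."""
    strata = defaultdict(list)
    for age, time, event, group in records:
        strata[age_stratum(age)].append((time, event, group))
    return dict(strata)


# Made-up records: (age, survival time, event flag (1 = observed, 0 = censored), group)
records = [
    (45, 12.0, 1, "A"),
    (50, 7.5, 0, "B"),
    (58, 9.0, 1, "A"),
    (66, 4.0, 1, "B"),
    (73, 3.5, 0, "A"),
]
strata = split_by_stratum(records)
for name, rows in strata.items():
    print(name, len(rows))
```

Each stratum's records can then be compared across groups separately, which is the sense in which stratification resembles blocking.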
- Lack of independence of censoring:
- If the pattern of censoring is not independent of the survival times, then survival estimates may be too high (if subjects who are more ill tend to be withdrawn from the study), or too low (if subjects who will survive longer tend to drop out of the study and are lost to follow-up). If the loss or withdrawal of one subject tends to increase the probability of loss or withdrawal of other subjects, the censoring is also no longer independent across subjects. The survival tests rely on independence between censoring times and survival times. If independence does not hold, the results may be inaccurate. An implicit factor not accounted for by stratification may lead to a lack of independence between censoring times and observed survival times.
- Many censored values:
- A study may end up with many censored values, from having large numbers of subjects withdrawn or lost to follow-up, or from having the study end while many subjects are still alive. Large numbers of censored values decrease the equivalent number of subjects exposed (at risk) at later times, reducing the effective sample sizes. A high censoring rate may also indicate problems with the study: ending too soon (many subjects still alive at the end of the study), or a pattern in the censoring (many subjects withdrawn at the same time, younger patients being lost to follow-up sooner than older ones, etc.). The survival tests perform better when the censoring is not too heavy, and, in particular, when the pattern of censoring is similar across the different groups.
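One simple way to see how censoring erodes the effective sample size is to tabulate the censored fraction and the number of subjects still at risk at a few time points. The sketch below is a minimal illustration with made-up data; the function name and the choice of checkpoints are hypothetical.

```python
def censoring_summary(times, events, checkpoints):
    """Return the censored fraction and the number at risk at each checkpoint.

    events[i] is 1 if the event was observed and 0 if the time is censored.
    A subject counts as at risk at time t if its recorded time is >= t.
    """
    n = len(times)
    censored_fraction = sum(1 for e in events if e == 0) / n
    at_risk = {t: sum(1 for x in times if x >= t) for t in checkpoints}
    return censored_fraction, at_risk


# Made-up sample: 8 subjects, 5 of them censored
times = [2, 3, 3, 5, 6, 8, 9, 12]
events = [1, 0, 1, 0, 0, 1, 0, 0]
frac, at_risk = censoring_summary(times, events, checkpoints=[0, 4, 8, 12])
print(frac)     # 0.625
print(at_risk)  # {0: 8, 4: 5, 8: 3, 12: 1}
```

Running the same summary separately for each group gives a quick check on whether the pattern of censoring is similar across groups.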
- Special problems with small sample sizes:
- The Gehan-Breslow, Mantel-Cox, and Tarone-Ware survival tests are all based on chi-square statistics, and thus rely on asymptotic (large-sample) theory. If the sample sizes are too small, the chi-square approximation may be inaccurate.
- Special problems with crossing hazard or survival functions:
- The Gehan-Breslow, Mantel-Cox, and Tarone-Ware survival tests are not particularly good at detecting differences in survival functions when the hazard or survival functions of the groups cross rather than remaining roughly parallel.
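For two groups, the three tests can be viewed as members of one weighted family: at each distinct event time the observed number of events in one group is compared with the number expected under equal hazards, and the differences are summed with weight 1 (Mantel-Cox), the number at risk (Gehan-Breslow), or the square root of the number at risk (Tarone-Ware). The sketch below is a minimal two-group version under that standard formulation, not the implementation of any particular package; the function name and argument layout are hypothetical. Because a difference that changes sign when the curves cross partly cancels in the sum, the statistic can be small even when the curves differ, and the p-value rests on the chi-square approximation that small samples make doubtful.

```python
import math


def weighted_logrank(times, events, groups, weight="mantel-cox"):
    """Two-group weighted log-rank chi-square statistic (1 degree of freedom).

    times  : observed times
    events : 1 = event observed, 0 = censored
    groups : 0 or 1 for each subject
    weight : "mantel-cox" (w = 1), "gehan-breslow" (w = number at risk),
             or "tarone-ware" (w = sqrt(number at risk))
    """
    weight_fns = {
        "mantel-cox": lambda n: 1.0,
        "gehan-breslow": lambda n: float(n),
        "tarone-ware": lambda n: math.sqrt(n),
    }
    w_fn = weight_fns[weight]
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    num = 0.0  # weighted sum of (observed - expected) events in group 1
    var = 0.0  # weighted sum of hypergeometric variances
    for t in event_times:
        n = sum(1 for x in times if x >= t)
        n1 = sum(1 for x, g in zip(times, groups) if x >= t and g == 1)
        d = sum(1 for x, e in zip(times, events) if x == t and e == 1)
        d1 = sum(1 for x, e, g in zip(times, events, groups)
                 if x == t and e == 1 and g == 1)
        w = w_fn(n)
        num += w * (d1 - d * n1 / n)
        if n > 1:
            var += w * w * d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = num * num / var if var > 0 else 0.0
    # Upper tail of a chi-square distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Made-up data: group 0 fails early, group 1 fails late, no censoring
times = [1, 2, 3, 4, 5, 6, 7, 8]
events = [1, 1, 1, 1, 1, 1, 1, 1]
groups = [0, 0, 0, 0, 1, 1, 1, 1]
for name in ("mantel-cox", "gehan-breslow", "tarone-ware"):
    print(name, weighted_logrank(times, events, groups, weight=name))
```

Gehan-Breslow puts the most weight on early event times, Mantel-Cox weights all event times equally, and Tarone-Ware falls in between.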
- Patterns in plots of data:
- If the assumptions for the censoring and survival distributions are correct, then a plot of either the censored or the noncensored values (or both together) against time should show no particular patterns, and whatever structure does appear should be similar across the various groups.
A Kaplan-Meier plot, plots of the life table survival functions, or plots of the life table hazard functions for each sample will show whether the survival functions or hazard functions cross. If the functions do cross, then none of the three tests (Gehan-Breslow, Mantel-Cox, or Tarone-Ware) will be particularly good at detecting differences between the survival functions.
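The visual check for crossing can also be approximated numerically. The sketch below computes Kaplan-Meier (product-limit) estimates for two samples and reports whether the difference between the estimated survival functions changes sign over the pooled event times. The function names are hypothetical and the data are made up; a sign change in estimated curves can arise from sampling noise alone, so this is an aid to reading the plot rather than a formal test.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time (product-limit estimate)."""
    surv = 1.0
    curve = []
    for t in sorted({t for t, e in zip(times, events) if e == 1}):
        n = sum(1 for x in times if x >= t)  # number at risk just before t
        d = sum(1 for x, e in zip(times, events) if x == t and e == 1)
        surv *= 1.0 - d / n
        curve.append((t, surv))
    return curve


def step_value(curve, t):
    """Evaluate the right-continuous step function at t (1.0 before the first event)."""
    value = 1.0
    for time, s in curve:
        if time <= t:
            value = s
        else:
            break
    return value


def curves_cross(curve_a, curve_b, tol=1e-12):
    """True if S_a(t) - S_b(t) takes both signs over the pooled event times."""
    grid = sorted({t for t, _ in curve_a} | {t for t, _ in curve_b})
    signs = set()
    for t in grid:
        diff = step_value(curve_a, t) - step_value(curve_b, t)
        if abs(diff) > tol:
            signs.add(diff > 0)
    return len(signs) == 2


# Made-up samples with no censoring: A has early and late failures, B fails mid-study
km_a = kaplan_meier([1, 2, 9, 10], [1, 1, 1, 1])
km_b = kaplan_meier([4, 5, 6, 7], [1, 1, 1, 1])
print(curves_cross(km_a, km_b))  # True
```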