Possible alternatives if your data violate survival test assumptions


If the populations from which the data for a survival test were sampled violate one or more of the survival test assumptions, the results of the analysis may be incorrect or misleading. For example, if censoring is not independent of survival, then the survival estimates may be biased and unreliable. If factors that affect survival and/or censoring times are not accounted for in the analysis, then the survival test may not give useful results. In such cases, stratifying the data or using a parametric method may provide a better analysis.

Although the Mantel-Cox, Gehan-Breslow, and Tarone-Ware tests are similar, they are not identical. In some situations, one of the tests may be preferable to the others. Other nonparametric tests for comparing survival functions also exist.
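As a rough illustration of how close these three tests are, each can be written as a weighted log-rank statistic: over the distinct failure times, it sums a weight times the observed-minus-expected number of failures in one group, then divides the square of that sum by its variance. Mantel-Cox uses a weight of 1, Gehan-Breslow uses the number of subjects at risk, and Tarone-Ware uses the square root of the number at risk. The Python sketch below assumes two groups coded 0 and 1 and events coded 1 for a failure and 0 for censoring; the function names `weighted_logrank` and `chi2_1df_pvalue` are hypothetical, and the code is meant for small illustrative data sets rather than production use.

```python
import math


def weighted_logrank(times, events, groups, weight="mantel-cox"):
    """Two-sample weighted log-rank chi-square statistic (1 degree of freedom).

    times  -- observed time for each subject (failure or censoring)
    events -- 1 if the subject failed (died) at that time, 0 if censored
    groups -- 0 or 1, the sample each subject belongs to (assumed coding)
    weight -- "mantel-cox" (w = 1), "gehan-breslow" (w = number at risk),
              or "tarone-ware" (w = square root of the number at risk)
    """
    data = list(zip(times, events, groups))
    failure_times = sorted({t for t, e, _ in data if e == 1})
    o_minus_e = 0.0  # weighted sum of observed minus expected failures, group 1
    variance = 0.0   # weighted sum of hypergeometric variances
    for t in failure_times:
        n = sum(1 for ti, _, _ in data if ti >= t)                 # at risk, both groups
        n1 = sum(1 for ti, _, g in data if ti >= t and g == 1)     # at risk, group 1
        d = sum(1 for ti, e, _ in data if ti == t and e == 1)      # failures at t
        d1 = sum(1 for ti, e, g in data if ti == t and e == 1 and g == 1)
        if weight == "mantel-cox":
            w = 1.0
        elif weight == "gehan-breslow":
            w = float(n)
        elif weight == "tarone-ware":
            w = math.sqrt(n)
        else:
            raise ValueError(f"unknown weight: {weight!r}")
        o_minus_e += w * (d1 - d * n1 / n)
        # With one subject at risk the variance term is 0/0 and contributes
        # nothing, so it is skipped to avoid dividing by zero.
        if n > 1:
            variance += w * w * d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return o_minus_e ** 2 / variance


def chi2_1df_pvalue(stat):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2))


if __name__ == "__main__":
    times = [1, 2, 3, 4, 5, 6]
    events = [1, 1, 1, 1, 1, 1]
    groups = [0, 0, 0, 1, 1, 1]
    for w in ("mantel-cox", "gehan-breslow", "tarone-ware"):
        stat = weighted_logrank(times, events, groups, w)
        print(w, round(stat, 3), round(chi2_1df_pvalue(stat), 4))
```

Because only the weight changes, running all three weightings on the same data is a direct way to see how much the choice of test matters for a particular sample.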

The best remedies for some problems lie outside the scope of statistical analysis per se: running an experiment longer or doing more aggressive follow-up to avoid a large proportion of censored values, or using a sample large enough to avoid the problems of small sample sizes.

Alternative procedures:

  • Stratification:
    Stratification involves dividing a sample into subsamples based on one or more characteristics of the population; for example, a sample may be stratified by gender. The Gehan-Breslow, Mantel-Cox, and Tarone-Ware tests can all be used with stratified data. If the survival function differs among the strata, then the characteristic used for stratification may be an implicit factor, and a separate analysis of each subsample may be more informative than an analysis of the entire sample. Stratification may also reveal correlations between censoring and strata. A potential drawback of stratification is that one or more of the subsamples may be small, which makes the test results less reliable. Also, the results for each subsample generalize to only part of the sampled population.
  • Parametric methods:
    If a specific survival distribution can be assumed based on previous knowledge, then a survival test geared to that particular distribution can be used. If a particular distribution makes sense a priori, a specific functional (parametric) form for the survival distribution function, such as the Weibull or exponential distribution, can be fitted to the individual data; the Cox proportional hazards model is a related, semiparametric alternative that specifies how covariates act on the hazard but leaves the baseline hazard unspecified. A Kaplan-Meier or life table plot of the survival function may provide a clue to the appropriate form. If the exponential model is appropriate, a graph of the log of the survival function against time should look like a straight line passing through the origin (as should the cumulative hazard function, which is -log(survival function)). If the Weibull distribution is appropriate, a graph of the log of the negative log of the survival function (that is, the log of the cumulative hazard function) against the log of time should look like a straight line. Elandt-Johnson and Johnson, Cox and Oakes, and Lawless discuss methods of testing parametric survival models. Like nonparametric methods, parametric methods assume that censoring is independent of survival, and they can be affected by implicit factors, a large proportion of censored values, or small sample sizes. In addition, parametric methods assume that the designated survival function is the correct one.
  • Choosing a particular nonparametric test:
    The Mantel-Cox, Gehan-Breslow, and Tarone-Ware tests are quite similar but differ in the weight they assign to each failure time. The Gehan-Breslow test weights each failure time by the number of subjects still at risk, so it gives more weight to earlier failures (deaths); the Mantel-Cox test gives equal weight to all failures; and the Tarone-Ware test, which weights by the square root of the number at risk, falls in between. The Mantel-Cox test is more powerful with data following exponential or Weibull survival distributions, and in situations where censoring is random and equal across groups. It is sensitive to differences in the right tails of the survival distributions (i.e., at later times) and to non-parallel hazard functions (i.e., lack of proportional hazards). The Gehan-Breslow test is more powerful with data from a lognormal survival distribution, but may have low power if there is heavy censoring. The Tarone-Ware test, with its intermediate weighting scheme, is designed to have good power across a wide range of survival functions, although it may not be the most powerful of the three tests in any particular situation.
  • Other nonparametric tests:
    Other nonparametric tests include the Peto and Peto test, the Tsao-Conover test, and Efron's test; Elandt-Johnson and Johnson, Cox and Oakes, and Lawless discuss these tests. If there are no censored observations (all subjects are followed until failure/death), then no special survival tests are needed: a two-sample data set can be analyzed using a Mann-Whitney rank sum test, and a multi-sample data set using a Kruskal-Wallis test. Stratified data with no censoring can be analyzed by treating the strata as blocks and performing a Friedman's test or a two-sample paired signed-rank test.
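The graphical checks described above under Parametric methods can be approximated numerically. The Python sketch below computes a Kaplan-Meier (product-limit) estimate, then fits a least-squares line to -log S(t) against t (the exponential check) and to log(-log S(t)) against log t (the Weibull check). The function names are hypothetical, events are assumed to be coded 1 for failure and 0 for censoring, and at least two usable points with distinct values are needed. The r-squared values are only a rough stand-in for looking at the plots, not a formal goodness-of-fit test such as those discussed by the authors cited above.

```python
import math


def kaplan_meier(times, events):
    """Kaplan-Meier estimate as a list of (t, S(t)) at each distinct failure time.

    events: 1 = failure (death), 0 = censored. Subjects censored at a failure
    time are counted as still at risk at that time (the usual convention).
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        failures = leaving = 0
        while i < len(data) and data[i][0] == t:  # gather all subjects tied at t
            failures += data[i][1]
            leaving += 1
            i += 1
        if failures:
            surv *= 1 - failures / at_risk
            curve.append((t, surv))
        at_risk -= leaving
    return curve


def fit_line(xs, ys):
    """Ordinary least-squares line; returns (slope, intercept, r_squared)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx, sxy * sxy / (sxx * syy)


def diagnostic_fits(curve):
    """Fit the two straight-line checks to a Kaplan-Meier curve.

    Exponential: cumulative hazard H(t) = -log S(t) against t
                 (should be a line through the origin; slope = hazard rate).
    Weibull:     log H(t) against log t (should be a line; slope = shape).
    Points with S(t) = 0 are dropped because log(0) is undefined.
    """
    pts = [(t, s) for t, s in curve if t > 0 and 0 < s < 1]
    exp_fit = fit_line([t for t, _ in pts], [-math.log(s) for _, s in pts])
    weib_fit = fit_line([math.log(t) for t, _ in pts],
                        [math.log(-math.log(s)) for _, s in pts])
    return exp_fit, weib_fit
```

Under an exponential model, the first fit should have an intercept near zero and its slope estimates the hazard rate. Under a Weibull model S(t) = exp(-(λt)^k), the second fit's slope estimates the shape k, and a slope near 1 points back toward the exponential.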
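A stratified comparison, as described above under Stratification, can be sketched in the same way. In this hypothetical Python example, the Mantel-Cox observed-minus-expected count and its variance are computed separately within each stratum and then summed before the chi-square statistic is formed, so the two groups (assumed coded 0 and 1) are only ever compared within a stratum. Calling `stratum_terms` on one stratum alone gives the ingredients for a separate per-stratum analysis. Gehan-Breslow or Tarone-Ware weights could be substituted inside the loop; the function names and the coding are assumptions made for illustration.

```python
import math
from collections import defaultdict


def stratum_terms(rows):
    """Observed-minus-expected failures in group 1 and its variance, one stratum.

    rows: list of (time, event, group) with event 1 = failure, 0 = censored,
    and group 0 or 1. Uses Mantel-Cox weighting (w = 1) at every failure time.
    """
    o_minus_e, variance = 0.0, 0.0
    for t in sorted({t for t, e, _ in rows if e == 1}):
        n = sum(1 for ti, _, _ in rows if ti >= t)                 # at risk
        n1 = sum(1 for ti, _, g in rows if ti >= t and g == 1)     # at risk, group 1
        d = sum(1 for ti, e, _ in rows if ti == t and e == 1)      # failures at t
        d1 = sum(1 for ti, e, g in rows if ti == t and e == 1 and g == 1)
        o_minus_e += d1 - d * n1 / n
        if n > 1:  # skip the 0/0 term when a single subject is at risk
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return o_minus_e, variance


def stratified_logrank(times, events, groups, strata):
    """Stratified log-rank chi-square (1 df) and its p-value.

    Groups are compared only within each stratum; the within-stratum
    terms are summed before the statistic is formed.
    """
    by_stratum = defaultdict(list)
    for t, e, g, s in zip(times, events, groups, strata):
        by_stratum[s].append((t, e, g))
    total_oe = total_var = 0.0
    for rows in by_stratum.values():
        oe, v = stratum_terms(rows)
        total_oe += oe
        total_var += v
    stat = total_oe ** 2 / total_var
    return stat, math.erfc(math.sqrt(stat / 2))
```

With a single stratum this reduces to the ordinary Mantel-Cox test; with several strata it guards against a stratifying characteristic (such as gender) acting as an implicit factor in the pooled comparison.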
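When there is no censoring, the two-sample comparison mentioned above reduces to the Mann-Whitney statistic. This minimal Python sketch (the function names are hypothetical) counts, over all pairs, how often a value in the first sample exceeds a value in the second, scoring ties as one half, and uses the large-sample normal approximation without a tie correction; for small samples or many ties, exact tables or a tie-corrected variance would be preferable.

```python
import math


def mann_whitney_u(x, y):
    """Mann-Whitney U for sample x: pairs with x_i > y_j, ties counted as 1/2."""
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u


def mann_whitney_z(x, y):
    """Large-sample normal approximation for U (no tie correction to the variance)."""
    n1, n2 = len(x), len(y)
    u = mann_whitney_u(x, y)
    mean = n1 * n2 / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (u - mean) / sd
```

The two U values for a pair of samples always add up to the product of the sample sizes, so either one can be used.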
