

Year : 2011  Volume : 2  Issue : 1  Page : 62-64

Revisiting survival analysis 

Sanjeev Sarmukaddam
Biostatistics Consultant, MIMH, B.J. Medical College and Sassoon Hospital, Pune - 411 001, India
Date of Web Publication  26-Jul-2011




How to cite this article: Sarmukaddam S. Revisiting survival analysis. Int J Ayurveda Res 2011;2:62-4
Sir,
The methods of survival analysis are required to analyze duration (time-to-event) data, but their use is restricted, possibly because of lack of awareness and the intricacies involved. These methods (including the clinical life table, Kaplan-Meier survival estimation, the log-rank test, and survival analysis with covariates, such as Cox proportional hazards models or accelerated failure time models) deal with any duration from a defined start to a specific endpoint, not only "death". The article titled "Understanding survival analysis: Kaplan-Meier estimate" (vol. 1, issue 4, pp 274-278) is very good. Although the Kaplan-Meier procedure and the subsequent log-rank test are difficult to understand, clinicians and other researchers must know them because these techniques (like all in this category) are very useful: they allow 'serial intake' and 'serial dropout'. It is important to understand the concepts, and the authors have explained them excellently. It is the 'concept' and not the 'calculations' that should be given importance. Good software is now available for the calculations, so why not use it? One very useful package in the public domain (it can be downloaded free of cost from http:/www.cdc.gov/epo/epiinfo/htm) is WHO-CDC's EPIINFO.
However, it is necessary to understand 'how these values are arrived at', and for that the calculations should be done correctly. In this article, the first two tables, [Table 1] and [Table 2], are correct. One suggestion is to add a column showing the number 'censored' (indicated with '*'); this will help in understanding 'No. at risk' (i.e. the numbers in the column headed 'Live at the start of the day'), because No. at risk at start of next point = No. at risk at start of previous point - [No. died at/after previous point + No. censored at/after previous point]. Both these tables are recalculated.
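The 'No. at risk' bookkeeping described above can be sketched in a few lines of Python. This is only an illustration: the function name `km_table` and the eight durations are hypothetical and are not taken from the tables of the article under discussion. It assumes the usual convention that subjects censored at a time point are still counted as at risk for deaths at that same time point.

```python
from collections import Counter

def km_table(durations, died_flags):
    """Kaplan-Meier table with an explicit 'censored' column.

    durations  : observed time for each subject
    died_flags : 1 if the subject died at that time, 0 if censored
    Returns rows of (time, at_risk, died, censored, survival).
    """
    deaths = Counter(t for t, d in zip(durations, died_flags) if d)
    censored = Counter(t for t, d in zip(durations, died_flags) if not d)
    at_risk = len(durations)
    survival = 1.0
    rows = []
    for t in sorted(set(durations)):
        d, c = deaths[t], censored[t]
        if d:
            # conditional probability of surviving this time point
            survival *= (at_risk - d) / at_risk
        rows.append((t, at_risk, d, c, survival))
        # No. at risk at next point = No. at risk now - (died + censored)
        at_risk -= d + c
    return rows

# Hypothetical data, for illustration only (0 marks a censored subject).
durations = [2, 3, 3, 5, 6, 6, 8, 9]
died_flags = [1, 1, 0, 1, 0, 1, 1, 0]
rows = km_table(durations, died_flags)
for row in rows:
    print(row)
```

With the censored column printed next to the deaths, each 'No. at risk' value can be checked by hand against the row above it.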
The life-table method of survival analysis is generally used for grouped (interval-censored) data, where the exact duration is not known but only the interval is known (or the duration is known but grouped for convenience). This method of data collection is generally adopted when the number of subjects is very large and periodic visits to the system are more cost-effective than continuous observation. The usual life-table method assumes that, for subjects dropping out (i.e. censored) in an interval, the dropouts occur uniformly over that interval. The probability of survival for each interval is obtained conditional on surviving the preceding interval. The survival function is obtained by multiplying the successive conditional probabilities. A plot of the survival function against the endpoint of each time interval, with the points joined by lines, is the survival curve.
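A minimal sketch of the life-table calculation just described is given below. The function name `actuarial_life_table` and the interval counts are hypothetical. The sketch uses the standard actuarial convention that follows from the uniform-dropout assumption: subjects censored in an interval are counted as at risk for half of it, so the effective number at risk is the number entering minus half the number censored.

```python
def actuarial_life_table(n_start, intervals):
    """Life-table (actuarial) survival estimate for grouped data.

    n_start   : number of subjects entering the first interval
    intervals : list of (interval_end, died, censored) in time order
    Returns rows of (interval_end, entering, effective_at_risk,
                     conditional_survival, cumulative_survival).
    """
    rows = []
    entering = n_start
    survival = 1.0
    for end, died, censored in intervals:
        # censored subjects contribute half an interval of exposure
        effective = entering - censored / 2
        p_cond = 1 - died / effective
        # survival function = product of conditional probabilities
        survival *= p_cond
        rows.append((end, entering, effective, p_cond, survival))
        entering -= died + censored
    return rows

# Hypothetical grouped data: (end of interval, deaths, censored).
intervals = [(1, 10, 4), (2, 8, 6), (3, 5, 10)]
table = actuarial_life_table(100, intervals)
for row in table:
    print(row)
```

Plotting the last column against the first, with the points joined by lines, gives the survival curve.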
The idea behind the log-rank test for the comparison of two life tables (or of survival curves obtained by the Kaplan-Meier method) is simple: if there were no difference between the two groups, the total deaths occurring at any time should split between the two groups in the ratio of the numbers at risk in the two groups at that time. For example, if the numbers at risk in the first and second groups (in some fixed interval or at some fixed time point) were 70 and 30, respectively, and 10 deaths occurred in that interval or at that point in time in both groups together, we would expect 10 × [70 / (70 + 30)] = 7 deaths to have occurred in the first group and 10 × [30 / (70 + 30)] = 3 deaths to have occurred in the second group.
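The 70/30 example above amounts to the following small calculation; the function name `expected_deaths` is only illustrative.

```python
def expected_deaths(at_risk_1, at_risk_2, total_deaths):
    """Split the total deaths at one time point between two groups
    in proportion to the numbers at risk in each group."""
    total_at_risk = at_risk_1 + at_risk_2
    e1 = total_deaths * at_risk_1 / total_at_risk
    e2 = total_deaths * at_risk_2 / total_at_risk
    return e1, e2

# Figures from the example in the text: 70 and 30 at risk, 10 deaths.
e1, e2 = expected_deaths(70, 30, 10)
print(e1, e2)  # 7.0 3.0
```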
A similar calculation can be made at each time of death (in either group). By summing the results of all such calculations for the first group, we obtain what is called the 'extent of exposure', denoted E_{1}, which represents the 'expected number of deaths' in the first group if the two groups had the same distribution of survival time. The 'extent of exposure' for the second group, denoted E_{2}, can be obtained in the same way, or by subtracting, at each time point, the expected deaths in the first group from the total observed deaths at that time point and then summing.
Let O_{1} and O_{2} denote the actual (observed) numbers of deaths in the two groups, respectively. Since O_{1} + O_{2} = E_{1} + E_{2}, E_{2} can even be calculated just by subtraction. The discrepancy between the O's and E's is measured by {[O_{1} - E_{1}]^{2} / E_{1}} + {[O_{2} - E_{2}]^{2} / E_{2}} and is called the 'log-rank' test statistic, which follows the χ^{2} distribution with 1 degree of freedom (this test is by Mantel-Cox, the other three being Breslow's generalized Wilcoxon, Tarone-Ware, and Peto-Prentice). [Table 3] of that article displays the calculations of the log-rank test. There are some errors in that table (for example, for time of event = 27, 'Live at start of the day' should be N = 41, because five deaths occurred together without any 'censored' before that point, and therefore N = 46 - 5 = 41 and not 40 as shown). Fortunately, these errors have not changed the conclusion; with a borderline result, however, such errors could change it. This table is also recalculated and displayed below:
The last two columns in the table given in that article contain numbers and not probabilities as stated. Note that a few values in these columns are above one. The columns should have a heading like 'Expected number of deaths in group ….'.
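The whole log-rank calculation (observed deaths, expected numbers of deaths in each group, and the test statistic in the form given above) can be sketched as follows. The function name `logrank_statistic` and the four rows of counts are hypothetical and do not reproduce [Table 3]. The expected values are accumulated as counts, which is why individual entries can exceed one.

```python
def logrank_statistic(rows):
    """Log-rank statistic in the form sum over groups of (O - E)^2 / E.

    rows : one tuple per distinct death time,
           (at_risk_1, deaths_1, at_risk_2, deaths_2)
    Returns (O1, E1, O2, E2, chi_square).
    """
    o1 = o2 = 0
    e1 = e2 = 0.0
    for n1, d1, n2, d2 in rows:
        deaths = d1 + d2
        at_risk = n1 + n2
        o1 += d1
        o2 += d2
        # expected NUMBER of deaths in each group at this time point
        e1 += deaths * n1 / at_risk
        e2 += deaths * n2 / at_risk
    chi_square = (o1 - e1) ** 2 / e1 + (o2 - e2) ** 2 / e2
    return o1, e1, o2, e2, chi_square

# Hypothetical counts at four death times, for illustration only.
rows = [(10, 1, 10, 0), (9, 1, 10, 1), (8, 0, 9, 1), (8, 2, 8, 0)]
o1, e1, o2, e2, chi_square = logrank_statistic(rows)
print(o1, round(e1, 4), o2, round(e2, 4), round(chi_square, 4))
```

A quick check on any such table is that O_{1} + O_{2} equals E_{1} + E_{2}; an arithmetic slip in the 'No. at risk' column usually breaks this equality.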
The 'log-rank' test statistic yielded by EPIINFO is 0.3789, with P = 0.5382. These figures are confirmed by the larger (but priced) software BioMedical Data Processing (BMDP). When software is available, why take on the burden of performing these complicated calculations?
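The P value quoted above can be checked from the statistic alone, because for a χ^{2} variable with 1 degree of freedom the upper-tail probability equals erfc(√(x/2)). The short sketch below uses only the Python standard library; the function name `chi2_1df_p_value` is illustrative.

```python
import math

def chi2_1df_p_value(statistic):
    """Upper-tail probability of a chi-square variable with 1 degree
    of freedom, using P(X > x) = erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(statistic / 2))

# Statistic reported by EPIINFO in the text.
p = chi2_1df_p_value(0.3789)
print(round(p, 4))  # close to the P = 0.5382 quoted above
```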
Correspondence Address: Sanjeev Sarmukaddam, Biostatistics Consultant, MIMH, B.J. Medical College and Sassoon Hospital, Pune - 411 001, India
DOI: 10.4103/0974-7788.83178 PMID: 21897649
[Table 1], [Table 2], [Table 3] 












