Analysis of longitudinal data - an application to chronic angle closure glaucoma

Pallavi Basu, Abhishek Pal Majumder, Anirban Basak, Priyam Biswas

May 29, 2007

Abstract

We examine longitudinal data on visual field score and intraocular pressure (IOP) from patients with chronic angle closure glaucoma. To determine the relationship between field score and IOP, a linear regression technique is used. Serious concerns can be raised about the normality assumption, so a Box-Cox transformation is applied. We also examine the assumption that each subfield is equally affected by glaucoma; a resampling technique is used to estimate the distribution of the test statistic. Predicting progression was not feasible due to a shortage of data.
Contents

1 Outlining situation and framing objectives
  1.1 Explaining the variables
  1.2 Inclusion criterion
  1.3 Categorization by glaucoma stage (by AGIS system)
  1.4 Objectives
2 Brief description of methods of analysis
3 Using this dataset
  3.1 Dealing with missing data
  3.2 One assumption that can't be ignored here
  3.3 Handling of visual acuity
4 Examining relationship between IOP and visual field score
  4.1 Selection of response and explanatory variables
  4.2 Independence of left and right eye
    4.2.1 Preliminary Analysis
    4.2.2 Formulation of hypothesis and testing
    4.2.3 Evaluation of cut-off
    4.2.4 Results
    4.2.5 Interpretation of results
  4.3 Choice of model
  4.4 Selection of structure of V0
    4.4.1 Motivation
    4.4.2 The exponential correlation model
    4.4.3 Justification
  4.5 Method of analysis
    4.5.1 Restricted maximum likelihood estimation (REML)
    4.5.2 Box-Cox transformation
    4.5.3 Box-Cox transformation and REML
  4.6 Results
    4.6.1 Estimates of the parameters of the model
    4.6.2 Model adequacy
  4.7 Hypothesis testing
  4.8 Results
  4.9 Interpretation of results
  4.10 Testing between non-None categories and interpretation of results
  4.11 An interesting observation
5 To evaluate characteristic visual field defect
  5.1 Defining baseline field score
  5.2 Methodology
  5.3 Category: MILD
    5.3.1 Hypothesis
    5.3.2 Testing procedure
    5.3.3 Evaluation of cut-off
    5.3.4 Results
    5.3.5 Interpretation of results
  5.4 Category: MODERATE
    5.4.1 Hypothesis
    5.4.2 Testing procedure
    5.4.3 Evaluation of cut-off
    5.4.4 Results
    5.4.5 Interpretation of results
6 To evaluate Progression of visual field damage
  6.1 Definition of Progression
  6.2 Objectives and problem faced
  6.3 Future scope
7 Dealing with missing data
  7.1 Dropouts and intermittent missing values
  7.2 Dealing with intermittent missing values
  7.3 Methodology for dropouts
8 References
9 Acknowledgements

List of Figures

1 Scatter plot of left and right IOP at different time points
2 Scatter plot of residual
3 Normal probability plot of residuals
4 Empirical cdf for T1
5 Empirical cdf for T2
6 Empirical cdf for T
1 Outlining situation and framing objectives

Ninety patients, each having chronic angle closure glaucoma in one or both eyes, were examined at 4 different time points within a span of two years. The purpose of this project is to address questions that are of help to medical experts.
1.1 Explaining the variables

1. Age of the patient at the first time point of visit
2. Gender
3. Visual acuity
4. Intraocular pressure (IOP)
5. Field score (0-20)¹
   • Nasal (0-2)
   • Superior hemifield (0-9)
   • Inferior hemifield (0-9)
6. Kind of treatment provided up to that time point
   • Drop
   • Trab
   • PI
   • IOL
   • Needling
1.2 Inclusion criterion

• New and follow-up cases of chronic angle closure glaucoma, with or without treatment
• Absence of any other major eye disease or any other kind of glaucoma
• Age group 30-70
1.3 Categorization by glaucoma stage (by AGIS system)

• None (score 0)
• Mild (score 1-5)
• Moderate (score 6-11)
• Severe (score 12-17)
• End-stage (score 18-20)
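The staging above amounts to a lookup on the total field score. A minimal sketch in Python, where the names `AGIS_STAGES` and `agis_stage` are hypothetical and only the score ranges come from the list above:

```python
# Score ranges taken from the categorization list; names are illustrative only.
AGIS_STAGES = [
    (0, 0, "None"),
    (1, 5, "Mild"),
    (6, 11, "Moderate"),
    (12, 17, "Severe"),
    (18, 20, "End-stage"),
]


def agis_stage(score):
    """Return the glaucoma stage for a total field score between 0 and 20."""
    for low, high, stage in AGIS_STAGES:
        if low <= score <= high:
            return stage
    raise ValueError(f"field score out of range 0-20: {score}")
```

For example, `agis_stage(7)` falls in the 6-11 range and is therefore labelled Moderate.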
1.4 Objectives

1. To evaluate characteristic visual field defect.
2. To assess the relationship between IOP and visual field damage.
3. To evaluate Progression of visual field damage.
2 Brief description of methods of analysis

• Under the null hypothesis, all the subfields (nasal, superior hemifield and inferior hemifield) are expected to be affected equally. A suitable test statistic is used, and its distribution is estimated from a large number of resamples of equal size drawn from the original data.² A 100(1 − 0.05)% confidence interval is obtained, and the null is rejected or not rejected depending on where the value under the null lies.
• A linear regression model with IOP as the response and all other variables as explanatory variables is used. As the measurements on a particular individual are correlated over time, an autoregressive model of order 1 is selected. Treating the dataset as longitudinal data, the linear regression model is fitted, and the usual methods of analysis follow thereafter.
• Because very little data is available, a proper analysis of progression is not feasible. Many more follow-ups of the same patients are required.

¹ The AGIS scoring system is used throughout this work.
² Called the bootstrap in the literature.
3 Using this dataset

3.1 Dealing with missing data

Missing data are categorised into dropouts and other missing values. A separate section³ deals with this problem. However, not much missing data, especially of the dropout variety, is present in this particular dataset.
3.2 One assumption that can't be ignored here

Owing to the unavailability of data on visit times, equally spaced intervals between subsequent visits are assumed for all patients. This may seem a crude assumption. However, a medical expert asks each patient to return after an interval suited to that particular patient, so from this viewpoint the assumption is partially justified.
3.3 Handling of visual acuity

This variable has a special form. A chart⁴ is used to measure visual acuity. If a person has visual acuity 20/40, then at 20 feet from the chart that person can read letters that a person with 20/20 vision could read from 40 feet away. Since a linear model is assumed, the visual acuity score is converted to a fraction (20/40 ≡ 1/2).
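The conversion of an acuity reading to a fraction can be sketched as below. This is only an illustration: the function name `snellen_to_fraction` and the assumption that readings are stored as strings such as "20/40" are not taken from the dataset.

```python
from fractions import Fraction


def snellen_to_fraction(acuity):
    """Convert a reading such as '20/40' to the exact fraction 1/2."""
    test_distance, reference_distance = acuity.strip().split("/")
    # Fraction accepts numeric strings, so the ratio is computed exactly.
    return Fraction(test_distance) / Fraction(reference_distance)
```

Calling `float(snellen_to_fraction("20/40"))` gives the numeric covariate 0.5 used in a linear model.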
4 Examining relationship between IOP and visual field score

4.1 Selection of response and explanatory variables

IOP is treated as the response and all other variables as explanatory.
4.2 Independence of left and right eye

For each time point, a nonparametric approach to testing independence is preferred. Kendall's test for independence based on signs is used.⁵ To avoid the difficulty due to ties, the usual cut-off is not used. Instead, fixing one eye, the IOP values of the other eye are permuted over the patients⁶ to obtain an estimate of the cut-off.

4.2.1 Preliminary Analysis

The correlation coefficients between left-eye and right-eye IOP at the various time points are examined. Notice that the absolute value decreases with time.

Time point 1: ρ = +0.3644
Time point 2: ρ = +0.3429
Time point 3: ρ = −0.1037
Time point 4: ρ = −0.0324

4.2.2 Formulation of hypothesis and testing

For each time point: H0: τ = 0 and H1: τ ≠ 0.

The Kendall sample correlation statistic for n independent paired samples (X_i, Y_i) is

\[ K = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} Q\big((X_i, Y_i), (X_j, Y_j)\big), \]

where

\[ Q\big((a,b),(c,d)\big) = \begin{cases} 1 & \text{if } (d-b)(c-a) > 0 \\ -1 & \text{if } (d-b)(c-a) < 0 \\ 0 & \text{if } (d-b)(c-a) = 0. \end{cases} \]

Reject H0 if K ≥ k_{α/2} or K ≤ k'_{α/2}, where k_{α/2} is the upper α/2 point of the null distribution of K and k'_{α/2} is the lower α/2 point of the null distribution of K.

³ Refer to Section 7.
⁴ Technically called a Snellen chart.
⁵ For detailed theory refer to Nonparametric statistical methods (Hollander and Wolfe).
⁶ Termed the 'permutation distribution' in the literature.
Figure 1: Scatter plot of left and right IOP at different time points (panels: Time point 1, Time point 2, Time point 3, Time point 4)

4.2.3 Evaluation of cut-off
A significant proportion of the data has tied ranks. Because of this, the usual cut-off tables cannot be used, and the permutation distribution was used to evaluate the cut-off. Keeping the IOP values of one eye fixed, the IOP values of the other eye were permuted at random, and Kendall's test statistic was recalculated on the permuted dataset. This procedure was repeated 10000 times. From the empirical cdf of this statistic, a 100(1 − 0.05)% interval was constructed. The null is rejected if Kendall's statistic from the original dataset lies outside this interval, and is not rejected otherwise.

4.2.4 Results

The results for the 4 time points are as follows:

• At time point 1 the null was rejected
• At time point 2 the null was rejected
• At time point 3 the null was not rejected
• At time point 4 the null was not rejected

4.2.5 Interpretation of results
From the sample correlation coefficients at the various time points mentioned earlier, it was already observed that the absolute value decreases with time. Moreover, from the above results, at the last two time points the IOP values for the pair of eyes of an individual show no significant association. This prompted a review of the dataset. At the first time point almost 50% of the patients had received no medical treatment before, which suggests considering the nonuniformity of treatment among the patients. This effect carries over to the second time point. At the third time point, however, the nonuniformity of being under medication or not disappears, and hence for the third and fourth time points the IOP values for the pair of eyes are uncorrelated. Hence, if treatment is included as an explanatory variable, the IOP values of the two eyes of a patient can be considered uncorrelated.
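The permutation procedure of Section 4.2.3 can be sketched as follows. This is a minimal illustration in plain Python, not the code used for the analysis; the function names, the fixed seed and the choice of empirical quantile indices are assumptions. Tied pairs contribute 0 to K, as in the definition of Q.

```python
import random


def kendall_k(x, y):
    """Kendall's K: concordant minus discordant pairs; tied pairs count as 0."""
    k = 0
    n = len(x)
    for i in range(n - 1):
        for j in range(i + 1, n):
            prod = (x[j] - x[i]) * (y[j] - y[i])
            k += (prod > 0) - (prod < 0)
    return k


def permutation_cutoffs(x, y, n_perm=10000, alpha=0.05, seed=0):
    """Lower and upper alpha/2 cut-offs of K when y is permuted against fixed x."""
    rng = random.Random(seed)
    y_perm = list(y)
    stats = []
    for _ in range(n_perm):
        rng.shuffle(y_perm)  # break any pairing between the two eyes
        stats.append(kendall_k(x, y_perm))
    stats.sort()
    lower = stats[int((alpha / 2) * n_perm)]
    upper = stats[int((1 - alpha / 2) * n_perm) - 1]
    return lower, upper


def reject_independence(x, y, **kwargs):
    """True if the observed K lies outside the permutation interval."""
    lower, upper = permutation_cutoffs(x, y, **kwargs)
    k = kendall_k(x, y)
    return k < lower or k > upper
```

Here `x` and `y` would hold the left-eye and right-eye IOP values of the patients at one time point. Because the cut-offs come from the permuted data themselves, ties need no special treatment.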
4.3 Choice of model

\[ \tilde{Y}_{IOP} = \beta_0 + \tilde{X}_{age}\beta_1 + \tilde{X}_{gender}\beta_2 + \tilde{X}_{visualacuity}\beta_3 + \tilde{X}_{drop}\beta_4 + \tilde{X}_{iol}\beta_5 + \tilde{X}_{trab}\beta_6 + \tilde{X}_{pi}\beta_7 + \tilde{X}_{needling}\beta_8 + \tilde{X}_{MILD}\beta_9 + \tilde{X}_{MODERATE}\beta_{10} + \tilde{X}_{SEVERE}\beta_{11} + \tilde{X}_{ENDSTAGE}\beta_{12} + \tilde{\varepsilon} \]

Since IOP is taken as the response variable and all others as explanatory variables, the earlier conclusions allow the left and right eyes of an individual to be taken as independent experimental units. However, measurements on a unit at different time points cannot be taken as uncorrelated. The general linear model (GLM) for longitudinal data treats y as a realization of a multivariate Gaussian random vector Y with

\[ Y \sim MVN(X\beta, \sigma^2 V), \]

where V is a block diagonal matrix with nonzero 4 × 4 blocks V0, each representing the variance matrix for the vector of measurements on a single experimental unit.
4.4 Selection of structure of V0

4.4.1 Motivation

The sample time correlation matrix is:

\[ \begin{pmatrix} 1.00 & 0.51 & 0.40 & 0.32 \\ 0.51 & 1.00 & 0.50 & 0.44 \\ 0.40 & 0.50 & 1.00 & 0.51 \\ 0.32 & 0.44 & 0.51 & 1.00 \end{pmatrix} \]

Notice that the correlation between the first and second time points is almost the same as that between the second and third, and between the third and fourth. Moreover, the correlation between the first and third time points is close to that between the second and fourth.

4.4.2 The exponential correlation model
In this model V0 has (j,k)th element v_jk = Cov(Y_ij, Y_ik) of the form

\[ v_{jk} = \sigma^2 \rho^{|j-k|}, \]

where Y_ij denotes the observation on the ith experimental unit at the jth time point. A justification of the above model is to represent the random variables Y_ij as

\[ Y_{ij} = \mu_{ij} + W_{ij}, \quad i = 1, \dots, m, \; j = 1, \dots, n, \]

where

\[ W_{ij} = \rho W_{i,j-1} + Z_{ij}, \]

and the Z_ij are mutually independent N(0, σ²(1 − ρ²)) random variables. Here m is the number of experimental units and n is the number of time points for each unit.
4.4.3 Justification

In the exponential correlation model, the correlation between the jth and kth time points of an individual depends on j and k only through their absolute difference. As the sample correlation matrix almost satisfies this property, the exponential correlation model is selected.
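The property used in this justification can be checked numerically. The sketch below builds the correlation part of V0 (with σ² factored out) for a given ρ and averages the sample correlations at each lag; the function names are hypothetical, and the matrix is the sample time correlation matrix of Section 4.4.1.

```python
def exponential_corr(n, rho):
    """n x n matrix with (j, k) entry rho ** |j - k|."""
    return [[rho ** abs(j - k) for k in range(n)] for j in range(n)]


def lag_means(corr):
    """Average correlation at lag 1, 2, ..., n - 1 (mean of each off-diagonal)."""
    n = len(corr)
    return [
        sum(corr[j][j + lag] for j in range(n - lag)) / (n - lag)
        for lag in range(1, n)
    ]


# Sample time correlation matrix reported in Section 4.4.1.
sample_corr = [
    [1.00, 0.51, 0.40, 0.32],
    [0.51, 1.00, 0.50, 0.44],
    [0.40, 0.50, 1.00, 0.51],
    [0.32, 0.44, 0.51, 1.00],
]

lag_averages = lag_means(sample_corr)  # roughly 0.51, 0.42, 0.32
```

The entries along each off-diagonal of `sample_corr` differ by at most 0.04, which is the sense in which the correlation depends on the time points mainly through their lag.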
4.5 Method of analysis

4.5.1 Restricted maximum likelihood estimation (REML)

In the case of the GLM with dependent errors, the REML estimator is defined as a maximum likelihood estimator based on a linearly transformed set of data Y* = AY such that the distribution of Y* does not depend on β, where Y ∼ MVN(Xβ, σ²V). Calculation⁷ shows that the REML estimator maximises the log-likelihood

\[ L^*(\sigma^2, V) = -0.5 \log\det(\sigma^2 V) - 0.5 \log\det(\sigma^{-2} X' V^{-1} X) - 0.5\,(y - X\hat\beta)' \sigma^{-2} V^{-1} (y - X\hat\beta). \]

Substituting β̂ and σ̂² into the log-likelihood gives

\[ L^*(V_0) = -0.5\, m \big( n \log RSS(V_0) + \log\det V_0 \big) - 0.5 \log\det(X' V^{-1} X), \]

where

\[ RSS(V_0) = (y - X\hat\beta(V_0))' V^{-1} (y - X\hat\beta(V_0)) \]

and

\[ \hat\beta(V_0) = (X' V^{-1} X)^{-1} X' V^{-1} y. \]

An iterative method is used to solve for V0. Subsequently, β̂ and σ̂ are obtained.

⁷ Refer to Analysis of longitudinal data (Diggle), Chapter 4.
4.5.2 Box-Cox transformation

The original IOP data being integer valued, it is prudent to apply a Box-Cox transformation⁸ to bring the response closer to normality:

\[ y^{(\lambda)} = \begin{cases} \dfrac{y^{\lambda} - 1}{\lambda \dot{y}^{\lambda - 1}} & \text{if } \lambda \neq 0 \\ \dot{y} \ln y & \text{if } \lambda = 0, \end{cases} \]

where ẏ is the geometric mean of the response variable. Applying this transformation, SS_E(λ) is calculated for different values of λ, and the value of λ for which SS_E(λ) is minimum is chosen.
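The selection of λ can be sketched as follows. This is an illustration under simplifying assumptions: a single explanatory variable with an ordinary least squares fit stands in for the full model, and all function names are hypothetical. The division by the geometric mean term is what makes SS_E comparable across values of λ.

```python
import math


def geometric_mean(y):
    return math.exp(sum(math.log(v) for v in y) / len(y))


def box_cox(y, lam):
    """Scaled Box-Cox transform of positive values y for a given lambda."""
    g = geometric_mean(y)
    if abs(lam) < 1e-12:
        return [g * math.log(v) for v in y]
    return [(v ** lam - 1.0) / (lam * g ** (lam - 1.0)) for v in y]


def sse_simple_regression(x, y):
    """Residual sum of squares from the least squares fit of y on (1, x)."""
    n = len(y)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((a - x_bar) ** 2 for a in x)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))


def best_lambda(x, y, grid):
    """Value of lambda in the grid that minimises SSE of the transformed response."""
    return min(grid, key=lambda lam: sse_simple_regression(x, box_cox(y, lam)))
```

A typical call would be `best_lambda(x, y, [k / 100 for k in range(-100, 101)])`, scanning λ from −1 to 1 in steps of 0.01.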
4.5.3 Box-Cox transformation and REML

There being no closed-form solution of the REML log-likelihood equation, and V0 being a function of ρ only, the log-likelihood is evaluated at different values of ρ, and ρ̂ is the value that maximizes it. To implement the Box-Cox transformation in this set-up, a λ is first fixed, ρ̂ is evaluated for it, and the corresponding SS_E is obtained. Then, varying over λ, λ̂ is obtained as the value for which SS_E(λ) is minimum.
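The search over ρ described above can be sketched as follows. This is a minimal illustration, not the code used for the analysis. It assumes NumPy, takes V0 to be the correlation matrix with entries ρ^|j−k| (σ² factored out, as in Y ∼ MVN(Xβ, σ²V)), and assumes the rows of X and y are ordered unit by unit with exactly n measurements per unit. The function names are hypothetical.

```python
import numpy as np


def exp_corr(n, rho):
    """n x n exponential correlation matrix with entries rho ** |j - k|."""
    idx = np.arange(n)
    return rho ** np.abs(idx[:, None] - idx[None, :])


def reml_profile(rho, X, y, m, n):
    """Evaluate L*(V0) of Section 4.5.1 for V0 = exp_corr(n, rho)."""
    V0 = exp_corr(n, rho)
    V_inv = np.kron(np.eye(m), np.linalg.inv(V0))  # inverse of block diagonal V
    XtVX = X.T @ V_inv @ X
    beta = np.linalg.solve(XtVX, X.T @ V_inv @ y)  # generalised least squares
    resid = y - X @ beta
    rss = float(resid @ V_inv @ resid)
    _, logdet_V0 = np.linalg.slogdet(V0)
    _, logdet_XtVX = np.linalg.slogdet(XtVX)
    value = -0.5 * m * (n * np.log(rss) + logdet_V0) - 0.5 * logdet_XtVX
    return value, beta, rss


def fit_rho(X, y, m, n, grid):
    """Grid value of rho that maximises the REML profile log-likelihood."""
    return max(grid, key=lambda rho: reml_profile(rho, X, y, m, n)[0])
```

To combine this with the Box-Cox step, one would transform the response for a fixed λ, call `fit_rho`, record the `rss` returned by `reml_profile` at the chosen ρ, and repeat over a grid of λ values.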
4.6 Results

4.6.1 Estimates of the parameters of the model

ρ̂ = 0.6

λ̂ = 0.36

\[ \hat{\tilde{\beta}} = (29.22,\ -0.01,\ -1.42,\ -0.26,\ -3.06,\ 5.21,\ -9.16,\ -1.84,\ -7.08,\ 2.41,\ 3.27,\ 4.57,\ 3.03)' \]

4.6.2 Model adequacy
• The scatter plot of the residuals (Figure 2) appears random; the residuals do not exhibit any definite pattern.
• The normal probability plot of the residuals (Figure 3) lies close to a straight line, indicating that the errors are approximately normal and supporting the normality assumption for the response.
4.7 Hypothesis testing

H0: Qβ = 0, where Q is a full-rank q × p matrix for some q ≤ p. It can be deduced that

\[ Q\hat\beta_{REML} \sim MVN(Q\beta,\ Q\hat{R}_{REML} Q'), \]

where

\[ \hat{R}_{REML} = \hat\sigma^2 (X' \hat{V}^{-1}_{REML} X)^{-1}. \]

An appropriate test statistic for testing the hypothesis Qβ = 0 is

\[ T = \hat\beta_{REML}'\, Q' (Q \hat{R}_{REML} Q')^{-1} Q \hat\beta_{REML}, \]

and the approximate null sampling distribution of T is chi-squared on q degrees of freedom.

⁸ For details refer to Design and analysis of experiments (Montgomery).
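The test statistic T and its approximate p-value can be computed as in the sketch below. It assumes NumPy for the linear algebra; `chi2_sf` is a hand-written helper using the closed-form series for the chi-squared upper tail with integer degrees of freedom, and all names are hypothetical. The inputs `beta_hat` and `r_hat` stand for the REML estimate and its estimated covariance matrix.

```python
import math

import numpy as np


def wald_statistic(beta_hat, r_hat, q_matrix):
    """T = (Q b)' (Q R Q')^{-1} (Q b) for the hypothesis Q beta = 0."""
    qb = q_matrix @ beta_hat
    return float(qb @ np.linalg.solve(q_matrix @ r_hat @ q_matrix.T, qb))


def chi2_sf(x, df):
    """Upper tail probability of a chi-squared variable with integer df."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        term = math.exp(-half)
        total = term
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return total
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    for k in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * k + 1)
    return total


def wald_p_value(beta_hat, r_hat, q_matrix):
    """Approximate p-value: T referred to chi-squared on q degrees of freedom."""
    t = wald_statistic(beta_hat, r_hat, q_matrix)
    return chi2_sf(t, q_matrix.shape[0])
```

Testing a single coefficient corresponds to a Q with one row that has a 1 in the position of that coefficient and 0 elsewhere; comparing two categories corresponds to a row with +1 and −1 in the two positions.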
Figure 2: Scatter plot of residual

Figure 3: Normal probability plot of residuals (axes: Data, Probability)
4.8 Results

It is required to find out whether there is a field category effect on the IOP. Each of the p-values listed below is the result for the test of

H0: β_category = 0 versus H1: β_category ≠ 0

for the corresponding category.

• P-value for β̂_MILD = 0.029
• P-value for β̂_MODERATE = 0.006
• P-value for β̂_SEVERE ≈ 0
• P-value for β̂_ENDSTAGE = 0.02
4.9 Interpretation of results

A p-value less than 0.05 here indicates that, for any unit in that category, the expected value of IOP is higher than it would be if the unit were in the None category. In other words, an eye with glaucomatous field damage is expected to have a higher IOP than an eye with no field damage.
4.10 Testing between non-None categories and interpretation of results

Along similar lines, tests were carried out to check whether there are significant differences among the four categories mild, moderate, severe and end-stage. The effect of severe is found to be the largest and the effect of mild the smallest among the four categories. However, the difference between the effects of moderate and end-stage was not significant. The two possible orderings of increasing effect of the visual field categories are therefore

mild < moderate < endstage < severe

or

mild < endstage < moderate < severe
4.11 An interesting observation

The normal probability plot of the residuals showed 8 outliers. On going back to the original data it was observed that

• in cases where the outliers have positive residuals, trabeculectomy was done just after the time point at which the residual is an outlier;
• in cases where the outliers have negative residuals, trabeculectomy was done just before the time point at which the residual is an outlier.

This suggests that trabeculectomy has a very large effect in reducing the IOP of patients having glaucoma.
5 To evaluate characteristic visual field defect

It is of interest to medical experts to know, given a glaucomatous eye at a certain stage (defined by the category), which subfield has greater damage. The main emphasis in this analysis is on the mild and moderate categories, because in the higher categories (severe and end-stage) the scores in each subfield are already so high that the degree of damage cannot be compared.
5.1 Defining baseline field score

As glaucoma is a very slow damage process, the baseline field score gives an approximate idea of the field score values within a short period of time. To obtain the baseline field score, repeated measurements are taken. This has been done only at the first time point. The clinical trials do not cover a long enough time span to re-establish the baseline field score.
5.2 Methodology

The visual field scoring method is the same for all the subfields. Each subfield has test locations: 6 for nasal, 23 for superior and 23 for inferior. A maximum score of 20 is possible, with a maximum of 2 from the nasal field and a maximum of 9 from each of the superior and inferior hemifields. With this scoring methodology, if each of the subfields is assumed to be affected equally by glaucoma, then on average the subfield scores should be 2N/20 for nasal, 9N/20 for inferior and 9N/20 for superior, where N is the total field score. Using this fact, a CI for the mean(s) is obtained from the available dataset by simulation. This gives a way to test H0 versus H1.
5.3 Category: MILD

5.3.1 Hypothesis

H0: Damage of glaucoma in nasal is the same as in superior and inferior
H1: Nasal is affected the most

Consider

H0Sup: Damage of glaucoma in nasal = Damage of glaucoma in superior
H1Sup: Damage of glaucoma in nasal > Damage of glaucoma in superior

and

H0Inf: Damage of glaucoma in nasal = Damage of glaucoma in inferior
H1Inf: Damage of glaucoma in nasal > Damage of glaucoma in inferior

Clearly, testing the above two pairs of hypotheses is equivalent to testing the original hypothesis.

5.3.2 Testing procedure

Under H0, if (X_i, Y_i, Z_i) are the field scores (nasal, superior, inferior) of a unit whose total field score is N, their mean is expected to be close to (2N/20, 9N/20, 9N/20). In other words, the mean of (X_i − 2N/20, Y_i − 9N/20, Z_i − 9N/20) should be close to (0, 0, 0). Two tests are performed here, one concerning nasal and superior and the other concerning nasal and inferior. Define

\[ S_{1N} = \sum_{i \,:\, X_i + Y_i + Z_i = N} \left( X_i - 2N/20 - Z_i + 9N/20 \right), \]

\[ S_{2N} = \sum_{i \,:\, X_i + Y_i + Z_i = N} \left( X_i - 2N/20 - Y_i + 9N/20 \right). \]

Now,

\[ T_1 = \sum_{N \in Mild} S_{1N} \,/\, \#Units_{MILD}, \qquad T_2 = \sum_{N \in Mild} S_{2N} \,/\, \#Units_{MILD}. \]

From the definitions, it is clear that if the dataset is in accordance with the null, T1 and T2 should be close to zero.

5.3.3 Evaluation of cut-off

Taking the available dataset as the population, a random sample of the same size is drawn from it using simple random sampling with replacement (SRSWR). This procedure is repeated 10000 times. For every resample the test statistics T1 and T2 are evaluated, which gives the empirical cdf of each statistic. From the empirical cdf, a 100(1 − α)% CI for each of T1 and T2 is obtained. If the value of the statistic under the null (zero) falls outside the CI, H0 is rejected in favour of H1.
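The resampling step can be sketched as follows. This is a minimal illustration in plain Python, not the code used for the analysis; the function names, the fixed seed and the percentile rule for the interval are assumptions. Each unit is a tuple (nasal, superior, inferior) of subfield scores for an eye in the mild category.

```python
import random


def t_stats(units):
    """Return (T1, T2) for a list of (nasal, superior, inferior) scores."""
    s1 = s2 = 0.0
    for x, y, z in units:
        total = x + y + z  # total field score N of this unit
        s1 += (x - 2 * total / 20) - (z - 9 * total / 20)
        s2 += (x - 2 * total / 20) - (y - 9 * total / 20)
    return s1 / len(units), s2 / len(units)


def bootstrap_ci(units, stat_index, n_resamples=10000, alpha=0.05, seed=0):
    """Percentile CI for T1 (stat_index=0) or T2 (stat_index=1) using SRSWR."""
    rng = random.Random(seed)
    values = sorted(
        t_stats([rng.choice(units) for _ in units])[stat_index]
        for _ in range(n_resamples)
    )
    lower = values[int((alpha / 2) * n_resamples)]
    upper = values[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper
```

H0 would then be rejected when 0 lies outside the interval returned by `bootstrap_ci`.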
11
80 70 60 50 40 30 20 10 0
0
0.2
0.4
0.6
0.8
1
1.2
1.4
Figure 4: Empirical cdf for T1
45 40 35 30 25 20 15 10 5 0 0.8
1
1.2
1.4
1.6
! Figure 5: Empirical cdf for T2
12
1.8
2
5.3.4 Results

Two-sided CI for T1 is (0.3596, 0.9585)
Two-sided CI for T2 is (1.2372, 1.7734)

Note that neither CI contains the value 0, and in both cases 0 is smaller than the lower cut-off point. In both cases the p-values are almost equal to zero (indeed, the p-value obtained using the empirical cdf was found to be zero).

5.3.5 Interpretation of results

Since neither CI contains the value 0, the null hypothesis that damage in nasal is the same as in superior and inferior is rejected in favour of the alternative. For the mild category, the nasal subfield is affected most.
5.4 Category: MODERATE

5.4.1 Hypothesis

H0: Damage of glaucoma is the same in the superior and inferior subfields
H1: Inferior is affected more than superior

Alternatively,

H0: Damage of glaucoma in superior = Damage of glaucoma in inferior
H1: Damage of glaucoma in inferior > Damage of glaucoma in superior

5.4.2 Testing procedure

Under H0, if (Y_i, Z_i) are the field scores of superior and inferior respectively of a unit whose total field score is N, their mean is expected to be close to (9N/20, 9N/20). In other words, the mean of (Y_i − 9N/20, Z_i − 9N/20) should be close to (0, 0). Define

\[ S_N = \sum_{i \,:\, X_i + Y_i + Z_i = N} (Y_i - Z_i). \]

Now,

\[ T = \sum_{N \in Moderate} S_N \,/\, \#Units_{MODERATE}. \]

From the definitions, it is clear that if the dataset is in accordance with the null, T should be close to zero.

5.4.3 Evaluation of cut-off

Taking the available dataset as the population, a random sample of the same size is drawn from it using simple random sampling with replacement (SRSWR). This procedure is repeated 10000 times. For every resample the test statistic T is evaluated, which gives the empirical cdf of the statistic. From the empirical cdf, a 100(1 − α)% CI for T is obtained. If the value of the statistic under the null (zero) falls outside the CI, H0 is rejected in favour of H1.

5.4.4 Results

Two-sided CI for T is (1.1290, 2.8710)

Note that the CI does not contain the value zero, and zero is smaller than the lower cut-off point.

5.4.5 Interpretation of results

Since the CI does not contain the value 0, the null hypothesis that damage in inferior is the same as in superior is rejected in favour of the alternative. For the moderate category, the inferior subfield is more affected than the superior subfield.
Figure 6: Empirical cdf for T
6 To evaluate Progression of visual field damage

6.1 Definition of Progression
Progression is quantified as a field score increase of ≥ 4 in three consecutive reliable visual field tests.
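One way to operationalise this definition is sketched below. The reading adopted here is an assumption: progression is flagged when the field score exceeds the baseline by at least 4 in three consecutive tests. The function name and its parameters are hypothetical, and the tests passed in are taken to be the reliable ones only.

```python
def has_progressed(baseline, scores, threshold=4, run_length=3):
    """True if `scores` contains `run_length` consecutive values that are
    at least `threshold` above `baseline`."""
    run = 0
    for score in scores:
        run = run + 1 if score - baseline >= threshold else 0
        if run >= run_length:
            return True
    return False
```

With only four time points per unit, at most two such runs of three tests are possible, which is one reason the dataset offers so little information on progression.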
6.2 Objectives and problem faced

Progression is an important measure of how the glaucoma is advancing. It would be very helpful to practitioners if it could be predicted when progression would take place in a unit. The time span over which this data was collected is less than two years. Only 3% of the units (4 to 5 in number) have progressed, so trying to predict progression in this dataset is of little worth. A longer time span is necessary before any comment on progression can be made. From a medical perspective, too, studying progression in this dataset is of limited interest.
6.3 Future scope

More follow-ups in this study can be used in future.

• As the field score is count data, taking field score as the response with a Poisson distribution for the errors, the joint distribution of the field scores up to the future time point to be predicted can be obtained. Hence, using the conditional probability given all the data available before that time point, a range for the field score can be given, which may help to determine progression.
• The stochastic nature of the field scores can also be used. The transition probabilities may be estimated from the dataset, and hence a prediction can be obtained.
7 Dealing with missing data

In the dataset two different types of missing data were observed.

• During the first visit of a few patients, very high IOP was observed. According to medical expertise, it is not justified to measure the visual field score of a patient while he/she has a very high IOP, because high IOP causes a great deal of variation in the visual field score, which hinders the preparation of a baseline field score. Proper medicine(s) are suggested to control the high IOP. In subsequent follow-ups, once the patient has a reasonably lower IOP, the visual field score is measured and a baseline field score is prepared.
• In some cases patients did not come for subsequent follow-ups.
7.1 Dropouts and intermittent missing values

• Suppose it is intended to take a sequence of measurements Y1, Y2, ..., Yn on a particular unit. Missing values are said to be dropouts if, whenever Yj is missing, so are Yk for all k ≥ j.
• All other types of missing values are considered to be intermittent missing values.
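The two definitions can be applied to a unit's record as in the sketch below, where the record is a list of flags that are True when the measurement was taken. The function name and the flag representation are assumptions made for illustration.

```python
def classify_missing(observed):
    """Classify a unit's sequence of observation flags (True = measured)."""
    if all(observed):
        return "complete"
    first_missing = observed.index(False)
    # Dropout: once a value is missing, every later value is missing too.
    if not any(observed[first_missing:]):
        return "dropout"
    return "intermittent"
```

For example, a patient whose field score could not be taken at the first visit but was taken at the remaining three visits has the record `[False, True, True, True]`, which is classified as intermittent.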
7.2 Dealing with intermittent missing values

As discussed earlier, intermittent missing values occurred because of high IOP values, which lead to a large variation in the visual field score. There is little to gain from filling in those missing values, since the field score is known to show temporary variation even within a very short span of time. So the only choice left was to discard those intermittent missing values for the purpose of analysis.
7.3 Methodology for dropouts

Let Y* denote the complete set of measurements which would have been obtained if there were no dropouts, and partition this set into Y* = (Y⁰, Yᵈ), with Y⁰ denoting the measurements actually obtained and Yᵈ denoting the measurements which would have been available if there were no dropouts. Finally, let R denote a set of indicator random variables denoting which elements of Y* fall into Y⁰ and which into Yᵈ. A probability model for the missing value mechanism defines the probability distribution of R conditional on Y* = (Y⁰, Yᵈ).

• Dropouts are said to be completely random if R is independent of both Y⁰ and Yᵈ.
• Dropouts are said to be random if R is independent of Yᵈ.
• Dropouts are said to be informative if R is dependent on Yᵈ.

Different methods exist in the literature⁹ to test whether dropouts are completely random, random or informative. However, all of these methods require a large enough amount of data on dropouts. Thereafter, different models follow for different types of dropouts. In this dataset the total number of dropout cases was 3, which is fewer than the minimum number of dropouts required to test for randomness. So the analysis was done discarding those dropouts.

⁹ Refer to Analysis of longitudinal data (Diggle).
8 References

• Analysis of longitudinal data (Diggle)
• Design and analysis of experiments (Montgomery)
• Nonparametric statistical methods (Hollander, Wolfe)
• Applied linear statistical models (John Neter)
• The elements of statistical learning (Tibshirani)
9 Acknowledgements

We are grateful to Dr. Sanchita Ray for her constant help and support. We extend sincere thanks to Prof. Arijit Chakraborty and Prof. Saurabh Ghosh for their fruitful suggestions. Finally, we thank our batchmates and seniors.