STATISTICAL APPLICATIONS IN FISHERIES
USE OF REGRESSION MODELS IN BIOMETRIC STUDIES [worked out problems]
T.M.SANKARAN 2009
INTRODUCTION

Knowledge of the type of relationship existing between different body parts helps greatly in the study of the vital statistics of fishes, and can later be used for purposes such as prediction and comparison. The practical method adopted is the construction of a mathematical model that takes into account the relative variations in the body parts. The selection of such a model consequently demands keen observation of the situation. Statistical tools, such as fitting regression models and analysing them, are found to be very useful here. This paper demonstrates the application of linear regression models to prediction and evaluates their efficiency. The effect on prediction of introducing additional morphometric characters is also examined with the help of a multiple regression model.

A good model is one that is simple but gives a good fit over a wide range, and the best test of a model, in practice, is its usefulness in prediction. Prediction includes both interpolation and extrapolation, which necessitates selecting the correct regression model from among those available. In the simple case the raw material in the possession of the investigator is a set of pairs of values of two variables, x and y. From them two regression lines can be found: the regression of y on x and that of x on y. Before choosing between the two, it is necessary to know which variable the model is intended to predict. The regression of y on x gives a better estimate of y, and that of x on y gives a better estimate of x. This problem has been dealt with by Winsor (1946), and Eisenhart (1939) gives more details on the modification required when the sample is small.

SIMPLE AND MULTIPLE MODELS

The simple linear regression model is found quite suitable for depicting the relationship between any two morphometric characters in a species of fish. Very often, however, the efficiency of the model in prediction, compared with that of other models, is not taken into consideration. Here an attempt is made to see whether the efficiency of the model in prediction increases with the introduction of additional characters.

A sample of 42 fish belonging to a carangid (Selar kalla) population was taken for study. Total length (x1), head length (x2), the distance between the tip of the snout and the base of the caudal peduncle (x3) and the depth (x4) were considered. These characters were defined as in Sugunan and Sankaran (1972). A preliminary plot revealed a linear relationship between x1 and each of the other x's. The matrix of correlation coefficients between the variables is given in Table 1.
TABLE 1 – Matrix of correlation coefficients

        x1      x2      x3      x4
x1      1       0.9874  0.9965  0.9915
x2              1       0.9807  0.9821
x3                      1       0.9892
x4                              1
The second-order partial correlation coefficients are given in Table 2, where r12.34 is the standard notation for the partial correlation coefficient between x1 and x2 after partialling out the effects of x3 and x4 (Yule and Kendall, 1950).

TABLE 2 – Partial correlation coefficients

r12.34    0.5345
r13.24    0.8249
r14.23    0.2996

The correlation coefficients are very high. The partial correlation coefficients r12.34 and r13.24 are significant (P < 0.01), whereas r14.23 is not (P > 0.05). The implication is that x1 is influenced more by x2 and x3 than by x4. The linear regressions of x1 on x2 and of x1 on x3 were obtained as

x1 = -1.7277 + 5.1163 x2    (1)
x1 = 0.3534 + 1.4364 x3    (2)
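The significance statements for Table 2 can be checked with the usual t statistic for a partial correlation, t = r sqrt(df / (1 - r^2)) with df = n - 2 - k, where k is the number of variables partialled out. The sketch below is an illustrative check, not the author's computation: the function name is hypothetical, and the two-sided critical values for 38 degrees of freedom (about 2.02 at the 5% level and 2.71 at the 1% level) are assumptions quoted approximately from standard t tables.

```python
import math

def partial_corr_t(r, n, k):
    """t statistic and d.f. for a partial correlation r (n cases, k controls)."""
    df = n - 2 - k
    return r * math.sqrt(df / (1.0 - r * r)), df

# Values printed in Table 2; n = 42 fish, k = 2 characters partialled out.
table2 = {"r12.34": 0.5345, "r13.24": 0.8249, "r14.23": 0.2996}

# Assumed approximate two-sided critical values of t for 38 d.f.
T_05, T_01 = 2.02, 2.71

results = {}
for name, r in table2.items():
    t, df = partial_corr_t(r, n=42, k=2)
    results[name] = t
    if abs(t) > T_01:
        verdict = "P < 0.01"
    elif abs(t) > T_05:
        verdict = "P < 0.05"
    else:
        verdict = "P > 0.05"
    print(f"{name}: t = {t:.2f} on {df} d.f. ({verdict})")
```

Under these assumptions the first two coefficients exceed the 1% critical value and the third falls below the 5% value, which agrees with the statements in the text.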
The significance of these regressions was tested by the analysis of variance method, and the results are given in Tables 3 and 4.

TABLE 3 – Significance of linear regression (1)

Source of variation          D.F.   Sum of squares   Mean sum of squares
Linear regression            1      332.97           332.97
Deviation from regression    40     5.01             0.12
Total                        41     337.98
TABLE 4 – Significance of linear regression (2)

Source of variation          D.F.   Sum of squares   Mean sum of squares
Linear regression            1      335.63           335.63
Deviation from regression    40     2.35             0.0588
Total                        41     337.98
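The proportion of variation explained and the F ratio follow directly from the rows of Tables 3 and 4. The following is a minimal sketch of that arithmetic; the function name is hypothetical, and the inputs are the sums of squares and degrees of freedom exactly as printed in the tables.

```python
def anova_summary(ss_regression, df_regression, ss_residual, df_residual):
    """Return (R squared, F ratio) from the rows of a regression ANOVA table."""
    ss_total = ss_regression + ss_residual
    r_squared = ss_regression / ss_total
    f_ratio = (ss_regression / df_regression) / (ss_residual / df_residual)
    return r_squared, f_ratio

# Sums of squares and degrees of freedom as printed in Tables 3 and 4.
r2_eq1, f_eq1 = anova_summary(332.97, 1, 5.01, 40)  # regression (1): x1 on x2
r2_eq2, f_eq2 = anova_summary(335.63, 1, 2.35, 40)  # regression (2): x1 on x3

print(f"(1) R^2 = {r2_eq1:.3f}, F(1, 40) = {f_eq1:.0f}")
print(f"(2) R^2 = {r2_eq2:.3f}, F(1, 40) = {f_eq2:.0f}")
```

Both F ratios are in the thousands on 1 and 40 degrees of freedom, far beyond any conventional critical value.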
These results show that the linear regressions are highly significant. In fact the first regression explains about 98.5% of the total variation, while the second explains 99.3%. Even though these simple models explain almost all the variation present, it is interesting to see the combined effect of the three characters on the total length. Evidently the contribution of each of them will differ. An idea of the proportionate contributions can be obtained from the partial regression coefficients, and a comparison between them helps to show whether they differ significantly. The linear multiple regression of x1 on x2, x3 and x4 was obtained as

x1 = -0.2703 + 1.1398 x2 + 0.9330 x3 + 0.4793 x4    (3)
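Equations (1) to (3) can be evaluated side by side for a single specimen. The sketch below does this for one hypothetical fish; the function names and the measurements (x2 = 4.0, x3 = 13.0, x4 = 4.5, in the same units as the original data) are illustrative assumptions and are not taken from the sample.

```python
def predict_from_head(x2):
    """Equation (1): total length from head length x2."""
    return -1.7277 + 5.1163 * x2

def predict_from_snout_peduncle(x3):
    """Equation (2): total length from snout-to-caudal-peduncle distance x3."""
    return 0.3534 + 1.4364 * x3

def predict_multiple(x2, x3, x4):
    """Equation (3): total length from x2, x3 and depth x4 together."""
    return -0.2703 + 1.1398 * x2 + 0.9330 * x3 + 0.4793 * x4

# Hypothetical specimen (illustrative values, not from the 42 fish).
x2, x3, x4 = 4.0, 13.0, 4.5

print(f"equation (1): {predict_from_head(x2):.2f}")
print(f"equation (2): {predict_from_snout_peduncle(x3):.2f}")
print(f"equation (3): {predict_multiple(x2, x3, x4):.2f}")
```

For a real specimen the three estimates would be compared with the measured total length where it is available.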
By an analysis of variance test, the usefulness of these variables in prediction was found to be significant. The results of the test are given in Table 5.

TABLE 5 – Significance of the linear regression (3)

Source of variation          D.F.   Sum of squares   Mean sum of squares
Regression                   3      336.37           112.12
Deviation from regression    38     1.61             0.04
Total                        41     337.98
Here the regression explains more than 99.5% of the total variation. A similar test was carried out to see whether the three partial regression coefficients in (3) could be considered equal. Under the hypothesis that the three are equal, a common estimate of the population regression coefficient was obtained along the usual lines (Rao, 1952) as b = 0.7505. Table 6 gives the result of the test, which shows a significant difference among the partial regression coefficients.
TABLE 6 – Testing equality of the partial regression coefficients

Source of variation          D.F.   Sum of squares   Mean sum of squares
Deviation from equality      2      42.98            21.49
Residual                     38     1.61             0.04
Total                        40     44.59
In practice, fish with damaged tail-ends often occur in collections, which prevents accurate measurement of the total length. It is therefore natural to estimate the total length from other characters that influence it. The present study shows that, in the carangid species, the total length can be predicted using head length, the distance between the tip of the snout and the base of the caudal peduncle, and depth. From the results given in Table 6 it is seen that the contribution made by each of the three characters is different, so a selection from among them has to be made. The values of the partial correlation coefficients and the results of the significance tests of the simple models guide this selection. The distance between the tip of the snout and the base of the caudal peduncle (x3) evidently gives the best estimates of the total length.

PREDICTING WEIGHT

For a given length (l) of a fish, the weight (w) is estimated using the power relationship

w = a l^{b}    (4)

where a and b are constants. Here the attempt is to see whether this model can be modified as

w = a l^{b1} g^{b2}    (5)

by introducing another morphometric character, girth (g), and whether the prediction formula can thereby be improved.

A sample of 62 fish belonging to the species Sardinella longiceps was taken for the study. The total length (in cm) and the weight (in g) were measured. The girth was measured with a thread, following a uniform pattern of measuring around the region where the depth was maximum. The logarithms of the measurements were used, so that the transformed variables W = log w and L = log l showed a linear relationship:

W = -1.8302 + 2.7564 L    (6)

The significance of this fitted line was tested, and the result is given in Table 7.

TABLE 7 – Significance of regression line (6)

Source of variation          D.F.   Sum of squares   Mean sum of squares
Regression                   1      0.4460           0.4460
Deviation from regression    60     0.1325           0.0023
Total                        61     0.5785
The matrix of the correlation coefficients between the logarithms of the three variables is given in Table 8. The first order partial correlation coefficients are also given.
TABLE 8 – Matrix of correlation coefficients and partial correlation coefficients

        W       L       G
W       1       0.8789  0.8787
L               1       0.8293
G                       1

rWL.G = 0.5632
rWG.L = 0.5621
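The first-order partial correlations in Table 8 can be recovered from the three simple correlations with the standard formula r_xy.z = (r_xy - r_xz r_yz) / sqrt((1 - r_xz^2)(1 - r_yz^2)). The sketch below is only a check on the printed values; the function and variable names are hypothetical.

```python
import math

def first_order_partial(r_xy, r_xz, r_yz):
    """Partial correlation between x and y with z held fixed."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))

# Simple correlations between W, L and G as printed in Table 8.
r_WL, r_WG, r_LG = 0.8789, 0.8787, 0.8293

r_WL_G = first_order_partial(r_WL, r_WG, r_LG)  # W and L, G held fixed
r_WG_L = first_order_partial(r_WG, r_WL, r_LG)  # W and G, L held fixed

print(f"rWL.G = {r_WL_G:.3f}")
print(f"rWG.L = {r_WG_L:.3f}")
```

Because the inputs are rounded to four decimals, agreement with the tabulated values can be expected only to about the third decimal place.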
High correlation is found between the variables. The significant values of the partial correlation coefficients show that both characters strongly influence weight, so the introduction of G may help improve the prediction model. Taking logarithms on both sides, (5) becomes a linear model. This model was fitted to the data, and the resulting equation was

W = -1.0761 + 0.8568 L + 1.8072 G    (7)

where G = log g. The goodness of this fit and the improvement made by the additional character were tested, and the result is presented in Table 9.

TABLE 9 – Significance of model (7) and of the added character

Source of variation               D.F.   Sum of squares   Mean sum of squares
Regression on length and girth    2      0.5420           0.2710
Regression on length alone        1      0.4460
Added reduction due to girth      1      0.0960           0.0960
Residual                          59     0.0365           0.0006
Total                             61     0.5785
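The rows of Table 9 amount to an extra-sum-of-squares test for girth: the reduction added by girth is tested against the residual mean square. A minimal sketch of that arithmetic follows, using only the figures printed in the table; the variable names are hypothetical.

```python
# Rows of Table 9 (sums of squares and degrees of freedom).
ss_total = 0.5785
ss_length_and_girth = 0.5420  # regression on L and G, 2 d.f.
ss_length_alone = 0.4460      # regression on L only, 1 d.f.
ss_residual, df_residual = 0.0365, 59

# Extra sum of squares attributable to girth after fitting length (1 d.f.).
ss_added = ss_length_and_girth - ss_length_alone
f_girth = (ss_added / 1) / (ss_residual / df_residual)

# Proportion of total variation explained by each model.
r2_length = ss_length_alone / ss_total
r2_both = ss_length_and_girth / ss_total

print(f"added SS due to girth = {ss_added:.4f}, F(1, {df_residual}) = {f_girth:.0f}")
print(f"R^2 length alone = {r2_length:.3f}, length and girth = {r2_both:.3f}")
```

The F ratio for the added character is well above 100 on 1 and 59 degrees of freedom, so the improvement due to girth is clearly significant.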
The contributions made by both variables are significant, and the introduction of girth has improved the value of the formula to a great extent. While the model with length alone accounts for only 77.1% of the total variation, the multiple model accounts for about 93.7%. This study reveals that the weight of Sardinella longiceps can be predicted more accurately by using length and girth together. Based on the present data, the estimating equation using length alone is

w = 0.0148 l^{2.7564}    (8)

and that using both length and girth is

w = 0.0839 l^{0.8568} g^{1.8072}    (9)
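The constants in (8) and (9) are the antilogarithms of the intercepts in (6) and (7). The sketch below assumes base-10 logarithms, which is the base that reproduces the printed constants, and evaluates both formulas for one hypothetical fish (length 15 and girth 8, in the units of the original measurements). The function names and the specimen are illustrative assumptions.

```python
# Intercepts and exponents of the fitted log-log models (6) and (7).
# Base-10 logarithms are assumed; this reproduces the constants in (8) and (9).
a_length = 10 ** -1.8302        # constant of equation (8)
a_length_girth = 10 ** -1.0761  # constant of equation (9)

def weight_from_length(l):
    """Equation (8): weight from total length alone."""
    return a_length * l ** 2.7564

def weight_from_length_and_girth(l, g):
    """Equation (9): weight from total length and girth."""
    return a_length_girth * l ** 0.8568 * g ** 1.8072

print(f"a in (8) = {a_length:.4f}, a in (9) = {a_length_girth:.4f}")

# Hypothetical specimen, not taken from the sample of 62 fish.
length, girth = 15.0, 8.0
print(f"weight by (8): {weight_from_length(length):.1f}")
print(f"weight by (9): {weight_from_length_and_girth(length, girth):.1f}")
```

Working from the intercepts rather than the rounded constants avoids carrying the rounding of a into the predicted weights.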
Model (9) may be modified further by introducing a suitable additional character, and the efficiency of prediction thereby improved. But since this model itself accounts for 93.7% of the total variation, the additional work involved in introducing one more character may not be justified.

SUMMARY

The distance between the tip of the snout and the base of the caudal peduncle predicts the total length almost exactly in the carangid species. Other than for knowing the relative contributions of the different characters towards the prediction of total length, it is not advisable to go for a multiple regression model. The introduction of an additional morphometric character, girth, increases the efficiency of the weight prediction formula in Sardinella longiceps. The modified form is w = a l^{b1} g^{b2}, where w, l and g are as defined in the text.

REFERENCES

1. Eisenhart, C. (1939): "The interpretation of certain Regression methods and their use in Biological and Industrial Research". Ann. Math. Stat., 10, pp. 162 – 186.
2. Rao, C. Radhakrishna (1952): 'Advanced Statistical Methods in Biometric Research'. John Wiley & Sons, Inc., New York.
3. Sugunan, V.V. & T.M. Sankaran (1972): "Analysis of Morphometric Characters of Selar kalla (Cuvier)". Indian Journal of Marine Sciences, 1 (1), pp. 92 – 93.
4. Winsor, C.P. (1946): "Which Regression?" Biometrics Bulletin, 2, pp. 101 – 109.
5. Yule, G.U. and Kendall, M.G. (1950): 'Introduction to the Theory of Statistics'. Charles Griffin & Co., London.