Data Collection And Analysis In Obstetrics And Gynecology


By S. M. Ogbonmwan
Department of Mathematics, University of Benin, Benin City, Nigeria
10/14/08


What is Data?

Data are measurable characteristics of a sampling unit (or subject) of a population that yield information about the population.

Types of Data: Broadly, data are either Categorical or Numerical.

Categorical Data: The simplest type of observation made on a subject who comes to the clinic is the allocation (classification) of the subject to one of only two categories that relate to the presence or absence of some attribute.


Examples: Pregnant/Not Pregnant, Married/Single, Hypertensive/Normotensive, Diabetic/Non-Diabetic.

More than two categories:

  • Marital status: Married/Single/Divorced/Separated
  • Blood group: A/B/AB/O
  • Degree of pain: Minimal/Moderate/Severe/Unbearable

Numerical Data: There are two main types, viz: Discrete and Continuous.

Discrete Data: Arise when observations take certain numerical values through counting.

Examples: Number of children, number of visits to ANC in a year, number of ectopic heart beats in 24 hours, number of threatened abortions in the last two years, etc.


Continuous Data: Usually obtained by some form of measurement.

Examples: Height, weight, age, body temperature, blood pressure, serum cholesterol, etc.

Other Types of Data:

Censored Data: In many cases of life data, not all of the subjects in the sample will have failed. That is, in some cases the event of interest may not be observed, or the exact times-to-failure of some of the subjects may not be known. Such data are commonly called censored data, and they are of three types, viz: right censored (or suspended), interval censored and left censored data.


Right Censored (Suspended) Data: These are the cases (of life data) composed of subjects that did not fail.

Example: If 5 of 8 breast cancer cases have failed by the end of the experiment, the remaining 3 are regarded as suspended (right censored) data.

Interval Censored Data: Interval censored data result when there is uncertainty as to the exact times at which the units failed within an interval.

Example: Assume units are being inspected every 6 hours, say at 6:00 am, 12:00 noon, 6:00 pm and so on. Suppose 8 were surviving at 6:00 am and, when inspected at 12:00 noon, only 7 were surviving. Then one can only say that one unit failed between 6:00 am and 12:00 noon. The exact time when that unit failed is not known.

Left Censored Data: In this case, the failure time is only known to be before a certain time.

Example: Suppose a unit scheduled for inspection after 12 hours is found to have failed before inspection. What is known is that the unit failed sometime before 12 hours (i.e. between 0 and 12 hours), but not exactly when.
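One way to hold the three kinds of censored observation in a single structure is to record, for each subject, the interval within which failure is known to have occurred. The sketch below is only an illustration: the class name `Observation`, the helper `censoring_type` and the 48-hour end of study are hypothetical choices, not part of the original material.

```python
import math
from dataclasses import dataclass


@dataclass
class Observation:
    """Failure time known only to lie between `lower` and `upper` (hours)."""
    lower: float  # last time the subject was known to be surviving
    upper: float  # first time the subject was known to have failed (inf if never seen)


def censoring_type(obs: Observation) -> str:
    """Classify an observation as exact, right, left or interval censored."""
    if obs.lower == obs.upper:
        return "exact"
    if math.isinf(obs.upper):
        return "right"      # still surviving when observation stopped
    if obs.lower == 0:
        return "left"       # failure happened before the first inspection
    return "interval"       # failure happened between two inspections


# Illustrative records mirroring the three examples in the text
examples = [
    Observation(48, math.inf),  # survived a (hypothetical) 48-hour experiment
    Observation(6, 12),         # failed between two 6-hourly inspections
    Observation(0, 12),         # already failed at the 12-hour inspection
]
labels = [censoring_type(o) for o in examples]
print(labels)
```

Storing both bounds keeps the three censoring types in one layout, which is the form most survival-analysis routines expect as input.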


Variable: A variable is any attribute, phenomenon or event that can have different values. A variable can be either quantitative or qualitative.

A quantitative variable describes a characteristic in terms of a numerical value. The value may vary from subject to subject or from time to time in the same subject. The value is expressed in units of measurement.

Examples: Height in meters, blood pressure in mm Hg, weight in kilograms, etc.

A qualitative variable describes the attribute of a characteristic (by classifying it into categories to which the subject either belongs or does not belong).

Examples: State of origin, tribe or ethnic group, etc.


Types of Variables: There are two types, Continuous and Discrete.

Continuous Variable: A variable with a potentially infinite number of possible values in any interval. It can assume either integral or fractional values and can be measured to different levels of accuracy. A continuous variable is realized through actual measurement.

Examples: The weights of babies delivered in a health facility could be 3.14, 2.98, 2.94, 3.10 kg.

Discrete Variable: A variable that can take only a limited number of values in any interval. The values are invariably whole numbers (integers). A discrete variable is usually realized through counting.

Examples: Number of children in a family, number of clinics in a community, number of children delivered within a given period in a Teaching Hospital, etc.

Collection of Data (In O & G)

Sources of Data: There are two main sources of data in healthcare delivery, including O & G. These are the regular (or routine) system and the ad hoc system.

Regular or Routine Data Collection Systems: A regular or routine data collection system usually consists of established procedures for collecting data (in the clinics) as they become available. This could be at national, sub-national or institutional level. This system provides a rough indication of the frequency of occurrence of diseases and their descriptive epidemiology, which serves as a lead concerning disease etiology. The sources of data in this system include information from hospital (medical) records, autopsy reports, physician records, etc.


Example (Part of a Patient's Form):

  Patient's Name:
  Patient's Number:
  Date of Registration:
  Date of Birth:
  Sex (1 = male, 2 = female):
  Marital Status:
  Religion:
  Ethnic Group/Tribe:
  Height (m):
  Weight (kg):
  Blood Pressure (mm Hg): Systolic / Diastolic
  Number of Pregnancies:
  Number of Deliveries:
  Number of Children Alive:
  Number of Children Dead:
  Number of Abortions:
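When such a form is entered into a computer, each field becomes a typed attribute of a record. The following is a minimal sketch, assuming Python 3.9 or later; the class name `PatientRecord`, the field names and the consistency checks in `validate` are hypothetical illustrations, and the sample record is entirely made up.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class PatientRecord:
    name: str
    patient_number: str
    date_of_registration: date
    date_of_birth: date
    sex: int                # 1 = male, 2 = female, as coded on the form
    marital_status: str
    religion: str
    ethnic_group: str
    height_m: float
    weight_kg: float
    systolic_mm_hg: int
    diastolic_mm_hg: int
    pregnancies: int
    deliveries: int
    children_alive: int
    children_dead: int
    abortions: int

    def validate(self) -> list[str]:
        """Return the problems found; an empty list means no check failed."""
        problems = []
        if self.sex not in (1, 2):
            problems.append("sex must be coded 1 or 2")
        if self.date_of_birth > self.date_of_registration:
            problems.append("date of birth is after date of registration")
        if self.systolic_mm_hg <= self.diastolic_mm_hg:
            problems.append("systolic pressure should exceed diastolic pressure")
        counts = (self.pregnancies, self.deliveries, self.children_alive,
                  self.children_dead, self.abortions)
        if any(c < 0 for c in counts):
            problems.append("counts cannot be negative")
        return problems


# Made-up record, for illustration only
record = PatientRecord("A. N. Other", "0001", date(2008, 10, 14), date(1980, 5, 2),
                       2, "Married", "None stated", "None stated",
                       1.62, 64.0, 120, 80, 3, 2, 2, 0, 1)
print(record.validate())
```

Checking records at entry time is one way to keep routine data usable for later analysis.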


The advantage of this system of data collection is that it guarantees availability of data in every specific area of healthcare delivery.

Ad Hoc Data Collection Systems:

Ad hoc data collection is usually in the form of a (Research) survey to gather information that may not be available on a regular basis. This at times may include special investigative studies or it could just be the collection of additional information as part of the routine data collection. This system gives a large coverage of the population.

Examples:

  • An investigation of the effects of FGM on complications during delivery.
  • An investigation of breastfeeding practices among women who registered a birth in the previous year.
  • A study to investigate whether the use of hormonal contraceptives affects the fertility status of the users.

Ad hoc data collection systems can be extensive, intensive and expensive. However, an advantage of the ad hoc system is that it provides accurate and reliable data (when well conducted) in response to the specific needs of the users. An important tool for an ad hoc data collection system is an adequate questionnaire.


Good Questionnaire Design. Guidelines for Designing a Questionnaire

  • Use simple language.
  • Avoid long, complicated questions (avoid double negatives).
  • Be unambiguous: be clear and simple.
  • Do not ask general questions if you want specific answers.
  • Ask only valid questions.
  • Do not ask leading questions.
  • Avoid hypothetical questions about situations outside the people's direct experience.
  • Be careful with embarrassing questions.
  • Do not make it too difficult for the respondents.
  • Use the minimum number of questions.
  • Pre-coded questions enable you to analyse your replies easily by computer, but they may force people to give wrong answers.
  • People tend to choose the first response.
  • Ask easy questions first and difficult questions last.
  • Pre-test your questionnaires.


Steps in the Planning of a Survey

Step 1 – Preparation of a detailed written statement of the objectives of the survey.
Step 2 – Determination of the items of information required and methods of collection.
Step 3 – Definition of the reference population on which information is to be sought.
Step 4 – Decision on whether the reference population is to be studied as a whole or in part (sample).
Step 5 – Determination of the number of units in the population to be selected for study during the survey (sample size).
Step 6 – Decision on how respondents will be selected from the population (sampling method).
Step 7 – Design, testing and validation of the questionnaires on which observations will be recorded.
Step 8 – Selection and training of enumerators (interviewers).
Step 9 – Collection of data.
Step 10 – Preparation for data analysis.

Analysis of Data:

The general methodology for the analysis of data (in O & G) is of two types; viz: Descriptive and Inferential.

Descriptive Statistics Approach for Data Analysis: Descriptive Statistics:

Descriptive statistics are the statistical tools for the organization and summarization of data. They point up a characteristic of the population being studied and summarize a mass of data into a few simple ideas. In data analysis, descriptive statistics are presented in tables that provide summary statistics for continuous, numeric variables. The summary statistics include:

  • measures of central tendency, such as the mean, median and mode
  • measures of dispersion (spread of the distribution), such as the range and standard deviation (including the variance of the distribution)
  • measures of distribution, such as skewness and kurtosis, which indicate how much a distribution varies from a normal distribution

In summary, descriptive statistics describe a set of data and so provide a basis for a generalization about a population when only a sample is observed.



Organization and Presentation of data

Useful information is usually not immediately evident from a mass of raw data. Collected data need to be organized in such a way that the patterns of variation in the distribution are clearly revealed. Organizing the data aids the understanding of their structure and characteristics. Data are usually presented in either tabular or diagrammatic form.

Tabular Presentation

This is the presentation of data in tables so as to organize them into a compact and readily comprehensible form. For example, a frequency distribution table gives the number of observations at different values or classes of the variable. Tabular presentation could be handled as:

(a) Single variable frequencies:

  • For a qualitative variable (such as the distribution of the state of origin of 100 women who visited the ANC in the last one year).
  • For a large data set of a quantitative variable, requiring grouping of the data into classes (such as the distribution of the weight of newborn babies in a Teaching Hospital).


(b) Cross-tabulation:

  • Two-dimensional tables, in which two variables are cross-tabulated (such as the cross-classification of weight of babies at birth and economic status of their parents).
  • Three-dimensional tables, in which three variables are cross-classified (such as outcome of treatment by sex and by age group).
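Both kinds of table can be produced by simple counting. The sketch below uses Python's `collections.Counter`; the six observations and the category labels are hypothetical and serve only to show the mechanics.

```python
from collections import Counter

# Hypothetical observations: (birth-weight class, economic status of parents)
observations = [
    ("low", "poor"), ("normal", "poor"), ("normal", "rich"),
    ("low", "poor"), ("normal", "rich"), ("normal", "poor"),
]

# (a) Single-variable frequency distribution of birth-weight class
weight_freq = Counter(weight for weight, _ in observations)

# (b) Two-dimensional cross-tabulation: count each (row, column) pair
cross_tab = Counter(observations)

rows = sorted({w for w, _ in observations})
cols = sorted({s for _, s in observations})
print("weight", cols)
for r in rows:
    # A Counter returns 0 for pairs that never occurred
    print(r, [cross_tab[(r, c)] for c in cols])
```

A three-dimensional table follows the same pattern with three-element tuples as keys.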



Diagrammatic Presentation

Diagrammatic presentation is the use of a diagram to show the distribution of data. The methods of diagrammatic presentation of data are:

(a) Qualitative or Categorical Data

Pie Charts: A circle is divided into sectors with areas proportional to the frequencies or the relative frequencies of the categories of the variable.

Bar Charts: The bars are constructed to show the frequency or relative frequency for each category of the attribute. The bars are usually equal in width. It is important that the vertical scale should start at zero; otherwise the heights of the bars will not be proportional to the frequencies.

(b) Quantitative Data

Frequency Histograms: The chosen class intervals should not overlap and should cover the full range of the data. The area of each bar (not just its height) should be proportional to the frequency. Unequal class intervals are taken into account by the areas of the bars.

Frequency Polygons (Line Charts): These are constructed by joining the midpoints of the tops of the bars of a histogram. This chart provides ease of visual comparison between two or more distributions drawn on the same chart.

Cumulative Frequency Polygons and Cumulative Frequency Charts (Ogives): This is the chart in which the cumulative frequencies are plotted against the upper tabulated limit of each class. In principle, the ogive can be used to estimate, by interpolation, the frequency of occurrence of values of the variable less than or equal to a specified value.




Measures of Location:

One of the first statistics usually computed for a set of data is a measure of central tendency, such as the Mean, the Median and the Mode.

The Mean: This is the measure most frequently used in data analysis. The Mean may be considered as the center of gravity of the distribution.

For raw data:

$$\bar{X} = \frac{\sum_{i=1}^{n} x_i}{n}$$

For grouped data:

$$\bar{X} = \frac{\sum_{i=1}^{k} f_i x_i}{\sum_{i=1}^{k} f_i}$$
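The two formulas for the mean translate directly into code. This is a small sketch: the function names are hypothetical, the raw values are the four birth weights quoted earlier (reading the first as 3.14 kg), and the grouped class midpoints and frequencies are made up for illustration.

```python
def mean_raw(values):
    """Arithmetic mean of ungrouped observations: sum(x_i) / n."""
    return sum(values) / len(values)


def mean_grouped(midpoints, frequencies):
    """Mean of grouped data: sum(f_i * x_i) / sum(f_i), with x_i the class midpoints."""
    total = sum(f * x for f, x in zip(frequencies, midpoints))
    return total / sum(frequencies)


weights = [3.14, 2.98, 2.94, 3.10]          # birth weights in kg
print(round(mean_raw(weights), 2))          # 3.04

# Hypothetical grouped birth weights: classes 2.0-2.5, 2.5-3.0, 3.0-3.5, 3.5-4.0 kg
midpoints = [2.25, 2.75, 3.25, 3.75]
frequencies = [5, 20, 50, 25]
print(round(mean_grouped(midpoints, frequencies), 3))   # 3.225
```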

The Median: It is the point in the distribution with 50% of the measures or scores on each side of it. That is, it is the midpoint of the distribution. For an even number of observations, the median occupies the point between the

$$\frac{n}{2}\text{th} \quad\text{and}\quad \frac{n+2}{2}\text{th}$$

positions when the values of the observations are arranged in order of magnitude. When the number of observations is odd, the Median occupies the $\frac{n+1}{2}$th position in the ordered arrangement. For the grouped data case, the Median is estimated by using the expression:

$$\text{Median} = L_1 + \left(\frac{\frac{n}{2} - C_f}{f_i}\right) C_i$$

where

$L_1$ = lower class boundary of the median class
$n$ = number of observations
$C_f$ = cumulative frequency of the class just before the median class
$C_i$ = median class interval
$f_i$ = frequency of the median class
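The grouped-data median can be sketched as follows, assuming equal-width classes given by their lower boundaries. The function name `median_grouped` and the frequency table are hypothetical.

```python
def median_grouped(lower_boundaries, frequencies, width):
    """Median = L1 + ((n/2 - Cf) / f) * C for classes of equal width C."""
    n = sum(frequencies)
    if n == 0:
        raise ValueError("no observations")
    cumulative = 0  # Cf: cumulative frequency before the current class
    for lower, f in zip(lower_boundaries, frequencies):
        if cumulative + f >= n / 2:   # first class whose cumulative frequency reaches n/2
            return lower + (n / 2 - cumulative) / f * width
        cumulative += f


# Hypothetical grouped birth weights (kg): classes start at 2.0, 2.5, 3.0, 3.5
lower_boundaries = [2.0, 2.5, 3.0, 3.5]
frequencies = [5, 20, 50, 25]
print(median_grouped(lower_boundaries, frequencies, 0.5))   # 3.25
```

Here n/2 = 50 falls in the third class (cumulative frequency 25 before it), so the median is 3.0 + (25/50) × 0.5 = 3.25.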

The Mode: This is simply the value that occurs most frequently in the distribution. For the grouped frequency case, the Mode is estimated by using the expression:

$$\text{Mode} = L_1 + \frac{(f - f_b)}{(f - f_a) + (f - f_b)} \times C$$

where

$L_1$ = lower class boundary of the modal class
$f$ = modal frequency
$f_a$ = frequency of the class after the modal class
$f_b$ = frequency of the class before the modal class
$C$ = modal class interval
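A sketch of the grouped-data mode is given below. It uses the standard textbook form, in which the numerator is the excess of the modal frequency over the frequency of the class before it, and it assumes equal-width classes with a single clear modal class. The function name and the frequencies are hypothetical.

```python
def mode_grouped(lower_boundaries, frequencies, width):
    """Mode = L1 + (f - f_b) / ((f - f_a) + (f - f_b)) * C."""
    i = max(range(len(frequencies)), key=frequencies.__getitem__)  # index of modal class
    f = frequencies[i]
    f_b = frequencies[i - 1] if i > 0 else 0                      # class before
    f_a = frequencies[i + 1] if i + 1 < len(frequencies) else 0   # class after
    return lower_boundaries[i] + (f - f_b) / ((f - f_a) + (f - f_b)) * width


# Hypothetical grouped birth weights (kg): classes start at 2.0, 2.5, 3.0, 3.5
lower_boundaries = [2.0, 2.5, 3.0, 3.5]
frequencies = [5, 20, 50, 25]
print(round(mode_grouped(lower_boundaries, frequencies, 0.5), 2))   # 3.27
```

With f = 50, f_b = 20 and f_a = 25, the estimate is 3.0 + (30/55) × 0.5, which lies nearer the upper neighbour because that class is the fuller of the two.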

Measures of Variability (Measures of Spread)

The Range: The simplest way to describe the spread of a set of data is to quote the lowest and highest values. The difference between the highest and lowest values gives the range of the distribution. It is, however, not a satisfactory measure and is therefore not widely used.

Variance: This is the mean of the squared differences (deviations) between the mean and each observed value. It is mathematically expressed as:

$$S^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{X})^2}{n - 1} \quad\text{(raw data)}$$

$$S^2 = \frac{\sum_{i=1}^{k} f_i (x_i - \bar{X})^2}{\sum_{i=1}^{k} f_i - 1} \quad\text{(grouped data)}$$

Standard Deviation: The square root of the variance.

$$S = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{X})^2}{n - 1}}$$
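The range, variance and standard deviation for raw data can be computed as in the following sketch. The function name `sample_variance` is hypothetical, and the data are the four birth weights quoted earlier (reading the first as 3.14 kg). Python's standard `statistics.variance` uses the same n − 1 divisor and can serve as a cross-check.

```python
import math
import statistics


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / (len(values) - 1)


weights = [3.14, 2.98, 2.94, 3.10]       # birth weights in kg
data_range = max(weights) - min(weights)
s2 = sample_variance(weights)
s = math.sqrt(s2)

print(round(data_range, 2), round(s2, 4), round(s, 4))
print(math.isclose(s2, statistics.variance(weights)))   # agrees with the library routine
```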


Inferential Statistics: Usually when samples are studied, the investigator is interested in going beyond the sample to make inferences about the population from which the sample was drawn. Thus, from knowledge of descriptive statistics such as the mean and variance of the sample values, inferences about the same traits in the population are made. The use of inferential statistics is basic to medical research. The tools of inferential statistics include: Confidence Intervals, Tests of Hypothesis, Contingency Tables, Nonparametric Tests, Regression and Correlation Analysis, ANOVA, etc.

Confidence Interval: A confidence interval combines the features of estimates from a sample with known properties of the normal distribution to give an idea of the uncertainty associated with a single sample estimate of the population parameter. A confidence interval gives a range of values which one can be confident includes the true value.


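CI for a Single Mean ($\mu$)

The $100(1-\alpha)\%$ CI is

$$\bar{X} \pm Z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}} \quad\text{OR}\quad \bar{X} \pm t_{n-1}(\alpha/2) \cdot \frac{s}{\sqrt{n}}$$

CI for the Difference of Two Means ($\mu_1 - \mu_2$)

The $100(1-\alpha)\%$ CI is

$$\bar{X}_1 - \bar{X}_2 \pm Z_{\alpha/2} \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$

OR

$$\bar{X}_1 - \bar{X}_2 \pm t_{n_1+n_2-2}(\alpha/2) \cdot S_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}},$$

where

$$S_p = \sqrt{\frac{(n_1 - 1) S_1^2 + (n_2 - 1) S_2^2}{n_1 + n_2 - 2}}$$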
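The normal-theory intervals can be sketched with Python's standard library, which supplies the normal quantile through `statistics.NormalDist`. The function names are hypothetical, and the figures passed in are those of Worked Example 1 in this document (diabetic versus non-diabetic women). The t-based versions would need a t quantile from tables or another library.

```python
import math
from statistics import NormalDist


def ci_mean_z(mean, sd, n, confidence=0.95):
    """Normal-theory CI for a single mean: mean +/- z * sd / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half = z * sd / math.sqrt(n)
    return mean - half, mean + half


def ci_diff_means_z(mean1, sd1, n1, mean2, sd2, n2, confidence=0.95):
    """Normal-theory CI for the difference of two means (large samples)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half = z * math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    diff = mean1 - mean2
    return diff - half, diff + half


low, high = ci_diff_means_z(146.4, 18.5, 100, 140.4, 16.8, 100)
print(round(low, 1), round(high, 1))   # 1.1 10.9
```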

CI for a Single Proportion ($P$)

The $100(1-\alpha)\%$ CI is

$$P \pm Z_{\alpha/2} \cdot \sqrt{\frac{p_0 q_0}{n}}$$

CI for the Difference of Two Proportions ($P_1 - P_2$)

The $100(1-\alpha)\%$ CI is

$$P_1 - P_2 \pm Z_{\alpha/2} \cdot \sqrt{\frac{P_1(1 - P_1)}{n_1} + \frac{P_2(1 - P_2)}{n_2}}$$
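A sketch of the two proportion intervals follows. It assumes that the observed sample proportions are used inside the square root, and that samples are large enough for the normal approximation. The function names and the counts (30 of 100 and 20 of 100) are hypothetical.

```python
import math
from statistics import NormalDist


def ci_proportion(successes, n, confidence=0.95):
    """p +/- z * sqrt(p(1 - p) / n), with p the observed sample proportion."""
    p = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half = z * math.sqrt(p * (1 - p) / n)
    return p - half, p + half


def ci_diff_proportions(x1, n1, x2, n2, confidence=0.95):
    """(p1 - p2) +/- z * sqrt(p1(1 - p1)/n1 + p2(1 - p2)/n2)."""
    p1, p2 = x1 / n1, x2 / n2
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half = z * math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p1 - p2) - half, (p1 - p2) + half


single = ci_proportion(30, 100)
diff = ci_diff_proportions(30, 100, 20, 100)
print([round(v, 3) for v in single])   # [0.21, 0.39]
print([round(v, 3) for v in diff])     # [-0.019, 0.219]
```

In the second call the interval contains zero, so these made-up counts would not show a difference between the two proportions at the 5% level.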


Test of Statistical Significance

Tests of significance are standard statistical procedures for drawing inferences from sample estimates about unknown population parameters. In medical research, tests of significance allow us to decide whether the sample estimates, or differences between estimates, are within their normal biological variation, commonly called variability due to chance.

Procedure for testing a statistical hypothesis:

  – State the null hypothesis
  – State the alternative hypothesis (indicate 1-tail or 2-tail)
  – State the level of significance (explain type 2 errors)
  – Choose the test statistic (explain parametric and nonparametric tests)
  – Compute the numerical value of the statistic from the observed data
  – Compare the calculated value of the test statistic with tabulated values in the appropriate standard distribution tables at a specified probability level of significance
  – Decide whether or not to reject the null hypothesis according to the p-value


Test for Single Mean:

Test statistic (all cases):

$$Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} \quad\text{OR}\quad T = \frac{\bar{X} - \mu_0}{S/\sqrt{n}}$$

Case 1 (right tail)
Hypotheses: $H_0: \mu = \mu_0$ versus $H_1: \mu = \mu_1 > \mu_0$
Decision: Reject $H_0$ if $Z > Z_{\alpha}$, or if $T > t_{n-1}(\alpha)$.

Case 2 (left tail)
Hypotheses: $H_0: \mu = \mu_0$ versus $H_1: \mu = \mu_1 < \mu_0$
Decision: Reject $H_0$ if $Z < -Z_{\alpha}$, or if $T < -t_{n-1}(\alpha)$.

Case 3 (two tailed)
Hypotheses: $H_0: \mu = \mu_0$ versus $H_1: \mu = \mu_1 \neq \mu_0$
Decision: Reject $H_0$ if $|Z| > Z_{\alpha/2}$, or if $|T| > t_{n-1}(\alpha/2)$.

Test for Difference of Two Means:

$H_0: \mu_1 = \mu_2$
$H_1$: (a) $\mu_1 > \mu_2$  (b) $\mu_1 < \mu_2$  (c) $\mu_1 \neq \mu_2$

Test statistics are constructed along the lines given for the test for a single mean, and the decisions follow accordingly.

Finally, tests of proportions are handled by the use of the Z-test for large samples or the t-test for small samples.

Contingency Tables: The test for association between two categorical variables uses the $\chi^2$ distribution. The test statistic is:

$$\chi^2 = \sum_{i=1}^{n} \frac{(o_i - e_i)^2}{e_i}$$

and the null hypothesis of no association is rejected whenever the calculated value of $\chi^2 > \chi^2_{\upsilon}(\alpha)$, where $\chi^2_{\upsilon}(\alpha)$ is the value of the chi-squared distribution with $\upsilon$ degrees of freedom at the $\alpha$-level of significance.
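The statistic can be computed from a table of observed counts as sketched below. Two standard conventions are assumed here that the text does not spell out: each expected count is (row total × column total) / grand total, and the degrees of freedom are (rows − 1)(columns − 1). The function name and the 2 × 2 counts are hypothetical.

```python
def chi_squared(table):
    """Pearson chi-squared statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n   # count expected under no association
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


# Hypothetical 2 x 2 table of observed counts
stat, df = chi_squared([[20, 30], [30, 20]])
critical = 3.841   # chi-squared with 1 d.f. at the 5% level, from tables
print(stat, df, stat > critical)   # 4.0 1 True
```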


Nonparametric Tests:

In the tests for means, proportions and association, there is a fundamental assumption that the distribution of the test statistic is known, and indeed that the functional form of the distribution of the variables under consideration is known. When the functional form of the underlying density function of the variables is not known, it is usually good to resort to nonparametric tests such as:

  • The Wilcoxon (Rank sum) test
  • The Mann-Whitney U-test
  • The Median test
  • The Sign test

The Wilcoxon Test (Two Samples)

Test statistic:

$$S_W = \sum_{j=1}^{m} R_j$$

where $R_j$, $j = 1, 2, \ldots, m$, are the ranks of the X's in the combined sample of $N = m + n$ observations.

Reject $H_0$ when

$$|Z| = \left|\frac{S_W - \frac{m(N+1)}{2}}{\sqrt{\frac{mn(N+1)}{12}}}\right| > Z_{\alpha/2}$$

The Mann-Whitney U-Test (Two Samples)

Test statistic:

$$U = S_W - \frac{m(m+1)}{2}$$

where $S_W$ is as in the Wilcoxon test.

Reject $H_0$ when

$$|Z| = \left|\frac{U - \frac{mn}{2}}{\sqrt{\frac{mn(N+1)}{12}}}\right| > Z_{\alpha/2}$$
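The rank sum and U statistics, and the normal approximation built on them, can be sketched as follows. The function names and the two small samples are hypothetical; the sketch takes m as the size of the X sample, n as the size of the Y sample and N = m + n, and gives tied values their average rank.

```python
import math


def rank_sum_and_u(x, y):
    """Rank sum S_W of sample x in the pooled data, and U = S_W - m(m + 1)/2."""
    pooled = sorted(x + y)
    ranks = {}
    for value in set(pooled):
        # Tied values share the average of the positions they occupy
        positions = [i + 1 for i, v in enumerate(pooled) if v == value]
        ranks[value] = sum(positions) / len(positions)
    s_w = sum(ranks[v] for v in x)
    m = len(x)
    return s_w, s_w - m * (m + 1) / 2


def z_from_rank_sum(s_w, m, n):
    """Normal approximation: (S_W - m(N + 1)/2) / sqrt(mn(N + 1)/12)."""
    big_n = m + n
    return (s_w - m * (big_n + 1) / 2) / math.sqrt(m * n * (big_n + 1) / 12)


x = [1.1, 2.3, 3.0]
y = [2.9, 4.2, 5.5, 6.1]
s_w, u = rank_sum_and_u(x, y)
z = z_from_rank_sum(s_w, len(x), len(y))
print(s_w, u, round(z, 3))   # 7.0 1.0 -1.768
```

Because U differs from S_W only by the constant m(m + 1)/2, the Z value computed from U, (U − mn/2) divided by the same square root, is identical.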

Regression and Correlation:

A high proportion of data analyses are carried out to study the relationship between two variables. The purposes of such analyses are:

  • To assess whether the two variables are associated.
  • To enable the value of one variable to be predicted from any known value of the other variable.
  • To assess the amount of agreement between the values of the two variables.

Correlation: Correlation is the method of analysis used when studying the measure of relationship (association) between two continuous variables, e.g. percentage of body fat and age of normal adults. The actual measure of the association is obtained by calculating the correlation coefficient r. The correlation coefficient r can take any value between –1 and +1. Pearson's correlation coefficient is expressed as:

$$r = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2 \sum_{i=1}^{n} (Y_i - \bar{Y})^2}}$$

while the Spearman rank correlation coefficient is expressed as:

$$r_s = 1 - \frac{6\sum_{i=1}^{n} d_i^2}{n(n^2 - 1)}$$
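Both coefficients can be computed in a few lines. In this sketch the function names and the five data pairs are hypothetical; the Spearman version uses the standard n(n² − 1) denominator, takes d_i as the difference between the ranks of the i-th pair, and assumes there are no tied values.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation coefficient."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_rs(x, y):
    """Spearman coefficient, 1 - 6 * sum(d^2) / (n(n^2 - 1)); assumes no ties."""
    def ranks(values):
        order = sorted(values)
        return [order.index(v) + 1 for v in values]
    d2 = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    n = len(x)
    return 1 - 6 * d2 / (n * (n ** 2 - 1))


x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
print(round(pearson_r(x, y), 3), round(spearman_rs(x, y), 3))   # 0.8 0.8
```

The two values coincide here only because the made-up data are already ranks; in general they differ.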

Regression: Linear regression describes the linear relationship between variables and can be used to predict the value of one variable for an individual when we know only the other variable. Consider a simple case: fetal weight (kg) and non-pregnant maternal weight. Here we consider the fetal weight as the response (or outcome) variable, while the maternal weight is the predictor variable. These are also called the dependent and independent variables respectively. The linear relationship between the dependent (Y) and the independent (X) variables is given as:

$$Y = \alpha + \beta X$$

The estimates of $\alpha$ and $\beta$ are:

$$\hat{\beta} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2}$$

$$\hat{\alpha} = \bar{Y} - \hat{\beta}\bar{X}$$

Hence, $\hat{Y} = \hat{\alpha} + \hat{\beta} X$, which is used for prediction.

Multiple Regression:

$$Y = \alpha + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p$$

e.g. obesity, smoking and snoring:

$$Y_{\text{Snoring}} = \alpha + \beta_1 X_{\text{Smoking}} + \beta_2 X_{\text{Obesity}}$$

Logistic Regression: Good for prediction of dichotomous variables.


Simple Experimental Design One Way ANOVA

In research work or in the handling of patients, comparisons are often made between several sets of data collected from basically similar populations, such as treatments given to groups of patients having the same ailment, except that a different drug was used for each group. Generally, any experiment designed to compare several treatments (sources of variation) must embody two important principles of experimental design, viz: (i) Replication and (ii) Randomization. The simplest experimental design that incorporates these two principles is the completely randomized design, also called the one-way classification or the one-way analysis of variance, involving one factor appearing at different levels.

The null hypothesis we would wish to test is $H_0: \mu_1 = \mu_2 = \cdots = \mu_k = \mu$ versus $H_1$: at least one of the $\mu_i$ $(i = 1, \ldots, k)$ differs from $\mu$.

Test for One-Way Classification

1. State $H_0$ and $H_1$. $H_0: \mu_1 = \mu_2 = \cdots = \mu_k$

2. Choose the level of significance, $\alpha$.

3. Complete the ANOVA table.


ANOVA TABLE

S.V.         d.f.        SS      MS                        F-Ratio
Treatment    k – 1       SStr    MStr = SStr/(k – 1)       Fcal = MStr/MSE
Error        k(n – 1)    SSE     MSE = SSE/(k(n – 1))
Total        kn – 1      SST

5. Under H0 and the assumptions in (3) being correct, Fcal under F-Ratio in the ANOVA table has the $F_{k-1,\,k(n-1)}$ distribution. Hence, we find the critical point by reading off $F_{k-1,\,k(n-1)}(\alpha)$ from the F-distribution table for the appropriate level of significance.

6. Compare the value of Fcal from the ANOVA table with $F_{k-1,\,k(n-1)}(\alpha)$ from the statistical table. If Fcal > $F_{k-1,\,k(n-1)}(\alpha)$, then reject the null hypothesis.

7. Draw a conclusion.

Remark: When the sample sizes (i.e. the number of observations in each treatment) are not all equal, the necessary adjustment must be made in the computation of the sums of squares.

Example: Six patients each were tested on four types of oral contraceptive to investigate the average reaction time.
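The entries of the ANOVA table can be computed as in the sketch below. The function name and the three small groups are hypothetical (the slides give no data for the contraceptive example). The error degrees of freedom are taken as the total number of observations minus k, which equals k(n − 1) when every group has n observations and also covers unequal group sizes.

```python
def one_way_anova(groups):
    """Return (SS_treatment, SS_error, F) for a list of groups of observations."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Between-treatment sum of squares: group size times squared deviation of the group mean
    ss_tr = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-treatment (error) sum of squares: deviations from each group's own mean
    ss_e = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    ms_tr = ss_tr / (k - 1)
    ms_e = ss_e / (n_total - k)
    return ss_tr, ss_e, ms_tr / ms_e


# Hypothetical measurements for three treatments
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
ss_tr, ss_e, f_cal = one_way_anova(groups)
print(round(ss_tr, 2), round(ss_e, 2), round(f_cal, 2))   # 26.0 6.0 13.0
```

The computed F ratio is then compared with the tabulated F value on (k − 1, N − k) degrees of freedom, as in steps 5 and 6.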

Risk Estimation:

                  Disease
Exposure     Yes       No        Total
Yes          a         b         a + b
No           c         d         c + d
Total        a + c     b + d     n = a + b + c + d

Relative Risk (RR)

RR estimates the magnitude of an association between exposure and disease. It indicates the likelihood of developing the disease in the exposed group relative to those who are not exposed. It is the ratio of the incidence of disease in the exposed group to the corresponding incidence of disease in the non-exposed group.

Thus,

$$RR = \frac{a/(a+b)}{c/(c+d)} = \frac{a}{a+b} \cdot \frac{c+d}{c} = \frac{a(c+d)}{c(a+b)}$$

Remarks:

1. An RR of 1.0 indicates that the incidence rates of disease in the exposed and non-exposed groups are identical, and thus that there is no association observed between the exposure and the disease.
2. A value of RR greater than 1.0 indicates a positive association, or an increased risk among those exposed (to a factor).
3. Analogously, an RR less than 1.0 means that there is an inverse association, or a decreased risk among those exposed.
4. RR may change (in some cases) with time, e.g. the RR for 1 year of exposure might be different from the RR for 10 years of exposure.

Odds Ratio (for case-control studies)

These are studies in which participants are selected on the basis of their disease status. The odds ratio (OR) is the ratio of the odds of exposure among the cases to that among the controls.

$$OR = \frac{a/c}{b/d} = \frac{ad}{bc}$$
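Both measures follow directly from the four cells of the exposure-by-disease table (a, b in the exposed row and c, d in the non-exposed row). The function names and the counts used below are hypothetical.

```python
import math


def relative_risk(a, b, c, d):
    """Incidence among the exposed, a/(a + b), divided by incidence among the non-exposed, c/(c + d)."""
    return (a / (a + b)) / (c / (c + d))


def odds_ratio(a, b, c, d):
    """Cross-product ratio ad / bc."""
    return (a * d) / (b * c)


# Hypothetical counts: 30 of 100 exposed and 10 of 100 non-exposed develop the disease
rr = relative_risk(30, 70, 10, 90)
odds = odds_ratio(30, 70, 10, 90)
print(round(rr, 2), round(odds, 2))   # 3.0 3.86
```

With these made-up counts the exposed group has three times the risk; the odds ratio is larger than the relative risk because the disease is not rare in this table.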


Worked Examples

Example 1: Blood pressure levels were measured in 100 diabetic and 100 non-diabetic women aged 40 – 49 years. Mean systolic blood pressures were 146.4 mm Hg (with standard deviation of 18.5) among the diabetics and 140.4 mm Hg (with standard deviation of 16.8) among the non-diabetics. By making the necessary assumptions, calculate the 95% confidence interval for the difference of means of the blood pressures of the two groups of women.

Solution: Assume that the blood pressures of each of the two groups of women are normally distributed. Hence, assume that the difference of means of the blood pressures is also normally distributed.


Given: $100(1-\alpha)\% = 95\%$, so $1 - \alpha = 0.95$, $\alpha = 0.05$ and $\alpha/2 = 0.025$.

The formula for the $100(1-\alpha)\%$ CI for the difference of two means is:

$$\bar{X}_1 - \bar{X}_2 \pm Z_{\alpha/2} \sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}$$

This is valid since $n_1 = n_2 = 100$ are considered to be large values. Substituting, with $Z_{\alpha/2} = Z_{0.025} = 1.96$, we have

$$146.4 - 140.4 \pm 1.96 \sqrt{\frac{18.5^2}{100} + \frac{16.8^2}{100}}$$

i.e. 6 ± 1.96 × 2.498979792

i.e. 6 ± 4.898, which gives (1.102, 10.898).

∴ The 95% confidence interval for the difference of means is 1.1 to 10.9.

Example 2:

A team of medical researchers wished to measure the weight gained by users of oral contraceptives. The weights of 12 women were taken before and after the use of the contraceptive, one year apart. Unfortunately, one of the women died before the end of the year, and therefore there was no result for her (this is indicated by * in the data set). Estimate the weight of the woman that died before the experiment was concluded.

Weights of Women

Before (X)    After (Y)
50            61
55            61
60            59
65            71
70            80
75            76
79.5          *
80            90
85            106
90            98
95            100
100           114

Solution:

First, we shall find the regression line Y = α + β X by estimating α and β .

Complete the table:

x        y        x²        y²        xy
50       61       2500      3721      3050
55       61       3025      3721      3355
60       59       3600      3481      3540
65       71       4225      5041      4615
70       80       4900      6400      5600
75       76       5625      5776      5700
80       90       6400      8100      7200
85       106      7225      11236     9010
90       98       8100      9604      8820
95       100      9025      10000     9500
100      114      10000     12996     11400
Total    825      916       64625     80076     71790

Using the result of the table we get

$$\hat{\beta} = \frac{n\sum x_i y_i - \sum x_i \sum y_i}{n\sum x_i^2 - \left(\sum x_i\right)^2} = 1.1236$$

$$\hat{\alpha} = \bar{Y} - \hat{\beta}\bar{X} = -0.9973$$

$$\therefore\ \hat{Y} = \hat{\alpha} + \hat{\beta}X = -0.9973 + 1.1236X$$

Hence, when X = 79.5 we have

$$\hat{Y} = -0.9973 + 1.1236 \times 79.5 = 88.3289$$

That is, the estimated weight of the woman that died (after one year) would have been 88.33 kg.
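The calculation in Example 2 can be reproduced with a few lines of Python, using the eleven complete (before, after) pairs from the table. The variable names are illustrative. Carried at full precision, the intercept comes out at about −1.0000; the −0.9973 quoted above results from rounding the slope to 1.1236 before computing the intercept. The predicted weight agrees to two decimal places either way.

```python
# The eleven complete pairs from Example 2 (the woman with X = 79.5 is excluded)
x = [50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100]
y = [61, 61, 59, 71, 80, 76, 90, 106, 98, 100, 114]

n = len(x)
sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(a * b for a, b in zip(x, y))
sum_x2 = sum(a * a for a in x)

# Least-squares slope and intercept from the column totals
beta = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
alpha = sum_y / n - beta * sum_x / n

predicted = alpha + beta * 79.5
print(round(beta, 4), round(alpha, 4), round(predicted, 2))
```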

Example 3: Serum amylase determinations were made on a sample of 15 apparently healthy subjects. The sample yielded a mean of 96 units/100 ml and a standard deviation of 35 units/100 ml. The population variance was unknown. Can one conclude that the mean of the population from which the sample of serum amylase determinations came is different from 120?

Solution:

$H_0: \mu = \mu_0 = 120$
$H_1: \mu \neq \mu_0 = 120$

The test statistic is

$$t = \frac{\bar{X} - \mu_0}{S/\sqrt{n}}$$

Let $\alpha = 0.05$. Since we have a two-sided test, we put $\alpha/2 = 0.025$ in each tail of the distribution.

∴ we find $t_{14}(0.025) = 2.1448$ (obtained from the statistical table).

Computed t:

$$t = \frac{96 - 120}{35/\sqrt{15}} = -2.66$$

∴ $|t| = 2.66$

Decision rule: Since $|t| = 2.66 > t_{14}(0.025) = 2.1448$, we shall reject the null hypothesis.

Conclusion: Based on the given data, we conclude that the mean of the population from which the sample came is not 120.
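The arithmetic of Example 3 can be checked with the short sketch below. The function name `one_sample_t` is hypothetical, and the critical value is taken from a t table, as in the example, rather than computed.

```python
import math


def one_sample_t(sample_mean, mu0, sd, n):
    """t statistic for a single mean: (mean - mu0) / (sd / sqrt(n))."""
    return (sample_mean - mu0) / (sd / math.sqrt(n))


t = one_sample_t(96, 120, 35, 15)
critical = 2.1448                  # t with 14 d.f. at alpha/2 = 0.025, from tables
reject = abs(t) > critical         # two-sided decision rule
print(round(t, 2), reject)         # -2.66 True
```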

Exercise: At admission, two groups of women on two different family planning methods in clinical trials show the following characteristics.

Characteristic          Group           Mean      SD       No. of women
Weight (kg)             Cycloprovera    56.83     12.48    42
                        HRP 102         59.29     15.47    48
Height (cm)             Cycloprovera    155.86    5.17     42
                        HRP 102         155.83    6.39     48
Age (years)             Cycloprovera    27.71     4.10     42
                        HRP 102         28.46     4.66     48
Systolic BP (mm Hg)     Cycloprovera    118.7     9.2      42
                        HRP 102         121.9     9.8      48
Diastolic BP (mm Hg)    Cycloprovera    78.1      7.3      42
                        HRP 102         78.9      7.9      48

Find whether the two groups differ substantially at admission.
