Biostatistics
Courtesy of Sayyed Hassan AL-Awami
Special Thanks to
6eebb
Statistical inferences:
1. Estimation 2. Hypothesis testing
Hypothesis Testing
Before I start my 2nd lecture, I would like to say that hypothesis testing needs more attention than estimation, i.e. don't read this while you are watching TV or eating your dinner.
One more thing: I have done a few new things in this lecture to make it easier for you:
1. It is highly colorized to comfort your eyes and waste your printer ink.
2. It is full of examples. From my mates' experience, the more examples you solve, the better you become.
3. It is very important to understand the doctor's notes, so I stole some full sentences from them. I also checked some websites to make sure of some of the information.
4. I explained a few things in Arabic, which is not acceptable from a 3rd-year medical student. (Sorry, my English is limited.)
Ready? Let's start, then:
Introduction: Hypothesis testing is another way to make statistical inferences. There are two types of hypotheses that we are going to deal with:
1. Null hypothesis (H0). 2. Alternative hypothesis (HA).
I hope this will become clear as you move on.
Example: In a clinical trial of a new drug, the null hypothesis might be that the new drug is no better, on average, than the current drug. We would write H0: there is no difference between the two drugs on average. The alternative hypothesis, HA (also written H1), is that there is a difference between them.
Notice these three important points:
1. Any hypothesis should be: a. clear, b. ambitious, c. testable.
2. The equality sign must appear in the null hypothesis (=, ≤, ≥).
3. The null and alternative hypotheses are complementary. That is, together they exhaust all possibilities for the value of the hypothesized parameter. Writing µ0 for the hypothesized value: if H0: µ = µ0, then HA: µ ≠ µ0; if H0: µ ≤ µ0, then HA: µ > µ0; and if H0: µ ≥ µ0, then HA: µ < µ0.
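Recall: Alpha (α) is the level of significance, also called the alpha error or type I error rate: the probability of rejecting H0 when H0 is actually true. So don't be confused if you see these names. Don't confuse α with the p-value, though: α is chosen before the test, while the p-value is calculated from the data, and H0 is rejected when the p-value is smaller than α.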
How to test a hypothesis? These are simple steps which will help you. We are going to discuss each one separately.
1. Formulate the null and alternative hypotheses: This is the most important step.
- The researcher tries to reject the null hypothesis.
- In exam questions, the alternative hypothesis is usually the one stated in the question.
2. Choose a level of significance: This is alpha, and it is usually given in the question (most often 0.05). Typical choices are:
0.1 for social sciences.
0.05 for physiological studies or pathogenesis.
0.01 for clinical trials.
3. Determine the sample size (n): this is also usually given.
4. Calculate the z (or t) score, which is the test statistic. The general equation is:
Test statistic = (relevant statistic - hypothesized parameter) / standard error of the relevant statistic
When testing a mean, the relevant statistic is the sample mean (x), so:
Test statistic = (x - µ0) / SE, where SE = SD / √n
5. Use the table to determine whether the z score falls within the acceptance region. This is very important and a little bit complicated, but we are going to make it as simple as possible.
- "Acceptance region" means the following: when we draw the normal distribution curve and mark the "level of significance area" (alpha), we get a shape like this:
[Figure: normal curves under H0, with the rejection region(s) shaded pink or blue in the tail(s) and the acceptance region left white]
- Notice that the pink or blue areas represent the level of significance. They are also called the "rejection regions", and the white areas are called the "acceptance regions".
- This curve is for the NULL HYPOTHESIS.
- What does that mean, Sayed? Draw the curve and mark the level of significance (all of this comes from the question). Then calculate the test statistic from the equation and check whether your result falls in the pink/blue areas or in the white area. If it falls in the pink or blue areas, reject the null hypothesis and accept the alternative hypothesis. If it falls in the white area, you fail to reject the null hypothesis (we informally say we "accept" H0), and the alternative hypothesis is not supported.
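Steps 4 and 5 can be sketched in Python. This is a minimal sketch, assuming the population SD is known so that SE = SD/√n; the function names `z_score` and `decide` and the `tail` labels are our own, not from the lecture, and the critical value is assumed to be read from the z table.

```python
import math


def z_score(sample_mean: float, hypothesized_mean: float, sd: float, n: int) -> float:
    """Step 4: test statistic = (sample mean - hypothesized mean) / SE."""
    se = sd / math.sqrt(n)  # standard error of the sample mean
    return (sample_mean - hypothesized_mean) / se


def decide(z: float, critical: float, tail: str = "two") -> str:
    """Step 5: check whether z falls in the rejection region of the H0 curve.

    `critical` is the positive table value (e.g. 1.96 or 1.645).
    tail="two"   -> rejection regions in both tails (HA: mu != mu0)
    tail="left"  -> rejection region in the left tail (HA: mu < mu0)
    tail="right" -> rejection region in the right tail (HA: mu > mu0)
    """
    if tail == "two":
        in_rejection_region = abs(z) > critical
    elif tail == "left":
        in_rejection_region = z < -critical
    elif tail == "right":
        in_rejection_region = z > critical
    else:
        raise ValueError("tail must be 'two', 'left' or 'right'")
    return "reject H0" if in_rejection_region else "fail to reject H0"


z = z_score(12.6, 13, 0.7, 25)
print(round(z, 2), decide(z, 1.96))  # -2.86 reject H0
```

"Fail to reject H0" is the strict wording for what the lecture informally calls accepting H0.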
Comparing a sample mean with a population mean
Example: Suppose that the chest circumference of newborn girls is normally distributed with µ = 13 in (µ is a parameter of the population) and SD = 0.7 in. A group of 25 newborn girls was found to have an average chest circumference of 12.6 in (x is the sample mean, a statistic of the sample). Is this evidence that this group differs from the population, at alpha = 0.05?
H0: µ = 13 (H0: µ = µ0)
HA: µ ≠ 13 (HA: µ ≠ µ0)
Why? As we said before, the researcher tries to reject the null hypothesis. We want to check whether there is a difference between the value obtained from the actual observation (12.6) and the stated theoretical value (13). As we said before, what the question gives is usually HA, namely: there is a difference between them. So H0 will be: there is no difference.
- Next, we draw the normal distribution curve for the NULL HYPOTHESIS with α = 0.05. (Again, notice these hints in the questions: "differ" means two tails, while "more than" or "less than" means one tail.)
(These steps were already explained in the estimation lecture.)
Now we calculate the test statistic: Z = (x - µ0) / SE, where SE = SD / √n = 0.7 / √25 = 0.14.
Z = (12.6 - 13) / 0.14 = -2.86 (about -2.9).
Now look at the curve that we drew. Is this value in the acceptance region? |-2.86| = 2.86 > 1.96, so it lies in the blue region, which is the rejection region. So we reject H0 and accept HA: µ ≠ 13. The verbal conclusion is:
Based on these data, the mean chest circumference of this group of newborn girls differs from the population mean of 13 in, at alpha = 0.05.
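The chest-circumference example can be checked with a few lines of Python. This is a sketch with our own variable names; it assumes SE = SD/√n because the population SD is given.

```python
import math

# Chest-circumference example (two-tailed, alpha = 0.05)
mu0 = 13.0     # population mean chest circumference (in)
sd = 0.7       # population SD (in)
n = 25         # number of newborn girls in the group
x_bar = 12.6   # sample mean (in)
z_crit = 1.96  # two-tailed critical value for alpha = 0.05

se = sd / math.sqrt(n)       # 0.7 / 5 = 0.14
z = (x_bar - mu0) / se       # about -2.86
reject_h0 = abs(z) > z_crit  # "differ" -> two tails, so compare |z|
print(f"SE = {se:.2f}, z = {z:.2f}, reject H0: {reject_h0}")
```

Comparing |z| with 1.96 is the same as checking both tails of the curve at once.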
Example: The mean level of a certain enzyme in 10 subjects is 28, with a variance of 45. In the general population, the mean level of that enzyme is 25. Find out whether the mean enzyme level in this sample differs from that of the general population.
We want to check whether there is a difference between the value obtained from the actual observation (28) and the stated theoretical value (25). As we said before, what the question gives is usually HA, namely: there is a difference between them. So H0 will be: there is no difference.
H0: µ = 25, HA: µ ≠ 25. Notice: the question gives no level of significance (α), so we will assume α = 0.05.
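Now we calculate the test statistic. The question gives the variance, so SE = √(variance / n) = √(45 / 10) = √4.5 ≈ 2.12.
Z = (28 - 25) / 2.12 = 1.41
1.41 < 1.96, which means it is located in the acceptance region, so we fail to reject (accept) H0. Conclusion: based on these data, the mean level of the enzyme in this sample does not differ from that of the population at alpha = 0.05.
(Strictly, with only 10 subjects and a sample variance, this is a t test with n - 1 = 9 degrees of freedom and a two-tailed critical value of 2.262; the conclusion is the same.)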
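A sketch of the enzyme example in Python. Because the question gives a variance rather than an SD, SE is assumed to be √(variance/n). The t critical value 2.262 (9 degrees of freedom, two-tailed, α = 0.05) is typed in from a t table, since Python's standard library has no t distribution; variable names are our own.

```python
import math

# Enzyme example: only the sample variance is given, so SE = sqrt(variance / n)
mu0 = 25.0       # general-population mean
x_bar = 28.0     # sample mean
variance = 45.0  # sample variance
n = 10
alpha = 0.05     # assumed, since the question gives none

se = math.sqrt(variance / n)  # sqrt(4.5), about 2.12
z = (x_bar - mu0) / se        # about 1.41

reject_with_z = abs(z) > 1.96   # two-tailed z critical value
reject_with_t = abs(z) > 2.262  # two-tailed t critical value, 9 degrees of freedom
print(f"z = {z:.2f}, reject H0 (z): {reject_with_z}, reject H0 (t): {reject_with_t}")
```

Either way the statistic stays inside the acceptance region, so the decision does not change.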
IMPORTANT POINT: The curve may have one tail or two. When H0 is (=), so that HA is (≠), the test is two-tailed. When H0 is (≤) or (≥), so that HA is (>) or (<), the test is one-tailed.
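Where do 1.96 and 1.645 come from? A short Python sketch using the standard library's `statistics.NormalDist` computes them: for a two-tailed test alpha is split between the two tails, while for a one-tailed test all of alpha goes into one tail. The function name `z_critical` is our own.

```python
from statistics import NormalDist


def z_critical(alpha: float, tails: int) -> float:
    """Positive critical z value: alpha split over both tails, or all in one tail."""
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    tail_area = alpha / tails  # area of each rejection region
    return NormalDist().inv_cdf(1 - tail_area)


print(round(z_critical(0.05, 2), 3))  # 1.96
print(round(z_critical(0.05, 1), 3))  # 1.645
```

For a left-tailed test, use the same value with a minus sign.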
Example: 50 smokers were questioned about the number of hours they sleep each day. We want to test the hypothesis that smokers need less sleep than the general public, which needs an average of 7.7 hours of sleep, at a level of significance of 0.05. If the sample mean is 7.5 and the standard deviation is 0.5, what can you conclude?
µ = 7.7 hrs, x = 7.5 hrs.
This time, we want to check whether the average sleep needed by smokers, obtained from the actual observation (7.5), is less than the stated theoretical value (7.7). As we said before, what the question gives is usually HA, namely: it is indeed less.
H0: µ ≥ 7.7
HA: µ < 7.7
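The critical value here is 1.645 with a negative sign (-1.645), because the rejection region is in the left tail. Unfortunately, I couldn't find a picture of this curve and I'm too lazy to draw it by hand. Of course, by now you should understand how we got 1.645 and where it comes from. Imagine if you still don't know!!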
Remember: it is only one tail, because H0 is not (=), as we said before. We choose the left tail because HA is (less than); that is why the critical value is negative. And please, don't forget how we drew this curve.
SE = 0.5 / √50 ≈ 0.0707
Z = (7.5 - 7.7) / 0.0707 = -2.83
Since -2.83 is to the left of -1.645, it is in the critical (rejection) region. Hence we reject the null hypothesis and accept the alternative hypothesis. So the conclusion is: based on these data, smokers need fewer hours of sleep than the general public, at the level of significance 0.05.
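The smokers example as a short Python sketch. Variable names are our own, and SE is assumed to be SD/√n, as in the other one-sample examples.

```python
import math

# Smokers example (one-tailed, left tail, alpha = 0.05)
mu0 = 7.7       # mean hours of sleep in the general public
x_bar = 7.5     # sample mean for the 50 smokers
sd = 0.5
n = 50
z_crit = 1.645  # one-tailed critical value for alpha = 0.05

se = sd / math.sqrt(n)   # about 0.0707
z = (x_bar - mu0) / se   # about -2.83
reject_h0 = z < -z_crit  # HA is "less than", so the rejection region is the left tail
print(f"z = {z:.2f}, reject H0: {reject_h0}")
```

Note the comparison with -1.645 rather than with |z|: only the left tail is a rejection region.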
Do you feel that you can easily answer this type of question by now?
Test yourself
Example: Suppose that we want to test, with a significance level of 0.05, the hypothesis that the climate has changed since industrialization. Suppose that the mean temperature throughout history is 50 degrees. During the last 40 years, the mean temperature has been 51 degrees, with a standard deviation of 2 degrees. What can we conclude?
Answer: we can conclude that there has been a change in temperature.
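If you want to check your working, this sketch reproduces the answer. It assumes that the 40 yearly values form the sample (n = 40), that the SD of 2 degrees belongs to that sample, and that "has changed" calls for a two-tailed test; variable names are our own.

```python
import math

# Climate "test yourself" example (two-tailed, alpha = 0.05)
mu0 = 50.0    # historical mean temperature (degrees)
x_bar = 51.0  # mean over the last 40 years
sd = 2.0
n = 40

se = sd / math.sqrt(n)     # about 0.316
z = (x_bar - mu0) / se     # about 3.16
reject_h0 = abs(z) > 1.96  # "has changed" -> either direction
print(f"z = {z:.2f}, reject H0: {reject_h0}")
```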
The difference between two population means
Example: A sample of 12 patients in Hospital A yielded a mean serum uric acid value of x1 = 4.5 mg/100 ml with variance = 1. In Hospital S, a sample of 15 patients of the same age and sex had x2 = 3.4 with variance = 1. Assume the two populations are normally distributed. Do these data provide sufficient evidence to indicate a DIFFERENCE between the two population means at α = 0.05?
There is a small difference in the method, but it is easy too.
H0: µ1 = µ2
HA: µ1 ≠ µ2
The difference is in the test statistic, which becomes:
Z = {(x1 - x2) - (µ1 - µ2)} / SE, where SE = √(s1²/n1 + s2²/n2)
SE = √(1/12 + 1/15) = √0.15 ≈ 0.39
Z = (4.5 - 3.4 - 0) / 0.39 ≈ 2.82
(µ1 - µ2 = 0 under H0, because the two means are assumed equal.)
When you draw the curve, you will see that it has two tails (just like the previous drawings). Conclusion: since the calculated value 2.82 > the tabulated value 1.96, the null hypothesis is rejected: the mean serum uric acid levels of the two populations differ at α = 0.05.
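The lecture does not show the SE formula for two means; this sketch assumes the usual SE = √(s1²/n1 + s2²/n2), which reproduces the lecture's 2.82 when SE is rounded to 0.39 (about 2.84 without rounding). Variable names are our own.

```python
import math

# Two-sample example: serum uric acid, Hospital A vs Hospital S
x1, var1, n1 = 4.5, 1.0, 12  # Hospital A
x2, var2, n2 = 3.4, 1.0, 15  # Hospital S
diff_h0 = 0.0                # mu1 - mu2 under H0

se = math.sqrt(var1 / n1 + var2 / n2)  # sqrt(0.15), about 0.387
z = ((x1 - x2) - diff_h0) / se         # about 2.84
reject_h0 = abs(z) > 1.96              # "different" -> two tails
print(f"SE = {se:.3f}, z = {z:.2f}, reject H0: {reject_h0}")
```

Because both variances equal 1 here, a pooled-variance SE would give the same number.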
Problem (this problem was part of a past exam): In a health survey of a certain community, 150 persons were interviewed. One of the items of information obtained was the number of prescriptions each person had filled during the past year. The average number for the 150 people was 5.8, with a standard deviation of 3.1. The investigators wish to know whether these data provide sufficient evidence to indicate that the population mean is greater than 5, at the 0.05 level of significance.
Find: 1. The null hypothesis. 2. The alternative hypothesis. 3. The test you should use. 4. The calculated value. 5. The tabulated value. 6. The decision. 7. The interpretation. 8. The 95% C.I. for the problem.
1. The null hypothesis. H0: population mean ≤ 5
2. The alternative hypothesis. As we said before, the alternative hypothesis is the one which is given in the question. H1: population mean > 5
3. The test you should use. Test statistic for one population mean: Z = (x - µ0) / SE
Never forget these tests:
- One population mean
- Two population means
- Paired t-test
- One population proportion
- Two population proportions
- Chi-squared
4. Calculated value. Z = (x - µ0) / SE, where SE = 3.1 / √150 ≈ 0.25
Z = (5.8 - 5) / 0.25 = 3.2
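5. Tabulated value. Since HA is "greater than", this is a one-tailed test with the whole 0.05 in the right tail. Again, the critical value here is positive and equals 1.645, this time on the other (right) side. Unfortunately, I couldn't find a picture of this curve and I'm too lazy to draw it by hand.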
Tabulated value for z = 1.645
Very important note: why wasn't the value 1.96? Try drawing the curve yourself. You will find that the area of one half of the curve is 0.5, and in the other half the area between 0 and the critical value is 0.45 (the remaining 0.05 is the rejection region). The table does not contain exactly 0.4500; it contains 0.4495 (z = 1.64) and 0.4505 (z = 1.65), so we took the value between them, which is 1.645.
6. Decision. We reject the null hypothesis and accept the alternative one, i.e. the population mean is greater than 5. To clarify: we rejected H0 because the value we calculated (3.2) turned out to be larger than the tabulated value (1.645). And don't forget that the curve we drew belongs to H0, which is why it is H0 that we reject, not HA.
7. Interpretation. Based on these data, the mean number of prescriptions filled in this population is greater than 5, at the 0.05 level of significance.
8. 95% C.I. for the problem. C.I. = Estimator ± [Reliability coefficient × Standard error]
C.I. = 5.8 ± (1.96 × 0.25) = 5.8 ± 0.49, i.e. (5.31, 6.29)
Why did we take 1.96? Because a (two-sided) C.I. always takes two tails!!
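The whole health-survey problem (steps 4, 5, 6 and 8) can be checked with a short Python sketch. It assumes SE = SD/√n, as in the other one-sample examples; the variable names are our own. Without the lecture's rounding of SE to 0.25, z comes out as about 3.16 rather than 3.2, and the decision is the same.

```python
import math
from statistics import NormalDist

# Data from the health-survey problem
x_bar, sd, n = 5.8, 3.1, 150
mu0 = 5.0      # H0: mu <= 5, HA: mu > 5
alpha = 0.05

# 4. Calculated value
se = sd / math.sqrt(n)  # about 0.253 (the lecture rounds it to 0.25)
z = (x_bar - mu0) / se  # about 3.16 (3.2 with SE = 0.25)

# 5. Tabulated value: one tail, so all of alpha is in the right tail
areas = [round(NormalDist().cdf(v) - 0.5, 4) for v in (1.64, 1.65)]  # table areas 0 to z
z_table = 1.645                            # halfway between 1.64 and 1.65
z_exact = NormalDist().inv_cdf(1 - alpha)  # about 1.6449, close to the table value

# 6. Decision: right-tailed test
reject_h0 = z > z_table

# 8. 95% C.I.: two tails, so 1.96 (SE rounded to 0.25 as in the lecture)
margin = 1.96 * round(se, 2)
ci = (x_bar - margin, x_bar + margin)
print(f"z = {z:.2f}, reject H0: {reject_h0}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```

The `areas` line shows the two z-table entries (0.4495 and 0.4505) that bracket 0.45, which is why the table value is taken as 1.645.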
We hope you enjoyed reading this note. We tried our best to make it easy and light. Finally, again and again, I would like to say: if there are any mistakes and you don't notice them, then say GOODBYE to some marks.
“The hypothesis-testing topic WAS my nightmare, but after reading this, it became my lovely dream. I can say this about the note: it is really highly colorized, but at the same time, it is easily metabolized.” (3rd-year student)