MGT1051 Business Analytics for Engineers
Normal Distribution
© 2018 C. Gangatharan – VIT
Dec 11, 2018 – Tue
Data Distribution
• Data can be “distributed” (spread out) in different ways
What is Normal (Gaussian) Distribution?
• The normal distribution is a descriptive model for many real-world situations.
• It is defined as a continuous frequency distribution of infinite range: the variable can take any real value, not just integers as with the binomial and Poisson distributions.
• It is the most important probability distribution in statistics and an important tool in the analysis of epidemiological data and in management science.
Types of Distribution
• Frequency Distribution
• Normal (Gaussian) Distribution
• Probability Distribution
• Poisson Distribution
• Binomial Distribution
• Sampling Distribution
• t distribution
• F distribution
A Bell Curve
What are some examples of things that follow a Normal Distribution?
• Heights of people
• Size of things produced by machines
• Errors in measurements
• Blood Pressure
• Test Scores
Standard Normal Distribution
• mean = median = mode
• Symmetry about the center
• 50% of the values are less than the mean and 50% are greater than the mean
Characteristics of Normal Distribution
• It links frequency distribution to probability distribution
• Has a bell-shaped curve and is symmetric
• It is symmetric around the mean: the two halves of the curve are the same (mirror images)
The Standard Deviation
• About 68% of values are within 1 standard deviation of the mean
• About 95% of values are within 2 standard deviations of the mean
• About 99.7% of values are within 3 standard deviations of the mean
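The three percentages can be checked numerically. The sketch below is not from the slides; it assumes an exactly normal variable and uses the standard identity that the probability of falling within k standard deviations of the mean equals erf(k / √2). The function name `prob_within` is made up for this illustration.

```python
import math

def prob_within(k: float) -> float:
    """Probability that a normal value lies within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    # Prints roughly 68.27%, 95.45% and 99.73%
    print(f"within {k} SD: {prob_within(k) * 100:.2f}%")
```

The slide's 68%, 95% and 99.7% are these values rounded.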
Why do we need to know Standard Deviation?
Any value is
• likely to be within 1 standard deviation of the mean
• very likely to be within 2 standard deviations
• almost certainly within 3 standard deviations
How good is the rule for real data?
Check some example data:
• The mean weight of the women = 127.8 lb
• The standard deviation (SD) = 15.5 lb
68% of 120 = 0.68 x 120 ≈ 82 runners
In fact, 79 runners fall within 1 SD (15.5 lb) of the mean.
[Figure: histogram of the runners' weights; x-axis: pounds (80 to 160), y-axis: percent (0 to 25); marked values: 112.3, 127.8, 143.3]
95% of 120 = 0.95 x 120 = 114 runners
In fact, 115 runners fall within 2 SDs of the mean.
[Figure: histogram of the runners' weights; x-axis: pounds (80 to 160), y-axis: percent (0 to 25); marked values: 96.8, 127.8, 158.8]
99.7% of 120 = 0.997 x 120 ≈ 119.6 runners
In fact, all 120 runners fall within 3 SDs of the mean.
[Figure: histogram of the runners' weights; x-axis: pounds (80 to 160), y-axis: percent (0 to 25); marked values: 81.3, 127.8, 174.3]
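The arithmetic on the three runner slides can be reproduced in a few lines. This is only a sketch: the mean, SD, sample size and observed counts are the figures quoted on the slides (the raw weights are not available here), and the variable names are chosen for this illustration.

```python
mean_lb, sd_lb, n_runners = 127.8, 15.5, 120   # values quoted on the slides
rule = {1: 0.68, 2: 0.95, 3: 0.997}            # share expected within k SDs
observed = {1: 79, 2: 115, 3: 120}             # counts reported on the slides

rows = []
for k in (1, 2, 3):
    low = round(mean_lb - k * sd_lb, 1)        # lower limit: mean - k * SD
    high = round(mean_lb + k * sd_lb, 1)       # upper limit: mean + k * SD
    expected = round(rule[k] * n_runners, 1)   # runners the rule predicts
    rows.append((k, low, high, expected, observed[k]))
    print(f"{k} SD: {low} to {high} lb, expected {expected}, observed {observed[k]}")
```

The expected and observed counts are close at every level, which is the point the slides make about the rule holding up on real data.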
The Normal Distribution: as mathematical function (pdf)
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}$$
This is a bell-shaped curve with different centers and spreads depending on μ and σ.
Note constants: π = 3.14159, e = 2.71828
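As an illustration of the density function, the sketch below evaluates the normal pdf directly from its standard formula. The function name `normal_pdf` and the crude numerical integration are assumptions made for this example, and the runner mean and SD (127.8 lb, 15.5 lb) are reused only as sample parameters.

```python
import math

def normal_pdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Normal density with center mu and spread sigma, evaluated at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

mu, sigma = 127.8, 15.5
# The curve peaks at the mean and is symmetric around it.
print(normal_pdf(mu, mu, sigma))
print(normal_pdf(mu - 10, mu, sigma), normal_pdf(mu + 10, mu, sigma))

# Crude midpoint-rule check that the total area under the curve is about 1.
step = 0.01
area = sum(normal_pdf(-8 + (i + 0.5) * step) * step for i in range(1600))
print(area)
```

Changing `mu` shifts the bell left or right; changing `sigma` makes it wider or narrower while the area stays at 1.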
Outliers?
Bill Gates makes $500 million a year. He’s in a room with 9 teachers, 4 of whom make $40k, 3 make $45k, and 2 make $55k a year.
What is the mean salary of everyone in the room? What would be the mean salary if Gates wasn’t included?
Mean With Gates: $50,040,500
Mean Without Gates: $45,000
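The two means in this example are easy to verify. The sketch below uses the salaries given on the slide; the median line is an addition for comparison and is not part of the original example.

```python
from statistics import mean, median

teachers = [40_000] * 4 + [45_000] * 3 + [55_000] * 2   # the 9 teachers
gates = 500_000_000
everyone = teachers + [gates]

mean_with = mean(everyone)       # mean salary of all 10 people
mean_without = mean(teachers)    # mean salary of the teachers only
median_with = median(everyone)   # added for comparison: barely affected by Gates

print(f"Mean with Gates:    ${mean_with:,.0f}")
print(f"Mean without Gates: ${mean_without:,.0f}")
print(f"Median with Gates:  ${median_with:,.0f}")
```

A single extreme value moves the mean from $45,000 to over $50 million, while the median of the room stays at $45,000.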
What is an outlier?
• Observations inconsistent with the rest of the dataset – Global Outlier
• Special outliers – Local Outlier
  • Observations inconsistent with their neighborhoods
  • A local instability or discontinuity
Outlier Detection
Find the mean and median of the following set of numbers:
3  12  7  40  9  14  18  15  17
Mean is 15
Median is 14
Outlier
In a set of numbers, a number that is much LARGER or much SMALLER than the rest of the numbers is called an Outlier.
To find any outliers in a set of data, we need to find the 5 Number Summary of the data.
Outlier Detection
Find the 5 Number Summary of the following numbers:
Step 1: Sort the numbers from lowest to highest
Step 2: Identify the Median
Step 3: Identify the Smallest and Largest numbers
Step 4: Identify the Median of the lower half (from the smallest number up to the Median of the entire set) and the Median of the upper half (from that Median up to the largest number)
3  7  9  12  14  15  17  18  40
Outlier Detection
3 - Smallest number in the set
9 - Median between the smallest number and the median
14 - Median of the entire set
17 - Median between the largest number and the median
40 - Largest number in the set
These are the five numbers in the 5 Number Summary.
3  7  9  12  14  15  17  18  40
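The four steps can be sketched as a small function. One assumption is needed: to reproduce the slide's quartiles (9 and 17), the overall median is counted in both halves when the number of values is odd. Other quartile conventions exist and can give slightly different results. The function name `five_number_summary` is made up for this example.

```python
from statistics import median

def five_number_summary(values):
    """Return (smallest, Q1, median, Q3, largest) using the slide's steps."""
    data = sorted(values)             # Step 1: sort from lowest to highest
    n = len(data)
    half = (n + 1) // 2               # for odd n, the median belongs to both halves
    lower = data[:half]               # smallest number up to the median
    upper = data[n - half:]           # median up to the largest number
    return (
        data[0],                      # Step 3: smallest
        median(lower),                # Step 4: median of the lower half (Q1)
        median(data),                 # Step 2: median of the entire set
        median(upper),                # Step 4: median of the upper half (Q3)
        data[-1],                     # Step 3: largest
    )

numbers = [3, 12, 7, 40, 9, 14, 18, 15, 17]
summary = five_number_summary(numbers)
print(summary)   # (3, 9, 14, 17, 40)
```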
Outlier Detection
A 5 Number Summary divides your data into four quarters.
3  7  9  12  14  15  17  18  40
[Figure: the sorted numbers split into the 1st, 2nd, 3rd, and 4th Quarter]
Outlier Detection
3  7  9  12  14  15  17  18  40
The Lower Quartile (Q1) is the second number in the 5 Number Summary. 25% of all the numbers in the set are smaller than Q1.
The Upper Quartile (Q3) is the fourth number in the 5 Number Summary. 25% of all the numbers in the set are larger than Q3.
Outlier Detection
What percent of all the numbers are between Q1 and Q3?
3  7  9  12  14  15  17  18  40
50% of all the numbers are between Q1 and Q3. This is called the Inter-Quartile Range (IQR).
The size of the IQR is the distance between Q1 and Q3: 17 - 9 = 8
Outlier Detection
3  7  9  12  14  15  17  18  40
IQR = 8
To determine if a number is an outlier, multiply the IQR by 1.5: 8 x 1.5 = 12
An outlier is any number that is more than 12 below Q1 or more than 12 above Q3.
Outlier Detection
3  7  9  12  14  15  17  18  40
IQR = 8
Q1 - 12 = 9 - 12 = -3
Q3 + 12 = 17 + 12 = 29
40 is greater than 29, so 40 is an OUTLIER.
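The 1.5 x IQR rule can be sketched as follows. As in the slides' 5 Number Summary, the quartiles are taken as medians of the lower and upper halves with the overall median counted in both halves when the count is odd; that convention is an assumption inferred from the slide's Q1 = 9 and Q3 = 17. The function name `iqr_outliers` is made up for this example.

```python
from statistics import median

def iqr_outliers(values, k=1.5):
    """Return (lower limit, upper limit, outliers) under the k x IQR rule."""
    data = sorted(values)
    n = len(data)
    half = (n + 1) // 2              # include the median in both halves if n is odd
    q1 = median(data[:half])         # lower quartile
    q3 = median(data[n - half:])     # upper quartile
    iqr = q3 - q1                    # inter-quartile range
    low = q1 - k * iqr               # anything below this is an outlier
    high = q3 + k * iqr              # anything above this is an outlier
    return low, high, [x for x in data if x < low or x > high]

numbers = [3, 12, 7, 40, 9, 14, 18, 15, 17]
low, high, outliers = iqr_outliers(numbers)
print(low, high, outliers)   # -3.0 29.0 [40]
```

Only 40 lies outside the limits, so it is the single outlier in this set.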
Outlier Detection
Find the mean and median of the following set of numbers (no outliers):
3  12  7  9  14  18  15  17
With the outlier (40): Mean is 15, Median is 14
Without the outlier: Mean is 11.875, Median is 13
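The comparison can be reproduced with Python's standard `statistics` module. This is a sketch using the numbers from the slides, with 40 removed by hand as the value flagged by the 1.5 x IQR rule.

```python
from statistics import mean, median

numbers = [3, 12, 7, 40, 9, 14, 18, 15, 17]
without_outlier = [x for x in numbers if x != 40]   # drop the flagged value

mean_all, median_all = mean(numbers), median(numbers)
mean_trim, median_trim = mean(without_outlier), median(without_outlier)

print(f"with 40:    mean {mean_all}, median {median_all}")
print(f"without 40: mean {mean_trim}, median {median_trim}")
```

Removing the outlier moves the mean from 15 to 11.875 but the median only from 14 to 13, which shows that the median is much less sensitive to extreme values.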