1/16/2019
STATISTICS & PROBABILITY
Agung Nugroho, Ph.D.
Assessment:
1. Assignments (10%)
2. Lab work (15%)
3. Quizzes (15%)
4. Midterm exam, UTS (30%)
5. Final exam, UAS (30%)
Course Outline
Week 1: Introduction to statistics and probability
Week 2: Descriptive statistics
Week 3: Descriptive statistics and Excel lab
Week 4: Probability theory
Week 5: Discrete probability distributions
Week 6: Continuous probability distributions
Week 7: Sampling distributions and sampling techniques
Week 8: Midterm exam (UTS)
Week 9: Point estimation and confidence intervals
Week 10: Hypothesis testing
Week 11: Hypothesis testing 2
Week 12: Regression and correlation analysis
Week 13: Regression and correlation analysis
Week 14: Excel lab
Week 15: Introduction to statistical quality control
Week 16: Final exam (UAS)
References
1. Johnson, R. A., & Bhattacharyya, G. K., “Statistics: Principles and Methods”, 6th Edition, Wiley Global Education, 2014.
2. Montgomery, D. C., & Runger, G. C., “Applied Statistics and Probability for Engineers”, John Wiley & Sons, 2014.
3. Levine, D. M., Ramsey, P. P., & Smidt, R. K., “Applied Statistics for Engineers and Scientists: Using Microsoft Excel and Minitab”, Prentice Hall, 2001.
Introduction
(Week 1)
What Do Engineers Do?
An engineer is someone who solves problems of interest to society through the efficient application of scientific principles by:
• Refining existing products
• Designing new products or processes
The Creative Process
Figure 1-1 The engineering method
Statistics Supports the Creative Process
The field of statistics deals with the collection, presentation, analysis, and use of data to:
• Make decisions
• Solve problems
• Design products and processes
It is the science of data. For students, statistics is important for collecting, organizing, analyzing, and interpreting data during research and thesis work.
Definition
Statistics is the science of collecting, organizing, analyzing, and interpreting data in order to make decisions in the presence of uncertainty. The term also refers to a collection of facts, generally in the form of numbers arranged in a table or diagram. Example: health statistics, birth statistics, etc.
Classification
• Descriptive statistics: collecting, organizing, analyzing, and interpreting data.
  ✓ Collect ✓ Organize ✓ Summarize ✓ Display ✓ Analyze
• Inferential statistics: makes use of information from a sample to draw conclusions about the population from which the sample was taken.
  ✓ Predict and forecast values of population parameters
  ✓ Test hypotheses about values of population parameters
  ✓ Make decisions
Variability
• Statistical techniques are useful to describe and understand variability.
• By variability, we mean that successive observations of a system or phenomenon do not produce exactly the same result.
• Statistics gives us a framework for describing this variability and for learning about potential sources of variability.
An Engineering Example of Variability
Eight samples are taken from the output of a wastewater treatment plant, and their Cl concentrations are measured (in ppm): 12.6, 12.9, 13.4, 12.3, 13.6, 13.5, 12.6, 13.1. The samples do not all have the same concentration: the measurements exhibit variability. The dot diagram is a very useful plot for displaying a small body of data, say up to about 20 observations. This plot allows us to see easily two features of the data: the location, or the middle, and the scatter, or variability.
[Dot diagram: Cl concentration (ppm)]
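A dot diagram can also be sketched in plain text. The snippet below is a minimal illustration that assumes a 0.1 ppm grid (the resolution of the data). The function name `dot_diagram` is hypothetical and does not come from any library.

```python
from collections import Counter

# Cl concentrations (ppm) from the wastewater example above
cl_ppm = [12.6, 12.9, 13.4, 12.3, 13.6, 13.5, 12.6, 13.1]

def dot_diagram(values, step=0.1):
    """Text dot diagram: one column per `step`, one '*' stacked per observation."""
    lo, hi = min(values), max(values)
    # Map each value to a column index; round() absorbs floating-point drift.
    counts = Counter(round((v - lo) / step) for v in values)
    width = round((hi - lo) / step) + 1
    height = max(counts.values())
    rows = []
    for level in range(height, 0, -1):  # draw the tallest stack first
        rows.append("".join("*" if counts[col] >= level else " "
                            for col in range(width)))
    axis = f"{lo:.1f}".ljust(width - 4) + f"{hi:.1f}"
    return "\n".join(rows + ["-" * width, axis])

print(dot_diagram(cl_ppm))
```

Only 12.6 occurs twice, so it gets a second `*` stacked above it. This makes the middle of the data (near 13) and its scatter (12.3 to 13.6) visible at a glance.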
Hypothesis Tests
Hypothesis test:
• A statement about a process behavior value
• Compared to a claim about another process value
• Data is gathered to support or refute the claim
One-sample hypothesis test:
• Example: chlorine concentration (ppm) = 30 vs. chlorine concentration (ppm) < 30
Two-sample hypothesis test:
• Example: chlorine conc. at A (ppm) − chlorine conc. at B (ppm) = 0 vs. chlorine conc. at A (ppm) − chlorine conc. at B (ppm) > 0
An Experiment in Variation
W. Edwards Deming, a famous industrial statistician and contributor to the Japanese quality revolution, conducted an illustrative experiment on process over-control, or tampering. Let’s look at his apparatus and experimental procedure. Marbles were dropped through a funnel onto a target, and the location where each marble struck the target was recorded. Variation was caused by several factors: marble placement in the funnel and release dynamics, vibration, air currents, and measurement errors.
How Is the Change Detected Graphically?
The center line on the control chart is the average of the concentration measurements for the first 20 samples, taken while the process is stable: x̄ = 91.5 g/l. The upper control limit and the lower control limit are located 3 standard deviations of the concentration values above and below the center line.
Figure 1-5 A control chart for the chemical process concentration data.
The process steps out of control at hours 24 and 29: shut down and adjust the process.
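The center-line and limit calculation described above can be sketched as follows. This is a minimal illustration that assumes the limits use the sample standard deviation of the baseline measurements. The function names and all numbers in the usage example are hypothetical, not the data behind Figure 1-5.

```python
import statistics

def control_limits(baseline):
    """Return (LCL, center line, UCL) from measurements taken while the process is stable."""
    center = statistics.mean(baseline)
    sigma = statistics.stdev(baseline)  # assumption: sample standard deviation of the baseline
    return center - 3 * sigma, center, center + 3 * sigma

def out_of_control(values, lcl, ucl, first_hour=1):
    """Hour numbers (counted from first_hour) whose value falls outside [lcl, ucl]."""
    return [hour for hour, v in enumerate(values, start=first_hour)
            if v < lcl or v > ucl]

# Hypothetical data: 20 stable hours, then 4 new hours to monitor
baseline = [91.0, 92.0] * 10
new_values = [91.2, 95.0, 91.8, 88.0]

lcl, cl, ucl = control_limits(baseline)
print(f"LCL = {lcl:.2f}, CL = {cl:.2f}, UCL = {ucl:.2f}")
print("Out of control at hours:", out_of_control(new_values, lcl, ucl, first_hour=21))
```

In practice, control charts often estimate sigma differently (for example, from moving ranges). The plain standard deviation is used here only because it matches the wording of the slide.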
Mechanistic and Empirical Models
A mechanistic model is built from our underlying knowledge of the basic physical mechanism that relates several variables.
Example: Ohm’s law, current = voltage/resistance, I = E/R. Including measurement error, I = E/R + ε, where ε is a term added to the model to account for the fact that the observed values of current flow do not perfectly conform to the mechanistic model.
• The form of the function is known.
An empirical model is built from our engineering and scientific knowledge of the phenomenon, but is not directly developed from our theoretical or first-principles understanding of the underlying mechanism.
• The form of the function is not known.
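The mechanistic model with an error term can be simulated in a few lines. This sketch makes several illustrative assumptions:
• ε is normally distributed with mean 0 and a chosen standard deviation.
• The function name `observed_current` is hypothetical.
• The voltage and resistance values are made up for the example.

```python
import random

def observed_current(voltage, resistance, noise_sd, rng):
    """Mechanistic model I = E/R plus an error term epsilon ~ Normal(0, noise_sd) (assumed)."""
    return voltage / resistance + rng.gauss(0.0, noise_sd)

rng = random.Random(42)  # fixed seed so the run is reproducible
# Five repeated readings at E = 12 V, R = 4 ohm: the model predicts I = 3 A,
# but each observation scatters around that value because of epsilon.
readings = [observed_current(12.0, 4.0, 0.05, rng) for _ in range(5)]
print([round(i, 3) for i in readings])
```

Setting `noise_sd` to 0 recovers the pure mechanistic prediction I = E/R exactly.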
An Example of an Empirical Model
In a semiconductor manufacturing plant, the finished semiconductor is wirebonded to a frame. In an observational study, the variables recorded were:
• Pull strength to break the bond (y)
• Wire length (x1)
• Die height (x2)
Visualizing the Data and Resultant Model Using Regression Analysis
3D plot of the pull strength (y), wire length (x1) and die height (x2) data.
3D Plot of the predicted values (a plane) of pull strength from the empirical regression model.
DESCRIPTIVE STATISTICS
Descriptive Statistics
Descriptive statistics describe the basic features of the data in a study. They provide simple summaries about the sample and the measures. Together with simple graphical analysis, they form the basis of virtually every quantitative analysis of data.
Key Terms
• Variable: a characteristic that changes or varies over time and/or for different individuals or objects under consideration. Examples: hair color, white blood cell count, the bottom outlet of a distillation tower.
• Data: a set of measurements, which can come from either a sample or a population.
• Population: the set of all measurements of interest to the investigator; any entire collection of people, animals, plants, or things from which we may collect data.
• Sample: a subset of measurements selected from the population of interest. To make generalizations about a population, we often study a sample, which should be representative of the population.
Population and Sample: Example
Population: 150-plus million adult Americans. Sample: 1,500 interviewed.
[Figure: a population (size N) and a sample (size n) drawn from it]
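Types of Data
• Qualitative, nominal. Ex: red, black, blue, white
• Qualitative, ordinal. Ex: none, mild, moderate, severe
• Quantitative, discrete. Ex: 1 person, 3 students, 5 pets
• Quantitative, continuous. Ex: 166 cm, 63.9 kg, etc.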
Types of Measurement Scales
• Nominal scale: groups, classes, categories
  ✓ Gender, color, professional classification, etc.
• Ordinal scale: order matters
  ✓ Ranks (top ten videos, products, etc.)
• Interval scale: difference or distance matters
  ✓ Temperatures (°F, °C)
• Ratio scale: ratio matters
  ✓ Salaries, weight, volume, area, length, etc.
Percentiles and Quartiles
• Percentiles partition the data into 100 segments.
• The Pth percentile in the ordered set is the value below which lie P% (P percent) of the observations in the set.
• The position of the Pth percentile is given by (n + 1)P/100, where n is the number of observations in the set.
Example
Forbes magazine publishes an annual list of the world’s wealthiest individuals. For 2007, the net worth of the 20 richest individuals, in $ billions, is as follows:
Billions 33 26 24 21 19 20 18 18 52 56 27 22 18 49 22 20 23 32 20 18
Sorted Billions 18 18 18 18 19 20 20 20 21 22 22 23 24 26 27 32 33 49 52 56
Find the 50th, 80th and the 90th percentiles of this data set.
Example (Continued): Percentiles
To find the 50th percentile, determine the data point in position (n + 1)P/100 = (20 + 1)(50/100) = 10.5. Thus, the percentile is located at the 10.5th position. The 10th observation in the ordered set is 22, and the 11th observation is also 22. The 50th percentile lies halfway between the 10th and 11th values (which are both 22 in this case) and is thus 22.
Example (Continued)
To find the 80th percentile, determine the data point in position (n + 1)P/100 = (20 + 1)(80/100) = 16.8. Thus, the percentile is located at the 16.8th position. The 16th observation is 32, and the 17th observation is 33. The 80th percentile is a point lying 0.8 of the way from 32 to 33 and is thus 32.8.
Example (Continued)
To find the 90th percentile, determine the data point in position (n + 1)P/100 = (20 + 1)(90/100) = 18.9. Thus, the percentile is located at the 18.9th position. The 18th observation is 49, and the 19th observation is 52. The 90th percentile is a point lying 0.9 of the way from 49 to 52 and is thus 49 + 0.9(52 − 49) = 49 + 0.9(3) = 49 + 2.7 = 51.7.
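The (n + 1)P/100 position rule, with linear interpolation between neighbouring ordered values, fits in a short function. This is a sketch of the rule used on these slides, and the function name `percentile` is hypothetical. When the position falls below 1 or above n, the sketch returns the smallest or largest value. That clamping is an assumption, since the slides do not cover that case.

Other software may use different percentile conventions and return slightly different values. For example, Excel's PERCENTILE.EXC is based on the same (n + 1) rule, but PERCENTILE.INC is not.

```python
import math

def percentile(data, p):
    """P-th percentile using the (n + 1)P/100 position rule with linear interpolation."""
    xs = sorted(data)
    n = len(xs)
    pos = (n + 1) * p / 100          # 1-based position in the ordered data
    if pos <= 1:                     # assumption: clamp below the first value
        return xs[0]
    if pos >= n:                     # assumption: clamp above the last value
        return xs[-1]
    k = math.floor(pos)              # whole part: position of the lower neighbour
    frac = pos - k                   # fractional part: share of the gap to add
    lower, upper = xs[k - 1], xs[k]  # 1-based positions k and k+1 as 0-based indices
    return lower + frac * (upper - lower)

# Forbes 2007 net worth of the 20 richest individuals ($ billions)
net_worth = [33, 26, 24, 21, 19, 20, 18, 18, 52, 56,
             27, 22, 18, 49, 22, 20, 23, 32, 20, 18]

for p in (50, 80, 90):
    print(f"{p}th percentile: {percentile(net_worth, p):.1f}")
```

The printed values match the three worked examples: 22, 32.8, and 51.7.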
Quartiles – Special Percentiles
• Quartiles are the percentage points that break down the ordered data set into quarters.
• The first quartile (lower quartile, Q1) is the 25th percentile. It is the point below which lie 1/4 of the data.
• The second quartile (middle quartile, Q2) is the 50th percentile. It is the point below which lie 1/2 of the data. This is also called the median.
• The third quartile (upper quartile, Q3) is the 75th percentile. It is the point below which lie 3/4 of the data.
• The interquartile range (IQR) is the difference between the third and the first quartiles: IQR = Q3 − Q1
Example: Finding Quartiles
Billions: 33 26 24 21 19 20 18 18 52 56 27 22 18 49 22 20 23 32 20 18
Sorted billions: 18 18 18 18 19 20 20 20 21 22 22 23 24 26 27 32 33 49 52 56

Quartile | Position (n + 1)P/100 | Value
First quartile (Q1) | (20 + 1)(25/100) = 5.25 | 19 + (0.25)(1) = 19.25
Median (Q2) | (20 + 1)(50/100) = 10.5 | 22 + (0.5)(0) = 22
Third quartile (Q3) | (20 + 1)(75/100) = 15.75 | 27 + (0.75)(5) = 30.75
Your Turn! Fifty statistics students were asked how much sleep they get per school night (rounded to the nearest hour). The results (student data) were:

Amount of sleep per school night (hours) | Frequency
4 | 2
5 | 5
6 | 7
7 | 12
8 | 14
9 | 7
10 | 3

Find:
• 28th percentile =
• 3rd quartile =
• 80th percentile =
• 90th percentile =
Summary Measures:
• Measures of Central Tendency
  ✓ Mean
  ✓ Median
  ✓ Mode
• Measures of Variability
  ✓ Range
  ✓ Interquartile range
  ✓ Variance
  ✓ Standard Deviation
• Measures of Shape
  ✓ Skewness
  ✓ Kurtosis
MEASURES OF CENTER
Mean, Median, Mode
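Arithmetic Mean or Average
The mean of a set of measurements is the sum of the measurements divided by the total number of measurements. Symbol: $\bar{x}$ ("x bar").
Ungrouped data: $\bar{x} = \frac{\sum_{i} x_i}{n}$
Grouped data: $\bar{x} = \frac{\sum_{i} f_i x_i}{n}$
where n = number of measurements, $\sum_{i} x_i$ = sum of measurements, and $f_i$ = frequency of the value $x_i$ (so that $n = \sum_{i} f_i$ for grouped data).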
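Both formulas translate directly into code. Below is a minimal sketch with hypothetical function names. The grouped version assumes each distinct value $x_i$ is paired with its frequency $f_i$, so that $n = \sum f_i$.

```python
def mean(xs):
    """Ungrouped data: sum of the measurements divided by their number n."""
    return sum(xs) / len(xs)

def grouped_mean(values, freqs):
    """Grouped data: sum of f_i * x_i divided by n = sum of the frequencies."""
    n = sum(freqs)
    return sum(f * x for x, f in zip(values, freqs)) / n

# Pull-off force data (pounds) from the mean example in this section
pull_off = [12.6, 12.9, 13.4, 12.3, 13.6, 13.5, 12.6, 13.1]
print(round(mean(pull_off), 2))

# Milk purchases of 25 households as a frequency table: value x_i -> frequency f_i
quarts = [0, 1, 2, 3, 4, 5]
households = [2, 5, 9, 5, 3, 1]
print(grouped_mean(quarts, households))
```

The first call reproduces 13.0 pounds. The second reproduces 55/25 = 2.2 quarts without listing all 25 observations.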
Example: Mean
Consider 8 observations ($x_i$) of pull-off force from engine connectors, as shown in the table.

i | x_i (pounds)
1 | 12.6
2 | 12.9
3 | 13.4
4 | 12.3
5 | 13.6
6 | 13.5
7 | 12.6
8 | 13.1
Average | 13.00 (Excel: =AVERAGE($B2:$B9))

$\bar{x} = \frac{\sum_{i=1}^{8} x_i}{8} = \frac{12.6 + 12.9 + \cdots + 13.1}{8} = \frac{104}{8} = 13.0$ pounds

Figure 6-1 The sample mean is the balance point.

If we were able to enumerate the whole population, the population mean would be called μ (the Greek letter “mu”).
Median
• The median of a set of measurements is the middle measurement when the measurements are ranked from smallest to largest.
• The position of the median is 0.5(n + 1) once the measurements have been ordered.
• It is also called the second quartile or the 50th percentile.
Example
• The set: 2, 4, 9, 8, 6, 5, 3 (n = 7)
  Sort: 2, 3, 4, 5, 6, 8, 9
  Position: 0.5(n + 1) = 0.5(7 + 1) = 4th
  Median = 5
• The set: 2, 4, 9, 8, 6, 5 (n = 6)
  Sort: 2, 4, 5, 6, 8, 9
  Position: 0.5(n + 1) = 0.5(6 + 1) = 3.5th, i.e., the average of the 3rd and 4th values
  Median = (5 + 6)/2 = 5.5
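The 0.5(n + 1) position rule covers both the odd and the even case. In the sketch below, the function name `median` is hypothetical. Python's built-in `statistics.median` follows the same averaging rule for even n.

```python
def median(data):
    """Median via the position 0.5(n + 1) in the ordered data."""
    xs = sorted(data)
    pos = 0.5 * (len(xs) + 1)   # 1-based position
    k = int(pos)                # whole part of the position
    if pos == k:                # odd n: the position is a whole number
        return xs[k - 1]
    # even n: the position ends in .5, so average the k-th and (k+1)-th values
    return (xs[k - 1] + xs[k]) / 2

print(median([2, 4, 9, 8, 6, 5, 3]))  # → 5
print(median([2, 4, 9, 8, 6, 5]))     # → 5.5
```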
Mode
The mode is the value that occurs most frequently.
Example:
1. The set: 2, 4, 9, 8, 8, 5, 3. The mode is 8, which occurs twice.
2. The set: 2, 2, 9, 8, 8, 5, 3. There are two modes, 8 and 2 (bimodal).
3. The set: 2, 4, 9, 8, 5, 3. There is no mode (each value is unique).
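The three cases above (one mode, two modes, no mode) can be told apart in code. This sketch uses a hypothetical function name, `modes`, and follows the slide's convention that a set of all-unique values has no mode. Python's `statistics.multimode` behaves differently: in that case it returns every value.

```python
from collections import Counter

def modes(data):
    """All values that occur most often; an empty list when every value is unique."""
    counts = Counter(data)
    top = max(counts.values())
    if top == 1:
        return []  # no value repeats, so there is no mode
    return sorted(v for v, c in counts.items() if c == top)

print(modes([2, 4, 9, 8, 8, 5, 3]))  # → [8]
print(modes([2, 2, 9, 8, 8, 5, 3]))  # → [2, 8]  (bimodal)
print(modes([2, 4, 9, 8, 5, 3]))     # → []     (no mode)
```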
Example
The number of quarts of milk purchased by 25 households:
0 0 1 1 1 1 1 2 2 2 2 2 2 2 2 2 3 3 3 3 3 4 4 4 5
Mean? $\bar{x} = \frac{\sum x_i}{n} = \frac{55}{25} = 2.2$
Median? m = 2
Mode? (highest peak) mode = 2
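The three answers can be cross-checked with Python's standard `statistics` module (this uses `multimode`, which needs Python 3.8 or later). The module is only one convenient choice here; the course itself works in Excel.

```python
import statistics

# Quarts of milk purchased by 25 households (from the example above)
milk = [0, 0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2,
        3, 3, 3, 3, 3, 4, 4, 4, 5]

print(len(milk))                    # number of households
print(statistics.mean(milk))        # arithmetic mean
print(statistics.median(milk))      # middle (13th) ordered value
print(statistics.multimode(milk))   # list of the most frequent value(s)
```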
MEASURES OF VARIABILITY
Range, interquartile range, variance, standard deviation
Variability
Variability tells us how far scores spread out, that is, the degree to which scores deviate from the central tendency.
[Figure: two data sets, each with mean = 10]
Measures of Variability or Dispersion
• Range
  ✓ Difference between the maximum and minimum values
• Interquartile Range
  ✓ Difference between the third and first quartiles (Q3 − Q1)
• Variance
  ✓ Average of the squared deviations from the mean
• Standard Deviation
  ✓ Square root of the variance
Sample Range
If the n observations in a sample are denoted by x1, x2, …, xn, the sample range is:
$r = \max(x_i) - \min(x_i)$    (6-6)
It is the largest observation in the sample minus the smallest observation. From the pull-off force example: r = 13.6 − 12.3 = 1.30 pounds.
Note that: population range ≥ sample range
Example 1-3: Finding Range
Billions: 33 26 24 21 19 20 18 18 52 56 27 22 18 49 22 20 23 32 20 18
Sorted billions: 18 18 18 18 19 20 20 20 21 22 22 23 24 26 27 32 33 49 52 56
Ranks: 1 2 3 4 5 | first quartile | 6 7 8 9 10 | median | 11 12 13 14 15 | third quartile | 16 17 18 19 20

Range = Maximum − Minimum = 56 − 18 = 38

First quartile: (20 + 1)(25/100) = 5.25, so Q1 = 19 + (0.25)(1) = 19.25
Median: (20 + 1)(50/100) = 10.5, so Q2 = 22 + (0.5)(0) = 22
Third quartile: (20 + 1)(75/100) = 15.75, so Q3 = 27 + (0.75)(5) = 30.75

Interquartile Range = Q3 − Q1 = 30.75 − 19.25 = 11.5
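Range and IQR for this example can be computed with the same (n + 1)P/100 percentile rule used for the quartiles. In the sketch below, all function names are hypothetical. The percentile helper returns the smallest or largest value when the position falls outside 1 to n, which is an assumption the slides do not address.

```python
import math

def percentile(data, p):
    """P-th percentile via the (n + 1)P/100 position rule with linear interpolation."""
    xs = sorted(data)
    pos = (len(xs) + 1) * p / 100
    if pos <= 1:                 # assumption: clamp to the smallest value
        return xs[0]
    if pos >= len(xs):           # assumption: clamp to the largest value
        return xs[-1]
    k = math.floor(pos)
    return xs[k - 1] + (pos - k) * (xs[k] - xs[k - 1])

def data_range(data):
    """Range = maximum - minimum."""
    return max(data) - min(data)

def iqr(data):
    """Interquartile range = Q3 - Q1 (75th minus 25th percentile)."""
    return percentile(data, 75) - percentile(data, 25)

net_worth = [33, 26, 24, 21, 19, 20, 18, 18, 52, 56,
             27, 22, 18, 49, 22, 20, 23, 32, 20, 18]
print(data_range(net_worth))  # → 38
print(iqr(net_worth))         # → 11.5
```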
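Variance
Sample variance: $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
Population variance: $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$
Shortcut (computational) forms:
$s^2 = \frac{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n - 1}$
$\sigma^2 = \frac{\sum_{i=1}^{N} x_i^2 - \frac{\left(\sum_{i=1}^{N} x_i\right)^2}{N}}{N}$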
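The definitional and shortcut forms of the variance should agree. The sketch below checks this on the pull-off force data from the worked examples. The function names are hypothetical.

```python
def sample_variance(xs):
    """Definitional form: squared deviations from x-bar, summed, divided by n - 1."""
    n = len(xs)
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)

def sample_variance_shortcut(xs):
    """Shortcut form: (sum of x_i^2 - (sum of x_i)^2 / n) / (n - 1)."""
    n = len(xs)
    return (sum(x * x for x in xs) - sum(xs) ** 2 / n) / (n - 1)

def population_variance(xs):
    """Population form: squared deviations from mu, summed, divided by N."""
    big_n = len(xs)
    mu = sum(xs) / big_n
    return sum((x - mu) ** 2 for x in xs) / big_n

# Pull-off force data (pounds) used in the worked examples
pull_off = [12.6, 12.9, 13.4, 12.3, 13.6, 13.5, 12.6, 13.1]
print(round(sample_variance(pull_off), 4))           # pounds^2
print(round(sample_variance_shortcut(pull_off), 4))  # same value, different formula
print(round(population_variance(pull_off), 4))       # if these 8 were the whole population
```

The shortcut form avoids computing x̄ first. However, subtracting two large, nearly equal sums can lose precision in floating-point arithmetic, so the definitional form is usually the safer choice in code.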
Standard Deviation
The standard deviation is the square root of the variance. σ is the population standard deviation symbol; s is the sample standard deviation symbol.
Sample standard deviation: $s = \sqrt{s^2}$
Population standard deviation: $\sigma = \sqrt{\sigma^2}$
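Python's standard `statistics` module provides both variances and both standard deviations; it is used here only for illustration. `variance` and `stdev` divide by n − 1, while `pvariance` and `pstdev` divide by N. The corresponding Excel functions are VAR.S, STDEV.S, VAR.P, and STDEV.P.

```python
import math
import statistics

# Pull-off force data (pounds) used in the worked examples
pull_off = [12.6, 12.9, 13.4, 12.3, 13.6, 13.5, 12.6, 13.1]

s = math.sqrt(statistics.variance(pull_off))       # sample: s = sqrt(s^2), divisor n - 1
sigma = math.sqrt(statistics.pvariance(pull_off))  # population: sigma = sqrt(sigma^2), divisor N

# stdev() and pstdev() return the same square roots in a single call
print(round(s, 2), round(statistics.stdev(pull_off), 2))
print(round(sigma, 4), round(statistics.pstdev(pull_off), 4))
```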
Example: Sample Variance
The table below displays the quantities needed to calculate the sample variance and sample standard deviation.
Units: x_i and the mean are in pounds, the variance is in pounds², and the standard deviation is in pounds. Results are generally reported to one more decimal place than the data.

i | x_i | x_i − x̄ | (x_i − x̄)²
1 | 12.6 | −0.4 | 0.16
2 | 12.9 | −0.1 | 0.01
3 | 13.4 | 0.4 | 0.16
4 | 12.3 | −0.7 | 0.49
5 | 13.6 | 0.6 | 0.36
6 | 13.5 | 0.5 | 0.25
7 | 12.6 | −0.4 | 0.16
8 | 13.1 | 0.1 | 0.01
Sums | 104.00 | 0.0 | 1.60

x̄ = 104.00/8 = 13.00
Variance: s² = 1.60/7 = 0.2286
Standard deviation: s = 0.48
Example: Variance by Shortcut
$s^2 = \frac{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n - 1} = \frac{1{,}353.60 - \frac{(104.0)^2}{8}}{7} = \frac{1.60}{7} = 0.2286$ pounds²

i | x_i | x_i²
1 | 12.6 | 158.76
2 | 12.9 | 166.41
3 | 13.4 | 179.56
4 | 12.3 | 151.29
5 | 13.6 | 184.96
6 | 13.5 | 182.25
7 | 12.6 | 158.76
8 | 13.1 | 171.61
Sums | 104.0 | 1,353.60

$s = \sqrt{0.2286} = 0.48$ pounds
Exercise
In an experiment, one operator measures the concentration of Cl⁻ in a solution 8 times using the same instrument. She obtains the following data (ppm):
7.15, 7.20, 7.18, 7.19, 7.21, 7.20, 7.16, and 7.18
• Calculate the sample mean, mode, and median.
• Find the 28th percentile, 80th percentile, and 1st quartile.
• Calculate the variance and standard deviation.