Handout1

TIN HỌC TRONG CNTP (Computing in Food Technology)

Nguyễn Hoàng Dũng, PhD.
Trường Đại học Bách khoa Tp. HCM (Ho Chi Minh City University of Technology)

Program & Bibliography
- 3(1,2): ~5 theory (301, B2) + 10 practice (Comp. Chem. Lab, by group)
- Website: www2.hcmut.edu.vn/~dzung/ (available from Sep 15)
- R: www.r-project.org

NHDzung – Lesson 1, slide 2

Problem

Foreign and Vietnamese cheeses: quality and preference?

How to conduct a research study:
1. Sampling
2. Measurement
3. Collect data *
4. Analyze and present your results *

* Sensory practices

1-1. Samples and Populations

A population consists of the set of all measurements in which the investigator is interested.
A sample is a subset of the measurements selected from the population.
A census is a complete enumeration of every item in a population.

Simple Random Sample

Sampling from the population is often done randomly, such that every possible sample of equal size (n) has an equal chance of being selected. A sample selected in this way is called a simple random sample, or just a random sample.
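A minimal Python sketch of drawing a simple random sample. The population here (100 numbered cheese packages), the sample size, and the seed are hypothetical values chosen for illustration; they are not taken from the handout.

```python
import random

# Hypothetical population: ID numbers of N = 100 cheese packages.
population = list(range(1, 101))
n = 10  # hypothetical sample size

# A fixed seed only makes the draw reproducible; any seed works.
rng = random.Random(2019)

# random.sample draws n distinct items without replacement, so every
# subset of size n has the same chance of being selected.
sample = rng.sample(population, n)

print("N =", len(population), "n =", len(sample))
print("sample:", sorted(sample))
```

Sampling without replacement is what keeps each unit from appearing twice in the sample.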


Samples and Populations

[Figure: Population (N) and Sample (n)]

Measurements

• The assigning of numbers to the values of a variable (S.S. Stevens, Science 1946;103:677-80)
• Rules specify the procedures used to assign numbers to values

The criteria of “science”

Science                            Pseudoscience
Logic, experimental evidence       Belief, loyalty
Results are repeatable             Results are not repeatable
Falsifiability*                    Not falsifiable
Peer-reviewed journals             Not in peer-reviewed journals
Evolution / learn from mistakes    Constant, unchanged belief

* Capable of being tested (verified or falsified) by experiment or observation

Criteria of measurements

• Validity: measures what it purports to measure
• Reliability (consistency and repeatability)
• Sensitivity to important variation

Accuracy vs reliability (precision)

Accuracy: the degree of “truthfulness” of an attribute that is being measured.
Measurement error decreases the accuracy of measurement.

[Figure: precision vs accuracy]

Some important concepts: Data - Variables - Scales

Qualitative (categorical, frequency, or nominal). Examples are:
• Color
• Gender
• Nationality

Quantitative (measurable or countable). Examples are:
• Temperatures
• Humidity
• Gross compounds
• Preference points scored on a 100-point scale

GENERAL INFORMATION

1.1 Description of the respondent

1.1.1 Gender of the respondent?
  1. Male    2. Female
Marital status:
  1. Single    2. Married

1.1.2 Age of the respondent?
  Under 25 / 25 – 30 / 31 – 54 / > 55 years

1.1.3 Please state your current occupation.
  Pupil or student / Doctor or teacher / Worker, hired labourer, or salesperson / Retired

1.1.4 Which of the following levels describes your household income?
  1. Low (≥ 2 million and < 5 million VND)
  2. Medium (≥ 5 million and < 8 million VND)
  3. High (≥ 8 million VND)

Some important concepts: Data - Variables - Scales

Variable
• Discrete variables
• Continuous variables
• Independent variables
• Dependent variables

Measurement scales
• Nominal scales (labels)
• Ordinal scales (ranks in the army)
• Interval scales (Celsius, Fahrenheit)
• Ratio scales (true zero point, ratios)

• 8 cheeses (EdamF, EdamH, GoudaH, m1, m2, m3, m4, m5)
• 11 assessors (experts)
• 3 replicates
• 15 descriptive terms: sour, bitterness, umami, salty, greasiness, butter_odor, milk_odor, acrid, rancid, lactic, cheese_flavor, acetic, full flavor, yellow, hard
• Unstructured line scale from 0 to 100 mm

Types of measurement

Qualitative (định chất)
• Nominal
• Ordinal

Quantitative (định lượng)
• Interval
• Ratio

Qualitative measurements

Nominal level
• Classification
• A set of objects can be classified into exhaustive, mutually exclusive categories, each with a unique symbol
• Ex: religion, sex, location, etc.

Ordinal level
• Classification + Ordering
• A set of objects can be assigned rank values and nothing more
• Ex: socio-economic status, education, levels of satisfaction, etc.

Quantitative measurements

Interval level
• Classification + Ordering + Standard distance
• A set of objects can be described by units that indicate how far one case is from another case
• Ex: temperature

Ratio level
• Classification + Ordering + Standard distance + Natural zero
• Quantitative variable with natural zero
• Ex: income, age, weight, bone mineral density
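To illustrate why a natural zero matters, here is a small Python sketch comparing a ratio taken on an interval scale (temperature) with one taken on a ratio scale (weight). The temperatures, weights, and function names are hypothetical and chosen only for illustration.

```python
def celsius_to_fahrenheit(c):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return c * 9 / 5 + 32

def kg_to_pounds(kg):
    """Convert kilograms to pounds (approximate factor)."""
    return kg * 2.20462

# Interval scale: the zero point is arbitrary, so ratios depend on the unit.
t1_c, t2_c = 10.0, 20.0
ratio_c = t2_c / t1_c                                                    # 2.0
ratio_f = celsius_to_fahrenheit(t2_c) / celsius_to_fahrenheit(t1_c)      # 68 / 50

# Ratio scale: zero means "none", so ratios survive a change of unit.
w1_kg, w2_kg = 10.0, 20.0
ratio_kg = w2_kg / w1_kg
ratio_lb = kg_to_pounds(w2_kg) / kg_to_pounds(w1_kg)

print("temperature ratio in Celsius:   ", ratio_c)
print("temperature ratio in Fahrenheit:", ratio_f)
print("weight ratio in kg:             ", ratio_kg)
print("weight ratio in pounds:         ", ratio_lb)
```

Saying that 20 °C is "twice as hot" as 10 °C is therefore not meaningful, whereas 20 kg really is twice 10 kg.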


1.2.2. Which hard cheese do you usually use?
  Cheddar / Gouda / Edam / Emmental / Other (please specify) ……………………..

1.2.4. Please indicate your overall liking for semi-hard cheese products.
  1  2  3  4  5  6  7  8  9

1.2.5. Please indicate how often you use semi-hard cheese products.
  > 3 times/week / 1 – 2 times/week / 1 – 3 times/month

1.2.6. Please indicate the amount of semi-hard cheese you use per week.
  < 100 g / 100 – 300 g / > 300 g

1.2.7. In your opinion, which products is hard cheese eaten with?
  Bread / Sandwich / Salad / Biscuits / Wine / Other (please specify) ………………………………

1.2.8. When choosing a hard cheese product to buy, please indicate how much you care about each of the following factors (1 = do not care at all, 2 = do not care, 3 = no opinion, 4 = care, 5 = care very much).

  Price                                1  2  3  4  5
  Sensory properties of the product    1  2  3  4  5
  Familiarity                          1  2  3  4  5
  Convenience of use                   1  2  3  4  5
  Health benefits                      1  2  3  4  5
  Product weight                       1  2  3  4  5


judge  session  product  sour  bitterness  umami  salty
S1     1        m1         50          18      0     40
S2     1        m1        100          65     40    100
S3     1        m1         32          11     35      4
S4     1        m1         30          10     25      1
S5     1        m1         60          23     30     29
S6     1        m1         30          35     25     50
S7     1        m1         50          32     45     64
S8     1        m1         32          23     40     40
S9     1        m1         78          27     45     21
S10    1        m1         55          30     34     18
S11    1        m1         62          21     43     32

Summary Measures

Population parameters and sample statistics

Measures of Central Tendency
• Median
• Mode
• Mean

Measures of Variability
• Range
• Variance
• Standard Deviation
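As an illustration, the summary measures listed above can be computed for the sour scores that the 11 judges gave product m1 in session 1 (values copied from the data table). This is a sketch using Python's standard `statistics` module; the variable names are chosen here, and the handout itself points to R rather than Python.

```python
import statistics

# Sour scores for product m1, session 1, judges S1..S11 (from the data table).
sour = [50, 100, 32, 30, 60, 30, 50, 32, 78, 55, 62]

# Measures of central tendency
mean_sour = statistics.mean(sour)
median_sour = statistics.median(sour)
modes_sour = statistics.multimode(sour)   # all most-frequent values

# Measures of variability
range_sour = max(sour) - min(sour)
var_sour = statistics.variance(sour)      # sample variance (denominator n - 1)
sd_sour = statistics.stdev(sour)          # sample standard deviation

print("mean    :", round(mean_sour, 2))
print("median  :", median_sour)
print("mode(s) :", sorted(modes_sour))
print("range   :", range_sour)
print("variance:", round(var_sour, 2))
print("sd      :", round(sd_sour, 2))
```

Several scores are tied for most frequent here, which is why `multimode` is used instead of a single mode.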

1-3. Measures of Central Tendency or Location

• Median
  - Middle value when sorted in order of magnitude
  - 50th percentile
• Mode
  - Most frequently occurring value
• Mean
  - Average

Other summary measures:
– Skewness
– Kurtosis

Arithmetic Mean or Average

The mean of a set of observations is their average: the sum of the observed values divided by the number of observations.

Population mean:
$$\mu = \frac{\sum_{i=1}^{N} x_i}{N}$$

Sample mean:
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$

Arithmetic Mean or Average

Affected by outliers.

[Figure: two dot plots, on axes from 0 to 10 and from 0 to 14. Mean = 5 in the first, Mean = 6 in the second.]

Median

Robust measure of central tendency; not affected by outliers.

[Figure: the same two dot plots. Median = 5 in both.]
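The effect of an outlier on the mean and the median can be checked with a short Python sketch. The two data sets below are hypothetical (the values plotted on the slides are not recoverable); they were chosen so that the mean moves from 5 to 6 while the median stays at 5, as in the figures.

```python
import statistics

# Hypothetical scores; the second set replaces the largest value with an outlier.
scores = [3, 4, 5, 6, 7]
scores_with_outlier = [3, 4, 5, 6, 12]

mean_before = statistics.mean(scores)
mean_after = statistics.mean(scores_with_outlier)
median_before = statistics.median(scores)
median_after = statistics.median(scores_with_outlier)

print("mean  :", mean_before, "->", mean_after)      # the mean is pulled upward
print("median:", median_before, "->", median_after)  # the median does not move
```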

Mode

[Figure: two dot plots. On the axis from 0 to 14, Mode = 9; on the axis from 0 to 6, there is no mode.]

Measures of Central Tendency or Location

Mean (n is the sample size):
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{x_1 + x_2 + \dots + x_n}{n}$$

When the data take k distinct values x_i, each observed n_i times:
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{k} n_i x_i = \frac{n_1 x_1 + n_2 x_2 + \dots + n_k x_k}{n}$$

Median (x_{(j)} is the j-th value once the data are sorted in increasing order):
$$\operatorname{med}(x) = \begin{cases} x_{(p+1)} & \text{if } n = 2p + 1 \\ \dfrac{x_{(p)} + x_{(p+1)}}{2} & \text{if } n = 2p \end{cases}$$

Mean or Median?

• Outliers: median
• Many ties (« ex aequo », discrete variable): mean

Quartiles

The values at the 25th, 50th, and 75th percentiles of a frequency distribution, which divide it into four parts, each containing a quarter of the population.

[Figure: a distribution split into four 25% parts by Q1, Q2, and Q3]

Position of the i-th quartile:
$$\text{Position of } Q_i = \frac{i\,(n+1)}{4}$$

Example, with data sorted in increasing order: 11 12 13 16 16 17 18 21 22 (n = 9)
$$\text{Position of } Q_1 = \frac{1\,(9+1)}{4} = 2.5, \qquad Q_1 = \frac{12 + 13}{2} = 12.5$$
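The location formulas above can be sketched in Python as follows. The function names are chosen here for illustration. Linear interpolation between neighbouring values for a fractional quartile position is an assumption: the handout only shows the case where the position ends in .5 and the two neighbours are averaged.

```python
import math

def grouped_mean(values, counts):
    """Mean from k distinct values x_i with frequencies n_i."""
    n = sum(counts)
    return sum(n_i * x_i for x_i, n_i in zip(values, counts)) / n

def median_by_rank(data):
    """Median from order statistics: x_(p+1) if n = 2p + 1, else the mean of x_(p) and x_(p+1)."""
    x = sorted(data)
    n = len(x)
    p = n // 2
    if n % 2 == 1:                 # n = 2p + 1
        return x[p]                # x_(p+1) in 1-based numbering
    return (x[p - 1] + x[p]) / 2   # n = 2p

def quartile(data, i):
    """i-th quartile at position i(n + 1)/4, interpolating linearly (assumption)."""
    x = sorted(data)
    n = len(x)
    pos = min(max(i * (n + 1) / 4, 1), n)   # keep the position inside 1..n
    lower = math.floor(pos)
    frac = pos - lower
    upper = min(lower + 1, n)
    return x[lower - 1] + frac * (x[upper - 1] - x[lower - 1])

data = [11, 12, 13, 16, 16, 17, 18, 21, 22]   # the quartile example above

q1 = quartile(data, 1)
q2 = quartile(data, 2)
q3 = quartile(data, 3)
med = median_by_rank(data)
gmean = grouped_mean([11, 12, 13, 16, 17, 18, 21, 22], [1, 1, 1, 2, 1, 1, 1, 1])

print("Q1 =", q1, " Q2 =", q2, " Q3 =", q3)
print("median =", med)
print("grouped mean =", round(gmean, 3))
```

The second quartile and the median coincide, as expected from the definition of the 50th percentile.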

1-4. Measures of Variability or Dispersion

Range
• Difference between maximum and minimum values
Variance
• Mean* squared deviation from the mean
Standard Deviation
• Square root of the variance

* Definitions of population variance and sample variance differ slightly.

Dispersion

Range:
$$\operatorname{Range}(x) = x_{(n)} - x_{(1)}$$

[Figure: two dot plots on the axis from 7 to 12, both with Range = 12 - 7 = 5]

Interquartile range:
$$q_{0.75} - q_{0.25}$$
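A short Python sketch of the range and the interquartile range, reusing the nine sorted values from the quartile example (11 12 13 16 16 17 18 21 22). It assumes Python 3.8 or later for `statistics.quantiles`; the `method="exclusive"` option places the quartiles at positions i(n + 1)/4, which matches the position rule given for quartiles.

```python
import statistics

data = [11, 12, 13, 16, 16, 17, 18, 21, 22]

# Range: largest value minus smallest value.
data_range = max(data) - min(data)

# Quartiles at positions i(n + 1)/4, then the interquartile range.
q1, q2, q3 = statistics.quantiles(data, n=4, method="exclusive")
iqr = q3 - q1

print("range =", data_range)
print("q0.25 =", q1, " q0.75 =", q3)
print("interquartile range =", iqr)
```

The range depends only on the two extreme values, whereas the interquartile range ignores the outer quarters of the data.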

Mean (average)

Given a series of values x_i (i = 1, …, n): x_1, x_2, …, x_n, the mean is:
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

Study 1: the color scores of 6 consumers are 6, 7, 8, 4, 5, and 6. The mean is:
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{6 + 7 + 8 + 4 + 5 + 6}{6} = \frac{36}{6} = 6$$

Study 2: the color scores of 4 consumers are 10, 2, 3, and 9. The mean is:
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{10 + 2 + 3 + 9}{4} = \frac{24}{4} = 6$$

Variation

The mean alone does not adequately describe the data: both studies have a mean of 6, yet their scores are spread very differently. We need to know the variation in the data. An obvious measure is the sum of the differences from the mean.

For study 1, with scores 6, 7, 8, 4, 5, and 6, we have:
(6-6) + (7-6) + (8-6) + (4-6) + (5-6) + (6-6) = 0 + 1 + 2 - 2 - 1 + 0 = 0

NOT SATISFACTORY! The positive and negative differences cancel, so this sum is zero for any data set.
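The arithmetic for the two studies can be verified with a few lines of Python. The function names are chosen here for illustration.

```python
study1 = [6, 7, 8, 4, 5, 6]   # color scores of 6 consumers
study2 = [10, 2, 3, 9]        # color scores of 4 consumers

def mean(values):
    """Sum of the observed values divided by the number of observations."""
    return sum(values) / len(values)

def sum_of_deviations(values):
    """Sum of the differences between each value and the mean."""
    m = mean(values)
    return sum(x - m for x in values)

for name, scores in (("study 1", study1), ("study 2", study2)):
    print(name, "mean =", mean(scores),
          "sum of deviations =", sum_of_deviations(scores))
```

Both studies give the same mean and a zero sum of deviations, so neither number distinguishes the tightly clustered scores from the widely spread ones.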

Sum of squares

We need to make the differences positive by squaring them. The result is called the “sum of squares” (SS).

For study 1 (6, 7, 8, 4, 5, 6), we have:
$$SS = (6-6)^2 + (7-6)^2 + (8-6)^2 + (4-6)^2 + (5-6)^2 + (6-6)^2 = 10$$

For study 2 (10, 2, 3, 9), we have:
$$SS = (10-6)^2 + (2-6)^2 + (3-6)^2 + (9-6)^2 = 50$$

This is better! But it does not take into account the sample size n.

Variance

We have to divide the SS by the sample size n. But every squared term uses the mean, which is itself calculated from the data, so we lose 1 degree of freedom. Therefore the correct denominator is n - 1. The result is called the variance (denoted by s²):
$$s^2 = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \dots + (x_n - \bar{x})^2}{n - 1}$$

Or, in sum notation:
$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$$

1-5. Variance and Standard Deviation

Population variance:
$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N} = \frac{\sum_{i=1}^{N} x_i^2 - \frac{\left(\sum_{i=1}^{N} x_i\right)^2}{N}}{N}, \qquad \sigma = \sqrt{\sigma^2}$$

Sample variance:
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n - 1}, \qquad s = \sqrt{s^2}$$

Variance - example

For study 1 (6, 7, 8, 4, 5, and 6), the variance is:
$$s^2 = \frac{(6-6)^2 + (7-6)^2 + (8-6)^2 + (4-6)^2 + (5-6)^2 + (6-6)^2}{6 - 1} = \frac{10}{5} = 2$$

For study 2 (10, 2, 3, 9), the variance is:
$$s^2 = \frac{(10-6)^2 + (2-6)^2 + (3-6)^2 + (9-6)^2}{4 - 1} = \frac{50}{3} = 16.7$$

The scores in study 2 were much more variable than those in study 1.
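The sum of squares and the sample variance for both studies can be reproduced in Python. The sketch below also checks that the shortcut form of the sample variance (sum of squared values minus the squared total over n, all divided by n - 1) gives the same result as the definition. The function names are chosen here for illustration.

```python
import math

study1 = [6, 7, 8, 4, 5, 6]
study2 = [10, 2, 3, 9]

def sum_of_squares(values):
    """SS: sum of squared deviations from the mean."""
    m = sum(values) / len(values)
    return sum((x - m) ** 2 for x in values)

def sample_variance(values):
    """s^2 = SS / (n - 1)."""
    return sum_of_squares(values) / (len(values) - 1)

def sample_variance_shortcut(values):
    """s^2 = (sum of x^2 - (sum of x)^2 / n) / (n - 1)."""
    n = len(values)
    total = sum(values)
    total_sq = sum(x * x for x in values)
    return (total_sq - total ** 2 / n) / (n - 1)

ss1, ss2 = sum_of_squares(study1), sum_of_squares(study2)
var1, var2 = sample_variance(study1), sample_variance(study2)
sd1, sd2 = math.sqrt(var1), math.sqrt(var2)   # square root of the variance

print("study 1: SS =", ss1, " s^2 =", var1, " s =", round(sd1, 2))
print("study 2: SS =", ss2, " s^2 =", round(var2, 1), " s =", round(sd2, 1))
```

The shortcut form needs only one pass over the data, which is why it appears alongside the definition, although with very large values it can lose precision in floating-point arithmetic.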

Standard deviation

The problem with variance is that it is expressed in squared units, whereas the mean is in the actual unit. We need a way to convert variance back to the actual unit of measurement.

We take the square root of the variance; this is called the “standard deviation” (denoted by s).

For study 1, s = sqrt(2) = 1.41
For study 2, s = sqrt(16.7) = 4.1

Standard Deviation

[Figure: three dot plots on the axis from 11 to 21, all with the same mean but different spread]

Data A: Mean = 15.5, s = 3.338
Data B: Mean = 15.5, s = 0.9258
Data C: Mean = 15.5, s = 4.57

1-6. Form indicators: Skewness & Kurtosis

Skewness
• Measure of asymmetry of a frequency distribution
  - Skewed to left
  - Symmetric or unskewed
  - Skewed to right

Kurtosis
• Measure of flatness or peakedness of a frequency distribution
  - Platykurtic (relatively flat)
  - Mesokurtic (normal)
  - Leptokurtic (relatively peaked)

Skewness: skewed to left

Mean < median < mode

[Figure: frequency histogram of a left-skewed distribution; x axis from 100 to 600, frequency axis from 0 to 30]
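A sketch of how skewness and kurtosis might be computed in Python. The handout gives no formulas for these indicators, so the moment-based definitions used here (third and fourth central moments scaled by powers of the second) are an assumption; statistical packages may apply slightly different sample corrections. The data sets and function names are hypothetical.

```python
import statistics

def central_moment(values, k):
    """k-th central moment, dividing by n."""
    m = sum(values) / len(values)
    return sum((x - m) ** k for x in values) / len(values)

def skewness(values):
    """Moment-based skewness: negative = skewed to left, positive = skewed to right."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def excess_kurtosis(values):
    """Moment-based kurtosis minus 3: negative = flatter than normal, positive = more peaked."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3

symmetric = [1, 2, 3, 4, 5]                 # hypothetical, evenly spread
left_skewed = [1, 6, 7, 8, 8, 9, 9, 9]      # hypothetical, long tail to the left

print("symmetric   skewness:", skewness(symmetric))
print("symmetric   excess kurtosis:", excess_kurtosis(symmetric))
print("left-skewed skewness:", skewness(left_skewed))
print("left-skewed mean/median/mode:",
      statistics.mean(left_skewed),
      statistics.median(left_skewed),
      statistics.mode(left_skewed))
```

For the left-skewed data the mean falls below the median, which falls below the mode, in line with the ordering stated for a distribution skewed to the left.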

Kurtosis

Platykurtic: flat distribution
Mesokurtic: not too flat and not too peaked

[Figures: frequency histograms of a platykurtic and a mesokurtic distribution, frequency plotted against X]

Quantitative variable: diagram

[Figure: histogram of a quantitative variable]

If we want to see more detail: the 21 observations between 1.65 m and 1.70 m split into 8 in [1.65 ; 1.675] and 13 in [1.675 ; 1.70].

Quantitative variable: boxplot (box-and-whisker plot, « boîte à moustaches »)

From top to bottom, the plot marks:
• Largest value below q0.75 + 1.5 (q0.75 - q0.25)
• q0.75
• Median
• q0.25
• Smallest value above q0.25 - 1.5 (q0.75 - q0.25)

[Figure: boxplot with individual points marked x]
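The five boxplot markers can be computed directly from the quartiles. The sketch below is in Python and uses hypothetical data (nine moderate values plus one extreme value of 40). It assumes Python 3.8 or later for `statistics.quantiles`, and the choice of `method="exclusive"` for the quartiles is an assumption; plotting software may use a different quartile rule.

```python
import statistics

def boxplot_summary(data):
    """Quartiles, whisker ends, and points beyond the whiskers."""
    x = sorted(data)
    q1, median, q3 = statistics.quantiles(x, n=4, method="exclusive")
    iqr = q3 - q1
    lower_limit = q1 - 1.5 * iqr
    upper_limit = q3 + 1.5 * iqr
    inside = [v for v in x if lower_limit <= v <= upper_limit]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "lower_whisker": min(inside),   # smallest value above the lower limit
        "upper_whisker": max(inside),   # largest value below the upper limit
        "outside": [v for v in x if v < lower_limit or v > upper_limit],
    }

# Hypothetical data with one extreme value.
data = [11, 12, 13, 16, 16, 17, 18, 21, 22, 40]
summary = boxplot_summary(data)

for key, value in summary.items():
    print(key, "=", value)
```

The whiskers stop at actual observations rather than at the computed limits, so the extreme value is drawn as a separate point instead of stretching the whisker.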

Form indicators

[Figure: three distributions with the quartiles Q1, Q2, and Q3 marked: asymmetric with γ1 < 0, symmetric, and asymmetric with γ1 > 0]

Principles of a good figure

• Present complex results clearly, accurately, and efficiently
• Convey many ideas in the most efficient way
• Do not lie!

A BAD figure

[Figure: chart titled "Digestion interactions of coral"; vertical axis "Freq." from 0 to 120; wins and losses for Acroporidae, Porites (M), Mussidae, Alcyonaceans, Porites (B), Algae, Faviidae, and Sponges]

A GOOD figure

[Figure 3. Digestion interactions for coral taxa sampled at Pioneer Bay, Orpheus Island. Vertical axis "Frequency" from 0 to 60; horizontal axis "Taxon"; wins and losses for the same eight taxa]
