Mathematical Sciences Foundation

www.mathscifound.org
Copyright © Mathematical Sciences Foundation


Descriptive Statistics

Descriptive statistics comprises statistical methods for the collection, presentation, and characterization of a set of data in order to describe its various features. In general, methods of descriptive statistics include graphic methods and numerical measures. Bar charts, line graphs, etc. comprise the graphic methods, whereas numerical measures include measures of central tendency, dispersion, and position.

Measures of Central Tendency / Statistical Averages

Averages condense the information contained in a data set into a single number.
• This number is helpful in taking an overview of statistical data.
• This number is helpful in making comparisons between two or more data sets.

Characteristics of a Good Average
• It should be easy to calculate.
• It should be easy to comprehend.
• It should not be affected too much by fluctuations of the sample.

Statistics
Statistics is the science of collecting, describing and interpreting data.

Measures of Central Tendency
• Mean
• Median
• Mode

Mean (Arithmetic Mean)
It is the sum of observations divided by the total number of observations. Mathematically,

$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n}$$

Arithmetic mean is affected by extreme values. Consider a situation where two samples differ in only one value.

Samples differing in one value

                   Data Set 1    Data Set 2
                   6             6
                   10            10
                   5             38
                   7             7
                   4             4
                   8             8
Arithmetic Mean    6.667         12.167

If we delete the extreme value 38 from Data Set 2, then the new arithmetic mean is 7.000.

Summary:

Data Set                                         Arithmetic Mean
Data Set 1                                       6.667
Data Set 2                                       12.167
Data Set 2 after deleting the extreme value
(6, 10, 7, 4, 8)                                 7.000

This gives motivation for another measure called “Trimmed Mean”.
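The effect of a single extreme value on the mean can be checked with a short Python sketch. The two lists are the data sets from the slides; the variable names are only illustrative.

```python
from statistics import mean

data_set_1 = [6, 10, 5, 7, 4, 8]
data_set_2 = [6, 10, 38, 7, 4, 8]  # differs from data_set_1 only in the third value

mean_1 = mean(data_set_1)
mean_2 = mean(data_set_2)

# Drop the single extreme value 38 and recompute the mean.
without_extreme = [x for x in data_set_2 if x != 38]
mean_without_extreme = mean(without_extreme)

print(round(mean_1, 3), round(mean_2, 3), mean_without_extreme)
```

One changed observation nearly doubles the mean, and removing it brings the mean back to 7.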

Trimmed Mean
It is the mean taken by excluding a percentage of data points from the top and bottom tails of a data set.
Note: Trimmed Mean should be calculated when one wishes to exclude outlying data from the analysis.

Median
It is the value of the data that occupies the middle position when the data is arranged in increasing or decreasing order.

Median
Consider a data set of size n. Arrange the data in increasing or decreasing order. The median is calculated in the following way:
• If n is odd: the median is the $\left(\frac{n+1}{2}\right)$th term.
• If n is even: the median is the mean of the $\left(\frac{n}{2}\right)$th and $\left(\frac{n}{2}+1\right)$th terms.

For example, suppose we need to find the median of the data set 20, 13, 16, 17, 11, 19, 12, 18.
Ranked data: 11, 12, 13, 16, 17, 18, 19, 20
No. of terms in data = 8
Median = mean of the $\left(\frac{8}{2}\right)$th and $\left(\frac{8}{2}+1\right)$th terms $= \frac{16 + 17}{2} = 16.5$
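The odd/even rule for the median can be sketched in Python as follows. The function name `median_by_rank` is hypothetical; the only adjustment to the rule is the shift from 1-based term numbers to Python's 0-based list indices.

```python
def median_by_rank(values):
    """Median via the middle-position rule: sort, then pick the middle term(s)."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    if n % 2 == 1:
        # n odd: the ((n + 1) / 2)-th term; subtract 1 for 0-based indexing.
        return ordered[(n + 1) // 2 - 1]
    # n even: mean of the (n / 2)-th and (n / 2 + 1)-th terms.
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2


data = [20, 13, 16, 17, 11, 19, 12, 18]
print(median_by_rank(data))  # 16.5
```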

Note: The median is the number in the middle of an ordered set of numbers (observations); that is, half the numbers have values that are greater than the median, and half have values that are less.

Median is not affected by extreme values.
Example: Consider two sets of data
Data set 1: 6, 7, 8, 9, 9, 10
Data set 2: 6, 7, 8, 9, 9, 1100

In both cases the median is 8.5.

Mode
It is the value which occurs most frequently in a set of observations.

Characteristic of Mode
• Mode is not affected by extreme values.

Limitation of Mode
• Sometimes the mode may not be a true representative of the central value of a data set. For example, in the data set 2, 3, 4, 5, 6, 10, 10 the mode is 10, which is the largest value rather than a central one.

Comparison: Mean, Median and Mode
The mean and median of a data set are unique, whereas a data set can have more than one mode.
Example: Consider the data set 1, 1, 1, 2, 2, 2, 3, 4, 5. The mean is 2.333, the median is 2, but there are two modes, namely 1 and 2.

Comparison: Mean, Median and Mode
Consider the following data: 100, 100, 100, 421, 422, 423, 424, 425.
Mean = 301.875
Median = 421.5
Mode = 100
In such data the median is the best measure of central tendency among the three measures.
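Both comparison examples can be reproduced with Python's standard `statistics` module. This is a minimal sketch; `multimode` (available from Python 3.8) returns every most-frequent value, which makes the two-mode case visible. The list names are illustrative.

```python
from statistics import mean, median, multimode

two_modes = [1, 1, 1, 2, 2, 2, 3, 4, 5]
skewed = [100, 100, 100, 421, 422, 423, 424, 425]

for name, data in (("two_modes", two_modes), ("skewed", skewed)):
    # mean and median are single numbers; multimode returns a list of modes
    print(name, round(mean(data), 3), median(data), multimode(data))
```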

Averages in Open Office Calc


What is Open Office Calc?
A powerful spreadsheet program that
• performs numerical computations
• can organize/summarize huge data sets
• carries out advanced statistical and financial analysis by solving complicated mathematical models

How to access Open Office Calc

Applications > Office > Openoffice.org Spreadsheet

A First Look at Open Office Calc

[Figure: the Calc window, with the Name Box and the Input Line labelled]

Points to note
• Rows are numbered as 1, 2, 3, …
• Columns are marked as A, B, C, …
• The Name Box always displays the currently selected cell.

Open Office Calc as a desk calculator
You can perform simple operations like addition, multiplication and division. Simply select a cell and enter the expression in the Input Line. Remember to begin the expression with an equals sign (=). To compute 5+7 you have to enter =5+7

Various Arithmetic Operations

Operation         Symbol
Addition          +
Subtraction       -
Multiplication    *
Division          /
Raise to power    ^

Averages using Calc functions


AVERAGE
Calculates the arithmetic mean of numeric arguments.
Syntax: AVERAGE(number1, number2, ...)
number1, number2, ... are numeric arguments for which you want the average.
Example: Refer to the worksheet “Averages”

Remarks: AVERAGE
• The arguments must either be numbers, arrays or references that contain numbers.
• If an array or reference argument contains text or empty cells, those values are ignored; however, cells with the value zero are included.
• Arguments that contain TRUE evaluate as 1; arguments that contain FALSE evaluate as 0 (zero).

AVERAGEA
Calculates the arithmetic mean of the values in the list of arguments. In addition to numbers, text and logical values such as TRUE and FALSE are also included in the calculation.
Syntax: AVERAGEA(value1, value2, ...)
value1, value2, ... are arguments for which you want the average.
Example: Refer to the worksheet “AverageA”

Remarks: AVERAGEA
• The arguments must be numbers, arrays or references.
• Array or reference arguments that contain text evaluate as 0 (zero). If the calculation should not include text values in the average, use the AVERAGE function.
• Arguments that contain TRUE evaluate as 1; arguments that contain FALSE evaluate as 0 (zero).
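The difference between the two sets of remarks can be illustrated with a rough Python model of a referenced cell range. This is not Calc's implementation: the helpers `average_like` and `averagea_like` are hypothetical, `None` stands for an empty cell, and two choices are assumptions the remarks do not cover (logical values are skipped in `average_like`, and empty cells are skipped in `averagea_like`).

```python
def average_like(cells):
    """Mean of the numeric cells only; text and empty cells (None) are ignored."""
    numbers = [c for c in cells
               # assumption: logical values inside a range are skipped here
               if isinstance(c, (int, float)) and not isinstance(c, bool)]
    return sum(numbers) / len(numbers)


def averagea_like(cells):
    """Mean where text counts as 0 and TRUE/FALSE count as 1/0."""
    values = []
    for c in cells:
        if c is None:
            continue                 # assumption: empty cells are not counted
        if isinstance(c, str):
            values.append(0.0)       # text evaluates as 0
        else:
            values.append(float(c))  # True -> 1.0, False -> 0.0, numbers unchanged
    return sum(values) / len(values)


cells = [10, 20, "n/a", 0]
print(average_like(cells))   # (10 + 20 + 0) / 3 = 10.0
print(averagea_like(cells))  # (10 + 20 + 0 + 0) / 4 = 7.5
```

The text cell lowers the AVERAGEA-style result because it is counted as a zero, while the AVERAGE-style result leaves it out of both the sum and the count.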

TRIMMEAN
Returns the mean of the interior of a data set.
Syntax: TRIMMEAN(array, alpha)
array is the array or range of values to trim and average.
alpha is the fraction of data points to exclude from the calculation. For example, if alpha = 0.2, 4 points are trimmed from a data set of 20 points (2 from the top and 2 from the bottom of the set).
Example: Refer to the worksheet “Trimmean”

Remarks: TRIMMEAN
• If alpha < 0 or alpha > 1, TRIMMEAN returns an error value.
• TRIMMEAN rounds the number of excluded data points down to the nearest multiple of 2. If alpha = 0.1, 10 percent of 30 data points equals 3 points. For symmetry, TRIMMEAN excludes a single value from the top and from the bottom of the data set.
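The trimming rule in the remarks can be sketched in Python. This is an illustration of the stated rule, not Calc's actual code: the function name `trimmed_mean` is hypothetical, and the rule is read as "drop floor(n × alpha / 2) points from each tail".

```python
import math


def trimmed_mean(values, alpha):
    """Mean after dropping a fraction alpha of the points, split over both tails."""
    if not 0 <= alpha <= 1:
        raise ValueError("alpha must lie between 0 and 1")
    ordered = sorted(values)
    n = len(ordered)
    # Points dropped from EACH tail; the total dropped is therefore
    # n * alpha rounded down to a multiple of 2.
    k = math.floor(n * alpha / 2)
    kept = ordered[k:n - k]
    if not kept:
        raise ValueError("alpha trims away every data point")
    return sum(kept) / len(kept)


# Data Set 2 from the earlier slide: one point is dropped from each tail
# (4 and 38), leaving 6, 7, 8, 10.
print(trimmed_mean([6, 10, 38, 7, 4, 8], 0.4))  # 7.75
```

With 30 points and alpha = 0.1 the same rule gives k = 1, matching the single value excluded from each end in the remark above.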

MEDIAN
Returns the median of the given numbers.
Syntax: MEDIAN(number1, number2, ...)
number1, number2, ... are numerical arguments for which you want the median.
Example: Refer to the worksheet “Averages”

MODE
Returns the most frequently occurring or repetitive value in an array or range of data.
Syntax: MODE(number1, number2, ...)
number1, number2, ... are arguments for which you want to calculate the mode.
Example: Refer to the worksheet “Averages”

Measures of Dispersion


Let’s look at an example of three data sets:

              Observations          Mean
Data set 1    7   8   10  11  9     9
Data set 2    4   6   9   12  14    9
Data set 3    2   5   9   13  16    9

[Figure: each of the three data sets plotted on a number line running from 2 to 16]

To capture the sense of the data, we need to measure the central location as well as the spread. This is carried out by the various measures of dispersion. The numerical values of the various measures of dispersion describe the amount of spread, or variability, in the data: these measures give large values for data which is more spread out and small values for data which is less spread out.

Characteristics for an Ideal Measure of Dispersion
• It should be easy to calculate and easy to understand.
• It should be affected as little as possible by fluctuations of sampling.

Common Measures of Dispersion
• RANGE
• MEAN DEVIATION
• VARIANCE
• STANDARD DEVIATION

Range
Range is the difference between the largest and the smallest value in the data. It can be determined by:
Range = Highest value - Lowest value
It gives a quick measurement of the spread.

Limitations of Range
It does not measure the spread of the majority of the data; it only measures the spread between the highest and lowest values.

[Figure: two distributions, each plotted against a vertical axis running from 0 to 600]

The range in both these distributions is the same, i.e. 300.

Deviations from a Central Value
One way to measure the spread of a data set is to measure the distance of each data point from a central value, say A (which could be the mean, median or mode). We define the deviation of $x_i$ from A to be $x_i - A$.
Note: The sum of the deviations about the mean is zero, and consequently the mean deviation about the mean is also zero, which is not a useful statistic. One way to remove this neutralizing effect is to ignore the signs of the deviations, that is, to use their absolute values.

Mean Absolute Deviation
Mean absolute deviation is the mean of the absolute values of the deviations from the mean of the data, i.e.

$$\text{Mean absolute deviation} = \frac{\sum_{i=1}^{N} \lvert x_i - \bar{x} \rvert}{N},$$

where $\bar{x}$ is the mean of the data and N is the number of data points.

Variance
The mean of the squares of the deviations about the mean is called the variance, i.e.

$$\text{variance} = \frac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N},$$

where $\bar{x}$ is the mean and N is the size of the population.

Standard Deviation
The positive square root of the variance is called the standard deviation, i.e.

$$\text{standard deviation} = \sqrt{\text{variance}} = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N}},$$

where $\bar{x}$ is the mean and N is the size of the population.
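The four measures defined above can be computed for the three data sets from the start of this part. The following Python sketch uses the standard `statistics` module; the helper `dispersion_summary` and its dictionary keys are illustrative names. `pvariance` and `pstdev` are the population versions, which divide by N as in the formulas above.

```python
from statistics import mean, pstdev, pvariance

data_sets = {
    "Data set 1": [7, 8, 10, 11, 9],
    "Data set 2": [4, 6, 9, 12, 14],
    "Data set 3": [2, 5, 9, 13, 16],
}


def dispersion_summary(values):
    centre = mean(values)
    deviations = [x - centre for x in values]
    return {
        "range": max(values) - min(values),
        # signed deviations about the mean cancel out (up to rounding)
        "sum_of_deviations": sum(deviations),
        "mean_absolute_deviation": sum(abs(d) for d in deviations) / len(values),
        "variance": pvariance(values),         # divides by N
        "standard_deviation": pstdev(values),  # square root of the variance
    }


for name, values in data_sets.items():
    print(name, dispersion_summary(values))
```

All three data sets have mean 9, yet every measure of dispersion grows from data set 1 to data set 3, which is the point of the example.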

Measures of dispersion using Calc functions


Range
There is no built-in function to calculate the range directly. We can calculate the range by taking the difference of the maximum value and the minimum value of the data set. The following formula can be used to calculate the range:

= MAX(value1, value2, …) - MIN(value1, value2, …)

Example: Refer to the worksheet “Dispersion”

AVEDEV
Returns the average of the absolute deviations of data points from their mean.
Syntax: AVEDEV(number1, number2, ...)
number1, number2, ... are 1 to 30 arguments for which you want the average of the absolute deviations.
Example: Refer to the worksheet “Dispersion”

VARP
Calculates variance based on the entire population.
Syntax: VARP(number1, number2, ...)
number1, number2, ... are 1 to 30 number arguments corresponding to a population.
Example: Refer to the worksheet “Dispersion”

VARPA
Calculates variance based on the entire population. In addition to numbers, text and logical values such as TRUE and FALSE are included in the calculation.
Syntax: VARPA(value1, value2, ...)
value1, value2, ... are 1 to 30 value arguments corresponding to a population.

STDEVP
Calculates standard deviation based on the entire population given as arguments.
Syntax: STDEVP(number1, number2, ...)
number1, number2, ... are 1 to 30 number arguments corresponding to a population.
Example: Refer to the worksheet “Dispersion”

MEASURES OF POSITION


PERCENTILE
Consider the data set $x_1, x_2, \ldots, x_n$. Percentiles are the numbers which divide the ordered data set into 100 equal-sized data subsets. For any data set, there are 99 percentiles, denoted by $P_1, P_2, \ldots, P_{99}$.
For instance, $P_2$, the second percentile, is a number such that at most 2% of the data points are less than it and at most 98% of the data points are greater than it.

How to find percentile of a data set?
Suppose $x_1, x_2, \ldots, x_{101}$ is a data set arranged in increasing order, i.e., $x_1 \le x_2 \le \ldots \le x_{100} \le x_{101}$.
Here $P_1 = x_2$ because at most 1% of the data points are less than $x_2$ and at most 99% of the data points are more than $x_2$.
$P_{20} = x_{21}$ because at most 20% of the data points are less than $x_{21}$ and at most 80% of the data points are more than $x_{21}$.

How to find percentile of a data set?
Suppose $x_1, x_2, \ldots, x_{10}$ is a data set arranged in increasing order, i.e., $x_1 \le x_2 \le \ldots \le x_{10}$.
Here we do not have data points that can divide the data set into 100 equal parts. In such a situation, percentiles are calculated in the following way:

Here we have 9 intervals between the 10 data points $x_1, x_2, \ldots, x_{10}$. The complete data constitutes 100%. We distribute this 100% over the 9 intervals so that each interval contains $\frac{100\%}{9} \approx 11.1\%$.

[Figure: the points $x_1$ to $x_{10}$ on a line, with each of the 9 intervals between neighbouring points labelled 11.1%]

Hence, $x_2 = P_{11.1}$, $x_3 = P_{22.2}$, $x_4 = P_{33.3}$, ...
Suppose we want to find $P_{20}$. As $x_2 = P_{11.1}$ and $x_3 = P_{22.2}$, $P_{20}$ lies between $x_2$ and $x_3$. To find the exact value of $P_{20}$, we follow these steps:

Step 1: Count the number of intervals between the data points. If there are n data points, then there will be n - 1 intervals. In the above example there are 10 - 1 = 9 intervals.

[Figure: the points $x_1$ to $x_{10}$ on a line, showing the 9 intervals between them]

Step 2: To find $P_m$ we calculate the number

$$p = \frac{(n - 1)}{100} \times m$$

and write p as the sum of its integer part i and fractional part f: $p = i + f$.
In our example, we wish to find $P_{20}$. Hence,

$$p = \frac{(10 - 1)}{100} \times 20 = 1.8 = 1 + 0.8$$

Thus, i = 1, f = 0.8.

Step 3: The m-th percentile $P_m$ is given by

$$P_m = x_{i+1} + f \, (x_{i+2} - x_{i+1})$$

Thus, in our example

$$P_{20} = x_{1+1} + 0.8 \, (x_{1+2} - x_{1+1}) = x_2 + 0.8 \, (x_3 - x_2)$$

Example: Find the 20th percentile of the data set 12, 13, 15, 18, 19, 20, 23, 24, 29.
Step 1: There are 9 data points. Thus the number of intervals = 9 - 1 = 8.
Step 2: To calculate $P_{20}$ we find the number

$$p = \frac{(9 - 1)}{100} \times 20 = 1.6 = 1 + 0.6$$

Thus, i = 1 and f = 0.6.
Step 3: Thus $P_{20}$ is given by

$$P_{20} = x_{1+1} + 0.6 \, (x_{1+2} - x_{1+1}) = x_2 + 0.6 \, (x_3 - x_2) = 13 + 0.6 \, (15 - 13) = 14.2$$
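The three steps can be sketched as a small Python function. The name `percentile` and its argument names are hypothetical; the special case for m = 100 is an added assumption, since the slides do not discuss it (there is no $x_{i+2}$ to interpolate towards when $P_m$ is the last data point).

```python
def percentile(values, m):
    """m-th percentile (0 <= m <= 100) by interpolating over the n - 1 intervals."""
    if not 0 <= m <= 100:
        raise ValueError("m must lie between 0 and 100")
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("percentile of an empty data set is undefined")
    # Steps 1 and 2: position p over the n - 1 intervals, split as p = i + f.
    p = (n - 1) * m / 100
    i = int(p)   # integer part
    f = p - i    # fractional part
    if i + 1 >= n:
        # assumption: at the last data point there is nothing to interpolate
        return ordered[-1]
    # Step 3: P_m = x_{i+1} + f * (x_{i+2} - x_{i+1}); the list is 0-based,
    # so x_{i+1} is ordered[i].
    return ordered[i] + f * (ordered[i + 1] - ordered[i])


data = [12, 13, 15, 18, 19, 20, 23, 24, 29]
print(round(percentile(data, 20), 4))  # 14.2
```

The result is rounded only for display, because the fractional part 0.6 is not exactly representable in binary floating point.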

Quartile
Consider the 25th, 50th and 75th percentiles, i.e., $P_{25}$, $P_{50}$ and $P_{75}$. These percentiles divide the ordered data set into four equal parts. These percentiles are known as Quartiles.
$P_{25}$ is known as the first quartile and is denoted by $Q_1$.
$P_{50}$ is known as the second quartile and is denoted by $Q_2$. It is also equal to the median.
$P_{75}$ is known as the third quartile and is denoted by $Q_3$.
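Quartiles for the nine-point example data set can be obtained from Python's standard library. This sketch assumes Python 3.8 or later and uses `statistics.quantiles` with `method="inclusive"`, which treats the smallest and largest values as the 0th and 100th percentiles and interpolates over the n - 1 intervals, so it should agree with the step-by-step percentile procedure.

```python
from statistics import median, quantiles

data = [12, 13, 15, 18, 19, 20, 23, 24, 29]

# n=4 asks for the three cut points that split the data into four parts.
q1, q2, q3 = quantiles(data, n=4, method="inclusive")

print(q1, q2, q3)          # 15.0 19.0 23.0
print(q2 == median(data))  # True: the second quartile is the median
```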

Percentiles using Open Office Calc functions


PERCENTILE
Returns the percentile of the values in a range for a given fraction alpha.
Syntax: PERCENTILE(data, alpha)
data is the range of data.
alpha is the percentile value in the range 0 to 1, inclusive.
Note: For the 1st percentile, alpha = 0.01; for the 15th percentile, alpha = 0.15; and so on.
Example: Refer to the worksheet “Percentile”

QUARTILE
Returns the quartile of a data set.
Syntax: QUARTILE(array, quart)
array is the array or cell range of numeric values for which the quartile value is to be calculated.
quart indicates the quartile to be calculated.
Note: For the 1st quartile, quart = 1; for the 2nd quartile, quart = 2; and for the 3rd quartile, quart = 3.
Example: Refer to the worksheet “Percentile”

Histogram
A histogram is a graphical display based on the frequency table.

FREQUENCY function in OPEN OFFICE CALC

Class Interval    Classes
<=7000            7000
7000-7500         7500
7500-8000         8000
8000-8500         8500
8500-9000         9000
9000-9500         9500
9500-10000        10000
10000-10500       10500
10500-11000       11000
11000-11500       11500
11500-12000       12000
12000-12500       12500

The FREQUENCY function can be used to construct a frequency table.

FREQUENCY function (cont.)

[Figure: screenshots of steps 1 to 4; only step 3 has a text caption]
3. Select the data range and class range

FREQUENCY function (cont.)

Class Interval    Classes    Frequency
<=7000            7000       2
7000-7500         7500       1
7500-8000         8000       3
8000-8500         8500       0
8500-9000         9000       12
9000-9500         9500       15
9500-10000        10000      23
10000-10500       10500      20
10500-11000       11000      15
11000-11500       11500      4
11500-12000       12000      3
12000-12500       12500      2

[Figure: Bar graph of frequency table, with classes 7000 to 12500 on the horizontal axis and frequencies from 0 to 25 on the vertical axis]
