5. Basic Stats, Hypothesis And Normal Distribution.xlsx

  • Uploaded by: Ankur Sharma
  • April 2020

Types of Variables

Quantitative variables
E.g. number of children in a family, score in maths, sales in $ for a product, height/weight of a student, per capita income.

Continuous: can take an infinite number of values within a range. E.g. sales in $, per capita income, weight/height of a student.

Discrete: takes a limited (countable) number of values. E.g. number of children in a family, number of cars in a city.

Qualitative variables (also called classification or categorical variables)
Have a defined, limited number of levels. E.g. gender, T-shirt size, winning position, colors.

Nominal: the levels cannot be ordered. E.g. gender, color.

Ordinal: the levels have an order. E.g. T-shirt size: S < M < XL < XXL; winning position: 1st > 2nd runner-up > 3rd runner-up.
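The nominal/ordinal distinction can be sketched in a few lines of Python. This is only an illustration: the size order comes from the example above, while the `shirts` and `colors` lists and the helper name `sort_sizes` are made up for the sketch.

```python
# Ordinal levels carry a rank; nominal levels do not.
SIZE_ORDER = ["S", "M", "XL", "XXL"]  # order taken from the text: S < M < XL < XXL


def sort_sizes(sizes):
    """Sort T-shirt sizes by their ordinal rank, not alphabetically."""
    return sorted(sizes, key=SIZE_ORDER.index)


shirts = ["XL", "S", "XXL", "M", "S"]  # illustrative values
ordered = sort_sizes(shirts)
print(ordered)

# Nominal data (e.g. color) can be counted per level, but has no meaningful order.
colors = ["red", "blue", "red"]  # illustrative values
color_counts = {c: colors.count(c) for c in set(colors)}
print(color_counts)
```

Sorting by rank only makes sense for the ordinal variable; for the nominal one, counting per level is the natural summary.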

Score data for a class

id   Gender  race  ses  schtyp  prog  read  write  math  science  socst
70   Male    4     1    1       1     57    52     41    47       57
121  Female  4     2    1       3     68    59     53    63       61
86   Male    4     3    1       1     44    33     54    58       31

141 172 113 50 11 84 48 75 60 95 104 38 115 76 195 114 85 167 143 41 20 12 53 154 178 196 29 126 103 192 150 199 144 200 80 16 153 176 177

Male Male Male Male Male Male Male Male Male Male Male Male Male Male Male Male Male Male Male Male Male Male Male Male Male Male Male Male Male Male Male Male Male Male Male Male Male Male Male

4 4 4 3 1 4 3 4 4 4 4 3 4 4 4 4 4 4 4 3 1 1 3 4 4 4 2 4 4 4 4 4 4 4 4 1 4 4 4

3 2 2 2 2 2 2 2 2 3 3 1 1 3 2 3 2 2 2 2 3 2 2 3 2 3 1 2 3 3 2 3 3 2 3 1 2 2 2

1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 2 2 1 1 1 2 1 2 1 2 1 1 1 2 2

3 2 2 1 2 1 2 3 2 2 2 2 1 2 1 2 1 1 3 2 2 3 3 2 3 2 1 1 2 2 3 2 1 2 2 3 3 2 2

63 47 44 50 34 63 57 60 57 73 54 45 42 47 57 68 55 63 63 50 60 37 34 65 47 44 52 42 76 65 42 52 60 68 65 47 39 47 55

44 52 52 59 46 57 55 46 65 60 63 57 49 52 57 65 39 49 63 40 52 44 37 65 57 38 44 31 52 67 41 59 65 54 62 31 31 47 59

47 57 51 42 45 54 52 51 51 71 57 50 43 51 60 62 57 35 75 45 57 45 46 66 57 49 49 57 64 63 57 50 58 75 68 44 40 41 62

53 53 63 53 39 58 50 53 63 61 55 31 50 50 58 55 53 66 72 55 61 39 39 61 58 39 55 47 64 66 72 61 61 66 66 36 39 42 58

56 61 61 61 36 51 51 61 61 71 46 56 56 56 56 61 46 41 66 56 61 46 31 66 46 46 41 51 61 71 31 61 66 66 66 36 51 51 51

168 40 62 169 49 136 189 7 27 128 21 183 132 15 67 22 185 9 181 170 134 108 197 140 171 107 81 18 155 97 68 157 56 5 159 123 164 14 127 165 174 3 58 146 102

Male Male Male Male Male Male Male Male Male Male Male Male Male Male Male Male Male Male Male Male Male Male Male Male Male Male Male Male Male Male Male Male Male Male Male Male Male Male Male Male Male Male Male Male Male

4 3 4 4 3 4 4 1 2 4 1 4 4 1 4 1 4 1 4 4 4 4 4 4 4 4 4 1 4 4 4 4 4 1 4 4 4 1 4 4 4 1 4 4 4

2 1 3 1 3 2 2 2 2 3 2 2 2 3 1 2 2 2 2 3 1 2 3 2 2 1 1 2 2 3 2 2 2 1 3 3 2 3 3 1 2 1 2 3 3

1 1 1 1 1 1 2 1 1 1 1 2 1 1 1 1 2 1 2 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1

2 1 1 1 3 2 2 2 2 2 1 2 2 3 3 3 2 3 2 2 1 1 2 3 2 3 2 3 1 2 2 1 3 2 2 1 3 2 2 3 2 2 3 2 2

52 42 65 55 50 65 47 57 53 39 44 63 73 39 37 42 63 48 50 47 44 34 50 44 60 47 63 50 44 60 73 68 55 47 55 68 31 47 63 36 68 63 55 55 52

54 41 65 59 40 59 59 54 61 33 44 59 62 39 37 39 57 49 46 62 44 33 42 41 54 39 43 33 44 54 67 59 45 40 61 59 36 41 59 49 59 65 41 62 41

57 43 48 63 39 70 63 59 61 38 61 49 73 44 42 39 55 52 45 61 39 41 50 40 60 47 59 49 46 58 71 58 46 43 54 56 46 54 57 54 71 48 40 64 51

55 50 63 69 49 63 53 47 57 47 50 55 69 26 33 56 58 44 58 69 34 36 36 50 55 42 65 44 39 58 63 74 58 45 49 63 39 42 55 61 66 63 44 63 53

51 41 66 46 47 51 46 51 56 41 46 71 66 42 32 46 41 51 61 66 46 36 61 26 66 26 44 36 51 61 66 66 51 31 61 66 46 56 56 36 56 56 41 66 56

117 133 94 24 149 82 8 129 173 57 100 1 194 88 99 47 120 166 65 101 89 54 180 162 4 131 125 34 106 130 93 163 37 35 87 73 151 44 152 105 28 91 45 116 33

Male Male Male Male Male Female Female Female Female Female Female Female Female Female Female Female Female Female Female Female Female Female Female Female Female Female Female Female Female Female Female Female Female Female Female Female Female Female Female Female Female Female Female Female Female

4 4 4 2 4 4 1 4 4 4 4 1 4 4 4 3 4 4 4 4 4 3 4 4 1 4 4 1 4 4 4 4 3 1 4 4 4 3 4 4 2 4 3 4 2

3 2 3 2 1 3 1 1 1 2 3 1 3 3 3 1 3 2 2 3 1 1 3 2 1 3 1 3 2 3 3 1 1 1 2 2 2 1 3 2 2 3 1 2 1

1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 2 2 1 1 1 1 2 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1

3 3 2 2 1 2 2 1 1 2 2 3 2 2 1 2 2 2 2 2 3 1 2 3 2 2 2 2 3 1 2 2 3 1 1 2 3 3 2 2 1 3 3 2 2

34 50 55 52 63 68 39 44 50 71 63 34 63 68 47 47 63 52 55 60 35 47 71 57 44 65 68 73 36 43 73 52 41 60 50 50 47 47 55 50 39 50 34 57 57

49 31 49 62 49 62 44 44 62 65 65 44 63 60 59 46 52 59 54 62 35 54 65 52 50 59 65 61 44 54 67 57 47 54 52 52 46 62 57 41 53 49 35 59 65

39 40 61 66 49 65 52 46 61 72 71 40 69 64 56 49 54 53 66 67 40 46 69 40 41 57 58 57 37 55 62 64 40 50 46 53 52 45 56 45 54 56 41 54 72

42 34 61 47 66 69 44 47 63 66 69 39 61 69 66 33 50 61 42 50 51 50 58 61 39 46 59 55 42 55 58 58 39 50 50 39 48 34 58 44 50 47 29 50 54

56 31 56 46 46 61 48 51 51 56 71 41 61 66 61 41 51 51 56 56 33 56 71 56 51 66 56 66 41 46 66 56 51 51 56 56 46 46 61 56 41 46 26 56 56

66 72 77 61 190 42 2 55 19 90 142 17 122 191 83 182 6 46 43 96 138 10 71 139 110 148 109 39 147 74 198 161 112 69 156 111 186 98 119 13 51 26 36 135 59

Female Female Female Female Female Female Female Female Female Female Female Female Female Female Female Female Female Female Female Female Female Female Female Female Female Female Female Female Female Female Female Female Female Female Female Female Female Female Female Female Female Female Female Female Female

4 4 4 4 4 3 1 3 1 4 4 1 4 4 4 4 1 3 3 4 4 1 4 4 4 4 4 3 4 4 4 4 4 4 4 4 4 4 4 1 3 2 3 4 4

2 2 1 3 2 2 2 2 1 3 2 2 2 3 2 2 1 1 1 3 2 2 2 2 2 2 2 3 1 2 3 1 2 1 2 1 2 1 1 2 3 3 1 1 2

1 1 1 1 2 1 1 2 1 1 1 1 1 2 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 2 1 1 1 1 1 1 1 1

3 3 2 2 2 3 3 2 1 2 3 2 2 2 3 2 2 2 2 2 3 1 1 2 3 3 1 2 2 2 2 2 2 3 2 1 2 3 1 3 1 2 1 2 2

68 42 61 76 47 46 39 52 28 42 47 47 52 47 50 44 47 45 47 65 43 47 57 68 52 42 42 66 47 57 47 57 52 44 50 39 57 57 42 47 42 60 44 63 65

62 54 59 63 59 52 41 49 46 54 42 57 59 52 62 52 41 55 37 54 57 54 62 59 55 57 39 67 62 50 61 62 59 44 59 54 62 60 57 46 36 59 49 60 67

56 47 49 60 54 55 33 49 43 50 52 48 58 43 41 43 46 44 43 61 40 49 56 61 50 51 42 67 53 50 51 72 48 40 53 39 63 51 45 39 42 62 44 65 63

50 47 44 67 58 44 42 44 44 50 39 44 53 48 55 44 40 34 42 58 50 53 58 55 54 47 42 61 53 51 63 61 55 40 61 47 55 53 50 47 31 61 35 54 55

51 46 66 66 46 56 41 61 51 52 51 41 66 61 31 51 41 41 46 56 51 61 66 71 61 61 41 66 61 58 31 61 61 31 61 36 41 37 43 61 39 51 51 66 71

78 64 63 79 193 92 160 32 23 158 25 188 52 124 175 184 30 179 31 145 187 118

Female Female Female Female Female Female Female Female Female Female Female Female Female Female Female Female Female Female Female Female Female Female

4 4 4 4 4 4 4 2 2 4 2 4 3 4 4 4 2 4 2 4 4 4

2 3 1 2 2 3 2 3 1 2 2 3 1 1 3 2 3 2 2 2 2 2

1 1 1 1 2 1 1 1 1 1 1 2 1 1 2 2 1 2 2 1 2 1

2 3 1 2 2 1 2 3 2 1 1 2 2 3 1 3 2 2 1 3 1 1

39 50 52 60 44 52 55 50 65 52 47 63 50 42 36 50 41 47 55 42 57 55

54 52 65 62 49 67 65 67 65 54 44 62 46 54 57 52 59 65 59 46 41 62

54 45 60 49 48 57 55 66 64 55 42 56 53 41 42 53 42 60 52 38 57 58

53 58 56 50 39 63 50 66 58 53 42 55 53 42 50 55 34 50 42 36 55 58

41 36 51 51 51 61 61 56 71 51 36 61 66 41 41 56 51 56 56 46 52 61

id   Gender  race  ses  schtyp  prog  read  write  math  science  socst
137  Female  4     3    1       2     63    65     65    53       61

Variable types
Qualitative variable: Gender
Continuous variable: Score of read, write, math, science

Read Variable: Measures of Central Tendency and Dispersion

Measure  Value  Meaning
Mean     52.23  Average
Median   50     Middle value
Mode     47     Value with maximum replication
STD      10.25  Dispersion: typical deviation of the observations from the mean of the data

For example, the first observation of read has a value of 57 and the mean read score is 52.23, so its deviation is 57 - 52.23 = 4.77 (~5). For the STD, this deviation is taken for every data point w.r.t. the mean, each deviation is squared, the squares are averaged (dividing by n - 1 for a sample), and the square root of that average is the STD. A plain average of the signed deviations would always be 0.
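The figures above can be reproduced with Python's standard `statistics` module. The sketch below rebuilds the 200 read scores from a value-to-count tally of the sorted "Read Asc Order" column; the name `READ_COUNTS` and the tally layout are choices made for this sketch, not something in the workbook.

```python
import math
import statistics

# Read scores as value -> count, tallied from the sorted "Read Asc Order" column.
READ_COUNTS = {
    28: 1, 31: 1, 34: 6, 35: 1, 36: 3, 37: 2, 39: 8, 41: 2, 42: 13, 43: 2,
    44: 13, 45: 2, 46: 1, 47: 27, 48: 1, 50: 18, 52: 14, 53: 1, 54: 1, 55: 13,
    57: 14, 60: 9, 61: 1, 63: 16, 65: 9, 66: 1, 68: 11, 71: 2, 73: 5, 76: 2,
}
read = [value for value, count in sorted(READ_COUNTS.items()) for _ in range(count)]

mean = statistics.mean(read)      # average
median = statistics.median(read)  # middle value
mode = statistics.mode(read)      # value with maximum replication

# STD by hand: square each deviation, average with n - 1, take the square root.
squared_devs = [(x - mean) ** 2 for x in read]
std = math.sqrt(sum(squared_devs) / (len(read) - 1))

print(len(read), mean, median, mode, round(std, 2))
```

The hand-computed `std` agrees with `statistics.stdev(read)`, which also divides by n - 1.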

Percentiles/ Quartiles

Quartile                     Value  Value of Obs number
Q1 (25% quartile)            44     =200*0.25=50th
Median or Q2 (50% quartile)  50     =200*0.5=100th
Q3 (75% quartile)            60     =200*0.75=150th

Obs  Read Asc Order
1    28
2    31
3    34
4    34
5    34
6    34
7    34
8    34
9    35
10   36
11   36
12   36
13   37
14   37
15   39
16   39
17   39
18   39
19   39
20   39
21   39
22   39
23   41
24   41
25   42
26   42
27   42
28   42
29   42
30   42
31   42
32   42
33   42
34   42
35   42
36   42
37   42
38   43
39   43
40   44
41   44
42   44
43   44
44   44
45   44
46   44
47   44
48   44
49   44
50   44
51   44
52   44
53   45
54   45
55   46
56   47
57   47
58   47
59   47
60   47
61   47
62   47
63   47
64   47
65   47
66   47
67   47
68   47
69   47
70   47
71   47
72   47
73   47
74   47
75   47
76   47
77   47
78   47
79   47
80   47
81   47
82   47
83   48
84   50
85   50
86   50
87   50
88   50
89   50
90   50
91   50
92   50
93   50
94   50
95   50
96   50
97   50
98   50
99   50
100  50
101  50
102  52
103  52
104  52
105  52
106  52
107  52
108  52
109  52
110  52
111  52
112  52
113  52
114  52
115  52
116  53
117  54
118  55
119  55
120  55
121  55
122  55
123  55
124  55
125  55
126  55
127  55
128  55
129  55
130  55
131  57
132  57
133  57
134  57
135  57
136  57
137  57
138  57
139  57
140  57
141  57
142  57
143  57
144  57
145  60
146  60
147  60
148  60
149  60
150  60
151  60
152  60
153  60
154  61
155  63
156  63
157  63
158  63
159  63
160  63
161  63
162  63
163  63
164  63
165  63
166  63
167  63
168  63
169  63
170  63
171  65
172  65
173  65
174  65
175  65
176  65
177  65
178  65
179  65
180  66
181  68
182  68
183  68
184  68
185  68
186  68
187  68
188  68
189  68
190  68
191  68
192  71
193  71
194  73
195  73
196  73
197  73
198  73
199  76
200  76
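The quartile lookup by observation number (200 x 0.25 = 50th value, and so on) can be sketched as follows. The helper name `value_at_fraction` is hypothetical, and the positional rule is the sheet's simple one; library quantile functions usually interpolate and may give slightly different answers on other data.

```python
# Read scores as value -> count, tallied from the sorted "Read Asc Order" column.
READ_COUNTS = {
    28: 1, 31: 1, 34: 6, 35: 1, 36: 3, 37: 2, 39: 8, 41: 2, 42: 13, 43: 2,
    44: 13, 45: 2, 46: 1, 47: 27, 48: 1, 50: 18, 52: 14, 53: 1, 54: 1, 55: 13,
    57: 14, 60: 9, 61: 1, 63: 16, 65: 9, 66: 1, 68: 11, 71: 2, 73: 5, 76: 2,
}
read_sorted = [v for v, count in sorted(READ_COUNTS.items()) for _ in range(count)]


def value_at_fraction(sorted_values, fraction):
    """Return (1-based obs number, value) at n * fraction, e.g. 200 * 0.25 = 50."""
    position = round(len(sorted_values) * fraction)
    return position, sorted_values[position - 1]


q1 = value_at_fraction(read_sorted, 0.25)
q2 = value_at_fraction(read_sorted, 0.50)
q3 = value_at_fraction(read_sorted, 0.75)
print(q1, q2, q3)
```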


Normal distribution (bell curve)

Mean = Median = Mode

Confidence Interval: the range that covers the bulk of the data distribution. It is used in hypothesis testing: if a value lies in this range, it is treated as belonging to this distribution.

Hypothesis testing concepts:
- H0 and H1 hypothesis
- Critical region
- Significance level (5%)
- Type I & II errors
- One-tail and two-tail tests
- Z statistic
- Confidence Interval
- p-value

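Confidence Interval (Mean = 50, Std Dev = 10):
  Z = -1.96 -> 30.4 (2.5% of the distribution lies below this value)
  Z = +1.96 -> 69.6 (2.5% of the distribution lies above this value)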
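The two interval limits follow directly from mean ± 1.96 x Std Dev. A minimal sketch, with `z_interval` as a hypothetical helper name:

```python
def z_interval(mean, std_dev, z=1.96):
    """Return the (lower, upper) limits mean - z * std_dev and mean + z * std_dev."""
    return mean - z * std_dev, mean + z * std_dev


low, high = z_interval(50, 10)
print(round(low, 1), round(high, 1))
```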

Standard Error = Std Dev / √n

Problem Statement: Is the sample mean representative of the population?
  Sample mean score of Read subject: 52.23
  Hypothesized population mean: 51

Z Statistic = (52.23 - 51) / (10 / SQRT(200)) = 1.739483

This Z Stat is then compared with the cut-off.
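The Z statistic can be checked in Python. In this sketch the cut-off of 1.96 and the two-tail reading are assumptions (taken from the ±1.96 limits shown for the confidence interval), and the p-value line uses the standard library's `NormalDist`; the function name `z_statistic` is made up for the example.

```python
import math
from statistics import NormalDist


def z_statistic(sample_mean, hypothesized_mean, std_dev, n):
    """Z = (sample mean - hypothesized mean) / standard error."""
    standard_error = std_dev / math.sqrt(n)
    return (sample_mean - hypothesized_mean) / standard_error


z = z_statistic(52.23, 51, 10, 200)

cutoff = 1.96  # assumed two-tail cut-off at the 5% significance level
p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-tail p-value
reject_h0 = abs(z) > cutoff

print(round(z, 6), round(p_value, 3), reject_h0)
```

Under these assumptions the Z statistic stays inside ±1.96, so H0 (population mean = 51) would not be rejected at the 5% level.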

Key Highlights of normal distribution
- Data is plotted on the X axis; the Y axis represents frequency.
- Data distribution: 68% of the data lies within -1 Stdev and +1 Stdev of the mean, and so on.
- Mean = Median = Mode.
- While testing a hypothesis, data beyond -2 Stdev and +2 Stdev is considered not to belong to the distribution, and H0 is rejected.
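The "68% within ±1 Stdev" figure, and the shares for wider bands, can be checked against the standard normal curve. A small sketch using `statistics.NormalDist`; the helper name `share_within` is hypothetical.

```python
from statistics import NormalDist


def share_within(k):
    """Share of a normal distribution lying within k standard deviations of the mean."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return std_normal.cdf(k) - std_normal.cdf(-k)


for k in (1, 1.96, 2, 3):
    print(k, round(share_within(k), 4))
```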

Read Score 57 68 44 63 47 44 50 34 63 57 60 57 73 54 45 42 47 57 68 55 63 63 50 60 37 34 65 47 44 52 42 76 65 42 52 60 68 65 47 39 47

Steps in creating bell curve with an example
1. Compute the mean and Stdev of the distribution.
2. Calculate Mean - 1 Stdev, Mean - 2 Stdev, Mean - 3 Stdev.
3. Similarly calculate Mean + 1 Stdev, Mean + 2 Stdev, Mean + 3 Stdev.
4. Plot the frequency of the data against the points from steps 2 and 3.

Normal distribution Standards  Value        Frequency
Mean - 3 Stdev                 21.47118952  0
Mean - 2 Stdev                 31.72412635  2
Mean - 1 Stdev                 41.97706317  22
Mean                           52.23        91
Mean + 1 Stdev                 62.48293683  39
Mean + 2 Stdev                 72.73587365  39
Mean + 3 Stdev                 82.98881048  7

Go to Data tab -> Data Analysis -> Histogram and select the data range and bin range to populate the frequency.
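The frequency column can be reproduced outside Excel. The sketch below assumes each bin value is an upper limit and counts the scores above the previous limit and up to that one (the usual behaviour of the Histogram tool); `READ_COUNTS` is a tally of the 200 read scores and `bin_counts` is a hypothetical helper.

```python
import statistics

# Read scores as value -> count, tallied from the sorted read column.
READ_COUNTS = {
    28: 1, 31: 1, 34: 6, 35: 1, 36: 3, 37: 2, 39: 8, 41: 2, 42: 13, 43: 2,
    44: 13, 45: 2, 46: 1, 47: 27, 48: 1, 50: 18, 52: 14, 53: 1, 54: 1, 55: 13,
    57: 14, 60: 9, 61: 1, 63: 16, 65: 9, 66: 1, 68: 11, 71: 2, 73: 5, 76: 2,
}
read = [v for v, count in sorted(READ_COUNTS.items()) for _ in range(count)]

mean = statistics.mean(read)
std = statistics.stdev(read)

# Steps 2 and 3: Mean - 3 Stdev ... Mean + 3 Stdev, used as upper bin limits.
edges = [mean + k * std for k in range(-3, 4)]


def bin_counts(values, upper_edges):
    """Count values falling in (previous edge, edge] for each upper edge."""
    counts = []
    lower = float("-inf")
    for upper in upper_edges:
        counts.append(sum(1 for v in values if lower < v <= upper))
        lower = upper
    return counts


freq = bin_counts(read, edges)
for edge, count in zip(edges, freq):
    print(round(edge, 2), count)
```

Plotting `freq` against `edges` (step 4) gives the roughly bell-shaped chart.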

Rejection area: H0 is rejected for values at this level, i.e. between -2 and -3 Stdev. Here Frequency = 0.

[Chart: ~ Normally distributed data; Y axis: Frequency]

55 52 42 65 55 50 65 47 57 53 39 44 63 73 39 37 42 63 48 50 47 44 34 50 44 60 47 63 50 44 60 73 68 55 47 55 68 31 47 63 36 68 63 55 55 52 34 50

55 52 63 68 39 44 50 71 63 34 63 68 47 47 63 52 55 60 35 47 71 57 44 65 68 73 36 43 73 52 41 60 50 50 47 47 55 50 39 50 34 57 57 68 42 61 76 47

46 39 52 28 42 47 47 52 47 50 44 47 45 47 65 43 47 57 68 52 42 42 66 47 57 47 57 52 44 50 39 57 57 42 47 42 60 44 63 65 39 50 52 60 44 52 55 50

65 52 47 63 50 42 36 50 41 47 55 42 57 55 63


Covariance and corrected Sum of squares calculation between Maths (x) and Science (y)

Cov = (x - x_mean) * (y - y_mean)

id   Gender  race  ses  schtyp  prog  read  write  math  science  socst  Corrected SS  Covariance
121  Female  4     2    1       3     68    59     53    63       61     3.95825       65.163
82   Female  4     3    1       2     68    62     65    69       61     211.88825     1845.183
8    Female  1     1    1       2     39    44     52    44       48     5.06325       -10.707
129  Female  4     1    1       1     44    44     46    47       51     32.22825      -43.407
173  Female  4     1    1       1     50    62     61    63       51     93.15825      -121.017
57   Female  4     2    1       2     71    65     72    66       56     273.87325     998.593
100  Female  4     3    1       2     63    65     71    69       71     314.78825     5839.823
1    Female  1     1    1       3     34    44     40    39       41     162.48825     -1848.567
194  Female  4     3    2       2     63    63     69    61       61     149.64825     1264.023
88   Female  4     3    1       2     68    60     64    69       66     194.73825     2676.843
99   Female  4     3    1       1     47    59     56    66       61     47.47325      441.003
                                                                         11642.35      58.50
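The per-row products and the final covariance can be sketched in Python. Two points here are assumptions, not statements from the sheet: the means 52.645 (maths) and 51.85 (science) are back-solved from the listed row values rather than quoted, and the final figure is read as the total 11642.35 divided by n - 1 = 199. The math and science values are the eleven rows shown in the covariance table.

```python
# Maths (x) and Science (y) for the eleven rows shown in the covariance table.
MATH = [53, 65, 52, 46, 61, 72, 71, 40, 69, 64, 56]
SCIENCE = [63, 69, 44, 47, 63, 66, 69, 39, 61, 69, 66]

# Assumption: column means over all 200 students, back-solved from the sheet's
# row values (they are not printed in the sheet itself).
MATH_MEAN = 52.645
SCIENCE_MEAN = 51.85


def cross_products(xs, ys, x_mean, y_mean):
    """Per-row term (x - x_mean) * (y - y_mean)."""
    return [(x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)]


products = cross_products(MATH, SCIENCE, MATH_MEAN, SCIENCE_MEAN)

# Total from the sheet's bottom row; it covers all 200 rows, not just these eleven.
TOTAL_CROSS_PRODUCT = 11642.35
N = 200
covariance = TOTAL_CROSS_PRODUCT / (N - 1)  # assumed sample covariance

print([round(p, 5) for p in products])
print(round(covariance, 2))
```

With these means, the products match the first computed column of the table row by row, and the total divided by 199 gives the sheet's 58.50.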
