1
Classification and Tabulation of Data
1. Data and Data Collection
A collection of useful information for a particular purpose or field is called data. Data are used for making analysis and interpretations. There are two kinds of data:
1. Primary Data: original data which the researchers collect directly themselves. They are not published.
2. Secondary Data: data compiled by one researcher and used by many. They are published.
2. Classification of Data
Definition: Data are meaningful information relating to a particular field, and such information is used to make interpretations. Raw data are mostly large in shape, size and number, so it is difficult for a human being to draw conclusions from them directly. Thus, on the basis of their features, the data are divided into separate classes for easy understanding. This process of distributing data is called "classification of data". According to Prof. L.R. Connor, "Classification is the process of arranging things in groups or classes according to their resemblance and affinities and gives an expression to the unity of attributes that may subsist among a diversity of individuals."
Basic features and characteristics of classification:
1. Classification is done on the basis of facts.
2. It is the division of the whole data into different groups.
3. Grouping depends upon uniformity of attributes.
4. Classes may be real or imaginary.
Objectives of Classification:
1. To express mutual relationships.
2. To show the data in a convenient and condensed form.
3. To make comparative study easier.
4. To clarify the similarities and disparities among different data.
5. To provide a base for further analysis and interpretation.
6. To make the data easy to understand.
Types of Classification:
1. Qualitative
2. Quantitative
3. Temporal (Chronological)
4. Spatial (Geographical)
3. Data Presentation
There are three methods available for the statistical presentation of data:
1. Text Presentation
2. Tabular Presentation
3. Graphical Presentation
1. Text Presentation: In this method the data are presented by combining text and figures. Though it has the advantage of directing attention, it is not an effective device, because the reader finds it difficult to read and understand, and it also takes too much time.
2. Tabular Presentation or Tabulation of Data: It is very essential to present the data in an appropriate manner because it helps in understanding, making comparisons and drawing conclusions. To achieve this objective, data are arranged in tabular form after classification. This is known as tabulation. According to L.R. Connor, "Tabulation involves the orderly and systematic presentation of numerical data in a form designed to elucidate the problem under consideration."
Objects of Tabulation:
1. To present the information given by the data in a serialised and orderly manner.
2. To present the data in less space.
3. To make analysis and comparison easier.
4. To provide information at a glance.
5. To show the trend and accuracy of the data.
Guidelines for Tabulation: The following general points should be kept in mind while preparing a table:
(1) Title (2) Table number (3) Captions and stubs (4) Date (5) Head note (6) Body of the table (7) Totals (8) Foot notes (9) Source note
Types of Tables: Tables are classified on the following bases:
1. On the basis of purpose, tables are of two types:
(a) General purpose table: contains information to be used by the public in general.
(b) Specific purpose table: contains information relating to a particular purpose or organisation.
2. On the basis of origin, tables are of two types:
(a) Primary table: also called an original table; it is made from the data originally collected for some purpose.
(b) Secondary table: also called a derivative table; interpretation and analysis are presented in this table.
3. On the basis of construction, tables are of two types:
(a) Simple table: shows one characteristic only.
(b) Complex table: contains figures relating to several characteristics; it may be a double table, treble table or manifold table.
4. Frequency Distribution
To make the analysis of data easier and to simplify the various statistical calculations, the classified data are presented in an appropriate serial order in the form of a frequency distribution. In the words of Morris Hamburg, "A frequency distribution or frequency table is simply a table in which the data are grouped into classes and the number of cases which fall in each class are recorded. The numbers in each class are referred to as frequencies."
The following terms are related to frequency distribution: variable, array, range, class frequency, class interval, class mark, size of the class interval, etc.
Methods of constructing class intervals: Class intervals are a very essential part of a frequency table. The various methods of constructing class intervals are as follows:
1. Exclusive Method
2. Inclusive Method
3. Open-End Class Interval
4. Class Interval with Cumulative Frequencies
Example:
Exclusive Method: The upper limit of one class becomes the lower limit of the next class. Example: 0-10, 10-20, 20-30 and so on.

Inclusive Method: A value equal to the upper limit of the class interval is included in that class, so there remains a difference of one between a class interval and its succeeding class interval. Example: 0-9, 10-19, 20-29, 30-39 and so on.

Open-End Class Interval: The words 'more than' and 'less than' are used with a specified limit. Example: Less than 10, or More than 10.

Class Interval with Cumulative Frequency: The frequencies and the measurements of the class intervals are cumulated; words like 'below', 'above', 'less than', 'more than' are used. Example: Below 10, Below 20, Below 30 or Above 10, Above 20, Above 30.
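The construction of a frequency table by the exclusive method can be shown with a short program. The following is a minimal sketch in plain Python; the marks data, the class width and the limits are invented for illustration, and a value equal to an upper limit falls into the next class, as the exclusive method requires.

```python
# Frequency distribution with exclusive-method class intervals 0-10, 10-20, ...
marks = [12, 45, 7, 33, 28, 19, 41, 25, 8, 36, 22, 17]   # made-up observations

width = 10                                    # size of each class interval
lower = 0                                     # lower limit of the first class
upper = max(marks) // width * width + width   # upper limit of the last class

frequency = {}
for low in range(lower, upper, width):
    high = low + width
    # Exclusive method: count values with low <= x < high
    frequency[f"{low}-{high}"] = sum(1 for x in marks if low <= x < high)

for interval, f in frequency.items():
    print(interval, f)
```

Running the sketch prints each class interval with its frequency, and the frequencies add up to the number of observations.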
5. Statistical Series
According to Prof. Connor, statistical series means: "If two variable quantities can be arranged side by side so that the measurable differences in the one correspond with measurable differences in the other, the result is said to form a statistical series."
There are three types of series: 1. Individual Series 2. Discrete Series 3. Continuous Series.
1. Individual Series: When the data are observed individually and are listed as individual cases, they form an individual series. They may be placed in ascending or descending order.
Example:
Roll No.: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
Marks: 2, 8, 12, 14, 45, 17, 32, 29, 23, 16
2. Discrete Series: In the words of Prof. Boddington, "A discrete variable is one where the variates (individual values) differ from each other by definite amounts."
Example:
Marks: 1, 2, 3, 4, 5
No. of Students: 2, 4, 7, 3, 8
3. Continuous Series: A continuity is maintained in the values of the variable in the series, and the measurements lie within class intervals. According to Prof. Boddington, "In a continuous series, the variable can take any intermediate value between the smallest and the largest value in the distribution."
Example:
Marks: 0-10, 10-20, 20-30, 30-40
No. of Students: 2, 7, 1, 5

2
Measures of Central Tendency and Dispersion
1. Measure of Central Tendency
In modern times, averages have acquired great importance in all spheres of research. According to Croxton and Cowden, "An average is a single value within the range of the data that is used to represent all the values in the series. Since an average is somewhere within the range of the data, it is sometimes called a measure of central value." Spurr, Kellogg and Smith have defined it as: "An average is sometimes called a measure of central tendency because individual values of the variable usually cluster around it."
2. Types of Averages
In a broad sense, averages can be divided into two kinds: 1. Mathematical Averages 2. Positional Averages.
Mathematical Averages can further be divided into three types: (a) Arithmetic Average or Mean (b) Geometric Average (c) Harmonic Average.
Positional Averages can further be divided into two types: (a) Median (b) Mode.
Arithmetic Average: The arithmetic average is the most popular and best understood measure of central tendency for a quantitative set of data. The arithmetic mean of a series is calculated by adding the values of all the scores of the series and dividing the sum total by the number of scores. For example, if the purchases of the past five years are 40, 50, 60, 70 and 80, then
Average purchase = (40 + 50 + 60 + 70 + 80) / 5 = 300 / 5 = 60
Types of Arithmetic Average: There are two types of arithmetic averages.
1. Simple Arithmetic Average: all the items are treated alike while calculating the average.
2. Weighted Arithmetic Average: the items are assigned weights and then the average is calculated.
Arithmetic Mean (x̄)

Direct method:
Individual series: x̄ = Σx / N
Discrete series: x̄ = Σfx / Σf
Continuous series: x̄ = Σfx / Σf, where x is the mid-value of the class.

Short-cut method:
Individual series: x̄ = A + Σdx / N, where dx = x − A
Discrete series: x̄ = A + Σfdx / Σf, where dx = x − A
Continuous series: x̄ = A + Σfdx / Σf, where dx = x − A and x is the mid-value of the class.

Step-deviation method:
Individual series: x̄ = A + (Σdx′ / N) × i, where dx′ = (x − A) / i
Discrete series: x̄ = A + (Σfdx′ / Σf) × i, where dx′ = (x − A) / i
Continuous series: x̄ = A + (Σfdx′ / Σf) × i, where dx′ = (x − A) / i and x is the mid-value of the class.
(Here A is the assumed mean and i is the size of the class interval or common factor.)

The Weighted Arithmetic Average can be calculated by two methods:
1. Direct Method: x̄w = Σxw / Σw
where x̄w = weighted arithmetic average, Σxw = sum of the products of the variable and the weights, Σw = sum of the weights.
2. Short-cut Method: x̄w = a + Σdw / Σw
where x̄w = weighted arithmetic mean, a = assumed mean, Σdw = sum of the products of the deviations of variable x (from a) and the weights, Σw = sum of the weights.

Advantages of Arithmetic Average:
1. The calculation is easily understandable.
2. It is a commonly used method.
3. It is adaptable to algebraic treatment.
4. It is based on all the observations.
Disadvantages:
1. Extreme items distort its value.
2. It cannot be calculated if some items are missing.

Geometric Average (G.M.)
The geometric average is the proper average when dealing with quantities that change over a period of time, as it gives the average rate of change. The geometric mean g of a set of n positive numbers x1, x2, x3, ...., xn is the nth root of their product.
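As a quick illustration of the direct-method formulas above, here is a minimal sketch in plain Python; the values and weights are invented for illustration.

```python
# Simple and weighted arithmetic mean by the direct method (made-up data).
x = [40, 50, 60, 70, 80]          # values of the variable
w = [1, 2, 3, 2, 2]               # weights assigned to each value

simple_mean = sum(x) / len(x)                                   # x̄  = Σx / N
weighted_mean = sum(xi * wi for xi, wi in zip(x, w)) / sum(w)   # x̄w = Σxw / Σw

print(simple_mean)    # 60.0
print(weighted_mean)  # 62.0
```

The weighted mean differs from the simple mean because the value 60 carries more weight than the others.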
In symbols, g = ⁿ√(x1 · x2 · x3 · .... · xn)
Formulas for G.M. in different series:
1. Individual Series: g = Antilog (Σ log x / n)
2. Discrete and Continuous Series: g = Antilog (Σ f log x / n), where n = Σf (and x is the mid-value of the class in a continuous series).
Merits:
1. All the observations are considered in the G.M.
2. Algebraic manipulation is possible.
3. It gives a precise result, as it is rigidly defined.
Demerits:
1. Its calculation is difficult and is not generally understood.
2. The geometric mean becomes imaginary when an item is zero or negative.
Harmonic Mean: The harmonic mean is the total number of items of a variable divided by the sum of the reciprocals of the values of the variable. In symbols,
h = n / (1/x1 + 1/x2 + 1/x3 + ..... + 1/xn)
where h is the harmonic mean, x1, x2, x3, ....., xn are the values of the n variables and n is the total number of items.
Formulas for calculating H.M. in different series:
1. Individual series: h = n / Σ(1/x)
2. Discrete series: h = n / Σ(f × 1/x), where n = Σf
3. Continuous series: h = n / Σ(f × 1/x), where x is the mid-value of a class and n = Σf
Merits:
1. It is based upon all the observations.
2. It is appropriate when more weight is to be given to small values.
3. It lends itself to algebraic treatment.
Demerits:
1. It is a complicated method.
2. It is of little use when both positive and negative values are included.
Median
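A minimal sketch of the geometric and harmonic mean formulas for an individual series, in plain Python with invented data:

```python
import math

# Individual series (made-up positive values).
x = [4, 8, 16, 32]

# Geometric mean: g = antilog(Σ log x / n), i.e. the nth root of the product.
g = math.exp(sum(math.log(v) for v in x) / len(x))

# Harmonic mean: h = n / Σ(1/x).
h = len(x) / sum(1 / v for v in x)

print(round(g, 3))   # 11.314
print(round(h, 3))   # 8.533
```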
According to Connor, "The median is that value of the variable which divides the group into two equal parts, one part comprising all values greater and the other all values less than the median."
Formulas used in determining the median are:
1. Individual Series: The values are arranged in ascending or descending order, and
M = the size of the ((n + 1) / 2)th item
2. Discrete Series: In this series the cumulative frequencies are worked out. The size of the ((n + 1) / 2)th item is located in the cumulative frequency column, and the value against it is taken as the median.
3. Continuous Series: This series is slightly more difficult, as we first have to locate the class interval in which the median lies, in the same way as in a discrete series, except that one is not added (the size of the (n/2)th item is used). After identifying the class interval, the median is determined by the formula:
M = l1 + ((n/2 − c) / f) × i
where l1 = lower limit of the class interval to which the median belongs, c = cumulative frequency up to that lower limit, i = length of the class interval to which the median belongs, f = frequency of the class interval to which the median belongs, n = Σf.
Mode
According to Croxton and Cowden, "The mode of a distribution is the value at the point around which the items tend to be most heavily concentrated. It may be regarded as the most typical value of a series."
In an individual series, the item which appears the greatest number of times is taken as the mode. In a discrete series, the grouping method is adopted, in which frequencies are grouped by taking two items at a time and then three items at a time, and an analysis table is prepared. In a continuous series the formula used is:
Z = l1 + (∆1 / (∆1 + ∆2)) × i   or   Z = l1 + ((f1 − f0) / (2f1 − f0 − f2)) × (l2 − l1)
where l1 = lower limit of the modal class, l2 = upper limit of the modal class, f0 = frequency of the class preceding the modal class, f1 = frequency of the modal class, f2 = frequency of the class succeeding the modal class, ∆1 = f1 − f0, ∆2 = f1 − f2, i = l2 − l1.
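To make the continuous-series formulas concrete, here is a minimal sketch in plain Python. The class intervals and frequencies are invented, and the modal class is simply assumed to be the class with the highest frequency.

```python
# Median and mode for a continuous frequency distribution (made-up data).
classes = [(0, 10), (10, 20), (20, 30), (30, 40)]   # exclusive class intervals
f       = [2, 7, 12, 5]                              # frequencies

n = sum(f)

# Median: M = l1 + ((n/2 - c) / f_m) * i
cum, k = 0, 0
while cum + f[k] < n / 2:        # locate the class containing the (n/2)th item
    cum += f[k]
    k += 1
l1, l2 = classes[k]
median = l1 + ((n / 2 - cum) / f[k]) * (l2 - l1)

# Mode: Z = l1 + (Δ1 / (Δ1 + Δ2)) * i, with the modal class taken as the one
# having the highest frequency.
m = f.index(max(f))
l1, l2 = classes[m]
d1 = f[m] - (f[m - 1] if m > 0 else 0)
d2 = f[m] - (f[m + 1] if m + 1 < len(f) else 0)
mode = l1 + (d1 / (d1 + d2)) * (l2 - l1)

print(round(median, 2), round(mode, 2))   # 23.33 24.17
```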
3. Need for Finding Out Dispersion
The central tendency of a series is described by averages, but averages are unable to explain the deviations of the items of a series from the central value. When values are compared on the basis of averages alone, misleading conclusions may be drawn; averages have several limitations. Thus, for proper and scientific analysis, one has to go beyond the averages and study the measures of dispersion.
4. Methods of Determining Dispersion
1. Range Method: The range is the easiest and simplest measure. The highest and lowest values are determined, and the difference between the two is termed the range:
R = L − S
Coefficient of Range = (L − S) / (L + S)
where L = largest value, S = smallest value.
2. Mean Deviation: As the range does not consider all the observations, the mean deviation method is adopted. The formulas for computing the mean deviation are:
In an individual series:
δ = Σ|d| / n
δm = Σ|dm| / n
where δ = mean deviation from the mean, δm = mean deviation from the median, |d| = deviations ignoring algebraic signs, n = number of items.
In discrete and continuous series:
δ = Σf|d| / n
δm = Σf|dm| / n
δz = Σf|dz| / n
The coefficients for all types of series are:
Coefficient of δ = δ / x̄
Coefficient of δm = δm / M
Coefficient of δz = δz / Z
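A minimal sketch of the range and mean-deviation formulas for an individual series, in plain Python with invented values:

```python
# Range, coefficient of range and mean deviation for an individual series (made-up data).
x = [12, 7, 19, 25, 10, 17]

L, S = max(x), min(x)
R = L - S                               # Range = L - S
coeff_range = (L - S) / (L + S)         # Coefficient of range

mean = sum(x) / len(x)
xs = sorted(x)
median = (xs[len(xs)//2 - 1] + xs[len(xs)//2]) / 2   # even number of items

md_mean   = sum(abs(v - mean) for v in x) / len(x)     # δ  = Σ|d| / n
md_median = sum(abs(v - median) for v in x) / len(x)   # δm = Σ|dm| / n

print(R, round(coeff_range, 3), round(md_mean, 3), round(md_median, 3))
```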
5. Standard Deviation
It is the most popular and widely used technique for measuring dispersion, and it removes the drawbacks of the other techniques. It is the square root of the second moment of dispersion, which is always calculated from the arithmetic average.
Formula of standard deviation by the direct method:
(i) Individual Series:
S.D. = σ = √( Σd² / n )
where σ = standard deviation, d = deviation of an item from the mean, n = number of items.
(ii) Discrete and Continuous Series:
S.D. = σ = √( Σfd² / n ), where n = Σf
Formula of standard deviation by the short-cut method:
(i) Individual Series:
S.D. = σ = √( Σd²/n − (Σd/n)² )
where d = x − a (a = assumed mean), Σd² = sum of squares of the deviations, Σd = sum of the deviations, n = number of items.
(ii) Discrete Series:
When deviations are not taken: S.D. = σ = √( Σfx²/n − (Σfx/n)² )
When deviations are taken: S.D. = σ = √( Σfd²/n − (Σfd/n)² )
(iii) Continuous Series:
S.D. = σ = √( Σfd²/n − (Σfd/n)² ) × c
where d is the step deviation from the assumed mean and c is the size of the class interval.
6. Coefficient of Variation Based on Standard Deviation
Coefficient of variation, or C.V. = (σ / x̄) × 100
where σ = standard deviation and x̄ = mean.
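Putting the direct-method standard deviation and the coefficient of variation together, a minimal sketch in plain Python with invented data:

```python
import math

# Standard deviation (direct method) and coefficient of variation (made-up data).
x = [10, 12, 18, 20, 25]

mean = sum(x) / len(x)
sigma = math.sqrt(sum((v - mean) ** 2 for v in x) / len(x))   # σ = √(Σd² / n)
cv = (sigma / mean) * 100                                     # C.V. = (σ / x̄) × 100

print(round(sigma, 3), round(cv, 2))   # 5.441 32.0
```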
3
Correlation
1. Correlation
Correlation can be defined as the relationship between two or more variables in which a change in the value of one variable is accompanied by a change in the value of the other variable. According to Croxton and Cowden, "When the relationship is of a quantitative nature, the appropriate statistical tool for discovering and measuring the relationship and expressing it in a brief formula is known as correlation."
2. Types of Correlation
Correlation is of three kinds:
1. Positive and Negative Correlation: When the values of two variables vary in the same direction, so that an increase in one variable leads to an increase in the other variable and vice versa, the correlation is said to be positive. On the other hand, if the values of the variables move in opposite directions, the correlation is negative.
2. Linear and Curvilinear Correlation: Linear correlation exists between two variables when a change in one variable causes a change in the other variable in a constant ratio, whereas curvilinear correlation exists when there is no constant ratio between the changes in the two variables.
3. Simple, Multiple and Partial Correlation: When only two variables are taken into consideration, the correlation is simple. When more than two variables are considered, it is multiple correlation. The correlation is partial when we study the correlation between two variables while neglecting the influence of some other variable on both of them.
3. Measurement of Correlation
Correlation can be measured by any of the following methods:
1. Scatter Diagram
2. Two-way Table
3. Karl Pearson's Coefficient of Correlation
4. Spearman's Rank Correlation
5. Concurrent Deviation Method
6. Method of Least Squares
1. Scatter Diagram: For details, refer to Question No. 10 and 11 on Page 2B.64, 2B.65 of Q & A Zone.
2. Two-way Table: The second method of measuring correlation is the two-way table. For a frequency distribution, a table of double entry is drawn showing the values of the two variables and the distribution of the frequencies in the cells of the table.
Merits and Demerits of the Two-way Table
Merits:
1. This is the simplest method of studying the relationship.
2. Every item is analysed in depth and shown.
Demerits:
1. It is very difficult to draw and interpret when the number of items is large.
2. It is impossible to give a precise degree of correlation.
3. It is not suitable for further mathematical treatment.
3. Karl Pearson's Coefficient of Correlation: The two methods discussed above have little practical utility. Therefore, for practical and applied purposes, methods are used which can express the relationship in quantitative terms. Karl Pearson's coefficient of correlation is one such method and is the most widely used in practice. It is expressed by the symbol 'r'. Pearson's coefficient of correlation is based upon three assumptions:
1. A large variety of independent causes are operating in each series, so as to give a normal distribution.
2. The forces so operating are related in a causal way.
3. The relationship between the two series is linear.
'r' can be calculated in various ways, depending upon the choice of the user.
CASE-I: Deviations are taken from the arithmetic mean:
r = Σxy / (N σx σy) = Σxy / √(Σx² × Σy²)
where x = (X − X̄), y = (Y − Ȳ), σx = standard deviation of variable X, σy = standard deviation of variable Y, N = number of observations, r = correlation coefficient.
CASE-II: Deviations are taken from an assumed mean:
r = [N Σdxdy − (Σdx)(Σdy)] / [ √(N Σdx² − (Σdx)²) × √(N Σdy² − (Σdy)²) ]
where dx = (x − assumed mean of x), dy = (y − assumed mean of y), Σdx = sum of deviations of the x series from its assumed mean, Σdy = sum of deviations of the y series from its assumed mean, Σdx² = sum of squares of deviations of the x series from its assumed mean, Σdy² = sum of squares of deviations of the y series from its assumed mean, Σdxdy = sum of the products of the deviations of the x and y series from their assumed means.
CASE-III: Deviations are not taken at all:
r = [N Σxy − (Σx)(Σy)] / [ √(N Σx² − (Σx)²) × √(N Σy² − (Σy)²) ]
where Σx = sum of variable x, Σy = sum of variable y, Σxy = sum of the products of variables x and y, Σx² = sum of squares of the values of variable x, Σy² = sum of squares of the values of variable y, N = number of observations, r = coefficient of correlation.
CASE-IV: Calculation of correlation in a grouped distribution:
r = [ Σfdxdy − (Σfdx)(Σfdy)/N ] / [ √(Σfdx² − (Σfdx)²/N) × √(Σfdy² − (Σfdy)²/N) ]
where dx, dy = deviations of the x and y series from their assumed means, Σfdxdy = sum of the products of the frequencies with the deviations of both series, Σfdx = sum of the products of the frequencies with the deviations of the x series, Σfdy = sum of the products of the frequencies with the deviations of the y series, N = total of the frequencies, r = correlation coefficient.
Merits and Demerits of Pearson's Coefficient of Correlation
Merits:
1. It is suitable for further algebraic treatment.
2. Both the direction and the degree of the correlation between the two variables are measured.
Demerits:
1. A linear relationship is assumed, which may not be correct.
2. It is tedious to calculate.
3. The chances of misinterpretation are greater.
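CASE-III translates directly into code. A minimal sketch in plain Python with invented paired data:

```python
import math

# Karl Pearson's r by the raw-score formula (CASE-III), with made-up data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

N = len(x)
sx, sy = sum(x), sum(y)
sxy = sum(a * b for a, b in zip(x, y))
sx2, sy2 = sum(a * a for a in x), sum(b * b for b in y)

r = (N * sxy - sx * sy) / math.sqrt((N * sx2 - sx**2) * (N * sy2 - sy**2))
print(round(r, 3))   # about 0.775
```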
4. Spearman's Rank Coefficient of Correlation
In 1904, Charles Edward Spearman developed a formula which helps in determining the coefficient of correlation between the ranks of individuals in two attributes. It is also called rank correlation. In this method the variables are given ranks on the basis of the size of the values; the differences between ranks are then obtained by deducting the ranks of one series from the ranks of the other series. After this, the correlation is calculated on the basis of the squares of these differences of ranks.
Rank Correlation Formula
CASE-I: When ranks are given:
r = 1 − (6Σd²) / (n³ − n)
where d = difference between the ranks of the two series, n = number of observations, r = coefficient of correlation.
CASE-II: When ranks are not given:
Rank both the series x and y according to the magnitude of the data and then use the same formula:
r = 1 − (6Σd²) / (n³ − n)
CASE-III: When two or more values of a series have the same magnitude, resulting in tied ranks:
r = 1 − 6[ Σd² + (1/12)(m³ − m) + (1/12)(m³ − m) + .... ] / (n³ − n)
where m = number of items whose ranks are common (one correction term is added for each group of tied ranks).
Merits and Demerits of Rank Correlation
Merits:
1. It is easy to apply.
2. It provides a check on the calculation, because the sum of the differences between the ranks is always equal to zero.
3. It is a way of studying qualitative data.
Demerits:
1. It is convenient only when n is small.
2. It is not based on the full set of information.
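A minimal sketch of the CASE-I formula (ranks given, no ties), in plain Python with invented ranks:

```python
# Spearman's rank correlation when ranks are given and there are no ties (made-up ranks).
rank_x = [1, 2, 3, 4, 5, 6]
rank_y = [2, 1, 4, 3, 6, 5]

n = len(rank_x)
d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))   # Σd²

r = 1 - (6 * d2) / (n**3 - n)
print(round(r, 3))   # 0.829
```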
5. Concurrent Deviation Method
In cases where trends are not noticed in the values, or where trends have no significance, the concurrent deviation method is used. In most cases it gives a reasonably accurate result, and it is suitable for studying the correlation between short-term fluctuations. Only the direction (positive or negative) of the deviations is considered, not their extent. The formula used under this method is:
rc = ±√( ±(2c − n) / n )
where rc = coefficient of concurrent deviations, c = the number of concurrent deviations, n = the number of pairs of observations compared. (The signs inside and outside the root are taken according to the sign of (2c − n).)
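An illustrative sketch of the method in plain Python; the two short series are invented, and the sign rule is applied by carrying the sign of (2c − n) through the square root.

```python
import math

# Concurrent deviation method (made-up short-term series).
x = [10, 12, 11, 15, 14, 16, 18]
y = [20, 22, 21, 24, 25, 23, 28]

# Direction of movement (+1 up, -1 down, 0 no change) between successive values.
dx = [1 if b > a else -1 if b < a else 0 for a, b in zip(x, x[1:])]
dy = [1 if b > a else -1 if b < a else 0 for a, b in zip(y, y[1:])]

n = len(dx)                                       # number of pairs of deviations
c = sum(1 for a, b in zip(dx, dy) if a * b > 0)   # concurrent deviations

inner = (2 * c - n) / n
r_c = math.copysign(math.sqrt(abs(inner)), inner)  # sign taken from (2c - n)
print(round(r_c, 3))
```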
6. Method of Least Squares
One of the assumptions upon which Karl Pearson's coefficient of correlation is based is that there is a linear relationship between the variables. This linear relationship can be studied by finding the values of y corresponding to x, and these are obtained by the least squares method. The least squares method is used in the following way:
1. A straight-line equation is formed: y = a + bx
2. To find the values of a and b, two normal equations are used:
Σy = na + bΣx . . . . . . (i)
Σxy = aΣx + bΣx² . . . . . . (ii)
3. By putting the values of x and y into these equations, we obtain the values of a and b for the equation y = a + bx.
4. From this equation we find the linear (estimated) values of y for the given values of x.
5. The coefficient of correlation is then found by dividing the standard deviation of the estimated (linear) values of y by the standard deviation of the original values of y.
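The two normal equations can be solved directly for a and b. A minimal sketch in plain Python with invented data:

```python
# Fitting y = a + bx by solving the normal equations (made-up data):
#   Σy  = n·a + b·Σx
#   Σxy = a·Σx + b·Σx²
x = [1, 2, 3, 4, 5]
y = [3, 5, 6, 8, 11]

n = len(x)
sx, sy = sum(x), sum(y)
sxy = sum(xi * yi for xi, yi in zip(x, y))
sx2 = sum(xi * xi for xi in x)

b = (n * sxy - sx * sy) / (n * sx2 - sx**2)   # slope
a = (sy - b * sx) / n                          # intercept

print(round(a, 2), round(b, 2))                # y = 0.9 + 1.9·x
print([round(a + b * v, 2) for v in x])        # estimated (linear) values of y
```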
4
Regression
1. Regression
Regression means "returning backward", "stepping down" or "going back". Sir Francis Galton, in 1877, was the first person to study regression. It is a statistical technique by which we can estimate the unknown value of one variable from the known value of another variable. The variable used for predicting is called the 'independent' or 'explaining' variable, and the variable whose value is to be predicted is called the 'dependent' or 'explained' variable. According to M.M. Blair, "Regression analysis is a mathematical measure of the average relationship between two or more variables in terms of the original units of the data."
2. Kinds of Regression Analysis
There are two classifications of regression analysis:
1. Linear and Curvilinear Regression [Refer to Answer No. 4 on Page 2B.82]
2. Simple and Multiple Regression: Simple regression is the study and analysis of two variables, that is, x and y, whereas multiple regression is the analysis carried out between more than two variables. In multiple regression only one variable is dependent and the others are independent.
3. Methods of Regression Analysis
There are two methods for carrying out regression analysis:
1. Graphical Method
2. Algebraic Method
1. Graphical Method: The first step in this method is to draw a scatter diagram on which every observation is shown by a dot. The dependent variable is shown on the y-axis and the independent variable on the x-axis. With the help of these dots the regression lines are drawn. Regression lines are the lines which depict the best mean value of one variable corresponding to the values of the other, for example the mean of x corresponding to y and vice versa. Such a line is the line which fits best in the scatter diagram and is used to summarise the data.
2. Algebraic Method: In this method the regression lines are expressed with the help of equations, namely the regression line of x on y and the regression line of y on x. These are y = a + bx and x = a + by, where a is the intercept of the line and b is its slope.
4. Coefficient of Regression
The coefficient of regression gives the amount by which one variable changes, on average, for a unit change in the other variable. As there are two regression equations, there are two coefficients of regression:
Coefficient of regression of x on y: bxy = r (σx / σy)
Coefficient of regression of y on x: byx = r (σy / σx)
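A minimal sketch computing the two regression coefficients from r and the standard deviations, in plain Python with invented data:

```python
import math

# Regression coefficients b_xy = r·(σx/σy) and b_yx = r·(σy/σx), with made-up data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
mx, my = sum(x) / n, sum(y) / n
sigma_x = math.sqrt(sum((v - mx) ** 2 for v in x) / n)
sigma_y = math.sqrt(sum((v - my) ** 2 for v in y) / n)
cov_xy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
r = cov_xy / (sigma_x * sigma_y)

b_xy = r * sigma_x / sigma_y    # coefficient of regression of x on y
b_yx = r * sigma_y / sigma_x    # coefficient of regression of y on x

print(round(b_xy, 3), round(b_yx, 3))
print(round(b_xy * b_yx, 3))    # equals r², the coefficient of determination below
```

Note that the product of the two regression coefficients equals r², which connects directly to the coefficient of determination discussed next.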
5. What is the Coefficient of Determination?
The coefficient of determination is the square of the coefficient of correlation. It measures the proportion of the explained variance to the total variance.
Coefficient of determination = r²
6. What is the Coefficient of Non-determination?
The coefficient of non-determination is the measure of that proportion of the total variance in the dependent variable which is not explained by the independent variable. It is represented by K², where
K² = 1 − r² = Unexplained variance / Total variance
5
Probability
1. Probability
The chance of the happening or non-happening of an event is termed probability. Such a statement indicates that there is an element of uncertainty, and the theory of probability provides a numerical measure of this uncertainty. In the words of Morris Hamburg, "Probability measures provide the decision maker in business and in government with the means for quantifying the uncertainties which affect his choice of appropriate actions."
There are three approaches to the definition of probability:
(1) Classical or Mathematical Definition
(2) Statistical or Empirical Definition
(3) Modern Definition
2. Classical or Mathematical Definition of Probability
Definition: If an experiment has 'n' mutually exclusive, equally likely and exhaustive cases, out of which 'm' are favourable to the happening of event 'A', then the probability of happening of 'A' is defined as
P(A) = (No. of cases favourable to A) / (Total (exhaustive) no. of cases) = m / n
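As a small illustration of the classical definition, the following plain-Python sketch counts favourable and exhaustive cases for the event "the sum of two fair dice is 7":

```python
from itertools import product

# Classical probability: P(A) = m / n, counting cases for two fair dice.
cases = list(product(range(1, 7), repeat=2))      # all exhaustive, equally likely cases
favourable = [c for c in cases if sum(c) == 7]    # cases favourable to event A

m, n = len(favourable), len(cases)
print(m, n, m / n)    # 6 36 0.1666...
```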
3. Statistical or Empirical Definition
The following statistical or empirical definition of probability has been given by Von Mises: if an experiment is repeated a large number of times under essentially identical conditions, the limiting value of the ratio of the number of times event A happens to the total number of trials, as the number of trials increases indefinitely, is called the probability of the happening of event A. Symbolically, let P(A) denote the probability of the occurrence of A, and let m be the number of times event A occurs in a series of n trials. Then
P(A) = lim (n → ∞) m/n
provided the limit is finite and unique.
Limitations of the Statistical Definition:
1. The experiment is required to be conducted a large number of times under the same conditions.
2. The relative frequency may not attain a unique value.
4. Modern Definition
Probability is the limit of the proportion of times that a certain event A will occur in repeated trials of an experiment. Let S be the sample space, let ℱ be the class of events and let P be a real-valued function defined on ℱ. Then P is called a probability measure and P(A) is called the probability of the event A if P satisfies the following axioms:
(1) 0 ≤ P(A) ≤ 1 for every event A belonging to ℱ.
(2) P(S) = 1
(3) For every finite or infinite sequence of disjoint events A1, A2, . . . .,
P(A1 ∪ A2 ∪ ...) = P(A1) + P(A2) + . . . . .
5. Types of Events
A subset of a sample space is called an event. There are many types of events:
1. Sure event: On the performance of a random experiment, an event which is sure to happen is called a sure event.
2. Impossible event: On the performance of a random experiment, an event which cannot take place is called an impossible event.
3. Simple event: On the performance of a random experiment, an event with a single possible outcome is called a simple event.
4. Elementary event: An element of the sample space, representing one of the possible outcomes of a random experiment, is called an elementary event.
5. Compound event: On the performance of a random experiment, an event formed by the joint occurrence of two or more simple events is called a compound event.
6. Independent events: Two events are said to be independent if the occurrence or non-occurrence of one event does not affect the occurrence of the other event.
7. Dependent events: When one event affects the probability of the other, the second event is said to be dependent on the first.
8. Mutually exclusive events: Two events are mutually exclusive if the occurrence of one prevents the occurrence of the other.
9. Overlapping events: When two or more events can take place together, we call them overlapping events.
10. Equally likely events: Two or more events are said to be equally likely if each of them has an equal chance of happening or non-happening in preference to the others.
11. Complementary events: An event which consists in the negation of another event is called the complementary event of the latter. In the experiment of rolling a die, the complementary event of "multiple of 3" is obviously "not a multiple of 3".
By Bayes' theorem, for two mutually exclusive and exhaustive events E1 and E2 and an observed result R:
P(E1/R) = P(E1) P(R/E1) / [ P(E1) P(R/E1) + P(E2) P(R/E2) ]
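A minimal sketch of the Bayes formula quoted above, in plain Python; the prior and conditional probabilities are invented for illustration.

```python
# P(E1 | R) = P(E1)·P(R | E1) / [P(E1)·P(R | E1) + P(E2)·P(R | E2)]
# E1, E2 are mutually exclusive and exhaustive hypotheses; R is the observed result.
p_e1, p_e2 = 0.4, 0.6          # prior probabilities (made up)
p_r_e1, p_r_e2 = 0.8, 0.3      # P(R | E1), P(R | E2) (made up)

p_e1_r = (p_e1 * p_r_e1) / (p_e1 * p_r_e1 + p_e2 * p_r_e2)
print(round(p_e1_r, 3))        # 0.64
```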
6
Elements of Theoretical Distribution
1. Theoretical Distribution
Distributions which are not obtained by actual observation or experiment but are deduced mathematically on the basis of certain assumptions are called theoretical distributions. The distributions most commonly in use are:
(1) Binomial distribution (2) Poisson distribution (3) Normal distribution
Binomial distribution
Assumptions:
(1) The number of trials, 'n', is finite and fixed.
(2) In every trial there are two possible outcomes of the event, which are mutually exclusive.
(3) The probability of success p and the probability of failure q (= 1 − p) remain constant in all the trials.
(4) All the trials are independent of each other, i.e. one trial is not affected by another.
Properties of the Binomial distribution:
1. Mean = np
2. Variance = npq
3. Standard deviation = √(npq)
Poisson distribution
Assumptions:
1. Every event must be independent of any other event.
2. The probability of the happening of more than one event in a very small interval is negligible.
3. The probability of success in a short time interval is proportional to the length of the time interval.
Constants of the Poisson distribution:
1. Mean = m
2. Variance = m
3. Standard deviation = √m
4. Mode = integral part of m, where m is not an integer.
Utility of the Poisson distribution: The Poisson distribution is used in a variety of fields, for example:
1. The number of defective articles in a packing.
2. The number of bacteria per unit.
3. The number of accidents in a city.
4. The number of persons born deaf and dumb in a city.
5. The number of customers arriving at a supermarket.
Normal distribution
The normal distribution is a distribution of a continuous random variable. When dealing with a continuous random variable, probability is attached to intervals of values rather than to individual values. The normal distribution is useful in business and economic applications because a wide range of data sets take its form. The normal density function is given by
f(x) = (1 / (σ√(2π))) e^( −(x − µ)² / (2σ²) )
where µ = mean and σ = S.D.
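A small sketch evaluating the three distributions above using only the Python standard library; the parameters are made up for illustration.

```python
import math

# Binomial: P(X = k) = C(n, k) * p**k * q**(n - k)
n, p, k = 10, 0.3, 4
q = 1 - p
binom_pk = math.comb(n, k) * p**k * q**(n - k)
print("Binomial: mean =", n * p, " variance =", n * p * q, " P(X=4) =", round(binom_pk, 4))

# Poisson: P(X = k) = e**(-m) * m**k / k!
m = 2.5
poisson_pk = math.exp(-m) * m**k / math.factorial(k)
print("Poisson:  mean = variance =", m, " P(X=4) =", round(poisson_pk, 4))

# Normal density: f(x) = (1 / (sigma * sqrt(2*pi))) * exp(-(x - mu)**2 / (2 * sigma**2))
mu, sigma, xv = 50, 10, 55
fx = math.exp(-(xv - mu) ** 2 / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))
print("Normal:   f(55) =", round(fx, 4))
```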
7
Sampling & Estimation
1. Sampling
Data are very useful for the purposes of comparison and estimation. There are two ways of collecting statistical data: the first is a census enquiry, and the other is a sample enquiry. In statistical language, the aggregate of the objects under study is called the population or universe. This becomes the subject of the investigation.
Following are some of the definitions which are considered in the theory of sampling.
Sample: The part of the population which is examined with a view to estimating the characteristics of that population is called a sample.
Sample Survey: The technique of choosing only a part of the information and then examining it for the purpose of the enquiry is known as a sample survey.
Sampling: Sampling is the process of generalising the results and findings of the sample survey and making them applicable to the whole field of enquiry.
Sampling is based upon two statistical laws, which are as follows:
1. Law of Statistical Regularity: This law states that when samples are selected at random from a population, they have the same characteristics as the whole population.
2. Law of Inertia of Large Numbers: This law states that when samples are large in number, their results are close to those of the population.
2. Methods of Sampling
There are many methods of obtaining a sample. Broadly, these methods are categorised in two parts:
1. Random Sampling Methods
2. Non-random Sampling Methods
These two categories are further classified as follows:
1. Random Sampling Methods are further categorised into four methods:
(a) Random Sampling
(b) Stratified Sampling
(c) Systematic Sampling
(d) Multistage Sampling
2. Non-random Sampling Methods are further categorised into six methods:
(a) Purposive Sampling
(b) Quota Sampling: In this method a quota is assigned to every investigator. He has to select the sample from this quota by his personal skill and judgement.
(c) Cluster Sampling: In this method the population is first divided into clusters or blocks; the samples are then selected from these clusters or blocks for the purpose of investigation.
(d) Area Sampling: In this method the total geographical area is divided into smaller areas; some of these small areas are then selected, and they become the sample.
(e) Sequential Sampling: In this method samples are taken one after another from the population, arranged sequentially, and are tested one by one until an acceptable result is achieved. The samples which are unable to reach the result are rejected.
(f) Convenience Sampling: In this method the data that are most conveniently available become part of the sample. Such samples are not very sound.
3. Statistical Estimation
Statistical estimation is the theory that deals with how to estimate a parameter (such as dispersion, moments, etc.) from the given sample data. The main object behind the theory of sampling is to estimate the features of the population from which the sample is selected. In this theory there are two types of estimates:
1. Point Estimate
2. Interval Estimate
4. Sample Size
For the purpose of estimation, the size of the sample plays a very important role: it should be neither too small nor too large. If the sample size is too small, it will not give a reliable estimate of the population parameter, whereas if the sample size is too large, it will lead to a waste of time, money and energy. Thus, it is good to have the right sample size.
Sample size for estimating the mean:
n = (σ · z / E)²
where n = sample size, σ = S.D., z = value corresponding to the confidence level, E = sampling error.
Sample size for estimating a proportion:
n = z² p(1 − p) / E²
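A minimal sketch applying the two sample-size formulas above; the standard deviation, confidence coefficient, expected proportion and permissible errors are invented for illustration.

```python
import math

# Sample size for estimating a mean: n = (σ·z / E)²
sigma, z, E = 12.0, 1.96, 2.0        # made-up population S.D., z for 95% confidence, error
n_mean = (sigma * z / E) ** 2
print(math.ceil(n_mean))             # round up to the next whole unit -> 139

# Sample size for estimating a proportion: n = z²·p(1 − p) / E²
p, E_p = 0.4, 0.05                   # made-up expected proportion and permissible error
n_prop = z**2 * p * (1 - p) / E_p**2
print(math.ceil(n_prop))             # -> 369
```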