Student Name: Pankaj Vohra
Roll No. 530910143
MBA – SEM 1
Statistics – MB0027
Set 1

1. What do you mean by sample survey? What are the different sampling methods? Briefly describe them.

Answer:
Sample: It is a finite subset of a population, drawn from it to estimate the characteristics of the population. Sampling is a tool which enables us to draw conclusions about the characteristics of the population. A sample survey can also be described as the technique used to study a population with the help of a sample. The population is the totality of all objects about which the study is proposed. A sample is only a portion of this population, selected using certain statistical principles called sampling designs.

A sample is selected on the basis of the following laws:
1) Law of statistical regularity: The group chosen tends to possess the characteristics of the larger population.
2) Principle of inertia of large numbers: Other things being equal, as the sample size increases the results tend to be more reliable and accurate.
3) Principle of persistence of small numbers: If the population possesses a markedly distinct character, it will be reflected in the sample too.
4) Principle of validity: A sample is valid only if it enables tests and estimation of population parameters.
5) Principle of optimization: The aim is to obtain the desired level of efficiency at minimum cost.

Different Types of Sampling
i. Probability Sampling
ii. Non-Probability Sampling
1) Probability Sampling. Different ways of assigning probability are:
i. Each unit has the same chance of being selected.
ii. Sampling units have varying probabilities of selection.
iii. Units have probability of selection proportional to their size.
Some of the important sampling designs are:

i. Simple Random Sampling
Sample units are drawn so that each and every unit in the population has an equal and independent chance of being included in the sample. If the sample unit is replaced before drawing the next unit, it is known as Simple Random Sampling with replacement [SRSWR]. Here the probability of drawing a unit is 1/N. If the sample unit is not replaced before drawing the next unit, it is called Simple Random Sampling without replacement [SRSWOR]. Here the probability of drawing a unit is 1/(N - n), where N is the population size and n is the number of units already drawn. Selection of a simple random sample can be done by
a) the lottery method,
b) the use of a table of random numbers.

ii. Stratified Random Sampling
It is used when the population is heterogeneous with respect to the characteristic under study or the population distribution is highly skewed. We subdivide the population into several groups or strata such that
i) units within each stratum are more homogeneous,
ii) units between strata are heterogeneous, and
iii) strata do not overlap; in other words, every unit of the population belongs to one and only one stratum.
The criteria used for stratification are geographical, sociological, age, sex, income etc. The population of size N is divided into K relatively homogeneous strata of sizes N1, N2, ..., Nk such that N1 + N2 + ... + Nk = N. Then we draw a simple random sample from each stratum, either proportional to the size of the stratum or with equal units from each stratum.
Merits
a. The sample is more representative.
b. It provides a more efficient estimate.
c. It is administratively more convenient.
d. It can be applied in situations where different degrees of accuracy are desired for different segments of the population.
Demerits
a. Many times the stratification is not effective.
b. Appropriate sample sizes are not drawn from each stratum.

iii. Systematic Sampling
This design is recommended if we have a complete list of sampling units arranged in some systematic order, such as geographical, chronological or alphabetical order.
Merits
a. It is very easy to operate and easy to check.
b. It saves time and labour.
c. It is more efficient than Simple Random Sampling if we have an up-to-date frame.
Demerits
a. In many cases we do not get an up-to-date list.
b. It gives biased results if a periodic feature exists in the data.

iv. Cluster Sampling
The total population is divided into recognizable sub-divisions, known as clusters, such that within each cluster units are more heterogeneous and between clusters they are homogeneous. The units are selected from each cluster by suitable sampling techniques.

v. Multi-stage Sampling
The total population is divided into several stages, and the sampling process is carried out through these stages.
Merits
a. Greater flexibility in the sampling method.
b. Existing divisions can be used.
Demerits
a. Estimates are less accurate.
b. The investigator should have knowledge of the entire population that will be sampled.
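The probability sampling designs described above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the frame of 100 unit labels, the two strata names and the fixed seed are hypothetical, and the systematic design is implemented in its usual form (a random start followed by every k-th unit), which the answer above does not spell out.

```python
import random

def srs_with_replacement(population, n, rng):
    """SRSWR: every draw picks any unit with probability 1/N; repeats are possible."""
    return rng.choices(population, k=n)

def srs_without_replacement(population, n, rng):
    """SRSWOR: a drawn unit is not returned, so no unit can repeat."""
    return rng.sample(population, n)

def stratified_sample(strata, n, rng):
    """Proportional allocation: stratum h contributes about n * N_h / N units."""
    total = sum(len(units) for units in strata.values())
    sample = {}
    for name, units in strata.items():
        n_h = round(n * len(units) / total)  # rounding may shift the total slightly
        sample[name] = rng.sample(units, n_h)
    return sample

def systematic_sample(frame, n, rng):
    """Random start within the first interval, then every k-th unit of the list."""
    k = len(frame) // n        # sampling interval
    start = rng.randrange(k)   # random start in 0 .. k-1
    return [frame[start + i * k] for i in range(n)]

rng = random.Random(42)                 # fixed seed only to make the run repeatable
population = list(range(1, 101))        # hypothetical frame of N = 100 unit labels
strata = {"urban": population[:60],     # hypothetical strata with N1 = 60, N2 = 40
          "rural": population[60:]}

print("SRSWR     :", srs_with_replacement(population, 5, rng))
print("SRSWOR    :", srs_without_replacement(population, 5, rng))
print("Stratified:", stratified_sample(strata, 10, rng))
print("Systematic:", systematic_sample(population, 10, rng))
```

With proportional allocation the strata of 60 and 40 units contribute 6 and 4 units to a sample of 10, which mirrors the "proportional to size of stratum" option in the text.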
2) Non-Probability Sampling
A predetermined number of sample units is selected purposely so that they represent the true characteristics of the population, depending upon the object of enquiry and other considerations.
Demerits
a. It is highly subjective in nature.
b. The selection of sample units depends entirely upon the personal convenience, biases, prejudices and beliefs of the investigator.

Judgment Sampling
The investigator's experience and knowledge about the population help to select the sample units, as the choice of sample depends exclusively on the judgment of the investigator. It is the most suitable method when the population is small.
Merits
a. Most useful for small populations.
b. Most useful to study some unknown traits of a population, some of whose characteristics are known.
c. Useful to solve day-to-day problems.
Demerits
a. It is not a scientific method.
b. It carries a risk of the investigator's bias being introduced.

Convenience Sampling
It is also called a "chunk", which refers to the fraction of the population being investigated that is selected neither by probability nor by judgment. The sample units are selected according to the convenience of the investigator.

Quota Sampling
It is a type of judgment sampling. Under this design, quotas are set up according to some specified characteristic such as age group, income group etc. From each group a specified number of units are sampled according to the quota allotted to the group.
2. What is the difference between correlation and regression? What do you understand by rank correlation? When do we use rank correlation and when do we use the Pearsonian correlation coefficient? Fit a linear regression line to the following data:
X: 12 15 18 20 27 34 28 48
Y: 123 150 158 170 180 184 176 130

Answer:
Difference between Correlation and Regression
1. Correlation: When two or more variables move in sympathy with each other, they are said to be correlated. If both variables move in the same direction, they are said to be positively correlated. If the variables move in opposite directions, they are said to be negatively correlated. If they move haphazardly, there is no correlation between them.
Regression: Regression is defined as "the measure of the average relationship between two or more variables in terms of the original units of the data."
2. Correlation analysis deals with
1) measuring the relationship between variables,
2) testing the relationship for its significance,
3) giving a confidence interval for the population correlation measure.
Regression analysis is used to estimate the value of dependent variables from the values of independent variables.
3. Correlation analysis: to study the relationship between the two variables x and y.
Regression analysis: to predict the average y for a given x. In regression it is attempted to quantify the dependence of one variable on the other.
4. Correlation quantifies the degree to which two variables are related. Correlation does not fit a best-fit line, while regression does. You are simply computing a correlation coefficient (r) that tells you how much one variable tends to change when the other one does.
5. With correlation you don't have to think about cause and effect. You simply quantify how well two variables relate to each other. With regression, you have to think about cause and effect, as the regression line is determined as the best way to predict Y from X.
6. With correlation, it doesn't matter which of the two variables you call "X" and which you call "Y". You'll get the same correlation coefficient if you swap the two. With linear regression, the decision of which variable you call "X" and which you call "Y" matters a lot, as you'll get a different best-fit line if you swap the two. The line that best predicts Y from X is not the same as the line that predicts X from Y.
7. Correlation is almost always used when you measure both variables. It is rarely appropriate when one variable is something you experimentally manipulate. With linear regression, the X variable is often something you experimentally manipulate (time, concentration...) and the Y variable is something you measure.
8. Correlation measures the strength of linear association between paired variables, say X and Y. Regression, on the other hand, gives the form of the linear association that best predicts Y from the values of X.
9. Correlation is calculated whenever:
a. Both X and Y are measured in each subject, and we want to quantify how much they are linearly associated.
b. In particular, Pearson's product-moment correlation coefficient is used when the assumption that both X and Y are sampled from normally distributed populations is satisfied.
c. Spearman's rank-order correlation coefficient is used if the assumption of normality is not satisfied.
d. Correlation is not used when the variables are manipulated, for example in experiments.
Linear regression is used whenever:
a. At least one of the independent variables (Xi's) is used to predict the dependent variable Y. Note: some of the Xi's may be dummy variables, i.e. Xi = 0 or 1, which are used to code nominal variables.
b. One manipulates the X variable, e.g. in an experiment.
10. Linear regression is not symmetric in X and Y. That is, interchanging X and Y gives a different regression model (X in terms of Y) from the original (Y in terms of X). On the other hand, if you interchange X and Y in the calculation of the correlation coefficient, you get the same value.
11. As a rule of thumb, the "best" linear regression model is obtained by selecting the variables (X's) that have at least a strong correlation with Y, i.e. >= 0.80 or <= -0.80.
12. The same underlying distribution is assumed for all variables in linear regression. Thus, linear regression will underestimate the correlation of the independent and dependent variables when they (the X's and Y) come from different underlying distributions.

Spearman's Rank Correlation Coefficient
Charles Spearman's rank correlation coefficient, denoted by the Greek letter ρ (rho) or by rs, is a nonparametric measure of correlation. Karl Pearson's correlation coefficient assumes that
i) samples are drawn from a normal population, and
ii) the variables under study are affected by a large number of independent causes so as to form a normal distribution.
When we do not know the shape of the population distribution, or when the data is of a qualitative type, Spearman's rank correlation coefficient is used to measure the relationship. It is defined as

$$\rho = 1 - \frac{6 \sum D^2}{n(n^2 - 1)}$$

where D is the difference between the ranks assigned to the two variables for each pair of observations and n is the number of pairs. The value of ρ lies between -1 and +1, and its interpretation is the same as that of Karl Pearson's correlation coefficient.
If tied ranks exist, the classic Pearson correlation coefficient between the ranks has to be used. One has to assign the same rank to each of the equal values; this rank is the average of their positions in the ascending order of the values.
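As a rough illustration of the rank correlation described above, the following Python sketch assigns average ranks (so that equal values share the mean of their positions) and then applies the ΣD² formula to the X and Y values given in the question. The function names are hypothetical, and the ΣD² shortcut is only exact when there are no ties, as is the case for this data.

```python
def average_ranks(values):
    """Rank values in ascending order; equal values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def spearman_rho(xs, ys):
    """Spearman's rho via 1 - 6*sum(D^2) / (n(n^2 - 1)); exact only without ties."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(xs)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))

X = [12, 15, 18, 20, 27, 34, 28, 48]
Y = [123, 150, 158, 170, 180, 184, 176, 130]

print(average_ranks(X))            # ranks of X
print(average_ranks(Y))            # ranks of Y
print(round(spearman_rho(X, Y), 4))
```

For this data ΣD² = 44 with n = 8, so ρ = 1 - 264/504, roughly 0.48. The single pair (48, 130) accounts for 36 of the 44, which shows how strongly one unusual observation affects the coefficient in a small sample.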
X: 12 15 18 20 27 34 28 48
Y: 123 150 158 170 180 184 176 130

Linear regression line for the above data:
Total numbers (n): 8
Slope (b): 0.16701
Y-intercept (a): 154.66
Regression equation: Y = 154.66 + 0.17X
____________________________________________________________________
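The slope and intercept quoted above can be checked with a small least-squares sketch in Python. It assumes the usual formulas b = Sxy / Sxx and a = mean(Y) - b * mean(X) for the regression of Y on X; the function name `fit_line` is hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares regression of y on x; returns (intercept a, slope b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squares of deviations from the means.
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    b = s_xy / s_xx
    a = mean_y - b * mean_x
    return a, b

X = [12, 15, 18, 20, 27, 34, 28, 48]
Y = [123, 150, 158, 170, 180, 184, 176, 130]

a, b = fit_line(X, Y)
print(f"Y = {a:.2f} + {b:.5f} X")  # prints: Y = 154.66 + 0.16701 X
```

Here Sxy = 161.25 and Sxx = 965.5, with mean X = 25.25 and mean Y = 158.875, which gives the slope and intercept reported in the answer.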
3. What do you mean by business forecasting? What are the different methods of business forecasting? Describe the effectiveness of time-series analysis as a mode of business forecasting. Describe the method of moving averages.

Answer:
Business Forecasting
Business forecasting is the analysis of past and present economic conditions with the object of drawing inferences about probable future business conditions. The process of making definite estimates of the future course of events is referred to as forecasting, and the figures or statements obtained from the process are known as the "forecast". The future course of events is rarely known. There are two aspects of scientific business forecasting:
i. Analysis of past economic conditions
ii. Analysis of present economic conditions
Main methods of business forecasting:

1. Business Barometers
Business indices are indicators of future conditions, so they are also known as "Business Barometers" or "Economic Barometers", and they can help in forecasting and decision making. They consist of gross national product, wholesale prices, consumer prices, industrial production, stock prices, bank deposits etc. These quantities may be converted into relatives on a certain base. The relatives so obtained may be weighted and their average computed. The index thus arrived at is the business barometer. Business barometers are of three types:
i. Barometers relating to general business activities.
ii. Business barometers for a specific business or industry.
iii. Business barometers concerning an individual business firm.

2. Time Series Analysis
Forecasting through time series analysis is possible only when business data for various years are available that reflect a definite trend and seasonal variation.

3. Extrapolation
Extrapolation is the simplest method of business forecasting. By extrapolation, a businessman finds out the possible trend of demand for his goods and their future price trends. The accuracy of extrapolation depends on two factors:
i. Knowledge about the fluctuations of the figures.
ii. Knowledge about the course of events relating to the problem under consideration.
Thus there are two assumptions on which extrapolation is based:
i. There are no sudden jumps in the figures from one period to another.
ii. There is regularity in the fluctuations, and the rise and fall is uniform.

4. Regression Analysis
It is the means by which we select, from among the many possible relationships between variables in a complex economy, those which will be useful for forecasting. A regression relationship may involve one predicted (dependent) variable and one independent variable, which is simple regression, or it may involve relationships between the variable to be forecast and several independent variables, which is multiple regression. Statistical techniques to estimate the regression equations are often fairly complex and time-consuming, but there are many computer programs now available that estimate simple and multiple regressions quickly.
5. Modern Econometric Methods
The term econometrics refers to the application of mathematical economic theory and statistical procedures to economic data in order to verify economic theorems. Models take the form of a set of simultaneous equations. The values of the constants in such equations are supplied by a study of statistical time series, and a large number of equations may be necessary to produce an adequate model.

6. Exponential Smoothing Method
This method is often regarded as the best method of business forecasting compared with the other methods. Exponential smoothing is a special kind of weighted average and is found extremely useful in short-term forecasting of inventories and sales.
7. Choice of a Method of Forecasting The selection of an appropriate method depends on many factors – the context of the forecast, the relevance and availability of historical data, the degree of accuracy desired, the time period for which forecasts are required, the cost benefit of the forecast to the company, and the time available for making the analysis.
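To make the "special kind of weighted average" in the exponential smoothing method concrete, here is a minimal Python sketch of simple exponential smoothing. The answer above gives no formula, so the recurrence used here (smoothed value = alpha * current observation + (1 - alpha) * previous smoothed value), the choice of the first observation as the starting value, the smoothing constant alpha = 0.3 and the sales figures are all assumptions made for illustration.

```python
def exponential_smoothing(series, alpha):
    """Simple exponential smoothing: s_t = alpha * x_t + (1 - alpha) * s_(t-1)."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must lie in (0, 1]")
    smoothed = [series[0]]  # assumption: start from the first observation
    for x in series[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed

# Hypothetical monthly sales figures, used only to show the call.
sales = [120, 132, 128, 141, 150, 147]
smoothed = exponential_smoothing(sales, alpha=0.3)

# The last smoothed value serves as the one-step-ahead forecast.
print("Smoothed series:", [round(s, 2) for s in smoothed])
print("Forecast for next period:", round(smoothed[-1], 2))
```

A larger alpha gives more weight to recent observations and reacts faster to changes; a smaller alpha smooths more heavily. The weights on older observations shrink geometrically, which is where the name comes from.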
Effectiveness of Time Series Analysis
Time series analysis is also used for the purpose of business forecasting. Forecasting through time series analysis is possible only when business data for various years are available that reflect a definite trend and seasonal variation. By time series analysis the long-term (secular) trend and the seasonal and cyclical variations are ascertained, analyzed and separated from the data of various years.
Merits:
i) It is an easy method of forecasting.
ii) By this method a comparative study of variations can be made.
iii) Reliable forecasting results are obtained, as this method is based on a mathematical model.
The following are the possible uses of time series:
i. A comparative study of the behaviour of the variable over different periods of time can be done. The variable may be export figures, quantity of industrial production etc.
ii. Forecasting can be done using the time series. By studying the variations and other behaviour of the variables over a sufficiently long period of time, it may be possible to forecast the future behaviour of the variables. However, such a forecast has meaning only if the period of forecast is a normal period. For example, the various five-year plans of the Government of India are formulated by studying time series and forecasting.
iii. Study of the time series helps in analysing the past behaviour of the variables. This helps in identifying the various forces that affect their behaviour.
Method of Moving Averages
This method is used for smoothing the time series; that is, it smooths out the fluctuations in the data.

A) When the period of the moving average is odd
To determine the trend by this method, we use the following steps:
i) Obtain the time series.
ii) Select a period of moving average, such as 3 years, 5 years etc.
iii) Compute moving totals according to the length of the period of moving average. If the length of the period is 3, i.e. a 3-yearly moving average is to be calculated, compute the moving totals as follows: a + b + c, b + c + d, c + d + e, d + e + f, ... Place the moving totals at the centre of the time span from which they are computed.
iv) Compute the moving averages by dividing the moving totals in step (iii) by the length of the period of moving average, and place them at the centre of the time span from which the moving totals are computed. These moving averages are also called the trend values. By plotting these trend values (if desired) one can obtain the trend curve, with the help of which we can determine whether the trend is increasing or decreasing. If needed, one can also compute short-term fluctuations by subtracting the trend values from the actual values.

B) When the period of the moving average is even
When the period of moving average is even (4 years etc.), we compute the moving averages using the following steps:
i) Obtain the time series.
ii) Obtain the length of the period of moving average. Let the length of the moving average period be 4 years.
iii) Compute 4-yearly moving totals and place them at the centre of the time span. The 4-yearly moving totals are computed as follows: a + b + c + d, b + c + d + e, c + d + e + f, ...
iv) Compute the 4-yearly moving averages and place them at the centre of the time span. Note that this placement is inconvenient, because a moving average so placed does not coincide with an original time period.
v) Take two-period moving averages of the moving averages and place them at the middle of the periods. This process is called centring of moving averages.

Merits of the method of moving averages:
i) This method is simple.
ii) This method is objective in the sense that anybody working on a problem with this method will get the same results.
iii) This method is used for determining seasonal, cyclic and irregular variations besides the trend values.
iv) This method is flexible enough to add more figures to the data, because the entire calculations are not changed.
v) If the period of moving averages coincides with the period of cyclic fluctuations in the data, such fluctuations are automatically eliminated.

Limitations:
i) There is no functional relationship between the values and the time. Thus, this method is not helpful in forecasting and predicting the values on the basis of time.
ii) There are no trend values for some years at the beginning and some at the end. For example, for a 5-yearly moving average there will be no trend values for the first two years and the last two years.
iii) In the case of a non-linear trend, the values obtained by this method are biased in one or the other direction.
iv) The selection of the period of moving average is a difficult task. Therefore great care has to be taken in selecting the period, particularly when there is no business cycle during that time.
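The two procedures for the method of moving averages (odd period, and even period with centring) can be sketched in Python as follows. The function names and the production figures are hypothetical and serve only to show the mechanics.

```python
def moving_average(series, period):
    """Moving totals divided by the period; one value per window of `period` items."""
    return [sum(series[i:i + period]) / period
            for i in range(len(series) - period + 1)]

def centred_moving_average(series, period):
    """For an even period: average each adjacent pair of moving averages (centring)."""
    averages = moving_average(series, period)
    return moving_average(averages, 2)

# Hypothetical yearly production figures for eight consecutive years.
production = [21, 22, 23, 25, 24, 22, 25, 26]

# Odd period: the first 3-yearly trend value belongs to year 2, the last to year 7.
trend_3 = moving_average(production, 3)
print("3-yearly trend values:", [round(t, 2) for t in trend_3])

# Even period: after centring, the first 4-yearly trend value belongs to year 3.
trend_4 = centred_moving_average(production, 4)
print("Centred 4-yearly trend values:", [round(t, 2) for t in trend_4])

# Short-term fluctuations: actual value minus trend value for years 2 to 7.
fluctuations = [actual - trend for actual, trend in zip(production[1:-1], trend_3)]
print("Short-term fluctuations:", [round(f, 2) for f in fluctuations])
```

A series of 8 years yields 6 trend values for a 3-yearly average and 4 for a centred 4-yearly average, which illustrates the limitation that trend values are missing at both ends of the series.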
____________________________________________________________________
4. What is the definition of Statistics? What are the different characteristics of Statistics? What are the different functions of Statistics? What are the limitations of Statistics?

Answer:
Definition of "Statistics"
Different authors provide different definitions of statistics:
1. Boddington: "Statistics is the science of estimates and probabilities."
2. Croxton and Cowden: "Statistics is the science of collection, presentation, analysis and interpretation of numerical data."
3. Prof. Horace Secrist: "Statistics deals with aggregates of facts, affected to a marked extent by multiplicity of causes, numerically expressed, enumerated or estimated according to a reasonable standard of accuracy, collected in a systematic manner for a predetermined purpose and placed in relation to each other."
Characteristics of Statistics
1. Statistics deals with aggregates of facts: A single figure cannot be analyzed. Thus, the fact "Mr Lee is 170 cm tall" cannot be statistically analysed. On the other hand, if we know the heights of 60 students of a class, we can comment upon the average height, variations etc.
2. Statistics are affected to a marked extent by multiplicity of causes: The statistics of the yield of paddy are the result of factors such as fertility of soil, amount of rainfall, quality of seed used, quality and quantity of fertilizer used, etc.
3. Statistics are numerically expressed: Only numerical facts can be statistically analyzed. Therefore, facts such as "price decreases with increasing production" cannot be called statistics.
4. Statistics are enumerated or estimated according to reasonable standards of accuracy: The facts should be enumerated (collected from the field) or estimated (computed) with the required degree of accuracy. The degree of accuracy differs from purpose to purpose.
5. Statistics are collected in a systematic manner: The facts should be collected according to planned and scientific methods. Otherwise, they are likely to be wrong and misleading.
6. Statistics are collected for a pre-determined purpose: There must be a definite purpose for collecting facts, e.g. the movement of the wholesale price of a commodity.
7. Statistics are placed in relation to each other: The facts must be placed in such a way that a comparative and analytical study becomes possible. Thus, only related facts which are arranged in logical order can be called statistics.

Functions of Statistics
• It simplifies mass data.
• It makes comparison easier.
• It brings out trends and tendencies in the data.
• It brings out hidden relations between variables.
• It makes the decision-making process easier.
Limitations of Statistics
1. Statistics does not deal with qualitative data. It deals only with quantitative data.
2. Statistics does not deal with individual facts: Statistical methods can be applied only to aggregates of facts.
3. Statistical inferences (conclusions) are not exact: Statistical inferences are true only on an average. They are probabilistic statements.
4. Statistics can be misused and misinterpreted: Increasing misuse of statistics has led to increasing distrust in statistics.
5. Common men cannot handle statistics properly: Only statisticians can handle statistics properly.
____________________________________________________________________
5. What are the different stages of planning a statistical survey? Describe the various methods for collecting data in a statistical survey.

Answer:
The planning stage consists of the following sequence of activities.
1. The nature of the problem to be investigated should be clearly defined in an unambiguous manner.
2. The objectives of the investigation should be stated at the outset. The objectives could be to obtain certain estimates, to establish a theory, to verify an existing statement, to find a relationship between characteristics etc.
3. The scope of the investigation has to be made clear. It refers to the area to be covered, identification of units to be studied, nature of characteristics to be observed, accuracy of measurements, analytical methods, time, cost and other resources required.
4. Whether to use data collected from a primary or a secondary source should be determined in advance.
5. The organization of the investigation is the final step in the process. It encompasses the determination of the number of investigators required, their training, the supervision work needed, the funds required etc.

Methods of Collecting Data
1) Primary data collection:
i. Direct personal observation
ii. Indirect oral interview
iii. Information through agencies
iv. Information through mailed questionnaires
v. Information through schedules filled by investigators
2) In direct personal observation, the investigator collects data by having direct contact with the units of investigation.
3) Indirect oral interview is used when the area to be covered is large. The data is collected from a third party, a witness or the head of an institution.
4) Through local agencies and correspondents.
5) Through questionnaires. This method is generally adopted by research workers and other official and non-official agencies.
6) Through schedules filled by the investigator through personal contact.
7) Secondary data may be collected either by census or by sampling methods.
8)
Pilot survey: It is a small trial survey undertaken before the main survey. It gives a measure of the efficiency of the questionnaire.
____________________________________________________________________
6. What are the functions of classification? What are the requisites of a good classification? What is a table? Describe the usefulness of a table as a mode of presentation of data.

Answer:
Functions of Classification
a. It reduces the bulk of the data.
b. It simplifies the data and makes the data more comprehensible.
c. It facilitates comparison of characteristics.
d. It renders the data ready for any statistical analysis.

Requisites of a good classification
i. Unambiguous: It should not lead to any confusion.
ii. Exhaustive: Every unit should be allotted to one and only one class.
iii. Mutually exclusive: There should not be any overlapping.
iv. Flexibility: It should be capable of being adjusted to changing situations.
v. Suitability: It should be suitable to the objectives of the survey.
vi. Stability: It should remain stable throughout the investigation.
vii. Homogeneity: Similar units are placed in the same class.
viii. Revealing: It should bring out the essential features of the collected data.

Table
It is a logical listing of related data in rows and columns. The objectives of tabulation are:
i. To simplify complex data
ii. To highlight important characteristics
iii. To present data in minimum space
iv. To facilitate comparison
v. To bring out trends and tendencies
vi. To facilitate further analysis

Usefulness of a table in mode of presentation of data
Parts of a Table
i. Table number
ii. Title
iii. Captions
iv. Stubs
v. Body of the table
vi. Ruling and spacing
vii. Head note
viii. Source note
Types of Table
a. By purpose of investigation: two types.
i. General purpose table, also known as reference table: These are formed without a specific objective, but can be used for any specific purpose. They contain a large mass of data. Example: Census.
ii. Specific purpose table, also known as text table or summary table: These deal with specific problems. They are smaller in size and they highlight relationships between characteristics. Example: Cost of living indices.
b. By the nature of the presented figures: two types.
i. Primary table: It contains data in the form in which they were originally collected.
ii. Derived table: It presents figures such as totals, averages, ratios etc. derived from the original data.
c. By construction: three types.
i. Simple table: Presents only one characteristic.
ii. Complex table: Presents two or more characteristics.
iii. Cross-classified table: Entries are classified in both directions.