Why Statistics?
The complexity of real situations makes decision making difficult. Statistics provides methods for collecting, presenting, analyzing and meaningfully arranging data.
Statistics is useful in situations where:
• data need to be presented in a form that allows easy grouping (graphs, charts, tables);
• some hypothesis is to be tested and an inference drawn;
• unknown quantities are to be estimated from observed data;
• a decision about a course of action is to be made under uncertainty.
Branches of Statistics
• Descriptive: data collection and presentation.
• Inductive: statistical inferences; hypothesis testing and inferences (e.g. regression, correlation).
• Statistical Decision Theory: decision problems; alternatives; uncertainties; criterion of choices.
ARRANGING DATA
Learning Goals
• MEANING OF DATA
• TYPES OF DATA
• DATA COLLECTION
• DATA PRESENTATION DEVICES
MEANING OF DATA
Data is a collection of related observations, facts or figures.
A collection of data is called a data set, and each observation is called a data point.
TYPES OF DATA
• PRIMARY DATA
• SECONDARY DATA
DATA COLLECTION
The following questions can be posed to test the validity of the data:
• Where does the data originate from?
• Is the source reliable?
• Does the data support or contradict previous decisions?
• Are the conclusions derived from the data?
• What is the size of the sample? Does it represent the entire population under consideration for decision making?
METHODS OF COLLECTING DATA
COMPLETE ENUMERATION
SAMPLE METHOD
CLASSIFICATION OF DATA
GEOGRAPHICAL
CHRONOLOGICAL
QUALITATIVE
BY MAGNITUDE
TABULAR PRESENTATION OF DATA
OBJECTIVES:
• To condense complex data
• To show a trend
• To display huge volumes of data in less space
• To highlight key characteristics of data
• To facilitate comparison of data elements
• To help decision making using statistical methods
• To serve as reference for future decisions
PARTS OF AN IDEAL TABLE
• Table number: acts as an identity for the table.
• Title: gives an idea about the nature of the data in the table.
• Captions: headings given to the vertical columns that explain the mode of classification, i.e. time, quantity, region etc.
• Stubs: headings explaining the basis for classifying the rows.
• Body: the data posted in rows and columns, where the row and column headings explain the data.
• Footnote: any other information needed to explain the data in the table.
• Source: the source of the information.
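Example of the parts of a table:

Table 1.1: Product wise Sales          (table number and title)

             Year wise Sales           (captions: column headings)
Product    2001   2002   2003   2004
P1           40     45     40     50
P2           15     20     22     30
P3           20     30     40     50

Source: Economics Time, 22nd Feb. 2005

Here the Product column (P1, P2, P3) forms the stub (headings of the rows), the year-wise headings are the captions, and the sales figures form the body of the table.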
GRAPHICAL PRESENTATION OF DATA
LINE CHARTS
BAR CHARTS
PIE CHARTS
PICTOGRAMS
SCATTER DIAGRAMS
1. Line Graph
[Figure: line chart showing SALES (y-axis, 0 to 500) for the years 1990 to 1997]
2. Bar Chart
[Figure: bar chart comparing EXPORTS and IMPORTS by year from 1995 (y-axis, 0 to 4500)]
ARRANGING DATA
PIE CHARTS, HISTOGRAMS, FREQUENCY POLYGONS, SKEWNESS, KURTOSIS
PIE DIAGRAMS
[Figure: pie diagram with segments for Indian Promoters, Indian institutions/mutual funds, FIIs and Public]
HISTOGRAMS
The histogram graphically shows the following: center (i.e., the location) of the data; spread (i.e., the scale) of the data; Skewness of the data; presence of outliers; and presence of multiple modes in the data.
Histograms can be thought of as "sorting bins." You have one variable, and you sort the data by this variable by placing them into "bins." Then you count how many pieces of data are in each bin. The height of the rectangle drawn on top of each bin is proportional to the number of pieces in that bin. In bar graphs, on the other hand, you have several measurements of different items, and you compare them. The main question a histogram answers is: "How many measurements are there in each of the classes of measurements?" The main question a bar graph answers is: "What is the measurement for each item?"
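The "sorting bins" idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a prescribed procedure: the helper name histogram_counts is hypothetical, the bins are assumed to be half-open intervals [lo, lo + width) (so a value exactly on a boundary falls into the upper bin), and the tree heights are made up.

```python
from collections import Counter

def histogram_counts(values, bin_width, start=0):
    """Sort values into bins [lo, lo + bin_width) and count each bin.

    Hypothetical helper: bins are half-open, so a value equal to a
    boundary (e.g. 5.0 with width 5) is counted in the upper bin.
    """
    counts = Counter()
    for v in values:
        k = int((v - start) // bin_width)   # index of the bin containing v
        lo = start + k * bin_width          # lower edge of that bin
        counts[(lo, lo + bin_width)] += 1
    return dict(sorted(counts.items()))

# Made-up heights (metres) of ten trees, grouped into 5-metre classes.
heights = [3.2, 4.8, 6.1, 7.5, 9.9, 11.0, 12.4, 14.7, 5.0, 8.3]
print(histogram_counts(heights, 5))
# → {(0, 5): 2, (5, 10): 5, (10, 15): 3}
```

Each count is the height of one histogram rectangle; a bar graph of the same trees would instead plot each tree's own height.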
Bar Graph or Histogram?

Situation: We want to compare total revenues of five different companies.
Answer: Bar graph. Key question: What is the revenue for each company?

Situation: We have measured revenues of several companies. We want to compare the numbers of companies that make from 0 to 10,000; from 10,000 to 20,000; from 20,000 to 30,000 and so on.
Answer: Histogram. Key question: How many companies are there in each class of revenues?

Situation: We want to compare heights of ten oak trees in a city park.
Answer: Bar graph. Key question: What is the height of each tree?

Situation: We have measured several trees in a city park. We want to compare the numbers of trees that are from 0 to 5 meters high; from 5 to 10; from 10 to 15 and so on.
Answer: Histogram. Key question: How many trees are there in each class of heights?
FREQUENCY POLYGONS
[Figure: "less than" ogive of the distribution of 50 employees; y-axis: cumulative frequency (0 to 60); x-axis: <25, <30, <35, <40, <45, <50, <55, <60]
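A "less than" ogive plots, against each upper class limit, the number of observations below that limit. The sketch below computes those points with a running total. The class frequencies are hypothetical (chosen only so that they total 50 employees), because the underlying frequencies are not given on the slide.

```python
from itertools import accumulate

# Upper class limits matching the x-axis labels <25, <30, ..., <60.
upper_limits = [25, 30, 35, 40, 45, 50, 55, 60]
# Hypothetical number of employees in each class (total 50).
frequencies = [2, 5, 8, 12, 10, 7, 4, 2]

# "Less than" cumulative frequency: running total of the class frequencies.
less_than_cf = list(accumulate(frequencies))

for limit, cf in zip(upper_limits, less_than_cf):
    print(f"<{limit}: {cf}")
```

Plotting each (upper limit, cumulative frequency) pair and joining the points gives the ogive; the last point always equals the total number of observations.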
SKEWNESS Skewness is a measure of symmetry, or more precisely, the lack of symmetry. A distribution, or data set, is symmetric if it looks the same to the left and right of the center point.
A curve is said to be skewed when the values in the frequency distribution are concentrated more towards the left or right side of the curve i.e. the values are not equally distributed from the centre of the curve. A curve is said to be positively skewed when the tail of the curve is more stretched towards the right side. It is said to be negatively skewed when the tail is more stretched towards the left side.
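The slides describe skewness qualitatively. As an illustration, one common numerical measure, the moment coefficient of skewness g1 = m3 / m2^(3/2) (where mk is the k-th central moment), is sketched below; the slides do not specify this measure, other measures exist, and the data are made up. A positive value indicates a longer right tail, a negative value a longer left tail, and zero a symmetric spread.

```python
def moment_skewness(data):
    """Moment coefficient of skewness: m3 / m2**1.5 (population moments).

    Assumes the data are not all equal (otherwise m2 is zero).
    """
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in data) / n  # third central moment
    return m3 / m2 ** 1.5

print(moment_skewness([1, 2, 3]))          # symmetric: 0.0
print(moment_skewness([1, 2, 3, 10]))      # long right tail: positive
print(moment_skewness([-10, -3, -2, -1]))  # long left tail: negative
```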
KURTOSIS
Kurtosis is the degree of peakedness of a distribution of points. Two curves with the same central location and dispersion may have different degrees of kurtosis.
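Kurtosis can likewise be quantified. A common moment-based measure is β2 = m4 / m2², which equals 3 for a normal distribution; subtracting 3 gives excess kurtosis, positive for a more peaked (leptokurtic) curve and negative for a flatter (platykurtic) one. The sketch below assumes this measure, which the slides do not name, and uses made-up data.

```python
def excess_kurtosis(data):
    """Moment-based excess kurtosis: m4 / m2**2 - 3 (population moments).

    Assumes the data are not all equal (otherwise m2 is zero).
    """
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n  # second central moment
    m4 = sum((x - mean) ** 4 for x in data) / n  # fourth central moment
    return m4 / m2 ** 2 - 3

# Evenly spread values give a flat (platykurtic) shape: about -1.3.
print(excess_kurtosis([1, 2, 3, 4, 5]))
# Values bunched at the centre with a few far away: positive (leptokurtic).
print(excess_kurtosis([0, 5, 5, 5, 5, 5, 5, 10]))
```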
MEASURES OF CENTRAL TENDENCY
• Objectives of Averaging
• Requisites of a Good Average
• Types of Averages: Mathematical Averages, Positional Averages
CENTRAL TENDENCY
The tendency of the data to cluster around a central value is known as CENTRAL TENDENCY, and the corresponding numerical measure of this tendency is known as a measure of central tendency. An average is of great significance because it depicts the characteristics of the whole group. Since an average represents the entire data, its value lies somewhere between the two extremes, i.e. the largest and the smallest items. For this reason an average is frequently referred to as a measure of central tendency.
MAIN OBJECTIVES
• To find one value that represents the whole mass of data.
• To facilitate comparison.
• To establish relationships.
• To derive inferences about a universe from a sample.
• To aid decision making.
Requisites of a Good Average
• It should be rigidly defined.
• It should be mathematically expressed.
• It should be readily comprehensible and easy to calculate.
• It should be based on all the observations.
• It should be least affected by extreme fluctuations in sampling data.
• It should be suitable for further mathematical treatment.
Types of Averages
AVERAGES
• Mathematical Averages: Arithmetic Mean (A.M.), Geometric Mean (G.M.), Harmonic Mean (H.M.)
• Positional Averages: Median (Md), Mode (Mo)
ARITHMETIC MEAN
• The arithmetic mean is the value obtained by dividing the sum of the observations by the total number of observations.
• It is denoted by X̄ (read "X-bar").
CALCULATING THE MEAN FROM UNGROUPED DATA
The mean X̄ of a collection of observations x1, x2, …, xn is given by:
$$\bar{X} = \frac{1}{n}(x_1 + x_2 + \cdots + x_n) = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{\sum x}{n}$$
In statistics, the collection of all the elements under study is called a POPULATION, whereas a collection of some (but not all) of the elements under study is called a SAMPLE. It is necessary to distinguish whether we are considering a population or a sample, because certain formulas, such as those for the standard deviation, differ for a population and for a sample. Hence the population mean is denoted by µ and the sample mean by X̄:
$$\mu = \frac{\text{Sum of all the data points in the population}}{\text{Size of population}}, \qquad \bar{X} = \frac{\text{Sum of all the data points in the sample}}{\text{Size of sample}}$$
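Python's standard-library statistics module reflects the population/sample distinction described above: pstdev treats the data as a whole population (dividing the sum of squared deviations by N), while stdev treats them as a sample (dividing by n − 1). The data set here is made up for illustration.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up observations

# The mean is computed the same way for a population or a sample.
print(statistics.mean(data))    # 5

# Population standard deviation: sqrt(sum of squared deviations / N)
print(statistics.pstdev(data))  # 2.0

# Sample standard deviation: sqrt(sum of squared deviations / (n - 1))
print(statistics.stdev(data))   # about 2.138
```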
The following table gives the annual net profits of 10 financial services companies for the year 2007-2008. Calculate the arithmetic mean profit of the companies.

Company    Net Profit (Rs. crore)
A           9.19
B           4.27
C           1.74
D           5.71
E           4.80
F           4.01
G           9.22
H           3.00
I          15.16
J           3.93
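Applying X̄ = Σx / n to the profit table: the ten values sum to Σx = 61.03, so X̄ = 61.03 / 10 = 6.103 (Rs. crore). The sketch below does the same arithmetic; the variable names are illustrative.

```python
# Net profit (Rs. crore) of companies A to J, 2007-2008, from the table.
profits = {
    "A": 9.19, "B": 4.27, "C": 1.74, "D": 5.71, "E": 4.80,
    "F": 4.01, "G": 9.22, "H": 3.00, "I": 15.16, "J": 3.93,
}

total = sum(profits.values())   # Σx = 61.03 (up to floating-point rounding)
n = len(profits)                # n = 10
mean_profit = total / n         # X̄ = Σx / n
print(f"Mean profit = Rs. {mean_profit:.3f} crore")  # Rs. 6.103 crore
```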
CALCULATION FOR GROUPED DATA
Discrete Series:
$$\bar{X} = \frac{\sum f x}{\sum f}$$
E.g. In a survey of 50 chemical industries, the following data were obtained, where
Xi = level of profit (Rs. lakh) earned during 2002-2003
fi = number of companies that earned Xi amount of profit

Xi       fi     fiXi
20       12      240
16       15      240
24        8      192
25        7      175
31        8      248
TOTAL    50     1095
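For a discrete series each value Xi is counted fi times, so with the totals in the table X̄ = Σ fiXi / Σ fi = 1095 / 50 = 21.9 (Rs. lakh). A minimal sketch of the same calculation; the function name grouped_mean is illustrative, not from the slides.

```python
def grouped_mean(values, freqs):
    """Arithmetic mean of a discrete series: sum(f * x) / sum(f)."""
    if len(values) != len(freqs):
        raise ValueError("values and freqs must have the same length")
    total_fx = sum(f * x for x, f in zip(values, freqs))  # Σ fX
    total_f = sum(freqs)                                 # Σ f
    return total_fx / total_f

profit_levels = [20, 16, 24, 25, 31]   # Xi (Rs. lakh)
companies = [12, 15, 8, 7, 8]          # fi
print(grouped_mean(profit_levels, companies))  # 21.9
```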
USES OF A.M.
• The mean is the simplest average to understand and is easy to compute.
• It is relatively reliable: it does not vary too much when repeated samples are taken from the same population, at least not as much as other kinds of statistical descriptions.
• The mean is typical in the sense that it is the centre of gravity, balancing the values on either side of it.
Advantages and Disadvantages of A.M.
+ Its concept is familiar and clear to all.
+ It is easy to understand and easy to calculate.
+ It provides a good basis for comparison.
- It is affected by extreme (highly fluctuating) values that lie far from the other values of the group.
- It can be difficult to find the actual mean.
- Calculation of the mean is not possible for a data set with open-ended classes.
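The sensitivity of the mean to extreme values can be checked numerically. In the made-up data below, adding one extreme value moves the mean a long way, while the median, a positional average, barely changes.

```python
import statistics

values = [10, 11, 12, 13]
with_outlier = values + [100]   # one extreme value added

print(statistics.mean(values), statistics.median(values))              # 11.5 11.5
print(statistics.mean(with_outlier), statistics.median(with_outlier))  # 29.2 12
```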