Graphical representation of data
Compiled by Pedup Dukpa (For third year Paro College of Education students)
Now, let’s get started:
What do we mean by graphs?
Who invented this versatile device called graphs?
Why are graphs important?
When do we use/apply graphs?
How do we know what kinds of graphs to use?
Where do we usually see graphs?
My personal definition of graphs: I believe graphs are a universal language that speaks volumes through the visual representation of data and that can be easily read and understood by everyone.
Inventor of graphs: William Playfair (1759-1823), a Scottish engineer and political economist, is the principal inventor of statistical graphs. In 1786, he published the “Commercial and Political Atlas”, which contained 44 charts.
He invented three of the four basic forms of graph:
The statistical line graph
The bar graph
The pie graph
Different forms/types of graphical representation of data:
Pictograph
Line graph
Bar graph
Histogram
Dots
Scatter diagram
Pie chart
Dendrogram
Frequency polygon
Ogive
Stem and leaf
Box and whisker
Directed and undirected graphs
Polar coordinate graphs
Three-dimensional graphs, etc.
Choosing the right graphs As discussed in the previous class… Skills and techniques required… are….????
Pictograph/Pictorial graph: A pictograph or pictorial graph involves categories and counts of the number of people or things in each category (the frequency). The layout of the graph can be horizontal or vertical.
Purpose of Pictograph: To simply and clearly illustrate a mathematical relation. No attempt is made to show data points or errors on such a graph. Here, we have two types of graphs:
Concrete Object Graph
Symbolic Graph
Bar Graphs: A bar graph is a pictorial representation of the frequency distribution of ungrouped data by a number of bars (rectangles) of uniform width, erected either vertically or horizontally with equal spacing between them.
Notice that not every data value falls on a multiple of 20; in that case the bar ends between two grid lines. Bar graphs are useful for getting an overall idea of trends in responses.
For example:
Activity            #
Visit W/Friends     175
Talk on Phone       168
Play Sports         120
Earn Money          120
Use Computers       65
Example 1: The number of trees planted by Paro College of Education students in different years on June 2 is given below:
Year      # of trees planted
1997      400
1998      450
1999      700
2000      750
2001      900
2002      1500
Total     4700
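One quick way to preview the shape of this bar graph before drawing it by hand is to print it as text. The sketch below is only an illustration in Python: the function name text_bar_chart and the scale of one '#' per 50 trees are assumptions made here, not part of the original notes.

```python
# Data from Example 1: year -> number of trees planted.
trees = {1997: 400, 1998: 450, 1999: 700, 2000: 750, 2001: 900, 2002: 1500}


def text_bar_chart(data, unit=50):
    """Return one text line per category; each '#' stands for `unit` items.

    `unit` is an illustrative scale chosen for this data set.
    """
    lines = []
    for label, count in data.items():
        bar = "#" * round(count / unit)
        lines.append(f"{label} | {bar} {count}")
    return lines


for line in text_bar_chart(trees):
    print(line)
```

The bars are drawn horizontally here; on paper the same data can equally be drawn with vertical bars of uniform width and equal spacing.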
For the class to do: Problem 1: The data below shows the number of students present in different classes on a particular day:
Represent/draw the above data as bar graph.
Solution:
Homework Question 1: The data regarding causes of accidents in factories are given below:
Draw a bar graph to represent the data given above.
Interpretation/Reading of bar graphs: Referring to Example 1, read the bar graph and answer the following questions:
In which year did the Paro College of Education students plant the maximum number of trees?
What trend does the number of trees planted show?
In which years does the number of trees planted differ by only 50?
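The same readings can be cross-checked from the table behind the graph. This is a minimal sketch, assuming the Example 1 figures; the variable names are hypothetical and the check compares every pair of years, which is one possible reading of the last question.

```python
from itertools import combinations

# Data from Example 1: year -> number of trees planted.
trees = {1997: 400, 1998: 450, 1999: 700, 2000: 750, 2001: 900, 2002: 1500}

# Year with the tallest bar.
best_year = max(trees, key=trees.get)

# Trend: does every year have more trees than the year before?
years = sorted(trees)
always_increasing = all(trees[a] < trees[b] for a, b in zip(years, years[1:]))

# Pairs of years whose counts differ by exactly 50.
pairs_differing_by_50 = [
    (a, b) for a, b in combinations(years, 2) if abs(trees[a] - trees[b]) == 50
]

print(best_year, always_increasing, pairs_differing_by_50)
```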
Homework questions on reading/interpretation of bar graph: Referring to homework question 1, answer the following questions:
Which cause is responsible for the maximum number of accidents in factories?
Which cause is responsible for the minimum?
Can you think of one of the “other” causes?
What percentage of the accidents could have been avoided by timely action?
Histogram and frequency polygon: A histogram is a graphical representation of a continuous frequency distribution (i.e. a grouped frequency distribution), drawn with no space between the rectangles/bars. Traditionally, class intervals are taken along the horizontal axis while the respective class frequencies are taken along the vertical axis. Note: The areas of the rectangles are proportional to the frequencies. A frequency polygon is formed by joining the mid-points of the tops of the adjoining rectangles in a histogram.
For example: Consider the following frequency distribution of the weights of 30 students of the third-year math-physics/IT class.
C.I. (in kg)    Frequency
45-50           3
50-55           7
55-60           12
60-65           5
65-70           3
Total           30
Draw a histogram and a frequency polygon based on the above data.
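The points of the frequency polygon are the class mid-points paired with the class frequencies. The sketch below computes them for the weight data; closing the polygon on the horizontal axis with a zero-frequency class at each end is a common convention assumed here, not something stated in the notes.

```python
# (lower limit, upper limit, frequency) for each class interval, in kg.
classes = [(45, 50, 3), (50, 55, 7), (55, 60, 12), (60, 65, 5), (65, 70, 3)]

# Mid-point of each class paired with its frequency.
points = [((low + high) / 2, freq) for low, high, freq in classes]

# Assumption: close the polygon on the axis using an empty class of the
# same width before the first class and after the last class.
width = classes[0][1] - classes[0][0]
polygon = [(points[0][0] - width, 0)] + points + [(points[-1][0] + width, 0)]

for midpoint, freq in polygon:
    print(midpoint, freq)
```

Plotting these points in order and joining them with straight lines gives the frequency polygon, with or without the histogram underneath.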
Example of constructing the frequency polygon without the help of a histogram: Suppose we were to draw a frequency polygon for the amount of pocket allowance that a student in third-year math-physics/IT gets (remember, these figures are arbitrary), given the following data:
Pocket money    # of students
0-50            16
50-100          25
100-150         13
150-200         26
200-250         15
250-300         5
For the class to do: Problem 2: The daily earnings of 100 shopkeepers in Paro Valley are given below:
Daily earning (in Nu.)    # of shops
200-300                   3
300-400                   12
400-500                   15
500-600                   30
600-700                   25
700-800                   12
800-900                   3
Draw a histogram and a frequency polygon to represent the above data.
Solution:
Stem and Leaf Plot: A stem and leaf plot is a graphical data analysis technique for summarizing the distributional information of a variable. It is similar to a histogram, but it preserves the original numeric values in the data. As such, it is an effective alternative to the histogram for small to moderate size data sets. However, it is not recommended for large data sets. In a stem-and-leaf plot each data value is split into a "stem" and a "leaf". The "leaf" is usually the last digit of the number and the other digits to the left of the "leaf" form the "stem". The number 123 would be split as:
Stem 12
Leaf 3
Constructing a stem and leaf plot: The Math test scores out of 50 marks are as follows: 35, 36, 38, 40, 42, 42, 44, 45, 45, 47, 48, 49, 50, 50, 50. Solution: The stem and leaf plot should look like this:
Math Test Scores (out of 50 pts)
Stem | Leaf
3    | 5 6 8
4    | 0 2 2 4 5 5 7 8 9
5    | 0 0 0
A stem-and-leaf plot shows the shape and distribution of data. It can be clearly seen in the diagram above that the data clusters around the row with a stem of 4.
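The splitting of each value into a stem and a leaf can be sketched in a few lines of Python. This is an illustration only: the function name stem_and_leaf is hypothetical, and the sketch assumes non-negative whole numbers with the last digit as the leaf.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Map each stem (all digits but the last) to its sorted list of leaves.

    Assumes non-negative whole numbers.
    """
    plot = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)  # e.g. 47 -> stem 4, leaf 7
        plot[stem].append(leaf)
    return dict(plot)


scores = [35, 36, 38, 40, 42, 42, 44, 45, 45, 47, 48, 49, 50, 50, 50]
plot = stem_and_leaf(scores)
for stem in sorted(plot):
    print(f"{stem} | {' '.join(str(leaf) for leaf in plot[stem])}")
```

The longest row of leaves marks where the data clusters, which for these scores is the row with a stem of 4.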
Points to remember:
The leaf is the digit in the place farthest to the right in the number, and the stem is the digit, or digits, that remain when the leaf is dropped.
To show a one-digit number (such as 7) using a stem-and-leaf plot, use a stem of 0 and a leaf of 7.
To find the median in a stem-and-leaf plot, count off half the total number of leaves.
For comparing two sets of data: We use a back-to-back stem-and-leaf plot. For example: The numbers 40, 42, and 43 are from Data Set A and the numbers 41, 45, 46, and 47 are from Data Set B. Construct a back-to-back stem-and-leaf plot. Solution:
Data Set A          Data Set B
Leaf    | Stem |    Leaf
3 2 0   | 4    |    1 5 6 7
Advantage of stem and leaf plot: The advantage of the stem-and-leaf plot over the histogram is that it displays not only the frequency for each interval but also all of the individual values within that interval. Moreover, the median and mode are easy to read off.
Home-Work on stem and leaf plot:
Construct a stem and leaf plot, find the median and mode of the data using the plot created.
Special Case (when one of the stems has no leaves): For example, take the following data set: 10, 11, 20, 21, 24, 27, 27, 27, 28, 28, 29, 29, 29, 31, 33, 33, 33, 33, 33, 39, 53 (notice that the 40s are missing). The stem and leaf plot would then be:
1 | 0 1
2 | 0 1 4 7 7 7 8 8 9 9 9
3 | 1 3 3 3 3 3 9
4 |
5 | 3
Even though the peak corresponds to the 20s row, the most frequently occurring value is clearly 33, and hence the mode is 33.
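The special case can be checked with a short sketch that keeps every stem between the smallest and the largest, even an empty one, and then counts the values to find the mode. The variable names are illustrative assumptions.

```python
from collections import Counter

data = [10, 11, 20, 21, 24, 27, 27, 27, 28, 28, 29, 29, 29,
        31, 33, 33, 33, 33, 33, 39, 53]

# Keep every stem from the smallest to the largest, so a stem with no
# leaves (the 40s here) still appears as an empty row.
stems = range(min(data) // 10, max(data) // 10 + 1)
rows = {s: [v % 10 for v in sorted(data) if v // 10 == s] for s in stems}

for stem, leaves in rows.items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")

# The mode is the most frequent value, not the longest row.
mode, count = Counter(data).most_common(1)[0]
longest_row = max(rows, key=lambda s: len(rows[s]))
print("mode:", mode, "appearing", count, "times; longest row has stem", longest_row)
```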
BOX-AND-WHISKER PLOT / 5 NUMBER SUMMARY:
Box-and-whisker plots allow people to explore data and to draw informal conclusions when two or more variables are present. They show only certain statistics rather than all the data. Five-number summary is another name for the visual representation of the box-and-whisker plot. The five-number summary consists of the median, the quartiles, and the smallest and greatest values in the distribution. The immediate visuals of a box-and-whisker plot are the center, the spread, and the overall range of the distribution. There are two types of box and whisker plot:
Traditional box and whisker plot
Modified version of the box and whisker plot
Traditional box and whisker / The 5 Number Summary
The five number summary consists of:
The median (2nd quartile)
The 1st quartile
The 3rd quartile
The maximum value in a data set
The minimum value in a data set
Review on The Median: The median is the middle value of a set of data once the data has been ordered.
Example 1. Ugyen hits 11 balls at T/phu driving range. The recorded distances of his drives, measured in yards, are given below. Find the median distance for his drives.
85, 125, 130, 65, 100, 70, 75, 50, 140, 95, 70
Ordered data: 50, 65, 70, 70, 75, 85, 95, 100, 125, 130, 140
Single middle value: Median drive = 85 yards
Example 2. Rinzin hit 12 balls at T/phu driving range. The recorded distances of his drives, measured in yards, are given below. Find the median distance for his drives.
85, 125, 130, 65, 100, 70, 75, 50, 140, 135, 95, 70
Ordered data: 50, 65, 70, 70, 75, 85, 95, 100, 125, 130, 135, 140
Two middle values (85 and 95), so take the mean: Median drive = 90 yards
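Both cases of the median review, an odd count with a single middle value and an even count with two middle values, can be sketched as one small function. The function and variable names below are illustrative assumptions; the data is from Examples 1 and 2.

```python
def median(values):
    """Middle value of the ordered data; mean of the two middle values if the count is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # two middle values


ugyen = [85, 125, 130, 65, 100, 70, 75, 50, 140, 95, 70]        # 11 drives
rinzin = [85, 125, 130, 65, 100, 70, 75, 50, 140, 135, 95, 70]  # 12 drives

print(median(ugyen))   # 85
print(median(rinzin))  # 90.0
```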
Finding the median, quartiles and inter-quartile range. Example 3: Find the median and quartiles for the data below.
12, 6, 4, 9, 8, 4, 9, 8, 5, 9, 8, 10
Order the data: 4, 4, 5, 6, 8, 8, 8, 9, 9, 9, 10, 12
Lower Quartile (Q1) = 5½
Median (Q2) = 8
Upper Quartile (Q3) = 9
Inter-Quartile Range = 9 - 5½ = 3½
Finding the median, quartiles and inter-quartile range. Example 4: Find the median and quartiles for the data below.
6, 3, 9, 8, 4, 10, 8, 4, 15, 8, 10
Order the data: 3, 4, 4, 6, 8, 8, 8, 9, 10, 10, 15
Lower Quartile (Q1) = 4
Median (Q2) = 8
Upper Quartile (Q3) = 10
Inter-Quartile Range = 10 - 4 = 6
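Examples 3 and 4 both find the quartiles as the medians of the lower and upper halves of the ordered data, leaving out the middle value when the count is odd. The sketch below follows that method; the function names are hypothetical, and other textbooks and software use slightly different quartile rules that can give different values.

```python
def median(values):
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method from the examples."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half + n % 2:]  # skip the middle value when n is odd
    return median(lower), median(ordered), median(upper)


example3 = [12, 6, 4, 9, 8, 4, 9, 8, 5, 9, 8, 10]
example4 = [6, 3, 9, 8, 4, 10, 8, 4, 15, 8, 10]

for data in (example3, example4):
    q1, q2, q3 = quartiles(data)
    print("Q1 =", q1, "Q2 =", q2, "Q3 =", q3, "IQR =", q3 - q1)
```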
Anatomy of a Box and Whisker Diagram.
[Figure: a box-and-whisker diagram drawn over a number line from 4 to 12, labelled with the lowest value, lower quartile, median, upper quartile, highest value, the box, and the two whiskers.]
Note: Plotting the median, lower quartile and upper quartile gives the box portion, which shows the range of the middle 50% of the data, with the median dividing that middle 50% into two halves.
Drawing a Box Plot. Example 5: Draw a box plot for the data below.
4, 4, 5, 6, 8, 8, 8, 9, 9, 9, 10, 12
Lower Quartile (Q1) = 5½
Median (Q2) = 8
Upper Quartile (Q3) = 9
[Figure: box plot of the data on a number line from 4 to 12.]
Drawing a Box Plot. Example 6: Draw a box plot for the data below.
3, 4, 4, 6, 8, 8, 8, 9, 10, 10, 15
Lower Quartile (Q1) = 4
Median (Q2) = 8
Upper Quartile (Q3) = 10
[Figure: box plot of the data on a number line from 3 to 15.]
Drawing a Box Plot. Question: Sonam recorded the heights in cm of boys in his class as shown below. Draw a box plot for this data.
137, 148, 155, 158, 165, 166, 166, 171, 171, 173, 175, 180, 184, 186, 186
Lower Quartile (QL) = 158
Median (Q2) = 171
Upper Quartile (QU) = 180
[Figure: box plot of the heights on a number line from 130 cm to 190 cm.]
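The five values needed to draw this box plot can be collected in one step. The sketch below applies the same median-of-halves method used in the earlier examples to Sonam's data; five_number_summary is a hypothetical helper name, not a library function.

```python
def median(values):
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def five_number_summary(values):
    """Minimum, lower quartile, median, upper quartile and maximum."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    return {
        "minimum": ordered[0],
        "lower_quartile": median(ordered[:half]),
        "median": median(ordered),
        "upper_quartile": median(ordered[half + n % 2:]),  # skip middle value if n is odd
        "maximum": ordered[-1],
    }


heights = [137, 148, 155, 158, 165, 166, 166, 171, 171, 173, 175, 180, 184, 186, 186]
summary = five_number_summary(heights)
for name, value in summary.items():
    print(name, "=", value, "cm")
```

The whiskers run from the minimum to the lower quartile and from the upper quartile to the maximum; the box spans the two quartiles with a line at the median.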
Drawing a Box Plot. Question: Tashi recorded the heights in cm of girls in the same class and constructed a box plot from the data. The box plots for both boys and girls are shown below. Use the box plots to choose some correct statements comparing heights of boys and girls in the class. Justify your answers.
[Figure: box plots for Boys and Girls drawn on a common number line from 130 cm to 190 cm.]
1. The girls are taller on average.
2. The boys are taller on average.
3. The girls show less variability in height.
4. The boys show less variability in height.
5. The smallest person is a girl.
6. The tallest person is a boy.
Problem for the class to do: Suppose you caught 13 fish along the riverside during the aftermath of the Paro Flood, and you measured the lengths of the fish (in cm) to be:
12, 13, 5, 8, 9, 20, 16, 14, 14, 6, 9, 12, 12
Draw a box and whisker plot based on medians.
Solution:
Step 1: Rewrite the data in order, from smallest length to largest: 5, 6, 8, 9, 9, 12, 12, 12, 13, 14, 14, 16, 20
Step 2: Now find the median of all the numbers. Since there are 13 numbers, the middle one is the seventh number, i.e. 12. This must be the median (middle number) because there are six numbers on each side.
Step 3: Find the lower quartile. This is the middle of the lower six numbers. The exact centre is half-way between 8 and 9, which is 8.5.
Step 4: Now find the upper quartile. This is the middle of the upper six numbers. The exact centre is half-way between 14 and 14, which must be 14. Now we are ready to start drawing the actual box and whisker diagram.
Step 5: Draw an ordinary number line that extends far enough in both directions to include all the numbers in your data: locate the 5 number…
[Figure: number line marked 5, 10, 15, 20.]
Final box and whisker plot:
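As a rough stand-in for the drawing, the five values found in Steps 1 to 4 (5, 8.5, 12, 14, 20) can be laid out as a one-line text plot. This is only a sketch: the function ascii_box_plot, the characters used, and the scale of two characters per centimetre are all assumptions made for illustration.

```python
def ascii_box_plot(minimum, q1, med, q3, maximum, scale=2):
    """Draw a one-line box plot; each unit on the number line is `scale` characters wide."""

    def pos(x):
        return round((x - minimum) * scale)

    width = pos(maximum) + 1
    row = [" "] * width
    for i in range(pos(minimum), pos(q1)):
        row[i] = "-"              # left whisker
    for i in range(pos(q1), pos(q3) + 1):
        row[i] = "="              # box from lower to upper quartile
    for i in range(pos(q3) + 1, width):
        row[i] = "-"              # right whisker
    row[pos(minimum)] = "|"       # smallest value
    row[pos(maximum)] = "|"       # largest value
    row[pos(med)] = "M"           # median
    return "".join(row)


# Five-number summary of the fish lengths from Steps 1 to 4.
fish_plot = ascii_box_plot(5, 8.5, 12, 14, 20)
print(fish_plot)
```

The right whisker comes out longer than the left one, which reflects the single 20 cm fish stretching the upper end of the data.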
Modified version of the box and whisker plot: These plots do not typically contain the median and the quartiles, though they do show the range of the data. It is easier to compute the maximum, minimum, mean and standard deviation of the data than it is to bin the data to compute the other variables, especially when the data has a probability density function (PDF) similar to that of the normal distribution. The diagram normally includes the range, the mean, and the values one standard deviation either side of the mean. For near-normal data, the box in these diagrams shows the location of about 68% of the values. Recall that in the traditional box and whisker diagram the box shows the middle 50% of the data about the median.
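The values that such a diagram needs can be sketched with Python's standard statistics module. The helper name mean_sd_box is hypothetical, the fish lengths from the class problem are reused purely as sample data, and the choice of the sample standard deviation (statistics.stdev) rather than the population one is an assumption.

```python
import statistics


def mean_sd_box(values):
    """Range, mean, and the values one standard deviation either side of the mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (an assumption)
    return {
        "min": min(values),
        "box_low": mean - sd,
        "mean": mean,
        "box_high": mean + sd,
        "max": max(values),
    }


fish = [12, 13, 5, 8, 9, 20, 16, 14, 14, 6, 9, 12, 12]
box = mean_sd_box(fish)
for name, value in box.items():
    print(f"{name}: {value:.2f}")
```

Here the whiskers would run from the minimum to the maximum, and the box from box_low to box_high with a line at the mean.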
Shortcomings of this plot: This method of display does not reveal when the data does not have a near-normal PDF. For example, highly skewed and bimodal data are more difficult to discern using this display method. The median and the traditional box-and-whisker diagram are often more representative when the data is bimodal.
Homework:
Investigate the modified box and whisker plot.
Find out the difference between the two types of box and whisker plot.
When and where should we use the traditional and the modified box and whisker plot?
The End (Tashi Delek & Have a Great Day)