EXPLORING, DISPLAYING, AND EXAMINING DATA
Types of Data Analysis
Exploratory data analysis
• The data guide the choice of analysis, or a revision of the planned analysis
Confirmatory data analysis
• Closer to classical statistical inference in its use of significance and confidence
• May use information from a closely related data set, or validate findings by gathering and analyzing new data
Techniques to Display and Examine Distributions
Frequency Table
Visual Displays
• Histograms
• Stem-and-leaf display
• Box-plot
Crosstabulation of Variables
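
As a rough illustration of the stem-and-leaf display listed above, the sketch below splits each value into a tens "stem" and a units "leaf". It assumes non-negative two-digit integers; the function name `stem_and_leaf` and the sample values are hypothetical, not taken from the slides.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group values by tens digit (stem) and list units digits (leaves)."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    lowest, highest = min(stems), max(stems)
    # Keep stems with no leaves so gaps in the distribution stay visible.
    return [(s, stems.get(s, [])) for s in range(lowest, highest + 1)]

# Illustrative data only.
display = stem_and_leaf([12, 15, 21, 24, 24, 40, 47])
for stem, leaves in display:
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
```

Unlike a histogram, each row still shows the individual data values.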
Techniques to Display and Examine Distributions
Histograms
• Display all intervals in a distribution, even those without observed values
• Examine the shape of the distribution for skewness, kurtosis, and the modal pattern
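
A minimal sketch of both histogram points, assuming equal-width intervals and a small made-up sample. The helper names `histogram_counts` and `skewness` are hypothetical, and the skewness formula is the simple moment-based (population) form, one of several in use.

```python
from collections import Counter
import statistics

def histogram_counts(values, start, width, n_bins):
    """Count values per equal-width interval, keeping empty intervals."""
    counts = Counter((v - start) // width for v in values)
    # Report every interval, including those with zero observations.
    return [(start + i * width, start + (i + 1) * width, counts.get(i, 0))
            for i in range(n_bins)]

def skewness(values):
    """Moment-based skewness: the mean of the cubed z-scores."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)

# Illustrative data only.
data = [12, 14, 15, 15, 17, 18, 21, 22, 24, 48]
bins = histogram_counts(data, start=10, width=10, n_bins=4)
for low, high, count in bins:
    print(f"{low}-{high - 1}: {'*' * count}")
print("right-skewed:", skewness(data) > 0)
```

The 30-39 interval has no observations but still appears, which makes the gap before the value 48 visible.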
Techniques to Display and Examine Distributions (cont.)
Box-plot (box and whisker-plot)
• Rectangular plot encompasses 50% of the data values
• Edges of the box (hinges)
• Center line through the width of the box marks the median
• Whiskers extend from the right and left hinges to the largest and smallest values
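
The numbers behind a box-plot can be sketched as a five-number summary. This assumes Tukey-style hinges (medians of the lower and upper halves, each half including the overall median when the count is odd); other quartile conventions give slightly different box edges. The function name and data are hypothetical.

```python
import statistics

def five_number_summary(values):
    """Return (smallest, lower hinge, median, upper hinge, largest)."""
    data = sorted(values)
    n = len(data)
    half = (n + 1) // 2  # each half includes the median when n is odd
    lower, upper = data[:half], data[n - half:]
    return (data[0], statistics.median(lower), statistics.median(data),
            statistics.median(upper), data[-1])

# Illustrative data only.
summary = five_number_summary([7, 9, 12, 13, 15, 18, 21, 24, 30])
smallest, lower_hinge, median, upper_hinge, largest = summary
print("box:", lower_hinge, "to", upper_hinge, "median:", median)
print("whiskers:", smallest, "and", largest)
```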
Techniques to Display and Examine Distributions (cont.)
Transformation
• To improve interpretation and compatibility with other data sets
• To enhance symmetry and stabilize spread
• To improve linear relationships between and among variables
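
One way to see "stabilize spread" is to compare two groups whose variability grows with their level. The sketch below assumes a base-10 log transformation, which is only one common choice, and uses made-up groups in which the second is exactly ten times the first.

```python
import math
import statistics

def log_transform(values):
    """Apply a base-10 logarithm to each (positive) value."""
    return [math.log10(v) for v in values]

# Illustrative data only: high_group is 10x low_group.
low_group = [10, 12, 15, 20]
high_group = [100, 120, 150, 200]

raw_ratio = statistics.stdev(high_group) / statistics.stdev(low_group)
log_ratio = (statistics.stdev(log_transform(high_group))
             / statistics.stdev(log_transform(low_group)))

print(f"spread ratio before: {raw_ratio:.2f}, after: {log_ratio:.2f}")
```

Before the transformation the spreads differ by a factor of ten; on the log scale they are essentially equal, so the groups can be compared on the same footing.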
Improvement & Control Analysis
Statistical process control
• Uses statistical tools to analyze, monitor, and improve process performance
• Total Quality Management
Control chart
• Displays sequential measurements of a process together with a center line and control limits
• Upper control limit
• Lower control limit
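
A minimal sketch of a control chart's center line and limits, under a simplifying assumption: the limits sit three sample standard deviations from the mean of a baseline run. Production charts usually estimate spread from subgroup ranges or moving ranges instead. All names and measurements here are hypothetical.

```python
import statistics

def control_limits(baseline, sigmas=3.0):
    """Return (lower control limit, center line, upper control limit)."""
    center = statistics.fmean(baseline)
    spread = statistics.stdev(baseline)
    return center - sigmas * spread, center, center + sigmas * spread

def out_of_control(measurements, lcl, ucl):
    """List (sequence number, value) for points outside the limits."""
    return [(i, x) for i, x in enumerate(measurements, start=1)
            if x < lcl or x > ucl]

# Illustrative data only.
baseline = [10.1, 9.9, 10.0, 10.2, 9.8, 10.0, 10.1, 9.9]
lcl, center, ucl = control_limits(baseline)
flagged = out_of_control([10.0, 10.3, 9.7, 11.2, 10.1], lcl, ucl)
print(f"LCL={lcl:.2f} center={center:.2f} UCL={ucl:.2f} flagged={flagged}")
```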
Types of Control Charts
Variables data (ratio or interval measurements)
• X-bar
• R-charts
• s-charts
Pareto Diagrams
• Bar chart whose percentages sum to 100 percent
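
The data behind a Pareto diagram can be sketched as categories sorted from most to least frequent, each with its percentage and a running cumulative percentage. The defect categories and counts below are invented for illustration.

```python
def pareto_table(counts):
    """Return (category, percent, cumulative percent), largest first."""
    total = sum(counts.values())
    rows, cumulative = [], 0.0
    for category, n in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        percent = 100.0 * n / total
        cumulative += percent
        rows.append((category, percent, cumulative))
    return rows

# Illustrative data only.
defects = {"dents": 30, "scratches": 45, "other": 10, "misalignment": 15}
table = pareto_table(defects)
for category, percent, cumulative in table:
    print(f"{category:<13} {percent:5.1f}%  cumulative {cumulative:5.1f}%")
```

The bar percentages sum to 100, so the cumulative column ends at 100 percent.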
Geographic Information Systems
Systems of hardware, software, and procedures that capture, store, manipulate, integrate, and display spatially-referenced data
Geographic Information Systems
Minimum four components
• Integrating information from various sources
• Capturing data
• Projection and restructuring
• Modeling
Crosstabulation
A technique for comparing two classification variables
– Cells
– Marginals
– Contingency tables
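
A small sketch of how cells and marginals fit together in a contingency table. The two classification variables (gender and a yes/no answer) and all counts are made up, and `crosstab` is a hypothetical helper name.

```python
from collections import Counter

def crosstab(pairs):
    """Return cell counts, row marginals, column marginals, grand total."""
    cells = Counter(pairs)
    row_totals = Counter(row for row, _ in pairs)
    col_totals = Counter(col for _, col in pairs)
    return cells, row_totals, col_totals, len(pairs)

# Illustrative data only: (gender, answer) for ten respondents.
observations = ([("male", "yes")] * 3 + [("male", "no")] * 2
                + [("female", "yes")] * 1 + [("female", "no")] * 4)
cells, row_totals, col_totals, n = crosstab(observations)

for row in ("male", "female"):
    counts = "  ".join(f"{col}={cells[(row, col)]}" for col in ("yes", "no"))
    print(f"{row:<7} {counts}  total={row_totals[row]}")
print(f"columns yes={col_totals['yes']} no={col_totals['no']} n={n}")
```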
Percentaging Errors
Averaging percentages without weighting
Using too-large percentages (>100%)
Using percentages with a very small sample
Citing a percentage decrease exceeding 100 percent
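
Two of these errors are easy to show with arithmetic. The sketch below uses invented group sizes and hypothetical helper names: it contrasts an unweighted average of percentages with one weighted by group size, and computes a percentage change relative to the original value.

```python
def unweighted_mean_pct(groups):
    """Average the percentages, ignoring group size (the error)."""
    return sum(pct for pct, _ in groups) / len(groups)

def weighted_mean_pct(groups):
    """Average the percentages, weighting each by its group size."""
    total = sum(size for _, size in groups)
    return sum(pct * size for pct, size in groups) / total

def percent_change(old, new):
    """Percentage change relative to the original value."""
    return 100.0 * (new - old) / old

# Illustrative data only: (percentage, group size).
groups = [(80.0, 10), (20.0, 90)]
naive = unweighted_mean_pct(groups)
correct = weighted_mean_pct(groups)
drop = percent_change(200, 50)
print(f"unweighted {naive:.1f}%  weighted {correct:.1f}%  change {drop:.1f}%")
```

A quantity that cannot go below zero can fall by at most 100 percent, so a larger reported decrease signals a base or arithmetic mistake.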
Other Table-based Analysis
Automatic Interaction Detection (AID)
• Sequential partitioning procedure that uses a dependent variable and set of predictors
• Searches among up to 300 variables for the best single division of data into subsets according to each predictor variable
• Chooses one division approach
• Splits the sample using chi-square tests to create multi-way splits
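
The first partitioning step can be sketched as follows. This is a simplified illustration, not the AID program itself: it assumes categorical predictors, scores each one by the Pearson chi-square statistic of its table against the dependent variable, and picks the highest. Comparing raw statistics is only fair when predictors have the same number of categories; real procedures compare significance levels and then repeat the search within each subset. All names and data are hypothetical.

```python
from collections import Counter

def chi_square(rows, predictor, target):
    """Pearson chi-square statistic for predictor x target."""
    cells = Counter((r[predictor], r[target]) for r in rows)
    row_totals = Counter(r[predictor] for r in rows)
    col_totals = Counter(r[target] for r in rows)
    n = len(rows)
    stat = 0.0
    for a in row_totals:
        for b in col_totals:
            expected = row_totals[a] * col_totals[b] / n
            stat += (cells[(a, b)] - expected) ** 2 / expected
    return stat

def best_split(rows, predictors, target):
    """Pick the predictor most strongly associated with the target."""
    return max(predictors, key=lambda p: chi_square(rows, p, target))

# Illustrative data only.
rows = [
    {"age": "young", "region": "north", "buys": "yes"},
    {"age": "young", "region": "south", "buys": "yes"},
    {"age": "young", "region": "north", "buys": "yes"},
    {"age": "young", "region": "south", "buys": "no"},
    {"age": "old", "region": "north", "buys": "no"},
    {"age": "old", "region": "south", "buys": "no"},
    {"age": "old", "region": "north", "buys": "no"},
    {"age": "old", "region": "south", "buys": "yes"},
]
chosen = best_split(rows, ["age", "region"], "buys")
subsets = {level: [r for r in rows if r[chosen] == level]
           for level in {r[chosen] for r in rows}}
print("split on:", chosen, {k: len(v) for k, v in subsets.items()})
```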