Management Development Program: Data Analysis using SPSS. Presenter: Mr. Venkat
SPSS: Statistical Package for the Social Sciences
VERSIONS OF SPSS • SPSS Ver-1 to Ver-5: DOS versions • SPSS Ver-6 to Ver-15: Windows versions • SPSS-X: for mainframes (on various operating system platforms) • SPSS-LAN: for LANs • Web site: http://www.spss.com
BASIC APPLICATIONS • Creating data as a spreadsheet • Generating reports as tables • Statistical analysis of data • Graphic presentations
MAIN STEPS IN USING SPSS • Creating data or getting data • Defining data • Modifying data • Processing data – generating tables – statistical analysis – generating graphs
Structure of SPSS data file • Variables (Fields) in columns • Cases (Respondents) in rows • A case contains several variables
Data Definition • Variable Name • Variable Type • Field Width • Decimal Positions • Variable Label • Value Labels • Missing Values • Column Width • Alignment • Measure (Scale, Ordinal or Nominal)
Variable Name
• Maximum of 8 characters (up to Ver 10) • First character must be a letter • Arithmetic operators, special symbols and blank spaces are not permitted • Two variables cannot have the same name in one data file
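The naming rules above can be checked mechanically. The sketch below is a hypothetical validator (not part of SPSS) that encodes only the simplified rules on this slide; it assumes that letters, digits and underscore are the only characters allowed after the first letter, and that duplicates are compared case-insensitively. Real SPSS rules also permit a few extra symbols and reserve certain keywords, which are not modelled here.

```python
import re

# Hypothetical encoding of the slide's legacy rules (8-character names).
# Assumption: after the first letter only letters, digits and "_" are allowed.
NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_]{0,7}")


def check_variable_names(names):
    """Return (name, problem) pairs; an empty list means every name passes."""
    problems = []
    seen = set()
    for name in names:
        if not NAME_PATTERN.fullmatch(name):
            problems.append(
                (name, "invalid: max 8 chars, must start with a letter, no symbols/blanks")
            )
        key = name.upper()  # assumption: duplicates are compared ignoring case
        if key in seen:
            problems.append((name, "duplicate name in this data file"))
        seen.add(key)
    return problems


print(check_variable_names(["age", "mar stat", "household", "AGE"]))
```

Here "mar stat" fails because of the blank, "household" because it has 9 characters, and "AGE" because it repeats "age".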
Variable Label • It makes outputs easier to read. • Any characters, including spaces, may be used.
Variable Type • Numeric (Floating point) • String (Character / Text) • Date • Currency
Value Labels • They help in reading tables and other outputs. • For example, the variable “Marital Status” has five values (codes): – value 1 means “Never Married” – value 2 means “Currently Married” – value 3 means “Widow/Widower” – value 4 means “Divorced” – value 5 means “Separated”
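The idea behind value labels is that the data file stores only the numeric code, and the label is substituted when output is displayed. This minimal Python sketch (illustrative only, not SPSS code) uses the Marital Status codes from the slide; the helper name `label_values` is hypothetical. Codes without a label are shown as the raw number.

```python
# Value labels for the "Marital Status" example: code -> label.
MARITAL_STATUS_LABELS = {
    1: "Never Married",
    2: "Currently Married",
    3: "Widow/Widower",
    4: "Divorced",
    5: "Separated",
}


def label_values(codes, labels):
    """Replace each code with its label; codes without a label are kept as-is."""
    return [labels.get(code, code) for code in codes]


print(label_values([2, 1, 5, 7], MARITAL_STATUS_LABELS))
```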
Missing Values • These are values indicating “No Response” or “Not Applicable” for a variable. • Declaring missing values tells SPSS to exclude the cases containing these values from the analysis. • A blank cell in an Excel or dBase/FoxPro file is treated as a missing value. • In an SPSS data file, blanks appear as dots (.), denoting that these are (system-)missing values.
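The effect of declaring missing values is easiest to see on a mean. The sketch below assumes a hypothetical coding scheme (98 = "Not Applicable", 99 = "No Response") and treats blanks (`None` or NaN) like SPSS system-missing dots; the function name is illustrative.

```python
import math

# Hypothetical user-missing codes: 98 = "Not Applicable", 99 = "No Response".
USER_MISSING = {98, 99}


def valid_values(values, missing_codes=USER_MISSING):
    """Drop blanks (None/NaN, like SPSS's '.') and declared missing codes."""
    return [
        v for v in values
        if v is not None
        and not (isinstance(v, float) and math.isnan(v))
        and v not in missing_codes
    ]


ages = [34, 99, 41, None, 25, 98]
valid = valid_values(ages)
print(valid, sum(valid) / len(valid))
```

Without the declaration, the codes 98 and 99 would be averaged in as if they were real ages, inflating the mean from about 33.3 to 59.4.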
Creating Data directly in SPSS • After opening SPSS, click “File > New > Data” on the menu bar. • In the “SPSS Data Editor” window, click the “Variable View” tab (at the bottom of the window) and start defining the data file, i.e. variable name, variable type, variable label, value labels, missing values etc.
MANIPULATING FILES • Insert variable • Sort cases • Transpose - Interchange rows and columns • Merge Files - Add cases, Add variables • Aggregate • Select cases - Select with “if” condition • Weight cases - for estimation / projection
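Three of these file operations (sort cases, select cases with an "if" condition, weight cases) can be sketched on a small list of records. The records and weights below are hypothetical; each dictionary plays the role of one case (row) and each key one variable (column). This is plain Python, not SPSS syntax.

```python
# Hypothetical cases: one dict per respondent, one key per variable.
cases = [
    {"region": "North", "income": 42, "weight": 1.5},
    {"region": "South", "income": 35, "weight": 0.5},
    {"region": "North", "income": 58, "weight": 2.0},
    {"region": "East", "income": 30, "weight": 1.0},
]

# Select cases with an "if" condition.
north = [c for c in cases if c["region"] == "North"]

# Sort cases by a variable, largest first.
by_income = sorted(cases, key=lambda c: c["income"], reverse=True)


# Weight cases: each case counts as `weight` cases, so estimates are
# projected to the population the weights represent.
def weighted_mean(rows, var, weight_var="weight"):
    total_w = sum(r[weight_var] for r in rows)
    return sum(r[var] * r[weight_var] for r in rows) / total_w


print(len(north), by_income[0]["income"], weighted_mean(cases, "income"))
```

The unweighted mean income here is 41.25, while the weighted mean is 45.3, because the high-income North cases carry larger weights.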
VIEW
• Status Bar - process, selection, weight, n of cases • Tool Bar - for data, syntax, chart, navigator (output) • Fonts - type, size • Grid Lines • Value Labels
Data Modifications
• Compute - create a new variable in the existing data file from an arithmetic expression. • Recode - regroup or reassign the values of a variable. • Rank cases • Automatic recode • Create time series • Replace missing values
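Compute, Recode and Automatic Recode are illustrated below in plain Python. The item variables `q1`..`q3`, the age cut points and the helper names are hypothetical. Automatic recode is sketched as assigning consecutive integers to the distinct values in ascending order, which matches SPSS's default "lowest value" ordering as far as this sketch goes.

```python
# Hypothetical cases with three item scores and age.
cases = [
    {"q1": 4, "q2": 5, "q3": 3, "age": 23},
    {"q1": 2, "q2": 3, "q3": 2, "age": 41},
    {"q1": 5, "q2": 4, "q3": 4, "age": 67},
]

# Compute: new variable from an arithmetic expression.
for c in cases:
    c["total"] = c["q1"] + c["q2"] + c["q3"]


# Recode into a different variable (assumed cut points):
# 1 = under 30, 2 = 30-59, 3 = 60 and over.
def recode_age(age):
    if age < 30:
        return 1
    if age < 60:
        return 2
    return 3


for c in cases:
    c["agegrp"] = recode_age(c["age"])


# Automatic recode: distinct values -> 1, 2, 3, ... in ascending order.
def auto_recode(values):
    codes = {v: i for i, v in enumerate(sorted(set(values)), start=1)}
    return [codes[v] for v in values], codes


print([c["total"] for c in cases], [c["agegrp"] for c in cases])
print(auto_recode(["South", "North", "East", "North"]))
```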
STATISTICAL PROCEDURES: OLAP Cubes • Online Analytical Processing cubes • Calculates univariate summary statistics within the categories of one or more categorical variables
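The core of an OLAP-cube style table is a univariate summary of one variable within each category of a grouping variable. A minimal sketch on hypothetical sales data follows; `summarize_by` is an illustrative name and computes only N, sum and mean.

```python
from collections import defaultdict
from statistics import mean


def summarize_by(rows, group_var, value_var):
    """N, sum and mean of value_var within each category of group_var."""
    groups = defaultdict(list)
    for r in rows:
        groups[r[group_var]].append(r[value_var])
    return {g: {"n": len(v), "sum": sum(v), "mean": mean(v)} for g, v in groups.items()}


# Hypothetical sales records.
sales = [
    {"region": "North", "sales": 120},
    {"region": "South", "sales": 80},
    {"region": "North", "sales": 100},
]
print(summarize_by(sales, "region", "sales"))
```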
DESCRIPTIVE STATISTICS – Frequencies - one variable at a time, with various univariate statistics. – Descriptives - univariate statistics. – Explore - examining the behaviour (distribution) of variables. – Crosstabs - two-way, three-way – Ratio Statistics
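As a rough illustration of what Frequencies and Descriptives report, the sketch below builds a frequency table (count and percent per value) and the usual univariate statistics for one hypothetical 5-point rating variable. It uses the sample standard deviation (divisor n - 1).

```python
from collections import Counter
from statistics import mean, median, mode, stdev

responses = [3, 4, 4, 5, 2, 4, 3, 5, 4, 1]  # hypothetical 5-point ratings

# Frequencies: count and percent for each value of one variable.
counts = Counter(responses)
n = len(responses)
for value in sorted(counts):
    print(value, counts[value], f"{100 * counts[value] / n:.1f}%")

# Descriptives: univariate summary statistics.
print("N", n, "mean", mean(responses), "median", median(responses),
      "mode", mode(responses), "sd", round(stdev(responses), 3))
```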
MEANS • Display mean & S.D. by groups • One-sample t-test • Two independent samples t-test • Two related samples (paired samples) t-test • One-way ANalysis Of VAriance (ANOVA) with post-hoc tests
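The t statistics behind these tests can be written in a few lines. This sketch computes only the statistic and its degrees of freedom; SPSS also reports the significance (p-value) from the t distribution, which is not reproduced here. The independent-samples version shown is the pooled "equal variances assumed" form only.

```python
import math
from statistics import mean, stdev


def one_sample_t(sample, mu0):
    """t = (mean - mu0) / (s / sqrt(n)), with n - 1 degrees of freedom."""
    n = len(sample)
    t = (mean(sample) - mu0) / (stdev(sample) / math.sqrt(n))
    return t, n - 1


def paired_t(before, after):
    """Paired-samples t-test: one-sample t-test of the differences against 0."""
    diffs = [a - b for b, a in zip(before, after)]
    return one_sample_t(diffs, 0)


def independent_t(x, y):
    """Two independent samples, equal variances assumed (pooled variance)."""
    nx, ny = len(x), len(y)
    sp2 = ((nx - 1) * stdev(x) ** 2 + (ny - 1) * stdev(y) ** 2) / (nx + ny - 2)
    t = (mean(x) - mean(y)) / math.sqrt(sp2 * (1 / nx + 1 / ny))
    return t, nx + ny - 2


print(one_sample_t([5, 7, 9], 5))
print(paired_t([10, 12, 14], [11, 14, 17]))
print(independent_t([1, 2, 3], [4, 5, 6]))
```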
LINEAR REGRESSION • Methods: Enter, Stepwise, Remove, Backward, Forward. • Regression Coefficients: Estimate, Standard Error, Standardized coefficients, Significance. • Residuals: Durbin-Watson test (for autocorrelation) • Save: Predicted values, Residuals etc. • Plot: Histogram, Normal Probability plot. • Others: Multicollinearity diagnostics, partial correlation, R-square change etc.
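The Durbin-Watson statistic listed under Residuals is a simple ratio of sums of squares over the residual series. A minimal sketch, taking residuals already saved from a fitted model:

```python
def durbin_watson(residuals):
    """DW = sum_{t>=2} (e_t - e_{t-1})^2 / sum_t e_t^2.

    Values near 2 suggest no first-order autocorrelation; values toward 0
    suggest positive, and toward 4 negative, autocorrelation.
    """
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den


print(durbin_watson([1, -1, 1, -1]), durbin_watson([1, 1, 1, 1]))
```

The alternating residuals give 3.0 (negative autocorrelation); the constant residuals give 0.0 (perfect positive autocorrelation).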
CORRELATIONS • Bivariate Correlations. • Partial Correlations. • Distances - Similarities and Dissimilarities
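Bivariate (Pearson) and first-order partial correlations can be sketched directly from their formulas. `partial_r` takes the three pairwise correlations and returns the correlation of x and y after controlling for z.

```python
import math
from statistics import mean


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, controlling for z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


print(pearson_r([1, 2, 3], [2, 4, 6]), partial_r(0.5, 0.5, 0.5))
```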
CLASSIFY • K-means Cluster • Hierarchical Cluster • Discriminant Analysis
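K-means clustering alternates two steps: assign each case to its nearest cluster centre, then move each centre to the mean of its cases, until assignments stop changing. The sketch below shows this general idea with starting centres supplied by the caller; SPSS's own choice of initial centres and update details may differ.

```python
import math


def k_means(points, centers, max_iter=100):
    """Lloyd-style k-means on tuples of numbers; returns (centers, assignment)."""
    centers = [tuple(c) for c in centers]
    assignment = None
    for _ in range(max_iter):
        # Assignment step: nearest centre by Euclidean distance.
        new_assignment = [
            min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no case changed cluster
        assignment = new_assignment
        # Update step: each centre moves to the mean of its members.
        for k in range(len(centers)):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:  # an empty cluster keeps its previous centre
                centers[k] = tuple(sum(coord) / len(members) for coord in zip(*members))
    return centers, assignment


print(k_means([(1, 1), (1, 2), (8, 8), (9, 8)], [(0, 0), (10, 10)]))
```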
DATA REDUCTION • Factor Analysis. • Correspondence Analysis. • Optimal Scaling - Homals, Princals, Overals.
FACTOR ANALYSIS • Methods: Principal Components, Principal Axis Factoring, Maximum Likelihood etc. • Criteria: Minimum eigenvalue, Number of factors, Number of iterations. • Rotation: Varimax, Quartimax, Equamax, Promax, Oblimin. • Display: Initial factor matrix, Rotated factor matrix. • Plot: Scree plot.
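The minimum-eigenvalue criterion works on the eigenvalues of the correlation matrix. Because these eigenvalues sum to the number of variables p, each eigenvalue divided by p is the share of total variance that component explains. The sketch assumes the eigenvalues have already been computed (all p of them) and uses hypothetical values; the default cut-off of 1 is the usual Kaiser rule.

```python
def kaiser_summary(eigenvalues, min_eigenvalue=1.0):
    """Retained count plus (component, eigenvalue, % variance, cumulative %) rows.

    Assumes `eigenvalues` are all p eigenvalues of a p x p correlation matrix,
    so they sum to p.
    """
    p = len(eigenvalues)
    rows = []
    cumulative = 0.0
    for i, ev in enumerate(sorted(eigenvalues, reverse=True), start=1):
        pct = 100 * ev / p
        cumulative += pct
        rows.append((i, ev, round(pct, 2), round(cumulative, 2)))
    retained = sum(1 for ev in eigenvalues if ev > min_eigenvalue)
    return retained, rows


# Hypothetical eigenvalues for 6 variables (they sum to 6).
retained, rows = kaiser_summary([2.8, 1.5, 0.7, 0.5, 0.3, 0.2])
print(retained)
for row in rows:
    print(row)
```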
SCALES • Reliability Analysis - Alpha, Split-half, Guttman, Parallel. • Multidimensional Scaling (MDS)
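The Alpha model of Reliability Analysis is Cronbach's alpha, which compares the item variances with the variance of the total score. A minimal sketch, where each inner list holds one item's scores for the same respondents in the same order (data hypothetical):

```python
from statistics import variance


def cronbach_alpha(items):
    """alpha = k/(k-1) * (1 - sum of item variances / variance of total score)."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # total score per respondent
    item_var = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_var / variance(totals))


# Hypothetical scores: 3 items answered by 4 respondents.
print(cronbach_alpha([[2, 3, 4, 5], [3, 3, 4, 4], [1, 2, 4, 5]]))
```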
NON-PARAMETRIC TESTS • Chi-square • Binomial • Runs test • One-sample K-S test • Two independent samples tests • Several independent samples tests • Two related samples tests • Several related samples tests
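As one example from this list, the exact binomial test can be computed from binomial coefficients. This sketch covers only the common case of a test proportion of 0.5, where the distribution is symmetric and the two-sided p-value is twice the smaller tail (capped at 1).

```python
from math import comb


def binomial_test_two_sided(successes, n):
    """Exact two-sided binomial test of H0: p = 0.5."""
    k = min(successes, n - successes)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n  # P(X <= k)
    return min(1.0, 2 * tail)


print(binomial_test_two_sided(9, 10))
```

With 9 "yes" answers out of 10, the two-sided p-value is 22/1024, about 0.021.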
TIME SERIES ANALYSIS • Exponential Smoothing. • Autoregression. • Auto Regressive Integrated Moving Averages (ARIMA). • X11ARIMA. • Seasonal Decomposition.
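Simple exponential smoothing updates a smoothed level as a weighted average of the newest observation and the previous level. The sketch initialises the level with the first observation; that starting rule is an assumption, and SPSS may initialise differently.

```python
def simple_exponential_smoothing(series, alpha):
    """s_1 = x_1; s_t = alpha * x_t + (1 - alpha) * s_{t-1}, 0 < alpha <= 1.

    The last smoothed value serves as the one-step-ahead forecast.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = [series[0]]  # assumed initialisation: first observation
    for x in series[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed


print(simple_exponential_smoothing([10, 20, 30], 0.5))
```

A larger alpha tracks recent observations more closely; a smaller alpha gives a smoother series.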
MULTIPLE RESPONSE ANALYSIS • Defining sets. • Frequencies • Crosstabulation.
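A multiple response set groups several yes/no variables answering one question, and its frequency table uses two percentage bases: percent of responses and percent of cases. The sketch below uses a hypothetical "Which newspapers do you read?" set and treats cases with no "yes" in the set as missing.

```python
# Hypothetical multiple-dichotomy set: one 0/1 variable per newspaper.
cases = [
    {"paper_a": 1, "paper_b": 1, "paper_c": 0},
    {"paper_a": 1, "paper_b": 0, "paper_c": 0},
    {"paper_a": 0, "paper_b": 1, "paper_c": 1},
    {"paper_a": 0, "paper_b": 0, "paper_c": 0},  # no "yes" in the set
]


def multiple_response_frequencies(rows, set_vars, counted_value=1):
    """Return {var: (count, % of responses, % of cases)}.

    % of cases uses only cases with at least one counted value, so it can
    add up to more than 100%.
    """
    counts = {v: sum(1 for r in rows if r[v] == counted_value) for v in set_vars}
    total_responses = sum(counts.values())
    valid_cases = sum(1 for r in rows if any(r[v] == counted_value for v in set_vars))
    return {
        v: (c, 100 * c / total_responses, 100 * c / valid_cases)
        for v, c in counts.items()
    }


print(multiple_response_frequencies(cases, ["paper_a", "paper_b", "paper_c"]))
```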
CHARTS • Bar, Line, Area, Pie, High-Low • Pareto Charts, Control Charts (X-bar, R, p, c) • Box Plot, Error Bar • Scatter Plot, Histogram, P-P Plot, Q-Q Plot, Sequence Charts • ROC Curve (Receiver Operating Characteristic) • Time Series: Autocorrelations, Spectral Plots, Cross-correlations
Types of Data • Nominal: A variable can be treated as nominal when its values represent categories with no intrinsic ranking; for example, the department of the company in which an employee works. • Examples of nominal variables include region, zip code, religious affiliation etc.
Ordinal Data • A variable can be treated as ordinal when its values represent categories with some intrinsic ranking; for example, levels of service satisfaction from highly dissatisfied to highly satisfied. Examples of ordinal variables include attitude scores representing degree of satisfaction or confidence and preference rating scores.
Scale Data • A variable can be treated as scale when its values represent ordered categories with a meaningful metric, so that distance comparisons between values are appropriate. Examples of scale variables include age in years and income in thousands of dollars.
Data Analysis • Simple Tabulation and Cross Tabulation • Univariate and Bivariate Analysis • Dependent and Independent variables • First Stage Analysis - Simple Tabulation • Second Stage Analysis - Cross Tabulation • The Chi-square test for cross tabulation
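The chi-square test for a cross tabulation compares each observed cell count with the count expected if the row and column variables were independent. This sketch computes the Pearson chi-square statistic and its degrees of freedom only; the p-value (which SPSS reports alongside, together with other variants such as the continuity correction for 2 x 2 tables) is not computed.

```python
def chi_square_crosstab(table):
    """Pearson chi-square for an r x c table of observed counts.

    E_ij = row_total_i * col_total_j / grand_total;
    chi-square = sum (O - E)^2 / E with (r - 1)(c - 1) degrees of freedom.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_totals) - 1)
    return chi2, df


print(chi_square_crosstab([[10, 20], [20, 10]]))
```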
ANOVA and the Design of Experiments • The analysis of variance technique is used when the independent variables are nominal (categorical) and the dependent variable is metric. • The independent variable could be different price levels, different pack sizes or different product colors, and the dependent variable could be sales of the product.
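For a one-way ANOVA, the F ratio compares variation between group means with variation within groups. The sketch below applies it to hypothetical sales figures under three price levels; it returns F and its degrees of freedom but not the p-value.

```python
from statistics import mean


def one_way_anova(groups):
    """F = (SS_between / (k - 1)) / (SS_within / (N - k))."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum(sum((x - mean(g)) ** 2 for x in g) for g in groups)
    df_between, df_within = k - 1, n_total - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Hypothetical sales at three price levels (low, medium, high).
print(one_way_anova([[20, 22, 24], [18, 19, 20], [12, 14, 16]]))
```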
Experimental Designs • Completely Randomized Design, analysed by one-way ANOVA (single factor) • Randomized Block Design (single blocking factor) • Latin Square Design (two blocking factors) • Factorial design with two or more factors.
Correlation and Regression • Correlation Analysis - to measure the degree of association between two sets of quantitative data, e.g. how sales of product A are correlated with sales of product B. • Regression Analysis - to explain the variation in one variable based on the variation in one or more other variables.
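With one independent variable, least-squares regression has a closed form for the slope and intercept, and R² measures how much of the variation in Y the line explains. A minimal sketch on small illustrative data:

```python
from statistics import mean


def simple_regression(x, y):
    """Least-squares fit y = a + b*x; returns (a, b, r_squared)."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    a = my - b * mx
    ss_total = sum((yi - my) ** 2 for yi in y)
    ss_resid = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    return a, b, 1 - ss_resid / ss_total


print(simple_regression([1, 2, 3], [1, 3, 2]))
```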
Regression • Basically two approaches: • 1. Trial-and-error approach (stepwise regression), used in exploratory research • 2. A preconceived approach • The output consists of the beta coefficients for all the independent variables in the model. The output also gives the result of a t-test for the significance of each variable in the model, and the result of an F-test for the model as a whole. • The coefficient of determination R² is the proportion of the total variance in Y explained by all the independent variables in the regression equation.
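The overall F-test and the adjusted R² both follow from R², the number of cases n and the number of independent variables k. The sketch below uses these standard formulas. The R² of 0.8 in the usage line is hypothetical; n = 15 and k = 6 mirror the size of the problem that follows, where only 8 residual degrees of freedom remain.

```python
def regression_overall_fit(r_squared, n, k):
    """Overall F statistic, its df, and adjusted R^2 for k predictors, n cases.

    F = (R^2 / k) / ((1 - R^2) / (n - k - 1)) with (k, n - k - 1) df;
    adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1).
    """
    df_resid = n - k - 1
    if df_resid <= 0:
        raise ValueError("need more cases than predictors + 1")
    f = (r_squared / k) / ((1 - r_squared) / df_resid)
    adj = 1 - (1 - r_squared) * (n - 1) / df_resid
    return f, (k, df_resid), adj


# Hypothetical R^2 = 0.8 with 15 cases and 6 predictors.
print(regression_overall_fit(0.8, 15, 6))
```

The drop from R² = 0.80 to an adjusted R² of 0.65 shows how strongly a small sample with many predictors is penalised.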
Problem • A manufacturer and marketer of electric motors would like to build a regression model consisting of 5 or 6 independent variables to predict sales. Past data have been collected for 15 sales territories on sales and 6 independent variables. Build a regression model and recommend whether or not the company should use it.
Dependent variable • Y = Sales in Rs. lakh in the territory • Independent variables – X1 = Market potential in the territory – X2 = No. of dealers of the company in the territory – X3 = No. of sales people in the territory – X4 = Index of competitor activity on a 5-point scale (1 = low, 5 = high) – X5 = No. of service people in the territory – X6 = No. of existing customers in the territory
Factor Analysis • Used for data reduction • There are two stages in factor analysis: – Factor extraction process – Rotation of principal components
ANY QUESTIONS, PLEASE?
THANK YOU