Manipulation and Transformation of Data
- Recode
- Compute
- Replace missing values
- Select cases
- Sort cases
- Merge files
- Aggregate data
Methods for transforming data
- Computing a new variable
- Recode
  - into same variable
  - into different variable
- Select
  - subset of cases
  - random sample
- Replace missing values
Compute a new variable
You can calculate new variables from existing variables. To do this, you need to know how your target variable is derived from the existing ones. Operations such as addition, subtraction, multiplication, and division of variables can be combined to create the new variable.
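The idea of computing a target variable from existing ones can be sketched in plain Python (this is not SPSS syntax). The variable names `salary`, `bonus`, `weekly_hours`, `total_pay`, and `hourly_pay`, and the data values, are hypothetical.

```python
# Hypothetical cases: each dict is one row of the data file.
cases = [
    {"id": 1, "salary": 42000, "bonus": 3000, "weekly_hours": 40},
    {"id": 2, "salary": 38500, "bonus": 1500, "weekly_hours": 35},
    {"id": 3, "salary": 51000, "bonus": 4200, "weekly_hours": 45},
]

for case in cases:
    # Addition of two existing variables gives the new variable total_pay
    case["total_pay"] = case["salary"] + case["bonus"]
    # Multiplication and division: annual pay per hour worked (52 weeks assumed)
    case["hourly_pay"] = case["total_pay"] / (case["weekly_hours"] * 52)

print([c["total_pay"] for c in cases])  # [45000, 40000, 55200]
```

Each new value depends only on variables from the same case, which is why the computation is applied row by row.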
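Recode into same variable
Using SPSS, you can recode a variable into the same variable: the existing values are reassigned (for example, reversing the coding of a scale item), and the original values are overwritten.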
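A minimal sketch of recoding in place, assuming a hypothetical negatively worded 5-point item `q7` that needs reverse coding. It is plain Python rather than SPSS, with `None` standing in for a missing value.

```python
# Hypothetical 5-point agreement item "q7" that is worded negatively
cases = [{"id": 1, "q7": 1}, {"id": 2, "q7": 4}, {"id": 3, "q7": 5}, {"id": 4, "q7": None}]

# Old value -> new value (reverse coding: 1<->5, 2<->4, 3 stays 3)
recode_map = {1: 5, 2: 4, 3: 3, 4: 2, 5: 1}

for case in cases:
    # Overwrite the variable in place; values not in the map (here None,
    # standing in for a missing value) are left as they are.
    if case["q7"] in recode_map:
        case["q7"] = recode_map[case["q7"]]

print([c["q7"] for c in cases])  # [5, 2, 1, None]
```

Because the old values are gone after this step, recoding into a different variable is the safer choice when the original coding may still be needed.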
Recode into different variable
You can recode an existing variable into a different variable. Recode into Different Variables reassigns the values of existing variables, or collapses ranges of existing values into new values, and stores the result in a new variable; the original variable is left unchanged. For example, you could collapse salaries into a new variable containing salary-range categories.
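The salary example can be sketched as collapsing value ranges into category codes stored in a new variable. The band limits and codes below are hypothetical, and the code is plain Python, not SPSS syntax.

```python
# Hypothetical salary bands: (lower bound inclusive, upper bound exclusive, code)
BANDS = [(0, 30000, 1), (30000, 50000, 2), (50000, float("inf"), 3)]

def salary_band(value):
    """Return the range category for one salary, or None if it fits no band."""
    for low, high, code in BANDS:
        if low <= value < high:
            return code
    return None

salary = [28000, 30000, 49999, 75000]
salary_cat = [salary_band(s) for s in salary]  # new variable; salary is untouched

print(salary_cat)  # [1, 2, 2, 3]
```

Making each lower bound inclusive and each upper bound exclusive ensures that a boundary value such as 30000 falls into exactly one category.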
Select subset of cases
You can use SPSS to select a subset of cases for your analysis. For example, if the data set contains information on both males and females and you want to analyse the relationship between education and income for females only, you can use the Select Cases procedure to restrict the analysis to female cases.
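Selecting cases by a condition and drawing a random sample can be sketched in plain Python as follows. The data are hypothetical, and the seed value 42 is an arbitrary choice made only so the draw is reproducible.

```python
import random

# Hypothetical data file containing both males and females
cases = [
    {"sex": "F", "educ_years": 16, "income": 52000},
    {"sex": "M", "educ_years": 12, "income": 41000},
    {"sex": "F", "educ_years": 12, "income": 36000},
    {"sex": "M", "educ_years": 18, "income": 67000},
    {"sex": "F", "educ_years": 14, "income": 44000},
]

# Select cases that satisfy a condition (here: females only)
females = [c for c in cases if c["sex"] == "F"]

# Random sample of exactly 2 cases; the seed makes the draw reproducible
rng = random.Random(42)
sample = rng.sample(cases, k=2)

print(len(females))  # 3
```

The education-income analysis would then run on `females` only, leaving the full data file unchanged.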
Replace missing values
Missing observations can be problematic in analysis, and some time series measures cannot be computed if there are missing values in the series. Replace Missing Values creates new time series variables from existing ones, replacing missing values with estimates computed by one of several methods.
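Two of the estimation methods SPSS offers, series mean and linear interpolation, can be sketched in plain Python. This sketch is not the SPSS implementation, and the series is hypothetical, with `None` marking missing values.

```python
from statistics import mean

def replace_with_series_mean(series):
    """Replace each None with the mean of all valid values in the series."""
    m = mean(v for v in series if v is not None)
    return [m if v is None else v for v in series]

def replace_with_linear_interpolation(series):
    """Replace each None by interpolating between the nearest valid neighbours.
    Missing values at the very start or end have no neighbour on one side and stay None."""
    valid = [i for i, v in enumerate(series) if v is not None]
    result = list(series)
    for i, v in enumerate(series):
        if v is not None:
            continue
        before = [k for k in valid if k < i]
        after = [k for k in valid if k > i]
        if not before or not after:
            continue
        left, right = before[-1], after[0]
        fraction = (i - left) / (right - left)
        result[i] = series[left] + fraction * (series[right] - series[left])
    return result

# Hypothetical monthly series with gaps (None = missing)
sales = [12.0, None, 15.0, 16.0, None, None, 22.0]
print(replace_with_series_mean(sales))
print(replace_with_linear_interpolation(sales))
```

Both functions return a new list rather than modifying `sales`, which mirrors the point above that the procedure creates new variables from existing ones.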
Aggregate data
Aggregate Data combines groups of cases into single summary cases and creates a new aggregated data file. Cases are aggregated based on the value of one or more grouping variables. The new data file contains one case for each group.
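Aggregation can be sketched as grouping cases by a grouping variable and producing one summary case per group. The grouping variable `dept`, the summary names `salary_mean` and `n_cases`, and the data are all hypothetical. The code is plain Python, not SPSS.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical cases with a grouping variable "dept"
cases = [
    {"dept": "A", "salary": 40000},
    {"dept": "B", "salary": 52000},
    {"dept": "A", "salary": 44000},
    {"dept": "B", "salary": 48000},
    {"dept": "C", "salary": 39000},
]

# Collect the salaries belonging to each group
groups = defaultdict(list)
for c in cases:
    groups[c["dept"]].append(c["salary"])

# One summary case per group: group mean and number of cases
aggregated = [
    {"dept": dept, "salary_mean": mean(vals), "n_cases": len(vals)}
    for dept, vals in sorted(groups.items())
]

for row in aggregated:
    print(row)
```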
Create time series
Create Time Series creates new variables based on functions of existing numeric time series variables. These transformed values are useful in many time series analysis procedures. Available functions for creating time series variables include differences and moving averages.
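The two functions named above can be sketched in plain Python on a hypothetical series. The centred moving average here handles only odd spans. That restriction is a simplification of this sketch and is not claimed to match SPSS in every detail.

```python
def difference(series, lag=1):
    """x[t] - x[t - lag]; the first `lag` values have no predecessor and are None."""
    return [None] * lag + [series[t] - series[t - lag] for t in range(lag, len(series))]

def centered_moving_average(series, span=3):
    """Mean of `span` values centred on t; undefined (None) near both ends."""
    if span % 2 == 0:
        raise ValueError("this sketch only handles odd spans")
    half = span // 2
    result = [None] * len(series)
    for t in range(half, len(series) - half):
        window = series[t - half : t + half + 1]
        result[t] = sum(window) / span
    return result

sales = [10, 12, 15, 14, 18, 21]  # hypothetical series
print(difference(sales))  # [None, 2, 3, -1, 4, 3]
print(centered_moving_average(sales))
```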
Sort cases
You can sort the cases of the data file based on the values of one or more sorting variables, in ascending or descending order. If you select multiple sort variables, cases are sorted by each variable within categories of the previous variable on the Sort list.
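Sorting by several variables, with each variable sorted within categories of the previous one, can be sketched with Python's stable sort. The variables `region` and `sales` are hypothetical.

```python
# Hypothetical cases
cases = [
    {"region": "North", "sales": 120},
    {"region": "South", "sales": 95},
    {"region": "North", "sales": 150},
    {"region": "South", "sales": 130},
]

# Goal: region ascending, then sales descending within each region.
# Python's sort is stable, so sort by the secondary key first, then the primary key.
cases.sort(key=lambda c: c["sales"], reverse=True)
cases.sort(key=lambda c: c["region"])

print([(c["region"], c["sales"]) for c in cases])
# [('North', 150), ('North', 120), ('South', 130), ('South', 95)]
```

The two-pass approach lets each key have its own direction without needing to negate values, so it also works for string variables sorted in descending order.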
Merge files
There are two types of merging:
- adding new cases for the same variables;
- adding new variables for the same cases.
Select the option that matches what you want to add.
Add cases
Add Cases merges the working data file with a second data file that contains the same variables but different cases. For example, you might record the same information for customers in two different sales regions and maintain the data for each region in separate files. Variables from the working data file are identified with an asterisk (*). Variables from the external data file are identified with a plus sign (+).
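Adding cases amounts to stacking two files that share the same variables. Here is a minimal plain-Python sketch with hypothetical customer files for two regions. The check that all cases carry the same variable names is an assumption of this sketch.

```python
def add_cases(working, external):
    """Append the cases of `external` to `working`; both must have the same variables."""
    all_cases = working + external
    if not all_cases:
        return []
    names = set(all_cases[0])
    if any(set(case) != names for case in all_cases):
        raise ValueError("both files must contain the same variables")
    return [dict(case) for case in all_cases]

# Hypothetical customer files for two sales regions
region_north = [{"cust_id": 101, "spend": 250.0}, {"cust_id": 102, "spend": 90.5}]
region_south = [{"cust_id": 201, "spend": 310.0}]

combined = add_cases(region_north, region_south)
print(len(combined))  # 3
```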
Add variables
Add Variables merges the working data file with an external data file that contains the same cases but different variables. For example, you might want to merge a data file that contains pre-test results with one that contains post-test results. You can save this new file with a new name after merging.
Before merging…
Cases must be sorted in the same order in both data files. If one or more key variables are used to match cases, the two data files must be sorted in ascending order of the key variable(s). Variable names in the second data file that duplicate variable names in the working data file are excluded by default because Add Variables assumes that these variables contain duplicate information.
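Sorting matters because a keyed merge of this kind reads both files in a single sequential pass. The plain-Python sketch below assumes the pre-test and post-test files are hypothetical and that `id` is the key variable. It also excludes duplicate variable names, so `group` is taken from the working file.

```python
def add_variables(working, external, key):
    """Match cases on `key` by reading both files sequentially, as a sorted merge does."""
    for data in (working, external):
        keys = [case[key] for case in data]
        if keys != sorted(keys):
            raise ValueError("both files must be sorted in ascending order of the key")
    working_names = set(working[0]) if working else set()
    # Variables whose names already exist in the working file are excluded
    new_names = [n for n in (external[0] if external else {}) if n not in working_names]
    merged, j = [], 0
    for case in working:
        # Skip external cases whose key is lower than the current working key
        while j < len(external) and external[j][key] < case[key]:
            j += 1
        match = external[j] if j < len(external) and external[j][key] == case[key] else {}
        row = dict(case)
        for name in new_names:
            row[name] = match.get(name)  # None (missing) when there is no matching case
        merged.append(row)
    return merged

# Hypothetical pre-test and post-test files, both sorted by id
pre = [{"id": 1, "group": "A", "pre": 54}, {"id": 2, "group": "B", "pre": 61}, {"id": 3, "group": "A", "pre": 47}]
post = [{"id": 1, "group": "A", "post": 60}, {"id": 3, "group": "A", "post": 58}]

merged = add_variables(pre, post, key="id")
print(merged[1])  # {'id': 2, 'group': 'B', 'pre': 61, 'post': None}
```

If either file were out of order, the single forward pass could skip past a matching case. That is why the sketch refuses unsorted input.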
Data Analysis
Types of Variables
Nominal example: nationality, race, gender… categories based on a concept, with no inherent order (a variable with only two categories is called “dichotomous nominal”)
Ordinal example: knowledge, skill... rank-ordered in terms of a criterion from highest to lowest (more than, equal to, less than)
Interval/Ratio example: age, income, speed... equal intervals between values, so arithmetic operations are meaningful; ratio variables also have a true (fixed) zero point
Types of Analysis
Univariate Analysis
Descriptive Statistics (Summarising Data)
Frequency Distributions
- Frequency tables
- Histograms
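A frequency table counts how often each value occurs and expresses each count as a percentage of all cases. In the plain-Python sketch below, a row of `#` characters stands in for a histogram bar. The responses are hypothetical.

```python
from collections import Counter

# Hypothetical responses to one ordinal item
responses = ["agree", "neutral", "agree", "disagree", "agree",
             "neutral", "strongly agree", "agree"]

counts = Counter(responses)
n = len(responses)

# Frequency table: value, frequency, percent (most frequent first)
table = [(value, freq, 100 * freq / n) for value, freq in counts.most_common()]

for value, freq, pct in table:
    # A crude text histogram: one '#' per case
    print(f"{value:<15}{freq:>4}{pct:>7.1f}%  {'#' * freq}")
```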
Central Tendency
- The mean
- The median
- The mode
The mean: the arithmetic average
$\mu = \frac{\sum X}{N}$
The mean identifies the balance point in a distribution of scores: the deviations of the scores above and below it sum to zero.
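The three measures of central tendency can be computed with Python's standard `statistics` module on hypothetical scores. The last printed value checks the balance-point property: the deviations from the mean sum to zero.

```python
from statistics import median, multimode

scores = [3, 7, 7, 2, 9, 4, 7, 5]  # hypothetical scores

mu = sum(scores) / len(scores)   # mean: ΣX / N = 44 / 8 = 5.5
md = median(scores)              # middle of the sorted scores: (5 + 7) / 2 = 6.0
modes = multimode(scores)        # most frequent value(s): [7]

# Balance point: deviations from the mean sum to zero
print(mu, md, modes, sum(x - mu for x in scores))  # 5.5 6.0 [7] 0.0
```

`multimode` returns a list because a distribution can have more than one mode. It requires Python 3.8 or later.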
Variability (dispersion): the spread of data around the mean
- The range
- Standard deviation
The Range
The range is the difference between the highest and lowest scores.
Range = Highest score − Lowest score
Standard Deviation
The standard deviation measures the typical amount by which scores deviate from the mean within a group of scores. It is the square root of the average squared deviation (the variance):
$\sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}}$
The greater the spread of scores, the greater the standard deviation.
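The range and the standard deviation can be checked on a small hypothetical set of scores. The population formula divides by N. Statistical packages, SPSS among them, usually report the sample standard deviation, which divides by N − 1; `statistics.stdev` computes that version.

```python
import statistics
from math import sqrt

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical scores

score_range = max(scores) - min(scores)                          # 9 - 2 = 7
mu = sum(scores) / len(scores)                                   # 5.0
sigma = sqrt(sum((x - mu) ** 2 for x in scores) / len(scores))   # population SD: 2.0
s = statistics.stdev(scores)                                     # sample SD, divisor N - 1

print(score_range, sigma, round(s, 3))  # 7 2.0 2.138
```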
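Skewness
Skewness refers to the degree and direction of asymmetry in a distribution. A positively skewed distribution has a long tail towards high values; a negatively skewed distribution has a long tail towards low values.
[Figure: distributions with no skew, positive skew, and negative skew]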
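Skewness is commonly measured by the moment coefficient g1: the third central moment divided by the second central moment raised to the power 1.5. A small-sample adjusted version is G1 = g1·√(n(n − 1))/(n − 2). We believe the adjusted form is what SPSS reports, but treat that as an assumption. The sketch returns both values for hypothetical data.

```python
from math import sqrt

def skewness(values):
    """Return (g1, G1): the moment coefficient of skewness and its small-sample adjustment."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least 3 values")
    m = sum(values) / n
    m2 = sum((x - m) ** 2 for x in values) / n   # second central moment
    m3 = sum((x - m) ** 3 for x in values) / n   # third central moment
    if m2 == 0:
        raise ValueError("all values are equal; skewness is undefined")
    g1 = m3 / m2 ** 1.5
    G1 = g1 * sqrt(n * (n - 1)) / (n - 2)
    return g1, G1

print(skewness([1, 2, 3, 4, 5]))   # symmetric: (0.0, 0.0)
print(skewness([1, 1, 1, 2, 10]))  # long right tail: both values positive
```

A value near zero indicates no skew. A positive value indicates a tail towards high scores, and a negative value indicates a tail towards low scores.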
Bivariate Analysis
Exploring differences and relationships between two variables
Exploring differences between two variables
Criteria for selecting bivariate tests of differences:
- Type of data (nominal/ordinal/interval)
- Purpose of investigation (means/variances)
- Relationship between groups (independent/dependent)
- Number of groups (two/more)
Parametric vs non-parametric tests
Parametric tests assume that:
- the scale of measurement has equal intervals;
- the distribution is normal;
- the variances of the groups being compared are homogeneous.
If these assumptions cannot be met, non-parametric tests are used instead.
1. Non-parametric tests
   - Categorical variables
   - Non-categorical variables
2. Parametric tests
   - Non-categorical variables
Non-parametric tests: categorical variables
- Binomial test: compares frequencies; two categories, one sample.
  Example: the ratio of males to females in a specific industry compared with industry in general.
- Chi-square test: compares frequencies; more than two categories, one sample.
  Example: the number of workers from four different ethnic groups.
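Both one-sample tests can be sketched with the standard library. The one-sample chi-square compares observed frequencies with expected ones, which are assumed equal here. The binomial part computes an exact one-sided probability. All counts and the 40% industry proportion are hypothetical. For reference, the 5% critical value of chi-square with 3 degrees of freedom is about 7.815.

```python
from math import comb

def chi_square_gof(observed, expected=None):
    """One-sample chi-square: sum of (O - E)^2 / E, with df = categories - 1."""
    n = sum(observed)
    k = len(observed)
    if expected is None:
        expected = [n / k] * k  # equal expected frequencies
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi2, k - 1

def binomial_upper_tail(k, n, p):
    """Exact one-sided probability P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# Hypothetical counts of workers in four ethnic groups
chi2, df = chi_square_gof([30, 20, 25, 25])
print(chi2, df)  # 2.0 3

# Hypothetical: 14 of 20 workers in one firm are women, versus 40% in the industry overall
p_value = binomial_upper_tail(14, 20, 0.4)
print(round(p_value, 4))
```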
Non-parametric tests: categorical variables (continued)
- Crosstabulation (chi-square test): two or more categories, unrelated samples.
  Example: the proportion of male to female workers among white and black workers.
- Cochran's Q test: two categories, three or more related samples.
  Example: the number of people who did not attend each of three meetings.
Non-parametric tests: non-categorical variables
- Kolmogorov-Smirnov test: one sample, and two unrelated samples
- Median test: two or more unrelated samples
- Mann-Whitney U test: two unrelated samples
- Kruskal-Wallis H test: three or more unrelated samples
- Wilcoxon test: two related samples
- Friedman test: three or more related samples
Examples:
- Mann-Whitney U test (two unrelated samples): rated quality of work for men and women.
- Wilcoxon test (two related samples): whether the rated quality of work of the same workers differs between the first and second month.
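The Mann-Whitney U statistic is computed from ranks rather than raw scores. Both samples are ranked together, with tied values given the average of their ranks. U is then derived from the rank sum of one sample. The ratings below are hypothetical. This sketch computes only U and not the significance level SPSS would report alongside it.

```python
def average_ranks(values):
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks

def mann_whitney_u(a, b):
    """U = smaller of U1 and U2, where U1 = R1 - n1(n1 + 1)/2."""
    ranks = average_ranks(list(a) + list(b))
    r1 = sum(ranks[: len(a)])
    u1 = r1 - len(a) * (len(a) + 1) / 2
    u2 = len(a) * len(b) - u1
    return min(u1, u2)

# Hypothetical rated quality of work (higher = better)
men = [3, 4, 2, 6, 2, 5]
women = [9, 7, 5, 10, 6, 8]
print(mann_whitney_u(men, women))  # 2.0
```

A small U means the two sets of ranks barely overlap, which points towards a difference between the groups.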
Parametric tests: non-categorical variables
- t test, one sample. Example: comparing the mean of a sample with that of the population.
- t test, two unrelated samples. Example: job satisfaction of men and women.
- One-way ANOVA (analysis of variance): three or more unrelated means. Example: job satisfaction of four ethnic groups.
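The two t statistics can be sketched with the `statistics` module. The two-sample version uses the pooled variance, which assumes homogeneous variances. The scores and the population mean of 6 are hypothetical. The sketch returns t and its degrees of freedom only, not p-values.

```python
from math import sqrt
from statistics import mean, stdev, variance

def one_sample_t(sample, mu0):
    """t = (sample mean - mu0) / (s / sqrt(n)), df = n - 1."""
    n = len(sample)
    return (mean(sample) - mu0) / (stdev(sample) / sqrt(n)), n - 1

def independent_t(a, b):
    """Two unrelated samples, pooled variance (equal variances assumed)."""
    n1, n2 = len(a), len(b)
    pooled = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    t = (mean(a) - mean(b)) / sqrt(pooled * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2

men = [5, 6, 7, 8, 9]    # hypothetical job-satisfaction scores
women = [3, 4, 5, 6, 7]
print(independent_t(men, women))  # (2.0, 8)
print(one_sample_t(men, 6))
```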
Parametric tests: non-categorical variables (continued)
- Levene's test: two or more unrelated variances. Example: the variances of job satisfaction across four ethnic groups.
- t test, two related means. Example: the means of the same subjects in two conditions.
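The t test for two related means is a one-sample t test on the paired differences. A minimal sketch follows, with hypothetical scores for the same five subjects in two conditions.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(condition1, condition2):
    """t test for two related means: a one-sample t test on the paired differences."""
    if len(condition1) != len(condition2):
        raise ValueError("each subject needs a score in both conditions")
    diffs = [b - a for a, b in zip(condition1, condition2)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / sqrt(n)), n - 1

# Hypothetical scores for the same five subjects in two conditions
before = [10, 12, 9, 11, 13]
after = [12, 13, 11, 12, 15]
t, df = paired_t(before, after)
print(round(t, 3), df)  # 6.532 4
```

Working with differences removes the variation between subjects, which is why a related-samples design is usually more sensitive than comparing two unrelated groups.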
Exploring relationships between two variables: Crosstabulation
To demonstrate the presence or absence of a relationship (nominal and ordinal variables)
Satisfaction by absenteeism (count)

                    Absenteeism
Satisfaction      no     yes    Total
no                 5       4        9
yes                9       2       11
Total             14       6       20
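Crosstabulated counts like these are usually tested with the chi-square test of independence. The sketch uses the satisfaction-by-absenteeism counts from the table. Expected counts are row total × column total / N. The result is the uncorrected Pearson chi-square. For 2 × 2 tables, SPSS also reports a continuity-corrected value and Fisher's exact test, which this sketch does not compute.

```python
def chi_square_independence(table):
    """Pearson chi-square for a table of counts; returns (chi2, df, expected counts)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    expected = []
    for i, row in enumerate(table):
        exp_row = []
        for j, observed in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected count under independence
            exp_row.append(e)
            chi2 += (observed - e) ** 2 / e
        expected.append(exp_row)
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df, expected

# Rows = satisfaction (no, yes); columns = absenteeism (no, yes)
observed = [[5, 4], [9, 2]]
chi2, df, expected = chi_square_independence(observed)
print(round(chi2, 3), df)  # 1.626 1
print(expected)
```

The 5% critical value for 1 degree of freedom is about 3.841, so these counts would not indicate a significant relationship. Two expected counts (2.7 and 3.3) are also below 5, which makes the chi-square approximation doubtful for this table.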
Exploring relationships between two variables: Correlation
To show the strength and direction of a relationship (ordinal and interval variables)
1. Rank correlation (ordinal variables)
2. Linear correlation (interval variables)
Rank correlation: for ordinal variables and non-parametric samples
- Spearman's rho
- Kendall's tau
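Spearman's rho can be computed as Pearson's r applied to the ranks of the two variables, with ties given average ranks. The paired ratings below are hypothetical. Kendall's tau is not shown.

```python
from math import sqrt

def average_ranks(values):
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end) / 2 + 1
        start = end + 1
    return ranks

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

def spearman_rho(x, y):
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical ordinal ratings of five workers on knowledge and on skill
knowledge = [1, 2, 3, 4, 5]
skill = [2, 1, 4, 3, 5]
print(round(spearman_rho(knowledge, skill), 3))  # 0.8
```

When there are no ties, this gives the same value as the textbook shortcut 1 − 6Σd²/(n(n² − 1)).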
Linear correlation: for interval variables and parametric samples
- Pearson's r
- Regression (for making predictions of likely values of the dependent variable)
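Pearson's r and a simple least-squares regression line can be computed from the same deviation sums. The data are hypothetical, with years of training as x and a performance score as y. The fitted line is then used to predict y for a new value of x.

```python
from math import sqrt

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def least_squares_line(x, y):
    """Return (intercept, slope) of the regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - slope * mx, slope

# Hypothetical interval data: years of training (x) and performance score (y)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)                # about 0.775
a, b = least_squares_line(x, y)    # intercept 2.2, slope 0.6
predicted = a + b * 6              # predicted y for x = 6: about 5.8
print(round(r, 3), round(a, 2), round(b, 2), round(predicted, 2))
```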
www.spss.com