An Introduction to SAS, Part I
Department of Computing Services
Jie Chen, Ph.D.
[email protected]
March 2003 (3/13/2003)
SAS (Statistical Analysis System)
• SAS, short for Statistical Analysis System, is a software system designed for data management and analysis.
• With base SAS software you can store, retrieve, and modify data values.
• You can also obtain statistical analyses and create reports.
Goal of This Workshop
Learn enough of the SAS System to input and output data, create a simple SAS program file, run SAS programs, and obtain data analyses using statistical procedures and graphs.
Table of Contents
• Introduction
• To read and create data sets
• Submitting SAS programs
• Basic statistical procedures and graphs
  – PROC PRINT
  – PROC SORT
  – PROC MEANS
  – PROC FREQ
1. Introduction
• SAS data management
• SAS procedures
• SAS program
• Open the SAS system
SAS Data Management
• The SAS System works with numeric and character data.
• The data must be in a SAS data set or in an external file that a SAS program can read.
• Some external files can be imported into the SAS System.
SAS Procedures
• SAS procedures use data values from SAS data sets to produce preprogrammed reports with minimal effort from you.
• An example:
    proc print data = example;
      title 'This is a subset of data';
    run;
A SAS Program
The statements in a SAS program are divided into two kinds of steps:
• DATA steps:
  – create one or more new SAS data sets.
• PROC steps:
  – call a procedure from the SAS library and execute that procedure.
Open the SAS System
• Click Start/Programs.
• Select The SAS System and
  – The SAS System for Windows v 8
• The PROGRAM EDITOR - (Untitled) window becomes active, ready for you to type SAS statements.
2. To Read and Create Data Sets
• Types of text data file (ASCII format)
  – with delimiters
  – without delimiters
• Read data from external text files
• Output data to external text files
Entering data at the program editor window: the CARDS statement
    data example;
      input age educ $ race sex;
      cards;
    84 8 1 1
    65 bd 1 1
    82 hg 1 0
    ;
    run;
Using PROC PRINT to view data
    data example;
      input age educ $ race sex;
      cards;
    84 8 1 1
    65 bd 1 1
    82 hg 1 0
    ;
    run;
    proc print data = example;
    run;
Submitting the SAS program
• To execute the statements
  – Press F3 or click the Run button
  – Or select Locals/Submit
• To recall SAS statements you have submitted
  – Click Window/Program Editor
  – Click Locals/Recall text
After Running a SAS Program
When you execute a SAS program, the output generated by SAS is divided into two major parts:
• SAS log: contains information about the processing of the SAS program, including warning and error messages.
• Output: contains reports generated by SAS procedures and DATA steps.
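To see the two destinations side by side, here is a minimal sketch. It assumes the data set example from the earlier slides has already been created in the current session. The DATA step writes to the SAS log through a PUT statement, while PROC PRINT writes its report to the Output window.

```sas
/* Writes one line per observation to the SAS log; _null_ means no data set is created. */
data _null_;
  set example;        /* assumes data set EXAMPLE exists in this session */
  put age= educ=;     /* named output, e.g. age=84 educ=8 for the first observation */
run;

/* Writes a listing of the same data to the Output window. */
proc print data = example;
run;
```

After submitting, check the Log window first: errors and warnings appear there even when the Output window stays empty.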
Saving a SAS program file containing data
1. Click on File…Save As.
2. A dialog box will appear.
3. Verify that the desired folder and extension name (.sas) are chosen in the dialog box.
4. Type a file name in the File Name text box, for example 'myfile'.
5. Click the Save button to save the file.
SAS Naming Conventions
SAS file names can be up to eight characters long. The first character must be a letter (A-Z). The remaining characters can be letters or digits (0-9). Blanks cannot appear in SAS file names, and special characters such as $, @, %, &, and # are not allowed.
Clearing the Program Editor Window
• Click Edit.
• Click Clear All.
• The Program Editor window is now clear.
Reading Data from a Text File with Delimiters
    data sample;
      infile 'a:\sample1.txt';
      input age educ $ race sex ptotinc famsize fincome region;
    run;
    proc print data = sample;
    run;
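The INPUT statement above uses list input, which by default expects values separated by blanks. If the file were comma-delimited instead, a sketch like the following could be used. The file name sample1.csv and the data set name sample_csv are hypothetical; DLM= and DSD are standard INFILE options.

```sas
data sample_csv;                           /* hypothetical data set name */
  /* hypothetical comma-delimited copy of the file;
     DSD treats two consecutive commas as a missing value */
  infile 'a:\sample1.csv' dlm = ',' dsd;
  input age educ $ race sex ptotinc famsize fincome region;
run;

proc print data = sample_csv;
run;
```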
Reading Data from a Text File without Delimiters
    data sample2;
      infile 'a:\sample2.txt';
      input name $ 1-20 age 21-22 educ 23-24 race 25 sex 26
            ptoi92 27-31 famsize 32 fincome 33-37 region 48;
    run;
    proc print data = sample2;
    run;
To Create a Text Data File with Fewer Variables
    data _null_;
      set sample2;
      file 'a:\sub1.dat';
      put name $ age region;
    run;
To Create a Text Data File with Fewer Observations
    data _null_;
      set sample2;           /* sample2 is the data set that contains name */
      if age > 30;           /* keep only observations with age above 30 */
      file 'a:\sub2.dat';
      put name $ age region;
    run;
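The same two ideas (fewer variables, fewer observations) also apply when the result should stay a SAS data set rather than a text file. A minimal sketch, assuming sample2 from the earlier slide exists; the data set name sub3 is made up for illustration:

```sas
data sub3;                                /* hypothetical output data set name */
  set sample2 (keep = name age region);   /* KEEP= data set option: fewer variables */
  if age > 30;                            /* subsetting IF: fewer observations */
run;

proc print data = sub3;
run;
```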
4. Basic Statistical Procedures and Graphs
• PROC MEANS
  – for the whole sample
  – with BY statement
  – with CLASS statement
• PROC SORT
• PROC FREQ
• PROC REG
• PROC PLOT
• PROC UNIVARIATE
PROC MEANS
    data sample;
      infile 'a:\sample1.txt';
      input age educ $ race sex ptotinc famsize fincome region;
    run;
    proc means data = sample;
      var age ptotinc famsize;
    run;
PROC SORT
    proc sort data = sample out = list;
      by sex region;
    run;
    proc print data = list;
    run;
PROC MEANS with BY Statement
    proc means data = list;    /* list is already sorted by sex */
      var ptotinc famsize;
      by sex;
    run;
PROC MEANS with CLASS Statement
    proc means data = sample;
      class sex;
      var ptotinc famsize;
    run;
PROC FREQ (one way)
    proc freq data = sample;
      table region race;
    run;
PROC FREQ (two ways)
    proc freq data = sample;
      table region*race / nopercent chisq;
    run;
An Example of Linear Regression
y = a + b x + e
where
y: is the response variable (fincome).
x: is the regressor variable (famsize).
a, b: are the unknown parameters.
e: is the unknown error.
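For reference, PROC REG fits this model by ordinary least squares. With n observations (x_i, y_i) and sample means written as x-bar and y-bar (notation introduced here, not on the slide), the estimates of b and a are:

```latex
\hat{b} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
\hat{a} = \bar{y} - \hat{b}\,\bar{x}
```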
PROC REG
    proc reg data = sample;
      model fincome = famsize;
      output out = out1 p = pred1 r = resid1;
    run;
The output of the regression model
• R-square = .2507
• F Value = 9.37
• P Value = .0048 < .05
• Estimated intercept = 11617
• Estimated slope = 8165: when the family size increases by one unit, the estimated family income increases by 8165.
A Fitted Equation
y = 11617 + 8165 x
fincome = 11617 + 8165 famsize
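As a worked example of using the fitted equation, the sketch below computes the predicted family income for one family size. The value famsize = 4 and the names predict and fincome_hat are made up for illustration; the coefficients are the estimates reported above.

```sas
data predict;                              /* hypothetical data set name */
  famsize = 4;                             /* hypothetical family size */
  fincome_hat = 11617 + 8165 * famsize;    /* 11617 + 32660 = 44277 */
  put fincome_hat=;                        /* writes fincome_hat=44277 to the SAS log */
run;
```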
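Checking the Assumptions of Normality and Randomness
    proc rank data = out1 normal = blom;
      var resid1;
      ranks nscore;          /* normal scores of the residuals */
    run;
    /* No DATA= option: PROC PLOT uses the most recently created
       data set, here the output of PROC RANK. */
    proc plot;
      plot resid1*pred1;     /* residuals against predicted values */
      plot resid1*nscore;    /* residuals against normal scores (Q-Q plot) */
    run;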
Q-Q Plot
PROC UNIVARIATE
    proc univariate data = out1 normal plot;
      var resid1;
    run;
The distribution of residuals