Manipulating SAS Datasets Last Updated : 29 June, 2004
Center of Excellence
Objectives Control which variables are written to an output data set during a DATA step. Control which variables are read from an input data set during a DATA step. Control how many observations are processed from an input data set during a DATA or PROC step.
Controlling Variable Output By default, the SAS System writes all variables from every input data set to every output data set. In the DATA step, the DROP and KEEP statements can be used to control which variables are written to output data sets.
The DROP and KEEP Statements (Review) Input SAS data set
Raw data file
PDV
DROP and KEEP statements Output SAS data set ...
Creating Multiple SAS Data Sets (Review) proc contents data=prog2.military; run;
Partial PROC CONTENTS Output -----Alphabetic List of Variables and Attributes----# Variable Type Len Pos ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ 6 Airport Char 40 37 3 City Char 20 12 2 Code Char 3 9 5 Country Char 3 34 4 State Char 2 32 1 Type Char 9 0
Creating Multiple SAS Data Sets (Review) data army navy airforce marines; drop Type; set prog2.military; if Type eq 'Army' then output army; else if Type eq 'Naval' then output navy; else if Type eq 'Air Force' then output airforce; else if Type eq 'Marine' then output marines; run;
Creating Multiple SAS Data Sets (Review) Partial Log NOTE: There were 137 observations read from the data set PROG2.MILITARY. NOTE: The data set WORK.ARMY has 41 observations and 5 variables. NOTE: The data set WORK.NAVY has 28 observations and 5 variables. NOTE: The data set WORK.AIRFORCE has 64 observations and 5 variables. NOTE: The data set WORK.MARINES has 4 observations and 5 variables.
Controlling Variable Output The DROP and KEEP statements apply to all output data sets. However, when you create multiple output data sets, you can use the DROP= and KEEP= data set options to write different variables to different data sets.
The DROP= Data Set Option The DROP= data set option excludes variables from processing or from output SAS data sets. When the DROP= data set option is associated with an output data set, SAS does not write the specified variables to the output data set. However, the specified variables are available for SAS-data-set(DROP=variable-1 SAS-data-set(DROP=variable-1variable-2 variable-2 …variable-n) …variable-n) processing.
The KEEP= Data Set Option The KEEP= data set option specifies variables for processing or for writing to output SAS data sets. When the KEEP= data set option is associated with an output data set, only the specified variables are written to the output data set. However, all variables are available for processing. SAS-data-set(KEEP=variable-1 SAS-data-set(KEEP=variable-1variable-2 variable-2…variable-n) …variable-n)
The DROP= and KEEP= Data Set Options Input SAS data set
Raw data file
PDV
DROP and KEEP statements
DROP= and KEEP= data set options associated with an output data set
Output SAS data set ...
Controlling Variable Output data army(drop=City State Country Type) navy(drop=Type) airforce(drop=Code Type) marines; set prog2.military; if Type eq 'Army' then output army; else if Type eq 'Naval' then output navy; else if Type eq 'Air Force' then output airforce; else if Type eq 'Marine' then output marines;
Controlling Variable Output Partial Log NOTE: There were 137 observations read from the data set PROG2.MILITARY. NOTE: The data set WORK.ARMY has 41 observations and 2 variables. NOTE: The data set WORK.NAVY has 28 observations and 5 variables. NOTE: The data set WORK.AIRFORCE has 64 observations and 4 variables. NOTE: The data set WORK.MARINES has 4 observations and 6 variables.
Controlling Variable Output data army(keep=Code Airport) navy(keep=Code Airport City State Country) airforce(keep=Airport City State Country) marines; set prog2.military; if Type eq 'Army' then output army; else if Type eq 'Naval' then output navy; else if Type eq 'Air Force' then output airforce; else if Type eq 'Marine' then output marines;
Controlling Variable Output Partial Log NOTE: There were 137 observations read from the data set PROG2.MILITARY. NOTE: The data set WORK.ARMY has 41 observations and 2 variables. NOTE: The data set WORK.NAVY has 28 observations and 5 variables. NOTE: The data set WORK.AIRFORCE has 64 observations and 4 variables. NOTE: The data set WORK.MARINES has 4 observations and 6 variables.
Controlling Variable Input In the DATA step, the DROP and KEEP statements apply only to output SAS data sets. However, the DROP= and KEEP= data set options can apply to both input and output SAS data sets.
...
Controlling Variable Input Input SAS data set
Raw Data File
DROP= and KEEP= data set options in “read” statement
PDV
DROP and KEEP statements
DROP= and KEEP= data set options associated with an output data set
Output SAS data set ...
Controlling Variable Input SAS applies data set options to input data sets before it evaluates programming statements applies data set options to output data sets. data army(keep=Code Airport); set prog2.military(drop=City State Country); if Type eq 'Army' then output; run;
prog2.military TYPE $ 9
Compile
CODE $ 3
CITY $ 20
STATE COUNTRY AIRPORT $ $ $ 3 40 2
data army(keep=Code Airport); set prog2.military(drop=City State Country); if Type eq 'Army' then output; run;
PDV
...
prog2.military TYPE $ 9
CODE $ 3
CITY $ 20
STATE COUNTRY AIRPORT $ $ $ 3 40 2
data army(keep=Code Airport); set prog2.military(drop=City State Country); if Type eq 'Army' then output; run;
PDV TYPE $ 9
CODE $ 3
AIRPORT $ 40
...
prog2.military TYPE $ 9
CODE $ 3
CITY $ 20
STATE COUNTRY AIRPORT $ $ $ 3 40 2
data army(keep=Code Airport); set prog2.military(drop=City State Country); if Type eq 'Army' then output; run;
PDV TYPE $ D 9
CODE $ 3
AIRPORT $ 40
...
Controlling Which Observations Are Read By default, SAS begins processing a SAS data set with the first observation and continues processing until the last observation. The FIRSTOBS= and OBS= data set options can be used to control which observations are processed. You can use FIRSTOBS= and OBS= with input data sets only. You cannot use either data set option in the DATA statement.
The OBS= Data Set Option The OBS= data set option specifies an ending point for processing an input data set. SAS-data-set(OBS=n) SAS-data-set(OBS=n)
This option specifies the number of the last observation to process, not how many observations should be processed.
Controlling Which Observations Are Read The OBS= data set option in the SET statement stops reading after observation 25 in the prog2.military data set. data army; set prog2.military(obs=25); if Type eq 'Army' then output; run;
Controlling Which Observations Are Read Partial Log 60 61 62 63
data army; set prog2.military(obs=25); if Type eq 'Army' then output; run;
NOTE: There were 25 observations read from the data set PROG2.MILITARY. NOTE: The data set WORK.ARMY has 10 observations and 6 variables.
The FIRSTOBS= Data Set Option The FIRSTOBS= data set option specifies a starting point for processing an input data set. SAS-data-set(FIRSTOBS=n) SAS-data-set(FIRSTOBS=n)
FIRSTOBS= and OBS= are often used together to define a range of observations to be processed.
Controlling Which Observations Are Read The FIRSTOBS= and OBS= data set options in the SET statement read 15 observations from prog2.military. Processing begins with observation 11 and data ends army;with observation 25. set prog2.military(firstobs=11 obs=25); if Type eq 'Army' then output; run;
Controlling Which Observations Are Read Partial Log 67 68 69 70
data army; set prog2.military(firstobs=11 obs=25); if Type eq 'Army' then output; run;
NOTE: There were 15 observations read from the data set PROG2.MILITARY. NOTE: The data set WORK.ARMY has 5 observations and 6 variables.
Controlling Which Observations Are Read The FIRSTOBS= and OBS= data set options can also be used in a PROC step. The following PROC PRINT step begins processing the army data set at observation 2 and stops processing the army data set after observation 4. proc print data=army(firstobs=2 obs=4); var Code Airport; run;
Controlling Which Observations Are Read Partial Log 75 76 77
proc print data=army(firstobs=2 obs=4); var Code Airport; run;
NOTE: There were 3 observations read from the data set WORK.ARMY.
Controlling Which Observations Are Read Partial Output Obs 2 3 4
Code LGF SYL HGT
Airport Laguna Army Air Field Roberts Army Air Field Tusi Army Heliport
c02s3d1.sas
Questions