Sas Sort Accum Total

  • Uploaded by: sarath.annapareddy
  • 0
  • 0
  • June 2020
  • PDF

This document was uploaded by user and they confirmed that they have the permission to share it. If you are author or own the copyright of this book, please report to us by using this DMCA report form. Report DMCA


Overview

Download & View Sas Sort Accum Total as PDF for free.

More details

  • Words: 2,756
  • Pages: 77
Sort And Accumulating Totals Last Updated : 29June, 2004

Center of Excellence

Objectives  Understand how the SAS System initializes the value of a variable in the PDV.  Prevent reinitialization of a variable in the PDV.  Create an accumulating variable.

Creating an Accumulating Variable SaleDate

SaleAmt

01APR2001 02APR2001 03APR2001 04APR2001 05APR2001 06APR2001 07APR2001 08APR2001 09APR2001 10APR2001 11APR2001 12APR2001

498.49 946.50 994.97 564.59 783.01 228.82 930.57 211.47 156.23 117.69 374.73 252.73

The SAS data set prog2.daysales contains daily sales data for a retail store. There is one observation for each day in April showing the date (SaleDate) and the total receipts for that day (SaleAmt).

Creating an Accumulating Variable The store manager also wants to see a running total of sales for the month as of each day. Partial Output SaleDate

Sale Amt

Mth2Dte

01APR2001 02APR2001 03APR2001 04APR2001 05APR2001

498.49 946.50 994.97 564.59 783.01

498.49 1444.99 2439.96 3004.55 3787.56

Creating Mth2Dte By default, variables created with an assignment statement are initialized to missing at the top of the DATA step.

Mth2Dte=Mth2Dte+SaleAmt;

An accumulating variable must retain its value from one observation to the next.

The RETAIN Statement General form of the RETAIN statement:

RETAIN RETAINvariable-name variable-name … …;; The RETAIN statement prevents SAS from reinitializing the values of new variables at the top of the DATA step. Previous values of retained variables are available for processing across iterations of the DATA step.

The RETAIN Statement The RETAIN statement  retains the value of the variable in the PDV across iterations of the DATA step  initializes the retained variable to missing before the first execution of the DATA step if an initial value is not specified  is a compile-time-only statement.

Retain Mth2Dte and Set an Initial Value If you do not supply an initial value, all the values of Mth2Dte will be missing. retain Mth2Dte 0;

Creating an Accumulating Variable

data mnthtot; set prog2.daysales; retain Mth2Dte 0; Mth2Dte=Mth2Dte+SaleAmt; run;

Compile SaleDate 15066 15067 15068 15069 15070

SaleAmt 498.49 946.50 994.97 564.59 783.01

data mnthtot; set prog2.daysales; retain Mth2Dte 0; Mth2Dte=Mth2Dte+SaleAmt; run;

R SALEDATE SALEAMT

MTH2DTE

... ...

Execute SaleDate 15066 15067 15068 15069 15070

SaleAmt 498.49 946.50 994.97 564.59 783.01

data mnthtot; set prog2.daysales; retain Mth2Dte 0; Mth2Dte=Mth2Dte+SaleAmt; run;

R SALEDATE SALEAMT

.

.

MNTH2DTE

0 ... ...

SaleDate 15066 15067 15068 15069 15070

SaleAmt 498.49 946.50 994.97 564.59 783.01

data mnthtot; set prog2.daysales; retain Mth2Dte 0; Mth2Dte=Mth2Dte+SaleAmt; run;

R SALEDATE SALEAMT

15066 .

498.49 .

MNTH2DTE

0 ... ...

SaleDate 15066 15067 15068 15069 15070

SaleAmt 498.49 946.50 994.97 564.59 783.01

data mnthtot; set prog2.daysales; retain Mth2Dte 0; Mth2Dte=Mth2Dte+SaleAmt; run;

0+498.49 SALEDATE SALEAMT

15066 .

498.49 .

R MNTH2DTE

498.49 0 ... ...

SaleDate 15066 15067 15068 15069 15070

data mnthtot; set prog2.daysales; SaleAmt Implicit retain Mth2Dte 0; Implicit Return Output Mth2Dte=Mth2Dte+SaleAmt; 498.49 run; 946.50 994.97 564.59 R 783.01

SALEDATE SALEAMT

15066

498.49

MNTH2DTE

498.49

Write out observation to mnthtot. ... ...

SaleDate 15066 15067 15068 15069 15070

SaleAmt 498.49 946.50 994.97 564.59 783.01

data mnthtot; set prog2.daysales; retain Mth2Dte 0; Mth2Dte=Mth2Dte+SaleAmt; run; Implicit Output

R SALEDATE SALEAMT

15066

498.49

MNTH2DTE

498.49

Write out observation to mnthtot. ... ...

SaleDate 15066 15067 15068 15069 15070

SaleAmt 498.49 946.50 994.97 564.59 783.01

data mnthtot; set prog2.daysales; retain Mth2Dte 0; Mth2Dte=Mth2Dte+SaleAmt; run; Implicit Return

R SALEDATE SALEAMT

15066

498.49

MNTH2DTE

498.49 ... ...

SaleDate SaleDate 15066 15066 15067 15067 15068 15068 15069 15069 15070 15070

SaleAmt SaleAmt 498.49 498.49 946.50 946.50 994.97 994.97 564.59 564.59 783.01 783.01

data mnthtot; set prog2.daysales; retain Mth2Dte 0; Mth2Dte=Mth2Dte+SaleAmt; run;

R SALEDATE SALEAMT

15066

498.49

MNTH2DTE

498.49 ... ...

SaleDate SaleDate 15066 15066 15067 15067 15068 15068 15069 15069 15070 15070

SaleAmt SaleAmt 498.49 498.49 946.50 946.50 994.97 994.97 564.59 564.59 783.01 783.01

data mnthtot; set prog2.daysales; retain Mth2Dte 0; Mth2Dte=Mth2Dte+SaleAmt; run;

R SALEDATE SALEAMT

15066 15067

498.49 946.50

MNTH2DTE

498.49 ... ...

SaleDate SaleDate 15066 15066 15067 15067 15068 15068 15069 15069 15070 15070

SaleAmt SaleAmt 498.49 498.49 946.50 946.50 994.97 994.97 564.59 564.59 783.01 783.01

data mnthtot; set prog2.daysales; retain Mth2Dte 0; Mth2Dte=Mth2Dte+SaleAmt; run;

498.49+946.50 SALEDATE SALEAMT

15066 15067

498.49 946.50

R

MNTH2DTE

1444.99 498.49 ... ...

SaleDate 15066 15067 15068 15069 15070

SaleAmt 498.49 946.50 994.97 564.59 783.01

data mnthtot; set prog2.daysales; retain Mth2Dte 0; Mth2Dte=Mth2Dte+SaleAmt; run; Implicit Return

Implicit Output

R SALEDATE SALEAMT

15067

946.50

MNTH2DTE

1444.99

Write out observation to mnthtot. ... ...

SaleDate 15066 15067 15068 15069 15070

SaleAmt 498.49 946.50 994.97 564.59 783.01

data mnthtot; set prog2.daysales; retain Mth2Dte 0; Mth2Dte=Mth2Dte+SaleAmt; run; Implicit Output

R SALEDATE SALEAMT

15067

946.50

MNTH2DTE

1444.99

Write out observation to mnthtot. ... ...

SaleDate 15066 15067 15068 15069 15070

SaleAmt 498.49 946.50 994.97 564.59 783.01

data mnthtot; set prog2.daysales; retain Mth2Dte 0; Mth2Dte=Mth2Dte+SaleAmt; run; Implicit Return

R SALEDATE SALEAMT

15067

946.50

MNTH2DTE

1444.99 ... ...

SaleDate 15066 15067 15068 15069 15070

SaleAmt 498.49 946.50 994.97 564.59 783.01

data mnthtot; set prog2.daysales; retain Mth2Dte 0; Mth2Dte=Mth2Dte+SaleAmt; run;

R SALEDATE SALEAMT

15067

946.50

MNTH2DTE

1444.99 ... ...

SaleDate 15066 15067 15068 15069 15070

SaleAmt 498.49 946.50 994.97 564.59 783.01

data mnthtot; set prog2.daysales; retain Mth2Dte 0; Mth2Dte=Mth2Dte+SaleAmt; run;

R SALEDATE SALEAMT

15068 15067

946.50 994.97

MNTH2DTE

1444.99 ... ...

SaleDate 15066 15067 15068 15069 15070

SaleAmt 498.49 946.50 994.97 564.59 783.01

data mnthtot; set prog2.daysales; retain Mth2Dte 0; Mth2Dte=Mth2Dte+SaleAmt; run;

1444.99+994.97 SALEDATE SALEAMT

15068 15067

946.50 994.97

R

MNTH2DTE

2439.96 1444.99 ... ...

SaleDate 15066 15067 15068 15069 15070

SaleAmt 498.49 946.50 994.97 564.59 783.01

data mnthtot; set prog2.daysales; retain Mth2Dte 0; Mth2Dte=Mth2Dte+SaleAmt; run; Implicit Return

Implicit Output

R SALEDATE SALEAMT

15068

994.97

MNTH2DTE

2439.96

Write out observation to mnthtot. ... ...

SaleDate 15066 15067 15068 15069 15070

SaleAmt 498.49 946.50 994.97 564.59 783.01

data mnthtot; set prog2.daysales; retain Mth2Dte 0; Mth2Dte=Mth2Dte+SaleAmt; run; Implicit Output

R SALEDATE SALEAMT

15068

994.97

MNTH2DTE

2439.96

Write out observation to mnthtot. ... ...

SaleDate 15066 15067 15068 15069 15070

SaleAmt 498.49 946.50 994.97 564.59 783.01

data mnthtot; set prog2.daysales; retain Mth2Dte 0; Mth2Dte=Mth2Dte+SaleAmt; run; Implicit Return

R SALEDATE SALEAMT

15068

994.97

MNTH2DTE

2439.96 ... ...

SaleDate 15066 15067 15068 15069 15070

SaleAmt 498.49 946.50 994.97 564.59 783.01

data mnthtot; set prog2.daysales; retain Mth2Dte 0; Mth2Dte=Mth2Dte+SaleAmt; run;

Continue processing until end of SAS data set SALEDATE SALEAMT

15068

994.97

R

MNTH2DTE

2439.96

Creating an Accumulating Variable proc print data=mnthtot noobs; format SaleDate date9.; run; Partial PROC PRINT Output SaleDate 01APR2001 02APR2001 03APR2001 04APR2001 05APR2001

Sale Amt

Mth2Dte

498.49 946.50 994.97 564.59 783.01

498.49 1444.99 2439.96 3004.55 3787.56

Accumulating Totals: Missing Values What happens if there are missing values for SaleAmt?

data mnthtot; set prog2.daysales; retain Mth2Dte 0; Mth2dte=Mth2Dte+SaleAmt; run;

Undesirable Output

SaleDate

Sale Amt

Mth2Dte

01APR2001 02APR2001 03APR2001 04APR2001 05APR2001

498.49 . 994.97 564.59 783.01

498.49 . . . .

Missing value

Subsequent values of Mth2Dte are missing ... ...

The Sum Statement When creating an accumulating variable, an alternative to the RETAIN statement is the sum statement. General form of the sum statement:

variable variable ++expression; expression;

The Sum Statement The sum statement  creates the variable on the left side of the plus sign if it does not already exist  initializes the variable to zero before the first iteration of the DATA step  automatically retains the variable  adds the value of the expression to the variable at execution  ignores missing values.

Accumulating Totals: Missing Values

data mnthtot2; set prog2.daysales2; Mth2Dte+SaleAmt; run;

Accumulating Totals: Missing Values proc print data=mnthtot2 noobs; format SaleDate date9.; run; Partial PROC PRINT Output SaleDate

SaleAmt

Mth2Dte

01APR2001 02APR2001 03APR2001 04APR2001 05APR2001

498.49 . 994.97 564.59 783.01

498.49 498.49 1493.46 2058.05 2841.06 c03s1d1.sas

Objectives  Define First. and Last. processing.  Calculate an accumulating total for groups of data.  Use a subsetting IF statement to output selected observations.

Accumulating Totals for Groups EmpID

Salary

Div

E00004 E00009 E00011 E00036 E00037 E00048 E00077 E00097 E00107 E00123 E00155 E00171

42000 34000 27000 20000 19000 19000 27000 20000 31000 20000 27000 44000

HUMRES FINACE FLTOPS FINACE FINACE FLTOPS APTOPS APTOPS FINACE APTOPS APTOPS SALES

SAS data set The prog2.empsals contains each employee’s identification number (EmpID), salary (Salary), and division (Div). There is one observation for each employee.

Desired Output Human resources wants a new data set that shows total salary paid for each division. Div

DivSal

APTOPS FINACE FLTOPS HUMRES SALES

410000 163000 318000 181000 373000

Grouping the Data A B

E

D

C

You must group the data in the SAS data set before you can perform processing.

Review of the SORT Procedure You can rearrange the observations into groups using the SORT procedure. General form of a PROC SORT step: PROC PROCSORT SORTDATA=input-SAS-data-set DATA=input-SAS-data-set ; ; BY BY BY-variable BY-variable ...; ...; RUN; RUN;

The SORT Procedure  The SORT procedure  rearranges the observations in a DATA set  can sort on multiple variables  creates a SAS data set that is a sorted copy of the input SAS data set  replaces the input data set by default.

Sorting by Div

proc sort data=prog2.empsals out=salsort; by Div; run;

Processing Data in Groups Div

Salary

APTOPS APTOPS APTOPS FINACE FINACE FINACE FINACE SALES SALES

20000 100000 50000 25000 20000 23000 27000 10000 12000

DivSal 170000

95000 22000

... ...

BY-Group Processing

DATA DATAoutput-SAS-data-set; output-SAS-data-set; SET SETinput-SAS-data-set; input-SAS-data-set; BY BYBY-variable BY-variable … …;; statements> RUN; RUN;

BY-Group Processing

data divsals(keep=Div DivSal); set salsort; by Div; additional SAS statements run;

BY-Group Processing A BY statement in a DATA step creates temporary variables for each variable listed in the BY statement. General form of the names of BY variables in a DATA step:

First.BY-variable First.BY-variable Last.BY-variable Last.BY-variable

First. and Last. Values  The First. variable has a value of 1 for the first observation in a BY group; otherwise, it equals 0.  The Last. variable has a value of 1 for the last observation in a BY group; otherwise, it equals 0.

Use these temporary variables to conditionally process sorted, grouped, or indexed data.

First. / Last. Example Look Ahead Div

Salary

APTOPS APTOPS APTOPS FINACE FINACE FINACE FINACE SALES SALES

20000 100000 50000 25000 20000 23000 27000 10000 12000

First.Div 1 Last.Div 0

... ...

First. / Last. Example

Div APTOPS APTOPS APTOPS FINACE FINACE FINACE FINACE SALES SALES

Look Ahead Salary First.Div 0 20000 100000 50000 Last.Div 25000 0 20000 23000 27000 10000 12000 ... ...

First. / Last. Example

Div APTOPS APTOPS APTOPS FINACE FINACE FINACE FINACE SALES SALES

Salary Look Ahead 20000 100000 50000 25000 20000 23000 27000 10000 12000

First.Div 0 Last.Div 1

... ...

First. / Last. Example

Div

Salary

APTOPS APTOPS APTOPS FINACE FINACE FINACE FINACE SALES SALES

20000 Look Ahead 100000 50000 25000 20000 23000 27000 10000 12000

First.Div 1 Last.Div 0

... ...

First. / Last. Example

Div

Salary

APTOPS APTOPS APTOPS FINACE FINACE FINACE FINACE SALES SALES

20000 Look Ahead 100000 50000 25000 20000 23000 27000 10000 12000

First.Div 1 Last.Div 0

What Must Happen When?  There is a three-step process for accumulating totals:  1. Set the accumulating variable to 0 at the start of each BY group.  2. Increment the accumulating variable with a sum statement (automatically retains).  3. Output only the last observation of each BY group.

Accumulating Totals for Groups 1.

Set the accumulating variable to 0 at the start of each BY group.

data divsals(keep=Div DivSal); set salsort; by Div; if First.Div then DivSal=0; additional SAS statements run;

Accumulating Totals for Groups 2. Increment the accumulating variable with a sum statement (automatically retains). data divsals(keep=Div DivSal); set salsort; by Div; if First.Div then DivSal=0; DivSal+Salary; additional SAS statements run;

First. / Last. Example Div

Salary

DivSal

APTOPS APTOPS APTOPS FINACE FINACE FINACE FINACE SALES SALES

20000 100000 50000 25000 20000 23000 27000 10000 12000

20000 120000 170000 25000 45000 68000 91000 10000 22000

Subsetting IF Statement The subsetting IF defines a condition that the observation must meet to be further processed by the DATA step. General form of the subsetting IF statement:

IF IFexpression; expression;  If the expression is true, the DATA step continues processing the current observation.  If the expression is false, SAS returns to the top of the DATA step.

Accumulating Totals for Groups

3. Output only the last observation of each BY group. data divsals(keep=Div DivSal); set salsort; by Div; if First.Div then DivSal=0; DivSal+Salary; if Last.Div; run;

Subsetting IF Statement (Review) Initialize InitializePDV. PDV. Execute Executeprogram program statements. statements.

NO Is the condition true?

If condition; Execute Executeadditional additional program programstatements. statements.

Output Outputobservation observationto to SAS SASdata dataset. set.

YES

... ...

Accumulating Totals for Groups Partial Log NOTE: There were 39 observations read from the data set WORK.SALSORT. NOTE: The data set WORK.DIVSALS has 5 observations and 2 variables. NOTE: DATA statement used: real time 0.74 seconds cpu time 0.33 seconds

Accumulating Totals for Groups proc print data=divsals noobs; run; PROC PRINT Output Div DivSal APTOPS FINACE FLTOPS HUMRES SALES

410000 163000 318000 181000 373000

c03s2d1.sas

Input Data EmpID E00004 E00009 E00011 E00036 E00037 E00077 E00097 E00107 E00123 E00155 E00171 E00188 E00196 E00210 E00222 E00236

Salary 42000 34000 27000 20000 19000 27000 20000 31000 20000 27000 44000 37000 43000 31000 250000 41000

Region E W W W E C E E NC W W W C E NC W

Div HUMRES FINACE FLTOPS FINACE FINACE APTOPS APTOPS FINACE APTOPS APTOPS SALES HUMRES APTOPS APTOPS SALES APTOPS

The SAS data set prog2.regsals contains each employee’s ID number (EmpID), salary (Salary), region (Region), and division (Div). There is one observation for each employee.

Desired Output Human resources wants a new data set that shows the total salary paid and the total number of employees for each division in each region. Partial Output Num Region

Div

DivSal

Emps

C E E E E NC NC

APTOPS APTOPS FINACE FLTOPS HUMRES APTOPS FLTOPS

70000 83000 109000 122000 178000 37000 28000

2 3 4 3 5 2 1

Sorting by Region and Div The data must be sorted by Region and Div. Region is the primary sort variable. Div is the secondary sort variable. proc sort data=prog2.regsals out=regsort; by Region Div; run;

Sorting by Region and Div proc print data=regsort noobs; run; Partial PROC PRINT Output Region Div C C E E E E E

APTOPS APTOPS APTOPS APTOPS APTOPS FINACE FINACE

Salary 27000 43000 20000 31000 32000 19000 31000

Multiple BY Variables data regdivsals; set regsort; by Region Div; additional SAS statements run;

Multiple BY Variables: Example Look Ahead Region

Div

C C C E E E NC NC NC NC NC

APTOPS APTOPS APTOPS APTOPS FINACE FINACE FINACE SALES SALES SALES SALES

First.Region 1 First.Div 1 Last.Region 0 Last.Div 0 ... ...

Multiple BY Variables: Example Region

Look Ahead Div

C C C E E E NC NC NC NC NC

APTOPS APTOPS APTOPS APTOPS FINACE FINACE FINACE SALES SALES SALES SALES

First.Region 0 First.Div 0 Last.Region 0 Last.Div 0 ... ...

Multiple BY Variables: Example Region

Div Look Ahead

C C C E E E NC NC NC NC NC

APTOPS APTOPS APTOPS APTOPS FINACE FINACE FINACE SALES SALES SALES SALES

First.Region 0 First.Div 0 Last.Region 1 Last.Div 1 ... ...

Multiple BY Variables: Example Region C C C E E E NC NC NC NC NC

Div Look Ahead APTOPS APTOPS APTOPS APTOPS FINACE FINACE FINACE SALES SALES SALES SALES

First.Region 1 First.Div 1 Last.Region 0 Last.Div 1 ... ...

Multiple BY Variables: Example Region C C C E E E NC NC NC NC NC

Div Look Ahead APTOPS APTOPS APTOPS APTOPS FINACE FINACE FINACE SALES SALES SALES SALES

First.Region 1 First.Div 1 Last.Region 0 Last.Div 1

Multiple BY Variables When you use more than one variable in the BY statement, a change in the primary variable forces Last.BY-variable=1 for the secondary variable. Region C C E E E E

Div APTOPS APTOPS APTOPS APTOPS APTOPS FINACE

First. Region

Last. Region

1 0 1 0 0 0

0 1 0 0 0 0

First. Div 1 0 1 0 0 1

Last.Div 0 1 0 0 1 0

Multiple BY Variables /*Summarize salaries by division*/ data regdivsals(keep=Region Div DivSal NumEmps); set regsort; by Region Div; if First.Div then do; DivSal=0; NumEmps=0; end; DivSal+Salary; NumEmps+1; if Last.Div; run;

Multiple BY Variables Partial Log NOTE: There were 39 observations read from the data set WORK.REGSORT. NOTE: The data set WORK.REGDIVSALS has 14 observations and 4 variables. NOTE: DATA statement used: real time 0.07 seconds cpu time 0.07 seconds

Multiple BY Variables proc print data=regdivsals noobs; run; Partial PROC PRINT Output Region C E E E

Div

DivSal

APTOPS APTOPS FINACE FLTOPS

70000 83000 109000 122000

c03s2d2.sas

Questions

Related Documents

Sas Sort Accum Total
June 2020 0
Sas And Asa To Sort
May 2020 6
Sort
November 2019 21
Sort
November 2019 23
Sas
April 2020 20
Sas
June 2020 17