Wednesday, April 19, 2017

Drop and Keep (SAS)

In a previous article, I discussed how to re-arrange columns within SAS. I also previously discussed a method for re-formatting column data, which makes use of the DROP statement. In this article, I will discuss how to make the best use of both the KEEP and DROP statements within SAS.

Keep

Keep instructs SAS to retain only the specified variables in a newly created data set. All variables which are not referenced are left out of the new set.

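Keep, like Drop, can be written either as a data set option, in parentheses after a data set name on the DATA or SET statement, or as a stand-alone statement inside the DATA step. The placement is critical, because it determines at what point the unwanted variables are discarded.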

Ex.

Data SetA;
Set SetB (Keep = VARA VARB VARC); /* No commas, but parentheses and "=" are utilized */
Run;


(OR)

Data SetA;
Set SetB;
Keep VARA VARB VARC; /* No commas, parentheses or "=" */
Run;


In the first example, KEEP= is a data set option attached to SetB on the SET statement, so the variables are selected before the data is read to create the new set. All variables not referenced in the KEEP= option are never read into the DATA step at all. When a large set is used to create a smaller sub-set, this placement can save a considerable amount of processing time.

In the second example, the KEEP statement takes effect only when the new data set is written. Every variable in SetB is read in first, and the variables which are not listed are left out as SetA is output. Writing the option on the DATA statement instead, as in Data SetA (Keep = VARA VARB VARC), behaves in the same way.

In both cases, only the variables VARA, VARB and VARC will remain in SetA. All other variables from the original set, SetB, will be left out of SetA. SetB itself is not changed.
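To check which variables end up in the output, the three possible placements of Keep can be run side by side on a small test set. The following is only a sketch: the sample values, the extra variable VARX and the output names SetA1 to SetA3 are made up for illustration.

```sas
/* Hypothetical sample data; VARX stands in for a variable we do not want */
Data SetB;
Input VARA VARB VARC VARX;
Datalines;
1 2 3 4
5 6 7 8
;
Run;

/* KEEP= on the SET statement: VARX is never read into the DATA step */
Data SetA1;
Set SetB (Keep = VARA VARB VARC);
Run;

/* KEEP= on the DATA statement: all variables are read, VARX is left out when SetA1 is written */
Data SetA2 (Keep = VARA VARB VARC);
Set SetB;
Run;

/* KEEP statement: same result as KEEP= on the DATA statement */
Data SetA3;
Set SetB;
Keep VARA VARB VARC;
Run;

/* List the variables of one result; repeat for SetA2 and SetA3 to compare */
Proc Contents Data = SetA1;
Run;
```

All three output sets should list the same three variables. The placements differ only in how much data SAS has to read along the way.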

Drop

Drop instructs SAS to leave the specified variables out of a newly created data set. All variables which are referenced are removed, and all others are retained.

Where Drop is placed matters even more than where Keep is placed. New data sets are often built by deriving new variables from variables in the prior set. If some of the variables in SetB are needed to create new variables in SetA, they must remain available until those new variables have been calculated, so they must not be dropped from the input data before the calculation takes place.

Ex.

Data SetA;
Set SetB (Drop = VARA VARB VARC); /* No commas, but parentheses and "=" are utilized */
VARD = VARA + VARB; /* VARD will not be created correctly (it will be missing) because VARA and VARB are dropped as SetB is read, prior to VARD's creation */
Run;


(OR)

Data SetA;
Set SetB;
VARD = VARA + VARB;
Drop VARA VARB VARC; /* In this case, VARD will be created correctly: the DROP statement only takes effect when SetA is written, after VARD's creation */
/* No commas, parentheses or "=" */
Run;
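Keep and Drop can also be combined in a single step: a KEEP= option on the SET statement limits what is read, while a DROP= option on the DATA statement removes the helper variables only when the result is written. A minimal sketch, assuming a made-up SetB with an extra variable VARX that is not needed:

```sas
/* Hypothetical sample data; the values and VARX are for illustration only */
Data SetB;
Input VARA VARB VARC VARX;
Datalines;
1 2 3 4
5 6 7 8
;
Run;

Data SetA (Drop = VARA VARB);     /* inputs are removed only when SetA is written */
Set SetB (Keep = VARA VARB VARC); /* VARX is never read */
VARD = VARA + VARB;               /* VARA and VARB are still available here */
Run;

Proc Print Data = SetA;
Run;
```

With this sample data, SetA should contain only VARC and VARD, with VARD equal to 3 and 11.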


With this understanding, you can instruct SAS to create custom data sets which contain exactly the variables you need.
