Tuesday, May 23, 2017

SAS Macro – Splitter (SAS) (Pt. 1)

Throughout the previous entries, I promised that I would upload some of the basic macros that I have created in SAS. After scouring large amounts of program code that I’ve written for previous work assignments, I was able to decide on the macros that I found to be most helpful and re-useable. These macros will be posted to this blog.

The first macro that I will be demonstrating is the splitter macro. This macro was created to help with the proper transposition of data observations contained within SAS data sets.

Ex Before:

Ex. After:


The Method:


Before beginning the compilation and processing of the macro, I would recommend creating a copy of the data set that you wish to alter.

/* Create a copy of the original data set */

Data DATASETB1;
Set SETA;
Run;


Next, you will need to remove duplicate entries from the copy of the original data set. Make sure to sort based on a primary and unique identifier.

Proc Sort Data = DATASETB1 nodupkey dupout= DupOut1;
By VARA;
Run;


For the macro to perform its function correctly, you must run the PROC FREQ procedure on the unique identifier that was specified in the previous set. The reason for doing such, is that you need to locate the unique identifier observation with the greatest frequency.

PROC FREQ DATA = DupOut1;
TABLES VARA;
RUN;


We are now ready to create and compile the macro, the code for which can be found below:

%macro splitter;

%do i = 1 %to 2; /* Highest frequency of unique re-occuring variable observation (VARA in our example) minus 2 */
%let h = %eval(&i+1); 


proc sort data = DupOut&i nodupkey dupout= DupOut&h;
by VARA;
run;

%end;

%mend;

To reiterate, the “i” variable in the above listed code must be modified to the number produced by the PROC FREQ DATA step. The “i” variable will be the highest frequency occurrence of a unique variable from the PROC FREQ procedure minus 2.

What this code seeks to achieve is the separation of duplicate values from the initial data set and the establishment of separate data sets, each containing a unique list of entries. Such as the illustration below demonstrates:


In the next article, we will review another macro which will help us in re-naming the variables of the newly created data sets. Once we have achieved this step, we can then begin re-assembling the separated data into a single set. 

No comments:

Post a Comment

Note: Only a member of this blog may post a comment.