Reflections of a Data Scientist: SCAN / SUBSTR (SAS)

Two somewhat similar, incredibly useful, and commonly used SAS functions, are SCAN and SUBSTR.

SCAN – returns a specified word from a character value.

SUBSTR – extracts a substring or replaces character values.

SCAN

The SCAN function is utilized as follows:

SCAN(<THE COLUMN TO SCAN>, <ORDER OF VALUE WHICH SCAN WILL RETURN>, <DELIMITER>);

Now to send the function into action in the following example set:

For the sake of this demonstration, let’s say that you wanted to scan the “Name” column variable for the middle initial of each individual listed. Once this initial has been identified, you would like to have it a new variable created to house its value.

There are two ways to perform this task utilizing SCAN, the first way is to have SCAN from left to right.

SCAN(Name, 3, ‘ ‘ );

The second method scans from right to left:

SCAN(Name, -1, ‘ ‘ );

In each case, SCAN scans the NAME column for data while utilizing a blank space as the delineator. The first method scans for the third sting of data, occurring after two blanks, and reads it. The second method, scans for the first string, occurring before a blank, and reads it.

To utilize these functions while also creating a variable to house the data of each string returned, you will need to write code similar to the code listed below:

Data TEST2;
Set TEST;
Length MI1 MI2 $ 10; /* 10 is excessive. However, you should get into the habit of setting lengths for variables that you will later create. */
MI1 = SCAN(Name, 3, ' ' );
MI2 = SCAN(Name, -1, ' ' );
Run;

Another demonstration of how SCAN can create a variable to house the city string from the address variable column.

Scanning left to right.

City = SCAN(Address, 2, ',' );

Scanning right to left.

City2 = SCAN(Address, -3, ',' );

In each case, the SCAN function utilized a comma as a delineator.

SUBSTR

The SUBSTR function, or the substring function, is commonly utilized to check for a single character value. However, it can also be utilized to check for multiple character values, or to replace character values.

The SUBSTR function is utilized as follows:

SUBSTR(<THE COLUMN TO SCAN>, <POSITION IN WHICH TO BEGIN RETURNING CHARACTERS>, <NUMBER OF CHARACTERS TO RETURN>);

In the case of our example set, let’s say that you wanted to create a variable which houses the first letter of the last name of each individual listed within the name column.

MISUB = SUBSTR(NAME, 1, 1);

Or you could create a variable which houses the first 3 characters of an address from the address column.

ZIP5 = SUBSTR(Address, 1, 3);

Or for whatever reason, you decided that every address number to 777.

SUBSTR(Address, 1, 3) = ‘777’;

Reflections of a Data Scientist

Thursday, April 27, 2017

SCAN / SUBSTR (SAS)

No comments:

Post a Comment