Thursday, April 27, 2017

SCAN / SUBSTR / INDEX (SAS)

Three somewhat similar, incredibly useful, and commonly used SAS functions, are SCAN, SUBSTR, and INDEX.

SCAN – returns a specified word from a character value.

SUBSTR – extracts a substring or replaces character values.

INDEX – searches a character expression for a string of characters, and returns the position of the string’s first character for the first occurrence of the string.

SCAN

The SCAN function is utilized as follows:

SCAN(<THE COLUMN TO SCAN>, <ORDER OF VALUE WHICH SCAN WILL RETURN>, <DELIMITER>);

Now to send the function into action in the following example set:


For the sake of this demonstration, let’s say that you wanted to scan the “Name” column variable for the middle initial of each individual listed. Once this initial has been identified, you would like to have it a new variable created to house its value.

There are two ways to perform this task utilizing SCAN, the first way is to have SCAN from left to right.

SCAN(Name, 3, ‘ ‘ );

The second method scans from right to left:

SCAN(Name, -1, ‘ ‘ );

In each case, SCAN scans the NAME column for data while utilizing a blank space as the delineator. The first method scans for the third sting of data, occurring after two blanks, and reads it. The second method, scans for the first string, occurring before a blank, and reads it.

To utilize these functions while also creating a variable to house the data of each string returned, you will need to write code similar to the code listed below:

Data TEST2;
Set TEST;
Length MI1 MI2 $ 10; /* 10 is excessive. However, you should get into the habit of setting lengths for variables that you will later create. */
MI1 = SCAN(Name, 3, ' ' );
MI2 = SCAN(Name, -1, ' ' );
Run;


Another demonstration of how SCAN can create a variable to house the city string from the address variable column.

Scanning left to right.

City = SCAN(Address, 2, ',' );

Scanning right to left.

City2 = SCAN(Address, -3, ',' );

In each case, the SCAN function utilized a comma as a delineator.
SUBSTR

The SUBSTR function, or the substring function, is commonly utilized to check for a single character value. However, it can also be utilized to check for multiple character values, or to replace character values.

The SUBSTR function is utilized as follows:

SUBSTR(<THE COLUMN TO SCAN>, <POSITION IN WHICH TO BEGIN RETURNING CHARACTERS>, <NUMBER OF CHARACTERS TO RETURN>);

In the case of our example set, let’s say that you wanted to create a variable which houses the first letter of the last name of each individual listed within the name column.

MISUB = SUBSTR(NAME, 1, 1);

Or you could create a variable which houses the first 3 characters of an address from the address column.

ZIP5 = SUBSTR(Address, 1, 3);

Or for whatever reason, you decided that every address number to 777.

SUBSTR(Address, 1, 3) = ‘777’;

INDEX


Index is typically used, from a pragmatic standpoint, to prove if there is a particular character value located within a character string. The INDEX function, if it encounters such an occurrence, returns the position of the string’s first character. If no occurrence is found, a 0 is returned.

The INDEX function is utilized as follows:

INDEX(<THE COLUMN TO SCAN>, ‘<THE STRING TO SCAN FOR>’);

So, for example, if you wanted to create a new variable from our sample data set that indicated as to whether an individual had the letter “D” in their name, you would run the following code:

Data TEST2;
Data TEST;
INDEXVAR1 = INDEX(name, 'd');
INDEXVAR2 = INDEX(name, 'D');
If INDEXVAR1 NE 0 then D_Name = 1;
If INDEXVAR2 NE 0 then D_Name = 1;
Drop INDEXVAR INDEXVAR2;
Run;


You must keep in mind that SAS is case sensitive when it assesses data in quoted strings. Therefore, for our example, it is necessary to utilize both the uppercase and lowercase characters of the letter “D”.

No comments:

Post a Comment

Note: Only a member of this blog may post a comment.