Monday, April 17, 2017

Basic VLOOKUP (MS-Excel)

A very important function that every data scientist should be aware of, as it is very helpful when compiling small data sets, is the VLOOKUP function within Excel. Essentially what this function allows the user to achieve, is the comparison and matching of data from one Excel workbook to another.

For our sample demonstration, let’s imagine that you have data within one Excel workbook that contains various types of cars within your dealer inventory. The second workbook contains the amount of those cars previously sold from your inventory. When comparing matching car variables, instead of having to painstakingly read off of one sheet and manually enter or paste the values of matching entries into the first sheet, you can utilize VLOOKUP.

Example Sheet 1:


Example Sheet 2:


First, you will need to decide on the destination cell for the data that you will be referencing. In the case of our example, the destination cell will be cell C2 on sheet1 of workbook1. The data being referenced exists on sheet1 of workbook2. For this example to work, and for VLOOKUP to work in general, THE DATA BEING REFERENCED MUST BE LOCATED ON A SEPARATE SHEET OF A SEPARATE WORKBOOK. 


The VLOOKUP example looks complicated but don't be intimidated, the VLOOKUP function is actually incredibly easy to utilize.

There are 4 variables that must be entered within the VLOOKUP function for it to work properly.

=VLOOKUP(A,B,C,D)

A. Refers to the column that will act as the primary key for the data set on workbook1. In the case of the example, we are selecting multiple cells within the same column. To re-create this step of the scenario, type =VLOOKUP into the C2 cell and click on cell A2. Now drag the selection downward to include all variables listed within the A column. After you have finished this task, place a comma.

B. Refers to the group of columns that will be referenced in comparison to the A column within workbook1. In the case of the example, I clicked on cell A2 of the sheet on workbook1, and drug my pointer selection over to also select B2. After such, I pulled the selection down to include all of the variables within the A and B columns which existed below the column titles. After this task is completed, place a comma.

C. The first column that is selected in the second workbook acts as the primary key for workbook2. The C variable indicates the column number, from left to right, that will be referenced when the first column of the workbook2 matches the column selected to be the primary key in workbook1 (Column A). After this task is completed, place a comma.

D. You will always want to type FALSE for this portion of the function. This essentially instructs Excel that both primary keys much match exactly to be considered a match. All matches will display the referenced column results within the destination cell.


One final important note, the data referenced in the C column of worksheet2 is just that. Which means that, as soon as the second sheet is deleted, or both workbooks are closed, that the data reference will be lost. To prevent this from happening, select the entire column of destination data (C column) and select Copy. Then paste the data back on top of the referenced data as values. This makes the data permanent and not a temporary reference. If you need a better explanation pertaining to this step, there is an additional article on this blog that explains the concept in greater detail.


With basic mastery of the VLOOKUP function, you should have a much easier time working through the nuances of Excel while navigating small data sets. 

No comments:

Post a Comment

Note: Only a member of this blog may post a comment.