Saturday, August 5, 2017

(R) Functions

The function concept is probably the most difficult concept within the R programming language. Future entries will address various statistical models, and how to create them within the R platform.

R Functions 

This description assumes that you have some understanding of how a function operates.

In the example below, we first define the function "Test_Function". Test_Function is the name that is called to run the code contained within the braces "{" and "}". In the example function, the value of "X" is assigned to "Q".

Test_Function <- function(X) tells R that Test_Function is a function. The "function()" that follows the "<-" provides this definition. The (X) in "function(X)" tells R that "X" is the name of the argument, which will hold whatever value is passed into the function.

# The sample data frame below will be utilized throughout the exercises provided: #

A <- c(1,1,1,2,2,3,3)
B <- c(6,5,4,3,2,1)
PlayerID <- c(11,12,13,44,55,16,71)
HR <- c(0,4,6,2,7,10,4)

BaseBallPlayers <- data.frame(PlayerID, HR)

# The Code Below Defines the Function #

Test_Function <- function(X)
{
Q <<- (X)
}

# Run the Function #

Test_Function(B)

# The Variable 'Q' now contains the values of vector 'B' #

Q


"Test_Function(B)" illustrates an example of a function being called. Being "called" is synonymous with being initiated.

In our example, “B” is being passed into the function as it is called. This means that every entry within vector “B” will now be stored within the (global) variable “Q”. This is achieved through the usage of "<<-", which tells R to create "Q" outside of the function, so that it continues to exist after the function has finished.
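The difference between "<-" and "<<-" can be seen in a minimal sketch. The names Local_Assign, Global_Assign, Q_Local and Q_Global are hypothetical and chosen only for illustration, and the results noted in the comments assume a fresh R session in which those names are not already in use.

```r
# Hypothetical names, for illustration only
B <- c(6,5,4,3,2,1)

Local_Assign <- function(X)
{
  Q_Local <- X     # "<-" creates Q_Local inside the function only
}

Global_Assign <- function(X)
{
  Q_Global <<- X   # "<<-" creates Q_Global outside of the function
}

Local_Assign(B)
Global_Assign(B)

exists("Q_Local")    # FALSE in a fresh session: the local variable is gone
exists("Q_Global")   # TRUE: the variable remains after the call
```

Because "<<-" changes variables outside of the function, it is usually reserved for cases where that side effect is really wanted.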

Here is another function example:

Test_Function2 <- function(Y)
{
Star <<- ifelse(Y$HR > 5, "X", " ")
}

BaseBallPlayers$Star <- Test_Function2(BaseBallPlayers)


In this example, we will assume that you are working with a data frame that contains information pertaining to baseball players. The "Star" vector will contain player information pertaining to home run hitting abilities. If a player has hit more than 5 home runs, the vector will mark his place on the list of values with an "X".
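As a quick check of what the call above produces, here is a small sketch that rebuilds the sample data frame so that it runs on its own. It suggests that the call has two effects: the new "Star" column is added to the data frame, and, because of "<<-", a separate global vector named "Star" is created as well.

```r
PlayerID <- c(11,12,13,44,55,16,71)
HR <- c(0,4,6,2,7,10,4)
BaseBallPlayers <- data.frame(PlayerID, HR)

Test_Function2 <- function(Y)
{
  Star <<- ifelse(Y$HR > 5, "X", " ")
}

BaseBallPlayers$Star <- Test_Function2(BaseBallPlayers)

# Players marked with an "X" (more than 5 home runs)
BaseBallPlayers$PlayerID[BaseBallPlayers$Star == "X"]   # 13 55 16

# The global vector created by "<<-" holds the same values as the column
identical(Star, BaseBallPlayers$Star)                   # TRUE
```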

It should be mentioned before continuing that the way R returns values from functions can seem strange. In the above example, a vector is being created from a column which exists in an already established data frame. "Y" is the argument name, which stands in for whatever variable is passed into the function. A variable created inside a function with the ordinary "<-" is local to that function. Meaning, that the code within the function will be processed, but the variable which is created as a product of such will cease to exist after the function has completed its process. The "<<-" in the above function avoids this by creating "Star" outside of the function. What the function hands back to the caller is the value of the last expression that it evaluates, or the value given to return().

For this reason, if you want to keep the result of a function without using "<<-", you will need to assign the function call, including the variable which is passed to the function, to a new variable. This allows R to store the value, which would otherwise have lived a temporary existence, in a permanent location.
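A minimal sketch of this approach is shown here. Test_Function2b and StarB are hypothetical names, introduced only to show a variant of the earlier function that does not rely on "<<-".

```r
PlayerID <- c(11,12,13,44,55,16,71)
HR <- c(0,4,6,2,7,10,4)
BaseBallPlayers <- data.frame(PlayerID, HR)

# Hypothetical variant of Test_Function2 without "<<-"
Test_Function2b <- function(Y)
{
  ifelse(Y$HR > 5, "X", " ")   # the last evaluated expression is what the function returns
}

# Assigning the call to a variable keeps the result
StarB <- Test_Function2b(BaseBallPlayers)

StarB
```

Nothing outside of the function is changed by the call itself; only the assignment to StarB stores the result.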

Now for our final example, we'll pretend that you are working with the same player data frame. However, in this scenario, you want to generate the previous vector, and then add it to the existing data frame as a new column. The code for achieving such is below:

Test_Function3 <- function(w){
Star <- ifelse(w$HR > 5, "X", " ")
w$Star <- Star
return(w)
}

BaseBallPlayersA <- Test_Function3(BaseBallPlayers)


In this example, we are returning the value of "w". The reason is that without return, the variable that is assigned the result of the function would be a vector and not a data frame. Why this is the case is somewhat complicated, but stay with me while I explain it.

Star <- ifelse(w$HR > 5, "X", " ")


This creates the "Star" variable within the function. Remember, this variable is temporary: it exists only while the function is running, unless its value is returned or assigned outside of the function.

w$Star <- Star

Here, the temporary value is being assigned as a new column of "w", which is the function's own temporary copy of the data frame.

If we leave out the return, the last expression evaluated is the assignment above, whose value is the temporary vector. That vector will be the ultimate product of the function, and it will be assigned to the outside variable. With the return specified, R is instructed to return the value of "w", which has now been modified by the function. However, the modified "w" still needs a place to stay, as it is temporary, and that is where:

BaseBallPlayersA <- Test_Function3(BaseBallPlayers)

Comes in.
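The effect of the return line can be checked with a short sketch. No_Return and With_Return are hypothetical names for two variants of Test_Function3, written only for comparison.

```r
PlayerID <- c(11,12,13,44,55,16,71)
HR <- c(0,4,6,2,7,10,4)
BaseBallPlayers <- data.frame(PlayerID, HR)

# Hypothetical variant with the return line left out
No_Return <- function(w){
  Star <- ifelse(w$HR > 5, "X", " ")
  w$Star <- Star   # last expression: its value is the vector on the right-hand side
}

# Hypothetical variant that matches Test_Function3
With_Return <- function(w){
  Star <- ifelse(w$HR > 5, "X", " ")
  w$Star <- Star
  return(w)        # hand back the modified data frame
}

ResultA <- No_Return(BaseBallPlayers)
ResultB <- With_Return(BaseBallPlayers)

is.data.frame(ResultA)   # FALSE: a character vector was returned
is.data.frame(ResultB)   # TRUE: the modified data frame was returned
names(BaseBallPlayers)   # "PlayerID" "HR": the original data frame is unchanged
```

Writing "w" by itself as the final line of the function would have the same effect as return(w), since a function returns the value of its last evaluated expression.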

Functions are useful in that they can be utilized to automate everyday activities. From a more pragmatic standpoint, they can be used to generate daily reports.

In the next article, we will cover a few miscellaneous aspects of R that were overlooked in previous articles, but are nevertheless useful. Subsequently, we will proceed to delve into statistical models, and the R code required to generate reports based on such.
