Thursday, July 27, 2017

(R) Data Frame Maintenance

The topic of today's post is data frame maintenance. In this article, I will demonstrate several techniques for common maintenance tasks: renaming columns, changing column types, stacking data frames, adding a vector as a column, and re-ordering columns.

Let's say, for example, that you are working with a data frame named "DataFrameA". For whatever reason, the third column of this particular data frame needs to be re-named. The general form of the code to accomplish this task is below, where the placeholder in angle brackets is the number of the column to change:

colnames(DataFrameA)[<#ofcolumntochange>] <- "New Column Name"

So, if you wanted to change the name of the third column of DataFrameA to "DataBlog", the code would resemble:

colnames(DataFrameA)[3] <- "DataBlog"
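
As a small runnable sketch of the renaming step above, the example below builds a throwaway data frame (the values and the names "OldName" and "Beta" are made up for illustration) and renames one column by position and another by its current name. Renaming by name is an alternative that is not in the original example, but it can be safer when the column order might change.

```r
# Hypothetical example data; the column names are placeholders.
DataFrameA <- data.frame(A = 1:3, B = 4:6, OldName = 7:9)

# Rename by position, as in the example above.
colnames(DataFrameA)[3] <- "DataBlog"

# Rename by current name, which does not depend on column order.
colnames(DataFrameA)[colnames(DataFrameA) == "B"] <- "Beta"

print(colnames(DataFrameA))
```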


Changing Column Variable Type

Now, let's say that you wanted to change the data type that is contained within a column of an existing data frame. Again, we will use "DataFrameA" for our example.

This code will change a column that contains integers to a column that contains factors:

DataFrameA$VarA <- as.factor(DataFrameA$VarA)

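This code will change a column that contains factors to a column that contains integers. Note that calling as.integer() directly on a factor returns the internal level codes (1, 2, 3, ...) rather than the values shown in the column, so the factor is converted to characters first:

DataFrameA$VarA <- as.integer(as.character(DataFrameA$VarA))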

This code will change a column that contains factors to a column that contains characters:

DataFrameA$VarA <- as.character(DataFrameA$VarA)
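
To illustrate why the factor conversions above deserve some care, here is a minimal sketch using a made-up factor whose labels happen to be numbers. It compares the result of calling as.integer() directly on the factor with the result of converting through as.character() first.

```r
# Hypothetical factor whose labels are numbers.
VarA <- factor(c("10", "20", "10", "30"))

# Direct conversion returns the internal level codes, not the labels.
codes <- as.integer(VarA)

# Converting to character first recovers the values shown in the column.
values <- as.integer(as.character(VarA))

print(codes)   # 1 2 1 3
print(values)  # 10 20 10 30
```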


Stacking Data Frames

Perhaps you want to stack two data frames, one on top of the other.

If each data frame has the same column names, then the following code is ideal:

NewDataFrame <- rbind(topdataframe, bottomdataframe)
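
A short sketch of the stacking step, using two invented data frames named "top" and "bottom". For data frames, rbind() matches columns by name rather than by position, so the columns of the second data frame may appear in a different order.

```r
# Hypothetical data frames with the same column names in a different order.
top    <- data.frame(A = 1:2, B = c(9, 18))
bottom <- data.frame(B = c(3, 6), A = 3:4)

# rbind() on data frames lines the columns up by name.
NewDataFrame <- rbind(top, bottom)

print(NewDataFrame)
```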

If one data frame contains an additional column that is not included within the other, rbind() will return an error because the column names do not match. You will need to add the missing column to the data frame that lacks it before stacking the data.

For example, if Data Frame A contains:

A    B    C
1    9    4
2    18   8
3    27   12
4    36   16

And Data Frame B contains:

A    B
1    3
2    6
3    9
4    12

You would first need to add a column containing missing values to the bottom data frame by running the example code:

DataFrameB$C <- NA


This code modifies Data Frame B so that it resembles:

A    B    C
1    3    NA
2    6    NA
3    9    NA
4    12   NA

The data frames can now be stacked with the code:

NewDataFrame <- rbind(DataFrameA, DataFrameB)

And the new data frame will resemble:

A    B    C
1    9    4
2    18   8
3    27   12
4    36   16
1    3    NA
2    6    NA
3    9    NA
4    12   NA
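
The example above can be reproduced end to end with the sketch below. The data matches the tables shown; the helper variable "missing_cols" is my own addition, used here as one possible way to find and fill every missing column at once instead of naming each one by hand.

```r
# The two data frames from the example above.
DataFrameA <- data.frame(A = 1:4, B = c(9, 18, 27, 36), C = c(4, 8, 12, 16))
DataFrameB <- data.frame(A = 1:4, B = c(3, 6, 9, 12))

# Find the columns present in DataFrameA but absent from DataFrameB.
missing_cols <- setdiff(names(DataFrameA), names(DataFrameB))

# Add each missing column to DataFrameB, filled with NA.
DataFrameB[missing_cols] <- NA

# The column names now match, so the data frames can be stacked.
NewDataFrame <- rbind(DataFrameA, DataFrameB)

print(NewDataFrame)
```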


Adding a Vector as a Column

For this example, we'll pretend that you wanted to add a new column, in the form of a vector, to an existing data frame.

If the vector contains the same number of elements as the data frame has rows, then adding it to the data frame is simple.

Utilize the code:

DataFrameName$NewColumnName <- NewColumntoAdd

If the vector contains fewer elements than the data frame has rows, then you will first have to pad the vector with additional values before utilizing the above code.

For example, if NewColumntoAdd contains 35 elements, and DataFrameA contains 36 rows, you could add the additional value needed with the following code:

AdditionalDataVector <- rep(NA, times = 1) # Or however many NA values are needed

NewColumntoAdd <- c(NewColumntoAdd, AdditionalDataVector)


Now you can successfully run the code:

DataFrameName$NewColumnName <- NewColumntoAdd
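
As a hedged sketch of the padding step, the code below computes the number of NA values needed instead of hard-coding it. The data is invented for the example, and "n_missing" is a name I introduce here for illustration.

```r
# Hypothetical data: a 36-row data frame and a 35-element vector.
DataFrameA     <- data.frame(A = 1:36)
NewColumntoAdd <- seq_len(35)

# Work out how many NA values are needed to match the row count.
n_missing <- nrow(DataFrameA) - length(NewColumntoAdd)

# Pad the vector, then add it as a new column.
NewColumntoAdd <- c(NewColumntoAdd, rep(NA, times = n_missing))
DataFrameA$NewColumnName <- NewColumntoAdd

print(tail(DataFrameA, 3))
```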


Re-Ordering Columns within a Data Frame

To accomplish this task you have two options.

The first option is to re-order the column data by column name.

For example, if you were working on a data frame (DataFrameA) with the column names ("A", "B", "C", "D"), and you wanted to re-order the columns so that they were displayed as ("B", "C", "A", "D"), you could run the code:

DataFrameA <- DataFrameA[c("B", "C", "A", "D")]


You also have the option of re-ordering the columns by column number.

If this was the option that you wished to utilize, the code would resemble:

DataFrameA <- DataFrameA[c(2,3,1,4)]
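
The two options can be checked against each other with the small sketch below, which uses a made-up four-column data frame. The last step, moving a single column to the front with setdiff(), is an extra variation I am adding for illustration; it avoids typing out every column name.

```r
# Hypothetical data frame with columns A, B, C, D.
DataFrameA <- data.frame(A = 1:2, B = 3:4, C = 5:6, D = 7:8)

# Re-order by column name and by column number.
by_name   <- DataFrameA[c("B", "C", "A", "D")]
by_number <- DataFrameA[c(2, 3, 1, 4)]

# Both approaches produce the same data frame.
print(identical(by_name, by_number))

# Variation: move column "D" to the front and keep the rest in order.
d_first <- DataFrameA[c("D", setdiff(names(DataFrameA), "D"))]
print(names(d_first))
```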

That is all for this entry. I have not yet decided what the topic for the next post will be, but I promise you that it will contain more helpful R related information.
