Reflections of a Data Scientist: (R) Histograms w/Standard Error Bars

A patron of the site recently contacted me and asked that we again review the graphing capabilities of the R language, specifically R's ability to add standard error bars to a chart. The request mentioned histograms; strictly speaking, the chart built below is a bar plot of group means, since a histogram shows the distribution of a single variable.

As I wrote in a previous article, it is often best to utilize Microsoft Excel for all graphical endeavors. However, if, for whatever reason, you do not have access to Excel, or you want to check your data before exporting the results to Excel, then the following method can serve as a provisional solution.

Example:

Measurements have been collected from various groups. The data pertaining to the observations from each group are below:

# Data #

GroupA <- c(11, 17, 38, 20, 21, 40, 18, 48, 50, 37)

GroupB <- c(1, 37, 36, 13, 27, 50, 12, 24, 19, 3)

GroupC <- c(14, 33, 29, 25, 29, 36, 46, 7, 20, 38)

GroupD <- c(6, 34, 31, 26, 49, 34, 11, 36, 13, 28)

# Calculate the mean score related to each observational series #

MeanA <- mean(GroupA)

MeanB <- mean(GroupB)

MeanC <- mean(GroupC)

MeanD <- mean(GroupD)

# Create a vector which will contain all of the mean measurements #

MeansAll <- c(MeanA, MeanB, MeanC, MeanD)

# Calculate the standard error of the mean related to each observational series #

# Requires package: "plotrix", installed and loaded #

library(plotrix)

stderrA <- std.error(GroupA)

stderrB <- std.error(GroupB)

stderrC <- std.error(GroupC)

stderrD <- std.error(GroupD)

# Create a vector which will contain all of the standard error measurements #

stderrAll <- c(stderrA, stderrB, stderrC, stderrD)

# Combine all of the aggregate values within a single data frame #

AllData <- data.frame(MeansAll, stderrAll)
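If plotrix is not available, the same aggregate values can be computed in base R, since the standard error of the mean is the sample standard deviation divided by the square root of the sample size. The following is a minimal sketch under that assumption; the names groups, stdErrBase, and AllDataBase are hypothetical and are not part of the original code.

```r
# The same four groups as above, collected in a named list
groups <- list(
  A = c(11, 17, 38, 20, 21, 40, 18, 48, 50, 37),
  B = c(1, 37, 36, 13, 27, 50, 12, 24, 19, 3),
  C = c(14, 33, 29, 25, 29, 36, 46, 7, 20, 38),
  D = c(6, 34, 31, 26, 49, 34, 11, 36, 13, 28)
)

# Hypothetical helper: standard error of the mean = sd / sqrt(n)
# (assumes the vector contains no missing values)
stdErrBase <- function(x) sd(x) / sqrt(length(x))

# One row per group: the mean and its standard error
AllDataBase <- data.frame(
  MeansAll  = sapply(groups, mean),
  stderrAll = sapply(groups, stdErrBase)
)

print(AllDataBase)
```

Keeping the groups in a list means a fifth group can be added in one place, rather than repeating the mean and standard error lines for each new series.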

# Create a basic bar plot #

barCenters <- barplot(height = AllData$MeansAll, main = "Average Measurement per Group",

xlab = "Group", ylab = "Mean Measurement",

yaxt="n",ylim=c(0,40))

This produces the output:

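We'll take a break right now to discuss what is occurring in the code block above. To avoid cluttered output, the default y-axis (its tick marks and tick labels) is temporarily suppressed with yaxt = "n"; the axis title set by ylab is not affected. We are also manually setting the y-axis range to 0-40 with ylim, which leaves room above the tallest bar for its error bar.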
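The upper limit of 40 is hard-coded for this particular data set. As a rough sketch, the limit and the tick positions could instead be derived from the data, so that the tallest bar plus its error bar always fits. The names upperLimit and ticksAuto are hypothetical, and rounding up to the next multiple of 10 is simply one possible choice.

```r
# The same four groups as above
groups <- list(
  A = c(11, 17, 38, 20, 21, 40, 18, 48, 50, 37),
  B = c(1, 37, 36, 13, 27, 50, 12, 24, 19, 3),
  C = c(14, 33, 29, 25, 29, 36, 46, 7, 20, 38),
  D = c(6, 34, 31, 26, 49, 34, 11, 36, 13, 28)
)

groupMeans  <- sapply(groups, mean)
groupStdErr <- sapply(groups, function(x) sd(x) / sqrt(length(x)))

# Round the top of the tallest error bar up to the next multiple of 10
upperLimit <- ceiling(max(groupMeans + groupStdErr) / 10) * 10

# Tick positions from 0 to the upper limit, in steps of 10
ticksAuto <- seq(0, upperLimit, by = 10)

print(upperLimit)
print(ticksAuto)
```

These values could then be passed as ylim = c(0, upperLimit) to barplot() and as the at and labels arguments of axis().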

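The next bit of code is a clever hack which creates the standard error overlays. barplot() returns the x-coordinates of the bar midpoints, which we stored in barCenters. arrows() then draws a vertical segment at each midpoint, from the mean minus one standard error to the mean plus one standard error; angle = 90 flattens the arrowheads into horizontal caps, and code = 3 places a cap at both ends.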

# Create standard error overlays #

arrows(barCenters, AllData$MeansAll-AllData$stderrAll,

barCenters, AllData$MeansAll+AllData$stderrAll, length=0.05, angle=90, code=3)
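One caveat with this approach: arrows() emits a warning and skips any arrow of zero length, which can happen when a group has no variation. The following sketch wraps the call in a hypothetical helper, drawErrorBars (not part of base R or plotrix), that skips such bars explicitly. The values in the usage lines are made up for illustration and are not the data from this article.

```r
# Hypothetical helper: draw capped error bars at positions x,
# spanning center - halfWidth to center + halfWidth
drawErrorBars <- function(x, center, halfWidth, capLength = 0.05) {
  keep <- halfWidth > 0  # zero-length arrows would trigger a warning
  if (any(keep)) {
    arrows(x0 = x[keep], y0 = center[keep] - halfWidth[keep],
           x1 = x[keep], y1 = center[keep] + halfWidth[keep],
           length = capLength,  # length of the cap edges, in inches
           angle = 90,          # caps perpendicular to the shaft
           code = 3)            # a cap at both ends
  }
  invisible(sum(keep))          # number of error bars actually drawn
}

# Usage with made-up illustrative values; the second bar has no spread
exampleHeights <- c(10, 20, 15)
exampleCenters <- barplot(exampleHeights, ylim = c(0, 30))
nDrawn <- drawErrorBars(exampleCenters, exampleHeights, c(2, 0, 3))
print(nDrawn)
```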

This produces the output:

# Label y-axis #

ticks<-c(0, 10, 20, 30, 40)

axis(2,at=ticks,labels=ticks)

# Label x-axis #

group <- c("A", "B", "C", "D")

axis(1, at=barCenters, labels=group)
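As an alternative to adding the axes by hand, barplot() accepts a names.arg argument for the bar labels and draws a default y-axis when yaxt = "n" is omitted. The sketch below assumes the default tick marks are acceptable; the names groupMeans and groupStdErr are hypothetical.

```r
# The same four groups as above
groups <- list(
  A = c(11, 17, 38, 20, 21, 40, 18, 48, 50, 37),
  B = c(1, 37, 36, 13, 27, 50, 12, 24, 19, 3),
  C = c(14, 33, 29, 25, 29, 36, 46, 7, 20, 38),
  D = c(6, 34, 31, 26, 49, 34, 11, 36, 13, 28)
)

groupMeans  <- sapply(groups, mean)
groupStdErr <- sapply(groups, function(x) sd(x) / sqrt(length(x)))

# names.arg labels the bars; the y-axis is drawn by default
barCenters <- barplot(height = groupMeans, names.arg = names(groups),
                      main = "Average Measurement per Group",
                      xlab = "Group", ylab = "Mean Measurement",
                      ylim = c(0, 40))

# Standard error overlays, as in the original code
arrows(barCenters, groupMeans - groupStdErr,
       barCenters, groupMeans + groupStdErr,
       length = 0.05, angle = 90, code = 3)
```

The manual axis() calls remain useful when specific tick positions or custom labels are wanted.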

This produces the final output:

That's all for now. Stay tuned for more mathematical content!

Monday, July 16, 2018
