Sunday, August 13, 2017

(R) Central Tendency

In this article, I will discuss the R functions used to measure central tendency, along with a few closely related descriptive statistics. Two data vectors serve as examples throughout this entry. To follow along, create them with the following lines of code:

x <-c(53,46,61,97,44,87,40,15,29,99,85,98,17,3,46,25,15,19,2,32,67,34,39,100,88,40,40,87,89,86,69,67,89,84,98,43,75,66,40,76,48,82,45,99,10,59,15,13,99,45,78,66,59,26,2,91,80,42,94,12,9,24,37,14,18,86,35,96,56,50,22,39,58,82,11,56,50,30,99,64,74,13,14,7,5,97,59,91,57,69,58,36,43,77,36,2,58,86,89)

y <- c(1,1,1,2,2,2,3)


summary()

summary() is a useful R function: given a numeric vector, it prints a short set of descriptive statistics for that vector to the console.

For a numeric vector, summary() will print:

Min. (the smallest value within the set)
1st Qu. (the value of the first quartile)
Median (the median value)
Mean (the mean value)
3rd Qu. (the value of the third quartile)
Max. (the largest value within the set)

If we were to call this function on 'x', the following information would be printed to the R console window:

summary(x)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   2.00   29.50   53.00   53.15   82.00  100.00

If you wanted to generate each value independently, you could use the following functions:

mean()

For the mean value.

median()

For the median value.

range()

For the lowest and highest values.
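
As a quick illustration, here is a minimal sketch that calls each of these functions on the small vector 'y' from the top of the article. The vector is redefined so that the snippet runs on its own, and the values in the comments are what I would expect R to print.

```r
# Redefine 'y' so the example is self-contained
y <- c(1, 1, 1, 2, 2, 2, 3)

mean(y)    # arithmetic mean, sum(y) / length(y): 1.714286
median(y)  # middle value of the sorted data: 2
range(y)   # two values, the minimum and then the maximum: 1 3
```

Note that range() returns both extremes as a vector of length two rather than the difference between them.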

Finding the Mode

Unfortunately, base R does not include a function that returns the statistical mode. (R does have a function named mode(), but it reports an object's storage type, not its most frequent value.) However, after careful searching, I found a very good substitute. The code is below. It was taken from a YouTube user named economicurtis, and was featured in his video: "Calculating Mode with R Software (More on R's Summary Stats". A link to this video is at the end of the article.*

temp <- table(as.vector(<vectorname>))
names(temp)[temp == max(temp)]

The first line builds a frequency table for the data vector, and the second line returns every value whose count equals the highest count. Because the values are taken from the table's names, they are returned as character strings. If the data is bimodal, two values will be returned. Here is an example of the code with 'y' as the vector:

temp <- table(as.vector(y))
names(temp)[temp == max(temp)]

Since 'y' is bimodal, the output that is printed to the console window should be:

[1] "1" "2"
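
If you expect to need the mode more than once, the two lines can be wrapped in a small helper. The sketch below is only one way to do it: the name find_modes is my own invention (it is not part of base R), and the conversion back to numbers assumes the input vector is numeric.

```r
# Hypothetical helper, not a base R function:
# returns every value tied for the highest frequency.
find_modes <- function(v) {
  counts <- table(v)                             # frequency of each distinct value
  modes <- names(counts)[counts == max(counts)]  # names of the most frequent values
  as.numeric(modes)                              # names are strings; assumes numeric input
}

y <- c(1, 1, 1, 2, 2, 2, 3)
find_modes(y)  # prints: [1] 1 2
```

Returning numbers instead of strings makes the result easier to reuse in later calculations.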

Finding the Variance

To derive the variance from a vector, the following function can be utilized. Note that it computes the sample variance, which divides by n - 1 rather than n:

var()
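
To make the n - 1 denominator concrete, here is a minimal sketch that computes the sample variance of 'y' by hand and compares it with var(). The variable name manual_var is only illustrative.

```r
y <- c(1, 1, 1, 2, 2, 2, 3)

# Sum of squared deviations from the mean, divided by n - 1
manual_var <- sum((y - mean(y))^2) / (length(y) - 1)

manual_var  # 0.5714286
var(y)      # 0.5714286, the same value
```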

Finding the Standard Deviation

This function can be used to derive the sample standard deviation (the square root of the sample variance) from a vector:

sd()
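
As a short check on the relationship between the two measures, the sketch below applies sd() to 'y' and compares it with the square root of var(). The vector is redefined so that the snippet runs on its own.

```r
y <- c(1, 1, 1, 2, 2, 2, 3)

sd(y)         # 0.7559289
sqrt(var(y))  # 0.7559289, the same value
```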

Tukey's Five Number Summary

This function returns five values that summarize a sample (the minimum, the lower hinge, the median, the upper hinge, and the maximum), which can be useful in descriptive statistics:

fivenum()


For example, if we were to use this function on x:

fivenum(x)

The following information would be printed to the console window:

[1] 2.0 29.5 53.0 82.0 100.0

(2.0) The first value is the smallest observation.
(29.5) The second value is the lower hinge, which here equals the first quartile.
(53.0) The third value is the median.
(82.0) The fourth value is the upper hinge, which here equals the third quartile.
(100.0) And the final value is the largest observation.

Interquartile Range


The interquartile range, or IQR, is the difference between the third and first quartiles. For 'x', that is 82.0 - 29.5 = 52.5. This value can be derived with the following function:

IQR()
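
To show where the number comes from, here is a minimal sketch that computes the interquartile range of 'y' from its quartiles and compares it with IQR(). It relies on quantile() with its default settings, and the variable name q is only illustrative.

```r
y <- c(1, 1, 1, 2, 2, 2, 3)

# First and third quartiles of 'y'
q <- quantile(y, c(0.25, 0.75))

unname(q[2] - q[1])  # third quartile minus first quartile: 1
IQR(y)               # 1, the same value
```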

In the next article, we will begin graphing box plots and histograms.

* - https://www.youtube.com/watch?v=YvdYwC2YgeI
