Reflections of a Data Scientist: (R) Spearman’s Rank Correlation and Kendall Rank Correlation

In previous articles, the topics of ranking, correlation, and non-parametric data were discussed. Spearman’s Rank Correlation and Kendall Rank Correlation each combine all three of these ideas into a single method. In this entry, we will discuss how these correlations are calculated within the R platform, and the circumstances which warrant their use.

Spearman’s Rank Correlation Coefficient

Spearman’s Rank Correlation Coefficient, also referred to as Spearman’s rho, is a non-parametric alternative to the Pearson correlation. The Spearman alternative is utilized in circumstances when either the relationship between the data samples is non-linear*, or the data contained within those samples is ordinal**. The output value that this method produces is known as “rho”, hence the alternative name (“Spearman’s rho”).

As is the case with many non-parametric alternatives, this procedure operates on the ranks of the observations rather than on the observed values themselves.
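As a minimal sketch of what working with ranks means in practice, base R’s rank() function replaces each value with its position in the sorted data, and tied values share the average of the positions they occupy. The vector "scores" below is made up purely for illustration and is not part of the survey example.

```r
# Hypothetical survey scores, invented for illustration only
scores <- c(5, 1, 1, 1, 3)

# rank() assigns average ranks to ties by default (ties.method = "average")
score_ranks <- rank(scores)

# The three 1s occupy positions 1, 2 and 3, so each receives rank 2
print(score_ranks)
```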

Example:

We are presented with the following data vectors from two survey prompts:

# Create data vector (scale 1-5) #

x <- c(5, 1, 1, 1, 3, 2, 5, 3, 3, 2, 4, 4, 4, 2, 5, 4, 4, 4, 4, 2)

# Create data vector (scale 1-5) #

y <- c(4, 5, 4, 3, 1, 1, 5, 4, 5, 4, 3, 4, 3, 4, 5, 5, 3, 3, 5, 4)

# Create Spearman’s Rank Correlation #

cor.test(x, y, method=c("spearman"))

This produces the output:

Spearman's rank correlation rho

data: x and y
S = 1072.1, p-value = 0.4126
alternative hypothesis: true rho is not equal to 0
sample estimates:
rho
0.1939455

From this output, we can first determine that the result is not statistically significant, as the p-value = 0.4126, a value which is far above the common alpha level of .05. We therefore cannot reject the null hypothesis that the true rho is equal to 0. Next, we will assess the rho value output, which is 0.1939455. This value is measured on the same scale as Pearson’s correlation (-1 to 1). Since this value is relatively low, the sample suggests, at most, a weak positive correlation.
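To make the connection to Pearson’s correlation concrete, here is a small sketch showing that Spearman’s rho can be reproduced by ranking each vector and then computing an ordinary Pearson correlation on those ranks. The variable names rho_builtin and rho_manual are my own and are used only for illustration.

```r
# Same survey vectors as in the example above
x <- c(5, 1, 1, 1, 3, 2, 5, 3, 3, 2, 4, 4, 4, 2, 5, 4, 4, 4, 4, 2)
y <- c(4, 5, 4, 3, 1, 1, 5, 4, 5, 4, 3, 4, 3, 4, 5, 5, 3, 3, 5, 4)

# Spearman's rho as reported by R
rho_builtin <- cor(x, y, method = "spearman")

# Pearson correlation (the default method) applied to the average ranks
rho_manual <- cor(rank(x), rank(y))

# Both entries should show the rho value from the output above
print(c(builtin = rho_builtin, manual = rho_manual))
```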

*- For an example as to what non-linear data might resemble, please refer to the article “(R) Polynomial Regression”, published April 17, 2018.

**- For example, survey response data in which the respondent was asked to rate a particular item on a scale of 1-10.

Kendall Rank Correlation Coefficient

The Kendall Rank Correlation Coefficient, also referred to as Kendall’s tau, is another non-parametric alternative to the Pearson correlation. Like Spearman’s rho, Kendall’s tau is utilized in circumstances when either the relationship between the data samples is non-linear, or the data contained within those samples is ordinal. The output value that this method produces is known as “tau”. As with Spearman’s rho, this procedure operates on the ordering of the observations rather than on the observed values themselves.

Example:

We are presented with the following data vectors from two survey prompts:

# Create data vector (scale 1-5) #

x <- c(5, 1, 1, 1, 3, 2, 5, 3, 3, 2, 4, 4, 4, 2, 5, 4, 4, 4, 4, 2)

# Create data vector (scale 1-5) #

y <- c(4, 5, 4, 3, 1, 1, 5, 4, 5, 4, 3, 4, 3, 4, 5, 5, 3, 3, 5, 4)

# Create Kendall Rank Correlation #

cor.test(x, y, method=c("kendall"))

This produces the output:

Kendall's rank correlation tau

data: x and y
z = 0.84528, p-value = 0.398
alternative hypothesis: true tau is not equal to 0
sample estimates:
tau
0.1617271

From this output, we can first determine that the result is not statistically significant, as the p-value = 0.398, a value which is far above the common alpha level of .05. We therefore cannot reject the null hypothesis that the true tau is equal to 0. Next, we will assess the tau value output, which is 0.1617271. This value is measured on the same scale as Pearson’s correlation (-1 to 1). Since this value is relatively low, the sample suggests, at most, a weak positive correlation.
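For readers curious about where tau comes from, the following sketch recomputes it by hand. It compares every pair of observations, counts a pair as +1 when x and y move in the same direction (concordant), -1 when they move in opposite directions (discordant), and 0 when either value is tied, and then scales the total by the number of untied pairs in each vector. This is the tie-adjusted form usually called tau-b. The helper names (sign_x, sign_y, pairs, tau_manual) are my own, and the code is an illustration rather than a description of R’s internal implementation.

```r
# Same survey vectors as in the example above
x <- c(5, 1, 1, 1, 3, 2, 5, 3, 3, 2, 4, 4, 4, 2, 5, 4, 4, 4, 4, 2)
y <- c(4, 5, 4, 3, 1, 1, 5, 4, 5, 4, 3, 4, 3, 4, 5, 5, 3, 3, 5, 4)

# Sign of every pairwise difference: +1, -1, or 0 for a tie
sign_x <- sign(outer(x, x, "-"))
sign_y <- sign(outer(y, y, "-"))

# Keep each pair of observations once (upper triangle of the matrix)
pairs <- upper.tri(sign_x)

# Concordant pairs minus discordant pairs
numerator <- sum(sign_x[pairs] * sign_y[pairs])

# Tie adjustment: only pairs that are untied within x, and within y, count
denominator <- sqrt(sum(sign_x[pairs] != 0) * sum(sign_y[pairs] != 0))

tau_manual  <- numerator / denominator
tau_builtin <- cor(x, y, method = "kendall")

# Both entries should show the tau value from the output above
print(c(builtin = tau_builtin, manual = tau_manual))
```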

Conclusion:

While both methods provide similar functionality, Spearman’s Rank Correlation is utilized far more frequently than the Kendall Rank Correlation. I typically run both methods, compare the results of each, and then report my findings in a subsequent research composition.
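One way to carry out such a side-by-side comparison is sketched below. The object name "comparison" is my own, and the exact = FALSE argument is an addition that was not used in the examples above: it explicitly requests the approximate p-value, which should avoid the warning that cor.test() otherwise raises when the data contain ties.

```r
# Same survey vectors as in the examples above
x <- c(5, 1, 1, 1, 3, 2, 5, 3, 3, 2, 4, 4, 4, 2, 5, 4, 4, 4, 4, 2)
y <- c(4, 5, 4, 3, 1, 1, 5, 4, 5, 4, 3, 4, 3, 4, 5, 5, 3, 3, 5, 4)

# Run cor.test() once per method and keep the estimate and the p-value
comparison <- sapply(c("spearman", "kendall"), function(m) {
  result <- cor.test(x, y, method = m, exact = FALSE)
  c(estimate = unname(result$estimate), p.value = result$p.value)
})

# One column per method, one row per statistic
print(round(comparison, 4))
```

With these data, both coefficients point in the same direction and neither p-value falls below .05, so the two methods lead to the same conclusion.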

I hope that you have found this article to be informative and interesting. Until next time, stay inquisitive, Data Heads!

Monday, April 23, 2018
