Reflections of a Data Scientist: (R) Exotic Analysis

In prior articles, I explained the various tests of correlation which are available within the R programming language. One of the methods which was described, but which is rarely utilized outside of the textbook, is the Distance Correlation T-Test methodology.

In this entry, I will briefly explain when it is appropriate to utilize the distance correlation, and how to appropriately apply the methodology within the R framework.

Now I must begin by stating that what I am about to describe is uncommon, and should only be utilized in situations which absolutely warrant application.

The distance correlation as described within the context of this blog is:

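Distance Correlation – A method which tests variables for dependence by comparing the pairwise Euclidean distances between the observations of one variable with the pairwise Euclidean distances between the observations of the other.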
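To make the definition more concrete, here is a minimal sketch of how a sample distance correlation could be computed by hand in base R. The function name dcor_manual is my own and is not part of any package; the sketch assumes the standard formulation built on double-centered distance matrices, and it is meant for illustration rather than as a replacement for the functions in the “energy” package.

```r
# Illustrative sketch only: dcor_manual is a hypothetical helper name.
dcor_manual <- function(x, y) {
  stopifnot(NROW(x) == NROW(y))

  # Pairwise Euclidean distances between observations
  a <- as.matrix(dist(x))
  b <- as.matrix(dist(y))

  # Double-center: subtract row means and column means, add the grand mean
  center <- function(d) {
    sweep(sweep(d, 1, rowMeans(d)), 2, colMeans(d)) + mean(d)
  }
  A <- center(a)
  B <- center(b)

  # Squared sample distance covariance and distance variances
  dcov2  <- mean(A * B)
  dvar_x <- mean(A * A)
  dvar_y <- mean(B * B)

  denom <- sqrt(dvar_x * dvar_y)
  if (denom == 0) return(0)  # a constant variable carries no information
  sqrt(dcov2 / denom)
}

# A variable compared with a linear copy of itself
u <- -5:5
dcor_manual(u, 2 * u + 3)

# A purely quadratic relationship: Pearson's r is zero here,
# but the distance correlation is not
v <- u^2
cor(u, v)
dcor_manual(u, v)
```

The last two calls hint at why the distance-based approach exists at all: it can register a dependence which a linear measure misses entirely.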

So when would I apply the Distance Correlation T-Test? Only in situations in which other correlation methods are inapplicable. For the purposes of the demonstration below, an example of such a situation would be one in which one variable is continuous, and the other is categorical.

Example:

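(This example requires that the R package “energy” be installed and loaded. In more recent releases of the package, this test is provided under the name dcorT.test(), so dcor.ttest() may produce a deprecation warning or be unavailable.)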

# Load the Package #

library(energy)

# Data Vectors #

x <- c(8, 1, 4, 10, 8, 10, 3, 1, 1, 2)
y <- c(97, 56, 97, 68, 94, 66, 81, 76, 86, 69)

dcor.ttest(x, y)

mean(x)

sd(x)

mean(y)

sd(y)

This produces the output:

dcor t-test of independence

data: x and y
T = -0.1138, df = 34, p-value = 0.545
sample estimates:
Bias corrected dcor
-0.01951283

> mean(x)
[1] 4.8
> sd(x)
[1] 3.794733
> mean(y)
[1] 79
> sd(y)
[1] 14.3527
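It may help to see how the three numbers in the test output relate to one another. The sketch below recomputes them from the reported bias corrected dcor alone, using base R. The formulas are my reading of how the test is constructed (v = n(n - 3)/2, degrees of freedom v - 1, and an upper-tail p-value); they are an assumption on my part and are not taken from the output itself.

```r
# Values copied from the dcor.ttest output above
n <- 10                  # number of observations in x and y
bc_dcor <- -0.01951283   # "Bias corrected dcor"

# Assumed relationships (not stated in the output):
v  <- n * (n - 3) / 2    # 35 when n = 10
df <- v - 1              # degrees of freedom

# t statistic built from the bias corrected dcor
t_stat <- sqrt(df) * bc_dcor / sqrt(1 - bc_dcor^2)

# Upper-tail p-value: only large positive values count against independence
p_value <- pt(t_stat, df = df, lower.tail = FALSE)

round(c(T = t_stat, df = df, p = p_value), 4)
```

The results should land close to the reported T = -0.1138, df = 34, and p-value = 0.545. This also explains why the degrees of freedom are 34 even though there are only 10 observations.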

Conclusion:

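The Distance Correlation T-Test did not indicate a significant dependence between X (M = 4.80, SD = 3.79) and Y (M = 79.00, SD = 14.35), t(34) = -0.11, p = .55. Note that the test assesses whether the two variables are independent, not whether their means differ.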

However, you may be wondering, what is the difference between the Distance Correlation T-Test, the Distance Correlation Method, and the Pearson Test of Correlation?

Distance Correlation T-Test – Utilized to test for significance in situations in which one variable is continuous, and the other is categorical. This method can also be utilized in other situations. However, if both variables are continuous, then the Pearson Test of Correlation is most appropriate.

Distance Correlation Method – Utilized to measure the correlation between two variables when assessed through the application of the Euclidean Distance Formula. The output value is similar to the coefficient of determination, in that it can range from 0 (no correlation) to 1 (perfect correlation). The bias corrected estimate which is reported by the T-Test is the exception, as it can fall slightly below 0, as it does in the output above.

The Pearson Test of Correlation – Utilized to determine if values are linearly correlated. This method should typically be utilized in preference to all other tests of correlation. However, it is only appropriate to utilize this method when both variables are continuous.
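For comparison, the Pearson Test of Correlation can be run on the same two vectors with the cor.test() function, which ships with base R. This is only a sketch for side-by-side reference, and it treats both x and y as continuous, which is an assumption made here for illustration.

```r
# Data Vectors (same values as in the example above) #

x <- c(8, 1, 4, 10, 8, 10, 3, 1, 1, 2)
y <- c(97, 56, 97, 68, 94, 66, 81, 76, 86, 69)

# Pearson Test of Correlation #

pearson <- cor.test(x, y, method = "pearson")

pearson$estimate   # correlation coefficient (r)
pearson$parameter  # degrees of freedom (n - 2)
pearson$p.value    # two-sided p-value
```

Unlike the distance-based estimate, the Pearson coefficient carries a sign, and its test is two-sided by default.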


Tuesday, August 25, 2020

(R) Exotic Analysis – Distance Correlation T-Test
