Monday, November 9, 2020

(R) Cohen’s d

In today’s entry, we are going to discuss Cohen’s d: what it is, when to utilize it, and how to appropriately apply the methodology needed to derive its value within the R software environment.


(SPSS does not contain the innate functionality necessary to perform this calculation)

Cohen’s d - (What it is):

Cohen’s d is utilized to assess the magnitude of an effect as it relates to two sample groups which were subject to differing conditions. For example, if a two sample t-test were implemented to compare a group which received a drug against a group which did not, the p-value of that test would determine whether or not the findings were significant.

Cohen’s d, by contrast, would measure the magnitude of the potential impact.

Cohen’s d - (When to use it):

In your statistics class.

You could also utilize this measure to perform post-hoc analysis as it relates to the ANOVA model and the Student’s T-Test. However, I have never witnessed its utilization outside of an academic setting.

Cohen’s d – (How to interpret it):

General Interpretation Guidelines:

Greater than or equal to 0.2 = small
Greater than or equal to 0.5 = medium
Greater than or equal to 0.8 = large

Cohen’s d – (How to state your findings):

The effect size for this analysis (d = x.xx) was found to exceed Cohen’s convention for a [small, medium, large] effect (d = .xx).

Cohen’s d – (How to derive it):

# Within the R-Programming Code Space #

##################################

# length of sample 1 (x) #
lenx <-
# length of sample 2 (y) #
leny <-
# mean of sample 1 (x) #
meanx <-
# mean of sample 2 (y)#
meany <-
# SD of sample 1 (x) #
sdx <-
# SD of sample 2 (y) #
sdy <-

varx <- sdx^2
vary <- sdy^2
lx <- lenx - 1
ly <- leny - 1
md <- abs(meanx - meany) ## mean difference (numerator)
csd <- lx * varx + ly * vary
csd <- csd/(lx + ly)
csd <- sqrt(csd) ## common sd computation
cd <- md/csd ## cohen's d

cd

##################################


# The above code is a modified version of the code found at: #

# https://stackoverflow.com/questions/15436702/estimate-cohens-d-for-effect-size #
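If you would prefer not to transcribe each figure by hand, the same arithmetic can be wrapped into a small helper function. This is a sketch of my own; the name cohens_d is arbitrary, and it assumes that both samples are supplied as numeric vectors.

# A helper function which computes Cohen's d directly from two sample vectors #

cohens_d <- function(x, y) {
  # pooled (common) standard deviation, weighted by n - 1 #
  csd <- sqrt(((length(x) - 1) * var(x) + (length(y) - 1) * var(y)) / (length(x) + length(y) - 2))
  # absolute mean difference divided by the pooled standard deviation #
  abs(mean(x) - mean(y)) / csd
}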


Cohen’s d – (Example):

First, we must run a test to which Cohen’s d can be applied as an appropriate post-hoc methodology.
 
Two Sample T-Test


This test is utilized when you randomly sample different sets of items from two separate groups.

Example:

A scientist creates a chemical which he believes changes the temperature of water. He applies this chemical to water samples and takes the following measurements:

70, 74, 76, 72, 75, 74, 71, 71

He then measures the temperature of samples to which the chemical was not applied.

74, 75, 73, 76, 74, 77, 78, 75

Can the scientist conclude, at a 95% confidence level, that his chemical is in some way altering the temperature of the water?

For this, we will use the code:

N1 <- c(70, 74, 76, 72, 75, 74, 71, 71)

N2 <- c(74, 75, 73, 76, 74, 77, 78, 75)

t.test(N2, N1, alternative = "two.sided", var.equal = TRUE, conf.level = 0.95)


Which produces the output:

Two Sample t-test

data: N2 and N1
t = 2.4558, df = 14, p-value = 0.02773
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
0.3007929 4.4492071
sample estimates:
mean of x mean of y
75.250 72.875


# Note: In this case, the 95 percent confidence interval is measuring the difference of the mean values of the samples. #

# An additional option is available when running a two sample t-test: The Welch Two Sample T-Test. To utilize this option, "var.equal = TRUE" must be changed to "var.equal = FALSE". The Welch test does not assume equal variances between the two groups, which makes it the more robust choice when the samples have differing variances or differing sizes. #
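# For reference, the Welch variant of the call utilized above would read as follows. This is a sketch only; its output is omitted, as it was not generated within the original article. #

t.test(N2, N1, alternative = "two.sided", var.equal = FALSE, conf.level = 0.95)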

From this output we can conclude:

With a p-value of 0.02773 (0.02773 < .05) and a corresponding t-value of 2.4558, we can state, at a 95% confidence level, that the scientist's chemical is altering the temperature of the water.

Application of Cohen’s d 

length(N1) # 8 #
length(N2) # 8 #

mean(N1) # 72.875 #
mean(N2) # 75.25 #

sd(N1) # 2.167124 #
sd(N2) # 1.669046 #

# length of sample 1 (x) #
lenx <- 8
# length of sample 2 (y) #
leny <- 8
# mean of sample 1 (x) #
meanx <- 72.875
# mean of sample 2 (y)#
meany <- 75.25
# SD of sample 1 (x) #
sdx <- 2.167124
# SD of sample 2 (y) #
sdy <- 1.669046

varx <- sdx^2
vary <- sdy^2
lx <- lenx - 1
ly <- leny - 1
md <- abs(meanx - meany) ## mean difference (numerator)
csd <- lx * varx + ly * vary
csd <- csd/(lx + ly)
csd <- sqrt(csd) ## common sd computation
cd <- md/csd ## cohen's d

cd


Which produces the output:

[1] 1.227908

################################## 
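As an aside, the helper function sketched earlier in this entry arrives at the same figure directly:

cohens_d(N1, N2)

Note that the final decimal places may differ very slightly from the output above, as the manual computation utilized rounded means and standard deviations.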

From this output we can conclude:

The effect size for this analysis (d = 1.23) was found to exceed Cohen’s convention for a large effect (d = .80).

Combining both conclusions, our final written product would resemble:

With a p-value of 0.02773 (0.02773 < .05) and a corresponding t-value of 2.4558, we can state, at a 95% confidence level, that the scientist's chemical is altering the temperature of the water.

The effect size for this analysis (d = 1.23) was found to exceed Cohen’s convention for a large effect (d = .80).

And that is it for this article.

Until next time,

-RD

Friday, October 16, 2020

(R) Fisher’s Exact Test

In today’s entry, we are going to briefly review Fisher’s Exact Test, and its appropriate application within the R programming language.

Fisher’s Exact Test was devised by Sir Ronald Fisher, the same statistician after whom the F-Distribution is named. Unlike the F-Test, however, Fisher’s Exact Test does not rely upon the F-Distribution; it instead computes its p-value exactly, from the hypergeometric distribution.


Fisher’s Exact Test is very similar to the Chi-Squared Test, in that both tests are utilized to assess categorical data classifications. Fisher’s Exact Test was designed specifically for 2x2 contingency tables, though more rows could theoretically be added if necessary. A general rule for selecting the appropriate test for the given circumstances (Fisher’s Exact vs. Chi-Squared) pertains directly to the sample size: if any cell within the contingency table would contain an expected count of fewer than 5 observations, Fisher’s Exact Test is the more appropriate choice.

The test itself was created for the purpose of studying small observational samples. For this reason, the test is considered to be “conservative” as compared to the Chi-Squared Test. Or, in layman’s terms, you are less likely to reject the null hypothesis when utilizing Fisher’s Exact Test, as the test errs on the side of caution. As previously mentioned, the test was designed for smaller observational series; therefore, its conservative nature is a feature, not an error.

Let’s give it a try in today’s…

Example:

A professor instructs two classes on the subject of Remedial Calculus. He believes, based on a book that he recently completed, that students who consume avocados prior to taking an exam will generally perform better than students who do not. To test this hypothesis, the professor has one of the classes consume avocados prior to a very difficult pass/fail examination. The other class does not consume avocados, and completes the same examination. He collects the results of his experiment, which are as follows:

Class 1 (Avocado Consumers)

Pass: 15

Fail: 5

Class 2 (Avocado Abstainers)

Pass: 10

Fail: 15

It is also worth mentioning that the professor will be assuming an alpha value of .05.

# The data must first be entered into a matrix #

Model <- matrix(c(15, 10, 5, 15), nrow = 2, ncol=2)

# Let’s examine the matrix to make sure everything was entered correctly #

Model


Console Output:

[,1] [,2]
[1,] 15 5
[2,] 10 15
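Note that matrix() fills its entries column-wise by default, which is why the input vector is ordered c(15, 10, 5, 15): the first column (Pass) is populated first, followed by the second column (Fail). Optionally, row and column names can be attached to make the matrix easier to read. This is a cosmetic sketch of my own; the label text is hypothetical.

dimnames(Model) <- list(c("Avocado", "No Avocado"), c("Pass", "Fail"))

Model

Console Output:

           Pass Fail
Avocado      15    5
No Avocado   10   15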


# Now to apply Fisher’s Exact Test #

fisher.test(Model)

Console Output:

        Fisher's Exact Test for Count Data

data: Model
p-value = 0.03373
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
1.063497 20.550173
sample estimates:
odds ratio
4.341278


Findings:

Fisher’s Exact Test was applied to our experimental findings for analysis. The results indicated a significant relationship between avocado consumption and examination success: 75% (15/20) of consumers passed, as compared to 40% (10/25) of abstainers (p = .03). As a side note, the sample odds ratio is (15 × 15) / (5 × 10) = 4.5; fisher.test() instead reports the conditional maximum likelihood estimate (4.34), which is why the two figures differ slightly.

If we were to apply the Chi-Squared Test to the same data matrix, we would receive the following output:

# Application of Chi-Squared Test to prior experimental observations #

chisq.test(Model, correct = FALSE)

Console Output:

Pearson's Chi-squared test

data: Model
X-squared = 5.5125, df = 1, p-value = 0.01888


Findings: 

As you might have expected, the application of the Chi-Squared Test yielded an even smaller p-value! (Note that "correct = FALSE" disables the Yates continuity correction; with the correction enabled, the p-value would be larger, and closer to that of the exact test.) If we were to utilize this test in lieu of Fisher’s Exact Test, our results would also demonstrate significance.

That is all for this entry.

Thank you for your patronage.

I hope to see you again soon.

-RD

Wednesday, October 14, 2020

Why Isn’t My Excel Function Working?! (MS-Excel)

Even an old data scientist can learn a new trick every once in a while.

Today was such a day.

Imagine my shock, as I spent about two and a half hours trying to get the most basic MS-Excel Functions to correctly execute.

This brings us to today’s example.

I’m not sure if this is now a default option within the latest version of Excel, or why this option would even exist, however, I feel that it is my duty to warn you of its existence.


For the sake of this demonstration, we’ll assume that you are attempting to write a =COUNTIF function within cell C2 in order to assess the value contained within cell A2. If we were to drag this formula to the cells beneath C2, in order to apply the function to cells C3 and C4, a misapplication occurs: the value “Car” is not contained within A3 or A4, and yet the value 1 is returned.
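The formula in question might resemble the following. This is a hypothetical reconstruction, as the original screenshots are unavailable; the cell references and the criterion "Car" are assumptions.

=COUNTIF(A2, "Car")

When dragged downward, this formula becomes =COUNTIF(A3, "Car") within C3, and =COUNTIF(A4, "Car") within C4, each of which should return 0 if the referenced cell does not contain "Car".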

If this “error” arises, it is likely due to the option “Manual” being pre-selected within the “Calculation Options” drop-down menu, which is itself contained within the “Formulas” ribbon. To remedy the situation, change the selection to “Automatic” within the “Calculation Options” drop-down. (In manual mode, Excel copies the displayed value of a dragged formula and does not recalculate it until a recalculation is triggered, which is why the stale value 1 appears.)


The result should be the previously expected outcome:


Rather than accidentally and unknowingly encountering this error/feature in a way which is detrimental to your research, I would recommend always verifying that “Calculation Options” is set to “Automatic” prior to beginning your work within the MS-Excel platform.


I hope that you found this article useful.

I’ll see you in the next entry.

-RD

Tuesday, October 6, 2020

Averaging Across Variable Columns (SPSS)

There may be a more efficient way to perform this function, as simpler functionality exists within other programming languages. However, I have not been able to discover a non “ad-hoc” method for performing this task within SPSS.

We will assume that we are operating within the following data set:


Which possesses the following data labels:


Assuming that all variables are on a similar scale, we could create a new variable by utilizing the code below:

COMPUTE CatSum=MEAN(VarA,
VarB,
VarC).
EXECUTE.

This new variable will be named “CatSum”. For each observational row, it will contain the mean of that row’s values for “VarA”, “VarB”, and “VarC”.
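As a hedged aside: SPSS’s MEAN function will return a value whenever at least one of the listed variables is non-missing for a given row. If you instead want to require that all three variables be present before a mean is computed, the MEAN.n form of the function can be utilized:

COMPUTE CatSum=MEAN.3(VarA,
VarB,
VarC).
EXECUTE.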


To generate the mean value of our newly created “CatSum” variable, we would execute the following code:

DESCRIPTIVES VARIABLES=CatSum
/STATISTICS=MEAN STDDEV. 

This produces the output:



To reiterate what we are accomplishing by performing this task: we are generating the overall mean of the values of variables “VarA”, “VarB”, and “VarC”, considered together.

Another way to conceptually envision this process, is to imagine that we are placing all of the variables together into a single column:


After which, we are generating the mean value of the column which contains all of the combined variable observational values.

And that, is that!

At least, for this article. 

Stay studious in the interim, Data Heads!

- RD

Tuesday, September 8, 2020

How to Beautify your (SPSS) Outputs with MODIFY


First, we will address the steps necessary to suppress unnecessary and unwanted columns within the SPSS Frequency tables.

The process to enable the MODIFY functionality is rather complicated. However, if you follow the steps below, you too will be able to have beautiful outputs without having to endeavor upon a lengthy manual cleanup process.

Steps Necessary to Enable the MODIFY Command

1. Un-install SPSS.

2. Install the latest version of Python Programming Language (3.x). The executable installer can be found here: www.python.org.

(NOTE: THIS STEP MUST STILL BE ADHERED TO, EVEN IF ANACONDA PYTHON HAS ALREADY BEEN PREVIOUSLY INSTALLED.)

3. Re-install SPSS. During the installation process, be sure to make all of the appropriate selections necessary to install the SPSS Python Libraries.

4. From the top menu within SPSS’s data view, select the menu title “Extensions”, then select the option “Extension Hub”.



5. Within the “Explore” tab of the “Extension Hub” menu, search for “SPSSINC MODIFY TABLES” within the left search bar.


6. Check the box “Get extension” to the right of “SPSSINC_MODIFY_TABLES”, then click “OK”.

7. The next screen should confirm that the installation of the extension has occurred.

Steps Necessary to Utilize the MODIFY Command

We are now prepared to obliterate all of those pesky ‘Percent’ and ‘Cumulative Percent’ columns from existence! In order to achieve this as it applies to all frequency tables within the output section, create and run the following lines of syntax subsequent to frequency table creation.

SPSSINC MODIFY TABLES subtype="Frequencies"

SELECT='Cumulative Percent' 'Percent'

DIMENSION= COLUMNS

PROCESS = ALL HIDE=TRUE

/STYLES APPLYTO=DATACELLS.


Steps Necessary to Remove the Summary Table Which Accompanies Frequency Table Output


In order to suppress the creation of the summary (“Statistics”) table which SPSS prints above frequency output, you must modify your initial frequency syntax.

Instead of utilizing syntax such as:

FREQUENCIES VARIABLES=Q1 Q2 Q3

/ORDER=ANALYSIS.

You are instead forced to utilize a more verbose syntax:

OMS SELECT ALL /EXCEPTIF SUBTYPES='Frequencies'

/DESTINATION VIEWER=NO.

FREQUENCIES VARIABLES= Q1 Q2 Q3

/ORDER=ANALYSIS.

OMSEND.


Doing so adds lines of code. However, it is worth the effort, at least in my opinion, as the offset to the trade is peace of mind.

How to Suppress Syntax from Printing within the SPSS Output

In order to suppress syntax from printing within the SPSS Output window, follow the steps below prior to creating output.

1. From the top menu within SPSS’s data view, select the menu title “Edit”, then select the option “Options”.

2. Within the subsequent menu, select the tab “Viewer”. Then, remove the check mark located to the left of “Display commands in the log”. Next, click “Apply”.

You are now prepared to create SPSS session output devoid of syntax. 
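To my knowledge, the same suppression can also be toggled through syntax, without visiting the Options menu, via the command below. Treat this as a hedged alternative, and re-enable with SET PRINTBACK=ON.

SET PRINTBACK=OFF.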

How to Modify the Visual Style of SPSS Table Output

If you’d prefer a different, perhaps more readable SPSS table output, the following steps allow for the modification of such.

1. Create a table within SPSS which complies with the system default output style.

2. Right click on the table within the output, and select the options “Edit Content”, “In Separate Window” within the drop down menu.


3. Selecting “Format”, followed by “Table Looks” from the top menu, presents a new pop-up menu which allows for general table alterations.

As an example, select “ClassicLook” from the “TableLook Files:” menu.

Next, click the right “Edit Look” button, then click the tab “Cell Formats”. Within this submenu, the general background of table cells can be modified. Be sure to click “Apply” before clicking “OK”.


4. To save a custom “Look”, again select “TableLooks” from the “Format” menu. Select “Save Look”, with “<As Displayed>” selected within the right “TableLook Files” menu.


5. To load this look so that it is applied to all future outputs, select “Edit” from the top main SPSS Data View menu. Then select “Options” from the drop down menu followed by the tab “Pivot Tables”. Select the “Browse” button from beneath the “Table View” menu, then select the new look which you created.

6. Clicking “Apply”, followed by “OK”, will apply this look to all future tables created during the duration of the SPSS session.

If you ever want to revert back to the default look, follow the previous steps, and select “<System Default>” from the leftmost “TableLook” menu.

Monday, August 31, 2020

(R) Markov Chains

Per Wikipedia, “A Markov chain is a stochastic model describing a sequence of possible events in which the probability of each event depends only on the state attained in the previous event”.

Explained in a less broad manner, a Markov chain describes a system which moves among a set of discrete states, in which the probability of transitioning to any subsequent state depends only upon the state which the system currently occupies.

For example, in the case of weather systems, a day which is cloudy may subsequently be followed by a day which is also cloudy, a day without clouds, or a rainy day. However, the probability of each subsequent event will undoubtedly be impacted by the composition of the current state.

Another example of the applied methodology is the assessment of market share. If company A offers a product which retains 60% of its current consumers annually, while losing the remaining 40% of that consumer base to company B, and company B retains only 20% of its current consumers annually, while losing the remaining 80% of that consumer base to company A, what is the impact of the phenomenon described on an annual basis?

Let’s explore both examples:

First, we’ll create a model which can predict weather.

We’ll assume that the following probabilities appropriately describe the autumn forecasts for weather in Winnipeg.

         Cloudy   Clear   Snowy   Rainy

Cloudy    33%      17%     25%     25%

Clear     25%      50%     12%     13%

Snowy     19%      15%     33%     33%

Rainy     20%      20%     10%     50%

To further understand this probability matrix, assume that the current day’s forecast in Winnipeg is “Cloudy”. This would indicate that the following day’s weather will be either “Cloudy” (33%), “Clear” (17%), “Snowy” (25%), or “Rainy” (25%).

Now, we’ll run the information through the R-Studio platform:

EXAMPLE A – Weather Model

# With the libraries ‘markovchain’ and ‘diagram’ downloaded and enabled #

# Create a Transition Matrix #

trans_mat <- matrix(c(.33, .17, .25, .25, .25, .50, .12, .13, .19, .15, .33, .33, .20, .20, .10, .50),nrow = 4, byrow = TRUE)

stateNames <- c("Cloudy","Clear", "Snowy", "Rainy")

row.names(trans_mat) <- stateNames

colnames(trans_mat) <- stateNames

# Check input #

trans_mat

# Console Output #


Cloudy Clear Snowy Rainy
Cloudy 0.33 0.17 0.25 0.25
Clear 0.25 0.50 0.12 0.13
Snowy 0.19 0.15 0.33 0.33
Rainy 0.20 0.20 0.10 0.50
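Since each row of a transition matrix represents a complete set of outgoing probabilities, a quick sanity check (my own addition) is to confirm that every row sums to 1:

rowSums(trans_mat)

# Console Output #

Cloudy  Clear  Snowy  Rainy
     1      1      1      1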


# Create a Discrete Time Markov Chain #

disc_trans <- new("markovchain",transitionMatrix=trans_mat, states=c("Cloudy","Clear", "Snowy", "Rainy"), name="Weather")

# Check input #

disc_trans

# Console Output #


Weather
A 4 - dimensional discrete Markov Chain defined by the following states:
Cloudy, Clear, Snowy, Rainy
The transition matrix (by rows) is defined as follows:
Cloudy Clear Snowy Rainy
Cloudy 0.33 0.17 0.25 0.25
Clear 0.25 0.50 0.12 0.13
Snowy 0.19 0.15 0.33 0.33
Rainy 0.20 0.20 0.10 0.50


# Illustrate the Matrix Transitions #

plotmat(trans_mat,pos = NULL,

lwd = 1, box.lwd = 2,

cex.txt = 0.8,

box.size = 0.1,

box.type = "circle",

box.prop = 0.5,

box.col = "light yellow",

arr.length=.1,

arr.width=.1,

self.cex = .4,

self.shifty = -.01,

self.shiftx = .13,

main = "")


This produces the output graphic:



(As it pertains to the graphic- something important to note is the direction of the arrows. The arrow direction in the graphic is inverted. Therefore, I would only use the graphic as an auxiliary for personal reference.)
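(To my understanding, plotmat() interprets element [i, j] of its input as a flow from state j to state i, so passing t(trans_mat) as the first argument, with all remaining arguments unchanged, should produce correctly oriented arrows. I have not re-rendered the graphic, so treat this as a hedged suggestion.)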

# We will assume that the current forecast is cloudy by creating the vector below #

Current_state<-c(1, 0, 0, 0)

# Now we will utilize the following code to predict the weather for tomorrow #

steps<-1

finalState<-Current_state*disc_trans^steps

finalState

# Console Output #

Cloudy Clear Snowy Rainy
[1,] 0.33 0.17 0.25 0.25

This output indicates that tomorrow will have a 33% chance of being cloudy, a 17% chance of being clear, a 25% chance of being snowy, and a 25% chance of being rainy.

# Let’s predict the weather for the following day #

steps<-2

finalState<-Current_state*disc_trans^steps

finalState

# Console Output #


Cloudy Clear Snowy Rainy
[1,] 0.2428372 0.2621651 0.1839856 0.311012

With this information, we can state that, two days out, there is a 24% chance of the day being cloudy, a 26% chance of the day being clear, an 18% chance of the day being snowy, and a 31% chance of the day being rainy.

It would be tidier if the rounded figures summed to exactly 100%, but I think that you probably understand the example regardless.

EXAMPLE B – Market Share

Let’s re-visit our market share example:

Company A offers a product which retains 60% of its current consumers annually, but loses the remaining 40% of that consumer base to company B. Company B, in turn, retains only 20% of its current consumers annually, losing the remaining 80% of that consumer base to company A. What is the impact of the phenomenon described on an annual basis?

Let’s make a few assumptions.

First, we will assume that the projection given above is accurate.

Next, we’ll assume that the total customer base as it pertains to the product is 60,000,000.

Finally, we’ll assume that Company A possesses 20% of this market, and Company B possesses 80% of this market: 12,000,000 and 48,000,000 individuals, respectively.

# With the libraries ‘markovchain’ and ‘diagram’ downloaded and enabled #

# Create a Transition Matrix #

trans_mat <- matrix(c(0.6,0.4,0.8,0.2),nrow = 2, byrow = TRUE)

stateNames <- c("Company A","Company B")

row.names(trans_mat) <- stateNames

colnames(trans_mat) <- stateNames

# Check input #

trans_mat

# Console Output #


Company A Company B
Company A 0.6 0.4
Company B 0.8 0.2


# Create a Discrete Time Markov Chain # 

disc_trans <- new("markovchain",transitionMatrix=trans_mat, states=c("Company A","Company B"), name="Market Share")


# Check input #

disc_trans

# Console Output #


Market Share
A 2 - dimensional discrete Markov Chain defined by the following states:
Company A, Company B
The transition matrix (by rows) is defined as follows:
Company A Company B
Company A 0.6 0.4
Company B 0.8 0.2


# Illustrate the Matrix Transitions #

plotmat(trans_mat,pos = NULL,

lwd = 1, box.lwd = 2,

cex.txt = 0.8,

box.size = 0.1,

box.type = "circle",

box.prop = 0.5,

box.col = "light yellow",

arr.length=.1,

arr.width=.1,

self.cex = .4,

self.shifty = -.01,

self.shiftx = .13,

main = "")


This produces the output graphic:


(Again, as it pertains to the graphic- something important to note is the direction of the arrows. The arrow direction in the graphic is inverted. Therefore, I would only use the graphic as an auxiliary for personal reference.)

# We will assume that the market share is as follows #

# This reflects the information provided in the example description above #

Current_state<- c(0.20,0.80)

# Now we will utilize the following code to predict the market share for the next year #

steps<-1

finalState<-Current_state*disc_trans^steps

finalState

# Console Output #


Company A Company B
[1,] 0.76 0.24


As illustrated, one year out, Company A now controls 76% of the market share (45,600,000)*, and Company B controls 24% of the market share (14,400,000).

* Assuming that the overall market neither grows nor shrinks in total individuals. The calculation for the figures is: 60,000,000 * .76 and 60,000,000 * .24.

Similar to our previous example, we can also project the current trend for multiple consecutive time periods.

# The following code predicts the market share two years out #

steps<-2

finalState<-Current_state*disc_trans^steps

finalState

# Console Output #


Company A Company B
[1,] 0.648 0.352
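As a cross-check which does not require the ‘markovchain’ package, the same two-step projection can be reproduced with base R’s matrix product (a minimal sketch of my own):

c(0.20, 0.80) %*% trans_mat %*% trans_mat

# Console Output #

     Company A Company B
[1,]     0.648     0.352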


In the case of this example, the steady state predicts the equilibrium which will eventually be reached if these trends continue ad infinitum.

# Steady state Matrix # 

steadyStates(disc_trans)

# Console Output #


Company A Company B
[1,] 0.6666667 0.3333333


Company A in this scenario now controls approximately 66.66% of the market share, and Company B controls 33.33% of the market share.
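As a final sanity check on steadyStates() (again, an addition of my own): the steady state is the left eigenvector of the transition matrix associated with the eigenvalue 1, re-scaled so that its entries sum to 1. It can be recovered with base R as follows.

# left eigenvector of trans_mat is a right eigenvector of its transpose #
ev <- eigen(t(trans_mat))

# the leading eigenvector (eigenvalue 1), normalized to sum to 1 #
steady <- Re(ev$vectors[, 1])

steady / sum(steady)

# Console Output #

[1] 0.6666667 0.3333333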

Tuesday, August 25, 2020

(R) Exotic Analysis – Distance Correlation T-Test

In prior articles, I explained the various tests of correlation which are available within the R programming language. One of the methods described, though rarely utilized outside of the textbook, is the Distance Correlation T-Test methodology.

In this entry, I will briefly explain when it is appropriate to utilize the distance correlation, and how to appropriately apply the methodology within the R framework.

Now I must begin by stating that what I am about to describe is uncommon, and should only be utilized in situations which absolutely warrant application.

The distance correlation as described within the context of this blog is:

Distance Correlation – A method which tests model variables for correlation through the utilization of a Euclidean distance formula.


So when would I apply the Distance Correlation T-Test? Only in situations in which other correlation methods are inapplicable. In the case which I am about to demonstrate, an example of such inapplicability is a situation in which one variable is continuous, and the other is categorical.

Example:

(This example requires that the R package: “energy”, be downloaded and enabled.)


# Data Vectors #

x <- c(8, 1, 4, 10, 8, 10, 3, 1, 1, 2)
y <- c(97, 56, 97, 68, 94, 66, 81, 76, 86, 69)

dcor.ttest(x, y)

mean(x)

sd(x)

mean(y)

sd(y)


This produces the output:

dcor t-test of independence

data: x and y
T = -0.1138, df = 34, p-value = 0.545
sample estimates:
Bias corrected dcor
-0.01951283

> mean(x)
[1] 4.8
> sd(x)
[1] 3.794733
> mean(y)
[1] 79
> sd(y)
[1] 14.3527
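A hedged side note: in more recent releases of the ‘energy’ package, dcor.ttest has, to my knowledge, been deprecated in favor of dcorT.test, which accepts the same arguments. If the call above produces a deprecation warning or an error, the following should behave identically:

dcorT.test(x, y)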


Conclusion:

There was not a significant relationship between GROUP X (M = 4.80, SD = 3.79) and GROUP Y (M = 79, SD = 14.35), t(34) = -0.11, p = .55.

However, you may be wondering, what is the difference between the Distance Correlation T-Test, the Distance Correlation Method, and the Pearson Test of Correlation?

Distance Correlation T-Test – Utilized to test for significance in situations in which one variable is continuous, and the other is categorical. This method can also be utilized in other situations, however, if both variables are continuous, then the Pearson Test of Correlation is most appropriate. 

Distance Correlation Method – Utilized to test for correlation between two variables when assessed through the application of the Euclidean Distance Formula. This model’s output value is similar to the coefficient of determination, in that it can range from 0 (no correlation) to 1 (perfect correlation). (A short demonstration follows this list.)

The Pearson Test of Correlation – Utilized to determine if values are correlated. This method should typically be utilized above all other tests of correlation. However, it is only appropriate to utilize this method when both variables are continuous.
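To briefly demonstrate the second of these methods, the distance correlation itself can be computed for the same data vectors utilized earlier (assuming that the ‘energy’ package is still enabled; I have not reproduced the numeric output here):

dcor(x, y)

The returned value will fall between 0 (no correlation) and 1 (perfect correlation).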