Reflections of a Data Scientist: (R) Fisher’s Exact Test

In today’s entry, we are going to briefly review Fisher’s Exact Test, and its appropriate application within the R programming language.

Like the F-Test, Fisher’s Exact Test utilizes the F-Distribution as its primary mechanism of functionality. The F-Distribution being initially derived by Sir. Ronald Fisher.

(The Man)

(The Distribution)

The Fisher’s Exact Test is very similar to The Chi-Squared Test. Both tests are utilized to assess categorical data classifications. The Fisher’s Exact Test was designed specifically for 2x2 contingency sorted data, though, more rows could theoretically be added if necessary. A general rule for application as it relates to selecting the appropriate test for the given circumstances (Fisher’s Exact vs. Chi-Squared), pertains directly to the sample size. If a cell within the contingency table would contain less than 5 observations, a Fisher’s Exact Test would be more appropriate.

The test itself was created for the purpose of studying small observational samples. For this reason, the test is considered to be “conservative”, as compared to The Chi-Squared Test. Or, in layman terms, you are less likely to reject the null hypothesis when utilizing a Fisher’s Exact Test, as the test errs on the side of caution. As previously mentioned, the test was designed for smaller observational series, therefore, its conservative nature is a feature, not an error.

Let’s give it a try in today’s…

Example:

A professor instructs two classes on the subject of Remedial Calculus. He believes, based on a book that he recently completed, that students who consume avocados prior to taking an exam, will generally perform better than students who did not consume avocados prior to taking an exam. To test this hypothesis, the professor has one of classes consume avocados prior to a very difficult pass/fail examination. The other class does not consume avocados, and also completes the same examination. He collects the results of his experiment, which are as follows:

Class 1 (Avocado Consumers)

Pass: 15

Fail: 5

Class 2 (Avocado Abstainers)

Pass: 10

Fail: 15

It is also worth mentioning that professor will be assuming an alpha value of .05.

# The data must first be entered into a matrix #

Model <- matrix(c(15, 10, 5, 15), nrow = 2, ncol=2)

# Let’s examine the matrix to make sure everything was entered correctly #

Model

Console Output:

[,1] [,2]
[1,] 15 5
[2,] 10 15

# Now to apply Fisher’s Exact Test #

fisher.test(Model)

Console Output:

Fisher's Exact Test for Count Data

data: Model
p-value = 0.03373
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
1.063497 20.550173
sample estimates:
odds ratio
4.341278

Findings:

Fisher’s Exact Test was applied to our experimental findings for analysis. The results of such indicated a significant relationship as it pertains to avocado consumption and examination success: 75% (15/20), as compared to non-consumption and examination success: 40% (10/25); (p = .03).

If we were to apply the Chi-Squared Test to the same data matrix, we would receive the following output:

# Application of Chi-Squared Test to prior experimental observations #

chisq.test(Model, correct = FALSE)

Console Output:

Pearson's Chi-squared test

data: Model
X-squared = 5.5125, df = 1, p-value = 0.01888

Findings:

As you might have expected, the application of the Chi-Squared Test yielded an even smaller p-value! If we were to utilize this test in lieu of The Fisher’s Exact Test, our results would also demonstrate significance.

That is all for this entry.

Thank you for your patronage.

I hope to see you again soon.

-RD

Reflections of a Data Scientist

Friday, October 16, 2020

(R) Fisher’s Exact Test

No comments:

Post a Comment