Reflections of a Data Scientist: (R) Fisher's Exact Test

Previously, this site featured a series of articles that addressed non-parametric tests.

Continuing with this subject, today's entry presents another test of a similar nature: Fisher's Exact Test.

If you are unfamiliar with the premise of non-parametric tests, please search for the term "non-parametric" in the search box located on the right.

Fisher's Exact Test

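Fisher's Exact Test provides a non-parametric alternative to the chi-square test of independence, and it is most commonly applied to 2x2 contingency tables. The chi-square test relies on a large-sample approximation that becomes unreliable when expected cell counts are small (a common rule of thumb is an expected count below 5 in any cell). Fisher's Exact Test avoids this approximation by computing the p-value exactly from the hypergeometric distribution. This test is sometimes confused with the "F-Test", which was named in honor of biologist and statistician Ronald Fisher. However, Ronald Fisher himself derived this particular test, and as such, it bears his name.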
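
As a rough illustration of when the exact test is preferable, the sketch below uses a small hypothetical 2x2 table (it is not the survey data used later) and computes the expected count of each cell under independence: row total times column total, divided by the grand total. The names small_tab, expected and needs_exact are illustrative choices of our own, and the threshold of 5 is only the usual rule of thumb, not a hard requirement.

```r
# Small hypothetical 2x2 table, used only for illustration
small_tab <- matrix(c(3, 1, 1, 3), ncol = 2)

# Expected counts under independence: (row total * column total) / grand total
expected <- outer(rowSums(small_tab), colSums(small_tab)) / sum(small_tab)
print(expected)

# Every expected count here is 2, below the rule-of-thumb threshold of 5,
# so the chi-square approximation is questionable for this table
needs_exact <- any(expected < 5)
print(needs_exact)

# Fisher's Exact Test does not depend on a large-sample approximation
small_result <- fisher.test(small_tab)
print(small_result$p.value)
```

For the survey table analyzed below, the same calculation gives expected counts of 120, 30, 40 and 10, so both tests are usable there; the exact test simply removes the need to rely on the approximation.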

Example:

While working as a statistician at a local university, you are tasked with evaluating, based on survey data, the level of job satisfaction that each member of the staff currently has with their occupational role. The survey data you gather are as follows:

General Faculty
130 Satisfied, 20 Unsatisfied (Total: 150 members of the general faculty)

Professors
30 Satisfied, 20 Unsatisfied (Total: 50 professors)

First, we need to enter this survey data into R as a matrix, which can be done with the code below:

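# Fisher's Exact Test #

# matrix() fills column by column, so the rows are General Faculty and Professors,
# and the columns are Satisfied and Unsatisfied
model <- matrix(c(130, 30, 20, 20), ncol = 2)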

Typing model at the console should display a result resembling:

     [,1] [,2]
[1,]  130   20
[2,]   30   20

To perform Fisher's Exact Test, run the following code:

fisher.test(model, conf.level = .95)

This produces the output:

Fisher's Exact Test for Count Data

data: model
p-value = 0.0001467
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
1.933511 9.623258
sample estimates:
odds ratio
  4.294116
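
To see where the p-value comes from, the sketch below recomputes it by hand under the standard conditional formulation of the test: with the row and column totals held fixed, the count in the top-left cell follows a hypergeometric distribution under the null hypothesis, and the two-sided p-value sums the probabilities of every possible table that is no more likely than the observed one. The variable names (x, m, n, k, p_manual) are our own, and the small relative tolerance is an assumption added to guard against floating-point ties; the result should agree with the p-value printed above.

```r
# The survey table: rows = General Faculty, Professors; columns = Satisfied, Unsatisfied
model <- matrix(c(130, 30, 20, 20), ncol = 2)

x <- model[1, 1]        # observed count: satisfied general faculty
m <- sum(model[, 1])    # column 1 total (all satisfied staff)
n <- sum(model[, 2])    # column 2 total (all unsatisfied staff)
k <- sum(model[1, ])    # row 1 total (all general faculty)

# Every value the top-left cell could take with these margins fixed
support <- max(0, k - n):min(k, m)
probs <- dhyper(support, m, n, k)

# Two-sided p-value: sum over tables no more probable than the observed one
p_obs <- dhyper(x, m, n, k)
p_manual <- sum(probs[probs <= p_obs * (1 + 1e-7)])
print(p_manual)
```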

We now examine the p-value for significance. It is this value that is used to assess the stated hypotheses, which are as follows:

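H0: The association is random (job satisfaction does not depend on staff group; the true odds ratio equals 1).
HA: The association is NOT random (the true odds ratio is not equal to 1).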

Since the p-value in the output (0.0001467) is less than our assumed alpha (.05), we reject the null hypothesis and conclude that the association between staff group and job satisfaction is not random.
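
Rather than reading the p-value off the printed output, the values can be pulled from the result directly: fisher.test() returns an object of class "htest" whose p.value, estimate and conf.int elements hold the figures shown in the output above. The sketch below applies the same decision rule in code; alpha = 0.05 is the assumed significance level from the text, and the names result, reject_h0 and so on are illustrative.

```r
model <- matrix(c(130, 30, 20, 20), ncol = 2)
result <- fisher.test(model, conf.level = 0.95)

alpha <- 0.05  # assumed significance level, as in the text

p_value    <- result$p.value    # the p-value shown in the printed output
odds_ratio <- result$estimate   # estimated odds ratio (named "odds ratio")
ci         <- result$conf.int   # 95 percent confidence interval

# Decision rule: reject H0 when the p-value falls below alpha
reject_h0 <- p_value < alpha
if (reject_h0) {
  message("Reject H0: the association is not random (p = ", signif(p_value, 4), ")")
} else {
  message("Fail to reject H0 (p = ", signif(p_value, 4), ")")
}
```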

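The sample odds ratio is the odds of satisfaction among general faculty divided by the odds of satisfaction among professors. Note that fisher.test() reports the conditional maximum likelihood estimate of the odds ratio (4.294116) rather than this sample odds ratio, so the two values differ slightly. The sample odds ratio can be calculated as follows: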

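# Odds of being satisfied among general faculty
a <- 130 / 20

# Odds of being satisfied among professors
b <- 30 / 20

# Sample odds ratio (avoid naming it c, which masks R's c() function)
odds_ratio <- a / b

odds_ratio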

Which produces the output:

[1] 4.333333
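
Equivalently, the sample odds ratio is the cross-product ratio (n11 * n22) / (n12 * n21) of the four cell counts. The sketch below computes it directly from the model matrix and places it beside the conditional estimate returned by fisher.test(), which makes the small gap between 4.333 and 4.294 explicit. The names sample_or and conditional_or are our own.

```r
model <- matrix(c(130, 30, 20, 20), ncol = 2)

# Cross-product ratio: (n11 * n22) / (n12 * n21)
# n11 = satisfied faculty, n12 = unsatisfied faculty,
# n21 = satisfied professors, n22 = unsatisfied professors
sample_or <- (model[1, 1] * model[2, 2]) / (model[1, 2] * model[2, 1])

# fisher.test() reports the conditional maximum likelihood estimate instead
conditional_or <- unname(fisher.test(model)$estimate)

print(c(sample = sample_or, conditional = conditional_or))
```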

That’s it for now, Data Heads. Stay subscribed for more interesting articles!

Monday, April 23, 2018