Reflections of a Data Scientist: (R) Scheirer-Ray-Hare Test

In this final article in a series of articles pertaining to non-parametric tests, I’ve decided to conclude with a test which is not incredibly well know, and therefore, is likely under-utilized. The Scheirer-Ray-Hare Test is so currently obscure at this present time, that not even a Wikipedia article exists to discuss its purpose. Since this test has not yet been adopted by the wider research community, it therefore has not been subject to the scrutiny which establishes certain methods as legitimate methods of analysis. I would be hesitant to utilize this model for this reason, and I would certainly recommend considering the experiment which is being conducted prior to utilization.

The Scheirer-Ray-Hare Test is one of a kind, in that, it is the only test that I am aware of that can be utilized as an alternative to the two way analysis of variance. The critical feature that The Scheirer-Ray-Hare Test includes that is absence in its ANOVA counterpart, is the ability to analyze two way analysis of variance data of varying levels.

Example:

This test does not have pre-established SPSS functionality, therefore, it must be utilized within the R platform.

Two way, in this scenario, is referring to the two independent variables which will be utilized within this ANOVA model.

The hypothesis for this model type will be:

1.

H0: uVar1 = uVar2 (Var1’s value does not significantly differ from Var2’s value)

H1: uVar1 NE uVar2

2.

H0: u1 = u2 = u3 =…..etc. (All means are equal)

H1: Not all means are equal.

3.

H0: An interaction is absent.

H1: An interaction is present.

Example Problem:

Researchers want to test study habits within two schools as they pertain to student life satisfaction. The researchers also believe that the school that each group of students is attending may also have an impact on study habits. Students from each school are assigned study material which in sum, totals to 1 hour, 2 hours, and 3 hours on a daily basis. Measured is the satisfaction of each student group on a scale from 1-10 after a 1 month duration. (We will assume an alpha of .05).

School A:

1 Hour of Study Time: 7, 2, 10, 2, 2, 9, 3
2 Hours of Study Time: 9, 10, 3, 10, 8, 4
3 Hours of Study Time: 3, 6, 4, 7, 1, 8, 5

School B:

1 Hour of Study Time: 8, 5, 1, 3, 10
2 Hours of Study Time: 7, 5, 6, 4, 10
3 Hours of Study Time: 5, 5, 2, 2, 2

BOLD numbers indicate un-even levels.

Entering this into R can be tricky, but stay with me:

# Scheirer-Ray-Hare Test #

# Create Set #

satisfaction <- c(7, 2, 10, 2, 2, 9, 3, 8, 5, 1, 3, 10, 9, 10, 3, 10, 8, 4, 8, 7, 5, 6, 4, 10, 3, 6, 4, 7, 1, 8, 5, 5, 5, 2, 2, 2)

studytime <- c(rep("One Hour",12), rep("Two Hours",12), rep("Three Hours",12))

school <- c(rep("SchoolA",7), rep("SchoolB",5), rep("SchoolA",7), rep("SchoolB",5), rep("SchoolA",7), rep("SchoolB",5))

schooltest <- data.frame(satisfaction, studytime, school)

# Create Ranks #

Rank_satisfaction = rank(satisfaction)

Rank_studytime = rank(studytime)
Rank_schooltest = rank(schooltest)
schooltestA <- data.frame(Rank_satisfaction, Rank_studytime, Rank_schooltest)

# Perform Test #

aov.results = aov(lm(Rank_satisfaction ~ Rank_studytime*Rank_schooltest, data=schooltestA))
summary(aov.results)

# Extract the sum of squares #

df = anova(aov.results)[,"Df"]
sum_df = sum(df)
ss = anova(aov.results)[, "Sum Sq"]
sum_ss = sum(ss)

# Calculate the MS value #

ms = sum_ss/sum_df

# Calculate H value #

H_satisfaction = ss[1]/ms
H_studytime = ss[2]/ms
H_interaction = ss[3]/ms

# Convert this into probability #

1-pchisq(H_satisfaction, df[1])
1-pchisq(H_studytime, df[2])
1-pchisq(H_interaction, df[1])

This eventually produces the output:

> 1-pchisq(H_satisfaction, df[1])
[1] 0.00489259
> 1-pchisq(H_studytime, df[2])
[1] 0.4884362
> 1-pchisq(H_interaction, df[1])
[1] 0.9447752

We utilize the above probabilities to address the various hypotheses.

In investigating the output we can make the following conclusions:

Hypothesis 1 (Stress levels DO NOT differ depending on school): .0049 < .05

Hypothesis 2 (Stress levels DO NOT differ depending on hours of study): .488 > .05

Hypothesis 3 (The combination of school and study is NOT impacting the outcome): .945 > .05

So therefore:

Hypothesis 1: Reject!

Hypothesis 2: Fail to reject.

Hypothesis 3: Fail to reject.

We can then state:

Students of different schools did have significantly different stress levels. There was not significant difference between the levels of study time as it pertains to stress. No interaction effect was present.

That’s all for now, Data Heads! Stay tuned in for more interesting articles pertaining to methods of analysis!

* Credit is due to the creator of an instructional Youtube video, Laurence Trueman, who narrated a short instructional piece which illustrated, in great detail, how to structure this model within the R platform. The video link is as follows: https://www.youtube.com/watch?v=N729aMGIUOk

Reflections of a Data Scientist

Thursday, March 1, 2018

(R) Scheirer-Ray-Hare Test

No comments:

Post a Comment