Reflections of a Data Scientist: (R) Kruskal-Wallis H Test (SPSS)

The Kruskal-Wallis H Test is a non-parametric test that can act as an alternative to the one-way analysis of variance. Like the Mann-Whitney U Test and the Wilcoxon Signed-Rank Test, it ranks the observations and computes its test statistic from those ranks rather than from the raw values.

Example:

As a researcher, you are presented with three separate groups of data, each of which contains six sample data points. You are tasked with determining whether the three groups share the same distribution.

The samples contained in data group one are: { 7, 4, 14, 14, 6, 5 }

The samples contained in data group two are: { 10, 14, 15, 18, 9, 15 }

The samples contained in data group three are: { 1, 16, 4, 14, 2, 3 }

We will assume an alpha value of .05.

Additionally, we can state the following hypotheses:

H0: The three probability distributions are the same.
HA: The three probability distributions are not the same.

Like its parametric counterpart, the Kruskal-Wallis H Test does not require that all groups contain the same number of data points.
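As a brief illustration of the point about group sizes, the sketch below runs the test on groups of unequal length. The variation is hypothetical: the last observation of group three has been dropped purely for demonstration, and the names group_one, group_two, group_three and result are not part of the original example. Passing a list of vectors is one of the input forms that kruskal.test() accepts.

```r
# Hypothetical variation of the example data: group three has only
# five observations (its last value was dropped for illustration).
group_one   <- c(7, 4, 14, 14, 6, 5)
group_two   <- c(10, 14, 15, 18, 9, 15)
group_three <- c(1, 16, 4, 14, 2)

# kruskal.test() accepts a list of numeric vectors, one per group,
# so the groups do not need to be the same length.
result <- kruskal.test(list(group_one, group_two, group_three))

# Degrees of freedom equal the number of groups minus one.
result$parameter
```

The degrees of freedom depend only on the number of groups, so they are unaffected by the missing observation.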

To perform this test within R, we will utilize the following code:

# Create the data groups and the data to populate each group #

group <- c(rep("1",6), rep("2",6), rep("3",6))

data <- c(7, 4, 14, 14, 6, 5, 10, 14, 15, 18, 9, 15, 1, 16, 4, 14, 2, 3)

# Combine both the data and the groups to create a single data set #

dataset <- data.frame(data, group)

# Perform the analysis with the Kruskal-Wallis Test #

kruskal.test(data~group, data=dataset)

This generates the following output:

Kruskal-Wallis rank sum test

data: data by group
Kruskal-Wallis chi-squared = 5.2315, df = 2, p-value = 0.07311

Since our p-value is greater than our stated alpha value (0.07311 > .05), we fail to reject the null hypothesis. At the 95% confidence level, the data provided do not give us sufficient evidence to conclude that the three probability distributions differ.
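To see where the test statistic comes from, the calculation can be reproduced by hand. The following is a minimal sketch, assuming the standard Kruskal-Wallis formula with a correction for tied values; the variable names (rank_sums, h_raw, tie_correction and so on) are chosen here for illustration only.

```r
# Same data and group labels as in the example above.
data  <- c(7, 4, 14, 14, 6, 5, 10, 14, 15, 18, 9, 15, 1, 16, 4, 14, 2, 3)
group <- factor(c(rep("1", 6), rep("2", 6), rep("3", 6)))

# Rank all observations together; tied values receive their average rank.
ranks <- rank(data)
n     <- length(data)

# Sum of ranks and number of observations within each group.
rank_sums   <- tapply(ranks, group, sum)
group_sizes <- tapply(ranks, group, length)

# Uncorrected H statistic.
h_raw <- 12 / (n * (n + 1)) * sum(rank_sums^2 / group_sizes) - 3 * (n + 1)

# Tie correction: based on how many times each distinct value occurs.
tie_counts     <- table(data)
tie_correction <- 1 - sum(tie_counts^3 - tie_counts) / (n^3 - n)

# Corrected statistic, compared against a chi-squared distribution
# with (number of groups - 1) degrees of freedom.
h       <- h_raw / tie_correction
df      <- nlevels(group) - 1
p_value <- pchisq(h, df = df, lower.tail = FALSE)

alpha <- 0.05
reject_null <- p_value < alpha

round(h, 4)        # 5.2315
round(p_value, 5)  # 0.07311
reject_null        # FALSE
```

The rank sums for the three groups are 50.5, 80.5 and 40. The tie correction matters here because the values 4, 14 and 15 each occur more than once.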

Below are the steps necessary to perform the above analysis within the SPSS platform.

Example:

Like the Mann-Whitney U Test, this test requires the data to be structured within SPSS in an unconventional manner. All cases are combined into one single variable, and a second variable identifies the group to which each case belongs.

Below is our data set:
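For readers who want to reproduce this layout, the sketch below builds the same two-column structure in R. The column names "Data" and "Group" match the variable names used in the SPSS steps; the numeric group codes 1 through 3 and the commented-out file name are assumptions made for illustration.

```r
# One row per case: the measured value plus a numeric code for its group.
dataset <- data.frame(
  Data  = c(7, 4, 14, 14, 6, 5, 10, 14, 15, 18, 9, 15, 1, 16, 4, 14, 2, 3),
  Group = rep(1:3, each = 6)
)

# Display the 18 rows in the stacked layout.
print(dataset)

# Optionally export the data for import into SPSS (hypothetical file name).
# write.csv(dataset, "kruskal_wallis_data.csv", row.names = FALSE)
```

The grouping variable is stored as a number because the SPSS dialog identifies the groups by a numeric range.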

From the “Analyze” menu, select “Nonparametric Tests”, then select “Legacy Dialogs”, followed by “K Independent Samples”.

This should populate the menu below:

Select “Data”, and use the top center arrow to designate this variable as the “Test Variable(s)”. Once this has been completed, use the bottom center arrow to designate “Group” as our “Grouping Variable”. Three groups exist, so we must define the range of group codes. To achieve this, click “Define Range”, then enter the value “1” into the input adjacent to “Minimum”. Next, enter the value “3” into the input adjacent to “Maximum”. Once this step has been completed, click “Continue”, and then click “OK”.

This will generate the output below:

The figure in the “Test Statistics” table labeled “Asymp. Sig.” is the p-value that we will evaluate.

Again, since our p-value is greater than our stated alpha value (0.07311 > .05), we fail to reject the null hypothesis. At the 95% confidence level, the data provided do not give us sufficient evidence to conclude that the three probability distributions differ.

Thursday, March 1, 2018
