Reflections of a Data Scientist: (R) McNemar Test (SPSS)

(Introduction to Concept in R)

(Introduction to Concept in SPSS)

The McNemar Test, pronounced Mac-Ne-Mar and not Mic-Nee-Mar, is a method used to test whether the marginal probabilities are equal for paired nominal data arranged in a 2 x 2 contingency table. Because the test relies on the chi-squared distribution, it is often confused with Pearson's Chi-Squared Test. I am aware that all of this may appear confusing at first. However, as you continue to read this entry, I can assure you that this befuddlement will cease.

Let's begin by illustrating what a McNemar Test is not. We will achieve this by explaining the differences between the McNemar Test and Pearson's Chi-Squared Test. The most evident difference between the two is that a chi-squared test is not limited to two rows and two columns. Additionally, each data structure is assembled to assess a very specific conceptual inquiry.

Within the chi-squared matrix, if a goodness of fit evaluation is not being performed, then each row represents a different category, with each column representing a different segment within that category. If the sum is taken across a row, the resulting figure is the total count of the values contained within that category. This will be illustrated within our chi-squared example problem.

A McNemar Test matrix is limited to 2 rows and 2 columns. Unlike the chi-squared matrix, one single category is split into four separate segments. Therefore, the total count of values within that category is the sum of all cell values within the table.

WARNING: PRIOR TO UTILIZING ANY OF THE FOLLOWING TESTS, PLEASE READ EACH EXAMPLE THOROUGHLY!

Example (Chi-Squared):

While working as a statistician at a local university, you are tasked with evaluating, based on survey data, how satisfied each member of the staff currently is with their occupational role. The data that you gather from the surveys is as follows:

General Faculty
130 Satisfied 20 Unsatisfied (Total 150 Members of General Faculty)

Professors
30 Satisfied 20 Unsatisfied (Total 50 Professors)

Adjunct Professors
80 Satisfied 20 Unsatisfied (Total 100 Adjunct Professors)

Custodians
20 Satisfied 10 Unsatisfied (Total 30 Custodians)

The question remains, however, as to whether the assigned role of each staff member has any bearing on the survey results. To decide this at the .05 significance level (95% confidence), you must follow the subsequent steps.

First, we will need to input this survey data into R as a matrix. This can be achieved by utilizing the code below:

Model <- matrix(c(130, 30, 80, 20, 20, 20, 20, 10), nrow = 4, ncol=2)

The result should resemble:

     [,1] [,2]
[1,]  130   20
[2,]   30   20
[3,]   80   20
[4,]   20   10

Once this step has been completed, the next step is as simple as entering the code:

chisq.test(Model)

Console Output:

Pearson's Chi-squared test

data: Model
X-squared = 18.857, df = 3, p-value = 0.0002926
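As a cross-check on the console output, the statistic can be rebuilt by hand. The following is a minimal sketch that assumes the same Model matrix; the names expected, statistic, df and p_value are mine and exist only for illustration.

```r
# Observed counts: rows are staff roles, columns are Satisfied / Unsatisfied
Model <- matrix(c(130, 30, 80, 20, 20, 20, 20, 10), nrow = 4, ncol = 2)

# Expected counts under independence: (row total * column total) / grand total
expected <- outer(rowSums(Model), colSums(Model)) / sum(Model)

# Pearson statistic: sum over all cells of (observed - expected)^2 / expected
statistic <- sum((Model - expected)^2 / expected)

# Degrees of freedom: (rows - 1) * (columns - 1)
df <- (nrow(Model) - 1) * (ncol(Model) - 1)

# Upper-tail probability from the chi-squared distribution
p_value <- pchisq(statistic, df = df, lower.tail = FALSE)

round(statistic, 3)  # 18.857, the same value reported by chisq.test(Model)
df                   # 3
p_value
```

Because the table is larger than 2 x 2, chisq.test() applies no continuity correction here, which is why the hand calculation agrees with it.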

Findings:

Degrees of Freedom (df) - 3
Confidence Level (CL) - .95
Alpha (α) (1 - CL) - .05
Chi-Squared Test Statistic - 18.857

This creates the hypothesis test parameters:

H0: There is no association between job type and job satisfaction (Null Hypothesis). Job type and job satisfaction are independent variables.

HA: There is an association between job type and job satisfaction. Job type and job satisfaction are not independent variables.

With a p-value less than .05 (.0002926 < .05), we can state, at the .05 significance level, that there is an association between job type and overall satisfaction.

Reject: Null Hypothesis.
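The same decision can be reached by comparing the test statistic against a critical value instead of comparing the p-value against alpha. This is a small sketch under the values stated above (alpha of .05, 3 degrees of freedom); the variable names are illustrative only.

```r
alpha <- 0.05        # significance level (1 - confidence level)
df <- 3              # degrees of freedom from the test output
statistic <- 18.857  # chi-squared statistic from the test output

# Value that a chi-squared variable with 3 df exceeds only 5% of the time
critical_value <- qchisq(1 - alpha, df = df)
critical_value  # approximately 7.815

# TRUE means the statistic falls in the rejection region
reject_null <- statistic > critical_value
reject_null
```

Both routes always agree, since the p-value is below alpha exactly when the statistic exceeds the critical value.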

Example (McNemar's Test):

Typically, examples of the McNemar Test are created to emulate drug trials. For that reason, and because this type of example best exemplifies all aspects of the test, our example will be structured in a similar manner.

In our fictitious drug trial, individuals are gathered from a select demographic which is particularly susceptible to heart disease. A new drug has been synthesized to prevent the onset of heart disease, and to cure it in individuals who are already afflicted. Each participant's status is recorded before and after the trial, and the data is organized in the following contingency table (2x2).

# Code to create contingency table #

medication <-
  matrix(c(75, 55, 22, 44),
         nrow = 2,
         dimnames = list("Affliction" = c("Heart Disease Prior: Present", "Heart Disease Prior: Absent"),
                         "Drug Trial" = c("After Drug Trial: Present", "After Drug Trial: Absent")))

# Code to print output to the console window #

medication

Which creates the output:

                               Drug Trial
Affliction                     After Drug Trial: Present After Drug Trial: Absent
  Heart Disease Prior: Present                        75                       22
  Heart Disease Prior: Absent                         55                       44

As you can observe from the output, there is one single category, the patient group for the drug trial.

This group is split into four segments.

Those with heart disease prior to the trial, who still were afflicted after the trial: 75

Those without heart disease prior to the trial, who were afflicted with heart disease after the trial: 55

Those with heart disease prior to the trial, who did not have heart disease after the trial: 22

Those without heart disease prior to the trial, who still were un-afflicted after the trial: 44

The total number of individuals who participated in this trial: 196 (75 + 55 + 22 + 44)

# Additionally, a more basic method for assembling the matrix can be achieved through the utilization of the following code #

medication <- matrix(c(75, 55, 22, 44), nrow = 2, ncol=2)

# Code to print output to the console window #

medication

Which creates the output:

     [,1] [,2]
[1,]   75   22
[2,]   55   44

To perform the McNemar Test, utilize the following code:

# Code to perform the McNemar Test #

mcnemar.test(medication)

This produces the following output:

McNemar's Chi-squared test with continuity correction

data: medication
McNemar's chi-squared = 13.299, df = 1, p-value = 0.0002656
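The reported statistic depends only on the two cells in which a participant's status changed. The sketch below recomputes it by hand, assuming the standard continuity-corrected formula (|b - c| - 1)^2 / (b + c); the names changed_to_absent and changed_to_present are illustrative labels of my own.

```r
medication <- matrix(c(75, 55, 22, 44), nrow = 2, ncol = 2)

# Discordant cells: participants whose status changed during the trial
changed_to_absent  <- medication[1, 2]  # present before, absent after (22)
changed_to_present <- medication[2, 1]  # absent before, present after (55)

# Continuity-corrected McNemar statistic
statistic <- (abs(changed_to_absent - changed_to_present) - 1)^2 /
  (changed_to_absent + changed_to_present)

# Upper-tail probability from the chi-squared distribution with 1 df
p_value <- pchisq(statistic, df = 1, lower.tail = FALSE)

round(statistic, 3)  # 13.299, the same value reported by mcnemar.test(medication)
p_value
```

The concordant cells (75 and 44) do not enter the calculation at all, which is the main practical difference from Pearson's test on the same table.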

From this output we can address our hypothesis.

This hypothesis is stated as such:

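H0: pb = pc
HA: pb ≠ pc

Here, pb and pc are the probabilities of the two cells in which a participant's status changed: cell b (present before, absent after: 22) and cell c (absent before, present after: 55).

Or:

Null: The two marginal probabilities for each outcome are the same.

Alternative: The two marginal probabilities for each outcome are NOT the same.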
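To see why the statement about the two changed cells and the statement about the marginal probabilities are the same hypothesis, the marginal proportions can be computed directly. This is a sketch using the example counts; the names p_before, p_after, cell_b and cell_c are assumptions made for illustration.

```r
medication <- matrix(c(75, 55, 22, 44), nrow = 2, ncol = 2)
n <- sum(medication)  # 196 participants in total

# Marginal proportion with heart disease before the trial (first row)
p_before <- sum(medication[1, ]) / n  # (75 + 22) / 196

# Marginal proportion with heart disease after the trial (first column)
p_after <- sum(medication[, 1]) / n   # (75 + 55) / 196

# The two cells in which status changed
cell_b <- medication[1, 2]  # present before, absent after
cell_c <- medication[2, 1]  # absent before, present after

# The shared cell (75) cancels, so the marginal difference
# reduces to the difference between the changed cells
p_before - p_after
(cell_b - cell_c) / n
```

Both expressions return the same number, so the margins are equal exactly when the two changed cells are equal.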

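What we are seeking to investigate is whether the proportion of participants with heart disease changed significantly between the measurement taken prior to the trial and the measurement taken after it. Every participant received the drug, so the comparison is made within the same individuals.

With a p-value less than .05 (0.0002656 < .05), we will reject the null hypothesis, and we will conclude, at the .05 significance level, that the new experimental drug is having a significant impact on trial participants. Note that the test only indicates that a change occurred. The direction of the change must be read from the table itself, where 55 participants moved from absent to present, while 22 moved from present to absent.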

Now let’s re-create the same calculation within the SPSS platform.

Example (McNemar's Test - SPSS):

The data must be structured slightly differently in SPSS to perform this analysis. Though it cannot be seen from this data view, each row of observational data must be populated to match the previously stated frequencies.

Therefore, there will be a total of 196 rows.

75 Rows will have “Present” listed in the first column, and “Present” listed in the second column.

55 Rows will have “Absent” listed in the first column, and “Present” listed in the second column.

22 Rows will have “Present” listed in the first column, and “Absent” listed in the second column.

44 Rows will have “Absent” listed in the first column, and “Absent” listed in the second column.
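Typing 196 rows by hand is tedious, so the case-level data can be generated from the four frequencies instead. The following is a sketch only; the column names PreTrial and PostTrial match the SPSS variables used later in this article, while the data frame names and the output file name are hypothetical.

```r
# One row per combination, with the frequency stated above
counts <- data.frame(
  PreTrial  = c("Present", "Absent", "Present", "Absent"),
  PostTrial = c("Present", "Present", "Absent", "Absent"),
  n         = c(75, 55, 22, 44)
)

# Repeat each combination n times to obtain one row per participant
cases <- counts[rep(seq_len(nrow(counts)), times = counts$n),
                c("PreTrial", "PostTrial")]
rownames(cases) <- NULL

nrow(cases)  # 196
head(cases)

# Hypothetical export step for importing into SPSS:
# write.csv(cases, "mcnemar_cases.csv", row.names = FALSE)
```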

Once this data set has been assembled, the finished product should resemble:

There are two methods which can be utilized within SPSS to perform the McNemar Test.

The first method can be performed by following the steps below:

To begin, select “Analyze”, followed by “Nonparametric Tests”, and then “Related Samples”.

This populates the following screen:

Within the options tab, select the two column variables which will be utilized for analysis. In the case of our example, the column variables which will act as “Test Fields” are “PreTrial” and “PostTrial”. The middle arrow button can be utilized to designate this distinction.

Once this has been completed, select the option tab “Settings”.

From this tab, select the option “Customize tests”, then select the box to the right of “McNemar’s test”. After these steps are complete, click on the “Run” button.

This should produce the output below:

Double clicking on this output summary produces additional output.

Example (McNemar's Test - SPSS) (cont.):

Below is a separate method which can be utilized within SPSS to perform the McNemar Test. The output produced by this alternative path differs from the output produced by the first method.

To begin, select “Analyze”, followed by “Descriptive Statistics”, and then “Crosstabs”.

Through the utilization of the middle arrow buttons, we will designate “PreTrial” as our “Row(s)” variable, and “PostTrial” as our “Column(s)” variable.

After clicking the right menu button labeled “Statistics”, select the box adjacent to the option “McNemar”, then click continue. After completing this series of steps, click “OK” from the initial menu.

This should generate the output below:

The crosstabulation table presents the frequencies for the two column variables contained within the data sheet. As you recall from our previous R example, this summary information is all that is required to perform the McNemar Test within the R platform.
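The reverse direction also works: case-level data shaped like the SPSS data sheet can be collapsed back into a crosstabulation in R and tested. This is a sketch that rebuilds the 196 rows from the example frequencies; the vector names pre and post and the explicit factor levels are my own choices, made so that "Present" is listed first as in the earlier matrix.

```r
freq <- c(75, 55, 22, 44)

# Case-level vectors, one element per participant
pre  <- rep(c("Present", "Absent", "Present", "Absent"), times = freq)
post <- rep(c("Present", "Present", "Absent", "Absent"), times = freq)

# Fix the level order so that rows and columns start with "Present"
status_levels <- c("Present", "Absent")
crosstab <- table(
  PreTrial  = factor(pre, levels = status_levels),
  PostTrial = factor(post, levels = status_levels)
)
crosstab

# Same test as before, now run on the crosstabulation
result <- mcnemar.test(crosstab)
result
```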

Prior to discussing the Chi-Squared Tests table, I would like to address the WARNING that was issued at the beginning of this article.

*** WARNING ***

In the last article, I issued a similar warning, which concerned SPSS not providing correct model output. I hypothesized that this could be due to SPSS shifting model parameters in order to assist the user. The danger of this functionality is that SPSS does not explicitly alert the user to the practice. As a result, SPSS may inadvertently mislead the user while attempting to provide a more appropriate model.

The value that SPSS reports for the McNemar Test in this table is not a chi-squared result. In all actuality, no chi-squared value is displayed for it at all. The chart’s footnote states, “Binomial distribution used”, which indicates that the significance value was computed from the binomial distribution applied to the participants whose status changed, and not from the chi-squared distribution on which the McNemar Test in R is based. This should be evidence to the vigilant that the SPSS figure is not directly comparable to the R output shown earlier, which was produced with a continuity correction.
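To see how much the choice of distribution and correction affects the result, the variants can be run side by side in R. This is a sketch only: binom.test() on the two changed cells is my assumption for the closest R counterpart to a binomial-based significance value, and it is not a confirmed reproduction of what SPSS computes internally.

```r
medication <- matrix(c(75, 55, 22, 44), nrow = 2, ncol = 2)

# The two cells in which a participant's status changed
n_b <- medication[1, 2]  # present before, absent after (22)
n_c <- medication[2, 1]  # absent before, present after (55)

# Exact two-sided binomial test: under the null hypothesis, each
# changed participant is equally likely to fall in either cell
exact <- binom.test(n_b, n_b + n_c, p = 0.5)

# Chi-squared versions, with and without the continuity correction
corrected   <- mcnemar.test(medication)                   # default, correct = TRUE
uncorrected <- mcnemar.test(medication, correct = FALSE)  # (22 - 55)^2 / 77

exact$p.value
corrected$p.value
uncorrected$p.value
```

All three p-values fall well below .05 for this data, so the conclusion does not change, although the reported figures differ from one variant to the next.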

Therefore, I would not recommend the SPSS package for performing a McNemar test, and would instead utilize the R platform.

That’s all for now. Stay active, Data Heads!

Monday, March 26, 2018