Reflections of a Data Scientist: (R) Multivariate Analysis of Variance (SPSS)

Multivariate Analysis of Variance, or MANOVA, is very similar to ANOVA in implementation. The difference lies in the number of dependent variables included within the model. A One-Way MANOVA contains more than one dependent variable and one independent variable. A Two-Way MANOVA contains more than one dependent variable and two independent variables. The model requires that the independent variables be factor-type (categorical) variables and that the dependent variables be continuous variables.

One-Way MANOVA (One Independent Factorial Variable)

We’ll begin with a One-Way MANOVA example:

To begin, select “Analyze”, followed by “General Linear Model”, then select “Multivariate”.

This populates the following screen:

After selecting the independent variable “IndepFactor” as our “Fixed Factor(s)”, we will select the variables “ContVarA” and “ContVarB” as our “Dependent Variables”.

Next, click “Post Hoc”. This brings up the following menu:

Select “IndepFactor” as the variable for which the post hoc test will be run, and select “Tukey” as the test to use.

Once this has been completed, we can move forward with the model creation.

The hypotheses tested by this model are as follows:

H0: μ1 = μ2 = μ3 = … (the mean vectors of the dependent variables are equal across all groups)

H1: Not all mean vectors are equal.

To test the hypothesis, we will analyze the “Sig” value for the “Pillai’s Trace” entry for “IndepFactor”.

Assuming an alpha value of .05, we fail to reject the null hypothesis (.863 > .05). As a result, we can state:

Using Pillai’s Trace, there was not a significant effect of “IndepFactor” on “ContVarA” and “ContVarB”.

(We use Pillai’s Trace because it is generally considered the most robust of the multivariate test statistics when the assumption of homogeneous covariance matrices is violated.)
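To make the statistic itself less opaque, here is a minimal R sketch (not part of the SPSS workflow) that computes Pillai’s Trace by hand as the trace of H(H + E)^-1, where H is the hypothesis sums-of-squares-and-cross-products (SSCP) matrix and E is the error SSCP matrix, and compares it with the value reported by summary(). It uses the same data as the R code later in this post. The object names fit, H, E, pillai_manual and pillai_builtin are illustrative only.

```r
# Same data as the R example later in this post
contvara <- c(12, 64, 61, 99, 52, 65, 11, 55, 19, 42, 58, 6, 68, 75, 54)
contvarb <- c(307, 122, 199, 203, 707, 620, 208, 485, 629, 592, 316, 697, 794, 489, 274)
indepfactor <- factor(c(1, 2, 3, 3, 2, 1, 2, 2, 1, 3, 2, 3, 1, 1, 1))

# One-way MANOVA; summary() reports Pillai's Trace by default
fit <- manova(cbind(contvara, contvarb) ~ indepfactor)
s <- summary(fit)

H <- s$SS$indepfactor   # hypothesis (between-groups) SSCP matrix
E <- s$SS$Residuals     # error (within-groups) SSCP matrix

# Pillai's Trace: trace of H %*% (H + E)^-1
pillai_manual <- sum(diag(H %*% solve(H + E)))
pillai_builtin <- unname(s$stats["indepfactor", "Pillai"])

print(c(manual = pillai_manual, builtin = pillai_builtin))
```

The two values should agree up to floating point error, which shows that the statistic summarizes how much of the total variation (H + E) is attributable to the factor (H).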

Let’s now turn to our Tukey HSD output:

Though the MANOVA and ANOVA models differ in their composition, the Tukey’s HSD post hoc test is calculated in the same manner for both methods of analysis. This is not immediately evident in the SPSS output, but can be observed in the R code which performs the underlying function.

What is being displayed in SPSS, therefore, is the combination of multiple Tukey HSD post hoc tests. Each variable’s test values are being calculated independently prior to their combination within the output.

The portion of the table labeled “ContVarA” displays the pairwise comparisons between the levels of “IndepFactor” for the variable “ContVarA”.

The portion of the table labeled “ContVarB” displays the pairwise comparisons between the levels of “IndepFactor” for the variable “ContVarB”.

We can make the following interpretations from the above table:

There was not a significant difference in “ContVarA” between “IndepFactor” value 1 and “IndepFactor” value 2.

There was not a significant difference in “ContVarA” between “IndepFactor” value 2 and “IndepFactor” value 3.

There was not a significant difference in “ContVarB” between “IndepFactor” value 1 and “IndepFactor” value 2.

There was not a significant difference in “ContVarB” between “IndepFactor” value 2 and “IndepFactor” value 3.

In R, the code that would be utilized to complete a similar process is as follows:

# Create Data Frame #

contvara <- c(12.00, 64.00, 61.00, 99.00, 52.00, 65.00, 11.00, 55.00, 19.00, 42.00, 58.00, 6.00, 68.00, 75.00, 54.00)

contvarb <- c(307.00, 122.00, 199.00, 203.00, 707.00, 620.00, 208.00, 485.00, 629.00, 592.00, 316.00, 697.00, 794.00, 489.00, 274.00)

indepfactor <- c(1.00, 2.00, 3.00, 3.00, 2.00, 1.00, 2.00, 2.00, 1.00, 3.00, 2.00, 3.00, 1.00, 1.00, 1.00)

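# Name the factor column explicitly so that formulas using data=test find it in the data frame #

test <- data.frame(contvara, contvarb, indepfactor = factor(indepfactor))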

# Create MANOVA Model + Analysis #

results <- manova(cbind(contvara, contvarb) ~ factor(indepfactor), data=test)

# View Model Results #

summary(results)

# Generate Tukey's HSD for "ContVarA" #

tuk1 <- aov(contvara ~ factor(indepfactor), data=test)

TukeyHSD(tuk1)

# Generate Tukey's HSD for "ContVarB" #

tuk2 <- aov(contvarb ~ factor(indepfactor), data=test)

TukeyHSD(tuk2)
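Since the SPSS table is described above as a combination of one Tukey HSD test per dependent variable, the two R outputs can be stacked into a single table in a similar spirit. The following is a sketch only: the names tuk_a, tuk_b and combined, and the layout of the combined table, are illustrative choices rather than a reproduction of the SPSS format.

```r
# Same data as above
contvara <- c(12, 64, 61, 99, 52, 65, 11, 55, 19, 42, 58, 6, 68, 75, 54)
contvarb <- c(307, 122, 199, 203, 707, 620, 208, 485, 629, 592, 316, 697, 794, 489, 274)
indepfactor <- factor(c(1, 2, 3, 3, 2, 1, 2, 2, 1, 3, 2, 3, 1, 1, 1))
test <- data.frame(contvara, contvarb, indepfactor)

# One univariate ANOVA and Tukey HSD test per dependent variable
tuk_a <- TukeyHSD(aov(contvara ~ indepfactor, data = test))$indepfactor
tuk_b <- TukeyHSD(aov(contvarb ~ indepfactor, data = test))$indepfactor

# Stack both results, labeling each row with its dependent variable
combined <- rbind(
  data.frame(dependent = "contvara", comparison = rownames(tuk_a),
             tuk_a, row.names = NULL),
  data.frame(dependent = "contvarb", comparison = rownames(tuk_b),
             tuk_b, row.names = NULL)
)

print(combined)
```

Each row holds the mean difference (diff), the confidence bounds (lwr, upr) and the adjusted p-value (p.adj) for one pair of factor levels within one dependent variable.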

Now let’s explore a Two-Way MANOVA example. In this example, we will use two independent variables. The number of independent variables in the model determines the number that precedes the hyphen in the model’s name. By this convention, a MANOVA model that contains three independent variables is referred to as a Three-Way MANOVA, and so on.

Two-Way MANOVA (Two Independent Factorial Variables)

We will be using the same data set from the prior exercise.

To begin, select “Analyze”, followed by “General Linear Model”, then select “Multivariate”.

This populates the following screen:

After selecting the independent variables “IndepFactor” and “IndepFactorB” as our “Fixed Factor(s)”, we will select the variables “ContVarA” and “ContVarB” as our “Dependent Variables”.

Next, click “Post Hoc”. This brings up the following menu:

Select “IndepFactor” and “IndepFactorB” as the variables for which the post hoc tests will be run, and select “Tukey” as the test to use.

Once this has been completed, we can move forward with the model creation.

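The hypotheses tested by this model are as follows:

1. Main effect of “IndepFactor”

H0: μ1 = μ2 = μ3 = … (all means are equal across the levels of “IndepFactor”)

H1: Not all means are equal.

2. Main effect of “IndepFactorB”

H0: μ1 = μ2 = μ3 = … (all means are equal across the levels of “IndepFactorB”)

H1: Not all means are equal.

3. Interaction between “IndepFactor” and “IndepFactorB”

H0: An interaction is absent.

H1: An interaction is present.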
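These three hypotheses correspond directly to the three terms that R generates when two factors are joined with the * operator in a model formula, as in the R code later in this post. A small sketch (the names model_formula and term_labels are illustrative) that lists the terms without fitting anything:

```r
# A * B in an R formula expands to A + B + A:B
model_formula <- cbind(contvara, contvarb) ~ indepfactor * indepfactorb

# Extract the term labels: two main effects and one interaction
term_labels <- attr(terms(model_formula), "term.labels")

print(term_labels)
```

The first two labels are the main effects (hypotheses 1 and 2), and the label containing a colon is the interaction (hypothesis 3).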

To test the various hypotheses, we will analyze the “Sig” value for the “Pillai’s Trace” entries pertaining to “IndepFactor”, “IndepFactorB”, and “IndepFactor * IndepFactorB”.

(Assuming an alpha value of .05)

Hypothesis 1: .695 (IndepFactor)

Hypothesis 2: .660 (IndepFactorB)

Hypothesis 3: .575 (Interaction)

Therefore:

Hypothesis 1: Fail to Reject

Hypothesis 2: Fail to Reject

Hypothesis 3: Fail to Reject

So we may state the following:

Using Pillai’s Trace, there was not a significant effect of “IndepFactor” on "ContVarA" and "ContVarB".

Using Pillai’s Trace, there was not a significant effect of “IndepFactorB” on "ContVarA" and "ContVarB".

Using Pillai’s Trace, there was not a significant interaction between “IndepFactor” and “IndepFactorB”.

(We use Pillai’s Trace because it is generally considered the most robust of the multivariate test statistics when the assumption of homogeneous covariance matrices is violated.)
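Pillai’s Trace is one of four multivariate test statistics that are commonly reported together. In R, summary() for a manova fit accepts a test argument with the values "Pillai", "Wilks", "Hotelling-Lawley" and "Roy", so the p-values can be placed side by side. The sketch below uses the two-way data from the R code later in this post; the names fit, tests and p_values are illustrative, and the R values are not guaranteed to match the SPSS output for this unbalanced design.

```r
# Two-way data, as in the R example later in this post
contvara <- c(12, 64, 61, 99, 52, 65, 11, 55, 19, 42, 58, 6, 68, 75, 54)
contvarb <- c(307, 122, 199, 203, 707, 620, 208, 485, 629, 592, 316, 697, 794, 489, 274)
indepfactor <- factor(c(1, 2, 3, 3, 2, 1, 2, 2, 1, 3, 2, 3, 1, 1, 1))
indepfactorb <- factor(c(8, 7, 8, 9, 10, 8, 5, 8, 9, 9, 5, 5, 8, 9, 6))

fit <- manova(cbind(contvara, contvarb) ~ indepfactor * indepfactorb)

# Collect the p-value of each model term under each test statistic
tests <- c("Pillai", "Wilks", "Hotelling-Lawley", "Roy")
p_values <- sapply(tests, function(tst) {
  summary(fit, test = tst)$stats[1:3, "Pr(>F)"]  # rows 1-3 are the model terms
})

print(round(p_values, 3))
```

Rows are model terms and columns are test statistics, which makes it easy to see whether the conclusion depends on the choice of statistic.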

Let’s now turn to our Tukey HSD output:

This output is similar to the output generated by the Tukey’s HSD post hoc test for the previous model. Typically, there would be additional rows displaying comparisons for the variable “IndepFactorB”. However, SPSS does not perform post hoc tests for a factor if any of its groups contains fewer than two cases (e.g., a factor with the values 1, 1, 2, 2, 3, 4, 4, where group 3 contains a single case). In this data set, groups 6, 7 and 10 of “IndepFactorB” each contain only one case.
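Whether a factor runs into this restriction can be checked before running the analysis by counting the cases in each group. A short R sketch using the “IndepFactorB” values from this data set (the names counts and small_groups are illustrative):

```r
# IndepFactorB values from the data set used in this post
indepfactorb <- c(8, 7, 8, 9, 10, 8, 5, 8, 9, 9, 5, 5, 8, 9, 6)

# Count the number of cases in each group of the factor
counts <- table(indepfactorb)
print(counts)

# Groups that contain fewer than two cases
small_groups <- names(counts[counts < 2])
print(small_groups)
```

Any group listed in small_groups contains a single case, which is the situation described above.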

In R, the code that would be utilized to complete a similar process is as follows:

# Create Data Frame #

contvara <- c(12.00, 64.00, 61.00, 99.00, 52.00, 65.00, 11.00, 55.00, 19.00, 42.00, 58.00, 6.00, 68.00, 75.00, 54.00)

contvarb <- c(307.00, 122.00, 199.00, 203.00, 707.00, 620.00, 208.00, 485.00, 629.00, 592.00, 316.00, 697.00, 794.00, 489.00, 274.00)

indepfactor <- c(1.00, 2.00, 3.00, 3.00, 2.00, 1.00, 2.00, 2.00, 1.00, 3.00, 2.00, 3.00, 1.00, 1.00, 1.00)

indepfactorb <- c(8.00, 7.00, 8.00, 9.00, 10.00, 8.00, 5.00, 8.00, 9.00, 9.00, 5.00, 5.00, 8.00, 9.00, 6.00)

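# Name the factor columns explicitly so that formulas using data=test find them in the data frame #

test <- data.frame(contvara, contvarb, indepfactor = factor(indepfactor), indepfactorb = factor(indepfactorb))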

# Create MANOVA Model + Analysis #

results <- manova(cbind(contvara, contvarb) ~ factor(indepfactor) * factor(indepfactorb) , data=test)

# View Model Results #

summary(results)

# Generate Tukey's HSD for "ContVarA" by "IndepFactor" #

tuk1a <- aov(contvara ~ factor(indepfactor), data=test)

TukeyHSD(tuk1a)

# Generate Tukey's HSD for "ContVarA" by "IndepFactorB" #

tuk1b <- aov(contvara ~ factor(indepfactorb), data=test)

TukeyHSD(tuk1b)

# Generate Tukey's HSD for "ContVarB" by "IndepFactor" #

tuk2a <- aov(contvarb ~ factor(indepfactor), data=test)

TukeyHSD(tuk2a)

# Generate Tukey's HSD for "ContVarB" by "IndepFactorB" #

tuk2b <- aov(contvarb ~ factor(indepfactorb), data=test)

TukeyHSD(tuk2b)
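One caveat when comparing the two-way R results with SPSS: R’s manova() builds its tests from sequential sums of squares, so in an unbalanced design such as this one the results for the main effects can depend on the order in which the factors are entered, whereas the SPSS General Linear Model procedure uses Type III sums of squares by default. The sketch below fits the model in both orders so that the two sets of results can be compared; the names fit_ab, fit_ba, stats_ab and stats_ba are illustrative.

```r
# Two-way data, as above
contvara <- c(12, 64, 61, 99, 52, 65, 11, 55, 19, 42, 58, 6, 68, 75, 54)
contvarb <- c(307, 122, 199, 203, 707, 620, 208, 485, 629, 592, 316, 697, 794, 489, 274)
indepfactor <- factor(c(1, 2, 3, 3, 2, 1, 2, 2, 1, 3, 2, 3, 1, 1, 1))
indepfactorb <- factor(c(8, 7, 8, 9, 10, 8, 5, 8, 9, 9, 5, 5, 8, 9, 6))

# Same model, factors entered in opposite orders
fit_ab <- manova(cbind(contvara, contvarb) ~ indepfactor * indepfactorb)
fit_ba <- manova(cbind(contvara, contvarb) ~ indepfactorb * indepfactor)

stats_ab <- summary(fit_ab)$stats
stats_ba <- summary(fit_ba)$stats

# Pillai's Trace and its p-value for each term, in each ordering
print(round(stats_ab[, c("Pillai", "Pr(>F)")], 3))
print(round(stats_ba[, c("Pillai", "Pr(>F)")], 3))
```

The interaction term is entered last in both fits, so its test does not depend on the ordering. Any differences in the main-effect rows reflect the sequential approach rather than an error in either fit.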


Wednesday, February 14, 2018
