__Example (SPSS):__ In this demonstration, we will attempt to predict an individual’s favorite color from other characteristics of that individual.

We will begin with our data set. From the menu bar, select **“Analyze”**, then **“Regression”**, followed by **“Multinomial Logistic”**.

This sequence of actions should cause the Multinomial Logistic Regression menu to appear. Select **“Color”** as the **“Dependent”** variable. Once this has been completed, use the center arrow button to assign the remaining variables (**“Gender”**, **“Smoker”**, **“Car”**) as **“Factor(s)”**.

Next, click on the button labeled **“Reference Category”**. This should open the reference category menu. Select the **“Custom”** option and enter the value **“4”** into the box below. After performing these actions, click **“Continue”**. Once you have returned to the initial menu, click on the button labeled **“Save”**.

This series of selections should cause the “Save” sub-menu to appear. Select the box labeled **“Predicted category”**, then click **“Continue”**. You will again be returned to the initial menu. From this menu, click **“OK”**.

This should produce voluminous output; however, we will concern ourselves only with the following parts. The first is the **“Nagelkerke”** value in the pseudo R-squared table, which can be assessed on a scale similar to the traditional R-squared metric. Since this model’s Nagelkerke value is .843, we can assume that the model functions as a decent predictor of our dependent variable.

The parameter estimates output describes the internal aspects of the model. Though it may appear daunting at first, the information it contains is no different from the output generated for a typical linear model: an intercept and one coefficient per predictor level, repeated for each non-reference category.

In the case of our model, we will have three logit equations, one for each color other than the reference category:

Green = (Gender:Female * -35.791) + (Smoker:Yes * 34.774) + (Car:KIA * -17.40) + (Car:Ford * .985) + 17.40

Yellow = (Gender:Female * -36.664) + (Smoker:Yes * 15.892) + (Car:KIA * -35.632) + (Car:Ford * 1.499) + 16.886

Red = (Gender:Female * -19.199) + (Smoker:Yes * 18.880) + (Car:KIA * -37.252) + (Car:Ford * -19.974) + 18.506

As is the case with all logistic models, we need to transform the output value of each equation to obtain the corresponding probability.
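Stated generally, and writing $\eta_k$ for the value of the equation for color $k$ (a symbol introduced here for illustration, not taken from the SPSS output), the transformation can be sketched as:

```latex
P(k) = \frac{e^{\eta_k}}{1 + e^{\eta_{\text{Green}}} + e^{\eta_{\text{Yellow}}} + e^{\eta_{\text{Red}}}},
\quad k \in \{\text{Green}, \text{Yellow}, \text{Red}\},
\qquad
P(\text{Blue}) = \frac{1}{1 + e^{\eta_{\text{Green}}} + e^{\eta_{\text{Yellow}}} + e^{\eta_{\text{Red}}}}
```

The 1 in each denominator is the contribution of the reference category, whose equation is fixed at zero ($e^0 = 1$), so the four probabilities sum to one.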

So, for our first example observation, our equations would resemble:

**Observation 1 | Gender: Female | Smoker: No | Car: Chevy**

Green = (1 * -35.791) + (0 * 34.774) + (0 * -17.40) + (0 * .985) + 17.40

Yellow = (1 * -36.664) + (0 * 15.892) + (0 * -35.632) + (0 * 1.499) + 16.886

Red = (1 * -19.199) + (0 * 18.880) + (0 * -37.252) + (0 * -19.974) + 18.506

Which produces the values of:

Green = -18.391

Yellow = -19.778

Red = -0.693
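The same arithmetic can be sketched in R as a matrix product. The object names below (`coefs`, `x`, `logits`) are illustrative and not part of the SPSS output; the coefficients are copied from the three equations, with the intercept stored as the last column.

```r
# Coefficients from the three equations (rows: color, columns: predictors).
coefs <- rbind(
  Green  = c(Female = -35.791, SmokerYes = 34.774, KIA = -17.40,  Ford = 0.985,   Intercept = 17.40),
  Yellow = c(Female = -36.664, SmokerYes = 15.892, KIA = -35.632, Ford = 1.499,   Intercept = 16.886),
  Red    = c(Female = -19.199, SmokerYes = 18.880, KIA = -37.252, Ford = -19.974, Intercept = 18.506)
)

# Observation 1: Female, non-smoker, Chevy.
# Chevy has no coefficient of its own, so KIA = Ford = 0.
x <- c(Female = 1, SmokerYes = 0, KIA = 0, Ford = 0, Intercept = 1)

# One logit per non-reference color.
logits <- drop(coefs %*% x)
round(logits, 3)
```

Changing the entries of `x` to match another observation yields that observation's three values in the same way.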

To produce probabilities, we transform these values with the following R code:

Green <- -18.391

Yellow <- -19.778

Red <- -0.693

# Green #

exp(Green) / (1 + exp(Green) + exp(Red) + exp(Yellow))

# Red #

exp(Red) / (1 + exp(Green) + exp(Red) + exp(Yellow))

# Yellow #

exp(Yellow) / (1 + exp(Green) + exp(Red) + exp(Yellow))

# Blue (Reference Category) #

1 / (1 + exp(Green) + exp(Red) + exp(Yellow))

Which produces the following outputs:

[1] 6.867167e-09

[1] 0.333366

[1] 1.715581e-09

[1] 0.666634

__Interpretation:__ Each output value represents the probability that the dependent variable takes the corresponding value. Each probability is derived from an equation that is expressed relative to the reference category (**“Blue”**).

P(Green) = 6.867167e-09

P(Yellow) = 1.715581e-09

P(Red) = 0.333366

P(Blue) = 0.666634

Therefore, in the case of the first observation of our example data set, we can assume that the reference category, **“Blue”**, has the highest likelihood of occurrence.
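As a minimal sketch, the transformation and the choice of the most likely category can be wrapped in a small helper. The function name `logits_to_probs` and its `reference` argument are hypothetical, introduced only for this illustration.

```r
# Convert the logits of the non-reference categories into probabilities
# for all categories, including the reference category (logit fixed at 0).
logits_to_probs <- function(logits, reference = "Blue") {
  all_logits <- c(logits, setNames(0, reference))
  exp(all_logits) / sum(exp(all_logits))
}

probs <- logits_to_probs(c(Green = -18.391, Yellow = -19.778, Red = -0.693))

probs                    # one probability per color
sum(probs)               # the probabilities sum to 1
names(which.max(probs))  # most likely color for Observation 1
```

Because the reference category enters with a logit of 0, the denominator equals 1 plus the sum of the exponentiated values, which is the same calculation as the step-by-step code.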

The predicted values, as a result of the **“Save”** option, have been output into a column within the original data set.

__Example (R):__ If we wanted to repeat our analysis in R, we could do so with the following code:

# (With the package "nnet" installed) #

library(nnet)

# Multinomial Logistic Regression #

color <- c("Red", "Blue", "Green", "Blue", "Blue", "Blue", "Green", "Green", "Green", "Yellow")

gender <- c("Female", "Female", "Male", "Female", "Female", "Male", "Male", "Male", "Female", "Male")

smoker <- c("No", "No", "Yes", "No", "No", "No", "No", "No", "Yes", "No")

car <-c("Chevy", "Chevy", "Ford", "Ford", "Chevy", "KIA", "Ford", "KIA", "Ford", "Ford")

color <- as.factor(color)

gender <- as.factor(gender)

smoker <- as.factor(smoker)

car <- as.factor(car)

testset <- data.frame(color, gender, smoker, car)

mlr <- multinom(color ~ gender + smoker + car, data=testset )

summary(mlr)

This produces the following output:

Call:

multinom(formula = color ~ gender + smoker + car, data = testset)

Coefficients:

(Intercept) genderMale smokerYes carFord carKIA

Green -40.2239699 36.73179 47.085203 21.36387 3.492186

Red -0.6931559 -17.00881 -3.891315 -20.23802 -11.832468

Yellow -41.0510233 37.33637 -10.943821 21.58634 -22.161372

Std. Errors:

(Intercept) genderMale smokerYes carFord carKIA

Green 0.4164966 4.164966e-01 7.125616e-14 3.642157e-01 6.388766e-01

Red 1.2247466 2.899257e-13 1.686282e-23 1.492263e-09 2.899257e-13

Yellow 0.3642157 3.642157e-01 6.870313e-26 3.642157e-01 1.119723e-12

Residual Deviance: 9.364263

AIC: 39.36426
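The `summary()` output does not include the Nagelkerke value discussed in the SPSS example. A rough sketch of how it could be derived from the residual deviance follows. The intercept-only model `null_fit` and the other variable names are assumptions made for illustration, and the result is not guaranteed to match the SPSS figure exactly.

```r
library(nnet)

# Rebuild the example data so that this block runs on its own.
color  <- factor(c("Red", "Blue", "Green", "Blue", "Blue", "Blue", "Green", "Green", "Green", "Yellow"))
gender <- factor(c("Female", "Female", "Male", "Female", "Female", "Male", "Male", "Male", "Female", "Male"))
smoker <- factor(c("No", "No", "Yes", "No", "No", "No", "No", "No", "Yes", "No"))
car    <- factor(c("Chevy", "Chevy", "Ford", "Ford", "Chevy", "KIA", "Ford", "KIA", "Ford", "Ford"))
testset <- data.frame(color, gender, smoker, car)

full_fit <- multinom(color ~ gender + smoker + car, data = testset, trace = FALSE)
null_fit <- multinom(color ~ 1, data = testset, trace = FALSE)

n  <- nrow(testset)
d1 <- full_fit$deviance  # minus twice the log-likelihood of the fitted model
d0 <- null_fit$deviance  # minus twice the log-likelihood of the intercept-only model

# Cox and Snell pseudo R-squared, then Nagelkerke's rescaling to a 0-1 range.
cox_snell  <- 1 - exp((d1 - d0) / n)
nagelkerke <- cox_snell / (1 - exp(-d0 / n))
nagelkerke
```

With a residual deviance close to the 9.364263 reported above, the result should come out near the .843 shown by SPSS.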

To test the model results, the code below can be utilized:

# Test Model #

# Set each indicator to 1 if the condition holds, otherwise 0 #

# Gender : Male #

a <- 0

# Smoker : Yes #

b <- 0

# Car : Ford #

c <- 0

# Car : KIA #

d <- 0

Green <- -40.2239699 + (a * 36.73179) + (b * 47.085203) + (c * 21.36387) + (d * 3.492186)

Red <- -0.6931559 + (a * -17.00881) + (b * -3.891315) + (c * -20.23802) + (d * -11.832468)

Yellow <- -41.0510233 + (a * 37.33637) + (b * -10.943821) + (c * 21.58634) + (d * -22.161372)

# Green #

exp(Green) / (1 + exp(Green) + exp(Red) + exp(Yellow))

# Red #

exp(Red) / (1 + exp(Green) + exp(Red) + exp(Yellow))

# Yellow #

exp(Yellow) / (1 + exp(Green) + exp(Red) + exp(Yellow))

# Blue (Reference Category) #

1 / (1 + exp(Green) + exp(Red) + exp(Yellow))
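The manual calculation can be cross-checked against the fitted model itself. This sketch assumes the `nnet` package is installed and rebuilds the example data so that it runs on its own. The name `first_obs` is illustrative; `predict()` with `type = "probs"` or `type = "class"` is part of the `multinom` interface in `nnet`.

```r
library(nnet)

# Rebuild the example data and refit the model.
color  <- factor(c("Red", "Blue", "Green", "Blue", "Blue", "Blue", "Green", "Green", "Green", "Yellow"))
gender <- factor(c("Female", "Female", "Male", "Female", "Female", "Male", "Male", "Male", "Female", "Male"))
smoker <- factor(c("No", "No", "Yes", "No", "No", "No", "No", "No", "Yes", "No"))
car    <- factor(c("Chevy", "Chevy", "Ford", "Ford", "Chevy", "KIA", "Ford", "KIA", "Ford", "Ford"))
testset <- data.frame(color, gender, smoker, car)

mlr <- multinom(color ~ gender + smoker + car, data = testset, trace = FALSE)

# First observation: Female, non-smoker, Chevy (all four indicators equal 0).
first_obs <- testset[1, ]

# Probability of each color for this observation.
probs <- predict(mlr, newdata = first_obs, type = "probs")
round(probs, 4)

# Most likely color for this observation.
pred_class <- predict(mlr, newdata = first_obs, type = "class")
pred_class
```

For this observation, the probabilities for Blue and Red should be close to the 0.666634 and 0.333366 obtained from the SPSS coefficients, which illustrates that the two platforms agree on predictions even though their coefficients differ.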

**NOTE: The model’s internal aspects differ depending on the platform used to generate the analysis. Though the model predictions do not differ, I would recommend using SPSS in lieu of R if you are publishing findings. The rationale pertains to the auditing record which SPSS possesses. If the output contains abnormalities, R, being open source, cannot be held to account. Additionally, because the multinom function is not part of base R but is supplied by an external package, computational errors could be more likely to occur.**