The only difference between the two models, and it is minor at best, is the size of the tails of each model's underlying distribution. The Logit model produces slightly flatter tails. This is the sole aspect which separates the two models when utilized within an applied setting.
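
To make the tail difference concrete, here is a small sketch that compares the two underlying distributions in R. Rescaling the logistic distribution to unit variance (`scale = sqrt(3) / pi`) is my own choice to put the two curves on a comparable footing, and the object names `logis_scale` and `tails` are illustrative.

```r
# Compare lower-tail probabilities of the standard normal (probit)
# and a logistic distribution rescaled to unit variance (logit).
# Assumption: scale = sqrt(3) / pi gives the logistic a variance of 1.
logis_scale <- sqrt(3) / pi

z <- c(-4, -3, -2, -1, 0)

tails <- data.frame(
  z      = z,
  probit = pnorm(z),
  logit  = plogis(z, location = 0, scale = logis_scale)
)

print(tails)
```

At z = -3 and z = -4 the logit column stays larger than the probit column, which is the flatter-tail behaviour described above.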

For this reason, and because the logistic regression model is more widely adopted, I would recommend against utilizing the Probit Regression methodology unless explicitly instructed to do so.

__Example (SPSS):__

It would seem that I am not alone in my confusion:

http://www-01.ibm.com/support/docview.wss?uid=swg21480469

__Example (R):__

I will now briefly explain how to create the model within the R platform.

# Create data vectors #

age <- c(55.00, 45.00, 33.00, 22.00, 34.00, 56.00, 78.00, 47.00, 38.00, 68.00, 49.00, 34.00, 28.00, 61.00, 26.00)

obese <- c(1.00, .00, .00, .00, 1.00, 1.00, .00, 1.00, 1.00, .00, 1.00, 1.00, .00, 1.00, .00)

smoking <- c(1.00, .00, .00, 1.00, 1.00, 1.00, .00, .00, 1.00, .00, .00, 1.00, .00, 1.00, 1.00)

cancer <- c(1.00, .00, .00, 1.00, .00, 1.00, .00, .00, 1.00, 1.00, .00, 1.00, 1.00, 1.00, .00)

# Combine data vectors into a single data frame #

cancerdata <- data.frame(cancer, smoking, obese, age)

# Create Probit Model #

probitmodel <- glm(cancer ~ smoking + obese + age, family=binomial(link= "probit"), data=cancerdata)

# Generate Model Summary #

summary(probitmodel)

This produces the output:

Call:

glm(formula = cancer ~ smoking + obese + age, family = binomial(link = "probit"),

data = cancerdata)

Deviance Residuals:

Min 1Q Median 3Q Max

-1.6239 -0.7546 0.5868 0.8184 1.8403

Coefficients:

Estimate Std. Error z value Pr(>|z|)

(Intercept) -1.40234 1.30773 -1.072 0.284

smoking 1.55682 0.88940 1.750 0.080 .

obese -0.24549 0.82711 -0.297 0.767

age 0.01792 0.02413 0.743 0.458

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 20.728 on 14 degrees of freedom

Residual deviance: 16.795 on 11 degrees of freedom

AIC: 24.795

Number of Fisher Scoring iterations: 4
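
As a quick check on how the pieces of this output relate to one another, the sketch below recomputes the z value and p-value for smoking, and the AIC, from the printed numbers. The variable names are mine, and the inputs are the rounded values copied from the output, so the results agree only up to rounding.

```r
# Values copied from the summary output above (rounded as printed)
estimate     <- 1.55682   # smoking coefficient
std_error    <- 0.88940   # its standard error
resid_dev    <- 16.795    # residual deviance
n_parameters <- 4         # intercept + smoking + obese + age

# Wald z statistic: estimate divided by its standard error
z_value <- estimate / std_error

# Two-sided p-value from the standard normal distribution
p_value <- 2 * pnorm(-abs(z_value))

# For a 0/1 outcome, AIC = residual deviance + 2 * number of parameters
aic <- resid_dev + 2 * n_parameters

round(c(z = z_value, p = p_value, AIC = aic), 3)
```

The three values should line up with the smoking row and the AIC line of the summary.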

This allows us to construct the model equation:

**Probit(p) = -1.40234 + (Smoking * 1.55682) + (Obese * -0.24549) + (Age * 0.01792)**

Because the model was fit with the probit link, the left-hand side is the inverse of the standard normal cumulative distribution function, not the logit. The probability is therefore recovered with pnorm(), not plogis().

This can be implemented within the R platform as:

# Smoking #

Smoking <- 0

# Obese #

Obese <- 0

# Age #

Age <- 0

# Linear predictor on the probit (z-score) scale #

z <- -1.40234 + (Smoking * 1.55682) + (Obese * -0.24549) + (Age * 0.01792)

# Convert to a probability with the standard normal CDF #

pnorm(z)

The output is the predicted probability that the binary dependent variable equals 1 (here, that cancer is present).
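
The same probability can be obtained without typing the coefficients by hand. The following is a minimal sketch that refits the model on the example data and compares a manual calculation against `predict()`. The individual described by `newperson` is hypothetical and chosen only for illustration.

```r
# Data from the example above
age     <- c(55, 45, 33, 22, 34, 56, 78, 47, 38, 68, 49, 34, 28, 61, 26)
obese   <- c(1, 0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0)
smoking <- c(1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 1, 0, 1, 1)
cancer  <- c(1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1, 0)
cancerdata <- data.frame(cancer, smoking, obese, age)

probitmodel <- glm(cancer ~ smoking + obese + age,
                   family = binomial(link = "probit"), data = cancerdata)

# Hypothetical individual (illustrative values, not from the original example)
newperson <- data.frame(smoking = 1, obese = 0, age = 50)

# Linear predictor on the probit scale, then the standard normal CDF
eta      <- predict(probitmodel, newdata = newperson, type = "link")
p_manual <- pnorm(eta)

# predict() can also apply the inverse link directly
p_predict <- predict(probitmodel, newdata = newperson, type = "response")

c(manual = unname(p_manual), predict = unname(p_predict))
```

Both entries should be identical, since `type = "response"` simply applies the inverse probit link to the linear predictor.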

To check for model fit, we will utilize the Nagelkerke R-Squared statistic.

# Generate Nagelkerke R Squared #

# Download and Enable Package: "BaylorEdPsych" #

library(BaylorEdPsych)

PseudoR2(probitmodel)

This series of actions presents the following output:

McFadden Adj.McFadden Cox.Snell Nagelkerke

0.1897432 -0.2927030 0.2306398 0.3079774

McKelvey.Zavoina Effron

0.3228210 0.2436036

Count Adj.Count AIC Corrected.AIC

0.7333333 0.4285714 24.7947590 28.7947590

The only value which we need to concern ourselves with is the Nagelkerke value. As this value is interpreted in a way which is similar to that of the typical R-Squared value, at .308, we can conclude that this model is not a good fit for the data.
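
If the package is unavailable, the Nagelkerke value can be reproduced from the deviances reported in the model summary. The sketch below uses the rounded values printed earlier (null deviance 20.728, residual deviance 16.795, 15 observations), so it matches only up to rounding. The variable names are mine.

```r
# Values copied from the model summary (rounded as printed)
null_dev  <- 20.728   # null deviance
resid_dev <- 16.795   # residual deviance
n         <- 15       # number of observations

# For a 0/1 outcome the deviance equals -2 * log-likelihood
mcfadden   <- 1 - resid_dev / null_dev
cox_snell  <- 1 - exp((resid_dev - null_dev) / n)
nagelkerke <- cox_snell / (1 - exp(-null_dev / n))

round(c(McFadden = mcfadden, Cox.Snell = cox_snell, Nagelkerke = nagelkerke), 3)
```

Nagelkerke's statistic is the Cox and Snell value divided by its maximum attainable value, which is why it is the one read on a 0 to 1 scale.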

There are additional aspects of interpretation that can be discerned from this model, and these interpretations are similarly applicable as they pertain to the logistic regression model. For more information as to how these methods of interpretation are utilized, please consult the entry titled: Logistic Regression Analysis (Binary Categorical Variables).
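
Since the interpretation carries over from logistic regression, it can help to fit both links to the same data and view the results side by side. This is only a sketch. The object names `logitmodel` and `comparison` are mine, and the data are those of the example above.

```r
# Data from the example above
age     <- c(55, 45, 33, 22, 34, 56, 78, 47, 38, 68, 49, 34, 28, 61, 26)
obese   <- c(1, 0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0)
smoking <- c(1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 1, 0, 1, 1)
cancer  <- c(1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1, 0)
cancerdata <- data.frame(cancer, smoking, obese, age)

# Same formula, two different link functions
probitmodel <- glm(cancer ~ smoking + obese + age,
                   family = binomial(link = "probit"), data = cancerdata)
logitmodel  <- glm(cancer ~ smoking + obese + age,
                   family = binomial(link = "logit"), data = cancerdata)

# Fitted probabilities for each observation under both models
comparison <- data.frame(
  probit = fitted(probitmodel),
  logit  = fitted(logitmodel)
)
comparison$difference <- comparison$probit - comparison$logit

print(round(comparison, 3))

# AIC of each model, for a rough comparison of fit
c(probit = AIC(probitmodel), logit = AIC(logitmodel))
```

The coefficients of the two models are on different scales and should not be compared directly; the fitted probabilities are the comparable quantity.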
