## Wednesday, May 9, 2018

### (R) Probit Regression (SPSS)

The probit regression model serves the same function as the logistic regression model and is structured in a similar manner. It provides an alternative to logistic regression if the practitioner wishes to pursue a differing methodology.

The only difference between the two models, and it is minor at best, lies in the tails of the distributions on which each model is built. The logit model is based on the logistic distribution, which has slightly flatter (heavier) tails than the normal distribution underlying the probit model. This is the sole aspect which separates the two models when utilized within an applied setting.
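To make the difference in tails concrete, here is a minimal sketch which compares the two cumulative distribution functions at a few points in the lower tail. Rescaling the logistic distribution to unit variance (scale = sqrt(3) / pi) is my own choice for a like-for-like comparison; it is not part of either model's fitting procedure, and the evaluation points are arbitrary.

```r
# The logistic variance is (pi^2 / 3) * scale^2, so this scale gives variance 1,
# matching the standard normal distribution used by the probit link.
unit_scale <- sqrt(3) / pi

# Arbitrary evaluation points in the lower tail (plus the center)
z <- c(-4, -3, -2, 0)

tails <- data.frame(
  z      = z,
  probit = pnorm(z),                     # standard normal CDF
  logit  = plogis(z, scale = unit_scale) # unit-variance logistic CDF
)

# A ratio above 1 means the logistic distribution places more mass in the tail
tails$ratio <- tails$logit / tails$probit

print(tails)
```

Far from the center the logistic probabilities are the larger of the two, while at zero both functions return 0.5.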

For this reason, and given the wider adoption of the logistic regression model, I would recommend against utilizing the probit regression methodology unless explicitly instructed to use it.

Example (SPSS):

This is where the probit methodology would be demonstrated within the SPSS platform. However, I have been unable to determine, from either the interface or the internet, how this would be achieved.

It would seem that I am not alone in my confusion:

http://www-01.ibm.com/support/docview.wss?uid=swg21480469

Example (R):

I will now briefly explain how to create the model within the R platform.

# Create data vectors #

age <- c(55.00, 45.00, 33.00, 22.00, 34.00, 56.00, 78.00, 47.00, 38.00, 68.00, 49.00, 34.00, 28.00, 61.00, 26.00)

obese <- c(1.00, .00, .00, .00, 1.00, 1.00, .00, 1.00, 1.00, .00, 1.00, 1.00, .00, 1.00, .00)

smoking <- c(1.00, .00, .00, 1.00, 1.00, 1.00, .00, .00, 1.00, .00, .00, 1.00, .00, 1.00, 1.00)

cancer <- c(1.00, .00, .00, 1.00, .00, 1.00, .00, .00, 1.00, 1.00, .00, 1.00, 1.00, 1.00, .00)

# Combine data vectors into a single data frame #

cancerdata <- data.frame(cancer, smoking, obese, age)

# Create Probit Model #

probitmodel <- glm(cancer ~ smoking + obese + age, family=binomial(link= "probit"), data=cancerdata)

# Generate Model Summary #

summary(probitmodel)

This produces the output:

Call:

glm(formula = cancer ~ smoking + obese + age, family = binomial(link = "probit"),
data = cancerdata)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.6239  -0.7546   0.5868   0.8184   1.8403

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.40234    1.30773  -1.072    0.284
smoking      1.55682    0.88940   1.750    0.080 .
obese       -0.24549    0.82711  -0.297    0.767
age          0.01792    0.02413   0.743    0.458
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 20.728 on 14 degrees of freedom
Residual deviance: 16.795 on 11 degrees of freedom
AIC: 24.795

Number of Fisher Scoring iterations: 4

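These coefficients give the model equation:

Probit(p) = -1.40234 + (Smoking * 1.55682) + (Obese * -0.24549) + (Age * 0.01792)

Because this is a probit model, the left-hand side is the inverse of the standard normal cumulative distribution function applied to p, not the log-odds. The equation can be implemented within the R platform as: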

# Smoking #

Smoking <- 0

# Obese #

Obese <- 0

# Age #

Age <- 0

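# Linear predictor on the probit scale #

z <- -1.40234 + (Smoking * 1.55682) + (Obese * -0.24549) + (Age * 0.01792)

# Convert to a probability with the standard normal CDF (plogis() would only be correct for a logit model) #

pnorm(z)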

The output is the predicted probability that the binary dependent variable equals 1 (here, cancer = 1) for the supplied values.
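As a cross-check on the hand calculation, the same probability can be obtained directly from the fitted model with predict(). The following is a minimal sketch which refits the model so that it runs on its own; the values in newcase are hypothetical and chosen only for illustration.

```r
# Recreate the data and refit the probit model
age <- c(55, 45, 33, 22, 34, 56, 78, 47, 38, 68, 49, 34, 28, 61, 26)
obese <- c(1, 0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0)
smoking <- c(1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 1, 0, 1, 1)
cancer <- c(1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1, 0)

cancerdata <- data.frame(cancer, smoking, obese, age)

probitmodel <- glm(cancer ~ smoking + obese + age,
                   family = binomial(link = "probit"),
                   data = cancerdata)

# Hypothetical individual (illustrative values only)
newcase <- data.frame(smoking = 1, obese = 0, age = 50)

# Linear predictor on the probit scale, then the standard normal CDF
eta <- predict(probitmodel, newdata = newcase, type = "link")
p_manual <- pnorm(eta)

# predict() applies the inverse link itself when type = "response"
p_predict <- predict(probitmodel, newdata = newcase, type = "response")

print(c(manual = unname(p_manual), predict = unname(p_predict)))
```

Both values should agree, since type = "response" applies the inverse of the probit link, which is the standard normal cumulative distribution function.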

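To check model fit, we will utilize the Nagelkerke R-Squared statistic. Note that PseudoR2() is not part of base R. The output format shown below matches that of the BaylorEdPsych package, so that package is assumed here.

# Load the package which provides PseudoR2() (assumed: BaylorEdPsych) #

library(BaylorEdPsych)

# Generate Nagelkerke R Squared #

PseudoR2(probitmodel)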

This produces the following output:

McFadden         Adj.McFadden        Cox.Snell        Nagelkerke
0.1897432            -0.2927030             0.2306398        0.3079774

McKelvey.Zavoina           Effron
0.3228210                     0.2436036

The Nagelkerke R-Squared value is 0.3079774.
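If an add-on package is not available, the likelihood-based statistics (McFadden, Cox and Snell, and Nagelkerke) can be sketched in base R from the model's log-likelihoods. This is a minimal illustration using the standard textbook formulas, not the package's own source code; the variable names are my own.

```r
# Recreate the data and refit the probit model
age <- c(55, 45, 33, 22, 34, 56, 78, 47, 38, 68, 49, 34, 28, 61, 26)
obese <- c(1, 0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0)
smoking <- c(1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 1, 0, 1, 1)
cancer <- c(1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1, 0)

cancerdata <- data.frame(cancer, smoking, obese, age)

probitmodel <- glm(cancer ~ smoking + obese + age,
                   family = binomial(link = "probit"),
                   data = cancerdata)

n <- nrow(cancerdata)

# For ungrouped 0/1 outcomes the deviance equals -2 * log-likelihood
ll_null <- -probitmodel$null.deviance / 2   # intercept-only model
ll_full <- as.numeric(logLik(probitmodel))  # fitted model

# McFadden: proportional improvement in log-likelihood
mcfadden <- 1 - ll_full / ll_null

# Cox and Snell: based on the likelihood ratio, bounded below 1
cox_snell <- 1 - exp((2 / n) * (ll_null - ll_full))

# Nagelkerke: Cox and Snell rescaled by its maximum attainable value
nagelkerke <- cox_snell / (1 - exp((2 / n) * ll_null))

print(c(McFadden = mcfadden, Cox.Snell = cox_snell, Nagelkerke = nagelkerke))
```

The printed values can be compared against the corresponding entries in the PseudoR2() output.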