Wednesday, May 9, 2018

(R) Probit Regression (SPSS)

The probit regression model serves the same purpose as the logistic regression model and is structured in a similar manner. It offers an alternative to logistic regression for the practitioner who wishes to pursue a different methodology.

The only difference in the construction of the two models, and it is minor at best, is the link function, and with it the size of the tails of the underlying distribution. The probit model is built on the standard normal distribution, while the logit model is built on the logistic distribution, which produces slightly flatter tails. Within an applied setting, this is essentially the only aspect which separates the two models.
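To make the tail difference concrete, the following is a minimal sketch that compares upper-tail probabilities of the normal and logistic distributions. The z values are arbitrary, and rescaling the logistic distribution to unit variance is an assumption made here purely so that the two curves are comparable.

```r
# Upper-tail probabilities P(X > z) for a few arbitrary z values
z <- c(1, 2, 3, 4)

# Standard normal distribution (underlies the probit link)
normal_tail <- pnorm(z, lower.tail = FALSE)

# Logistic distribution rescaled to unit variance (assumption for comparability;
# the standard logistic distribution has variance pi^2 / 3)
logistic_tail <- plogis(z, scale = sqrt(3) / pi, lower.tail = FALSE)

# Ratio above 1 means the logistic tail is the heavier of the two at that z
tails <- data.frame(z, normal_tail, logistic_tail,
                    ratio = logistic_tail / normal_tail)

print(tails)
```

Near the center of the distribution the two are close, and the logistic tail only becomes noticeably heavier further out, which is why the two models usually lead to similar conclusions.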

For this reason, and due to the wider adoption of the logistic regression model, I would recommend against utilizing the probit regression methodology unless you are explicitly instructed to use it.

Example (SPSS):


This is where the probit methodology would be demonstrated within the SPSS platform. However, I could not determine, from either the interface or the internet, how this would be achieved.

It would seem that I am not alone in my confusion:

http://www-01.ibm.com/support/docview.wss?uid=swg21480469

Example (R):

I will now briefly explain how to create the model within the R platform.

# Create data vectors #

age <- c(55.00, 45.00, 33.00, 22.00, 34.00, 56.00, 78.00, 47.00, 38.00, 68.00, 49.00, 34.00, 28.00, 61.00, 26.00)

obese <- c(1.00, .00, .00, .00, 1.00, 1.00, .00, 1.00, 1.00, .00, 1.00, 1.00, .00, 1.00, .00)

smoking <- c(1.00, .00, .00, 1.00, 1.00, 1.00, .00, .00, 1.00, .00, .00, 1.00, .00, 1.00, 1.00)

cancer <- c(1.00, .00, .00, 1.00, .00, 1.00, .00, .00, 1.00, 1.00, .00, 1.00, 1.00, 1.00, .00)

# Combine data vectors into a single data frame #

cancerdata <- data.frame(cancer, smoking, obese, age)

# Create Probit Model #

probitmodel <- glm(cancer ~ smoking + obese + age, family=binomial(link= "probit"), data=cancerdata)

# Generate Model Summary #

summary(probitmodel)


This produces the output:

Call:

glm(formula = cancer ~ smoking + obese + age, family = binomial(link = "probit"),
data = cancerdata)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.6239  -0.7546   0.5868   0.8184   1.8403

Coefficients:

            Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.40234    1.30773  -1.072    0.284
smoking      1.55682    0.88940   1.750    0.080 .
obese       -0.24549    0.82711  -0.297    0.767
age          0.01792    0.02413   0.743    0.458
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 20.728 on 14 degrees of freedom
Residual deviance: 16.795 on 11 degrees of freedom
AIC: 24.795

Number of Fisher Scoring iterations: 4


This output allows for the creation of the model equation:

Probit(p) = -1.40234 + (Smoking * 1.55682) + (Obese * -0.24549) + (Age * 0.01792)

Which can be implemented within the R platform as:

# Smoking #

Smoking <- 0

# Obese #

Obese <- 0

# Age #

Age <- 0

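# Linear predictor (z-score) from the probit model #

z <- -1.40234 + (Smoking * 1.55682) + (Obese * -0.24549) + (Age * 0.01792)

# Convert to a probability with the standard normal CDF (the inverse of the probit link) #

pnorm(z)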


The output is the predicted probability that the binary dependent variable takes the value 1.
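Rather than typing the coefficients by hand, the same probability can be obtained from the fitted model object. The sketch below refits the model from the data given earlier and then uses predict() with type = "response". The individual described in newcase (a 50 year old non-obese smoker) is hypothetical and chosen only for illustration. The manual calculation is included as a cross-check; for a probit model, the linear predictor is converted to a probability with pnorm().

```r
# Recreate the example data and the probit model
age <- c(55, 45, 33, 22, 34, 56, 78, 47, 38, 68, 49, 34, 28, 61, 26)
obese <- c(1, 0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0)
smoking <- c(1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 1, 0, 1, 1)
cancer <- c(1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1, 0)

cancerdata <- data.frame(cancer, smoking, obese, age)

probitmodel <- glm(cancer ~ smoking + obese + age,
                   family = binomial(link = "probit"), data = cancerdata)

# Hypothetical individual (values chosen for illustration only)
newcase <- data.frame(smoking = 1, obese = 0, age = 50)

# Predicted probability taken directly from the model object
p_predict <- predict(probitmodel, newdata = newcase, type = "response")

# The same value computed manually: linear predictor, then standard normal CDF
# (coefficient order: intercept, smoking, obese, age)
z <- sum(coef(probitmodel) * c(1, 1, 0, 50))
p_manual <- pnorm(z)

print(c(predict = unname(p_predict), manual = p_manual))
```

Working from the model object avoids transcription errors in the coefficients and keeps their full precision.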

To check for model fit, we will utilize the Nagelkerke R-Squared statistic.

# Generate Nagelkerke R Squared #

# Install (if necessary) and load the package: "BaylorEdPsych" #

# install.packages("BaylorEdPsych")

library(BaylorEdPsych)

PseudoR2(probitmodel)

This series of actions presents the following output:

        McFadden     Adj.McFadden        Cox.Snell       Nagelkerke
       0.1897432       -0.2927030        0.2306398        0.3079774

McKelvey.Zavoina           Effron
       0.3228210        0.2436036

           Count        Adj.Count              AIC    Corrected.AIC
       0.7333333        0.4285714       24.7947590       28.7947590


The only value which we need to concern ourselves with is the Nagelkerke value. As this value is interpreted in a way which is broadly similar to that of the typical R-Squared value, a value of .308 suggests that this model is not a good fit for the data.
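If the package is unavailable, the Cox and Snell and Nagelkerke values can also be computed from the deviances stored in the model object. The following is a minimal sketch using the standard formulas. It relies on the fact that, for ungrouped binary data such as this, the deviance equals -2 times the log-likelihood. The variable names cox_snell and nagelkerke are my own.

```r
# Recreate the example data and the probit model
age <- c(55, 45, 33, 22, 34, 56, 78, 47, 38, 68, 49, 34, 28, 61, 26)
obese <- c(1, 0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0)
smoking <- c(1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 1, 0, 1, 1)
cancer <- c(1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1, 0)

cancerdata <- data.frame(cancer, smoking, obese, age)

probitmodel <- glm(cancer ~ smoking + obese + age,
                   family = binomial(link = "probit"), data = cancerdata)

# Number of observations
n <- nrow(cancerdata)

# Cox and Snell: 1 - (L_null / L_model)^(2 / n), written in terms of deviances
cox_snell <- 1 - exp((probitmodel$deviance - probitmodel$null.deviance) / n)

# Nagelkerke: Cox and Snell divided by its maximum attainable value
nagelkerke <- cox_snell / (1 - exp(-probitmodel$null.deviance / n))

print(c(Cox.Snell = cox_snell, Nagelkerke = nagelkerke))
```

Because the Nagelkerke statistic rescales the Cox and Snell value so that its maximum is 1, it is always the larger of the two.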

There are additional aspects of interpretation that can be drawn from this model, and they apply in much the same way as they do to the logistic regression model. For more information as to how these methods of interpretation are utilized, please consult the entry titled: Logistic Regression Analysis (Binary Categorical Variables).
