Multivariate multiple regression is a technique, available in R through the lm() function, that fits several regression models concurrently within the same function call. The fitted models are structurally identical to those you would obtain from separate multiple regression calls; the multivariate form is simply a more convenient way to fit and compare multiple regression models that share the same predictors.


This explanation will become clearer with the following example.

__Example:__ We are given the following data vectors:

x <- c(8, 1, 4, 10, 8, 10, 3, 1, 1, 2)

y <- c(97, 56, 97, 68, 94, 66, 81, 76, 86, 69)

z <- c(188, 184, 181, 194, 190, 180, 192, 184, 192, 191)

w <- c(366, 383, 332, 331, 311, 352, 356, 351, 399, 357)

v <- c(6, 10, 6, 13, 19, 12, 11, 17, 18, 12)

With such, we are tasked with creating the following models:

y = z + w + v + (intercept)

x = z + w + v + (intercept)

This can typically be achieved with the code below:

testdataframe <- data.frame(x,y,z,w,v)

model1 <- lm(y ~ z + w + v, data = testdataframe)

model2 <- lm(x ~ z + w + v, data = testdataframe)

summary(model1)

summary(model2)

However, if we wanted to utilize the multivariate multiple regression functionality, the code would instead resemble:

testmodel <- lm(cbind(x, y) ~ z + w + v, data = testdataframe)

summary(testmodel)

Which produces the output:

Response x :

Call:

lm(formula = x ~ z + w + v, data = testdataframe)

Residuals:

Min 1Q Median 3Q Max

-3.2385 -2.6632 -0.5016 2.3280 5.5252

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) 23.78843 51.01213 0.466 0.657

z 0.07168 0.26709 0.268 0.797

w -0.08600 0.04891 -1.758 0.129

v -0.16200 0.29351 -0.552 0.601

Residual standard error: 3.741 on 6 degrees of freedom

Multiple R-squared: 0.3521, Adjusted R-squared: 0.02811

F-statistic: 1.087 on 3 and 6 DF, p-value: 0.4236

Response y :

Call:

lm(formula = y ~ z + w + v, data = testdataframe)

Residuals:

Min 1Q Median 3Q Max

-17.907 -10.900 0.074 13.085 16.669

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) 64.7431 226.6371 0.286 0.785

z 0.4171 1.1866 0.351 0.737

w -0.1636 0.2173 -0.753 0.480

v -0.4937 1.3040 -0.379 0.718

Residual standard error: 16.62 on 6 degrees of freedom

Multiple R-squared: 0.106, Adjusted R-squared: -0.341

F-statistic: 0.2371 on 3 and 6 DF, p-value: 0.8675

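As a quick check of the claim that the joint fit is structurally identical to the separate fits, the sketch below (re-using the example vectors from above) confirms that the multivariate model's per-response coefficient columns match the estimates from the two individual lm() calls:

```r
# Example data from above
x <- c(8, 1, 4, 10, 8, 10, 3, 1, 1, 2)
y <- c(97, 56, 97, 68, 94, 66, 81, 76, 86, 69)
z <- c(188, 184, 181, 194, 190, 180, 192, 184, 192, 191)
w <- c(366, 383, 332, 331, 311, 352, 356, 351, 399, 357)
v <- c(6, 10, 6, 13, 19, 12, 11, 17, 18, 12)
testdataframe <- data.frame(x, y, z, w, v)

# Joint (multivariate) fit and the two separate fits
testmodel <- lm(cbind(x, y) ~ z + w + v, data = testdataframe)
model1 <- lm(y ~ z + w + v, data = testdataframe)
model2 <- lm(x ~ z + w + v, data = testdataframe)

# Each column of the joint coefficient matrix equals a separate fit
all.equal(coef(testmodel)[, "x"], coef(model2))  # TRUE
all.equal(coef(testmodel)[, "y"], coef(model1))  # TRUE
```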

How is this functionally preferable to the more verbose alternative? Because both models co-exist within the same fitted object, subsequent extractor functions display output for both responses at once.

For example:

# Display coefficient values #

coef(testmodel)

# Display residual standard error #

sigma(testmodel)

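To make the "concurrent output" concrete, the sketch below (assuming the example vectors from above) shows the shapes these extractors return for a multivariate fit: coef() gives a matrix with one column of estimates per response, sigma() a residual standard error per response, and residuals() one column of residuals per response:

```r
# Example data from above
x <- c(8, 1, 4, 10, 8, 10, 3, 1, 1, 2)
y <- c(97, 56, 97, 68, 94, 66, 81, 76, 86, 69)
z <- c(188, 184, 181, 194, 190, 180, 192, 184, 192, 191)
w <- c(366, 383, 332, 331, 311, 352, 356, 351, 399, 357)
v <- c(6, 10, 6, 13, 19, 12, 11, 17, 18, 12)
testdataframe <- data.frame(x, y, z, w, v)
testmodel <- lm(cbind(x, y) ~ z + w + v, data = testdataframe)

coef(testmodel)            # 4 x 2 matrix: one column per response (x, y)
sigma(testmodel)           # residual standard error for each response
dim(residuals(testmodel))  # 10 x 2: one residual column per response
```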

This concludes our current article. I encourage you to experiment with this function if it suits your analytical needs. Until next time, Data Heads!
