Reflections of a Data Scientist: (R) Multivariate Multiple Regression

Multivariate multiple regression is a feature of R's lm() function that fits several regression models, each with a different response variable but the same predictors, in a single call. The internal structure of each model is no different from that of an ordinary multiple regression model fitted on its own. Simply stated, the multivariate multiple regression method is a more convenient way to fit and compare several multiple regression models at once.
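One way to see why nothing changes structurally is to write the least-squares fit in matrix form. The sketch below uses standard notation; the symbols Y, Z, B and E are introduced here for illustration only, and it assumes the design matrix has full column rank.

```latex
% Y: n x m matrix of responses (one column per response variable)
% Z: n x (p+1) design matrix (a column of ones plus p predictors)
% B: (p+1) x m matrix of coefficients, E: n x m matrix of errors
\[
  Y = Z B + E, \qquad
  \hat{B} = (Z^\top Z)^{-1} Z^\top Y
\]
% Column j of \hat{B} depends only on column j of Y, so each response
% receives exactly the estimates it would get from its own regression:
\[
  \hat{\beta}_j = (Z^\top Z)^{-1} Z^\top y_j, \qquad j = 1, \dots, m
\]
```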

This explanation will become clearer after the following example.

Example:

We are given the following data vectors:

x <- c(8, 1, 4, 10, 8, 10, 3, 1, 1, 2)

y <- c(97, 56, 97, 68, 94, 66, 81, 76, 86, 69)

z <- c(188, 184, 181, 194, 190, 180, 192, 184, 192, 191)

w <- c(366, 383, 332, 331, 311, 352, 356, 351, 399, 357)

v <- c(6, 10, 6, 13, 19, 12, 11, 17, 18, 12)

With these, we are tasked with creating the following models:

y = z + w + v + (intercept)

x = z + w + v + (intercept)

This can typically be achieved with the code below:

testdataframe <- data.frame(x,y,z,w,v)

model1 <- lm(y ~ z + w + v, data = testdataframe)

model2 <- lm(x ~ z + w + v, data = testdataframe)

summary(model1)

summary(model2)

However, if we wanted to use the multivariate multiple regression functionality, the code would instead resemble:

testmodel <- lm(cbind(x, y) ~ z + w + v, data = testdataframe)

summary(testmodel)

Which produces the output:

Response x :

Call:
lm(formula = x ~ z + w + v, data = testdataframe)

Residuals:
    Min      1Q  Median      3Q     Max
-3.2385 -2.6632 -0.5016  2.3280  5.5252

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 23.78843   51.01213   0.466    0.657
z            0.07168    0.26709   0.268    0.797
w           -0.08600    0.04891  -1.758    0.129
v           -0.16200    0.29351  -0.552    0.601

Residual standard error: 3.741 on 6 degrees of freedom
Multiple R-squared: 0.3521, Adjusted R-squared: 0.02811
F-statistic: 1.087 on 3 and 6 DF, p-value: 0.4236

Response y :

Call:
lm(formula = y ~ z + w + v, data = testdataframe)

Residuals:
    Min      1Q  Median      3Q     Max
-17.907 -10.900   0.074  13.085  16.669

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  64.7431   226.6371   0.286    0.785
z             0.4171     1.1866   0.351    0.737
w            -0.1636     0.2173  -0.753    0.480
v            -0.4937     1.3040  -0.379    0.718

Residual standard error: 16.62 on 6 degrees of freedom
Multiple R-squared: 0.106, Adjusted R-squared: -0.341
F-statistic: 0.2371 on 3 and 6 DF, p-value: 0.8675
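The two blocks of output should match what summary(model1) and summary(model2) print on their own. A minimal sketch of how one might check this for the coefficient estimates is shown here; the names combined, same_x and same_y are introduced purely for illustration.

```r
# Data vectors from the example above
x <- c(8, 1, 4, 10, 8, 10, 3, 1, 1, 2)
y <- c(97, 56, 97, 68, 94, 66, 81, 76, 86, 69)
z <- c(188, 184, 181, 194, 190, 180, 192, 184, 192, 191)
w <- c(366, 383, 332, 331, 311, 352, 356, 351, 399, 357)
v <- c(6, 10, 6, 13, 19, 12, 11, 17, 18, 12)
testdataframe <- data.frame(x, y, z, w, v)

# Separate fits and the combined multivariate fit
model1 <- lm(y ~ z + w + v, data = testdataframe)
model2 <- lm(x ~ z + w + v, data = testdataframe)
testmodel <- lm(cbind(x, y) ~ z + w + v, data = testdataframe)

# coef() on the combined fit returns a matrix with one column per response
combined <- coef(testmodel)

# Compare each column against the corresponding separate fit
# (all.equal() allows for tiny floating-point differences)
same_x <- isTRUE(all.equal(combined[, "x"], coef(model2)))
same_y <- isTRUE(all.equal(combined[, "y"], coef(model1)))
print(c(x = same_x, y = same_y))
```

If both comparisons come back TRUE, the combined call has reproduced the separate fits and has only changed how the results are packaged.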

How is this functionally preferable to the more verbose alternative? Since both models are stored in the same fitted object, subsequent function calls return output for both responses at once.

For example:

# Display coefficient values #

coef(testmodel)

# Display residual standard error #

sigma(testmodel)
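To make the shape of that concurrent output concrete, here is a short sketch of what these calls return and how a single response can be picked out. It assumes a recent version of R, in which sigma() returns one value per response for this kind of fit; the variable names coef_matrix, x_coefs, res_se and res are illustrative only.

```r
# Data and combined fit from the example above
x <- c(8, 1, 4, 10, 8, 10, 3, 1, 1, 2)
y <- c(97, 56, 97, 68, 94, 66, 81, 76, 86, 69)
z <- c(188, 184, 181, 194, 190, 180, 192, 184, 192, 191)
w <- c(366, 383, 332, 331, 311, 352, 356, 351, 399, 357)
v <- c(6, 10, 6, 13, 19, 12, 11, 17, 18, 12)
testdataframe <- data.frame(x, y, z, w, v)
testmodel <- lm(cbind(x, y) ~ z + w + v, data = testdataframe)

# The fitted object carries the "mlm" class in addition to "lm"
print(class(testmodel))

# Coefficients: one row per term, one column per response
coef_matrix <- coef(testmodel)
print(dim(coef_matrix))

# Pull out the estimates for a single response by column name
x_coefs <- coef_matrix[, "x"]
print(x_coefs)

# Residual standard errors: one value per response
res_se <- sigma(testmodel)
print(res_se)

# Residuals: one row per observation, one column per response
res <- residuals(testmodel)
print(dim(res))
```

Indexing by column name keeps later code readable when more responses are added to the cbind() call.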

This concludes our current article. I encourage you to experiment with this function if it suits your experimental needs. Until next time, Data Heads!

Friday, April 20, 2018
