In this example, we have three variables, each consisting of prior observational data.

**x <- c(27, 34, 22, 30, 17, 32, 25, 34, 46, 37)**

**y <- c(70, 80, 73, 77, 60, 93, 85, 72, 90, 85)**

**z <- c(13, 22, 18, 30, 15, 17, 20, 11, 20, 25)**

To integrate these variable sets into a model, we will use the following code:

**multiregress <- lm(y ~ x + z)**

This code creates a new object ('multiregress') that contains the fitted regression model. In this model, 'y' is the dependent variable, and 'x' and 'z' are the independent variables.
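As a minimal sketch of the same fit, the three vectors can also be stored in a data frame and passed to `lm()` through its `data` argument; `coef()` then returns the same estimates that `summary()` reports. The data frame name `obs` is only an illustrative choice, not part of the original example.

```r
# Observational data from the example above
x <- c(27, 34, 22, 30, 17, 32, 25, 34, 46, 37)
y <- c(70, 80, 73, 77, 60, 93, 85, 72, 90, 85)
z <- c(13, 22, 18, 30, 15, 17, 20, 11, 20, 25)

# Keeping the variables together in a data frame (hypothetical name 'obs')
obs <- data.frame(x = x, y = y, z = z)

# 'y' is the dependent variable; 'x' and 'z' are the independent variables
multiregress <- lm(y ~ x + z, data = obs)

# Named vector of estimates: (Intercept), x, z
coef(multiregress)
```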

To receive the output information pertinent to the model, we need to run the summary() function on it:

**summary(multiregress)**

The output produced within the console window is as follows:

*Call:*

*lm(formula = y ~ x + z)*

*Residuals:*

*Min 1Q Median 3Q Max*

*-6.4016 -5.0054 -1.7536 0.8713 14.0886*

*Coefficients:*

*Estimate Std. Error t value Pr(>|t|)*

*(Intercept) 47.1434 12.0381 3.916 0.00578 \*\**

*x 0.7808 0.3316 2.355 0.05073 .*

*z 0.3990 0.4804 0.831 0.43363*

*---*

*Residual standard error: 7.896 on 7 degrees of freedom*

*Multiple R-squared: 0.5249, Adjusted R-squared: 0.3891*

*F-statistic: 3.866 on 2 and 7 DF, p-value: 0.07394*
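Rather than reading values off the console, the figures above can be pulled from the object that `summary()` returns. The sketch below (the name `s` is arbitrary) extracts them and also recomputes Multiple R-squared by hand as 1 - SSE/SST, which should match `s$r.squared`.

```r
x <- c(27, 34, 22, 30, 17, 32, 25, 34, 46, 37)
y <- c(70, 80, 73, 77, 60, 93, 85, 72, 90, 85)
z <- c(13, 22, 18, 30, 15, 17, 20, 11, 20, 25)

multiregress <- lm(y ~ x + z)
s <- summary(multiregress)

s$r.squared       # Multiple R-squared
s$adj.r.squared   # Adjusted R-squared
s$sigma           # Residual standard error
s$fstatistic      # F value with its numerator and denominator degrees of freedom
coef(s)           # Estimate, Std. Error, t value and Pr(>|t|) for each term

# Multiple R-squared by hand: share of the variation in y explained by the model
sse <- sum(residuals(multiregress)^2)   # residual (unexplained) sum of squares
sst <- sum((y - mean(y))^2)             # total sum of squares around the mean of y
r2_manual <- 1 - sse / sst
```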

Based on these coefficient estimates, the equation we would use to predict the value of 'y' is:

**y = 0.7808x + 0.3990z + 47.1434**
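To make the equation concrete, the sketch below plugs a hypothetical new observation (x = 30, z = 20, values chosen only for illustration) into the fitted coefficients by hand and compares the result with R's `predict()`; the two should agree.

```r
x <- c(27, 34, 22, 30, 17, 32, 25, 34, 46, 37)
y <- c(70, 80, 73, 77, 60, 93, 85, 72, 90, 85)
z <- c(13, 22, 18, 30, 15, 17, 20, 11, 20, 25)

multiregress <- lm(y ~ x + z)

# Hypothetical new observation, for illustration only
new_obs <- data.frame(x = 30, z = 20)

# y = b_x * x + b_z * z + intercept, using the unrounded coefficients
b <- coef(multiregress)
manual <- b[["(Intercept)"]] + b[["x"]] * new_obs$x + b[["z"]] * new_obs$z

# The same prediction computed by R from the model object
via_predict <- predict(multiregress, newdata = new_obs)

c(manual = manual, predict = unname(via_predict))
```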

However, in investigating the results of the summary output, we observe that:

**Multiple R-squared = 0.5249**

This *can* be a high enough coefficient of determination, depending on the type of data we are observing, but the following values should raise some alarm:

**p-value: 0.07394 (> Alpha of .05)**

AND

**F-statistic: 3.866 on 2 and 7 DF**

Code:

**qf(.95, df1 = 2, df2 = 7) # critical F value at alpha = .05**

*[1] 4.737414*

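4.737414 > 3.866

Because the critical F value (4.737414) is greater than the observed F-statistic (3.866), we fail to reject the null hypothesis that the coefficients of 'x' and 'z' are both zero. This agrees with the model's p-value (0.07394) being greater than alpha (.05).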
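The same comparison can be scripted. This sketch reuses the F-statistic stored by `summary()`, computes the critical value with `qf()` and the upper-tail p-value with `pf()`, and, as a cross-check, rebuilds F from Multiple R-squared using F = (R² / k) / ((1 - R²) / (n - k - 1)), where k = 2 independent variables and n = 10 observations.

```r
x <- c(27, 34, 22, 30, 17, 32, 25, 34, 46, 37)
y <- c(70, 80, 73, 77, 60, 93, 85, 72, 90, 85)
z <- c(13, 22, 18, 30, 15, 17, 20, 11, 20, 25)

multiregress <- lm(y ~ x + z)
s <- summary(multiregress)

f_obs <- s$fstatistic[["value"]]
df1   <- s$fstatistic[["numdf"]]   # number of independent variables, k
df2   <- s$fstatistic[["dendf"]]   # n - k - 1

f_crit <- qf(0.95, df1 = df1, df2 = df2)            # critical value at alpha = .05
p_val  <- pf(f_obs, df1, df2, lower.tail = FALSE)   # P(F >= f_obs) under the null hypothesis

# Cross-check: F rebuilt from Multiple R-squared
n <- length(y)
k <- df1
f_from_r2 <- (s$r.squared / k) / ((1 - s$r.squared) / (n - k - 1))

reject_h0 <- f_obs > f_crit   # FALSE here: not significant at alpha = .05
```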

If these concepts seem foreign, please refer to the previous article.

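From the summary output, we can conclude that this model is not statistically significant at the .05 level (the coefficient for 'z' in particular has a p-value of 0.43363), so it should not be accepted or used for prediction.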

Therefore, I would recommend re-creating this model with new independent variables.

When creating multiple linear regression models, it is important to consider the values of the F-statistic and the coefficient of determination (Multiple R-squared). If variables are added to an existing regression model, or exchanged for different variables, ideally both the F-statistic and the coefficient of determination should rise, indicating that the model explains more of the variation in the dependent variable. A decline in either value would indicate otherwise. One caveat: Multiple R-squared never decreases when a variable is added, even an irrelevant one, so when comparing models with different numbers of variables, the Adjusted R-squared is the fairer measure.
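To illustrate, the sketch below adds a hypothetical extra variable `w` (made-up values, not part of the original data) to the model. Since Multiple R-squared cannot go down when a variable is added, the comparisons worth looking at are the Adjusted R-squared and the partial F-test that `anova()` performs on two nested models.

```r
x <- c(27, 34, 22, 30, 17, 32, 25, 34, 46, 37)
y <- c(70, 80, 73, 77, 60, 93, 85, 72, 90, 85)
z <- c(13, 22, 18, 30, 15, 17, 20, 11, 20, 25)
w <- c(5, 3, 8, 1, 9, 2, 7, 4, 6, 10)   # hypothetical extra variable, illustration only

base_model   <- lm(y ~ x + z)
bigger_model <- lm(y ~ x + z + w)

# Multiple R-squared never decreases when a variable is added...
c(base = summary(base_model)$r.squared, bigger = summary(bigger_model)$r.squared)

# ...while Adjusted R-squared penalizes the extra estimated coefficient
c(base = summary(base_model)$adj.r.squared, bigger = summary(bigger_model)$adj.r.squared)

# Partial F-test: does w explain significantly more variation than x and z alone?
anova(base_model, bigger_model)
```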

Moving forward, new articles will cover less complicated fundamental aspects of statistics. If you understand this article, and all prior articles, the following topics of discussion should be mastered with relative ease. Stay tuned for more content, Data Heads!
