A general linear mixed model is a modeling methodology used to build linear models that account for categorical grouping variables. It does so by fitting a model whose intercept varies across the levels of those variables. One or several such variables can be included in the model design; when more than one is included, they are assessed in nested combinations of categories.
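In standard notation, the random-intercept model fitted later in this post can be sketched as follows (the symbols mirror the post's variables: x is the response, y and z are predictors, and the grouping variable w indexes the groups i; this notation is added for illustration and does not appear in the original post):

```latex
x_{ij} = \beta_0 + b_i + \beta_1\, y_{ij} + \beta_2\, z_{ij} + \varepsilon_{ij},
\qquad b_i \sim N(0,\ \sigma_b^2), \qquad \varepsilon_{ij} \sim N(0,\ \sigma^2)
```

Each group i receives its own intercept, the fixed intercept plus that group's deviation, while the slope coefficients are shared by all groups.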

__Example:__ We will be utilizing the following data vectors:

**w <- c(366, 383, 332, 331, 311, 352, 356, 351, 399, 357)**

v <- c(6, 10, 6, 13, 19, 12, 11, 17, 18, 12)

x <- c(8, 1, 4, 10, 8, 10, 3, 1, 1, 2)

y <- c(97, 56, 97, 68, 94, 66, 81, 76, 86, 69)

z <- c(188, 184, 181, 194, 190, 180, 192, 184, 192, 191)

These vectors can be combined to create a single data frame:

**model <- data.frame(v, w, x, y, z)**

If we were to create a simple linear model, the following code could be applied:

**lmmodeltest <- lm(x ~ y + z, data=model)**

summary(lmmodeltest)

Which produces the output:

Call:

lm(formula = x ~ y + z, data = model)

Residuals:

Min 1Q Median 3Q Max

-4.027 -2.882 -1.643 2.668 5.623

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) 2.587008 53.572275 0.048 0.963

y 0.034052 0.099425 0.342 0.742

z -0.002543 0.285784 -0.009 0.993

Residual standard error: 4.267 on 7 degrees of freedom

Multiple R-squared: 0.01653, Adjusted R-squared: -0.2645

F-statistic: 0.05883 on 2 and 7 DF, p-value: 0.9433

Now, we will produce a general linear mixed model which accounts for the variable “w”.

(This variable, which the model accounts for but does not include as an independent variable, is commonly referred to as a “random effect,” or grouping variable.)

**# You will need to install and load the package “nlme”: install.packages("nlme"); library(nlme) #**

mmodeltest1 <- lme(x ~ y + z, data=model, random=~1|w)

summary(mmodeltest1)

This produces the output:

Linear mixed-effects model fit by REML

Data: model

AIC BIC logLik

65.41286 65.14241 -27.70643

Random effects:

Formula: ~1 | w

*(Intercept) Residual*

*StdDev: 3.995423 1.498283*

Fixed effects: x ~ y + z

Value Std.Error DF t-value p-value

(Intercept) 2.5870077 53.57227 7 0.0482900 0.9628

y 0.0340519 0.09942 7 0.3424887 0.7420

z -0.0025432 0.28578 7 -0.0088991 0.9931

Correlation:

(Intr) y

y -0.066

z -0.989 -0.081

Standardized Within-Group Residuals:

Min Q1 Med Q3 Max

-0.3313798 -0.2371629 -0.1352219 0.2195812 0.4627224

Number of Observations: 10

Number of Groups: 10

At a glance, the two model summaries look nearly identical. What differs between the two models are the coefficient values returned by coef(): it is these values that determine which intercept applies to each category.

For example, the values for the intercept of the original linear model are:

**coef(lmmodeltest)**

Output:

(Intercept) y z

2.587007686 0.034051914 -0.002543224

However, the values for the general linear mixed model are:

**coef(mmodeltest1)**

Output:

*(Intercept) y z*

*311 4.95003237 0.03405191 -0.002543224*

*331 7.48857278 0.03405191 -0.002543224*

*332 1.33355478 0.03405191 -0.002543224*

*351 -0.66296480 0.03405191 -0.002543224*

*352 7.51706478 0.03405191 -0.002543224*

*356 0.95902860 0.03405191 -0.002543224*

*357 0.43833139 0.03405191 -0.002543224*

*366 4.85601183 0.03405191 -0.002543224*

*383 -0.06589015 0.03405191 -0.002543224*

*399 -0.94366472 0.03405191 -0.002543224*

As you can observe, the intercept values, which are indexed by category, differ, while the coefficient values of the independent variables remain constant. This is what allows the model to adapt its predictions to each category.
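Under the hood, each category's intercept is the fixed intercept plus that group's estimated random deviation. A minimal sketch that reproduces this relationship with the vectors from this post (fixef() and ranef() are nlme accessor functions):

```r
library(nlme)  # provides lme(), fixef(), ranef(), coef()

w <- c(366, 383, 332, 331, 311, 352, 356, 351, 399, 357)
x <- c(8, 1, 4, 10, 8, 10, 3, 1, 1, 2)
y <- c(97, 56, 97, 68, 94, 66, 81, 76, 86, 69)
z <- c(188, 184, 181, 194, 190, 180, 192, 184, 192, 191)
model <- data.frame(w, x, y, z)

fit <- lme(x ~ y + z, data = model, random = ~1 | w)

# Group-specific intercept = fixed intercept + that group's random deviation
intercepts <- fixef(fit)["(Intercept)"] + ranef(fit)[, "(Intercept)"]
```

The values in `intercepts` match the per-category intercepts shown by coef() above.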

For example, if we were to apply the simple linear model to the first observation of the original data set, the outcome would be:

**yvalue <- 97**

zvalue <- 188

interceptvalue <- 2.5870077

xvalue <- (yvalue * 0.034052) + (zvalue * -0.002543) + interceptvalue

xvalue

Which would produce the output:

*[1] 5.41192*

The actual value of x in the initial set was:

8
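Hand-computing the prediction with rounded coefficients works, but R's built-in predict() does the same arithmetic without rounding error. A quick sketch using the vectors from this post:

```r
# Rebuild the simple linear model from the post's data
x <- c(8, 1, 4, 10, 8, 10, 3, 1, 1, 2)
y <- c(97, 56, 97, 68, 94, 66, 81, 76, 86, 69)
z <- c(188, 184, 181, 194, 190, 180, 192, 184, 192, 191)

lmmodeltest <- lm(x ~ y + z)

# Predicted x for the first observation (y = 97, z = 188)
p <- predict(lmmodeltest, newdata = data.frame(y = 97, z = 188))
```

This returns approximately 5.412, matching the manual calculation above.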

If we were to apply the general linear mixed model to the same observation, the outcome would be:

yvalue <- 97

zvalue <- 188

interceptvalue <- 4.85601183

xvalue <- (yvalue * .0340519) + (zvalue * -0.0025432) + interceptvalue

xvalue

Which would produce the output:

*[1] 7.680925*

Which is closer to the actual value of x:

8
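The mixed-model prediction can also be produced with predict(); for an lme fit, the newdata frame must include the grouping variable so the correct group intercept is used. A sketch with the post's data (predict.lme defaults to the innermost grouping level):

```r
library(nlme)

w <- c(366, 383, 332, 331, 311, 352, 356, 351, 399, 357)
x <- c(8, 1, 4, 10, 8, 10, 3, 1, 1, 2)
y <- c(97, 56, 97, 68, 94, 66, 81, 76, 86, 69)
z <- c(188, 184, 181, 194, 190, 180, 192, 184, 192, 191)
model <- data.frame(w, x, y, z)

mmodeltest1 <- lme(x ~ y + z, data = model, random = ~1 | w)

# Group-level prediction for the first observation (which belongs to w = 366)
p <- predict(mmodeltest1, newdata = data.frame(y = 97, z = 188, w = 366))
```

This returns approximately 7.681, the same value as the manual calculation.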

To produce a general linear mixed model which accounts for both the variables “w” and “v”, the following code can be utilized:

**mmodeltest2 <- lme(x ~ y + z, data=model, random=~1|w/v)**

summary(mmodeltest2)

This code produces the output:

*Linear mixed-effects model fit by REML*

Data: model

AIC BIC logLik

*67.41286 67.08832 -27.70643*

Random effects:

Formula: ~1 | w

(Intercept)

StdDev: 2.916513

Formula: ~1 | v %in% w

(Intercept) Residual

StdDev: 2.916513 1.093692

Fixed effects: x ~ y + z

Value Std.Error DF t-value p-value

(Intercept) 2.5870077 53.57227 7 0.0482900 0.9628

y 0.0340519 0.09942 7 0.3424887 0.7420

z -0.0025432 0.28578 7 -0.0088991 0.9931

Correlation:

(Intr) y

y -0.066

z -0.989 -0.081

Standardized Within-Group Residuals:

Min Q1 Med Q3 Max

-0.24189513 -0.17312025 -0.09870703 0.16028629 0.33777045

Number of Observations: 10

Number of Groups:

w v %in% w

10 10

With the model fitted, we can again generate the coefficient values with the following code:

**coef(mmodeltest2)**

Which produces the output:

*(Intercept) y z*

311/19 5.1052676 0.03405191 -0.002543224

331/13 7.8105734 0.03405191 -0.002543224

332/6 1.2512112 0.03405191 -0.002543224

351/17 -0.8764666 0.03405191 -0.002543224

352/12 7.8409371 0.03405191 -0.002543224

356/11 0.8520811 0.03405191 -0.002543224

357/12 0.2971775 0.03405191 -0.002543224

366/6 5.0050705 0.03405191 -0.002543224

383/10 -0.2401681 0.03405191 -0.002543224

399/18 -1.1756067 0.03405191 -0.002543224

Each combination of the categorical variables, along with its corresponding intercept value, appears in the output above. If the model were used to assess new observations generated from the same system of observation, the cross-tabulated categorical values would determine which intercept is applied.

The kind of categorical data that typically serves as the “random” element of the model includes variables such as gender, race, ethnicity, and zip code.
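For instance, a grouping factor like gender would be supplied to lme() in exactly the same way. This is an illustrative sketch with made-up gender labels attached to the post's numeric vectors, not data from the original example:

```r
library(nlme)

# Hypothetical grouping factor (made up for illustration)
gender <- factor(c("F", "M", "F", "M", "F", "M", "F", "M", "F", "M"))
x <- c(8, 1, 4, 10, 8, 10, 3, 1, 1, 2)
y <- c(97, 56, 97, 68, 94, 66, 81, 76, 86, 69)
z <- c(188, 184, 181, 194, 190, 180, 192, 184, 192, 191)
d <- data.frame(gender, x, y, z)

fit <- lme(x ~ y + z, data = d, random = ~1 | gender)

coef(fit)  # one row of coefficients per gender level
```

Because gender has two levels, coef() returns two rows: one intercept per group, with shared slope coefficients, just as with the "w" grouping above.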

That’s all for now, Data Heads! Thanks for visiting! I appreciate your patronage.
