Week 12 Logistic Mixed-Effects Regression

This week, we will learn more about mixed-effects regressions and we will extend what we learned last week by fitting a logistic mixed-effects model.

Preparation and session set up

For this week’s content, we need to install certain packages from an R library so that the scripts shown below are executed without errors. hence, before turning to the code below, please install the packages by running the code below this paragraph - it may take some time (between 1 and 5 minutes to install all of the libraries so you do not need to worry if it takes some time).

# install
install.packages("Boruta")
install.packages("car")
install.packages("emmeans")
install.packages("effects")
install.packages("flextable")
install.packages("ggplot2")
install.packages("ggpubr")
install.packages("Hmisc")
install.packages("knitr")
install.packages("lme4")
install.packages("MASS")
install.packages("mclogit")
install.packages("MuMIn")
install.packages("nlme")
install.packages("ordinal")
install.packages("rms")
install.packages("robustbase")
install.packages("sjPlot")
install.packages("stringr")
install.packages("tibble")
install.packages("dplyr")
install.packages("vcd")
install.packages("vip")
# install klippy for copy-to-clipboard button in code chunks
install.packages("remotes")
remotes::install_github("rlesur/klippy")

Now that we have installed the packages, we activate them as shown below.

# set options
options(stringsAsFactors = F)          # no automatic data transformation
options("scipen" = 100, "digits" = 12) # suppress math annotation
# load packages
#library(Boruta)
library(car)
library(effects)
library(emmeans)
library(flextable)
library(ggfortify)
library(ggplot2)
library(ggpubr)
library(Hmisc)
library(knitr)
library(lme4)
library(MASS)
#library(mclogit)
library(MuMIn)
library(nlme)
#library(ordinal)
library(rms)
#library(robustbase)
library(sjPlot)
library(stringr)
library(tibble)
library(vcd)
library(vip)
# activate klippy for copy-to-clipboard button
klippy::klippy()

12.1 Introduction

We now turn to an extension of binomial logistic regression: mixed-effects binomial logistic regression. As is the case with linear mixed-effects models logistic mixed effects models have the following advantages over simpler statistical tests:

Mixed-effects models are multivariate, i.e. they test the effect of several predictors simultaneously while controlling for the effect of all other predictors.
Mixed models allow to statistically incorporate within-speaker variability and are thus fit to model hierarchical or nested data structures. This applies if several observations are produced by an individual speaker, for instance.
Mixed-models provide a wealth of diagnostic statistics which enables us to control e.g. multicollinearity, i.e. correlations between predictors, and to test whether conditions or requirements are violated (e.g. homogeneity of variance, etc.).

Major disadvantages of regression modeling are that they are prone to producing high \(\beta\)-errors (see Johnson 2009) and that they require rather large data sets.

12.2 Basic concepts

As is the case with linear mixed-effects models, binomial logistic mixed-effect models are multivariate analyses that treat data points as hierarchical or grouped in some way. In other words, they take into account that the data is nested in the sense that data points are produced by the same speaker or are grouped by some other characteristics. In mixed-models, hierarchical structures are modelled as random effects. If the random effect structure represents speakers then this means that a mixed-model would have a separate intercept and/or slope for each speaker.

Random Effects in linear models can be visualized using two parameters: the intercept (the point where the regression line crosses the y-axis at x = 0) and the slope (the acclivity of the regression line). In contrast to linear mixed-effects models, random effects differ in the position and the slope of the logistic function that is applied to the likelihood of the dependent variable. random intercepts (center left panel \(\ref{fig:mem02}\)) or various random slopes (center right panel \(\ref{fig:mem02}\)), or both, various random intercepts and various random slopes (right panel \(\ref{fig:mem02}\)). In the following, we will only focus on models with random intercepts because this is the by far more common method and because including both random intercepts and random slopes requires huge amounts of data. Consider the Figure below to understand what is meant by “random intercepts”.

The upper left panel merely shows the logistic curve representing the predictions of a fixed-effects logistic regression with a single intercept and slope. The upper right panel shows the logistic curves representing the predictions of a of a mixed-effects logistic regression with random intercepts for each level of a grouping variable. The lower left panel shows the logistic curves representing the predictions of a mixed-effects logistic regression with one intercept but random slopes for each level of a grouping variable. The lower right panel shows the logistic curves representing the predictions of a mixed-effects logistic regression with random intercepts and random slopes for each level of a grouping variable.

After adding random intercepts, predictors (or fixed effects) are added to the model (just like with multiple regression). So mixed-effects are called mixed-effects because they contain both random and fixed effects.

In terms of general procedure, random effects are added first, and only after we have ascertained that including random effects is warranted, we test whether including fixed-effects is warranted (Field, Miles, and Field 2012). We test whether including random effects is warranted by comparing a model, that bases its estimates of the dependent variable solely on the base intercept, with a model that bases its estimates of the dependent variable solely on the intercepts of the random effect. If the mixed-effects model explains significantly more variance than the fixed-effects model without random effect structure, then we continue with the mixed-effects model. In other words, including random effects is justified.

12.3 Example: Discourse LIKE in Irish English

In this example we will investigate which factors correlate with the use of final discourse like (e.g. “The weather is shite, like!”) in Irish English. The data set represents speech units in a corpus that were coded for the speaker who uttered a given speech unit, the gender (Gender: Men versus Women) and age of that speaker (Age: Old versus Young), whether the interlocutors were of the same or a different gender (ConversationType: SameGender versus MixedGender), and whether another final discourse like had been used up to three speech units before (Priming: NoPrime versus Prime), whether or not the speech unit contained an final discourse like (SUFLike: 1 = yes, 0 = no. To begin with, we load the data and inspect the structure of the data set,

# load data
mblrdata  <- base::readRDS(url("https://slcladal.github.io/data/mbd.rda", "rb"))

Table 12.1: First 15 rows of the mblrdata.
ID	Gender	Age	ConversationType	Priming	SUFlike
S1A-061$C	Women	Young	MixedGender	NoPrime	0
S1A-023$B	Women	Young	MixedGender	NoPrime	0
S1A-054$A	Women	Young	SameGender	NoPrime	0
S1A-090$B	Women	Young	MixedGender	NoPrime	0
S1A-009$B	Women	Old	SameGender	Prime	0
S1A-085$E	Men	Young	MixedGender	Prime	1
S1A-003$C	Women	Young	MixedGender	NoPrime	1
S1A-084$C	Women	Young	SameGender	NoPrime	0
S1A-076$A	Women	Young	SameGender	NoPrime	0
S1A-083$D	Men	Old	MixedGender	NoPrime	1
S1A-068$A	Women	Young	SameGender	NoPrime	0
S1A-066$B	Women	Young	SameGender	NoPrime	0
S1A-061$A	Men	Old	MixedGender	NoPrime	1
S1A-049$A	Women	Young	SameGender	NoPrime	0
S1A-022$B	Women	Young	MixedGender	NoPrime	0

As all variables except for the dependent variable (SUFlike) are character strings, we factorize the independent variables.

# def. variables to be factorized
vrs <- c("ID", "Age", "Gender", "ConversationType", "Priming")
# def. vector with variables
fctr <- which(colnames(mblrdata) %in% vrs)     
# factorize variables
mblrdata[,fctr] <- lapply(mblrdata[,fctr], factor)
# relevel Age (Young = Reference)
mblrdata$Age <- relevel(mblrdata$Age, "Young")
# order data by ID
mblrdata <- mblrdata %>%
  dplyr::arrange(ID)

Table 12.2: First 15 rows of the mblrdata arranged by ID.
ID	Gender	Age	ConversationType	Priming	SUFlike
S1A-001$A	Men	Old	SameGender	NoPrime	0
S1A-001$A	Men	Old	SameGender	NoPrime	0
S1A-001$A	Men	Old	SameGender	NoPrime	0
S1A-001$A	Men	Old	SameGender	NoPrime	0
S1A-001$A	Men	Old	SameGender	NoPrime	0
S1A-001$A	Men	Old	SameGender	NoPrime	0
S1A-001$A	Men	Old	SameGender	NoPrime	0
S1A-001$A	Men	Old	SameGender	NoPrime	0
S1A-001$B	Women	Old	MixedGender	NoPrime	0
S1A-001$B	Women	Old	MixedGender	NoPrime	0
S1A-001$B	Women	Old	MixedGender	NoPrime	0
S1A-001$B	Women	Old	MixedGender	Prime	0
S1A-001$B	Women	Old	MixedGender	NoPrime	1
S1A-001$B	Women	Old	MixedGender	NoPrime	0
S1A-001$B	Women	Old	MixedGender	NoPrime	0

Before continuing, a few words about the minimum number of random effect levels and the minimum number of observations per random effect level are in order.

While many data points per random variable level increases statistical power and thus to more robust estimates of the random effects (Austin and Leckie 2018), it has been shown that small numbers of observations per random effect variable level do not cause serious bias and it does not negatively affect the estimates of the fixed-effects coefficients (Bell, Ferron, and Kromrey 2008; Clarke 2008; Clarke and Wheaton 2007; Maas and Hox 2005). The minimum number of observations per random effect variable level is therefore 1.

In simulation study, (Bell, Ferron, and Kromrey 2008) tested the impact of random variable levels with only a single observation ranging from 0 to 70 percent. As long as there was a relatively high number of random effect variable levels (500 or more), small numbers of observations had almost no impact on bias and Type 1 error control.

We now plot the data to inspect the relationships within the data set.

ggplot(mblrdata, aes(Gender, SUFlike, color = Priming)) +
  facet_wrap(Age~ConversationType) +
  stat_summary(fun = mean, geom = "point") +
  stat_summary(fun.data = mean_cl_boot, geom = "errorbar", width = 0.2) +
  theme_set(theme_bw(base_size = 10)) +
  theme(legend.position = "top") +
  labs(x = "", y = "Observed Probabilty of discourse like") +
  scale_color_manual(values = c("gray20", "gray70"))

The upper left panel in the Figure above indicates that men use discourse like more frequently than women. The center right panel suggests that priming significantly increases the likelihood of discourse like being used. The center left panel suggests that speakers use discourse like more frequently in mixed-gender conversations. However, the lower right panel indicates an interaction between gender and conversation type as women appear to use discourse like less frequently in same gender conversations while the conversation type does not seem to have an effect on men. After visualizing the data, we will now turn to the model building process.

12.4 Model Building

In a first step, we set the options.

# set options
options(contrasts  =c("contr.treatment", "contr.poly"))
mblrdata.dist <- datadist(mblrdata)
options(datadist = "mblrdata.dist")

In a next step, we generate a base-line mixed-model using the glmer function with a random intercept for ID.

# base-line mixed-model
m0.glmer = glmer(SUFlike ~ (1|ID), data = mblrdata, family = binomial)

12.5 Model Fitting

The next step is to fit the model which means that we aim to find the “best” model, i.e. the minimal adequate model. In this case, we will use a manual step-wise step-up, forward elimination procedure. Before we begin with the model fitting process we need to add ´control = glmerControl(optimizer = “bobyqa”)´ to avoid unnecessary failures to converge.

m0.glmer <- glmer(SUFlike ~ 1+ (1|ID), family = binomial, data = mblrdata, control=glmerControl(optimizer="bobyqa"))

During each step of the fitting procedure, we test whether certain assumptions on which the model relies are violated. To avoid incomplete information (a combination of variables does not occur in the data), we tabulate the variables we intend to include and make sure that all possible combinations are present in the data. Including variables although not all combinations are present in the data would lead to unreliable models that report (vastly) inaccurate results. A special case of incomplete information is complete separation which occurs if one predictor perfectly explains an outcome (in that case the incomplete information would be caused by a level of the dependent variable). In addition, we make sure that the VIFs do not exceed a maximum of 3 for main effects (Zuur, Ieno, and Elphick 2010) - Booth GD (1994) suggest that VIFs should ideally be lower than 3 for as higher values would indicate multicollinearity and thus that the model is unstable. The value of 3 should be taken with a pinch of salt because there is no clear consensus about what the maximum VIF for interactions should be or if it should be considered at all. The reason is that we would, of course, expect the VIFs to increase when we are dealing with interactions as the main effects that are part of the interaction are very likely to correlate with the interaction itself. However, if the VIFs are too high, then this will still cause the issues with the attribution of variance. The value of 3 was chosen based on recommendations in the standard literature on multicollinearity (Zuur et al. 2009; Neter, Wasserman, and Kutner 1990). Only once we have confirmed that the incomplete information, complete separation, and multicollinearity are not a major concern, we generate the more saturated model and test whether the inclusion of a predictor leads to a significant reduction in residual deviance. If the predictor explains a significant amount of variance, it is retained in the model while being disregarded in case it does not explain a sufficient quantity of variance.

# add Priming
ifelse(min(ftable(mblrdata$Priming, mblrdata$SUFlike)) == 0, "incomplete information", "okay")

## [1] "okay"

m1.glmer <- update(m0.glmer, .~.+Priming)
anova(m1.glmer, m0.glmer, test = "Chi")

## Data: mblrdata
## Models:
## m0.glmer: SUFlike ~ 1 + (1 | ID)
## m1.glmer: SUFlike ~ (1 | ID) + Priming
##          npar         AIC         BIC       logLik    deviance     Chisq Df
## m0.glmer    2 1828.492271 1839.694076 -912.2461355 1824.492271             
## m1.glmer    3 1702.773341 1719.576048 -848.3866704 1696.773341 127.71893  1
##                      Pr(>Chisq)    
## m0.glmer                           
## m1.glmer < 0.000000000000000222 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Since the tests do not show problems relating to incomplete information, because including Priming significantly improves the model fit (decrease in AIC and BIC values), and since it correlates significantly with our dependent variable, we include Priming into our model.

# add Age
ifelse(min(ftable(mblrdata$Age, mblrdata$SUFlike)) == 0, "incomplete information", "okay")

## [1] "okay"

m2.glmer <- update(m1.glmer, .~.+ Age)
ifelse(max(car::vif(m2.glmer)) <= 3,  "VIFs okay", "VIFs unacceptable")

## [1] "VIFs okay"

anova(m2.glmer, m1.glmer, test = "Chi")

## Data: mblrdata
## Models:
## m1.glmer: SUFlike ~ (1 | ID) + Priming
## m2.glmer: SUFlike ~ (1 | ID) + Priming + Age
##          npar         AIC         BIC       logLik    deviance   Chisq Df
## m1.glmer    3 1702.773341 1719.576048 -848.3866704 1696.773341           
## m2.glmer    4 1704.210790 1726.614400 -848.1053950 1696.210790 0.56255  1
##          Pr(>Chisq)
## m1.glmer           
## m2.glmer    0.45323

Anova(m2.glmer, test = "Chi")

## Analysis of Deviance Table (Type II Wald chisquare tests)
## 
## Response: SUFlike
##             Chisq Df           Pr(>Chisq)    
## Priming 129.51634  1 < 0.0000000000000002 ***
## Age       0.56943  1              0.45049    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The ANOVAs show that Age is not significant and the first ANOVA also shows that the BIC has increased which indicates that Age does not decrease variance. In such cases, the variable should not be included.

However, if the second ANOVA would report Age as being marginally significant, a case could be made for including it but it would be better to change the ordering in which predictors are added to the model. This is, however, just a theoretical issue here as Age is clearly not significant.

# add Gender
ifelse(min(ftable(mblrdata$Gender, mblrdata$SUFlike)) == 0, "incomplete information", "okay")

## [1] "okay"

m3.glmer <- update(m1.glmer, .~.+Gender)
ifelse(max(car::vif(m3.glmer)) <= 3,  "VIFs okay", "VIFs unacceptable")

## [1] "VIFs okay"

anova(m3.glmer, m1.glmer, test = "Chi")

## Data: mblrdata
## Models:
## m1.glmer: SUFlike ~ (1 | ID) + Priming
## m3.glmer: SUFlike ~ (1 | ID) + Priming + Gender
##          npar         AIC         BIC       logLik    deviance    Chisq Df
## m1.glmer    3 1702.773341 1719.576048 -848.3866704 1696.773341            
## m3.glmer    4 1679.397070 1701.800680 -835.6985349 1671.397070 25.37627  1
##             Pr(>Chisq)    
## m1.glmer                  
## m3.glmer 0.00000047168 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Anova(m3.glmer, test = "Chi")

## Analysis of Deviance Table (Type II Wald chisquare tests)
## 
## Response: SUFlike
##             Chisq Df             Pr(>Chisq)    
## Priming 124.40790  1 < 0.000000000000000222 ***
## Gender   28.56776  1         0.000000090476 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Gender is significant and will therefore be included as a predictor (you can also observe that including Gender has substantially decreased both AIC and BIC).

# add ConversationType
ifelse(min(ftable(mblrdata$ConversationType, mblrdata$SUFlike)) == 0, "incomplete information", "okay")

## [1] "okay"

m4.glmer <- update(m3.glmer, .~.+ConversationType)
ifelse(max(car::vif(m4.glmer)) <= 3,  "VIFs okay", "VIFs unacceptable")

## [1] "VIFs okay"

anova(m4.glmer, m3.glmer, test = "Chi")

## Data: mblrdata
## Models:
## m3.glmer: SUFlike ~ (1 | ID) + Priming + Gender
## m4.glmer: SUFlike ~ (1 | ID) + Priming + Gender + ConversationType
##          npar         AIC         BIC       logLik    deviance    Chisq Df
## m3.glmer    4 1679.397070 1701.800680 -835.6985349 1671.397070            
## m4.glmer    5 1668.583222 1696.587734 -829.2916108 1658.583222 12.81385  1
##          Pr(>Chisq)    
## m3.glmer               
## m4.glmer 0.00034406 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Anova(m4.glmer, test = "Chi")

## Analysis of Deviance Table (Type II Wald chisquare tests)
## 
## Response: SUFlike
##                      Chisq Df             Pr(>Chisq)    
## Priming          130.69147  1 < 0.000000000000000222 ***
## Gender            13.44553  1             0.00024559 ***
## ConversationType  12.99301  1             0.00031266 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

ConversationType improves model fit (AIC and BIC decrease and it is reported as being significant) and will, therefore, be included in the model.

# add Priming*Age
ifelse(min(ftable(mblrdata$Priming, mblrdata$Age, mblrdata$SUFlike)) == 0, "incomplete information", "okay")

## [1] "okay"

m5.glmer <- update(m4.glmer, .~.+Priming*Age)
ifelse(max(car::vif(m5.glmer)) <= 3,  "VIFs okay", "WARNING: high VIFs!")

## [1] "VIFs okay"

anova(m5.glmer, m4.glmer, test = "Chi")

## Data: mblrdata
## Models:
## m4.glmer: SUFlike ~ (1 | ID) + Priming + Gender + ConversationType
## m5.glmer: SUFlike ~ (1 | ID) + Priming + Gender + ConversationType + Age + Priming:Age
##          npar         AIC         BIC       logLik    deviance   Chisq Df
## m4.glmer    5 1668.583222 1696.587734 -829.2916108 1658.583222           
## m5.glmer    7 1671.599008 1710.805326 -828.7995042 1657.599008 0.98421  2
##          Pr(>Chisq)
## m4.glmer           
## m5.glmer    0.61134

The interaction between Priming and Age is not significant and we thus not be included.

# add Priming*Gender
ifelse(min(ftable(mblrdata$Priming, mblrdata$Gender, mblrdata$SUFlike)) == 0, "incomplete information", "okay")

## [1] "okay"

m6.glmer <- update(m4.glmer, .~.+Priming*Gender)
ifelse(max(car::vif(m6.glmer)) <= 3,  "VIFs okay", "WARNING: high VIFs!")

## [1] "WARNING: high VIFs!"

We get the warning that the VIFs are high (>= 3) which means that the model suffers from (multi-)collinearity. We thus check the VIFs to determine how to proceed. If the VIFs are > 10, then we definitely cannot use the model as the multicollinearity is excessive.

car::vif(m6.glmer)

##          Priming           Gender ConversationType   Priming:Gender 
##    4.35680826881    1.43320666469    1.20288687415    4.58319516483

The VIFs are below 5 which is not good (VIFs of 5 mean “that column in the model matrix is explainable from the others with an R² of 0.8” (Gries 2021)) but it is still arguably acceptable and we will thus check if including the interaction between Priming and Gender significantly improved model fit.

anova(m6.glmer, m4.glmer, test = "Chi")

## Data: mblrdata
## Models:
## m4.glmer: SUFlike ~ (1 | ID) + Priming + Gender + ConversationType
## m6.glmer: SUFlike ~ (1 | ID) + Priming + Gender + ConversationType + Priming:Gender
##          npar         AIC         BIC       logLik    deviance   Chisq Df
## m4.glmer    5 1668.583222 1696.587734 -829.2916108 1658.583222           
## m6.glmer    6 1663.164481 1696.769896 -825.5822404 1651.164481 7.41874  1
##          Pr(>Chisq)   
## m4.glmer              
## m6.glmer  0.0064548 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Anova(m6.glmer, test = "Chi")

## Analysis of Deviance Table (Type II Wald chisquare tests)
## 
## Response: SUFlike
##                      Chisq Df             Pr(>Chisq)    
## Priming          131.96076  1 < 0.000000000000000222 ***
## Gender            13.57620  1             0.00022907 ***
## ConversationType  10.70693  1             0.00106735 ** 
## Priming:Gender     7.45276  1             0.00633391 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The interaction between Priming and Gender improved model fit (AIC and BIC reduction) and significantly correlates with the use of speech-unit final like. It will therefore be included in the model.

# add Priming*ConversationType
ifelse(min(ftable(mblrdata$Priming, mblrdata$ConversationType, mblrdata$SUFlike)) == 0, "incomplete information", "okay")

## [1] "okay"

m7.glmer <- update(m6.glmer, .~.+Priming*ConversationType)
ifelse(max(car::vif(m7.glmer)) <= 3,  "VIFs okay", "WARNING: high VIFs!")

## [1] "WARNING: high VIFs!"

When including the interaction between Priming and ConversationType we get a warning that there are high VIFs (multicollinearity) so we inspect the VIFs in more detail.

# check VIFs
car::vif(m7.glmer)

##                  Priming                   Gender         ConversationType 
##            5.10734918012            1.49292942539            1.49160879613 
##           Priming:Gender Priming:ConversationType 
##            4.93046662346            3.41094672459

The VIF of Priming is above 5 so we would normally continue without checking if including the interaction between Priming and ConversationType leads to a significant improvement in model fit. However, given that this is just a practical example, we check if including this interaction significantly improves model fit.

anova(m7.glmer, m6.glmer, test = "Chi")

## Data: mblrdata
## Models:
## m6.glmer: SUFlike ~ (1 | ID) + Priming + Gender + ConversationType + Priming:Gender
## m7.glmer: SUFlike ~ (1 | ID) + Priming + Gender + ConversationType + Priming:Gender + Priming:ConversationType
##          npar         AIC         BIC       logLik    deviance   Chisq Df
## m6.glmer    6 1663.164481 1696.769896 -825.5822404 1651.164481           
## m7.glmer    7 1663.134968 1702.341285 -824.5674840 1649.134968 2.02951  1
##          Pr(>Chisq)
## m6.glmer           
## m7.glmer    0.15427

The interaction between Priming and ConversationType does not significantly correlate with the use of speech-unit final like and it does not explain much variance (AIC and BIC increase). It will be not be included in the model.

# add Age*Gender
ifelse(min(ftable(mblrdata$Age, mblrdata$Gender, mblrdata$SUFlike)) == 0, "incomplete information", "okay")

## [1] "okay"

m8.glmer <- update(m6.glmer, .~.+Age*Gender)
ifelse(max(car::vif(m8.glmer)) <= 3,  "VIFs okay", "WARNING: high VIFs!")

## [1] "WARNING: high VIFs!"

When including the interaction between Age and Gender we get a warning that there are high VIFs (multicollinearity) so we inspect the VIFs in more detail.

# check VIFs
car::vif(m8.glmer)

##          Priming           Gender ConversationType              Age 
##    4.46289267895    2.14177558400    1.20634523936    3.25168362545 
##   Priming:Gender       Gender:Age 
##    4.66920572936    3.21352954473

The VIFs are all below 5 so we test if including the interaction between Gender and Age significantly improves model fit.

anova(m8.glmer, m6.glmer, test = "Chi")

## Data: mblrdata
## Models:
## m6.glmer: SUFlike ~ (1 | ID) + Priming + Gender + ConversationType + Priming:Gender
## m8.glmer: SUFlike ~ (1 | ID) + Priming + Gender + ConversationType + Age + Priming:Gender + Gender:Age
##          npar         AIC         BIC       logLik    deviance   Chisq Df
## m6.glmer    6 1663.164481 1696.769896 -825.5822404 1651.164481           
## m8.glmer    8 1665.634538 1710.441758 -824.8172690 1649.634538 1.52994  2
##          Pr(>Chisq)
## m6.glmer           
## m8.glmer    0.46535

The interaction between Age and Gender is not significant and will thus continue without it.

# add Age*ConversationType
ifelse(min(ftable(mblrdata$Age, mblrdata$ConversationType, mblrdata$SUFlike)) == 0, "incomplete information", "okay")

## [1] "okay"

m9.glmer <- update(m6.glmer, .~.+Age*ConversationType)
ifelse(max(car::vif(m9.glmer)) <= 3,  "VIFs okay", "WARNING: high VIFs!")

## [1] "WARNING: high VIFs!"

When including the interaction between Age and ConversationType we get a warning that there are high VIFs (multicollinearity) so we inspect the VIFs in more detail.

# check VIFs
car::vif(m9.glmer)

##              Priming               Gender     ConversationType 
##        4.37809105527        1.50666016648        1.51472301608 
##                  Age       Priming:Gender ConversationType:Age 
##        1.96943944520        4.61338605422        2.05900983613

The VIFs are all below 5 so we test if including the interaction between ConversationType and Age significantly improves model fit.

anova(m9.glmer, m6.glmer, test = "Chi")

## Data: mblrdata
## Models:
## m6.glmer: SUFlike ~ (1 | ID) + Priming + Gender + ConversationType + Priming:Gender
## m9.glmer: SUFlike ~ (1 | ID) + Priming + Gender + ConversationType + Age + Priming:Gender + ConversationType:Age
##          npar         AIC         BIC       logLik    deviance   Chisq Df
## m6.glmer    6 1663.164481 1696.769896 -825.5822404 1651.164481           
## m9.glmer    8 1666.262166 1711.069386 -825.1310831 1650.262166 0.90231  2
##          Pr(>Chisq)
## m6.glmer           
## m9.glmer    0.63689

The interaction between Age and ConversationType is insignificant and does not improve model fit (AIC and BIC reduction). It will therefore not be included in the model.

# add Gender*ConversationType
ifelse(min(ftable(mblrdata$Gender, mblrdata$ConversationType, mblrdata$SUFlike)) == 0, "incomplete information", "okay")

## [1] "okay"

m10.glmer <- update(m6.glmer, .~.+Gender*ConversationType)
ifelse(max(car::vif(m10.glmer)) <= 3,  "VIFs okay", "WARNING: high VIFs!")

## [1] "WARNING: high VIFs!"

When including the interaction between Gender and ConversationType we get a warning that there are high VIFs (multicollinearity) so we inspect the VIFs in more detail.

# check VIFs
car::vif(m10.glmer)

##                 Priming                  Gender        ConversationType 
##           4.96110835338           1.73312563585           7.76376927699 
##          Priming:Gender Gender:ConversationType 
##           5.11781767473           9.24774173655

The highest VIF is almost 10 (9.247741736554) which is why the interaction between Gender and ConversationType will not be included in the model.

# add Priming*Age*Gender
ifelse(min(ftable(mblrdata$Priming,mblrdata$Age, mblrdata$Gender, mblrdata$SUFlike)) == 0, "incomplete information", "okay")

## [1] "okay"

m11.glmer <- update(m6.glmer, .~.+Priming*Age*Gender)
ifelse(max(car::vif(m11.glmer)) <= 3,  "VIFs okay", "WARNING: high VIFs!")

## [1] "WARNING: high VIFs!"

When including the interaction between Priming, Age, and Gender we get a warning that there are high VIFs (multicollinearity) so we inspect the VIFs in more detail.

# check VIFs
car::vif(m11.glmer)

##            Priming             Gender   ConversationType                Age 
##      6.57423672195      2.28498823866      1.22324933101      3.66031482934 
##     Priming:Gender        Priming:Age         Gender:Age Priming:Gender:Age 
##      6.66072311731      5.85260474142      3.75877625969      5.98039277219

There are several VIFs with values greater than 5 and we will thus continue without including the interaction between Priming, Age, and Gender into the model.

# add Priming*Age*ConversationType
ifelse(min(ftable(mblrdata$Priming,mblrdata$Age, mblrdata$ConversationType, mblrdata$SUFlike)) == 0, "incomplete information", "okay")

## [1] "okay"

m12.glmer <- update(m6.glmer, .~.+Priming*Age*ConversationType)
ifelse(max(car::vif(m12.glmer)) <= 3,  "VIFs okay", "WARNING: high VIFs!")

## [1] "WARNING: high VIFs!"

When including the interaction between Priming, Age, and Gender we get a warning that there are high VIFs (multicollinearity) so we inspect the VIFs in more detail.

# check VIFs
car::vif(m12.glmer)

##                      Priming                       Gender 
##                7.14846915051                1.61071467209 
##             ConversationType                          Age 
##                1.83168387186                2.20174547810 
##               Priming:Gender                  Priming:Age 
##                4.99371024125                3.33294901081 
##     Priming:ConversationType         ConversationType:Age 
##                4.87474703042                2.38301363954 
## Priming:ConversationType:Age 
##                2.99921275539

The VIF of Priming is very high (7.148469150514) which is why we will thus continue without including the interaction between Priming, Age, and Gender in the model.

# add Priming*Gender*ConversationType
ifelse(min(ftable(mblrdata$Priming,mblrdata$Gender, mblrdata$ConversationType, mblrdata$SUFlike)) == 0, "incomplete information", "okay")

## [1] "okay"

m13.glmer <- update(m6.glmer, .~.+Priming*Gender*ConversationType)
ifelse(max(car::vif(m13.glmer)) <= 3,  "VIFs okay", "WARNING: high VIFs!")

## [1] "WARNING: high VIFs!"

The VIFs are excessive with a maximum value is 23.881840741192 which shows an unacceptable degree of multicollinearity so that we abort and move on the next model.

car::vif(m13.glmer)

##                         Priming                          Gender 
##                   7.69739102251                   1.81221735436 
##                ConversationType                  Priming:Gender 
##                  21.94386151465                  10.01727959017 
##        Priming:ConversationType         Gender:ConversationType 
##                  21.63339176644                  23.88184074119 
## Priming:Gender:ConversationType 
##                  22.68146890142

The VIFs are excessive with a maximum value is 23.881840741192 which shows an unacceptable degree of multicollinearity so that we abort and move on the next model.

# add Age*Gender*ConversationType
ifelse(min(ftable(mblrdata$Age,mblrdata$Gender, mblrdata$ConversationType, mblrdata$SUFlike)) == 0, "incomplete information", "okay")

## [1] "okay"

m14.glmer <- update(m6.glmer, .~.+Age*Gender*ConversationType)
ifelse(max(car::vif(m14.glmer)) <= 3,  "VIFs okay", "WARNING: high VIFs!")

## [1] "WARNING: high VIFs!"

When including the interaction between Age, Gender, ConversationType, we get a warning that there are high VIFs (multicollinearity) so we inspect the VIFs in more detail.

car::vif(m14.glmer)

##                     Priming                      Gender 
##               5.33146116728               2.60066483117 
##            ConversationType                         Age 
##              11.94300742736               3.80060133104 
##              Priming:Gender                  Gender:Age 
##               5.45327304064               5.66147481864 
##        ConversationType:Age     Gender:ConversationType 
##              16.98422075229              13.80766661408 
## Gender:ConversationType:Age 
##              19.07445301779

Again, the VIFs are excessive with a maximum value of 19.074453017793 which shows an unacceptable degree of multicollinearity so that we abort and move on the next model.

# add Priming*Age*Gender*ConversationType
ifelse(min(ftable(mblrdata$Priming,mblrdata$Age,mblrdata$Gender, mblrdata$ConversationType, mblrdata$SUFlike)) == 0, "incomplete information", "okay")

## [1] "incomplete information"

The model suffers from incomplete information! As this was the last possible model, we have found our final minimal adequate model in m6.glmer.

In a next step, we create an overview of model comparisons which serves as a summary for the model fitting process and provides AIC, BIC, and \(\chi\)² values.

source("https://slcladal.github.io/rscripts/ModelFittingSummarySWSU.r") 
# comparisons of glmer objects
m1.m0 <- anova(m1.glmer, m0.glmer, test = "Chi") 
m2.m1 <- anova(m2.glmer, m1.glmer, test = "Chi")   
m3.m1 <- anova(m3.glmer, m1.glmer, test = "Chi")
m4.m3 <- anova(m4.glmer, m3.glmer, test = "Chi") 
m5.m4 <- anova(m5.glmer, m4.glmer, test = "Chi") 
m6.m4 <- anova(m6.glmer, m4.glmer, test = "Chi") 
m7.m6 <- anova(m7.glmer, m6.glmer, test = "Chi")
m8.m6 <- anova(m8.glmer, m6.glmer, test = "Chi") 
m9.m6 <- anova(m9.glmer, m6.glmer, test = "Chi") 
# create a list of the model comparisons
mdlcmp <- list(m1.m0, m2.m1, m3.m1, m4.m3, m5.m4, m6.m4, m7.m6, m8.m6, m9.m6)
# summary table for model fitting
mdlft <- mdl.fttng.swsu(mdlcmp)
mdlft <- mdlft[,-2]

Table 12.3: First 15 rows of the model fitting summary table.
Model	Term Added	Compared to...	DF	AIC	BIC	LogLikelihood	Residual Deviance	X2	X2DF	p-value	Significance
m1.glmer	1+Priming	m0.glmer	3	1702.77	1719.58	-848.39	1696.77	127.72	1	0	p < .001***
m2.glmer	Age	m1.glmer	4	1704.21	1726.61	-848.11	1696.21	0.56	1	0.45323	n.s.
m3.glmer	Gender	m1.glmer	4	1679.4	1701.8	-835.7	1671.4	25.38	1	0	p < .001***
m4.glmer	ConversationType	m3.glmer	5	1668.58	1696.59	-829.29	1658.58	12.81	1	0.00034	p < .001***
m5.glmer	Age+Priming:Age	m4.glmer	7	1671.6	1710.81	-828.8	1657.6	0.98	2	0.61134	n.s.
m6.glmer	Priming:Gender	m4.glmer	6	1663.16	1696.77	-825.58	1651.16	7.42	1	0.00645	p < .01 **
m7.glmer	Priming:ConversationType	m6.glmer	7	1663.13	1702.34	-824.57	1649.13	2.03	1	0.15427	n.s.
m8.glmer	Age+Gender:Age	m6.glmer	8	1665.63	1710.44	-824.82	1649.63	1.53	2	0.46535	n.s.
m9.glmer	Age+ConversationType:Age	m6.glmer	8	1666.26	1711.07	-825.13	1650.26	0.9	2	0.63689	n.s.

We now rename our final minimal adequate model, test whether it performs significantly better than the minimal base-line model, and print the regression summary.

# rename final minimal adequate model
mlr.glmer <- m6.glmer 
# final model better than base-line model
sigfit <- anova(mlr.glmer, m0.glmer, test = "Chi") 
# inspect
sigfit

## Data: mblrdata
## Models:
## m0.glmer: SUFlike ~ 1 + (1 | ID)
## mlr.glmer: SUFlike ~ (1 | ID) + Priming + Gender + ConversationType + Priming:Gender
##           npar         AIC         BIC       logLik    deviance     Chisq Df
## m0.glmer     2 1828.492271 1839.694076 -912.2461355 1824.492271             
## mlr.glmer    6 1663.164481 1696.769896 -825.5822404 1651.164481 173.32779  4
##                       Pr(>Chisq)    
## m0.glmer                            
## mlr.glmer < 0.000000000000000222 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# inspect final minimal adequate model
print(mlr.glmer, corr = F)

## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: binomial  ( logit )
## Formula: 
## SUFlike ~ (1 | ID) + Priming + Gender + ConversationType + Priming:Gender
##    Data: mblrdata
##       AIC       BIC    logLik  deviance  df.resid 
## 1663.1645 1696.7699 -825.5822 1651.1645      1994 
## Random effects:
##  Groups Name        Std.Dev.   
##  ID     (Intercept) 0.292489497
## Number of obs: 2000, groups:  ID, 208
## Fixed Effects:
##                (Intercept)                PrimingPrime  
##               -0.920849821                 1.060142104  
##                GenderWomen  ConversationTypeSameGender  
##               -0.867734454                -0.492309882  
##   PrimingPrime:GenderWomen  
##                1.035738267

To extract the effect sizes of the significant fixed effects, we compare the model with that effect to a model without that effect. This can be problematic when checking the effect of main effects that are involved in significant interactions though (Field, Miles, and Field 2012, 622).

# effect of ConversationType
ef_conv <- anova(m4.glmer, m3.glmer, test = "Chi") 
# inspect
ef_conv

## Data: mblrdata
## Models:
## m3.glmer: SUFlike ~ (1 | ID) + Priming + Gender
## m4.glmer: SUFlike ~ (1 | ID) + Priming + Gender + ConversationType
##          npar         AIC         BIC       logLik    deviance    Chisq Df
## m3.glmer    4 1679.397070 1701.800680 -835.6985349 1671.397070            
## m4.glmer    5 1668.583222 1696.587734 -829.2916108 1658.583222 12.81385  1
##          Pr(>Chisq)    
## m3.glmer               
## m4.glmer 0.00034406 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# effect of Priming:Gender
ef_prigen <- anova(m6.glmer, m4.glmer, test = "Chi")
# inspect
ef_prigen

## Data: mblrdata
## Models:
## m4.glmer: SUFlike ~ (1 | ID) + Priming + Gender + ConversationType
## m6.glmer: SUFlike ~ (1 | ID) + Priming + Gender + ConversationType + Priming:Gender
##          npar         AIC         BIC       logLik    deviance   Chisq Df
## m4.glmer    5 1668.583222 1696.587734 -829.2916108 1658.583222           
## m6.glmer    6 1663.164481 1696.769896 -825.5822404 1651.164481 7.41874  1
##          Pr(>Chisq)   
## m4.glmer              
## m6.glmer  0.0064548 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

12.6 Visualizing Effects

As we will see the effects in the final summary, we visualize the effects here using the sjPlot package (Lüdecke 2021).

plot_model(m6.glmer, type = "pred", terms = c("Priming", "Gender"))

We can see that discourse like is more likely to surface in primed contexts but that in contrast to women and men in same-gender conversations as well as women in mixed-gender conversations, priming appears to affect the use of discourse like by men in mixed-gender conversations only very little.

12.7 Extracting Model Fit Parameters

We now extract model fit parameters (Baayen 2008, 281).

probs = 1/(1+exp(-fitted(mlr.glmer)))
probs = binomial()$linkinv(fitted(mlr.glmer))
somers2(probs, as.numeric(mblrdata$SUFlike))

##                 C               Dxy                 n           Missing 
##    0.758332104539    0.516664209078 2000.000000000000    0.000000000000

The two lines that start with probs are simply two different ways to do the same thing (you only need one of these).

The model fit parameters indicate a suboptimal fit. Both the C-value and Somers’s D_xy show poor fit between predicted and observed occurrences of discourse like. If the C-value is 0.5, the predictions are random, while the predictions are perfect if the C-value is 1. C-values above 0.8 indicate real predictive capacity (Baayen 2008, 204). Somers’ D_xy is a value that represents a rank correlation between predicted probabilities and observed responses. Somers’ D_xy values range between 0, which indicates complete randomness, and 1, which indicates perfect prediction (Baayen 2008, 204). The C.value of 0.758332104539 suggests that the model has some predictive and explanatory power, but not at an optimal level. We will now perform the model diagnostics.

12.8 Model Diagnostics

We begin the model diagnostics by generating a diagnostic that plots the fitted or predicted values against the residuals.

plot(mlr.glmer, pch = 20, col = "black", lty = "dotted")

As a final step, we summarize our findings in tabulated form.

# summarize final model
sjPlot::tab_model(mlr.glmer)

	SUFlike
Predictors	Odds Ratios	CI	p
(Intercept)	0.40	0.29 – 0.54	<0.001
Priming [Prime]	2.89	1.50 – 5.56	0.002
Gender [Women]	0.42	0.29 – 0.61	<0.001
ConversationType [SameGender]	0.61	0.46 – 0.82	0.001
Priming [Prime] * Gender [Women]	2.82	1.34 – 5.93	0.006
Random Effects
σ²	3.29
τ₀₀ _ID	0.09
ICC	0.03
N _ID	208
Observations	2000
Marginal R² / Conditional R²	0.138 / 0.160

We can use the reports package (Makowski et al. 2021) to summarize the analysis.

report::report(mlr.glmer)

## We fitted a logistic mixed model (estimated using ML and BOBYQA optimizer) to predict SUFlike with Priming, Gender and ConversationType (formula: SUFlike ~ Priming + Gender + ConversationType + Priming:Gender). The model included ID as random effect (formula: ~1 | ID). The model's total explanatory power is moderate (conditional R2 = 0.16) and the part related to the fixed effects alone (marginal R2) is of 0.14. The model's intercept, corresponding to Priming = NoPrime, Gender = Men and ConversationType = MixedGender, is at -0.92 (95% CI [-1.22, -0.62], p < .001). Within this model:
## 
##   - The effect of Priming [Prime] is statistically significant and positive (beta = 1.06, 95% CI [0.41, 1.71], p = 0.002; Std. beta = 1.06, 95% CI [0.41, 1.71])
##   - The effect of Gender [Women] is statistically significant and negative (beta = -0.87, 95% CI [-1.25, -0.49], p < .001; Std. beta = -0.87, 95% CI [-1.25, -0.49])
##   - The effect of ConversationType [SameGender] is statistically significant and negative (beta = -0.49, 95% CI [-0.79, -0.20], p = 0.001; Std. beta = -0.49, 95% CI [-0.79, -0.20])
##   - The interaction effect of Gender [Women] on Priming [Prime] is statistically significant and positive (beta = 1.04, 95% CI [0.29, 1.78], p = 0.006; Std. beta = 1.04, 95% CI [0.29, 1.78])
## 
## Standardized parameters were obtained by fitting the model on a standardized version of the dataset. 95% Confidence Intervals (CIs) and p-values were computed using the Wald approximation.

We can use this output to write up a final report:

A mixed-effect binomial logistic regression model with random intercepts for speakers was fit to the data in a step-wise-step up procedure. The final minimal adequate model performed significantly better than an intercept-only base line model (\(\chi\)²(4): 173.32779031323, p = 0) and a good but not optimal fit (C: 0.758332104539, Somers’ D_xy: 0.516664209078). The final minimal adequate model reported that speakers use more discourse like in mixed-gender conversations compared to same-gender conversations (\(\chi\)²(1): `r 12.813848197053, p = 0.00034) and that there is an interaction between priming and gender with men using more discourse like in un-primed contexts while this gender difference is not present in primed contexts where speakers more more likely to use discourse like regardless of gender (\(\chi\)²(1): 7.418740943305, p = 0.00645).

References

Austin, Peter C., and George Leckie. 2018. “The Effect of Number of Clusters and Cluster Size on Statistical Power and Type i Error Rates When Testing Random Effects Variance Components in Multilevel Linear and Logistic Regression Models.” Journal of Statistical Computation and Simulation 88: 3151–63.

Baayen, R Harald. 2008. Analyzing Linguistic Data. A Practical Introduction to Statistics Using r. Cambridge: Cambridge University press.

Bell, Bethany A., John M. Ferron, and Jeffrey D. Kromrey. 2008. “Cluster Size in Multilevel Models: The Impact of Sparse Data Structures on Point and Interval Estimates in Two-Level Models.” In JSM Proceedings. Section on Survey Research Methods, 1122–29. American Statistical Association Alexandria, VA.

Booth GD, Schuster EG, Niccolucci MJ. 1994. “Identifying Proxy Sets in Multiple Linear Regression: An Aid to Better Coefficient Interpretation. Research Paper INT-470.”

Clarke, Philippa. 2008. “When Can Group Level Clustering Be Ignored? Multilevel Models Versus Single-Level Models with Sparse Data.” Journal of Epidemiology & Community Health 62 (8): 752–58.

Clarke, Philippa, and Blair Wheaton. 2007. “Addressing Data Sparseness in Contextual Population Research: Using Cluster Analysis to Create Synthetic Neighborhoods.” Sociological Methods & Research 35 (3): 311–51.

Field, Andy, Jeremy Miles, and Zoe Field. 2012. Discovering Statistics Using r. Sage.

Gries, Stefan Th. 2021. Statistics for Linguistics Using r: A Practical Introduction. Berlin & New York: Mouton de Gruyter.

Johnson, Daniel Ezra. 2009. “Getting Off the GoldVarb Standard: Introducing Rbrul for Mixed-Effects Variable Rule Analysis.” Language and Linguistics Compass 3 (1): 359–83.

Lüdecke, Daniel. 2021. sjPlot: Data Visualization for Statistics in Social Science. https://CRAN.R-project.org/package=sjPlot.

Maas, Cora JM, and Joop J Hox. 2005. “Sufficient Sample Sizes for Multilevel Modeling.” Methodology 1 (3): 86–92.

Makowski, Dominique, Mattan S. Ben-Shachar, Indrajeet Patil, and Daniel Lüdecke. 2021. “Automated Results Reporting as a Practical Tool to Improve Reproducibility and Methodological Best Practices Adoption.” CRAN. https://github.com/easystats/report.

Neter, J., W. Wasserman, and MH Kutner. 1990. Applied Linear Statistical Models. Regression, Analysis of Variance, and Experimental Design. Irwin, Homewood, USA.

Zuur, Alain F., Elena N. Ieno, and Chris S. Elphick. 2010. “A Protocol for Data Exploration to Avoid Common Statistical Problems.” Methods in Ecology and Evolution 1 (1): 3–14.

Zuur, Alain F., Elena N. Ieno, Neil J. Walker, Anatoly A. Saveliev, and Graham M Smith. 2009. Mixed Effects Models and Extensions in Ecology with r. Springer.