# Simulating Non-Linear Interactions

Simulating 2-level data for APA 2021 Supplemental, so others can follow along.

Christopher Loan true
2021-08-03
show code
``````library(tidyverse)
library(glmertree)
``````

This post is meant for two audiences:

1. anyone who wants to follow along with my APA 2021 Supplemental Material
2. someone looking to simulate complex interaction with multilevel modeling.

# Disclaimer

My code is available for use and alteration, but please give credit via citations. Anyways, let’s get into it! If you’re looking for more detail regarding the simulated data or would like to simulate your own data with a simple linear interaction, see this prior post which looks more deeply at distribtions of simulated data.

# Goal

Simulate data to use in my APA 2021 Supplemental Material Post.

# Simulating data

By modifying the interaction in the previous data simulation function with an `if_else()` statement, we can make a more complex interaction term that would not be easily detected by traditional multilevel models (MLMs). I am coding the interaction’s influence as conditional on it’s value, more specifically the interaction’s effect is: `if_else(z[[i]]*x[[i]] < 0, interaction1*z[[i]]*x[[i]], interaction2*z[[i]]*x[[i]])`. This is one type of circumstance where MLMs may struggle without substantial theoretical understanding to guide exploration of data.

So that is a simple simulation with a complex interaction term. We have 15 level 1 units for each level 2 unit and 71 level 2 units (labeled with `id`). The level 1 intercept of `outcome` is 4.25 with a main effect of 1.25 for `x`, 2.00 for `z`. The function also makes a nuisance variable `w`, which does not contribute to `outcome` at all.

show code
``````nonlinear_dat <-
simulate_nonlinear_intx(
n = 15, ## number level 1 units at each level 2
j = 71, ## number level 2 units
intercept_lv1 = 4.25, ## intercept at level 1
main_x = 1.25, ## main effect of x
main_z = 2.00, ## main effect of z
interaction1 = -4, ## interaction of x and z when product < 0
interaction2 = 4, ## interaction of x and z when product >= 0
residual_var_sd_lv1 = 2.00, ## standard deviation of residuals at level 1
random_int_mean_lv2 = 0, ## mean of random intercept at level 2
random_int_sd_lv2 = 1.00, ## standard deviation of random intercept at level 2,
start_seed = 123 ## ensure you can reproduce
)
``````

# Plotting the Simulated Data

show code
``````figure_1a <-
nonlinear_dat %>%
ggplot(aes(x = x, y = outcome)) +
geom_point() +
geom_smooth(method = 'lm') +
theme_minimal() +
project_theme +
labs(
title = 'Figure 1. Association of Outcome & Each Covariate (non-linear intx)',
subtitle = '1a. Association of Outcome & x'
) +
theme(plot.subtitle = element_text(size = 15))

figure_1b <-
nonlinear_dat %>%
ggplot(aes(x = z, y = outcome)) +
geom_point() +
geom_smooth(method = 'lm') +
theme_minimal() +
project_theme +
labs(subtitle = '1b. Association of Outcome & z') +
theme(plot.subtitle = element_text(size = 15))

figure_1c <-
nonlinear_dat %>%
ggplot(aes(x = x*z, y = outcome)) +
geom_point() +
geom_smooth(method = 'lm', aes(group = x*z < 0)) +
theme_minimal() +
project_theme +
labs(subtitle = '1c. Association of Outcome & Interaction') +
theme(plot.subtitle = element_text(size = 15))

ggpubr::ggarrange(figure_1a, figure_1b, figure_1c, ncol = 1)
`````` # Fitting multilevel model to the nonlinear interaction

When we fit an MLM to the data, it is unable to find a significant effect. You see this from the small t-values in the MLM results.

show code
``````misspecified_lmer <-
lmer(
data = nonlinear_dat,
formula = outcome ~ x * z + w +
(1 | id)
)
summary(misspecified_lmer)
``````
``````Linear mixed model fit by REML ['lmerMod']
Formula: outcome ~ x * z + w + (1 | id)
Data: nonlinear_dat

REML criterion at convergence: 5730

Scaled residuals:
Min      1Q  Median      3Q     Max
-2.7929 -0.6616 -0.1518  0.4770  4.5718

Random effects:
Groups   Name        Variance Std.Dev.
id       (Intercept)  0.7794  0.8828
Residual             12.0680  3.4739
Number of obs: 1065, groups:  id, 71

Fixed effects:
Estimate Std. Error t value
(Intercept)   6.7859     0.1498  45.312
x             1.3789     0.1153  11.961
z             2.0853     0.1093  19.087
w             0.1155     0.1093   1.057
x:z           0.1555     0.1160   1.341

Correlation of Fixed Effects:
(Intr) x      z      w
x   -0.069
z    0.016 -0.011
w    0.016 -0.023 -0.031
x:z -0.006 -0.049 -0.140 -0.004``````

If you had a highly experienced theorist or a whiz analyst, they perhaps could figure out how to correctly specify an MLM, but it would not be quick. This is a three-way interaction, and one where any two-way interaction would either eat away statistical power (or even result in a type I error). that would be specified with this formula: `outcome ~ x + z + x * z * dummy_code + w + (1 | id)`, which estimates 4 unnecessary effects. In a smaller sample, potential issues with this are amplified

Below is the correctly specified model, which technically provides statistical test that the slope of each dummy coded group, formed with the condition `x * z < 0`, is different from 0. This would be a very difficult model to simply happen upon.

show code
``````nonlinear_dat <-
nonlinear_dat %>%
mutate(
dummy_code =
factor(if_else(x*z < 0, 0, 1))
)

confirmation_lmer2 <-
lmer(
data = nonlinear_dat,
formula =
outcome ~ x + z + x:z:dummy_code + w +
(1 | id)
)
summary(confirmation_lmer2)
``````
``````Linear mixed model fit by REML ['lmerMod']
Formula: outcome ~ x + z + x:z:dummy_code + w + (1 | id)
Data: nonlinear_dat

REML criterion at convergence: 4727.3

Scaled residuals:
Min      1Q  Median      3Q     Max
-3.5431 -0.6796 -0.0225  0.6703  2.9410

Random effects:
Groups   Name        Variance Std.Dev.
id       (Intercept) 0.7424   0.8616
Residual             4.5122   2.1242
Number of obs: 1065, groups:  id, 71

Fixed effects:
Estimate Std. Error t value
(Intercept)      4.48729    0.13363  33.579
x                1.11054    0.07152  15.528
z                1.96038    0.06733  29.117
w                0.07117    0.06740   1.056
x:z:dummy_code0 -3.75376    0.11871 -31.621
x:z:dummy_code1  3.87560    0.11535  33.598

Correlation of Fixed Effects:
(Intr) x      z      w      x:z:_0
x           -0.011
z            0.029 -0.010
w            0.016 -0.023 -0.027
x:z:dmmy_c0  0.331  0.041 -0.049  0.008
x:z:dmmy_c1 -0.331 -0.100 -0.122 -0.011 -0.253``````

# Exploring the Data with the `{glmertree}` package

Let’s see how GLMM trees act in my next post