Simulating Non-Linear Interactions

Conference Multi-Level Modeling MLM Plotting Simulated Data Tutorial

Simulating 2-level data for APA 2021 Supplemental, so others can follow along.

Christopher Loan true
2021-08-03
show code
library(tidyverse)
library(glmertree)

This post is meant for two audiences:

  1. anyone who wants to follow along with my APA 2021 Supplemental Material
  2. someone looking to simulate complex interaction with multilevel modeling.

Disclaimer

My code is available for use and alteration, but please give credit via citations. Anyways, let’s get into it! If you’re looking for more detail regarding the simulated data or would like to simulate your own data with a simple linear interaction, see this prior post which looks more deeply at distribtions of simulated data.

Goal

Simulate data to use in my APA 2021 Supplemental Material Post.

Function

Simulating data

By modifying the interaction in the previous data simulation function with an if_else() statement, we can make a more complex interaction term that would not be easily detected by traditional multilevel models (MLMs). I am coding the interaction’s influence as conditional on it’s value, more specifically the interaction’s effect is: if_else(z[[i]]*x[[i]] < 0, interaction1*z[[i]]*x[[i]], interaction2*z[[i]]*x[[i]]). This is one type of circumstance where MLMs may struggle without substantial theoretical understanding to guide exploration of data.

So that is a simple simulation with a complex interaction term. We have 15 level 1 units for each level 2 unit and 71 level 2 units (labeled with id). The level 1 intercept of outcome is 4.25 with a main effect of 1.25 for x, 2.00 for z. The function also makes a nuisance variable w, which does not contribute to outcome at all.

show code
nonlinear_dat <-
  simulate_nonlinear_intx(
    n = 15, ## number level 1 units at each level 2
    j = 71, ## number level 2 units
    intercept_lv1 = 4.25, ## intercept at level 1
    main_x = 1.25, ## main effect of x
    main_z = 2.00, ## main effect of z
    interaction1 = -4, ## interaction of x and z when product < 0
    interaction2 = 4, ## interaction of x and z when product >= 0
    residual_var_sd_lv1 = 2.00, ## standard deviation of residuals at level 1
    random_int_mean_lv2 = 0, ## mean of random intercept at level 2
    random_int_sd_lv2 = 1.00, ## standard deviation of random intercept at level 2,
    start_seed = 123 ## ensure you can reproduce 
  )

Plotting the Simulated Data

show code
figure_1a <-   
  nonlinear_dat %>% 
  ggplot(aes(x = x, y = outcome)) + 
  geom_point() +
  geom_smooth(method = 'lm') +
  theme_minimal() +
  project_theme +
  labs(
    title = 'Figure 1. Association of Outcome & Each Covariate (non-linear intx)',
    subtitle = '1a. Association of Outcome & x'
    ) +
  theme(plot.subtitle = element_text(size = 15))

figure_1b <-   
  nonlinear_dat %>%
  ggplot(aes(x = z, y = outcome)) + 
  geom_point() +
  geom_smooth(method = 'lm') +
  theme_minimal() +
  project_theme +
  labs(subtitle = '1b. Association of Outcome & z') +
  theme(plot.subtitle = element_text(size = 15))

figure_1c <- 
  nonlinear_dat %>% 
  ggplot(aes(x = x*z, y = outcome)) + 
  geom_point() +
  geom_smooth(method = 'lm', aes(group = x*z < 0)) +
  theme_minimal() +
  project_theme +
  labs(subtitle = '1c. Association of Outcome & Interaction') +
  theme(plot.subtitle = element_text(size = 15))

ggpubr::ggarrange(figure_1a, figure_1b, figure_1c, ncol = 1)

Fitting multilevel model to the nonlinear interaction

When we fit an MLM to the data, it is unable to find a significant effect. You see this from the small t-values in the MLM results.

show code
misspecified_lmer <- 
  lmer(
    data = nonlinear_dat, 
    formula = outcome ~ x * z + w + 
      (1 | id)
  )
summary(misspecified_lmer)
Linear mixed model fit by REML ['lmerMod']
Formula: outcome ~ x * z + w + (1 | id)
   Data: nonlinear_dat

REML criterion at convergence: 5730

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-2.7929 -0.6616 -0.1518  0.4770  4.5718 

Random effects:
 Groups   Name        Variance Std.Dev.
 id       (Intercept)  0.7794  0.8828  
 Residual             12.0680  3.4739  
Number of obs: 1065, groups:  id, 71

Fixed effects:
            Estimate Std. Error t value
(Intercept)   6.7859     0.1498  45.312
x             1.3789     0.1153  11.961
z             2.0853     0.1093  19.087
w             0.1155     0.1093   1.057
x:z           0.1555     0.1160   1.341

Correlation of Fixed Effects:
    (Intr) x      z      w     
x   -0.069                     
z    0.016 -0.011              
w    0.016 -0.023 -0.031       
x:z -0.006 -0.049 -0.140 -0.004

If you had a highly experienced theorist or a whiz analyst, they perhaps could figure out how to correctly specify an MLM, but it would not be quick. This is a three-way interaction, and one where any two-way interaction would either eat away statistical power (or even result in a type I error). that would be specified with this formula: outcome ~ x + z + x * z * dummy_code + w + (1 | id), which estimates 4 unnecessary effects. In a smaller sample, potential issues with this are amplified

Below is the correctly specified model, which technically provides statistical test that the slope of each dummy coded group, formed with the condition x * z < 0, is different from 0. This would be a very difficult model to simply happen upon.

show code
nonlinear_dat <- 
  nonlinear_dat %>% 
  mutate(
    dummy_code = 
      factor(if_else(x*z < 0, 0, 1))
      )

confirmation_lmer2 <- 
  lmer(
    data = nonlinear_dat, 
    formula = 
      outcome ~ x + z + x:z:dummy_code + w +  
      (1 | id)
  )
summary(confirmation_lmer2)
Linear mixed model fit by REML ['lmerMod']
Formula: outcome ~ x + z + x:z:dummy_code + w + (1 | id)
   Data: nonlinear_dat

REML criterion at convergence: 4727.3

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-3.5431 -0.6796 -0.0225  0.6703  2.9410 

Random effects:
 Groups   Name        Variance Std.Dev.
 id       (Intercept) 0.7424   0.8616  
 Residual             4.5122   2.1242  
Number of obs: 1065, groups:  id, 71

Fixed effects:
                Estimate Std. Error t value
(Intercept)      4.48729    0.13363  33.579
x                1.11054    0.07152  15.528
z                1.96038    0.06733  29.117
w                0.07117    0.06740   1.056
x:z:dummy_code0 -3.75376    0.11871 -31.621
x:z:dummy_code1  3.87560    0.11535  33.598

Correlation of Fixed Effects:
            (Intr) x      z      w      x:z:_0
x           -0.011                            
z            0.029 -0.010                     
w            0.016 -0.023 -0.027              
x:z:dmmy_c0  0.331  0.041 -0.049  0.008       
x:z:dmmy_c1 -0.331 -0.100 -0.122 -0.011 -0.253

Exploring the Data with the {glmertree} package

Let’s see how GLMM trees act in my next post