Purpose

Today we will review how to run models containing interactions between two continuous predictors. We will go over how to specify interaction terms in R, how to interpret the model output, and how to visualize the results.

  1. Research scenario
  2. Centering
  3. Running interaction models
  4. Simple slopes
  5. Significance tests of simple slopes
  6. Minihacks

Be sure to have the following packages installed and loaded:

library(tidyverse) # for plotting and data wrangling
library(rio) # for importing data
library(psych) # for descriptives
library(olsrr) # for diagnostics
library(broom) # for tidying model output
library(reghelper) # for calculating simple slopes
library(sjPlot) # for plotting simple slopes

Research scenario

The tendency to suppress the expression of emotion when regulating one’s emotions is associated with receiving less social support from friends (Srivastava et al., 2009). A graduate student is interested in examining how this relationship differs depending on the extent to which the emotion regulator holds individualistic as compared to collectivist values. To test this, participants complete the following measures:

  • Social support received from friends = Y
    • 1 = receives no social support from friends
    • 5 = receives an extreme amount of social support from friends


  • Expressive Suppression = X
    • 1 = rarely uses suppression to regulate one’s emotions
    • 5 = frequently uses suppression to regulate one’s emotions


  • Individualism/Collectivism = Z (moderator)
    • 1 = strongly endorses values related to collectivism
    • 5 = strongly endorses values related to individualism

The data

  • Import the data:
social <- import("https://raw.githubusercontent.com/uopsych/psy612/master/labs/lab-7/data/social_support.csv")


  • Look at the structure

Code
str(social)
Output
## 'data.frame':    72 obs. of  4 variables:
##  $ ID         : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ socsup     : int  5 2 4 3 5 3 1 3 1 4 ...
##  $ suppression: int  1 4 1 2 1 4 5 3 4 1 ...
##  $ indiv      : int  5 4 5 4 4 5 3 5 3 4 ...


  • Check out the descriptives

Code
describe(social)
Output


The model

  • Written generically…

\[\hat{Y} = b_0 + b_1X + b_2Z + b_3(XZ)\]

  • With our variable names…

\[\hat{SocSupport} = b_0 + b_1Suppression + b_2Individualism + b_3(Suppression \times Individualism)\]

Question: What does b0 represent?

Question: What does b1 represent?

Question: What does b2 represent?

Question: What does b3 represent?


Centering

  • Centering (or “mean centering”) a variable refers to subtracting the mean of that variable from every individual value. When working with models that contain interaction terms, it is generally a good idea to center your predictor variables, because…
    • centering addresses issues of multicollinearity

    • centering makes the model coefficients more interpretable


  • We’ll talk about each of these ideas in more detail in a moment, but first let’s discuss how to center a variable in R…


  • To center our predictors, we’ll use the mutate() and scale() functions. By default, scale() z-scores variables, but we can get it to just perform mean centering by setting the argument scale = FALSE. Remember, a z score is \(\frac{X - \bar{X}}{\sigma_X}\), or a mean-centered score divided by the SD; by setting scale = FALSE we’re telling R to center it but not scale it (i.e., don’t divide by the SD).


  • The default settings for scale() are scale(x, center = TRUE, scale = TRUE). To be very explicit, we’ll set center = TRUE and scale = FALSE.
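
  • As a quick illustration (a toy vector, not the lab data), mean-centering with scale() gives the same values as subtracting the mean by hand:
# toy example: scale(x, scale = FALSE) subtracts the mean from each value
x <- c(1, 2, 3, 4, 5)
scale(x, center = TRUE, scale = FALSE) # same values as below, but returned as a matrix
x - mean(x)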


  • Highly technical but important note: scale() actually returns a matrix, so we’ll use as.numeric() to convert the resulting centered variables back to numeric vectors.
social <- social %>% 
  mutate(suppression_c = as.numeric(scale(suppression, center = TRUE, scale = FALSE)),
         indiv_c = as.numeric(scale(indiv, center = TRUE, scale = FALSE)))
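
  • A quick sanity check (optional): after centering, the means of the new variables should be essentially zero, while their standard deviations are unchanged.
# means of the centered predictors should be (approximately) 0;
# centering does not change the spread, so the SDs stay the same
social %>% 
  summarise(mean_supp_c  = mean(suppression_c),
            mean_indiv_c = mean(indiv_c),
            sd_supp      = sd(suppression),
            sd_supp_c    = sd(suppression_c))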


  • Let’s peek at our new data frame. We now have both centered and un-centered versions of our predictor variables

Code
head(social)
Output



Running interaction models

  • Next we’ll run the moderated multiple regression model. Note that in R, if we want to get the main effects and interactions we can just enter the interaction term, e.g., lm(Y ~ X*Z), which tells R to include X, Z, and XZ (the interaction term). It is equivalent to running it spelled out, e.g., lm(Y ~ X + Z + X:Z). If we wanted just the interaction, we would run a model with just the last term, e.g., lm(Y ~ X:Z)…but DON’T DO THIS!
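
  • To convince yourself that the shorthand and spelled-out versions fit the same model, here is a minimal sketch (the object names mod_star and mod_long are just for this illustration; only model_int below is used in the rest of the lab):
# the * shorthand expands to the two main effects plus the interaction
mod_star <- lm(socsup ~ suppression_c * indiv_c, data = social)
mod_long <- lm(socsup ~ suppression_c + indiv_c + suppression_c:indiv_c, data = social)

# the coefficients are identical
coef(mod_star)
coef(mod_long)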

Code
model_int <- lm(socsup ~ suppression_c*indiv_c, data = social)
summary(model_int)
Output
## 
## Call:
## lm(formula = socsup ~ suppression_c * indiv_c, data = social)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.79330 -0.81716  0.02792  1.00532  2.45494 
## 
## Coefficients:
##                        Estimate Std. Error t value             Pr(>|t|)    
## (Intercept)            2.976971   0.141778  20.997 < 0.0000000000000002 ***
## suppression_c         -0.009945   0.106094  -0.094             0.925591    
## indiv_c                0.161361   0.112255   1.437             0.155175    
## suppression_c:indiv_c -0.303784   0.083044  -3.658             0.000497 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.197 on 68 degrees of freedom
## Multiple R-squared:  0.1874, Adjusted R-squared:  0.1515 
## F-statistic: 5.227 on 3 and 68 DF,  p-value: 0.002632


Question: What can we conclude? Is there an interaction between suppression and individualism predicting social support?

Interpret the coefficients

  • Remember that we first centered our predictor variables before running the model. With that in mind, let’s interpret what the model coefficients mean. Here are the coefficients again…
## 
## Call:
## lm(formula = socsup ~ suppression_c * indiv_c, data = social)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.79330 -0.81716  0.02792  1.00532  2.45494 
## 
## Coefficients:
##                        Estimate Std. Error t value             Pr(>|t|)    
## (Intercept)            2.976971   0.141778  20.997 < 0.0000000000000002 ***
## suppression_c         -0.009945   0.106094  -0.094             0.925591    
## indiv_c                0.161361   0.112255   1.437             0.155175    
## suppression_c:indiv_c -0.303784   0.083044  -3.658             0.000497 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.197 on 68 degrees of freedom
## Multiple R-squared:  0.1874, Adjusted R-squared:  0.1515 
## F-statistic: 5.227 on 3 and 68 DF,  p-value: 0.002632


Question: What does b0 represent?

Question: What does b1 represent?

Question: What does b2 represent?

Question: What does b3 represent?


  • For comparison, here is the model output if we had NOT centered our predictors. The lower-order coefficients (b0, b1, b2) are less meaningful/interpretable here. But notice that the higher-order coefficient (b3) does not change!
## 
## Call:
## lm(formula = socsup ~ suppression * indiv, data = social)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.79330 -0.81716  0.02792  1.00532  2.45494 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       -0.15430    0.91326  -0.169 0.866336    
## suppression        0.88875    0.27286   3.257 0.001757 ** 
## indiv              1.06850    0.27183   3.931 0.000201 ***
## suppression:indiv -0.30378    0.08304  -3.658 0.000497 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.197 on 68 degrees of freedom
## Multiple R-squared:  0.1874, Adjusted R-squared:  0.1515 
## F-statistic: 5.227 on 3 and 68 DF,  p-value: 0.002632
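
  • One way to see why \(b_3\) is unchanged: substitute \(X = X_c + \bar{X}\) and \(Z = Z_c + \bar{Z}\) into the uncentered prediction equation and expand. The product term \(X_cZ_c\) keeps the coefficient \(b_3\), while everything else is absorbed into the lower-order terms and the intercept:

\[b_1X + b_2Z + b_3(XZ) = (b_1 + b_3\bar{Z})X_c + (b_2 + b_3\bar{X})Z_c + b_3(X_cZ_c) + \text{constant}\]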


A note about multicollinearity

  • Let’s assess the multicollinearity of the model we just ran…

Code
# using the {olsrr} package 
ols_vif_tol(model_int)
Output


Question: Do we have any multicollinearity issues?

  • And now let’s see how these diagnostics look if we had not centered the predictors first…

Code
# piping the uncentered model to the tolerance function
lm(socsup ~ suppression*indiv, data = social) %>% 
  ols_vif_tol()
Output


  • To think about why we have multicollinearity, let’s look at the correlation matrix of our two predictor variables, suppression and indiv, and the interaction term (the product of these two variables), which we’ll call suppression_x_indiv

Code
social %>% 
  mutate(suppression_x_indiv = suppression * indiv) %>% # create the interaction term
  select(suppression,                    
         indiv, 
         suppression_x_indiv) %>% # select the 3 variables we want to correlate
  cor() # generate a correlation matrix of these 3 variables
Output
##                     suppression       indiv suppression_x_indiv
## suppression          1.00000000 -0.09892416           0.6667002
## indiv               -0.09892416  1.00000000           0.6095234
## suppression_x_indiv  0.66670020  0.60952336           1.0000000


Question: Do you see a problem here?

  • Now let’s look at the correlation matrix including our centered predictor variables. What do you notice?

Code
social %>% 
  mutate(suppression_x_indiv_c = suppression_c * indiv_c) %>% # create the interaction term
  select(suppression_c,                    
         indiv_c, 
         suppression_x_indiv_c) %>% # select the 3 variables we want to correlate
  cor() # generate a correlation matrix of these 3 variables
Output
##                       suppression_c      indiv_c suppression_x_indiv_c
## suppression_c            1.00000000 -0.098924158           0.055129871
## indiv_c                 -0.09892416  1.000000000          -0.009086891
## suppression_x_indiv_c    0.05512987 -0.009086891           1.000000000



Simple slopes

  • Interpreting what the interaction between expressive suppression and individualism means from the model coefficients alone is rather difficult. It helps a LOT to plot the interaction instead of trying to interpret nuanced coefficients by themselves. What you end up plotting are “simple slopes,” so we will discuss what these mean before diving into making plots.


  • “Simple slope” refers to both
    • the equation for the effect of X on Y at different levels of Z (moderator) AND…
    • just the coefficient for X in this equation


  • We’ll review how to calculate simple slopes “by hand” (the hard way) and using a function (the easy way).
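
  • Algebraically, a simple slope comes from factoring the regression equation with respect to X:

\[\hat{Y} = (b_0 + b_2Z) + (b_1 + b_3Z)X\]

  • The term \((b_1 + b_3Z)\) is the simple slope of X at a particular value of Z, and \((b_0 + b_2Z)\) is the corresponding simple intercept. These are exactly the quantities we compute by hand below.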

Calculating simple slopes by hand

  • First we’ll get the coefficients from our model (model_int) into a nicely formatted data frame using broom::tidy()

Code
model_int_coefs <- model_int %>% 
  tidy()

# view the data frame of coefficients
model_int_coefs 
Output


  • Now we’ll pull out each individual coefficient using {dplyr} functions.
# b0 (the intercept)
b0 <- model_int_coefs %>% 
  filter(term == "(Intercept)") %>% 
  select(estimate)  

# b1 = the slope for suppression
b_supp <- model_int_coefs %>% 
  filter(term == "suppression_c") %>% 
  select(estimate) 
  
# b2 = the slope for individualism
b_indiv <- model_int_coefs %>% 
  filter(term == "indiv_c") %>% 
  select(estimate)

# b3 = the coefficient for the interaction term
b_supp_x_indiv <- model_int_coefs %>% 
  filter(term == "suppression_c:indiv_c") %>% 
  select(estimate)


  • Next let’s calculate low (-1 SD), mid (mean), and high (+ 1 SD) values of individualism based on the mean and standard deviation of this variable
# low = mean - 1 SD
low_indiv <- mean(social$indiv_c, na.rm = TRUE) - sd(social$indiv_c, na.rm = TRUE) 

# mid = mean
mid_indiv <- mean(social$indiv_c, na.rm = TRUE)

# high = mean + 1 SD
high_indiv <- mean(social$indiv_c, na.rm = TRUE) + sd(social$indiv_c, na.rm = TRUE)


  • Calculate simple intercepts by plugging in low, mid and high values of Individualism. To review how this works algebraically, see this slide from class
# calculate simple intercepts and bind them together into 3 rows
simple_intercepts <- rbind(
                    b0 + (b_indiv * low_indiv),
                    b0 + (b_indiv * mid_indiv), 
                    b0 + (b_indiv * high_indiv)
                    )

# We'll give it a column name which will help below
colnames(simple_intercepts) <- "simple_intercepts"


  • Calculate simple slopes, again by plugging in low, mid and high values of Individualism.
# Calculate simple slopes
simple_slopes <- rbind(
                b_supp + (b_supp_x_indiv * low_indiv),
                b_supp + (b_supp_x_indiv * mid_indiv), 
                b_supp + (b_supp_x_indiv * high_indiv)
                )

# Again, give it a column name
colnames(simple_slopes) <- "simple_slopes"
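
  • As an optional spot check (a sketch using base R’s coef(); not part of the original workflow), a single simple slope can also be computed directly from the fitted model’s named coefficient vector:
# simple slope of suppression at +1 SD of individualism,
# pulled straight from the model's named coefficients
coef(model_int)["suppression_c"] +
  coef(model_int)["suppression_c:indiv_c"] * high_indiv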


  • Now combine all this information into one data frame so we can more easily plot it
# create labels for the different levels of individualism
indiv_levels <- c("low","mid","high")

# put them all together into a data frame
int_plot_df <- data.frame(cbind(
                         indiv_levels, # labels
                         simple_intercepts, # simple intercepts 
                         simple_slopes)) # simple slopes


  • Let’s look at our data frame of simple intercepts and simple slopes…

Code
int_plot_df
Output


Plotting simple slopes by hand

  • Finally, we’ll make the plot using ggplot and a few different geoms, including geom_point() and geom_abline()

Code
social %>% 
  ggplot(aes(x = suppression_c, y = socsup)) + 
  
  # add points for scatterplot and make them invisible
   geom_point(alpha = 0) + 
  
  # add line for low level of individualism
  geom_abline(aes(intercept = int_plot_df$simple_intercepts[1], 
                  slope = int_plot_df$simple_slopes[1], 
                  color = "-1SD Individualism"), # This will  give it  a label
              show.legend = TRUE) +
  
  # add line for mid level of individualism
  geom_abline(aes(intercept = int_plot_df$simple_intercepts[2], 
                  slope = int_plot_df$simple_slopes[2], 
                  color = "Mean Individualism"), 
              show.legend = TRUE) + 
  
  # add line for high level of individualism
  geom_abline(aes(intercept = int_plot_df$simple_intercepts[3], 
                  slope = int_plot_df$simple_slopes[3], 
                  color = "+1SD Individualism"), 
              show.legend = TRUE) +
  
  # add a title and label the axes
  labs(title = "Effect of expressive suppression on social support \nat low, mid, & high levels of individualism", 
       x = "Expressive suppression (Centered)",
       y = "Social Support") +
  
  # we need to set the colors of the lines manually, which requires scale_color_manual()
  scale_color_manual("Individualism Level", 
  # values() is where you specify the colors, in the form label = color_name
                     values = c("-1SD Individualism" = "salmon", 
                              "Mean Individualism" = "purple",
                              "+1SD Individualism" = "cornflower blue")) + 
  
  theme_minimal()
Output


Question: How would you interpret this plot?

sjPlot::plot_model()

  • OK, that was intense… Fortunately there is a much easier way to plot interactions! We can use the plot_model() function from {sjPlot}. This function takes a fitted model object as its first argument (model). You can then specify the type of plot; in this case we will tell it to plot the simple slopes of our interaction model by specifying type = "int". Lastly, we will specify which values of the moderator variable (in our case, Individualism) to plot as separate lines.

Code
plot_model(model = model_int, # model fit object
           type = "int", # interaction 
           mdrt.values = "meansd") # which values of the moderator variable to use
Output


  • You can read more about using sjPlot::plot_model() to plot interactions here

Significance tests of simple slopes

  • Let’s talk about how to run significance tests on simple slopes. For each simple slope we can test whether it is significantly different from 0.


  • We’ll use the function simple_slopes() from the {reghelper} package (make sure you have this installed). This function takes in a model object and outputs a data frame with a row for each simple effect.


  • The first few columns identify the level at which each variable in your model was set for that test. An sstest value in a particular column indicates that the simple slope for that variable is being tested. The column labeled Test Estimate gives you the simple slopes (by default at -1 SD, the mean, and +1 SD of the moderator), and the remaining columns contain the information for the significance test.

Code
simple_slopes(model = model_int)
Output


  • In our case, we are only interested in the simple slopes of Suppression (at different levels of Individualism), so we can filter the resulting data frame to give us only the relevant information

Code
simple_slopes(model_int) %>% 
  filter(suppression_c == "sstest")
Output


  • Lastly, let’s prove to ourselves that our long and arduous way of calculating simple slopes by hand was correct!


  • Here are the simple slopes calculated by reghelper::simple_slopes()

Code
simple_slopes(model_int) %>% 
  filter(suppression_c == "sstest") %>% 
  select(`Test Estimate`)
Output


  • Here are the simple slopes we calculated by hand…

Code
# we created this data frame earlier
int_plot_df %>% 
  select(simple_slopes)
Output



Minihacks

Research Scenario:

  • For her master’s project, a grad student wants to see whether the time students spend studying (study_time) interacts with test anxiety (anxiety) to predict students’ test performance (perf).

  • Import the data below:

test_perf <- import("https://raw.githubusercontent.com/uopsych/psy612/master/labs/lab-7/data/test_perf.csv")

Minihack 1

  • Run a model testing whether test anxiety moderates the relationship between time spent studying and test performance. Don’t forget to first center your predictors.

  • What do the model coefficients mean? Try explaining them to a friend.


Minihack 2

  • Visualize the interaction. For an extra challenge, try doing this without using {sjPlot}.

  • What can you infer from this plot? Again, try explaining it to a friend.


Minihack 3

  • Test whether each simple slope is significantly different from 0.

  • What do the results of these significance tests mean?