Purpose

Today’s lab will focus on correlations. We will discuss how to calculate a correlation coefficient between two variables, how to assess statistical significance of correlations, and a variety of tools for visualizing correlations, especially among large groups of variables.

To quickly navigate to the desired section, click one of the following links:

  1. Covariance and correlation
  2. Visualizing correlations
  3. Hypothesis testing with correlations

You will need to load the following libraries to follow along with today’s lab. If you don’t have any of these packages installed, please install them now.

library(tidyverse) # for plotting and data wrangling
library(rio) # for importing data
library(psych) # for covariance and correlation functions
library(apaTables) # for correlation tables
library(pwr) # for power calculation

Covariance and correlation

Covariance

  • Before we talk about correlations, let’s review the concept of covariance. Simply put, covariance captures how two variables vary together, i.e. how they co-vary.

\[\large cov_{xy} = {\frac{\sum{(x-\bar{x})(y-\bar{y})}}{N-1}}\]

  • You’ll notice that this formula looks very similar to the formula for calculating the variance of a single variable:

\[\large s^{2} = {\frac{\sum{(x-\bar{x})^2}}{N-1}}\]
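
  • In fact, the variance of a variable is just its covariance with itself, which is easy to confirm in R:

Code
# the variance of mpg is the same as the covariance of mpg with itself
var(mtcars$mpg)
cov(mtcars$mpg, mtcars$mpg)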

  • To calculate covariance, use cov(). This function comes from the {stats} package, which is already loaded when you open R. We’ll use the mtcars dataset as an example:


  • Calculate the covariance between mtcars$mpg and mtcars$hp:

Code
cov(mtcars$mpg, mtcars$hp)
Output
## [1] -320.7321
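
  • As a sanity check, we can reproduce this value “by hand” directly from the covariance formula above:

Code
# covariance "by hand": sum of cross-products divided by N - 1
x <- mtcars$mpg
y <- mtcars$hp
sum((x - mean(x)) * (y - mean(y))) / (length(x) - 1)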


  • Feeding cov() a data frame (of numeric variables) will generate a covariance matrix. Let’s start with a data frame that only includes mpg and hp. We’ll also round to two decimal places for convenience.

Code
mtcars %>% 
  select(mpg, hp) %>% # select only our variables of interest
  cov() %>% # calculate covariance
  round(2) # round all values to 2 decimal places
Output
##         mpg      hp
## mpg   36.32 -320.73
## hp  -320.73 4700.87

Question: In the above output, what do the numbers along the diagonal represent?


  • We can also easily generate a covariance matrix of more than 2 variables:

Code
cov(mtcars) %>% 
  round(2)
Output
##          mpg    cyl     disp      hp   drat     wt   qsec     vs     am   gear
## mpg    36.32  -9.17  -633.10 -320.73   2.20  -5.12   4.51   2.02   1.80   2.14
## cyl    -9.17   3.19   199.66  101.93  -0.67   1.37  -1.89  -0.73  -0.47  -0.65
## disp -633.10 199.66 15360.80 6721.16 -47.06 107.68 -96.05 -44.38 -36.56 -50.80
## hp   -320.73 101.93  6721.16 4700.87 -16.45  44.19 -86.77 -24.99  -8.32  -6.36
## drat    2.20  -0.67   -47.06  -16.45   0.29  -0.37   0.09   0.12   0.19   0.28
## wt     -5.12   1.37   107.68   44.19  -0.37   0.96  -0.31  -0.27  -0.34  -0.42
## qsec    4.51  -1.89   -96.05  -86.77   0.09  -0.31   3.19   0.67  -0.20  -0.28
## vs      2.02  -0.73   -44.38  -24.99   0.12  -0.27   0.67   0.25   0.04   0.08
## am      1.80  -0.47   -36.56   -8.32   0.19  -0.34  -0.20   0.04   0.25   0.29
## gear    2.14  -0.65   -50.80   -6.36   0.28  -0.42  -0.28   0.08   0.29   0.54
## carb   -5.36   1.52    79.07   83.04  -0.08   0.68  -1.89  -0.46   0.05   0.33
##       carb
## mpg  -5.36
## cyl   1.52
## disp 79.07
## hp   83.04
## drat -0.08
## wt    0.68
## qsec -1.89
## vs   -0.46
## am    0.05
## gear  0.33
## carb  2.61


Correlation

  • Correlations are standardized covariances. Because correlations are in standardized units, we can compare them across scales of measurement and across studies. Recall that mathematically, a correlation is the covariance divided by the product of the two standard deviations. It is also equivalent to the average product of the two variables’ z-scores.

\[\large r_{xy} = {\frac{cov(X,Y)}{\hat\sigma_{x}\hat\sigma_{y}}}\]

\[\large r_{xy} = {\frac{\sum{(z_{x}z_{y})}}{N}}\]

  • To calculate a correlation coefficient, use cor() (again, from the {stats} package).

Code
cor(mtcars$mpg, mtcars$hp)
Output
## [1] -0.7761684
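
  • Both definitions above are easy to verify by hand. (One wrinkle: scale() standardizes using the sample standard deviation, so the sum of the z-score products is divided by N - 1 rather than N.)

Code
# correlation as standardized covariance
cov(mtcars$mpg, mtcars$hp) / (sd(mtcars$mpg) * sd(mtcars$hp))

# correlation as the average product of z-scores
sum(scale(mtcars$mpg) * scale(mtcars$hp)) / (nrow(mtcars) - 1)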


  • As with covariances, we can generate a matrix of correlations by feeding a data frame to cor(). The following code will give us a correlation matrix of mpg and hp.

Code
mtcars %>% 
  select(mpg, hp) %>% 
  cor() %>% 
  round(2)
Output
##       mpg    hp
## mpg  1.00 -0.78
## hp  -0.78  1.00


  • To get a correlation matrix of all the variables in our data frame:

Code
cor(mtcars) %>% 
  round(2)
Output
##        mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
## mpg   1.00 -0.85 -0.85 -0.78  0.68 -0.87  0.42  0.66  0.60  0.48 -0.55
## cyl  -0.85  1.00  0.90  0.83 -0.70  0.78 -0.59 -0.81 -0.52 -0.49  0.53
## disp -0.85  0.90  1.00  0.79 -0.71  0.89 -0.43 -0.71 -0.59 -0.56  0.39
## hp   -0.78  0.83  0.79  1.00 -0.45  0.66 -0.71 -0.72 -0.24 -0.13  0.75
## drat  0.68 -0.70 -0.71 -0.45  1.00 -0.71  0.09  0.44  0.71  0.70 -0.09
## wt   -0.87  0.78  0.89  0.66 -0.71  1.00 -0.17 -0.55 -0.69 -0.58  0.43
## qsec  0.42 -0.59 -0.43 -0.71  0.09 -0.17  1.00  0.74 -0.23 -0.21 -0.66
## vs    0.66 -0.81 -0.71 -0.72  0.44 -0.55  0.74  1.00  0.17  0.21 -0.57
## am    0.60 -0.52 -0.59 -0.24  0.71 -0.69 -0.23  0.17  1.00  0.79  0.06
## gear  0.48 -0.49 -0.56 -0.13  0.70 -0.58 -0.21  0.21  0.79  1.00  0.27
## carb -0.55  0.53  0.39  0.75 -0.09  0.43 -0.66 -0.57  0.06  0.27  1.00
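
  • One practical note: if your data contain missing values, cor() (and cov()) will return NA by default. The use argument controls this behavior; for example, use = "pairwise.complete.obs" computes each correlation from all cases that are complete for that pair of variables:

Code
# handle missing data when computing a correlation matrix
# (mtcars has no NAs, so this matches the matrix above)
cor(mtcars, use = "pairwise.complete.obs") %>% 
  round(2)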


  • If we are given a covariance matrix, we can easily convert it to a correlation matrix using cov2cor().

Code
# covariance matrix
cov_mat <- cov(mtcars)

# convert to correlation matrix
cov2cor(cov_mat) %>% 
  round(2)
Output
##        mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
## mpg   1.00 -0.85 -0.85 -0.78  0.68 -0.87  0.42  0.66  0.60  0.48 -0.55
## cyl  -0.85  1.00  0.90  0.83 -0.70  0.78 -0.59 -0.81 -0.52 -0.49  0.53
## disp -0.85  0.90  1.00  0.79 -0.71  0.89 -0.43 -0.71 -0.59 -0.56  0.39
## hp   -0.78  0.83  0.79  1.00 -0.45  0.66 -0.71 -0.72 -0.24 -0.13  0.75
## drat  0.68 -0.70 -0.71 -0.45  1.00 -0.71  0.09  0.44  0.71  0.70 -0.09
## wt   -0.87  0.78  0.89  0.66 -0.71  1.00 -0.17 -0.55 -0.69 -0.58  0.43
## qsec  0.42 -0.59 -0.43 -0.71  0.09 -0.17  1.00  0.74 -0.23 -0.21 -0.66
## vs    0.66 -0.81 -0.71 -0.72  0.44 -0.55  0.74  1.00  0.17  0.21 -0.57
## am    0.60 -0.52 -0.59 -0.24  0.71 -0.69 -0.23  0.17  1.00  0.79  0.06
## gear  0.48 -0.49 -0.56 -0.13  0.70 -0.58 -0.21  0.21  0.79  1.00  0.27
## carb -0.55  0.53  0.39  0.75 -0.09  0.43 -0.66 -0.57  0.06  0.27  1.00



Visualizing correlations

  • As Sara mentioned in class, it is very important to always visualize your data. There might be patterns in your data that are not apparent just from looking at a correlation between two variables, or even a correlation matrix for that matter. We will go over just a few basic examples here, but there are many different options for visualizing correlations. See here if you want to explore more options.

Scatter plots

  • Scatter plots allow us to visualize the relationship between two variables. By now we are all familiar with scatter plots, but let’s create a simple one using ggplot() just to jog our memory. To continue with our running example…

Code
ggplot(data = mtcars, aes(x = mpg, y = hp)) +
  geom_point()
Output
[scatter plot of hp against mpg]


  • It is often useful to add a best fit line. We can do this by adding geom_smooth(method = "lm"). ("lm" stands for “linear model”.) Put a pin in that for now…we’ll discuss what this line represents in more detail in a future lab.

Code
ggplot(data = mtcars, aes(x = mpg, y = hp)) +
  geom_point() + 
  geom_smooth(method = "lm")
Output
[scatter plot of hp against mpg with a linear best fit line]


SPLOM plots

  • “SPLOM” stands for scatter plot matrix. The pairs.panels() function from the {psych} package provides a quick way to visualize relationships among all the continuous variables in your data frame. The lower triangle contains scatter plots showing bivariate relationships between pairs of variables, and the upper triangle contains the corresponding correlation coefficients. Histograms for each variable are shown along the diagonal.

  • Note that this function is not ideal for very large data sets, as the plots become difficult to read; one workaround is shown below the output. (Also, we actually already learned this function in PSY 611!)

Code
pairs.panels(mtcars, lm = TRUE)
Output
[SPLOM of all mtcars variables]
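
  • If your data frame has many columns, one workaround is to select a smaller set of variables before passing the data to pairs.panels():

Code
# a SPLOM for just a few variables of interest
mtcars %>% 
  select(mpg, hp, wt) %>% 
  pairs.panels(lm = TRUE)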


Heat maps

  • Heat maps are a great way to get a high-level visualization of a correlation matrix. They are particularly useful for visualizing the number of “clusters” in your data if that’s something you’re looking for. For example, the Thurstone dataset, built into the {psych} package, is a correlation matrix of items that assess different aspects of cognitive ability. We can plot a heat map of this correlation matrix using the corPlot() function:

Code
corPlot(Thurstone)
Output
[heat map of the Thurstone correlation matrix]

Question: What do you notice about the structure of this data?


APA Tables

  • The package {apaTables} has a very useful function apa.cor.table() that creates nicely formatted tables of correlation matrices in APA format.

Code
apa.cor.table(mtcars, filename = "cars.doc", table.number = 1)
Output
[APA-style correlation table (also written to cars.doc)]


Hypothesis testing with correlations

  • A correlation can be considered both a descriptive and inferential statistic. So far we’ve talked about how to calculate correlations between pairs of continuous variables and how to interpret them descriptively, but we haven’t mentioned how to assess whether correlations are statistically meaningful.

  • To illustrate how to assess statistical significance of correlations, we are going to work with a dataset about the relationship between conscientiousness and physical health. We’ve collected data from 60 people (30 men and 30 women) on their self-reported conscientiousness (using the BFI) and their self-reported physical health. We want to know if there is a significant correlation between these variables, and then whether or not that correlation differs between men and women.

  • Import the data using the following code and check out the structure of the data.

health <- import("https://raw.githubusercontent.com/uopsych/psy612/master/labs/lab-1/data/consc_health.csv")

str(health)

Visualize the data

Code
ggplot(data = health, aes(x = consc, y = sr_health)) +
  geom_point() + 
  geom_smooth(method = "lm")
Output
[scatter plot of sr_health against consc with a linear best fit line]


Statistical hypotheses

\[\large H_{0}: \rho_{xy} = 0\]

\[\large H_{A}: \rho_{xy} \neq 0\]

Power calculation

Code
pwr.r.test(n = nrow(health), sig.level = .05, power = .8)
Output
## 
##      approximate correlation power calculation (arctangh transformation) 
## 
##               n = 60
##               r = 0.3525707
##       sig.level = 0.05
##           power = 0.8
##     alternative = two.sided

Question: If our sample size were doubled (120 instead of 60), what would happen to the value of r in the above output? Why? What if we decreased our significance level to .01?
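
  • Note that pwr.r.test() solves for whichever parameter you leave out. Above we solved for the smallest correlation detectable with our sample size; more commonly, you would supply an expected correlation (here a hypothetical r of .30) and solve for the required sample size:

Code
# solve for n rather than r
pwr.r.test(r = .3, sig.level = .05, power = .8)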


Relevant functions

  • There are a couple of different functions to be aware of for running statistical tests of correlations.

stats::cor.test()

  • By default, cor.test() from the {stats} package runs a statistical test to determine whether the observed correlation between two variables is significantly different from 0 (the null hypothesis). In addition to the usual output, it also returns a 95% CI for the correlation coefficient based on a Fisher’s r to z’ transformation.

Code
cor.test(health$consc, health$sr_health)
Output
## 
##  Pearson's product-moment correlation
## 
## data:  health$consc and health$sr_health
## t = 4.1281, df = 58, p-value = 0.0001186
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.2532571 0.6516132
## sample estimates:
##       cor 
## 0.4765366
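
  • Under the hood, cor.test() converts r to a t statistic with N - 2 degrees of freedom, which we can verify by hand:

Code
# t = r * sqrt(N - 2) / sqrt(1 - r^2)
r <- cor(health$consc, health$sr_health)
n <- nrow(health)
r * sqrt(n - 2) / sqrt(1 - r^2) # should match the t value above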

Question: What can we conclude about the relationship between conscientiousness and health from this data?


psych::corr.test()

  • corr.test() from the {psych} package gives the same information, but in a slightly different format.

Code
health %>% 
  select(consc, sr_health) %>% 
  corr.test()
Output
## Call:corr.test(x = .)
## Correlation matrix 
##           consc sr_health
## consc      1.00      0.48
## sr_health  0.48      1.00
## Sample Size 
## [1] 60
## Probability values (Entries above the diagonal are adjusted for multiple tests.) 
##           consc sr_health
## consc         0         0
## sr_health     0         0
## 
##  To see confidence intervals of the correlations, print with the short=FALSE option


  • Note that in order to actually see the confidence intervals, we have to print the output of corr.test() and add the argument short = FALSE.

Code
health %>% 
  select(consc, sr_health) %>% 
  corr.test() %>% 
  print(short = FALSE)
Output
## Call:corr.test(x = .)
## Correlation matrix 
##           consc sr_health
## consc      1.00      0.48
## sr_health  0.48      1.00
## Sample Size 
## [1] 60
## Probability values (Entries above the diagonal are adjusted for multiple tests.) 
##           consc sr_health
## consc         0         0
## sr_health     0         0
## 
##  Confidence intervals based upon normal theory.  To get bootstrapped values, try cor.ci
##             raw.lower raw.r raw.upper raw.p lower.adj upper.adj
## consc-sr_hl      0.25  0.48      0.65     0      0.25      0.65


  • It is often very useful to save the output of a statistical test to an object that you can then pull useful information out of.

Code
# store the results of the corr.test() as a list
r_consc_health <- health %>% 
  select(consc, sr_health) %>% 
  corr.test() 

# Now we can pull out just the confidence intervals (or any other information we want!)
r_consc_health$ci
Output
##                 lower         r     upper            p
## consc-sr_hl 0.2532571 0.4765366 0.6516132 0.0001186082
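
  • The object returned by corr.test() stores other pieces as well; for example, the correlation matrix itself and the matrix of p values live in the r and p elements:

Code
r_consc_health$r # correlation matrix
r_consc_health$p # p values (adjusted above the diagonal)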


  • psych::corr.test() is also a more flexible function because it can take in an entire data frame of continuous variables and run many statistical tests at once. Let’s look again at the mtcars dataset, which has 11 variables.

Code
mtcars %>% 
  corr.test() %>% 
  print(short = FALSE)
Output
## Call:corr.test(x = .)
## Correlation matrix 
##        mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
## mpg   1.00 -0.85 -0.85 -0.78  0.68 -0.87  0.42  0.66  0.60  0.48 -0.55
## cyl  -0.85  1.00  0.90  0.83 -0.70  0.78 -0.59 -0.81 -0.52 -0.49  0.53
## disp -0.85  0.90  1.00  0.79 -0.71  0.89 -0.43 -0.71 -0.59 -0.56  0.39
## hp   -0.78  0.83  0.79  1.00 -0.45  0.66 -0.71 -0.72 -0.24 -0.13  0.75
## drat  0.68 -0.70 -0.71 -0.45  1.00 -0.71  0.09  0.44  0.71  0.70 -0.09
## wt   -0.87  0.78  0.89  0.66 -0.71  1.00 -0.17 -0.55 -0.69 -0.58  0.43
## qsec  0.42 -0.59 -0.43 -0.71  0.09 -0.17  1.00  0.74 -0.23 -0.21 -0.66
## vs    0.66 -0.81 -0.71 -0.72  0.44 -0.55  0.74  1.00  0.17  0.21 -0.57
## am    0.60 -0.52 -0.59 -0.24  0.71 -0.69 -0.23  0.17  1.00  0.79  0.06
## gear  0.48 -0.49 -0.56 -0.13  0.70 -0.58 -0.21  0.21  0.79  1.00  0.27
## carb -0.55  0.53  0.39  0.75 -0.09  0.43 -0.66 -0.57  0.06  0.27  1.00
## Sample Size 
## [1] 32
## Probability values (Entries above the diagonal are adjusted for multiple tests.) 
##       mpg cyl disp   hp drat   wt qsec   vs   am gear carb
## mpg  0.00   0 0.00 0.00 0.00 0.00 0.22 0.00 0.01 0.10 0.02
## cyl  0.00   0 0.00 0.00 0.00 0.00 0.01 0.00 0.04 0.08 0.04
## disp 0.00   0 0.00 0.00 0.00 0.00 0.20 0.00 0.01 0.02 0.30
## hp   0.00   0 0.00 0.00 0.17 0.00 0.00 0.00 1.00 1.00 0.00
## drat 0.00   0 0.00 0.01 0.00 0.00 1.00 0.19 0.00 0.00 1.00
## wt   0.00   0 0.00 0.00 0.00 0.00 1.00 0.02 0.00 0.01 0.20
## qsec 0.02   0 0.01 0.00 0.62 0.34 0.00 0.00 1.00 1.00 0.00
## vs   0.00   0 0.00 0.00 0.01 0.00 0.00 0.00 1.00 1.00 0.02
## am   0.00   0 0.00 0.18 0.00 0.00 0.21 0.36 0.00 0.00 1.00
## gear 0.01   0 0.00 0.49 0.00 0.00 0.24 0.26 0.00 0.00 1.00
## carb 0.00   0 0.03 0.00 0.62 0.01 0.00 0.00 0.75 0.13 0.00
## 
##  Confidence intervals based upon normal theory.  To get bootstrapped values, try cor.ci
##           raw.lower raw.r raw.upper raw.p lower.adj upper.adj
## mpg-cyl       -0.93 -0.85     -0.72  0.00     -0.95     -0.57
## mpg-disp      -0.92 -0.85     -0.71  0.00     -0.95     -0.56
## mpg-hp        -0.89 -0.78     -0.59  0.00     -0.93     -0.41
## mpg-drat       0.44  0.68      0.83  0.00      0.24      0.89
## mpg-wt        -0.93 -0.87     -0.74  0.00     -0.96     -0.61
## mpg-qsec       0.08  0.42      0.67  0.02     -0.09      0.75
## mpg-vs         0.41  0.66      0.82  0.00      0.21      0.88
## mpg-am         0.32  0.60      0.78  0.00      0.11      0.86
## mpg-gear       0.16  0.48      0.71  0.01     -0.03      0.79
## mpg-carb      -0.75 -0.55     -0.25  0.00     -0.83     -0.05
## cyl-disp       0.81  0.90      0.95  0.00      0.70      0.97
## cyl-hp         0.68  0.83      0.92  0.00      0.53      0.95
## cyl-drat      -0.84 -0.70     -0.46  0.00     -0.90     -0.27
## cyl-wt         0.60  0.78      0.89  0.00      0.42      0.93
## cyl-qsec      -0.78 -0.59     -0.31  0.00     -0.85     -0.10
## cyl-vs        -0.90 -0.81     -0.64  0.00     -0.94     -0.48
## cyl-am        -0.74 -0.52     -0.21  0.00     -0.81     -0.02
## cyl-gear      -0.72 -0.49     -0.17  0.00     -0.80      0.02
## cyl-carb       0.22  0.53      0.74  0.00      0.02      0.82
## disp-hp        0.61  0.79      0.89  0.00      0.44      0.93
## disp-drat     -0.85 -0.71     -0.48  0.00     -0.90     -0.28
## disp-wt        0.78  0.89      0.94  0.00      0.66      0.97
## disp-qsec     -0.68 -0.43     -0.10  0.01     -0.77      0.08
## disp-vs       -0.85 -0.71     -0.48  0.00     -0.90     -0.28
## disp-am       -0.78 -0.59     -0.31  0.00     -0.85     -0.10
## disp-gear     -0.76 -0.56     -0.26  0.00     -0.83     -0.05
## disp-carb      0.05  0.39      0.65  0.03     -0.11      0.74
## hp-drat       -0.69 -0.45     -0.12  0.01     -0.78      0.07
## hp-wt          0.40  0.66      0.82  0.00      0.20      0.88
## hp-qsec       -0.85 -0.71     -0.48  0.00     -0.90     -0.28
## hp-vs         -0.86 -0.72     -0.50  0.00     -0.91     -0.30
## hp-am         -0.55 -0.24      0.12  0.18     -0.65      0.27
## hp-gear       -0.45 -0.13      0.23  0.49     -0.53      0.33
## hp-carb        0.54  0.75      0.87  0.00      0.35      0.92
## drat-wt       -0.85 -0.71     -0.48  0.00     -0.90     -0.28
## drat-qsec     -0.27  0.09      0.43  0.62     -0.34      0.49
## drat-vs        0.11  0.44      0.68  0.01     -0.08      0.77
## drat-am        0.48  0.71      0.85  0.00      0.28      0.90
## drat-gear      0.46  0.70      0.84  0.00      0.27      0.90
## drat-carb     -0.43 -0.09      0.27  0.62     -0.47      0.31
## wt-qsec       -0.49 -0.17      0.19  0.34     -0.58      0.30
## wt-vs         -0.76 -0.55     -0.26  0.00     -0.83     -0.06
## wt-am         -0.84 -0.69     -0.45  0.00     -0.89     -0.26
## wt-gear       -0.77 -0.58     -0.29  0.00     -0.85     -0.09
## wt-carb        0.09  0.43      0.68  0.01     -0.08      0.76
## qsec-vs        0.53  0.74      0.87  0.00      0.34      0.92
## qsec-am       -0.54 -0.23      0.13  0.21     -0.63      0.27
## qsec-gear     -0.52 -0.21      0.15  0.24     -0.62      0.28
## qsec-carb     -0.82 -0.66     -0.40  0.00     -0.88     -0.20
## vs-am         -0.19  0.17      0.49  0.36     -0.30      0.57
## vs-gear       -0.15  0.21      0.52  0.26     -0.28      0.61
## vs-carb       -0.77 -0.57     -0.28  0.00     -0.84     -0.07
## am-gear        0.62  0.79      0.89  0.00      0.44      0.93
## am-carb       -0.30  0.06      0.40  0.75     -0.30      0.40
## gear-carb     -0.08  0.27      0.57  0.13     -0.24      0.67


Comparing correlations by group

  • We saw above that there is a significant positive relationship between conscientiousness and self-reported health. However, now we want to know whether or not this correlation is significantly different for men and women.

Visualizing by group

  • First let’s plot our data again, but this time split the data by gender.

Code
health %>% 
  ggplot(aes(consc, sr_health, col = gender)) + 
  geom_point(size = 3, alpha = 0.7) +
  geom_smooth(method = "lm", alpha = 0.2) + 
  labs(x = "Conscientiousness", y = "Self-reported Health") +
  theme_minimal()
Output
[scatter plot of sr_health against consc, colored by gender, with separate fit lines]

Question: What can we conclude from this graph?


psych::r.test()

  • To statistically compare the correlations between conscientiousness and self-reported health for men and women, we can use the r.test() function. This particular function requires the sample size and the two correlations we’re testing against each other (i.e., we can’t simply pass in the dataset). So, we’ll need to run the correlation separately for men and women, save those values, and then use them in the r.test() function.
health_women <- health %>% 
  filter(gender == "female") 

health_men <- health %>% 
  filter(gender == "male")

r_women <- cor(health_women$consc, health_women$sr_health)

r_men <- cor(health_men$consc, health_men$sr_health)
  • The argument names for r.test() are a little confusing. In our case, we need to supply 3 pieces of information:
  1. n = sample size of the first group
  2. r12 = r for variable 1 and variable 2 (e.g., r for women’s consc and women’s health)
  3. r34 = r for variable 3 and variable 4 (e.g., r for men’s consc and men’s health)

Code
r.test(n = 30, r12 = r_women, r34 = r_men)
Output
## Correlation tests 
## Call:r.test(n = 30, r12 = r_women, r34 = r_men)
## Test of difference between two independent correlations 
##  z value 1.16    with probability  0.25
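
  • For the curious, r.test() is carrying out Fisher’s z test for two independent correlations: each r is converted to z′, and the difference is divided by its standard error. Here is a minimal sketch using fisherz() from {psych}:

Code
# Fisher's z test for two independent correlations "by hand"
se_diff <- sqrt(1 / (30 - 3) + 1 / (30 - 3)) # each group has n = 30
(fisherz(r_women) - fisherz(r_men)) / se_diff # should match the z value above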

Question: What does this test suggest about the correlation between conscientiousness and health across genders?



Minihacks

Minihack 1

For this minihack, calculate a 95% CI “by hand” for the correlation between conscientiousness and self-reported health from our earlier health dataset. Make sure you get the same answer that was given by cor.test(). Hint: when calculating CIs, use the functions fisherz() and fisherz2r() from {psych}.

To review the steps of calculating confidence intervals using Fisher’s transformation, see here.

# Your code here

Minihack 2

Now pretend that we had collected data about conscientiousness and self-reported health from 600 people instead of only 60. If we were to obtain the same point estimate of the correlation between conscientiousness and health, what would the 95% CI be of our estimate if N = 600? Explain to your neighbor how and why these confidence intervals are different.

# Your code here

Minihack 3

Check out this paper from Dawtry et al. (2015). Focus on Table 1 from Study 1a (see below). Your task is to replicate the correlation matrix from Table 1. For an extra bonus, format your correlation matrix using apa.cor.table() and open it in Microsoft Word.

You can import the data using the following code:

dawtry_clean <- import("https://raw.githubusercontent.com/uopsych/psy612/master/labs/lab-1/data/dawtry_2015_clean.csv")

The data above has been cleaned somewhat for you. For an extra data wrangling challenge, import the raw form of the data (not required):

dawtry_raw <- import("https://raw.githubusercontent.com/uopsych/psy612/master/labs/lab-1/data/dawtry_2015_raw.csv")

Note: If you use the raw data, note that you will have to create some composite variables (check out the rowMeans() function). Also note that items redist2 and redist4 must be reverse-coded before creating a composite score.

# Your code here