Today we will briefly review univariate regression and then discuss how to summarize and visualize uncertainty in regression models using a variety of plotting methods. We will then touch on how to estimate regression coefficients using matrix algebra. Lastly, we will introduce the general linear model (GLM) and demonstrate how it can be used to understand all of the statistical tests we have learned so far (t-tests, ANOVA, correlations, regressions) within one unifying framework.
You will need to load the following libraries to follow along with today’s lab. If you don’t have some of these packages installed, please install them now.
library(tidyverse) # for plotting and data wrangling
library(rio) # for importing data
library(broom) # for cleaning up output
library(sjPlot) # for plotting
library(ggpubr) # for plotting
library(carData) # for Guyer dataset
health <- import("https://raw.githubusercontent.com/uopsych/psy612/master/labs/lab-3/data/consc_health.csv")
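Let's take a quick peek at the data before modeling. This is just a sketch of a sanity check; the file should contain at least the consc and sr_health variables used below.
head(health) # first few rows of conscientiousness and self-rated health scores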
Today we'll examine whether conscientiousness (consc) predicts self-rated health (sr_health). Recall the general form of the univariate regression equation:
\[Y_i = b_0 + b_1X_i + e_i\]
\[health_i = b_0 + b_1consc_i + e_i\]
We can estimate this model using lm() and view the results with summary():
model <- lm(sr_health ~ consc, data = health)
summary(model)
##
## Call:
## lm(formula = sr_health ~ consc, data = health)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.5450 -0.6414 -0.1623 0.6817 2.0154
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.6570 0.3571 4.641 0.0000203 ***
## consc 0.4904 0.1188 4.128 0.000119 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.8854 on 58 degrees of freedom
## Multiple R-squared: 0.2271, Adjusted R-squared: 0.2138
## F-statistic: 17.04 on 1 and 58 DF, p-value: 0.0001186
| coefficient | b    | SE   | t    | p      |
|-------------|------|------|------|--------|
| (Intercept) | 1.66 | 0.36 | 4.64 | < .001 |
| consc       | 0.49 | 0.12 | 4.13 | < .001 |
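As an aside, a tidy coefficient table like the one above can be pulled straight from the model with broom (a sketch; the table above may have been produced a different way):
tidy(model) # returns term, estimate, std.error, statistic, and p.value columns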
Question: What do the intercept and slope mean? What do the t-values tell us?
Our b's (intercept and slope) are estimates from our sample of the true population parameters (\(\beta\)’s). Remember that whenever we calculate an estimate of something, we should also determine how precise our estimate is. This is where standard errors and confidence intervals come in.
Recall the formula for calculating confidence intervals, where CV is the critical value (for regression coefficients, a critical t value):
\[CI_b = b \pm CV(SE_b)\]
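As a quick sketch of this formula in action, we can plug in the slope estimate and standard error from the summary output above, using the critical t value with 58 degrees of freedom as the CV:
b_consc  <- 0.4904059 # slope estimate from summary(model)
se_consc <- 0.1188    # its standard error
cv <- qt(p = .975, df = 58) # two-tailed critical t value for a 95% CI
b_consc + c(-1, 1) * cv * se_consc # lower and upper bounds of the 95% CI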
In R, we can get confidence intervals for our regression coefficients with stats::confint(). This function takes a fitted model object as its first argument. By default it will give you 95% CIs.
confint(model)
## 2.5 % 97.5 %
## (Intercept) 0.9422515 2.3716950
## consc 0.2526050 0.7282069
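By default confint() gives 95% intervals; the level argument changes this if you want, say, 99% CIs:
confint(model, level = .99)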
Question: What does this 95% CI for the slope of conscientiousness mean in plain English?
In addition to confidence intervals around our coefficients, we can get standard errors for the predicted (fitted) values themselves using broom::augment() (they are labeled .se.fit).
model %>% # start with our model object
  augment() %>% # from broom package; gives us fitted values, residuals, etc.
  select(sr_health, .fitted, .se.fit) # select relevant variables
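These standard errors are what the confidence band in the plot below is built from. Here is a sketch of computing that band by hand (note that newer versions of broom require se_fit = TRUE for augment() to return the .se.fit column):
model %>%
  augment(se_fit = TRUE) %>% # request standard errors of the fitted values
  mutate(lower = .fitted - qt(.975, df = df.residual(model)) * .se.fit, # lower edge of the 95% confidence band
         upper = .fitted + qt(.975, df = df.residual(model)) * .se.fit) %>% # upper edge
  select(consc, .fitted, lower, upper) %>%
  head()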
We can visualize this uncertainty by plotting the regression line with a 95% confidence band around it, which geom_smooth(method = "lm") does for us.
health %>%
  ggplot(aes(x = consc, y = sr_health)) +
  geom_point(alpha = 0.3) +
  geom_smooth(method = "lm") + # adds a layer that includes the regression line & 95% confidence band
  labs(x = "Conscientiousness", y = "Self-rated health") +
  theme_minimal()
A regression line, by definition, corresponds to the line that gives the mean value of Y for each possible value of X, i.e. E(Y|X).
In addition, we can also predict an individual’s score (\(Y_i\)) for any value of X. From our regression model, we have the following equation that mathematically represents the relationship between conscientiousness and self-rated health:
\[\hat{health}_{i} = 1.6569733 + 0.4904059 * consc_{i}\]
For example, if someone has a conscientiousness score of 3.5, we can easily predict their score for self-rated health according to our model:
\[\hat{health} = 1.6569733 + 0.4904059*3.5 = 3.373\]
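The same by-hand prediction can be done with predict() and a hypothetical one-row data frame of new values:
predict(model, newdata = data.frame(consc = 3.5)) # predicted self-rated health for consc = 3.5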
The predict() function gives us an easy way to get the predicted Y values for all of the X values in our dataset.
predict(model)
## 1 2 3 4 5 6 7 8
## 2.961300 4.119826 3.543509 2.507581 3.350326 3.621532 3.038562 3.570843
## 9 10 11 12 13 14 15 16
## 3.178485 3.147345 2.758911 3.403290 2.987058 3.015841 3.216697 3.135321
## 17 18 19 20 21 22 23 24
## 3.056917 3.006047 3.015321 3.297619 2.458993 2.840238 3.082434 2.292952
## 25 26 27 28 29 30 31 32
## 3.103269 3.133964 2.989617 3.354057 2.882657 3.952820 2.891139 3.307783
## 33 34 35 36 37 38 39 40
## 2.745710 3.555768 3.643154 2.474954 2.778601 2.520440 2.935472 2.524356
## 41 42 43 44 45 46 47 48
## 2.467469 3.442900 4.079152 2.368996 3.882772 2.256636 3.239823 2.609330
## 49 50 51 52 53 54 55 56
## 2.309207 3.229900 2.795072 2.476516 3.459384 3.363102 3.536529 1.938202
## 57 58 59 60
## 2.698847 3.737070 3.129544 2.779391
We can get these same fitted values from broom::augment().
augment(model)$.fitted
## [1] 2.961300 4.119826 3.543509 2.507581 3.350326 3.621532 3.038562 3.570843
## [9] 3.178485 3.147345 2.758911 3.403290 2.987058 3.015841 3.216697 3.135321
## [17] 3.056917 3.006047 3.015321 3.297619 2.458993 2.840238 3.082434 2.292952
## [25] 3.103269 3.133964 2.989617 3.354057 2.882657 3.952820 2.891139 3.307783
## [33] 2.745710 3.555768 3.643154 2.474954 2.778601 2.520440 2.935472 2.524356
## [41] 2.467469 3.442900 4.079152 2.368996 3.882772 2.256636 3.239823 2.609330
## [49] 2.309207 3.229900 2.795072 2.476516 3.459384 3.363102 3.536529 1.938202
## [57] 2.698847 3.737070 3.129544 2.779391
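As a quick check, these are exactly the values predict() returned above:
all.equal(unname(predict(model)), augment(model)$.fitted) # should be TRUE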
We can also get a prediction interval for each individual observation by setting interval = "prediction" inside predict().
predicted <- predict(model, interval = "prediction")
predicted
## Warning in predict.lm(model, interval = "prediction"): predictions on current data refer to _future_ responses
## fit lwr upr
## 1 2.961300 1.17371888 4.748882
## 2 4.119826 2.25947787 5.980173
## 3 3.543509 1.74074780 5.346271
## 4 2.507581 0.70106755 4.314095
## 5 3.350326 1.55750837 5.143143
## 6 3.621532 1.81339348 5.429671
## 7 3.038562 1.25152397 4.825601
## 8 3.570843 1.76628606 5.375400
## 9 3.178485 1.39043087 4.966539
## 10 3.147345 1.35973942 4.934950
## 11 2.758911 0.96619235 4.551629
## 12 3.403290 1.60822745 5.198353
## 13 2.987058 1.19974524 4.774372
## 14 3.015841 1.22872479 4.802958
## 15 3.216697 1.42791809 5.005476
## 16 3.135321 1.34785491 4.922787
## 17 3.056917 1.26989230 4.843942
## 18 3.006047 1.21887622 4.793219
## 19 3.015321 1.22820165 4.802440
## 20 3.297619 1.50667342 5.088564
## 21 2.458993 0.64887783 4.269108
## 22 2.840238 1.05022878 4.630247
## 23 3.082434 1.29535377 4.869513
## 24 2.292952 0.46828552 4.117619
## 25 3.103269 1.31608138 4.890458
## 26 3.133964 1.34651249 4.921416
## 27 2.989617 1.20232583 4.776908
## 28 3.354057 1.56109375 5.147021
## 29 2.882657 1.09371696 4.671597
## 30 3.952820 2.11333811 5.792301
## 31 2.891139 1.10238491 4.679893
## 32 3.307783 1.51650432 5.099061
## 33 2.745710 0.95247006 4.538949
## 34 3.555768 1.75221339 5.359323
## 35 3.643154 1.83338823 5.452920
## 36 2.474954 0.66605493 4.283853
## 37 2.778601 0.98661799 4.570584
## 38 2.520440 0.71482928 4.326051
## 39 2.935472 1.14753408 4.723410
## 40 2.524356 0.71901566 4.329695
## 41 2.467469 0.65800380 4.276934
## 42 3.442900 1.64591980 5.239880
## 43 4.079152 2.22418998 5.934114
## 44 2.368996 0.55142160 4.186570
## 45 3.882772 2.05104678 5.714497
## 46 2.256636 0.42832923 4.084942
## 47 3.239823 1.45051297 5.029134
## 48 2.609330 0.80938210 4.409277
## 49 2.309207 0.48611661 4.132297
## 50 3.229900 1.44082634 5.018974
## 51 2.795072 1.00366438 4.586479
## 52 2.476516 0.66773471 4.285298
## 53 3.459384 1.66154564 5.257222
## 54 3.363102 1.56977624 5.156427
## 55 3.536529 1.73421026 5.338847
## 56 1.938202 0.07115747 3.805246
## 57 2.698847 0.90357393 4.494119
## 58 3.737070 1.91955030 5.554589
## 59 3.129544 1.34213807 4.916950
## 60 2.779391 0.98743583 4.571345
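For some intuition about where these wider bounds come from: the standard error used for a prediction interval adds the residual variance to the standard error of the fitted value. A rough sketch (again assuming augment() with se_fit = TRUE):
model %>%
  augment(se_fit = TRUE) %>%
  mutate(se_pred = sqrt(.se.fit^2 + sigma(model)^2), # prediction SE includes the residual variance
         lwr = .fitted - qt(.975, df.residual(model)) * se_pred,
         upr = .fitted + qt(.975, df.residual(model)) * se_pred) %>%
  select(.fitted, lwr, upr) %>%
  head() # matches the first few rows of the predict() output above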
new_df <- cbind(health, predicted) # combine the original data with the fitted values and prediction bounds
new_df
We can add a prediction band to our plot using geom_ribbon(). This prediction band gives us, for every value of X, the interval that is likely to contain the value of a single new individual observation. Unlike confidence intervals, prediction intervals capture the spread of individual observations rather than the mean.
new_df %>%
  ggplot(aes(x = consc, y = sr_health)) +
  geom_point() +
  geom_smooth(method = "lm", se = TRUE) + # regression line & 95% confidence band
  geom_ribbon(aes(ymin = lwr, ymax = upr), fill = "blue", alpha = 0.1) + # prediction band
  labs(x = "Conscientiousness", y = "Self-rated health") +
  theme_minimal()
Question: Why is the prediction band wider than the confidence band around our regression line?
Remember, we can plug any X value (i.e. conscientiousness score) into our regression equation and predict the corresponding Y value (i.e. self-rated health score). And when I say any X value, I mean that it doesn’t have to be an actual observation from our sample. We can also predict out-of-sample!
To illustrate this, we’ll use a data frame that has new X values (conscientiousness scores). Let’s pretend that we collected this data from a completely new sample of people.
# read in new data
consc_data_new <- import("https://raw.githubusercontent.com/uopsych/psy612/master/labs/lab-3/data/consc_new.csv")
# let's view it
consc_data_new
Now we can predict Y values for these out-of-sample X values based on the coefficients from our regression model. We just have to tell the predict() function what our newdata is.
predict(model, newdata = consc_data_new)
## 1 2 3 4 5 6 7 8
## 3.677484 3.616375 3.226025 2.650264 2.713850 2.727757 2.515764 3.205323
## 9 10 11 12 13 14 15 16
## 3.289070 2.415204 3.323391 3.443534 3.226104 2.092627 3.096743 3.638404
## 17 18 19 20
## 2.282975 2.867180 2.216868 3.142791
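If we also want prediction intervals for these new observations, we can combine the newdata and interval arguments (a sketch):
predict(model, newdata = consc_data_new, interval = "prediction")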
Another way to plot the regression line with its confidence band is sjPlot::plot_model().
plot_model(model = model, # name of model object
           type = "pred", # show predicted values (i.e. regression line)
           show.data = TRUE, # include data points on plot
           jitter = TRUE) # add a small amount of random variation to the points to prevent overlap
## $consc
We can make a similar plot with ggpubr::ggscatter().
ggscatter(data = health, # name of data.frame
          x = "consc", # IV (must be quoted)
          y = "sr_health", # DV (must be quoted)
          add = "reg.line", # add regression line
          xlab = "Conscientiousness", # x-axis label
          ylab = "Self-rated health", # y-axis label
          conf.int = TRUE, # show 95% confidence band around regression line
          cor.coef = TRUE) # display correlation coefficient and p-value
Now let’s see how regression coefficients can be estimated using matrix algebra. Recall our regression equation:
\[\mathbf{y} = b_0 + b_1\mathbf{x} + e\]
Recall that y is a vector of values, which can be represented as an \(n\times1\) matrix, \(\mathbf{Y}\). Similarly, x can be represented as an \(n\times1\) matrix, \(\mathbf{X}\).
If we augment the matrix \(\mathbf{X}\) to be an \(n\times2\) matrix, in which the first column is filled with 1’s, we can simplify our regression equation (one property of the residuals is that the average residual is 0, so we can drop e from the equation as well):
\[\mathbf{Y} = \mathbf{XB}\]
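As a brief sketch of where the estimate of \(\mathbf{B}\) used below comes from: premultiplying both sides by \(\mathbf{X'}\) and then by \((\mathbf{X'X})^{-1}\) isolates \(\mathbf{B}\):
\[\mathbf{X'Y} = \mathbf{X'XB} \implies \mathbf{B} = (\mathbf{X'X})^{-1}\mathbf{X'Y}\]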
Let’s illustrate how this works by running our regression analysis (self-rated health ~ conscientiousness) using matrix algebra.
Create the X matrix
x_mat <- health %>% # start with the original data frame
  mutate(ones = rep(1, nrow(.))) %>% # create a column of 1's to represent the intercept
  select(ones, consc) %>% # select only the column of 1's and X variable
  as.matrix() %>% # coerce to a matrix
  unname() # get rid of dimnames
print(x_mat)
## [,1] [,2]
## [1,] 1 2.6596884
## [2,] 1 5.0220685
## [3,] 1 3.8468867
## [4,] 1 1.7344974
## [5,] 1 3.4529606
## [6,] 1 4.0059852
## [7,] 1 2.8172358
## [8,] 1 3.9026234
## [9,] 1 3.1025556
## [10,] 1 3.0390568
## [11,] 1 2.2469903
## [12,] 1 3.5609621
## [13,] 1 2.7122125
## [14,] 1 2.7709048
## [15,] 1 3.1804746
## [16,] 1 3.0145390
## [17,] 1 2.8546636
## [18,] 1 2.7509337
## [19,] 1 2.7698433
## [20,] 1 3.3454849
## [21,] 1 1.6354199
## [22,] 1 2.4128273
## [23,] 1 2.9066946
## [24,] 1 1.2968417
## [25,] 1 2.9491817
## [26,] 1 3.0117720
## [27,] 1 2.7174300
## [28,] 1 3.4605698
## [29,] 1 2.4993243
## [30,] 1 4.6815225
## [31,] 1 2.5166209
## [32,] 1 3.3662098
## [33,] 1 2.2200715
## [34,] 1 3.8718843
## [35,] 1 4.0500753
## [36,] 1 1.6679664
## [37,] 1 2.2871418
## [38,] 1 1.7607182
## [39,] 1 2.6070213
## [40,] 1 1.7687026
## [41,] 1 1.6527038
## [42,] 1 3.6417315
## [43,] 1 4.9391295
## [44,] 1 1.4519042
## [45,] 1 4.5386858
## [46,] 1 1.2227879
## [47,] 1 3.2276325
## [48,] 1 1.9419757
## [49,] 1 1.3299870
## [50,] 1 3.2073976
## [51,] 1 2.3207269
## [52,] 1 1.6711524
## [53,] 1 3.6753437
## [54,] 1 3.4790127
## [55,] 1 3.8326522
## [56,] 1 0.5734602
## [57,] 1 2.1245120
## [58,] 1 4.2415805
## [59,] 1 3.0027591
## [60,] 1 2.2887515
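As an aside, R can build this design matrix for us with model.matrix(), which adds the column of 1's automatically (just a sketch of an equivalent shortcut):
head(model.matrix(~ consc, data = health)) # intercept column plus consc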
Create the Y matrix
y_mat <- health %>% # start with the original data frame
  select(sr_health) %>% # select just the Y variable
  as.matrix() %>% # coerce to a matrix
  unname() # get rid of dimnames
print(y_mat)
## [,1]
## [1,] 1.909508
## [2,] 4.493539
## [3,] 3.403894
## [4,] 2.045359
## [5,] 4.210038
## [6,] 5.111872
## [7,] 3.089556
## [8,] 4.575580
## [9,] 3.902935
## [10,] 2.637214
## [11,] 3.600213
## [12,] 2.941156
## [13,] 2.630240
## [14,] 3.696171
## [15,] 5.232125
## [16,] 3.083821
## [17,] 2.415286
## [18,] 4.424573
## [19,] 3.652712
## [20,] 2.914556
## [21,] 2.274012
## [22,] 2.275091
## [23,] 2.502707
## [24,] 1.651588
## [25,] 1.633890
## [26,] 2.171920
## [27,] 3.128065
## [28,] 3.782407
## [29,] 2.963693
## [30,] 3.728316
## [31,] 1.550077
## [32,] 4.927635
## [33,] 3.764443
## [34,] 4.631044
## [35,] 3.363912
## [36,] 2.860350
## [37,] 1.787703
## [38,] 1.575609
## [39,] 2.998598
## [40,] 4.292776
## [41,] 3.639892
## [42,] 2.314446
## [43,] 3.126662
## [44,] 1.658845
## [45,] 2.337766
## [46,] 1.944531
## [47,] 3.615128
## [48,] 1.902425
## [49,] 2.946371
## [50,] 2.833848
## [51,] 3.990127
## [52,] 1.139102
## [53,] 3.384039
## [54,] 3.460063
## [55,] 3.267850
## [56,] 2.623867
## [57,] 1.982783
## [58,] 3.415394
## [59,] 4.124655
## [60,] 1.658569
Solve for the B matrix
\[\mathbf{B} = (\mathbf{X'X})^{-1} \mathbf{X'Y}\]
solve(t(x_mat) %*% x_mat) %*% (t(x_mat) %*% y_mat)
## [,1]
## [1,] 1.6569733
## [2,] 0.4904059
Note that the b’s we just solved for using matrix algebra match the coefficients we get from lm()!
model$coefficients
## (Intercept) consc
## 1.6569733 0.4904059
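An equivalent way to write the same calculation uses crossprod() and lets solve() handle the linear system directly (just a sketch of the same math, slightly more numerically stable):
solve(crossprod(x_mat), crossprod(x_mat, y_mat)) # solves (X'X) B = X'Y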
We just saw that regression works “under the hood” by solving a matrix algebra equation to get the intercept and slope of the regression line. As Sara mentioned in class, this matrix algebra equation works for any type of data we have. This should clue us into the idea that there is some fundamental process going on behind the scenes of all of our models…
The general linear model (GLM) is a family of models that assume the relationship between your DV and IV(s) is linear and additive, and that your outcome is normally distributed. In its simplest form, we can think of the general linear model as:
\[Data = Model + Error \]
This provides a unifying framework for all of the statistical tests we have learned (or at least touched on) so far: t-tests, correlations, ANOVA, and linear regression. All of these tests are really doing the same math at the end of the day.
To illustrate this, we’ll turn back to an example from 611. The data are from Fox and Guyer’s (1978) anonymity and cooperation study. The data are included in the {carData} package, and you can see information about the data set using ?Guyer. Twenty groups of four participants each played 30 trials of the prisoner’s dilemma game. The number of cooperative choices (cooperation) made by each group was scored out of 120 (i.e., cooperative choices made by 4 participants over 30 trials). The groups either made decisions publicly or privately (condition).
You can load in the data with the following code:
guyer <- carData::Guyer
head(guyer)
str(guyer)
## 'data.frame': 20 obs. of 3 variables:
## $ cooperation: num 49 64 37 52 68 54 61 79 64 29 ...
## $ condition : Factor w/ 2 levels "anonymous","public": 2 2 2 2 2 2 2 2 2 2 ...
## $ sex : Factor w/ 2 levels "female","male": 2 2 2 2 2 1 1 1 1 1 ...
First, we need to recode the condition variable to 0's and 1's. We will talk more about dummy coding in a future lab.
guyer <- guyer %>%
  mutate(condition = case_when(condition == "public" ~ 0,
                               condition == "anonymous" ~ 1))
guyer
First, let’s compare the two conditions with an independent samples t-test.
t_test <- t.test(formula = cooperation ~ condition, data = guyer, var.equal = TRUE)
t_test
##
## Two Sample t-test
##
## data: cooperation by condition
## t = 2.6615, df = 18, p-value = 0.0159
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 3.117245 26.482755
## sample estimates:
## mean in group 0 mean in group 1
## 55.7 40.9
Question: What is the 95% CI around?
Next, let’s run the same comparison as a correlation between cooperation and the dummy-coded condition variable.
cor_test <- cor.test(formula = ~ cooperation + condition, data = guyer) # note the one-sided formula
cor_test
##
## Pearson's product-moment correlation
##
## data: cooperation and condition
## t = -2.6615, df = 18, p-value = 0.0159
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.7885070 -0.1162225
## sample estimates:
## cor
## -0.5314123
Question: What is the 95% CI around this time?
Now the same comparison as a one-way ANOVA.
anova_test <- aov(formula = cooperation ~ condition, data = guyer)
summary(anova_test)
## Df Sum Sq Mean Sq F value Pr(>F)
## condition 1 1095 1095.2 7.084 0.0159 *
## Residuals 18 2783 154.6
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Question: What is the relationship between the F statistic from this ANOVA and the t statistic from our previous two tests?
And finally, as a linear regression.
regression <- lm(formula = cooperation ~ condition, data = guyer)
summary(regression)
##
## Call:
## lm(formula = cooperation ~ condition, data = guyer)
##
## Residuals:
## Min 1Q Median 3Q Max
## -26.70 -6.75 -0.40 8.30 23.30
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 55.700 3.932 14.166 0.0000000000335 ***
## condition -14.800 5.561 -2.661 0.0159 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 12.43 on 18 degrees of freedom
## Multiple R-squared: 0.2824, Adjusted R-squared: 0.2425
## F-statistic: 7.084 on 1 and 18 DF, p-value: 0.0159
Question: What does the intercept represent? The slope?
Continuing with the guyer dataset example, use matrix algebra to manually calculate the intercept and slope for the regression equation (cooperation ~ condition).
Hint: To find the inverse of a matrix, use the solve()
function. Refer back to the lab on matrix algebra from 611 for a refresher on other matrix operations if needed.
# your code here
# your code here
# your code here
# your code here
# your code here
Check your work using the lm() function in R.
# your code here
Earlier we used confint() to calculate the 95% CI for our regression coefficients. Your job is to start with the model output, stored as a list object (see below), and extract the relevant pieces of information to calculate the 95% CI around the intercept and slope.
model <- lm(sr_health ~ consc, data = health)
# your code here
Check your answer against the results of confint().
# your code here
# your code here
3a. Add a line whose intercept is the lower bound of the 95% CI for the model intercept and whose slope is the upper bound of the 95% CI for the model slope. Hint: see ?geom_abline()
# your code here
3b. Now add another line whose intercept is the upper bound of the 95% CI for the model intercept and whose slope is the lower bound of the 95% CI for the model slope.
# your code here
3d. What does this show you about the relationship between the 95% CI’s around the model coefficients and the 95% CI band around the regression line?