Today we will learn

  1. Interaction Effects with two Continuous Variables

  2. Prediction So Far

  3. Meet the apply function

  4. Simulation

    • Expected Values
    • First Differences
  5. Predicted Values

In other words, the goals are to:

  • use simulation to make statements about uncertainty of predicted/expected values
  • calculate quantities of interest

This is a very important lab session. You will benefit greatly from working through it step by step at home and from re-reading the King, Tomz, and Wittenberg (2000) article.

But first, we take another look at interaction effects.


1 Understanding Marginal Effects for Categorical and Continuous Moderators

So far, we have only looked at interactions between a continuous predictor and a binary, categorical moderating variable. We can, however, easily generalize this idea to continuous-by-continuous interactions.

The data frame ia_data contains four variables:

  • an outcome variable y

  • a continuous predictor x

  • two versions of a moderating variable:

    • the continuous z_con
    • the binary/categorical z_cat

Below, we run two models of the form \(\hat{y} = \beta_1 + \beta_2 x + \beta_3 z + \beta_4 x z\), using the continuous and categorical moderators, respectively.

## Load data
load("raw-data/ia_data.RData")

## Model with continuous moderator
mod_con <- lm(y ~ x * z_con, data = ia_data)
summary(mod_con)
## 
## Call:
## lm(formula = y ~ x * z_con, data = ia_data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -38.954  -7.148   0.085   7.086  35.538 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   2.8324     0.8355   3.390 0.000713 ***
## x             2.6908     0.3768   7.141 1.29e-12 ***
## z_con         0.8038     0.3252   2.472 0.013534 *  
## x:z_con       1.0975     0.1469   7.473 1.16e-13 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 10.29 on 1996 degrees of freedom
## Multiple R-squared:  0.3169, Adjusted R-squared:  0.3158 
## F-statistic: 308.6 on 3 and 1996 DF,  p-value: < 2.2e-16
## Model with categorical moderator
mod_cat <- lm(y ~ x * z_cat, data = ia_data)
summary(mod_cat)
## 
## Call:
## lm(formula = y ~ x * z_cat, data = ia_data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -42.984  -7.207   0.014   7.344  39.591 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   3.1945     0.7625   4.189 2.92e-05 ***
## x             3.6484     0.3423  10.658  < 2e-16 ***
## z_cat         2.3655     1.0494   2.254   0.0243 *  
## x:z_cat       2.5030     0.4725   5.298 1.30e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 10.73 on 1996 degrees of freedom
## Multiple R-squared:  0.2578, Adjusted R-squared:  0.2567 
## F-statistic: 231.1 on 3 and 1996 DF,  p-value: < 2.2e-16

1.1 Calculate linear predictions

Next, we want to calculate the linear prediction, \(\hat{y}\), as a function of \(x\) at different values of the moderating variable \(z\). (Please note that this does not mean that \(z\) is a mediator in a causal graph. Here, the term moderating variable simply means that the effect of \(x\) on \(y\) depends on \(z\).) When using z_cat, this is easy. As before, we calculate values for two lines: one for z_cat = 0 and one for z_cat = 1. With the continuous moderator z_con, things are a bit more intricate. We need to select a range of values of z_con at which we want to calculate the relationship between \(y\) and \(x\). Here, we use the following values:

  • minimum
  • \(1^{st}\) percentile
  • \(1^{st}\) decile (\(10^{th}\) percentile)
  • \(1^{st}\) quartile (\(25^{th}\) percentile)
  • the median (\(50^{th}\) percentile)
  • \(3^{rd}\) quartile (\(75^{th}\) percentile)
  • \(9^{th}\) decile (\(90^{th}\) percentile)
  • \(99^{th}\) percentile
  • maximum
## function for the linear prediction
pred_lm <- function(b, x, z) {
  # b - coefficients vector; x - covariate 1; z - covariate 2
  # empty matrix for predictions for every combination of x & z
  preds <- matrix(NA, length(x), length(z)) 
  for (i in seq_len(length(z))) {
    # calculate the predictions given x for each value number i in vector z
    # store predictions in column number i 
    preds[, i] <- b[1] + b[2] * x + b[3] * z[i] + b[4] * x * z[i]
  }
  return(preds)
}
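As an aside, the same predictions can be computed without an explicit loop. A minimal sketch using outer(), which evaluates the prediction for every combination of x and z at once (not part of the lab code, just an alternative):

```r
## loop-free alternative to pred_lm(): outer() expands x and z to all
## combinations and evaluates the prediction equation elementwise
pred_lm_outer <- function(b, x, z) {
  outer(x, z, function(x, z) b[1] + b[2] * x + b[3] * z + b[4] * x * z)
}

## should match pred_lm() up to floating-point noise, e.g.
## all.equal(pred_lm_outer(b_con, x_vals, z_vals_con),
##           pred_lm(b_con, x_vals, z_vals_con))
```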

## use the function for the continuous model
b_con <- coef(mod_con) # extract coefficients from lm output 
x_vals <- seq(min(ia_data$x), max(ia_data$x), length.out = 101) # set a sequence 
z_vals_con <- quantile(ia_data$z_con, # select the percentiles in z 
                       c(0, .01, .1, .25, .5, .75, .9, .99, 1))
preds_con <- pred_lm(b_con, x_vals, z_vals_con) # apply the function

## for the categorical model
b_cat <- coef(mod_cat) # extract coefficients from lm output
# z_cat can take only two values, 0 and 1, which are its min and max
z_vals_cat <- c(min(ia_data$z_cat), max(ia_data$z_cat)) 
preds_cat <- pred_lm(b_cat, x_vals, z_vals_cat) # apply the function

1.2 Calculate marginal effects

Next, we want to calculate the marginal effect of each of the predicted lines. As we know from the lecture (Week 06, slide 27), if we have a regression equation of the form \(y = \beta_0 + \beta_1 x + \beta_2 z + \beta_3 x z + \epsilon\), then the marginal effect of \(x\) can be obtained by taking the partial derivative with respect to \(x\):

\[ \underbrace{\frac{\partial y}{\partial x}}_{\text{Notation for partial derivative}} = \underbrace{\beta_1 + \beta_3 z}_{\text{Marginal effect of x}} \]

## function for the marginal effect
mfx_lm <- function (b, z) {
  # this assumes that main term for x is coefficient 2 
  # and interaction term is in position 4
  b[2] + b[4] * z
}
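Because mfx_lm() relies on coefficient positions, a name-based version is safer if the model formula ever changes. A sketch; the interaction name (e.g. "x:z_con" or "x:z_cat") must match the fitted model exactly:

```r
## look up coefficients by name instead of by position
mfx_lm_named <- function(b, x_name, ia_name, z) {
  b[x_name] + b[ia_name] * z
}

## e.g. for the continuous model:
## mfx_lm_named(b_con, "x", "x:z_con", z_vals_con)
```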

## use the function for the continuous model
mfx_con <- mfx_lm(b_con, z_vals_con)

## use the function for the categorical model
mfx_cat <- mfx_lm(b_cat, z_vals_cat)

1.3 Calculate standard errors of estimated marginal effects

Lastly, we also want to calculate the standard errors for our estimated marginal effects of \(x\). Per the lecture slides (Week 06, slide 27), we know the formula is:

\[ \hat{\sigma}_{\frac{\partial y}{\partial x}} = \sqrt{\text{Var}(\hat{\beta_1}) + z^2 \text{Var}(\hat{\beta_3}) + 2 z \text{Cov}(\hat{\beta_1}, \hat{\beta_3})} \]

We can obtain both the variances and covariances of our regression coefficients from the variance-covariance matrix: a symmetric matrix that summarizes the uncertainty in (and the dependence among) our regression coefficients.

This is what a variance-covariance matrix looks like if there are four coefficients in the model (like in our case):

\[ \text{Var} \begin{pmatrix} \hat\beta_0 \\ \hat\beta_1 \\ \hat\beta_2 \\ \hat\beta_3 \end{pmatrix} = \begin{pmatrix} \text{Var}(\hat\beta_0) & \text{Cov}(\hat\beta_0,\hat\beta_1) & \text{Cov}(\hat\beta_0,\hat\beta_2) & \text{Cov}(\hat\beta_0,\hat\beta_3) \\ \text{Cov}(\hat\beta_1,\hat\beta_0) & \text{Var}(\hat\beta_1) & \text{Cov}(\hat\beta_1,\hat\beta_2) & \text{Cov}(\hat\beta_1,\hat\beta_3) \\ \text{Cov}(\hat\beta_2,\hat\beta_0) & \text{Cov}(\hat\beta_2,\hat\beta_1) & \text{Var}(\hat\beta_2) & \text{Cov}(\hat\beta_2,\hat\beta_3) \\ \text{Cov}(\hat\beta_3,\hat\beta_0) & \text{Cov}(\hat\beta_3,\hat\beta_1) & \text{Cov}(\hat\beta_3,\hat\beta_2) & \text{Var}(\hat\beta_3) \end{pmatrix} \]
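For reference, the entries of this matrix can be accessed by coefficient name as well as by position; e.g. for the continuous model (where, in the notation above, \(\beta_1\) belongs to x and \(\beta_3\) to the interaction):

```r
V <- vcov(mod_con)
V["x", "x"]        # Var(beta_1): variance of the x coefficient
V["x", "x:z_con"]  # Cov(beta_1, beta_3)
```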

## function for the standard error of the marginal effect
se_mfx <- function(vcov, z) {
  sqrt(vcov[2, 2] + z^2 * vcov[4, 4] + 2 * z * vcov[4, 2]) 
}

## use the function for the continuous model
vcov_con <- vcov(mod_con) # get the variance-covariance matrix 
se_mfx_con <- se_mfx(vcov_con, z_vals_con)

## use the function for the categorical model
vcov_cat <- vcov(mod_cat) # get the variance-covariance matrix 
se_mfx_cat <- se_mfx(vcov_cat, z_vals_cat)
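A quick plausibility check, assuming the coefficient ordering above: at \(z = 0\) the marginal effect of \(x\) reduces to \(\beta_1\), so its standard error must equal the standard error of the x coefficient.

```r
## at z = 0, the SE of the marginal effect is just sqrt(Var(beta_1))
se_mfx(vcov_con, 0)  # SE of the marginal effect at z = 0
sqrt(vcov_con[2, 2]) # SE of the x coefficient from the vcov matrix
## both should equal the Std. Error of x in summary(mod_con), 0.3768
```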

1.4 Plot the results

We have stored objects for:

  • the linear predictions of \(y\) as a function of \(x\) at the specified values of \(z\).
  • the marginal effects \(\frac{\partial y}{\partial x}\) at the specified values of \(z\).
  • and the corresponding standard errors \(\hat{\sigma}_{\frac{\partial y}{\partial x}}\).

We can now plot the linear predictions and the marginal effects (with confidence intervals) for both scenarios of categorical and continuous moderation side-by-side:

## the plots below use viridis() for colors and TeX() for math labels
library(viridis)
library(latex2exp)

## set up a 2-by-2 plot (and adjust plot margins)
par(
  mfrow = c(2, 2)
  # mar = c(5.1, 6.1, 4.1, 2.1)
)

## Plot 1: Linear Prediction (Categorical)
col_vec <- viridis(length(se_mfx_cat))
plot(
  x = ia_data$x,
  y = ia_data$y,
  pch = 16,
  xlab = "x",
  ylab = "y",
  type = "n",
  bty = "n",
  las = 1,
  main = 'Linear Prediction (Categorical)'
)

text(2,73, TeX(r'($\widehat{y} = 3.19 + 3.65x + 2.37z + 2.5x \times z$)'), 
     xpd = TRUE
     )

for (i in seq_len(ncol(preds_cat))) {
  lines(
    x = x_vals,
    y = preds_cat[, i],
    lty = i,
    lwd = 3,
    col = col_vec[i]
  )
}

## Plot 2: Marginal Effect (Categorical)
plot(
  x = z_vals_cat,
  y = mfx_cat,
  pch = 16,
  xlab = "z",
  ylab = "", # added manually later
  type = "n",
  bty = "n",
  las = 1,
  main = "Marginal Effect (Categorical)",
  xlim = c(-1, 2),
  ylim = c(-2, 12),
  axes = FALSE
)
text(0.5, 13, TeX(r'($\widehat{y} = 3.19 + 3.65x + 2.37z + 2.5x \times z$)'), 
     xpd = TRUE
)
abline(h = 0, col = 'gray60', lwd = .5)
axis(1,
     at = z_vals_cat)
axis(2, las = 1)

# add the label for y-axis separately, horizontally 
text(bquote(frac(partialdiff ~ y, partialdiff ~ x)), xpd = TRUE, x = -1.9, y = 6)

for (i in seq_len(length(se_mfx_cat))) {
  # points
  points(
    x = z_vals_cat[i],
    y = mfx_cat[i],
    pch = 16,
    col = col_vec[i])
  
  # confidence intervals
  segments(
    x0 = z_vals_cat[i],
    y0 = mfx_cat[i] + qnorm(.025) * se_mfx_cat[i],
    x1 = z_vals_cat[i],
    y1 = mfx_cat[i] + qnorm(.975) * se_mfx_cat[i],
    col = col_vec[i],
    lwd = 3
  )
}

## Plot 3: Linear Prediction (Continuous)
col_vec <- viridis(length(se_mfx_con))
plot(
  x = ia_data$x,
  y = ia_data$y,
  pch = 16,
  xlab = "x",
  ylab = "y",
  type = "n",
  bty = "n",
  las = 1,
  main = "Linear Prediction (Continuous)"
)

text(2, 70, TeX(r'($\widehat{y} = 2.83 + 2.69x + 0.8z + 1.1x \times z$)'), 
     xpd = TRUE
)

for (i in seq_len(ncol(preds_con))) {
  lines(
    x = x_vals,
    y = preds_con[, i],
    lwd = 2,
    col = col_vec[i]
  )
}

## Plot 4: Marginal Effect (Continuous)
plot(
  x = z_vals_con,
  y = mfx_con,
  pch = 16,
  xlab = "z",
  ylab = "",
  type = "n",
  bty = "n",
  axes = FALSE,
  main = "Marginal Effect (Continuous)",
  xlim = c(-4, 8),
  ylim = c(-3, 13)
)

text(2, 14, TeX(r'($\widehat{y} = 2.83 + 2.69x + 0.8z + 1.1x \times z$)'), 
     xpd = TRUE
)

axis(1)
axis(2,
     las = 1,
     at = seq(-2,12, by = 2))

abline(h = 0, col = 'gray60', lwd = .5)

text(bquote(frac(partialdiff ~ y, partialdiff ~ x)), xpd = TRUE, x = -7, y = 6)

for (i in seq_len(length(se_mfx_con))) {
  points(
    x = z_vals_con[i],
    y = mfx_con[i],
    pch = 16,
    col = col_vec[i]
  )

  segments(
    z_vals_con[i],
    mfx_con[i] + qnorm(.025) * se_mfx_con[i],
    z_vals_con[i],
    mfx_con[i] + qnorm(.975) * se_mfx_con[i],
    col = col_vec[i],
    lwd = 3
  )
}

## Add contiguous lines for mfx and se's to the last plot
## Compute first...
z_vals_fine <-
  seq(min(ia_data$z_con), max(ia_data$z_con), length.out = 200)
mfx_fine <- mfx_lm(b_con, z_vals_fine)
se_mfx_fine <- se_mfx(vcov_con, z_vals_fine)


## ... then plot
lines(z_vals_fine, mfx_fine, col = adjustcolor("black", alpha = 0.5))
# add lower bound
lines(
  z_vals_fine,
  mfx_fine + qnorm(.025) * se_mfx_fine,
  lty = 2,
  col = adjustcolor("black", alpha = 0.5)
)
# add upper bound 
lines(
  z_vals_fine, # x-values 
  mfx_fine + qnorm(.975) * se_mfx_fine, # y-values 
  lty = 2, # dashed line (line type = 2)
  col = adjustcolor("black", alpha = 0.5)
)

Review: Prediction So Far

We use the German 2013 election data set again:

load(file = "raw-data/election2013_2.RData")

df <- as.data.frame(election2013_2)

Similarly to the homework, we regress leftvote on unemployment and east. Additionally, we include a multiplicative interaction term unemployment:east. (In R, unemployment * east expands to unemployment + east + unemployment:east, so both spellings fit the same model.)

reg <- lm(leftvote ~ unemployment + east + unemployment:east, data = df)
knitr::kable(broom::tidy(reg), digits = 3)
term               estimate  std.error  statistic  p.value
(Intercept)           2.261      0.308      7.338    0.000
unemployment          0.549      0.047     11.576    0.000
east                 16.102      1.438     11.197    0.000
unemployment:east    -0.137      0.144     -0.947    0.344

So now let’s review: how did we make predictions so far?

# 1. Get the coefficients.
intercept <- coef(reg)[1] # or tidy(reg)$estimate[1] or reg$coefficients[1]
slopes <- coef(reg)[2:4] # or tidy(reg)$estimate[2:4] or reg$coefficients[2:4]

# 2. Choose interesting covariates' values.
x_east0 <- 0
x_east1 <- 1
x_unemp <- seq(0, 20, 0.1) # some empirically meaningful values for unemployment

# 3.1. Write your predict function.
predict_mu <- function(intercept, slopes, x1, x2) {
  intercept + slopes[1] * x1 + slopes[2] * x2 + slopes[3] * x1 * x2
}

#3.2. Let the function do the work for you.
pred_leftvote_west <- predict_mu(intercept, slopes, x1 = x_unemp, x2 = x_east0)
pred_leftvote_east <- predict_mu(intercept, slopes, x1 = x_unemp, x2 = x_east1)
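We can cross-check our hand-rolled function against R's built-in predict(); the values should agree. A sketch for the West scenario, using the variable names from the regression:

```r
## built-in equivalent of predict_mu() for the west scenario
pred_west_builtin <- predict(
  reg,
  newdata = data.frame(unemployment = x_unemp, east = 0)
)

## should agree up to floating-point noise, e.g.
## all.equal(unname(pred_west_builtin), unname(pred_leftvote_west))
```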

Base R

# 4. Plot it.
# 4.1. Plot the observations.
plot(
  x = x_unemp,
  y = pred_leftvote_west,
  type = "n",
  bty = "n",
  las = 1,
  lwd = 2,
  ylim = c(0, 30),
  ylab = "Predicted Voteshare (Die Linke) in %",
  xlab = "Unemployment in %",
  main = "Left Voteshare and Unemployment"
)

points(df$unemployment,
       df$leftvote ,
       pch = 19,
       col = ifelse(df$east == 1, 
                    viridis(2, alpha = 0.3)[1], # color for east 
                    viridis(2, alpha = 0.3)[2]) # color for west 
       )

# 4.2. Add the lines on top.
lines(
  x = x_unemp,
  y = pred_leftvote_west,
  lwd = 2,
  col = viridis(2)[2]
)
lines(
  x = x_unemp,
  y = pred_leftvote_east,
  lwd = 2,
  col = viridis(2)[1]
)

# What is missing?

ggplot2

ggplot(
  data = df,
  aes(x = unemployment, y = leftvote, group = as.character(east))
) +
  geom_point(
    aes(color = east),
    alpha = 0.5 # constant transparency, set outside aes()
  ) +
  geom_line(mapping = aes(y = predict(reg), 
                          color = east)) +
  theme_classic() +
  labs(
    x = "Unemployment in %",
    y = "Predicted Voteshare (Die Linke) in %",
    color = "",
    title = "Left Voteshare and Unemployment"
  ) +
  theme(legend.position = "none")


2 Meet the apply function

Before we start with our simulations, meet the apply function.

Description: “Returns a vector or array or list of values obtained by applying a function to margins of an array or matrix.”

vars <- df[, 2:4]

head(vars) # preview the first rows; the full data frame has 299 districts
##   leftvote unemployment east
## 1      5.7          8.6    0
## 2      4.4          7.5    0
## 3      5.0          6.2    0
## 4      4.4          5.4    0
## 5      6.9          9.3    0
## 6      4.8          7.5    0
What are margins?

Use MARGIN = 1 for rows, MARGIN = 2 for columns, and c(1, 2) for both.

General usage: apply(X, MARGIN, FUN, ...)

  • X is the object
  • MARGIN is dimension number (1 = rows, 2 = columns)
  • FUN is the function to apply
  • ... are additional arguments to the functions
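To see the margins in action on something tiny, here is a toy 2-by-3 matrix (values of my own choosing, purely for illustration):

```r
m <- matrix(1:6, nrow = 2) # columns are c(1, 2), c(3, 4), c(5, 6)
apply(m, 1, sum)           # row sums: 9 12
apply(m, 2, sum)           # column sums: 3 7 11
```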

2.1 Basic Examples

Let’s start with column mean:

means <- apply(vars, 2, mean)
means
##     leftvote unemployment         east 
##     8.730870     6.768896     0.187291

Use apply to plot…

par(mfrow = c(1, 3))

## wrap the call in invisible() to draw the three histograms without
## printing apply's return value (a list of three "histogram" objects)
invisible(apply(vars, 2, hist, col = viridis(1), border = "white", las = 1))
# How about row sums?
sums <- apply(vars, 1, sum)
head(sums) # preview the first few row sums (one per district)
## [1] 14.3 11.9 11.2  9.8 16.2 12.3
# Combining functions in a function.
multi_fun <- function(x) {
  c(min = min(x),
    mean = mean(x),
    max = max(x))
}

# And then use the multi_fun with apply.
apply(vars, 2, multi_fun)
##      leftvote unemployment     east
## min   2.60000     2.000000 0.000000
## mean  8.73087     6.768896 0.187291
## max  34.60000    14.900000 1.000000

2.2 Working with the apply function

  1. We want to quickly calculate the 2.5%, 50% and 97.5% quantiles of the variables in the data set using the apply function. Just as with the plots above, we can specify additional arguments to the function call in the apply function directly. Here: probs = c(0.025, 0.5, 0.975).

  2. Now combine some functions to get the 2.5% and 97.5% quantiles as well as the mean. We call this function quants_mean_fun:

quants <- apply(vars, 2, quantile, probs = c(0.025, 0.5, 0.975))
quants
##       leftvote unemployment east
## 2.5%     3.000        2.800    0
## 50%      5.600        6.200    0
## 97.5%   25.155       12.355    1
# Now combine some functions to get the 2.5 % and 97.5 % quantiles and the mean
quants_mean_fun <-  function(x) {
  c(quants = quantile(x, probs = c(0.025, 0.975)),
    mean = mean(x))
}

quants_mean <- apply(vars, 2, quants_mean_fun)
quants_mean
##              leftvote unemployment     east
## quants.2.5%   3.00000     2.800000 0.000000
## quants.97.5% 25.15500    12.355000 1.000000
## mean          8.73087     6.768896 0.187291

Become familiar with apply and your R code will be short and fast!

3 Simulation of Expected Values \(E[Y|X]\)

Now we get started with simulations. I hope you all read the King, Tomz, and Wittenberg (2000) article. And you remember the five steps of simulation from the lecture.

Our first goal is to get so-called expected values, \(E[Y|X]\). Expected values are the average (expected) value of a variable \(Y\), conditional on a particular set of \(X\) values. For example, we could be interested in the expected vote share of the party Die Linke in a West German district with an unemployment rate of \(6.7\%\) (roughly the nationwide average; unemployment is measured in percentage points in our data). In mathematical terms, this is \(E[\text{Leftvote} \mid \text{West}, \text{Unempl.} = 6.7]\).

Let’s do this:

Step 1 - Get the regression coefficients

beta_hat <- coef(reg)
beta_hat
##       (Intercept)      unemployment              east unemployment:east 
##         2.2614353         0.5485919        16.1021799        -0.1365027

Step 2 - Generate sampling distribution

Step 2.1. Get the variance-covariance matrix

V_hat <- vcov(reg) 
V_hat
##                   (Intercept) unemployment        east unemployment:east
## (Intercept)        0.09496955 -0.013448060 -0.09496955       0.013448060
## unemployment      -0.01344806  0.002245965  0.01344806      -0.002245965
## east              -0.09496955  0.013448060  2.06804468      -0.201497864
## unemployment:east  0.01344806 -0.002245965 -0.20149786       0.020756756

# What are the diagonal elements?
sqrt(diag(V_hat))
##       (Intercept)      unemployment              east unemployment:east 
##        0.30817130        0.04739161        1.43806978        0.14407205

Step 2.2. Draw from the multivariate normal distribution

Similar to drawing from a univariate normal distribution with rnorm(n_draws, mu, sigma), we can take random draws from a multivariate normal distribution. We will use the MASS package for this purpose, in particular its function mvrnorm. The syntax is the same as for rnorm, except that instead of single values for the mean and variance we pass a vector of means and a variance-covariance matrix.

library(MASS) # We need the MASS package for taking random draws from MVN

# Set the number of draws/simulations
nsim <- 1000 

# Draw from the multivariate normal distribution to get S, sampling distribution
S <- mvrnorm(nsim, beta_hat, V_hat)

dim(S) # check dimensions
## [1] 1000    4
apply(S, 2, mean) # check the means
##       (Intercept)      unemployment              east unemployment:east 
##         2.2737164         0.5477618        16.0844551        -0.1341239

# We now can use S to get both expected and predicted values.
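A useful sanity check, sketched here with the objects S and V_hat from above: with enough draws, the empirical covariance of the simulated coefficients should be close to the estimated variance-covariance matrix.

```r
# Differences should be small and shrink as nsim grows
round(cov(S) - V_hat, 3)
```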

Step 3 - Choose interesting covariate values (aka set a scenario)

Next, we choose the values of the explanatory variables for our simulation, i.e. we set a scenario. Any combination of values is possible, but it makes sense to stick to values that are plausible for the population and the sample.

Tip: check the ordering of coefficients first, to put the values in the correct order.

names(beta_hat)
## [1] "(Intercept)"       "unemployment"      "east"             
## [4] "unemployment:east"
X_east <- c(1, mean(df$unemployment), 1, mean(df$unemployment) * 1) # East
X_east
## [1] 1.000000 6.768896 1.000000 6.768896
X_west <- c(1, mean(df$unemployment), 0, mean(df$unemployment) * 0) # West
X_west
## [1] 1.000000 6.768896 0.000000 0.000000
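As a quick check that the scenario vectors are ordered correctly, multiplying them with the point estimates should reproduce the model's predictions (a sketch using the objects defined above):

```r
# scenario %*% beta_hat = point estimate of E[Y|X] for that scenario
as.numeric(X_west %*% beta_hat)
as.numeric(X_east %*% beta_hat)
```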

Step 4 - Calculate Quantities of Interest

This is what we are doing at this step:

\[ \underbrace{\left[\begin{array}{rrrr} \tilde{\beta}_0^1 & \tilde{\beta}_1^1 & \tilde{\beta}_2^1 & \tilde{\beta}_3^1 \\\tilde{\beta}_0^2 &\tilde{\beta}_1^2 &\tilde{\beta}_2^2 &\tilde{\beta}_3^2 \\ \dots & \dots & \dots & \dots \\ \tilde{\beta}_0^{1000} &\tilde{\beta}_1^{1000} &\tilde{\beta}_2^{1000} &\tilde{\beta}_3^{1000} \\ \end {array} \right]}_{\qquad \text{draws from simulated}\\\text{sampling distribution } S, \; 1000\times4} \times \overbrace{\underbrace{[1, x_1, x_2, x_3]}_{\text{scenario as a row } 1\times4}'}^{\text{scenario as a column }4\times1} = \underbrace{\left[ \begin{array}{c} \tilde{\beta}_0^1 \times 1+\tilde{\beta}_1^1 \times x_1 +\tilde{\beta}_2^1 \times x_2 + \tilde{\beta}_3^1 \times x_3 = \tilde{y}^1\\ \tilde{\beta}_0^2 \times 1+\tilde{\beta}_1^2 \times x_1 +\tilde{\beta}_2^2 \times x_2 + \tilde{\beta}_3^2 \times x_3 = \tilde{y}^2\\ \dots \\ \tilde{\beta}_0^{1000} \times 1+\tilde{\beta}_1^{1000} \times x_1 +\tilde{\beta}_2^{1000} \times x_2 + \tilde{\beta}_3^{1000} \times x_3 = \tilde{y}^{1000}\\ \end{array} \right]}_{\text{Expected Values } 1000 \times 1} \]

Expected Values \(E[Y|X]\)

# %*% is the operator for matrix multiplication
EV_east <- S %*% as.matrix(X_east)  
dim(EV_east)
## [1] 1000    1

EV_west <- S %*% as.matrix(X_west)
dim(EV_west)
## [1] 1000    1

# Quicker: put the scenarios as rows of a matrix.
X <- rbind(X_east, X_west) # rbind creates a 2 x 4 matrix
EV_combined <- S %*% t(X)  # 1000 x 2 matrix: one column per scenario

# Equivalently, bind the scenarios as columns and skip the transpose:
# X <- cbind(X_east, X_west)  # cbind() is the same as t() of rbind()
# all(t(rbind(X_east, X_west)) == cbind(X_east, X_west))
# EV_combined <- S %*% X

First Differences

A first difference is the difference between two expected values:

\[ \underbrace{FD}_{\text{First Difference}} = \underbrace{E[Y | X_{1}]}_{\text{Expected Value}\\\text{of first scenario}} - \underbrace{E[Y | X_{2}]}_{\text{Expected Value}\\\text{of second scenario}} \]

\[ \underbrace{FDs}_{\text{First Differences}\\1000\times1} = \underbrace{E[Y | X_{1}]}_{\text{Expected Values}\\\text{of first scenario}\\1000\times1} - \underbrace{E[Y | X_{2}]}_{\text{Expected Values}\\\text{of second scenario}\\1000\times1} = \underbrace{\left[ \begin{array}{c} \tilde{y}^{1,1}\\ \tilde{y}^{2,1}\\ \dots \\ \tilde{y}^{1000,1}\\ \end{array} \right]}_{\text{Expected Values}\\\text{Scenario 1 }} - \underbrace{\left[ \begin{array}{c} \tilde{y}^{1,2}\\ \tilde{y}^{2,2}\\ \dots \\ \tilde{y}^{1000,2}\\ \end{array} \right]}_{\text{Expected Values}\\\text{Scenario 2 }} = \underbrace{\left[ \begin{array}{c} \tilde{y}^{1,1} - \tilde{y}^{1,2}\\ \tilde{y}^{2,1} - \tilde{y}^{2,2}\\ \dots \\ \tilde{y}^{1000,1} - \tilde{y}^{1000,2}\\ \end{array} \right]}_{\text{First Differences}} \]

# EV[East] - EV[West]
fd <- EV_combined[, 1] - EV_combined[, 2]
dim(as.matrix(fd))
## [1] 1000    1

Step 5 - Summarize Results

Plot the expected values for West.

Base R

hist(
  x = EV_combined[, 2],
  las = 1,
  breaks = 7,
  ylim = c(0, 400), 
  col = viridis(4)[1],
  border = "white",
  main = "Distribution of Simulated Vote Shares of Die Linke\nwith 95% Confidence Intervals",
  xlab = "Expected Values for the Vote Share of Die Linke for districts in the West
  (with unemployment at its national mean)"
)

# Get mean and quantiles. We use our quants_mean_fun from above.
quants_combined <- apply(EV_combined, 2, quants_mean_fun)

# Add the lines to the plot.
abline(v = c(quants_combined[, 2]),
       lty = 2,
       lwd = 2,
       col = viridis(4)[4])

# Of course we can do the same for east.
hist(
  EV_combined[, 1],
  main = "Distribution of Simulated Vote Shares of Die Linke\nwith 95% Confidence Intervals",
  las = 1,
  breaks = 7,
  ylim = c(0, 400), 
  col = viridis(4)[2],
  border = "white",
  xlab = "Expected Values for the Vote Share of Die Linke for districts in the East
     (with unemployment at its national mean)"
)
abline(v = c(quants_combined[, 1]),
       lty = 2,
       col = viridis(4)[4])

Similarly, we can plot the distribution of the first differences:

hist(
  fd,
  main = "",
  las = 1,
  col = viridis(4)[3],
  border = "white",
  xlab = "First Differences for the Vote Share of Die Linke between East and West
  (with unemployment at its national mean)"
)

# Get mean & quantiles
quants_fd <- apply(as.matrix(fd), 2, quants_mean_fun)

# Add the lines to the plot
abline(v = quants_fd, lty = 2, col = viridis(4)[4])

ggplot2

# Expected Values, West
ggplot() +
  geom_histogram(aes(x = EV_combined[, 2]),
    boundary = 5.4,
    binwidth = 0.1,
    color = "white",
    fill = viridis(4)[1]
  ) +
  labs(
    x = "Expected Values for the Vote Share of Die Linke for districts in the West
     (with unemployment at its national mean)",
    y = "Frequency",
    title = "Distribution of Simulated Vote Shares of Die Linke with 95% Confidence Intervals"
  ) +
  geom_vline(xintercept = c(quants_combined[, 2]),
             color = viridis(4)[4],
             linetype = "dashed") +
  theme_classic()

# Expected values, East
ggplot() +
  geom_histogram(aes(x = EV_combined[, 1]),
    boundary = 19,
    binwidth = 0.5,
    color = "white",
    fill = viridis(4)[2]
  ) +
  labs(
    x = "Expected Values for the Vote Share of Die Linke for districts in the East
     (with unemployment at its national mean)",
    y = "Frequency",
    title = "Distribution of Simulated Vote Shares of Die Linke with 95% Confidence Intervals"
  ) +
  geom_vline(xintercept = c(quants_combined[, 1]),
             color = viridis(4)[4],
             linetype = "dashed") +
  theme_classic()

# First Difference
ggplot() +
  geom_histogram(aes(x = fd),
    boundary = 19,
    binwidth = 0.5,
    color = "white",
    fill = viridis(4)[3]
  ) +
  labs(
    x = "First Differences for the Vote Share of Die Linke between East and West
      (with unemployment at its national mean)",
    y = "Frequency",
    title = "Distribution of First Differences with 95% Confidence Intervals"
  ) +
  geom_vline(xintercept = c(quants_fd),
             color = viridis(4)[4],
             linetype = "dashed") +
  theme_classic() 

So far the scenarios were rather boring. Let’s do something more exciting! We calculate expected values over a range of unemployment.

We go back to Step 3.

Step 3 - Choose covariate values: Range

This time we choose a range of covariate values, i.e. we set the different scenarios.

unemp <- seq(0, 20, 0.1)  # Range from 0 to 20, in steps of 0.1
scenario_east <- cbind(1, unemp, 1, unemp * 1) # why cbind, not c?
scenario_west <- cbind(1, unemp, 0, unemp * 0)
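To answer the comment above: cbind builds a matrix with one row per scenario, whereas c() would flatten everything into one long vector. A quick check of the dimensions, using the objects just defined:

```r
dim(scenario_east)                  # 201 scenarios x 4 coefficients
length(c(1, unemp, 1, unemp * 1))   # c() would give a single 404-element vector
```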

Step 4 - Calculate Quantities of Interest: Range

EV_range_east <- S %*% t(scenario_east)
EV_range_west <- S %*% t(scenario_west)

fds <- EV_range_east - EV_range_west

head(EV_range_west[, 1:8])  # peek: first 6 draws, first 8 scenarios
##          [,1]     [,2]     [,3]     [,4]     [,5]     [,6]     [,7]     [,8]
## [1,] 2.495089 2.546285 2.597482 2.648679 2.699875 2.751072 2.802269 2.853465
## [2,] 2.324557 2.375929 2.427302 2.478675 2.530048 2.581420 2.632793 2.684166
## [3,] 1.583586 1.648166 1.712747 1.777327 1.841907 1.906488 1.971068 2.035649
## [4,] 2.490607 2.542191 2.593775 2.645360 2.696944 2.748528 2.800112 2.851697
## [5,] 1.234558 1.301016 1.367475 1.433934 1.500393 1.566852 1.633311 1.699770
## [6,] 2.074277 2.128909 2.183541 2.238173 2.292805 2.347437 2.402069 2.456701
# Quantiles: we again use apply and our quants_mean_fun
quants_range_east <- apply(EV_range_east, 2, quants_mean_fun)
quants_range_west <- apply(EV_range_west, 2, quants_mean_fun)
quants_fds <- apply(fds, 2, quants_mean_fun) # summaries of the first differences

Step 5 - Summarize Results: Range

Base R

plot(
  unemp,
  quants_range_east[2,],
  pch = 19,
  cex = 0.3,
  bty = "n",
  las = 1,
  ylim = c(0, 35),
  ylab = "Vote Share (%)",
  main = "Expected Vote Share (Die Linke)",
  xlab = "Unemployment Rate",
  type = "n"
)

# Let's add our actual observations.
points(df$unemployment,
       df$leftvote,
       pch = 19,
       col = viridis(3, alpha = 0.5)[1])

# Now we add the lines.
lines(unemp, quants_range_east[3,], col = viridis(3)[2])
lines(unemp, quants_range_west[3,], col = viridis(3)[3])

# Let's add those confidence intervals.
# First, for east:
lines(unemp, quants_range_east[1,], lty = "dashed", col = viridis(3)[2])
lines(unemp, quants_range_east[2,], lty = "dashed", col = viridis(3)[2])

# And for west:
lines(unemp, quants_range_west[1,], lty = "dashed", col = viridis(3)[3])
lines(unemp, quants_range_west[2,], lty = "dashed", col = viridis(3)[3])

# Add a legend
legend("topleft",
       lty = "solid",
       col = viridis(3)[2:3],
       legend = c("East", "West"),
       cex = 0.8,
       bty = "n")

There are many different ways to plot confidence intervals. Pick the style you like most. Another option is to use polygons to plot the confidence intervals:

plot(
  unemp,
  quants_range_east[2,],
  pch = 19,
  cex = 0.3,
  bty = "n",
  las = 1,
  ylim = c(0, 35),
  ylab = "Vote Share (%)",
  main = "Expected Vote Share (Die Linke)",
  xlab = "Unemployment Rate",
  type = "n"
)

points(df$unemployment,
       df$leftvote,
       pch = 19,
       col = viridis(3, alpha = 0.5)[1])

polygon(
  x = c(unemp, rev(unemp)),
  y = c(quants_range_east[1 ,], rev(quants_range_east[2,])),
  col = viridis(3, alpha = 0.5)[2],
  border = NA
)
polygon(
  x = c(unemp, rev(unemp)),
  y = c(quants_range_west[1 ,], rev(quants_range_west[2,])),
  col = viridis(3, alpha = 0.5)[3],
  border = NA
)

# Add the lines
lines(unemp, quants_range_east[3,], col = viridis(3)[2])  
lines(unemp, quants_range_west[3,], col = viridis(3)[3])

# Add a legend
legend("topleft",
       lty = "solid",
       col = viridis(3)[2:3],
       legend = c("East", "West"),
       cex = 0.8,
       bty = "n")

ggplot2

# data frame for EV east
plot_ev <- data.frame(t(cbind(quants_range_west, quants_range_east))) %>%
  janitor::clean_names()
plot_ev$unemployment <- rep(unemp, times = 2)
plot_ev$east <- rep(c("West", "East"), each = length(unemp))

# plot
ggplot(
  data = plot_ev,
  aes(
    x = unemployment, y = mean,
    group = east
  )
) +
  geom_line() +
  geom_point(
    data = df,
    aes(y = leftvote),
    color = viridis(3, alpha = 0.3)[2]
  ) +
  # add mean expected values
  geom_ribbon(aes(
    fill = east,
    color = east,
    y = mean,
    ymin = quants_2_5,
    ymax = quants_97_5
  ),
  stat = "identity"
  ) +
  theme_classic() +
  scale_color_viridis(end = 0.7, discrete = T) +
  scale_fill_viridis(end = 0.7, alpha = 0.5, discrete = T) +
  labs(
    x = "Unemployment in %",
    y = "Expected Vote Share (Die Linke) in %",
    color = "",
    fill = "",
    title = "Left Vote Share and Unemployment"
  ) +
  theme(legend.position = "top")

4 Exercise Session

Today’s exercise session is not about the generation of new code, but about better understanding what we did so far. Together with your neighbor, try to answer the following questions by going through and playing around with the code that we have written in the previous lines.

  1. What does mvrnorm do?
  2. What are the dimensions of S? Why?
  3. What do you need to specify in a scenario vector/interesting covariate values? (E.g. scenario_east)
  4. Is the order of the elements in the scenario vector important?
  5. What would we need to do to get predicted values instead of expected values?

5 Simulation of Predicted Values \(Y|X\)

Now we also want to simulate predicted values. What’s different from expected values?

  • Step 1 - Get the regression coefficients

    • Exactly the same as above
  • Step 2 - Generate sampling distribution

    • Exactly the same as above
  • Step 3 - Choose covariate values

    • Exactly the same as above
X_east <- c(1, mean(df$unemployment), 1, mean(df$unemployment) * 1) # East
X_west <- c(1, mean(df$unemployment), 0, mean(df$unemployment) * 0) # West

X <- cbind(X_east, X_west)

Step 4 - Calculate Quantities of Interest

This is still the same as above

EV_combined <- S %*% X

Now we need to add something. Remember sigma_est (i.e., \(\hat\sigma\)) from last lab/lecture? That’s the fundamental uncertainty!

# Y ~ N(mu_c, sigma_est)
sigma_est <- sqrt(sum(residuals(reg)^2) / (nrow(df) - length(beta_hat)))
# built-in version: sigma_est <- sigma(reg)

Y_hat_east <- EV_combined[,1] + rnorm(nsim, 0, sigma_est)
Y_hat_west <- EV_combined[,2] + rnorm(nsim, 0, sigma_est)

# Quantiles
quants_east <- quants_mean_fun(Y_hat_east)
quants_west <- quants_mean_fun(Y_hat_west)
quants_west
##  quants.2.5% quants.97.5%         mean 
##     2.145625     9.659661     5.988286
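First differences work for predicted values too; here is a sketch using Y_hat_east and Y_hat_west from above. Note that each vector carries its own independent noise draws, so this difference describes the gap between two randomly drawn districts rather than between two long-run averages:

```r
# First difference in predicted values, East minus West
fd_pv <- Y_hat_east - Y_hat_west
quants_mean_fun(fd_pv)
```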

Let’s plot it:

# Histogram Predicted Values West Germany
hist(Y_hat_west,
     las = 1,
     col = viridis(3)[1], 
     border = "white",
     main = "Histogram of Predicted Values (West Germany)",
     xlab = "Predicted Values for the Vote Share of Die Linke for districts in the West
  (with unemployment at its national mean)"
  )

abline(v = c(quants_west[1:3]), lty = 2, col = viridis(3)[3])

# Histogram Predicted Values East Germany
hist(Y_hat_east,
     las = 1,
     main = "Histogram of Predicted Values (East Germany)",
     col = viridis(3)[2], 
     border = "white",
     xlab = "Predicted Values for the Vote Share of Die Linke for districts in the East
  (with unemployment at its national mean)")
abline(v = c(quants_east[1:3]), lty = 2, col = viridis(3)[3])

# We could put both distributions in one plot.
plot(density(Y_hat_west), 
     xlim = c(0,40), 
     lwd = 2 ,
     bty = "n", 
     yaxt = "n", 
     ylab = "", 
     xlab = "Left Vote Share in %",
     main = "Predicted Values for Vote Share",
     type = "n")
lines(density(Y_hat_west, from = min(Y_hat_west), to = max(Y_hat_west)), 
      lwd = 2, col = viridis(3)[1])
lines(density(Y_hat_east, from = min(Y_hat_east), to = max(Y_hat_east)), 
      lwd = 2, col = viridis(3)[2])
abline(v = c(quants_west[1:3]), lty = 2, col = viridis(3)[3])
abline(v = c(quants_east[1:3]), lty = 2, col = viridis(3)[3])

Step 4 - Calculate Quantities of Interest: Range

EV_range_east <- S %*% t(scenario_east)
EV_range_west <- S %*% t(scenario_west)

# One independent error draw for every cell (nsim draws per scenario);
# recycling a single rnorm(nsim) vector would reuse the same errors
# across all scenarios.
Y_hat_range_east <- EV_range_east + rnorm(nsim * length(unemp), 0, sigma_est)
Y_hat_range_west <- EV_range_west + rnorm(nsim * length(unemp), 0, sigma_est)

# Quantiles, we again use apply and our quants_mean_fun
Y_quants_range_east <- apply(Y_hat_range_east, 2, quants_mean_fun)
Y_quants_range_west <- apply(Y_hat_range_west, 2, quants_mean_fun)

Step 5 - Summarize Results

Plot it with polygons as confidence intervals.

Base R

plot(
  unemp,
  Y_quants_range_east[2, ],
  las = 1,
  bty = "n",
  pch = 19,
  cex = 0.3,
  ylim = c(0, 45),
  ylab = "Vote Share (%)",
  main = "Predicted Vote Share (Die Linke)",
  xlab = "Range of Unemployment",
  type = "n"
)

points(
  df$unemployment,
  df$leftvote,
  pch = 19,
  col = adjustcolor(viridis(3)[1], alpha = 0.8)
)

polygon(
  x = c(unemp, rev(unemp)),
  y = c(Y_quants_range_east[1, ], rev(Y_quants_range_east[2, ])),
  col = adjustcolor(viridis(3)[2], alpha = 0.5),
  border = NA
)
polygon(
  x = c(unemp, rev(unemp)),
  y = c(Y_quants_range_west[1, ], rev(Y_quants_range_west[2, ])),
  col = adjustcolor(viridis(3)[3], alpha = 0.5),
  border = NA
)

lines(unemp, Y_quants_range_east[3, ], col = viridis(3)[2])
lines(unemp, Y_quants_range_west[3, ], col = viridis(3)[3])

# Add a legend
legend("topleft",
  lty = "solid",
  col = viridis(3)[2:3],
  legend = c("East", "West"),
  cex = 0.8,
  bty = "n"
)

ggplot2

plot_pv <- data.frame(t(cbind(Y_quants_range_west, Y_quants_range_east))) %>%
  janitor::clean_names()
plot_pv$unemployment <- rep(unemp, times = 2)
plot_pv$east <- rep(c("West", "East"), each = length(unemp))

# plot
ggplot(
  data = plot_pv,
  aes(
    x = unemployment, y = mean,
    group = east
  )
) +
  geom_line() +
  geom_point(
    data = df,
    aes(y = leftvote),
    color = viridis(3, alpha = 0.3)[2]
  ) +
  # add mean expected values
  geom_ribbon(aes(
    fill = east,
    color = east,
    y = mean,
    ymin = quants_2_5,
    ymax = quants_97_5
  ),
  stat = "identity"
  ) +
  theme_classic() +
  scale_color_viridis(end = 0.7, discrete = T) +
  scale_fill_viridis(end = 0.7, alpha = 0.5, discrete = T) +
  labs(
    x = "Unemployment in %",
    y = "Predicted Vote Share (Die Linke) in %",
    color = "",
    fill = "",
    title = "Left Vote Share and Unemployment"
  ) +
  theme(legend.position = "top")

Comparing Expected and Predicted Values

The confidence bounds are wider because we take the fundamental uncertainty of our model into account. To see this we can plot the expected values plot and predicted values plot side by side.

Base R
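A minimal sketch of such a side-by-side comparison, assuming EV_combined and Y_hat_west from above are still in the workspace; the key thing to notice is how much wider the predicted values spread over the shared x-axis:

```r
op <- par(mfrow = c(1, 2))  # two panels, side by side

hist(EV_combined[, 2],
     las = 1, border = "white", xlim = c(0, 12),
     main = "Expected Values (West)", xlab = "Vote Share (%)")

hist(Y_hat_west,
     las = 1, border = "white", xlim = c(0, 12),
     main = "Predicted Values (West)", xlab = "Vote Share (%)")

par(op)  # restore the previous plotting layout
```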

Expected values vary much less than predicted values. The reason is the added fundamental uncertainty: we cannot ignore \(\epsilon\) when predicting the outcome for an individual observation. When estimating expected values, i.e. long-run averages, the \(\epsilon\)'s average out.

Remember:

  • Averages involve questions about parameters: for average effects, calculate expected values
  • The fates of individuals require predictions: for statements about particular cases, calculate predicted values

ggplot2

Concluding Remarks

In your homework you will have a lot of fun with simulations.

References

King, Gary, Michael Tomz, and Jason Wittenberg. 2000. “Making the Most of Statistical Analyses: Improving Interpretation and Presentation.” American Journal of Political Science 44 (2): 347–61.