Course objectives for Bayesian Regression Modeling with Rstanarm


Review of frequentist regression using the kidiq data from the rstanarm package.

  • the outcome is the kid’s IQ score; the predictors are the mom’s IQ score, her age, and whether she completed high school.
kidiq %>% head(5)
##   kid_score mom_hs    mom_iq mom_age
## 1        65      1 121.11753      27
## 2        98      1  89.36188      25
## 3        85      1 115.44316      27
## 4        83      1  99.44964      25
## 5       115      1  92.74571      27

estimating with the lm() function

summary(lm(kid_score~ mom_iq, data=kidiq))
## 
## Call:
## lm(formula = kid_score ~ mom_iq, data = kidiq)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -56.753 -12.074   2.217  11.710  47.691 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 25.79978    5.91741    4.36 1.63e-05 ***
## mom_iq       0.60997    0.05852   10.42  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 18.27 on 432 degrees of freedom
## Multiple R-squared:  0.201,  Adjusted R-squared:  0.1991 
## F-statistic: 108.6 on 1 and 432 DF,  p-value: < 2.2e-16
tidy version of the coefficient estimates
tidy(lm(kid_score~ mom_iq, data=kidiq))
## # A tibble: 2 x 5
##   term        estimate std.error statistic  p.value
##   <chr>          <dbl>     <dbl>     <dbl>    <dbl>
## 1 (Intercept)   25.8      5.92        4.36 1.63e- 5
## 2 mom_iq         0.610    0.0585     10.4  7.66e-23
Note. The p-value tells us the probability of observing data that gives rise to a test statistic at least this large if the true value of the parameter is 0.
  • key problem with the frequentist approach: the p-value is P(data | parameter = 0), not P(parameter | data), and the two are not interchangeable.



Example to illustrate this issue: the probability of a woman having cancer given a positive mammogram, P(cancer | +test).

  • P(+test | cancer) = .90; women with breast cancer have a positive mammogram 90% of the time.
    • P(+test | no cancer) = .10; women without breast cancer have a positive mammogram 10% of the time.
  • Prevalence of breast cancer in women: P(cancer) = .004 (note. our prior)
    • probability of not having breast cancer is P(no cancer) = 1.0 − .004 = .996
  • P(+test) = (.9 × .004) + (.1 × .996) ≈ .10; the probability of a random woman getting a positive mammogram is about 10%
  • Given a positive mammogram, what are the chances that the same woman has breast cancer?
    • P(cancer | +test) = (.9 × .004) / .1 ≈ .036, a 3.6% chance of having breast cancer given a positive mammogram result, which is very different from P(+test | cancer) = .90.
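The same arithmetic can be checked in R; a minimal sketch using the probabilities assumed above:

# Bayes' theorem for P(cancer | +test), using the values from the example
p_pos_given_cancer    <- 0.90   # sensitivity: P(+test | cancer)
p_pos_given_no_cancer <- 0.10   # false positive rate: P(+test | no cancer)
p_cancer              <- 0.004  # prior: prevalence of breast cancer

# total probability of a positive mammogram
p_pos <- p_pos_given_cancer * p_cancer + p_pos_given_no_cancer * (1 - p_cancer)

# posterior probability of cancer given a positive test
p_pos_given_cancer * p_cancer / p_pos  # about .035 (.036 in the notes, which round P(+test) to .1)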


The following exercises use Spotify data, which is copied into the following code chunk and compiled into a data frame.

artist_name <- c("Beyoncé", "Beyoncé", "Beyoncé", "Beyoncé", "Beyoncé", "Beyoncé", "Beyoncé", 
                 "Beyoncé", "Beyoncé","Beyoncé", "Beyoncé", "Beyoncé", "Beyoncé", "Beyoncé", 
                 "Beyoncé", "Beyoncé", "Beyoncé", "Beyoncé","Beyoncé", "Beyoncé", "Beyoncé", 
                 "Beyoncé", "Beyoncé", "Beyoncé", "Beyoncé", "Beyoncé", "Beyoncé","Beyoncé", 
                 "Beyoncé", "Beyoncé", "Beyoncé", "Beyoncé", "Beyoncé", "Beyoncé", "Beyoncé", 
                 "Beyoncé","Taylor Swift", "Taylor Swift", "Taylor Swift", "Taylor Swift", 
                 "Taylor Swift", "Taylor Swift", "Taylor Swift", "Taylor Swift", "Taylor Swift",  
                 "Taylor Swift", "Taylor Swift", "Taylor Swift", "Taylor Swift", "Taylor Swift", 
                 "Adele", "Adele", "Adele", "Adele", "Adele", "Adele", "Adele", "Adele", "Adele", 
                 "Adele", "Adele", "Adele", "Taylor Swift", "Taylor Swift", "Taylor Swift",
                 "Taylor Swift", "Taylor Swift", "Taylor Swift", "Taylor Swift", "Taylor Swift", 
                 "Taylor Swift","Taylor Swift", "Taylor Swift", "Taylor Swift", "Taylor Swift", 
                 "Taylor Swift", "Taylor Swift","Taylor Swift", "Taylor Swift", "Taylor Swift", 
                 "Beyoncé", "Beyoncé", "Beyoncé", "Beyoncé", "Beyoncé","Beyoncé", "Beyoncé", 
                 "Beyoncé", "Beyoncé", "Beyoncé", "Beyoncé", "Beyoncé", "Beyoncé", "Beyoncé",
                 "Beyoncé", "Beyoncé", "Beyoncé", "Beyoncé", "Beyoncé", "Taylor Swift", 
                 "Taylor Swift", "Taylor Swift","Taylor Swift", "Taylor Swift", "Taylor Swift", 
                 "Taylor Swift", "Taylor Swift", "Taylor Swift","Taylor Swift", "Taylor Swift", 
                 "Taylor Swift", "Taylor Swift", "Taylor Swift", "Taylor Swift","Taylor Swift", 
                 "Taylor Swift", "Adele", "Adele", "Adele", "Adele", "Adele", "Adele", "Adele",
                 "Adele", "Adele", "Adele", "Adele", "Beyoncé", "Beyoncé", "Beyoncé", "Beyoncé", 
                 "Beyoncé", "Beyoncé","Beyoncé", "Beyoncé", "Beyoncé", "Beyoncé", "Beyoncé", 
                 "Beyoncé", "Beyoncé", "Beyoncé", "Beyoncé","Taylor Swift", "Taylor Swift", 
                 "Taylor Swift", "Taylor Swift", "Taylor Swift", "Taylor Swift","Taylor Swift", 
                 "Taylor Swift", "Taylor Swift", "Taylor Swift", "Taylor Swift", "Taylor Swift",
                 "Taylor Swift", "Taylor Swift", "Taylor Swift", "Taylor Swift", "Taylor Swift", 
                 "Taylor Swift","Taylor Swift", "Beyoncé", "Beyoncé", "Beyoncé", "Beyoncé", 
                 "Beyoncé", "Beyoncé", "Beyoncé", "Beyoncé", "Beyoncé", "Beyoncé", "Beyoncé", 
                 "Beyoncé",  "Beyoncé", "Beyoncé", "Taylor Swift", "Taylor Swift","Taylor Swift", 
                 "Taylor Swift",  "Taylor Swift", "Taylor Swift", "Taylor Swift", "Taylor Swift",
                 "Taylor Swift","Taylor Swift", "Taylor Swift", "Taylor Swift", "Taylor Swift", 
                 "Taylor Swift","Taylor Swift", "Taylor Swift", "Adele", "Adele", "Adele", 
                 "Adele", "Adele","Adele", "Adele","Adele", "Adele", "Taylor Swift", 
                 "Taylor Swift", "Taylor Swift","Taylor Swift", "Taylor Swift","Taylor Swift", 
                 "Taylor Swift", "Taylor Swift", "Taylor Swift", "Taylor Swift", 
                 "Taylor Swift","Taylor Swift", "Taylor Swift", "Taylor Swift", "Taylor Swift")


song_age <- c(5351, 5351, 5351, 5351, 5351, 5351, 5351, 5351, 5351, 5351, 5351, 5351, 5351, 5351, 5351, 
              5351,4186, 4186, 4186, 4186, 4186, 4186, 4186, 4186, 4186, 4186, 4186, 4186, 4186, 4186, 
              4186, 4186,4186, 4186, 4186, 4186, 4132, 4132, 4132, 4132, 4132, 4132, 4132, 4132, 4132, 
              4132, 4132, 4132,4132, 4132, 3671, 3671, 3671, 3671, 3671, 3671, 3671, 3671, 3671, 3671, 
              3671, 3671, 3383, 3383,3383, 3383, 3383, 3383, 3383, 3383, 3383, 3383, 3383, 3383, 3383, 
              3383, 3383, 3383, 3383, 3383,3380, 3380, 3380, 3380, 3380, 3380, 3380, 3380, 3380, 3380, 
              3380, 3380, 3380, 3380, 3380, 3380,3380, 3380, 3380, 2670, 2670, 2670, 2670, 2670, 2670, 
              2670, 2670, 2670, 2670, 2670, 2670, 2670,2670, 2670, 2670, 2670, 2584, 2584, 2584, 2584, 
              2584, 2584, 2584, 2584, 2584, 2584, 2584, 2428,2428, 2428, 2428, 2428, 2428, 2428, 2428, 
              2428, 2428, 2428, 2428, 2428, 2428, 2428, 1942, 1942,1942, 1942, 1942, 1942, 1942, 1942, 
              1942, 1942, 1942, 1942, 1942, 1942, 1942, 1942, 1942, 1942,1942, 1525, 1525, 1525, 1525, 
              1525, 1525, 1525, 1525, 1525, 1525, 1525, 1525, 1525, 1525, 1207,1207, 1207, 1207, 1207, 
              1207, 1207, 1207, 1207, 1207, 1207, 1207, 1207, 1207, 1207, 1207, 818,818, 818, 818, 818, 
              818, 818, 818, 818, 97, 97, 97, 97, 97, 97, 97, 97, 97, 97, 97, 97, 97, 97, 97)


tempo <- c( 99.259, 99.973, 91.039, 166.602, 74.934, 83.615, 112.439, 74.283, 112.726, 84.200, 116.153,
            99.947, 133.173, 93.615,173.954, 89.889, 91.977, 175.868, 107.045, 157.992, 124.821, 84.547, 
            151.947,92.161, 96.540, 94.922, 98.960, 93.852, 105.253, 169.673, 197.431, 122.394, 167.875, 
            122.868, 175.861,91.962, 152.175, 105.587, 115.012, 175.662, 112.979, 146.172, 131.646, 
            167.974, 150.802, 177.874,143.983, 96.052, 156.035, 100.021, 108.959, 105.181, 80.072, 
            109.959, 85.892, 80.758, 81.405, 137.202,72.416, 126.381, 96.997, 122.574, 79.985, 199.965, 
            143.952, 85.953, 81.921, 100.029, 95.407, 119.004,115.972, 185.776, 129.987, 147.988, 99.990, 
            133.927, 160.884, 128.046, 126.075, 96.034, 193.438, 146.118, 78.798, 78.454, 89.946, 95.026, 
            121.969, 83.413, 86.959, 115.025, 97.526, 135.734, 155.881, 136.901,135.990,116.966, 109.888, 
            78.294, 76.380, 121.066, 114.979, 141.886, 118.993, 119.524, 163.945,139.970, 124.872, 
            81.994, 145.862, 134.029, 162.090, 97.957, 203.824, 159.853, 147.857, 131.974,104.945, 
            120.041, 79.830, 115.082,108.003, 159.824, 137.801, 160.009,  51.660, 116.925, 135.109,63.345, 
            86.934, 167.963, 99.099, 160.979, 72.032,150.006, 94.100, 167.349, 117.999, 110.992, 127.086, 
            116.010, 124.040, 110.005, 129.958, 124.966, 109.980,154.031, 93.040, 104.012, 145.977,
            172.095, 100.114, 93.987, 157.005, 129.974, 117.920, 79.539, 125.997,79.055, 125.941, 79.876, 
            125.070, 129.935, 140.030, 120.065, 111.580, 185.571, 133.970, 140.415, 170.085,140.553, 
            80.334, 102.601, 90.567, 136.024, 98.011, 117.008, 95.995, 95.036, 92.001, 96.949, 
            160.040,118.014, 170.167, 140.024, 120.001, 72.817, 159.986, 103.973, 183.803, 85.026, 
            121.946, 164.023, 143.860,165.398, 94.982, 81.723, 109.812, 105.846, 141.961, 155.875, 
            160.015, 159.073, 82.989, 135.917, 95.045,128.070, 74.957, 92.027, 172.054, 110.010, 
            160.024, 120.085,163.960, 163.954, 94.922)

popularity <- c(72,59,57,39,42,54,43,41,41,42,43,52, 0,35,45,11,65,78,50,48,45,66,50,42,53,43,52,45,
                63,56,52,60,53,44,43,38,61,63,52,51,48,50,52,59,53,67,54,51,48,53,62,54,62,54,61,60,
                56,56,72,54,53,58,51,51,49,48,47,52,52,56,50,53,57,54,48,51,50,51,48,47,18,17,13,18,
                20,14,19,16,12,10,13,10,13,24,12,12,10,10,10,57,58,58,56,56,58,56,55,57,55,51,54,55,
                53,58,54,51,75,67,69,66,74,60,60,57,66,63,79,32,27,29,41,30,25,25,38,36,36,33,36,42,
                45,47,56,63,56,62,62,64,57,62,57,57,56,54,54,63,55,58,55,55,52,69,77,63,59,75,63,64,
                70,68,61,62,61,79,60,56,61,67,60,61,65,56,57,60,60,56,57,59,58,61,62,75,75,69,75,63,
                63,66,72,60,80,85,75,74,76,84,70,80,73,71,70,70,70,75,70)

duration_ms <- c(235933,208600,244867,222533,260160,301173,259093,298533,360440,219160,297800,293107,
                 16360,163707,246067,204467,199853,227853,210240,235680,198280,273053,248853,222013,
                 379120,198280,201200,205973,240280,203347,282360,219547,228413,221200,227667,200813,
                 232107,173067,199200,239013,207107,248107,236053,242200,213080,201107,213053,203227,
                 220147,179067,220507,258507,210507,191867,208040,203907,190280,197347,212040,195653,
                 258853,271227,237067,311040,238107,261453,237600,242000,294347,235280,254320,234440,
                 231147,263987,200560,261800,244240,225333,245347,281053,192587,200320,235680,261160,
                 249867,270760,207480,278400,223587,221080,255733,217667,215173,217507,224387,274573,
                 187573,267827,221467,230707,260933,293027,240760,403920,237733,265667,290467,352187,
                 217160,302253,242080,367133,317453,237920,234547,275960,228293,221427,250133,243200,
                 241693,277947,228213,241653,346813,316440,285040,273880,239467,179067,253747,245480,
                 222480,199107,267413,212147,223987,239213,236093,253760,293133,377467,295187,220827,
                 240773,217973,327893,230133,242573,191880,204827,298293,201853,283680,240133,243933,
                 217827,237613,285560,222707,220320,257653,323480,309720,228693,319467,184013,391907,
                 215947,250960,276560,230893,266920,213507,180187,212600,231827,231000,235800,193293,
                 219200,207440,211933,220440,247533,250093,195707,271000,245560,267107,230467,223080,
                 290907,245427,240427,225427,285947,227067,271800,251693,208187,244827,238253,236413,
                 232253,211853,227907,209680,233627,214320,211507,230373,207133,203507,235467) 

songs <- data.frame(artist_name, song_age, tempo, popularity, duration_ms)
head(songs)
##   artist_name song_age   tempo popularity duration_ms
## 1     Beyoncé     5351  99.259         72      235933
## 2     Beyoncé     5351  99.973         59      208600
## 3     Beyoncé     5351  91.039         57      244867
## 4     Beyoncé     5351 166.602         39      222533
## 5     Beyoncé     5351  74.934         42      260160
## 6     Beyoncé     5351  83.615         54      301173

using the lm() function with the songs data set

lm_model <- lm(popularity~song_age, data=songs)

summary(lm_model)
## 
## Call:
## lm(formula = popularity ~ song_age, data = songs)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -38.908  -2.064   2.936   7.718  34.627 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 68.6899499  2.2619577  30.367  < 2e-16 ***
## song_age    -0.0058526  0.0007327  -7.988 8.52e-14 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 14.9 on 213 degrees of freedom
## Multiple R-squared:  0.2305, Adjusted R-squared:  0.2269 
## F-statistic: 63.81 on 1 and 213 DF,  p-value: 8.516e-14
tidy(lm_model)
## # A tibble: 2 x 5
##   term         estimate std.error statistic  p.value
##   <chr>           <dbl>     <dbl>     <dbl>    <dbl>
## 1 (Intercept)  68.7      2.26         30.4  2.45e-79
## 2 song_age     -0.00585  0.000733     -7.99 8.52e-14



Bayesian regression and why use it?

  • p-values fail us if we want to make inferences about actual values of parameters.
  • Bayesian estimation is one solution: Bayesian methods sample from a posterior distribution, and we can summarize that distribution to make inferences about the parameter values.


The rstanarm package is an interface to Stan, a probabilistic programming language.

  • example fitting the previous linear regression model (see the code sketch after this list)
    • the summary of the model shows information about the estimated model, the parameter estimates, and model diagnostics (note. no p-values); we get a distribution of possible parameter values rather than a point estimate.
    • “sigma” = standard deviation of the errors
    • “mean_PPD” = mean of the posterior predictive samples
    • “log-posterior” = analogous to the log-likelihood in a frequentist model; represents the log of the combined posterior distributions and is used for model comparisons.
    • in the diagnostics section, “Rhat” compares the variance within chains to the variance across chains to measure the stability of the estimates; if the model converges and the parameters are stable, values will be <= 1.1.
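The call that created stan_model is not shown in these notes; it was presumably fit along these lines (a sketch assuming the rstanarm defaults: gaussian family, default priors, 4 chains of 2000 iterations):

# sketch: fit the regression with rstanarm defaults
library(rstanarm)

stan_model <- stan_glm(popularity ~ song_age, data = songs)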
# call summary of estimates
summary(stan_model)
## 
## Model Info:
## 
##  function:     stan_glm
##  family:       gaussian [identity]
##  formula:      popularity ~ song_age
##  algorithm:    sampling
##  priors:       see help('prior_summary')
##  sample:       4000 (posterior sample size)
##  observations: 215
##  predictors:   2
## 
## Estimates:
##                 mean   sd     2.5%   25%    50%    75%    97.5%
## (Intercept)     68.7    2.2   64.3   67.2   68.7   70.2   73.0 
## song_age         0.0    0.0    0.0    0.0    0.0    0.0    0.0 
## sigma           15.0    0.7   13.6   14.5   14.9   15.4   16.5 
## mean_PPD        52.5    1.5   49.7   51.6   52.5   53.5   55.4 
## log-posterior -894.5    1.3 -897.7 -895.1 -894.1 -893.6 -893.1 
## 
## Diagnostics:
##               mcse Rhat n_eff
## (Intercept)   0.0  1.0  4133 
## song_age      0.0  1.0  3701 
## sigma         0.0  1.0  3762 
## mean_PPD      0.0  1.0  4219 
## log-posterior 0.0  1.0  1603 
## 
## For each parameter, mcse is Monte Carlo standard error, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence Rhat=1).
# call coefficient estimates
tidy(stan_model)
## # A tibble: 2 x 3
##   term         estimate std.error
##   <chr>           <dbl>     <dbl>
## 1 (Intercept)  68.7      2.21    
## 2 song_age     -0.00586  0.000718


comparing frequentist and Bayesian methods

  • Frequentist assumption: parameters are fixed, data is random
  • Bayesian assumption: parameters are random, data is fixed
    • Probability of parameter values, given the observed data
    • interest is in determining the range of parameter values that would give rise to observed data.


Evaluating Bayesian parameters through credible intervals.

  • Confidence interval: the probability that a range contains the true value; it does not tell us how probable any particular parameter value is.
    • ex. if we repeated the sampling many times, 95% of the intervals constructed this way would contain the true value.
    • Probability statement is about the range boundaries
  • Credible interval: probability that the true value is within a range
    • ex. there is a 95% probability that the true value falls within this range.
    • Probability of parameter values
Calculating credible intervals and comparing them to the frequentist approach
# 95% credible intervals
posterior_interval(stan_model, pars = "song_age", prob=.95)
##                 2.5%        97.5%
## song_age -0.00728249 -0.004434891
95% confidence intervals from the lm model
confint(lm_model, parm = "song_age", level=.95)
##               2.5 %       97.5 %
## song_age -0.0072968 -0.004408318
With the Bayesian method we can ask: what is the probability that our parameter for song_age is between -.006 and -.005?
posterior <- spread_draws(stan_model, song_age)
mean(between(posterior$song_age, -.006, -.005))
## [1] 0.463
The probability that our parameter is between these values is .46.
  • Note, we cannot do this type of inference with confidence intervals.





Modifying a Bayesian Model



What is in a Bayesian model?

  • The posterior distribution is sampled in groups called chains, and each sample in a chain is an iteration.
    • Each chain begins in a random location and, as it samples, moves towards the area where the combination of likelihood and prior indicates a high probability of the true parameter value residing.
    • The more iterations in a chain, the larger the sample from the posterior distribution will be.
    • Summaries are impacted by the length of the chain; larger samples allow for more robust estimates of summary statistics.
  • Convergence is important as it ensures stable estimates. We can visualize this process in a traceplot.
    • We can discard the first iterations before convergence; this is known as burn-in or warm-up.
    • By default, rstanarm uses chains = 4, iter = 2000, and warmup = 1000 (see the sketch after this list for changing these).
    • Fewer iterations = shorter estimation time.
    • Not enough iterations = convergence problems.
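These settings can be changed when fitting the model, and the chains can be inspected with a traceplot; a sketch (the chain and iteration values are illustrative only):

# sketch: fit with fewer, shorter chains than the defaults
stan_model_short <- stan_glm(popularity ~ song_age, data = songs,
                             chains = 2, iter = 1000, warmup = 500)

# traceplot to visually check convergence
plot(stan_model_short, plotfun = "trace")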



Prior Distribution

  • What priors were used for stan_model?
prior_summary(stan_model)
## Priors for model 'stan_model' 
## ------
## Intercept (after predictors centered)
##  ~ normal(location = 0, scale = 10)
##      **adjusted scale = 169.51
## 
## Coefficients
##  ~ normal(location = 0, scale = 2.5)
##      **adjusted scale = 0.03
## 
## Auxiliary (sigma)
##  ~ exponential(rate = 1)
##      **adjusted scale = 16.95 (adjusted rate = 1/adjusted scale)
## ------
## See help('prior_summary.stanreg') for more details
prior model for the y-intercept

α ~ N(0, 10²)

prior model for the slope

β ~ N(0, 2.5²)

prior model for the residual standard deviation

σ ~ Exp(1)

Note. adjusted scale is calculated for our data.
adjusted scale for intercept

α: 10 × sd(Y)

or
10 * sd(songs$popularity) 
## [1] 169.5062
adjusted scale for coefficients

β: (2.5 ÷ sd(X)) × sd(Y)

or
(2.5/  sd(songs$song_age)) * sd(songs$popularity) 
## [1] 0.03047504
rstanarm adjusts (autoscales) the default priors to ensure that they are not unintentionally too informative for the scale of the data.
  • Turning off this default with autoscale = FALSE (see the sketch below)
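The call that created stan_model_no_scale is not shown; it was presumably something like this (a sketch that re-specifies the default priors with autoscaling turned off):

# sketch: default-style priors without autoscaling
stan_model_no_scale <- stan_glm(popularity ~ song_age, data = songs,
                                prior_intercept = normal(0, 10, autoscale = FALSE),
                                prior = normal(0, 2.5, autoscale = FALSE),
                                prior_aux = exponential(1, autoscale = FALSE))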
prior_summary(stan_model_no_scale)
## Priors for model 'stan_model_no_scale' 
## ------
## Intercept (after predictors centered)
##  ~ normal(location = 0, scale = 10)
## 
## Coefficients
##  ~ normal(location = 0, scale = 2.5)
## 
## Auxiliary (sigma)
##  ~ exponential(rate = 1)
## ------
## See help('prior_summary.stanreg') for more details


specifying priors

  • Including knowledge from experts
  • make sure autoscale = FALSE
  • There are many different prior distributions that can be used; use ?priors to see the full list of prior distributions that can be specified, for example
    • normal()
    • exponential()
    • student_t()
    • cauchy()
  • If we want flat priors (see the sketch below):
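The refit itself is not shown; in rstanarm, flat (uniform) priors are requested by setting the prior arguments to NULL (a sketch):

# sketch: flat priors on the intercept, coefficients, and sigma
stan_model <- stan_glm(popularity ~ song_age, data = songs,
                       prior_intercept = NULL,
                       prior = NULL,
                       prior_aux = NULL)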
prior_summary(stan_model)
## Priors for model 'stan_model' 
## ------
## Intercept (after predictors centered)
##  ~ flat
## 
## Coefficients
##  ~ flat
## 
## Auxiliary (sigma)
##  ~ flat
## ------
## See help('prior_summary.stanreg') for more details
Using informed priors
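The call behind the next two outputs is not shown; based on the prior summary below, it was presumably something like this (a sketch using the reported Cauchy prior with autoscaling off and flat priors elsewhere):

# sketch: an informative Cauchy prior on the song_age coefficient
stan_model <- stan_glm(popularity ~ song_age, data = songs,
                       prior_intercept = NULL,
                       prior = cauchy(location = -0.004, scale = 0.001, autoscale = FALSE),
                       prior_aux = NULL)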
tidy(stan_model)
## # A tibble: 2 x 3
##   term         estimate std.error
##   <chr>           <dbl>     <dbl>
## 1 (Intercept)  67.3      2.30    
## 2 song_age     -0.00535  0.000773
prior_summary(stan_model)
## Priors for model 'stan_model' 
## ------
## Intercept (after predictors centered)
##  ~ flat
## 
## Coefficients
##  ~ cauchy(location = -0.004, scale = 0.001)
## 
## Auxiliary (sigma)
##  ~ flat
## ------
## See help('prior_summary.stanreg') for more details
Note. priors do affect parameter estimates.



Tuning model for stability - Error messages

  • Divergent transitions - the step size of the sampler is too big
    • adjust the step size with the adapt_delta argument (default is .95); increasing the value decreases the step size (see the sketch below)
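A sketch of refitting with a smaller step size (the value 0.99 is illustrative):

# sketch: raise adapt_delta to take smaller steps and avoid divergent transitions
stan_model <- stan_glm(popularity ~ song_age, data = songs,
                       adapt_delta = 0.99)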
tidy(stan_model)
## # A tibble: 2 x 3
##   term         estimate std.error
##   <chr>           <dbl>     <dbl>
## 1 (Intercept)  68.6      2.25    
## 2 song_age     -0.00583  0.000744

Exceeding the maximum treedepth

  • The sampler evaluates branches and looks for a good place to “U-turn”.
  • Hitting the max tree depth indicates poor efficiency and may terminate the process before finding a good stopping place.
    • The posterior distribution is not being efficiently sampled.
    • The max_treedepth argument (default = 10) can be increased to allow the sampler to look further for a “U-turn” (see the sketch below).
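A sketch of refitting with a larger tree depth (the value 15 is illustrative; the control list is passed through to the underlying sampler):

# sketch: allow deeper trees before the sampler stops searching for a U-turn
stan_model <- stan_glm(popularity ~ song_age, data = songs,
                       control = list(max_treedepth = 15))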
tidy(stan_model)
## # A tibble: 2 x 3
##   term         estimate std.error
##   <chr>           <dbl>     <dbl>
## 1 (Intercept)  68.6      2.26    
## 2 song_age     -0.00584  0.000734
Tuning the estimation
  • Estimation errors are threats to the validity of the model



Using R squared statistic

  • how well the predictors are able to explain the outcome variable.
    • The proportion of variance in the outcome variable explained by the predictors
    • known as the coefficient of determination

R² = 1 − Σᵢ(Yᵢ − Ŷᵢ)² / Σᵢ(Yᵢ − Ȳ)²

In the lm model R-squared was

summary(lm_model)$r.squared
## [1] 0.2305064

R-squared of a Bayesian model

ss_res <- var(residuals(stan_model))
ss_total <- var(fitted(stan_model)) + var(residuals(stan_model))
1 - (ss_res/ss_total)
## [1] 0.2295941





Assessing Model Fit

Posterior predictive model checks, comparing predictions to our observed data

predictions <- posterior_linpred(stan_model)
predictions[1:10, 1:5] # print first 10 iterations for songs 1 through 5
##           
## iterations        1        2        3        4        5
##       [1,] 37.39273 37.39273 37.39273 37.39273 37.39273
##       [2,] 38.40191 38.40191 38.40191 38.40191 38.40191
##       [3,] 37.67996 37.67996 37.67996 37.67996 37.67996
##       [4,] 39.02225 39.02225 39.02225 39.02225 39.02225
##       [5,] 38.86703 38.86703 38.86703 38.86703 38.86703
##       [6,] 36.22995 36.22995 36.22995 36.22995 36.22995
##       [7,] 38.42025 38.42025 38.42025 38.42025 38.42025
##       [8,] 38.17125 38.17125 38.17125 38.17125 38.17125
##       [9,] 35.55546 35.55546 35.55546 35.55546 35.55546
##      [10,] 35.64800 35.64800 35.64800 35.64800 35.64800
Note. deviations of these predictions from the observed data are an indicator of poor fit.
ex. let's compare the summary of the observed popularity scores with the summaries of the predicted scores from the first two posterior iterations.
iter1 <- predictions[1,]
iter2 <- predictions[2,]

rbind(summary(songs$popularity),
summary(iter1),
summary(iter2)
)
##          Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
## [1,]  0.00000 47.50000 56.00000 52.54884 62.00000 85.00000
## [2,] 37.39273 47.23476 53.09897 52.58368 58.58531 68.17250
## [3,] 38.40191 47.64808 53.15726 52.67317 58.31145 67.31822
Note. the means are similar, but the minimums and maximums are not.

let’s check the posterior predictions for a single song

songs[45, "popularity"]
## [1] 53
summary(predictions[,45])
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   39.60   43.54   44.51   44.50   45.45   51.69
The observed score for song #45 was 53, while our mean predicted score is 44.5, meaning our model is not doing too well at predicting this score.



Model fit with posterior predictive model checks

using the posterior distribution of R-squared in Bayesian methods

r2_posterior <- bayes_R2(stan_model)
summary(r2_posterior)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## 0.04953 0.19993 0.22959 0.22931 0.25966 0.38640

95% credible interval

quantile(r2_posterior, probs=c(.025, .975))
##      2.5%     97.5% 
## 0.1465685 0.3082826

Histogram of the R-squared distribution

r2_posterior <- data.frame(r2_posterior)
ggplot(data=r2_posterior, aes(x=r2_posterior)) + geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Compare the distributions from all replications to the observed data
# the dark blue line represents our data.
pp_check(stan_model, "dens_overlay")

Test characteristics of the outcome variable from all replications
  • the mean from each replication is plotted as a histogram against the mean of the observed variable.
# the dark blue line represents our data.
pp_check(stan_model, "stat")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Note. our observed mean falls within the range of the posterior predictive distribution; the model fits.

Examining the mean and standard deviation for the replications and the observed data: our observed mean and SD fall near the center, evidence that our model fits the data.

pp_check(stan_model, "stat_2d")



Comparing Bayesian models

  • The loo package (leave-one-out) is a tool for approximate cross-validation
    • elpd_loo = expected log predictive density; only meaningful relative to another model
    • p_loo = effective number of parameters in the model
    • looic = the loo estimate converted to the deviance scale, -2*elpd_loo
loo(stan_model)
## 
## Computed from 4000 by 215 log-likelihood matrix
## 
##          Estimate   SE
## elpd_loo   -888.0 13.0
## p_loo         3.2  0.4
## looic      1776.1 25.9
## ------
## Monte Carlo SE of elpd_loo is 0.0.
## 
## All Pareto k estimates are good (k < 0.5).
## See help('pareto-k-diagnostic') for details.

Comparing it to another model: a positive elpd_diff score favors the second model.
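stan_model_2 is not defined in these notes; a plausible (hypothetical) second model would add another predictor, for example:

# hypothetical second model for the comparison below: add artist_name as a predictor
stan_model_2 <- stan_glm(popularity ~ song_age + artist_name, data = songs)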

compare(loo(stan_model), loo(stan_model_2))
## elpd_diff        se 
##       0.7       2.1
The SE helps us decide whether the difference is meaningful; if the difference between the models is smaller than the SE, it is not meaningful.





Presenting and Using a Bayesian Regression



Visualizing a Bayesian Model.

saving model coefficients
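The objects printed below are not created in these notes; they were presumably built along these lines (a sketch):

# sketch: save the tidy coefficient table and pull out the intercept and slope
tidy_coef <- tidy(stan_model)

model_intercept <- tidy_coef$estimate[1]
model_slope     <- tidy_coef$estimate[2]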

tidy_coef
## # A tibble: 2 x 3
##   term         estimate std.error
##   <chr>           <dbl>     <dbl>
## 1 (Intercept)  68.7      2.31    
## 2 song_age     -0.00587  0.000747
model_intercept
## [1] 68.72257
model_slope
## [1] -0.005866562

creating the plot

  • specifying model parameters to make the regression line
ggplot(songs, aes(x=song_age, y=popularity)) +
  geom_point() + 
  geom_abline(intercept=model_intercept, slope=model_slope)

using the posterior distribution to plot our uncertainty about the parameter estimates.

draws <- spread_draws(stan_model, `(Intercept)`, song_age)
head(draws)
## # A tibble: 6 x 5
##   .chain .iteration .draw `(Intercept)` song_age
##    <int>      <int> <int>         <dbl>    <dbl>
## 1      1          1     1          68.5 -0.00565
## 2      1          2     2          70.9 -0.00649
## 3      1          3     3          67.0 -0.00512
## 4      1          4     4          70.6 -0.00629
## 5      1          5     5          67.1 -0.00558
## 6      1          6     6          69.6 -0.00581
ggplot(songs, aes(x=song_age, y=popularity)) +
  geom_point() + 
  geom_abline(data=draws, aes(intercept=`(Intercept)`, slope=song_age),
              size=.1, alpha=.2, color="skyblue") +
  geom_abline(intercept=model_intercept, slope=model_slope)



Making predictions

  • The posterior_predict function generates predictions from the posterior; it can also make predictions for new data that was not used to fit the model.
posteriors <- posterior_predict(stan_model)
posteriors[1:10, 1:5]
##              1         2         3        4         5
##  [1,] 19.22087 33.621913 36.696973 30.38532  8.051973
##  [2,] 59.81539 53.360151 38.751336 27.25390  7.674771
##  [3,] 40.04493 27.925747 34.313162 42.16199 57.464111
##  [4,] 35.03974 39.546218 35.988896 35.52850 34.867065
##  [5,] 27.39924  5.308039 17.386039 24.83862  2.785586
##  [6,] 37.46735 25.465261 21.656415 45.95478 13.686074
##  [7,] 37.58420 41.673328 -2.011426 17.20164 57.471732
##  [8,] 43.01953 29.303537 23.715073 33.28803 32.348840
##  [9,] 34.09349 37.979895 39.037511 27.15901 71.799770
## [10,] 46.54097 11.704264 54.742663 57.11819 26.145158

Predictions for new data

predict_data <- data.frame(song_age = 663, artist_name = "Beyoncé")

predict_data
##   song_age artist_name
## 1      663     Beyoncé

creating predictions from the posterior distribution for all draws

new_predictions <- posterior_predict(stan_model, newdata = predict_data)
new_predictions[1:10,]
##  [1] 37.21357 69.61744 48.08095 65.40637 66.58972 58.41120 56.01635
##  [8] 58.26399 35.47280 56.17266

Visualizing predictions

  • formatting the data (note. the renaming below assumes new_predictions has one column per artist, i.e., that it was generated from a predict_data with one row for each of the three artists)
# Convert to data frame and rename variables (one column per artist assumed)
new_predictions <- as.data.frame(new_predictions)
colnames(new_predictions) <- c("Adele", "Taylor Swift", "Beyoncé")

# Create tidy data structure
plot_posterior <- gather(new_predictions, key = "artist_name", value = "predict")

create the plot

ggplot(plot_posterior, aes(x=predict)) +
  facet_wrap(~artist_name, ncol=1) +
  geom_density()