In the video, you learned about the county-level birth rate data. Counties exist within states and perhaps states contribute to variability. During these exercises, you'll build a series of mixed-effects models using this data. In this exercise, you'll build a hierarchical model with a global intercept (fixed-effect) and random-effect for state. You will then look at the summary() of the model and the plot() of the residuals. Like other types of regression analysis, examining residuals can help you see if anything is wrong with the model. With lmer(), there are two methods for doing this: y ~ 1 + (1 | random_effect) or the shortcut, y ~ (1 | random_effect). Use the shortcut in this exercise so that your answer passes the DataCamp test. When building mixed-effect models, starting with simple models such as the global intercept model can check to see if problems exist with either the data or code. A global intercept assumes a single intercept can describe all of the variability in the data. One way to view a global intercept is that you cannot do any better modeling that data than to only model the mean without including any other predictor variables. #Fit a lmer() model to the county_births_data data. Include State as a random-effect that predicts BirthRate. birth_rate_state_model <- lmer(BirthRate ~ (1 | State), data = county_births_data) #Look at the summary from the model using summary() and plot the residuals of the model with plot(). summary(birth_rate_state_model) plot(birth_rate_state_model) # Plot the predicted values from the model on top of the plot shown during the video county_births_data$birthPredictState <- predict(birth_rate_state_model, county_births_data) ggplot() + theme_minimal() + geom_point(data = county_births_data, aes(x = TotalPopulation, y = BirthRate)) + geom_point(data = county_births_data, aes(x = TotalPopulation, y = birthPredictState), color = 'blue', alpha = 0.5) You've now built a simple lmer() model that included a global mean. You might do this as a first step modeling your own data or as a debugging step. For example, if R could not fit a lmer() model, this simple model could help you see if the trouble was with your random-effects. In the next exercise, you will see how to include a fixed-effect. Later in this chapter, you will learn more about interpreting lmer() outputs. ###################################################################################################################################################### During the previous exercise, you built a model with only a global intercept. Usually, hierarchical models include predictor variables of interest. The county-level birth data includes the average age of the mother, AverageAgeofMother. Perhaps this explains a county's birth rate. In this case, the formula in R "knows" AverageAgeofMother is numeric and will treat the corresponding coefficient as a slope. Build a hierarchical model with county_births_data, which has been loaded for you, and include a fixed-effect. Does the average age of the mother at birth predict the birth rate? #Build a model with AverageAgeofMother as a fixed-effect slope parameter including State as a random-effect. The response variable is BirthRate. age_mother_model <- lmer(BirthRate ~ AverageAgeofMother + (1 | State), county_births_data) summary(age_mother_model) ###################################################################################################################################################### Random-effect slopes In the previous exercise, you estimated random-effect intercepts for each state. This allowed you to account for each state having its own intercept. During this exercise, you will estimate a random-effect slope for each state. For example, perhaps the log(total population of each county), LogTotalPop, changes the birth rate of a county AND varies by state. Recall from the video, a random-effect slope may be estimated for each group using (slope | group) syntax with lmer(). During this exercise, fit a mixed-effects model estimating the effect of the mother's average age while accounting for state and total population as random-effects. #Build the model from the previous exercise with AverageAgeofMother as a fixed-effect and State a random-effect. model_a <- lmer(BirthRate ~ AverageAgeofMother + (1 | State), county_births_data) tidy(model_a) #Build the previous model, adding LogTotalPop as a random-effects slope. #Include the AverageAgeofMother as fixed-effect and LogTotalPop and State as random-effects model_b <- lmer(BirthRate ~ AverageAgeofMother + (LogTotalPop | State), county_births_data) tidy(model_b) The estimate for AverageAgeofMother became more negative (farther from zero) in model B. Including LogTotalPop changed the estimate for AverageAgeofMother and is an important coefficient. You may have also noticed the AverageAgeofMother coefficient in model b had a smaller standard error (std.error). This also caused a larger t-value (statistic). Both of these show including LogTotalPop in the model explains a source of variability in the dataset. ###################################################################################################################################################### Uncorrelated random-effect slope In the previous exercise, you use lme4's' default setting and assumed slopes and intercepts within each group were correlated for the random-effect estimates. However, this assumption may not always be valid or we may want to simplify the model if we are having trouble numerically fitting the model. Building a model with uncorrelated random-effects is one method for potentially simplifying the model. Furthermore, lmer() models can be hard to fit and checking model outputs can be a useful step when debugging your model. Alternatively, you may have subject matter expertise and want to assume the random-effects are not correlated. To fit a model with an uncorrelated random-effect slope, use || rather than | with lmer() syntax. The second model you built in the previous exercise, model_b has been loaded for you. Compare the outputs of model_c to the old outputs from model_b. #Build a model with AverageAgeofMother as a fixed-effect and LogTotalPop as an uncorrelated random-effect slope with State. model_c <- lmer(BirthRate ~ AverageAgeofMother + (LogTotalPop || State), county_births_data) # Compare outputs of both models summary(model_b) summary(model_c) Fitting model_c gave a singular warning message. In this case, the more complex model is required. The results from model_B provide insight into why. Specifically, look at the summary of model_b. Which statement is true? The random effect correlation for LogTotalPop is -1.00. A perfect negative correlation exists between the random-effect slope and intercept. Thus, fitting the model without the correlation produces an error message. The random-effect slope and random-effect intercept cannot be estimated independent of each other. ###################################################################################################################################################### Fixed- and random-effect predictor In the previous exercises, you fit mixed-effect models with different fixed- and random-effects. Sometimes, a model can have the same predictor as both a fixed and random-effect. For example, perhaps you are interested in estimating the average effect the age of a mother at birth (AverageAgeofMother). Including the predictor as fixed-effect allows you to estimate the effect of a mother's age across all locations. Including the predictor as a random-effect allows you to simultaneously account (or correct) for different slope estimates among states. #Construct a lmer() using AverageAgeofMother as a fixed-effect and AverageAgeofMother as a random-effect nested within State to predict BirthRate with the #county_births_data. Make sure the fixed-effect goes before the random-effects in the formula. out <- lmer(BirthRate ~ AverageAgeofMother + (AverageAgeofMother | State), county_births_data) summary(out) ###################################################################################################################################################### Extracting coefficients We often want to know what values a model estimates as coefficients. Although the summary() function provides the model outputs, we might also want to directly access model outputs. The fixed-effects estimates can be called directly using the fixef() function. The random-effects estimates can be called directly using the ranef() function. We can also extract confidence intervals for the fixed-effects using the function confint(). The broom.mixed package also contains tidy methods for extracting model results from lmer() models, namely the tidy() function. However, these results are more complex and less tidy than many tidy outputs due to the complexity of mixed-effect models. # Extract the fixed-effect coefficients fixef(out) # Extract the random-effect coefficients ranef(out) # Estimate the confidence intervals confint(out) # Use the tidy function tidy(out, conf.int = TRUE) ###################################################################################################################################################### During this exercise, you will extract and plot fixed-effects. Besides plotting the coefficients (with geom_point()) and their 95% confidence intervals (with geom_linerange()), you will add a red-line to the plot to help visualize where zero is located (using geom_hline()). If the 95% confidence intervals do not include zero, the coefficient's estimate differs from zero. coord_flip() is required because ggplot does not allow for xmin or xmax, only ymin and ymax. And, theme_minimal() changes the theme from the default. #Extract the coefficients from the model out using tidy() from the broom.mixed package. Include the confidence interval. #Use the exiting code to filter out the random-effect estimates. coef_estimates <- tidy(out, conf.int = TRUE) %>% filter(effect == "fixed") # Print the new dataframe print(coef_estimates) # Plot the results using ggplot2 ggplot(coef_estimates, aes(x = term, y = estimate, ymin = conf.low, ymax = conf.high)) + geom_hline( yintercept = 0, color = 'red' ) + geom_linerange() + geom_point() + coord_flip() + theme_minimal()