Exploring NY hate data

The State of New York reports the number of hate crimes committed against people in each county. In this case study, you will examine whether the number of hate crimes is changing through time.

These exercises serve two purposes. First, they demonstrate how generalized linear mixed-effect regressions (glmer()) can be used for repeated measures in R. Second, they provide another example of using mixed-effect models for statistical inference.

Given the different population sizes of New York counties, you can reasonably assume the need for random-effect intercepts a priori. However, do you need random-effect slopes? Plot the data to see whether trends appear to vary by county. Plotting also gives you a first look at the data before any modeling.

# The plots below use ggplot2 and assume the hate data frame is already loaded
library(ggplot2)

# Plot the TotalIncidents of hate crimes in NY by Year, grouped by County
ggplot(data = hate, aes(x = Year, y = TotalIncidents, group = County)) +
  geom_line()

# Add a Poisson trend line for each county
ggplot(data = hate, aes(x = Year, y = TotalIncidents, group = County)) +
  geom_line() +
  geom_smooth(method = "glm", method.args = list(family = "poisson"), se = FALSE)

Different trends across groups imply you should use different random-effect slopes for each group.

######################################################################################################
Building the model

As part of the model-building process, first build a simple Poisson regression. A glm() runs quicker than glmer() and is easier to debug. For example, the Poisson regression might catch non-integer data. Plus, the Poisson regression can provide intuition about the more complicated model.

# Build a Poisson regression with the glm() function by setting family to "poisson".
# Model how the TotalIncidents of hate crimes are predicted by Year and County as fixed-effects
# in your formula using the hate data frame.
glm(TotalIncidents ~ Year + County, data = hate, family = "poisson")

# The glm() ran without any problems.
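The remark that a Poisson regression can catch non-integer data can be tried on a small example. The sketch below uses a made-up data frame (toy, with hypothetical columns year and count), not the hate data. It assumes base R's usual behavior of warning when the Poisson likelihood is evaluated at a non-integer value, and it also shows a direct check that does not depend on warnings.

```r
# Toy data (hypothetical, not the hate data set): one count is non-integer
toy <- data.frame(
  year  = 0:5,
  count = c(3, 5, 4.5, 6, 8, 7)
)

# Collect any warnings raised while fitting the Poisson regression
fit_warnings <- character(0)
toy_fit <- withCallingHandlers(
  glm(count ~ year, data = toy, family = "poisson"),
  warning = function(w) {
    fit_warnings <<- c(fit_warnings, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

# A direct check that does not rely on warnings being raised
all_integer <- all(toy$count == round(toy$count))

print(fit_warnings)
print(all_integer)
```

Running the direct check on the response column before calling glm() or glmer() is a cheap way to find this kind of problem early.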
Now, build a glmer().

# glmer() comes from the lme4 package
library(lme4)

# Use County as a random-effect intercept and Year as both a fixed- and random-effect slope with the hate data.
# Make sure you correctly specify which family you are using in the glmer().
glmer(TotalIncidents ~ Year + (Year | County), data = hate, family = "poisson")

# glmer() gave this output:
#   convergence code 0; 3 optimizer warnings; 0 lme4 warnings
# The optimizer warnings signal that the model did not converge properly. The reason is the scale of Year.
# Fix this by creating Year2, which starts at zero rather than 2010.
hate$Year2 <- hate$Year - min(hate$Year)
glmer_out <- glmer(TotalIncidents ~ Year2 + (Year2 | County), data = hate, family = "poisson")

# Examine the summary of glmer_out
summary(glmer_out)

Based upon the regression coefficient for Year2, what was the trend for hate crimes in New York between 2010 and 2016?
There was a statistically significant decreasing trend.
As you may have noticed, Poisson regression coefficients can be hard to describe because they are on the log scale.

######################################################################################################
Displaying the results

The last, and arguably most important, step in creating a model is sharing the results. In this exercise, you'll extract the county-level estimates and plot them with ggplot2. The county-level random-effect slopes need to be added to the fixed-effect slope to get the slope estimate for each county. The code also orders the counties by their slope estimates (the yearly change in hate crimes) so the plot is easier to read.

* Extract and save the Year2 slope estimate as Year2_slope.
* Extract the County-level random-effects.
* Create a new column for the slope by adding together the fixed- and random-effect slopes.
# Extract the fixed-effect slope for Year2
Year2_slope <- fixef(glmer_out)['Year2']

# Extract the random-effect slopes for county
county_slope <- ranef(glmer_out)$County

# Create a new column for the slope (fixed-effect slope plus each county's deviation)
county_slope$slope <- county_slope$Year2 + Year2_slope

# Use the row names to create a county name column
county_slope$county <- rownames(county_slope)

# Create an ordered county-level factor based upon slope values
county_slope$county_plot <- factor(county_slope$county,
                                   levels = county_slope$county[order(county_slope$slope)])

# Now plot the results using ggplot2
# Note: the slopes are on the log scale of the Poisson model
ggplot(data = county_slope, aes(x = county_plot, y = slope)) +
  geom_point() +
  coord_flip() +
  theme_bw() +
  ylab("Change in hate crimes per year") +
  xlab("County")

Notice how the change in hate crime rates varies greatly across counties. In this case, a random-effect model can help capture this source of variability.
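Because the Poisson model uses a log link by default, the county slopes are changes in the log of the expected count per year. Exponentiating a slope gives a multiplicative change, which is often easier to describe. The sketch below uses made-up slope values (example_slope and the county names are hypothetical stand-ins for county_slope$slope, not results from the hate data).

```r
# Hypothetical log-scale slopes, standing in for county_slope$slope
example_slope <- c(CountyA = -0.10, CountyB = 0.00, CountyC = 0.05)

# Multiplicative change in the expected count for each one-year increase
rate_ratio <- exp(example_slope)

# The same quantity expressed as a percent change per year
percent_change <- 100 * (rate_ratio - 1)

print(round(rate_ratio, 3))
print(round(percent_change, 1))
```

A rate ratio below 1 corresponds to a decreasing trend and a ratio above 1 to an increasing trend. The same transformation could be applied to the real slope column if a percent-change axis is preferred for the plot.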