Visualizing Maryland crime data
Before fitting a model, plotting the data can be helpful to see if trends or data points jump out, outliers exist, 
or other attributes of the data require future consideration. Using ggplot2, you can plot lines for county and examine 
how crimes change through time. For this exercise, examine Maryland crime data (md_crime). This includes the Year,
a count of violent Crimes in the county, and the County's name.

To explore this data, first plot the data points for each county through time. This lets you see how each county changes through time. 
Rather than using an aesthetic such as color, group is used here because there are too many counties to easily distinguish colors.
After plotting the raw data, add trend lines for each county.

Both the connect points (geom_line) and trend lines (geom_smooth) provide insight into what, if any, kinds of random effects are required. 
If all of the points appear to have similar ranges and means, a random-effect intercept may not be important. Likewise, if trends look 
consistent across counties (i.e., the trend lines look similar or parallel across groups), a random-effect slope may not be required.

#Plot how Crime (y variable) has changed across Year (x variable) in each County (group variable) using the md_crime data.
plot1 <- ggplot(data = md_crime, aes(x = Year, y = Crime, group = County)) +
    geom_line() + theme_minimal() + ylab("Major crimes reported per county")
print(plot1)

# Add the trend line for each county
plot1 + geom_smooth(method = "lm", se = FALSE)


#############################################################################################################################################


Rescaling slopes
The last plot showed changes in crime rate varied by county. This shows you that you should include Year as both a random- and fixed-effect 
in your model. Including Year this way will estimate a global slope across all counties as well as slope for each county. The fixed-effect 
slope estimates the change in major crimes across all Maryland counties. The random-effect slope estimates model for that counties have
different changes in crime.

But, fitting this model produces a warning message! To address this warning, change Year from starting at 2006 to starting at 0. We provide 
you with this new variable, Year2 (e.g., 2006 in Year is 0 in Year2). Sometimes when fitting regression, you need to scale or center the
intercept to start at 0. This improves numerical stability of the model.

# Fit the model with Year as both a fixed and random-effect
lmer(Crime ~ Year + (1 + Year | County) , data = md_crime)

# Fit the model with Year2 rather than Year
lmer(Crime ~ Year2 + (1 + Year2 | County) , data = md_crime)

Notice how you had to rescale year so that the model could be fit by R. This is a common regression modeling approach to improve model fits. 
Some people will rescale the slope so the middle value is zero instead. Either approach is acceptable. The benefit of starting at zero is that 
it can sometimes create a regression coefficient that is easier to explain and understand. In the next exercise, you will examine the second model.


#############################################################################################################################################


Null hypothesis testing
lmerTest uses the same lmer() syntax as the lme4 package, but includes different outputs. 
During this exercise, you will fit a lmer() model using lmerTest and lme4.

# Load lmerTest
library(lmerTest)

# Fit a lmer use lme4
out_lme4 <- 
	lme4::lmer(Crime ~ Year2 + (1 + Year2 | County), 
              data = md_crime)

# Fit a lmer use lmerTest
out_lmerTest <- 
	lmerTest::lmer(Crime ~ Year2 + (1 + Year2 | County), 
                  data = md_crime)

# Look at the summaries
summary(out_lme4)
summary(out_lmerTest)


#############################################################################################################################################


Model comparison with ANOVA
Comparing models can be difficult. Many methods exist although these are beyond the scope of this course such as model selection (e.g., AIC).
Analysis of Variance (ANOVA) exists as a basic option to compare lmer models. The ANOVA tests to see if one model explains more variability 
than a second model. The ANOVA does this by examining the amount of variability explained by the models.
For example, you can see if Year predicts Crime in Maryland. To do this, build a null model with only County as a random-effect and a year model
that includes Year. You can then compare the two models using the anova() function.
If Year explains a significant amount of variability, then the P-value will be less than your pre-specified threshold (usually 0.05).

# Build the Null model with only County as a random-effect
null_model <- lmer(Crime ~ (1 | County) , data = md_crime)

# Build the Year2 model with Year2 as a fixed and random slope and County as the random-effect
year_model <- lmer(Crime ~ Year2 + (1 + Year2 | County) , data = md_crime)

# Compare null_model and year_model using an anova
anova(null_model, year_model)