Exploring the data
One of the first things to do with a new dataset is to plot it before analysis. 
During this exercise, you will plot the data using ggplot2 and create a 
publication-quality figure. Plotting first lets you see patterns, outliers, and 
possible data problems before you fit any model.

These data examine whether two different drugs change the amount of sleep patients get. 
The response variable of interest is extra, the increase in hours of sleep a patient 
gets compared with a control. The predictor variable is group, the drug given. Because 
each patient (ID) received both drugs, ID allows us to do a repeated-measures analysis. 
We model ID as a random-effect intercept, which corresponds to each person's baseline 
response to being given a sleeping drug. We do not care how much an individual sleeps 
in this case, only how sleep changes between the drug groups.

The data frame sleep, which ships with R, contains these variables (extra, group, and ID).
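Before plotting, a quick base-R check can confirm the repeated-measures structure: every patient should appear exactly once under each drug. The sketch below also computes each patient's average extra sleep as a rough view of the between-patient baseline differences that the random intercept for ID is meant to absorb. It assumes only the built-in sleep data; id_by_group and patient_means are hypothetical names used for illustration.

```r
# sleep ships with R's datasets package, so no extra packages are needed
data(sleep)

# Columns: extra (numeric), group (factor: drug 1 or 2), ID (factor: patient)
str(sleep)

# Repeated measures: each patient should appear exactly once per drug
id_by_group <- table(sleep$ID, sleep$group)
print(id_by_group)

# Hypothetical helper object: each patient's mean extra sleep across both drugs,
# a rough look at the baseline differences between patients
patient_means <- aggregate(extra ~ ID, data = sleep, FUN = mean)
print(patient_means)
```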

# Load ggplot2 for plotting
library(ggplot2)

# Plot the raw data
ggplot(data = sleep, aes(x = group, y = extra)) +
  geom_point()

# Replace geom_point() with geom_line() and map group = ID, because each line
# connects the two results from the same individual.
ggplot(data = sleep, aes(x = group, y = extra, group = ID)) +
  geom_line()

# Add axis labels and a minimal theme for a publication-quality figure
ggplot(sleep, aes(x = group, y = extra, group = ID)) +
  geom_line() +
  xlab(label = "Drug") +
  ylab(label = "Extra sleep (hours)") +
  theme_minimal()

#####################################################################################


Building models
In this exercise, you will build a simple linear model (lm()) and then a linear 
mixed-effects model (lmer()). The purpose of the first step is to make sure the data 
work with a simple model, because lm() output is easier to debug than lmer() 
output. During the next exercise, you will compare two different methods of 
statistical inference on the model.

# Build a linear model using lm(). The goal of this step is simply to make sure the
# model builds without errors or warnings. Have extra predicted by the fixed effects
# group (1st) and ID (2nd) from the sleep data.
lm(extra ~ group + ID, data = sleep)

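# Load lmerTest, which attaches lme4 and adds p-values (Satterthwaite's method by
# default) to the anova() and summary() output used in the next exercise
library(lmerTest)

# Build an lmer() model with extra predicted by the fixed effect group and a
# random-effect intercept for ID using the sleep data. Save the output as lmer_out.
lmer_out <- lmer(extra ~ group + (1 | ID), data = sleep)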
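The lm() step only needs to fit cleanly. One way to make "no errors or warnings" explicit, rather than scanning the console, is to capture warnings while fitting and then confirm that no coefficient came back NA. This is a minimal base-R sketch of that idea, not part of the course code; lm_out and lm_warnings are hypothetical names.

```r
# sleep is built into R; lm(), coef(), and df.residual() come from stats
data(sleep)

# Collect any warnings raised while fitting instead of letting them scroll past
lm_warnings <- character(0)
lm_out <- withCallingHandlers(
  lm(extra ~ group + ID, data = sleep),
  warning = function(w) {
    lm_warnings <<- c(lm_warnings, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

# Signs of a clean fit: no warnings and no NA (aliased) coefficients
print(length(lm_warnings) == 0)
print(!anyNA(coef(lm_out)))

# 20 observations minus 11 coefficients (intercept, group2, ID2 to ID10) = 9
print(df.residual(lm_out))
```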


#####################################################################################


Comparing regressions and ANOVAs
In the previous exercise, you built a mixed-effects regression model. Two common methods 
for statistical inference are examining the amount of variance explained by the predictors 
in the model (an ANOVA-like analysis) and estimating how the predictor variables change 
the response (a regression analysis framework). The choice between these approaches 
largely depends on personal preference and statistical training. Both approaches can be 
carried out using frequentist or Bayesian methods. Although this course uses only 
frequentist methods, the same ideas apply to Bayesian models.

First, you will run anova() on lmer_out to see whether group explains a significant 
amount of variability. Second, you will examine the regression coefficient for group to 
see whether it differs significantly from zero.

# Run an anova() on lmer_out
anova(lmer_out)

# Look at the summary() of lmer_out to see the regression coefficient for group.
summary(lmer_out)

Notice how both approaches find a statistically significant effect of group, but do so 
using different tests: an F-test in the anova() output and a t-test in the summary() 
output. In this case, both produce identical p-values. Personally, I prefer regression 
inferences to ANOVA inferences because the output includes the estimated difference by 
default. The ANOVA tells us which variables explain a significant amount of variability, 
but it does not tell us how the groups differ. Conversely, the regression tells us how 
much one unit of input would be expected to change the model's output; for a two-level 
factor such as group, the coefficient is the estimated difference in extra sleep between 
drug 2 and drug 1.
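The matching p-values are expected here: for a one-degree-of-freedom term like group in this balanced design, the ANOVA F statistic is the square of the regression t statistic, so the two tests are equivalent. The sketch below illustrates this in base R on the fixed-effects lm() model from the Building models exercise rather than on lmer_out, so it runs without lme4 or lmerTest; fit, aov_tab, and coef_tab are hypothetical names used only here.

```r
data(sleep)

# Fixed-effects version of the model: drug group plus a separate intercept per patient
fit <- lm(extra ~ group + ID, data = sleep)

# ANOVA-style inference: F-test for the variability explained by group
aov_tab <- anova(fit)
f_group <- aov_tab["group", "F value"]
p_anova <- aov_tab["group", "Pr(>F)"]

# Regression-style inference: t-test on the group2 coefficient (drug 2 minus drug 1)
coef_tab <- coef(summary(fit))
t_group <- coef_tab["group2", "t value"]
p_regression <- coef_tab["group2", "Pr(>|t|)"]

# With a single two-level predictor the tests agree: F = t^2 and the p-values match
print(c(F = f_group, t_squared = t_group^2))
print(c(anova = p_anova, regression = p_regression))

# Only the regression output reports the size of the effect directly
print(coef_tab["group2", "Estimate"])
```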


#####################################################################################


Plotting results
In the previous exercises, you examined the raw data, used the data to build 
a model, and applied the model for statistical inference. You found that drug 2 
increased the amount of extra sleep compared with drug 1. During this exercise, 
you will plot the results to see how much drug 2 increased extra sleep.

First, wrangle the data using the pivot_wider() function from the tidyr package. 
Then, calculate the difference in extra sleep for each individual. 
Last, plot this difference as a histogram.

# Load the tidyr package
library(tidyr)

# Make the data wider: one row per patient, one column per drug
sleep_wide <- pivot_wider(sleep, names_from = group, values_from = extra)

# Calculate the difference (drug 2 minus drug 1) for each patient
sleep_wide$diff <- sleep_wide$`2` - sleep_wide$`1`

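# Plot the per-patient differences; setting binwidth avoids ggplot2's default 30-bin message
ggplot(sleep_wide, aes(x = diff)) + 
  geom_histogram(binwidth = 0.5) +
  ylab(label = "Count") +
  xlab(label = "Extra sleep from drug 2 vs. drug 1 (hours)") +
  theme_bw()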
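As a cross-check on the histogram, the average per-patient difference should match the group2 coefficient reported by summary(lmer_out), because every patient received both drugs. The base-R sketch below computes the differences without tidyr and runs a one-sample t-test on them, which is equivalent to a paired t-test of drug 2 against drug 1. It assumes the sleep data as shipped with R; drug1, drug2, sleep_diff, and paired are hypothetical names.

```r
data(sleep)

# Split the data by drug and line up the rows by patient ID
drug1 <- sleep[sleep$group == "1", ]
drug2 <- sleep[sleep$group == "2", ]
drug2 <- drug2[match(drug1$ID, drug2$ID), ]

# Per-patient difference: extra sleep on drug 2 minus extra sleep on drug 1
sleep_diff <- drug2$extra - drug1$extra

# Average increase in sleep from drug 2, in hours
print(mean(sleep_diff))

# One-sample t-test on the differences (the same as a paired t-test)
paired <- t.test(sleep_diff)
print(paired$statistic)
print(paired$p.value)
```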