Exploring the data

One of the first things to do with a new dataset is to plot it before any analysis. In this exercise, you will plot the data using ggplot2 and create a publication-quality figure, which helps you see the structure of the data.

The data examine whether two different drugs change the amount of sleep individuals get. The response variable of interest is the amount of extra sleep a patient gets. The predictor variable is the drug group. The ID of each patient allows us to do a repeated-measures analysis: ID is treated as a random-effect intercept, which corresponds to the baseline effect of giving a person a sleeping drug. We do not care how much an individual sleeps in this case, only the change in sleep between the groups. The data.frame sleep contains these variables.

# Load ggplot2 for plotting
library(ggplot2)

# Plot the raw data
ggplot(data = sleep, aes(x = group, y = extra)) +
  geom_point()

# Replace geom_point() with geom_line() because you are plotting results
# from the same individuals.
ggplot(data = sleep, aes(x = group, y = extra, group = ID)) +
  geom_line()

# Add axis labels and a minimal theme for the final figure
ggplot(sleep, aes(x = group, y = extra, group = ID)) +
  geom_line() +
  xlab(label = "Drug") +
  ylab(label = "Extra sleep") +
  theme_minimal()

#####################################################################################
Building models

In this exercise, you will build a simple linear model (lm()) and then build a linear mixed-effects model (lmer()). The purpose of the first step is to make sure the data work well with a simple model, because lm() outputs are easier to debug than lmer() outputs. In the next exercise, you will compare two different methods of statistical inference on the model.

# Build a linear model using lm(). The goal of this step is simply to make sure the
# model builds without errors or warnings. Have extra predicted by fixed-effects
# group (1st) and ID (2nd) from the sleep data.
lm(extra ~ group + ID, data = sleep)

# lmer() comes from the lme4 package; loading lmerTest as well makes anova() and
# summary() report p-values for lmer() fits.
library(lme4)
library(lmerTest)

# Build a lmer() model with extra predicted by the fixed-effect group and
# random-effect intercept ID using the sleep data. Save the output as lmer_out.
lmer_out <- lmer(extra ~ group + (1 | ID), data = sleep)

#####################################################################################
Comparing regressions and ANOVAs

In the previous exercise, you built a regression model. Two approaches to statistical inference are examining the amount of variance explained by terms in the model (an ANOVA-like analysis) and using linear predictor variables to model the data (a regression analysis framework). The choice between them largely depends upon personal preference and statistical training. Both approaches may be carried out using frequentist or Bayesian methods. Although this course only uses frequentist methods, the same ideas apply to Bayesian models.

First, you will run an anova() on the model to see if group explains a significant amount of variability. Second, you will examine the regression coefficient for group to see if it differs significantly from zero.

# Run an anova() on lmer_out
anova(lmer_out)

# Look at the summary() of lmer_out to see the regression coefficient for group.
summary(lmer_out)

Notice how both approaches find a statistically significant effect, but do so using different tests. In this case, both produce identical p-values. Personally, I prefer regression inferences to ANOVA inferences because the output includes the estimated difference by default. The ANOVA tells us which variables explain a significant amount of variability, but it does not tell us how things differ. In contrast, the regression tells us how much a one-unit change in an input would be expected to change the model's output.
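To see why the two tests agree here, the following is a minimal sketch using base R only. It assumes the fixed-effects lm() fit from the earlier exercise as a stand-in for the mixed model, so it is not the lmer() analysis itself. For a term with a single degree of freedom in a balanced design such as sleep, the ANOVA F statistic equals the square of the regression t statistic, so the p-values match.

```r
# Stand-in model: ID as a fixed effect instead of a random intercept
# (an assumption made so the example runs with base R alone).
fit <- lm(extra ~ group + ID, data = sleep)

# Regression view: t-test on the group coefficient
coef_tab <- summary(fit)$coefficients
t_group <- coef_tab["group2", "t value"]
p_regression <- coef_tab["group2", "Pr(>|t|)"]

# ANOVA view: F-test on the group term
aov_tab <- anova(fit)
f_group <- aov_tab["group", "F value"]
p_anova <- aov_tab["group", "Pr(>F)"]

# With one numerator degree of freedom, F equals t squared,
# so the two p-values are the same.
all.equal(f_group, t_group^2)
all.equal(p_regression, p_anova)
```

Both all.equal() calls should return TRUE. The regression table additionally reports the size of the estimated difference, which the ANOVA table does not.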
#####################################################################################
Plotting results

In the previous exercises, you examined the raw data, used the data to build a model, and applied the model for statistical inference. You found that drug 2 increased the amount of extra sleep compared to drug 1. In this exercise, you will plot the results to see how much drug 2 increased extra sleep.

First, wrangle the data using the pivot_wider() function from the tidyr package. Then, calculate the difference in extra sleep for each individual. Last, plot this difference as a histogram.

# Load the tidyr package
library(tidyr)

# Make the data wider: one row per ID, one column per drug group
sleep_wide <- pivot_wider(sleep, names_from = group, values_from = extra)

# Calculate the difference (drug 2 minus drug 1) for each individual
sleep_wide$diff <- sleep_wide$`2` - sleep_wide$`1`

# Plot the differences as a histogram
ggplot(sleep_wide, aes(x = diff)) +
  geom_histogram() +
  ylab(label = "Count") +
  xlab(label = "Extra sleep from drug 2") +
  theme_bw()
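As a numeric companion to the histogram, here is a small sketch in base R that computes the same per-individual differences without tidyr and summarizes them with their mean. The variable names (sleep_sorted, drug1, drug2, sleep_diff) are illustrative only. It also checks that the mean difference matches the group coefficient from the fixed-effects lm() fit, which is assumed here as a simple stand-in for the mixed model.

```r
# Sort by ID so the two drug vectors line up person by person
sleep_sorted <- sleep[order(sleep$ID), ]
drug1 <- sleep_sorted$extra[sleep_sorted$group == "1"]
drug2 <- sleep_sorted$extra[sleep_sorted$group == "2"]

# Difference in extra sleep (drug 2 minus drug 1) for each individual
sleep_diff <- drug2 - drug1

# Average increase in extra sleep under drug 2
mean_diff <- mean(sleep_diff)

# The same quantity appears as the group coefficient in the linear model
group_coef <- coef(lm(extra ~ group + ID, data = sleep))[["group2"]]
all.equal(mean_diff, group_coef)
```

The mean of the differences is 1.58, which is the center of mass of the histogram and the estimated effect of drug 2 relative to drug 1.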