In toxicology studies, organisms are often given a dose of a substance and a binary outcome is recorded, such as dead/alive or inhibited/mobile. This is called a dose-response study. For example, the response to different doses might be mortality (1) or survival (0) at the end of a study.

During this exercise, we will fit a logistic regression using all three methods described in the video. You have been given two datasets:

df_long, in a "long" format with each row corresponding to an observation (i.e., a 0 or 1).
df_short, in an aggregated format with each row corresponding to a treatment (e.g., 6 successes, 4 failures, number of replicates = 10, proportion = 0.6).

When using the "wide" or "short" data frame, the "success, failure" method of supplying data to a logistic regression requires that the successes and failures be given as a matrix. The easiest way to do this is with the cbind() function.

Tip: When working with data in the wild, always check what 0 and 1 correspond to. Different people use different notation, and a wrong assumption can cause problems for you!

# Using the data df_long, fit a glm() with the "binomial" distribution family (or, synonymously, binomial error term)
# where mortality is predicted by dose.
fit_long <- glm(mortality ~ dose, data = df_long, family = "binomial")
summary(fit_long)

# Using the data df_short, fit a glm() with the "binomial" distribution family where the matrix cbind(mortality, survival)
# is predicted by dose.
fit_short <- glm(cbind(mortality, survival) ~ dose, data = df_short, family = "binomial")
summary(fit_short)

# Using the data df_short, fit a glm() with the "binomial" distribution family where mortalityP is predicted
# by dose with weights of nReps.
fit_short_p <- glm(mortalityP ~ dose, data = df_short, weights = nReps, family = "binomial")
summary(fit_short_p)

All three methods produced outputs with the same coefficient estimates, but differed with respect to the degrees of freedom reported.
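The agreement between the three input methods can be checked directly. The following is a minimal sketch, not the course data: the doses and death counts are hypothetical values invented for illustration, and the long data frame is rebuilt from the aggregated one so that both describe the same observations. The column names follow the ones used in the exercise.

```r
# Hypothetical aggregated data (invented for illustration): 10 organisms per dose.
df_short <- data.frame(
  dose      = c(0, 2, 4, 6, 8),
  mortality = c(1, 3, 5, 8, 9),
  nReps     = 10
)
df_short$survival   <- df_short$nReps - df_short$mortality
df_short$mortalityP <- df_short$mortality / df_short$nReps

# Expand to long format: one row per organism, 1 = died, 0 = survived.
df_long <- data.frame(
  dose = rep(df_short$dose, times = df_short$nReps),
  mortality = unlist(Map(
    function(died, lived) rep(c(1, 0), times = c(died, lived)),
    df_short$mortality, df_short$survival
  ))
)

# The three ways of supplying the same data to glm().
fit_long    <- glm(mortality ~ dose, data = df_long, family = "binomial")
fit_short   <- glm(cbind(mortality, survival) ~ dose, data = df_short,
                   family = "binomial")
fit_short_p <- glm(mortalityP ~ dose, data = df_short, weights = nReps,
                   family = "binomial")

# Coefficients agree across rows (up to numerical tolerance).
coefs <- rbind(long    = coef(fit_long),
               short   = coef(fit_short),
               short_p = coef(fit_short_p))
print(coefs)

# Residual degrees of freedom: rows of data minus 2 estimated coefficients.
dfs <- c(long    = df.residual(fit_long),
         short   = df.residual(fit_short),
         short_p = df.residual(fit_short_p))
print(dfs)  # long = 48, short = 3, short_p = 3
```

The long fit counts 50 rows and the aggregated fits count 5, which is the only reason the degrees of freedom differ.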
As a tip, use the input method that best matches your data and requires the least amount of data wrangling. The difference in degrees of freedom does not change the models. If you use model selection, make sure your data is in the same format (long vs. short) for all models you compare, because likelihood-based criteria such as AIC are not comparable across the two formats.

############################################################################################################################

Poisson Regression

A Poisson regression is another type of GLM. It requires integer or count data (i.e., 0, 1, 2, 3, ...). In some situations, a Poisson regression can be more powerful (e.g., at detecting statistically significant trends) than a linear model or "Gaussian" regression.

During this exercise, we're going to build a linear regression using the lm() function and a Poisson regression using glm(). The objects x and y are loaded into R for you.

x: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
y: 0 1 0 1 0 1 0 1 0 2 1 2 0 1 1 0 1 5 1 1

# Fit the linear model
summary(lm(y ~ x))

# Fit the generalized linear model
summary(glm(y ~ x, family = "poisson"))

############################################################################################################################

Plotting GLMs

Often, we want to "look" at our data and the trends in it. ggplot2 allows us to add trend lines to our plots. For small datasets, the default lines are created using a technique called local regression (loess). However, we can specify that different models are used to create the lines, including GLMs.

# Plot the data using jittered points and the default stat_smooth
ggplot(data = df_long, aes(x = dose, y = mortality)) +
  geom_jitter(height = 0.05, width = 0.1) +
  stat_smooth(fill = 'pink', color = 'red')

# This time, specify the method to be "glm" and family to be "binomial" to fit a logistic regression.
ggplot(data = df_long, aes(x = dose, y = mortality)) +
  geom_jitter(height = 0.05, width = 0.1) +
  stat_smooth(method = "glm", method.args = list(family = "binomial"))
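
The logistic trend line can also be computed by hand, which helps show what the "glm" smoother is doing. The sketch below is a rough illustration under stated assumptions: the data are hypothetical values invented for this example (not the course's df_long), and curve_df and by_hand are names introduced here. It fits the model with glm(), predicts the probability of mortality over a grid of doses, and confirms that the prediction is the inverse logit of the linear predictor.

```r
# Hypothetical long-format data (invented for illustration): 10 organisms per dose.
df_long <- data.frame(
  dose = rep(c(0, 2, 4, 6, 8), each = 10),
  # For each dose: deaths (1) followed by survivors (0).
  mortality = rep(rep(c(1, 0), 5), times = c(1, 9, 3, 7, 5, 5, 8, 2, 9, 1))
)

# Fit the logistic regression.
fit <- glm(mortality ~ dose, data = df_long, family = "binomial")

# Predict the probability of mortality on a fine grid of doses.
curve_df <- data.frame(
  dose = seq(min(df_long$dose), max(df_long$dose), length.out = 100)
)
curve_df$fit <- predict(fit, newdata = curve_df, type = "response")

# The same curve by hand: inverse logit of intercept + slope * dose.
by_hand <- plogis(coef(fit)[1] + coef(fit)[2] * curve_df$dose)
stopifnot(isTRUE(all.equal(unname(curve_df$fit), unname(by_hand))))

head(curve_df)
```

If ggplot2 is loaded, curve_df could be drawn on top of the jittered points with geom_line(data = curve_df, aes(x = dose, y = fit)).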