EDA for vocabulary score vs. social class

Before we conduct inference, we should take a look at the distributions of vocabulary scores across the levels of (self-identified) social class.

* Using gss, plot wordsum as a histogram with an appropriate binwidth.
* Facet this histogram, wrapping by social class level.

> glimpse(gss)
Rows: 795
Columns: 2
$ wordsum 6, 9, 6, 5, 6, 6, 8, 10, 8, 9, 7, 10, 3, 8, 3, 2, 6, 1, 9, ...
$ class   "MIDDLE", "WORKING", "WORKING", "WORKING", "WORKING", "WORK...

# Packages used in these notes
library(dplyr)    # glimpse(), group_by(), summarize(), %>%
library(ggplot2)  # ggplot(), geom_histogram(), facet_wrap()
library(broom)    # tidy()

# Using gss, plot wordsum
ggplot(gss, mapping = aes(x = wordsum)) +
  # Add a histogram layer
  geom_histogram(binwidth = 1) +
  # Facet by class
  facet_wrap(~class)

-------------------------------------------------------------------------------------------------------------------------------------------

ANOVA for vocabulary score vs. (self-identified) social class

Let's conduct the ANOVA to evaluate whether average vocabulary scores differ between the levels of (self-identified) social class.

# Run an analysis of variance on wordsum vs. class
aov_wordsum_class <- aov(wordsum ~ class, data = gss)

# Tidy the model
tidy(aov_wordsum_class)

-------------------------------------------------------------------------------------------------------------------------------------------

Checking the constant variance condition

In addition to checking the normality of the distributions of vocabulary scores within each level of social class, we need to check that the variability of the scores is roughly constant across the levels.

gss %>%
  # Group by class
  group_by(class) %>%
  # Calculate the std dev of wordsum as std_dev_wordsum
  summarize(std_dev_wordsum = sd(wordsum))

# A tibble: 4 x 2
  class   std_dev_wordsum
1 LOWER              2.24
2 MIDDLE             1.89
3 UPPER              2.34
4 WORKING            1.87

-------------------------------------------------------------------------------------------------------------------------------------------

Compare pairwise means

Compare the mean vocabulary scores for all pairings of social classes using the pairwise.t.test() function.
* Conduct a pairwise t-test on vocabulary scores and social class. Set p.adjust.method to "none" (we'll adjust the significance level, not the p-value).

# Run a pairwise t-test on wordsum and class, without adjustment
t_test_results <- pairwise.t.test(gss$wordsum, gss$class, p.adjust.method = "none")

# Tidy the result
tidy(t_test_results)
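
The note about adjusting the significance level rather than the p-value can be made concrete with a small sketch. It assumes a Bonferroni-style correction (the notes do not name the method) and an overall level of 0.05. Because gss is not reproduced here, the data frame sim is simulated stand-in data and its values are purely illustrative.

```r
# Illustrative only: simulated scores stand in for gss (hypothetical data)
set.seed(42)
class_levels <- c("LOWER", "MIDDLE", "UPPER", "WORKING")
sim <- data.frame(
  class   = rep(class_levels, each = 50),
  wordsum = c(rnorm(50, 5, 2), rnorm(50, 7, 2), rnorm(50, 6, 2), rnorm(50, 6, 2))
)

# Number of pairwise comparisons among K groups: K * (K - 1) / 2
K <- length(class_levels)
n_comparisons <- choose(K, 2)  # 6 when K = 4

# Bonferroni-corrected significance level (0.05 is an assumed overall level)
alpha <- 0.05
alpha_star <- alpha / n_comparisons

# Unadjusted pairwise p-values, as in the exercise above
res <- pairwise.t.test(sim$wordsum, sim$class, p.adjust.method = "none")

# TRUE where a pair differs at the corrected level; NA marks unused cells
significant <- res$p.value < alpha_star
print(alpha_star)
print(significant)
```

With four class levels there are six pairings, so each unadjusted p-value is compared against 0.05 / 6 instead of 0.05. Comparing raw p-values to a smaller threshold leads to the same decisions as comparing Bonferroni-adjusted p-values to the original threshold.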