Build Text Speed Model

Let's practice defining models. Remember to name your latent variables with a name that is not in your current dataset; manifest variables, however, must be column names from your dataset.

Use the HolzingerSwineford1939 dataset to create a new model of text speed with the variables x4, x5, and x6, which measure reading comprehension and word meaning, and x7, x8, and x9, which measure speeded counting and addition. The model will have one latent variable that predicts scores on these six manifest variables.

* Name your model text.model.
* Name your latent variable textspeed.
* Use variables x4, x5, x6, x7, x8, and x9 as the manifest variables.

# Load the lavaan library
library(lavaan)

# Look at the dataset
data(HolzingerSwineford1939)
head(HolzingerSwineford1939[ , 7:15])

# Define your model specification
text.model <- 'textspeed =~ x4 + x5 + x6 + x7 + x8 + x9'

You have defined your first model, which has one latent variable and six manifest variables.

------------------------------------------------------------------------------------------------------------------------

Build Political Democracy Model

You can now expand your model specification skills to a new dataset. Create a model of political democracy ratings from 1960 using the PoliticalDemocracy dataset, which includes ratings of politics in developing countries from the 1960s. Variables y1, y2, y3, and y4 measure freedom of the press, freedom of political opposition, election fairness, and effectiveness of the legislature. You should create a model with one latent variable, named poldemo60, and four manifest variables.

* Name your model politics.model.
* Name your latent variable poldemo60.
* Use variables y1, y2, y3, and y4 as the manifest variables.
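Before fitting either model, you can check that a specification string parses the way you intend. lavaan's lavaanify() function expands a model string into the internal parameter table, one row per parameter; a minimal sketch using the text speed specification from the first exercise:

```r
# Load the lavaan library
library(lavaan)

# The text speed specification from the first exercise
text.model <- 'textspeed =~ x4 + x5 + x6 + x7 + x8 + x9'

# Expand the model string into lavaan's parameter table;
# each loading appears as a row with "=~" in the op column
lavaanify(text.model)
```

This is a quick way to catch typos in variable names before you ever fit the model.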
# Load the lavaan library
library(lavaan)

# Look at the dataset
data(PoliticalDemocracy)
head(PoliticalDemocracy)

# Define your model specification
politics.model <- 'poldemo60 =~ y1 + y2 + y3 + y4'

You have set up a one-factor model of political democracy, measured by press freedom, political opposition, election fairness, and legislative effectiveness.

------------------------------------------------------------------------------------------------------------------------

Analyze Text Speed Model

Let's analyze your text speed model from the first lesson. This model included one latent variable, textspeed, represented by six manifest variables from the HolzingerSwineford1939 dataset: x4, x5, and x6 measured reading comprehension, and x7, x8, and x9 measured speeded counting and addition.

We will use the cfa() function to analyze text.model using the data from HolzingerSwineford1939. The summary should indicate the model was identified with 9 degrees of freedom. Examine the latent variable estimates to determine which items measure the latent variable well (high loadings) and which do not (low loadings).

* Use the cfa() function to fit a model called text.fit. Remember to include both model and data arguments!
* Use the summary() function to view the model fit.
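One detail worth knowing before reading the output: by default, cfa() identifies the latent variable by fixing its first loading to 1, which is why the first indicator will show an estimate of exactly 1.000 with no standard error. Passing std.lv = TRUE instead fixes the latent variance to 1 so that every loading is freely estimated; a sketch, assuming the same text speed model:

```r
# Load the lavaan library
library(lavaan)
data(HolzingerSwineford1939)
text.model <- 'textspeed =~ x4 + x5 + x6 + x7 + x8 + x9'

# Identify the model by fixing the latent variance to 1
# instead of the first loading; overall model fit is unchanged
fit.std <- cfa(model = text.model, data = HolzingerSwineford1939,
               std.lv = TRUE)
summary(fit.std)
```

The two identification choices yield the same fit statistics; only the scaling of the loadings and the latent variance differs.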
# Load the lavaan library
library(lavaan)

# Load the dataset and define the model
data(HolzingerSwineford1939)
text.model <- 'textspeed =~ x4 + x5 + x6 + x7 + x8 + x9'

# Analyze the model with cfa()
text.fit <- cfa(model = text.model, data = HolzingerSwineford1939)

# Summarize the model
summary(text.fit)

lavaan 0.6-11 ended normally after 20 iterations

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                        12
  Number of observations                           301

Model Test User Model:

  Test statistic                               149.786
  Degrees of freedom                                 9
  P-value (Chi-square)                           0.000

Parameter Estimates:

  Standard errors                             Standard
  Information                                 Expected
  Information saturated (h1) model          Structured

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)
  textspeed =~
    x4                1.000
    x5                1.130    0.067   16.946    0.000
    x6                0.925    0.056   16.424    0.000
    x7                0.196    0.067    2.918    0.004
    x8                0.186    0.062    2.984    0.003
    x9                0.279    0.062    4.539    0.000

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)
   .x4                0.383    0.048    7.903    0.000
   .x5                0.424    0.059    7.251    0.000
   .x6                0.368    0.044    8.419    0.000
   .x7                1.146    0.094   12.217    0.000
   .x8                0.988    0.081   12.215    0.000
   .x9                0.940    0.077   12.142    0.000
    textspeed         0.968    0.112    8.647    0.000

------------------------------------------------------------------------------------------------------------------------

Examine Standardized Loadings

You created and summarized the text speed model in previous steps using the HolzingerSwineford1939 dataset, and viewed the model's coefficients with the summary() function. However, the unstandardized coefficients in the Estimate column are often hard to interpret as measures of how well each item represents the latent variable.

In this exercise, add the standardized = TRUE argument to your summary() call to view the standardized loadings. Look at the Std.all column, the completely standardized solution, to see which variables have a poor relationship to the text speed latent variable.

* Use the summary() function on your text.fit model.
* Include the argument to view the standardized loadings.
* Do not include fit.measures arguments in this exercise.

# Load the lavaan library
library(lavaan)

# Load the data and define the model
data(HolzingerSwineford1939)
text.model <- 'textspeed =~ x4 + x5 + x6 + x7 + x8 + x9'

# Analyze the model with cfa()
text.fit <- cfa(model = text.model, data = HolzingerSwineford1939)

# Summarize the model with standardized loadings
summary(text.fit, standardized = TRUE)

lavaan 0.6-11 ended normally after 20 iterations

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                        12
  Number of observations                           301

Model Test User Model:

  Test statistic                               149.786
  Degrees of freedom                                 9
  P-value (Chi-square)                           0.000

Parameter Estimates:

  Standard errors                             Standard
  Information                                 Expected
  Information saturated (h1) model          Structured

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  textspeed =~
    x4                1.000                               0.984    0.846
    x5                1.130    0.067   16.946    0.000    1.112    0.863
    x6                0.925    0.056   16.424    0.000    0.910    0.832
    x7                0.196    0.067    2.918    0.004    0.193    0.177
    x8                0.186    0.062    2.984    0.003    0.183    0.181
    x9                0.279    0.062    4.539    0.000    0.275    0.273

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .x4                0.383    0.048    7.903    0.000    0.383    0.284
   .x5                0.424    0.059    7.251    0.000    0.424    0.256
   .x6                0.368    0.044    8.419    0.000    0.368    0.308
   .x7                1.146    0.094   12.217    0.000    1.146    0.969
   .x8                0.988    0.081   12.215    0.000    0.988    0.967
   .x9                0.940    0.077   12.142    0.000    0.940    0.926
    textspeed         0.968    0.112    8.647    0.000    1.000    1.000

Looking at Std.all, we can tell that variables x7, x8, and x9 do not measure text speed very well: their standardized loadings are close to zero.

------------------------------------------------------------------------------------------------------------------------

Explore Fit Indices

After reviewing the standardized loadings in the previous exercise, we found that several of the manifest variables may not represent our latent variable well. As a second check on the model, you can examine the fit indices to see whether the model appropriately fits the data.
You can look at both the goodness of fit and badness of fit statistics using the fit.measures argument within the summary() function. Remember that goodness of fit statistics, like the CFI and TLI, should be large (over .90) and close to one, while badness of fit measures, like the RMSEA and SRMR, should be small (less than .10) and close to zero.

* Use the summary() function on your text.fit model.
* Include the argument to view the fit indices.
* Do not include the standardized loadings.

# Load the lavaan library
library(lavaan)

# Load the data and define the model
data(HolzingerSwineford1939)
text.model <- 'textspeed =~ x4 + x5 + x6 + x7 + x8 + x9'

# Analyze the model with cfa()
text.fit <- cfa(model = text.model, data = HolzingerSwineford1939)

# Summarize the model with fit indices
summary(text.fit, fit.measures = TRUE)

lavaan 0.6-11 ended normally after 20 iterations

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                        12
  Number of observations                           301

Model Test User Model:

  Test statistic                               149.786
  Degrees of freedom                                 9
  P-value (Chi-square)                           0.000

Model Test Baseline Model:

  Test statistic                               681.336
  Degrees of freedom                                15
  P-value                                        0.000

User Model versus Baseline Model:

  Comparative Fit Index (CFI)                    0.789
  Tucker-Lewis Index (TLI)                       0.648

Loglikelihood and Information Criteria:

  Loglikelihood user model (H0)              -2476.130
  Loglikelihood unrestricted model (H1)      -2401.237

  Akaike (AIC)                                4976.261
  Bayesian (BIC)                              5020.746
  Sample-size adjusted Bayesian (BIC)         4982.689

Root Mean Square Error of Approximation:

  RMSEA                                          0.228
  90 Percent confidence interval - lower         0.197
  90 Percent confidence interval - upper         0.261
  P-value RMSEA <= 0.05                          0.000

Standardized Root Mean Square Residual:

  SRMR                                           0.148

Parameter Estimates:

  Standard errors                             Standard
  Information                                 Expected
  Information saturated (h1) model          Structured

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)
  textspeed =~
    x4                1.000
    x5                1.130    0.067   16.946    0.000
    x6                0.925    0.056   16.424    0.000
    x7                0.196    0.067    2.918    0.004
    x8                0.186    0.062    2.984    0.003
    x9                0.279    0.062    4.539    0.000
Variances:
                   Estimate  Std.Err  z-value  P(>|z|)
   .x4                0.383    0.048    7.903    0.000
   .x5                0.424    0.059    7.251    0.000
   .x6                0.368    0.044    8.419    0.000
   .x7                1.146    0.094   12.217    0.000
   .x8                0.988    0.081   12.215    0.000
   .x9                0.940    0.077   12.142    0.000
    textspeed         0.968    0.112    8.647    0.000

We can see that our fit indices are poor, with low CFI and TLI and high RMSEA and SRMR values.

------------------------------------------------------------------------------------------------------------------------

Examine Political Democracy

For this final exercise, you will put together all the steps you've completed so far in building a one-factor model. You will examine the standardized loadings and fit indices for the political democracy model, which was analyzed with the cfa() function. You will now use the summary() function with both the standardized and fit.measures arguments to view everything together.

In the Std.all column for the loadings, you should find that the items appear to measure political democracy well, with high values close to one. When you look at the fit indices, however, you should find a mix of good and bad values. Such results are common and indicate that the model may fit the data but still has room for improvement.

* Use the summary() function on your politics.fit model.
* Include the arguments to view both standardized loadings and fit indices.
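If you only want the headline indices rather than the full summary, lavaan's fitMeasures() function returns them as a named vector. A sketch using the text speed model from the earlier exercises:

```r
# Load the lavaan library
library(lavaan)
data(HolzingerSwineford1939)
text.model <- 'textspeed =~ x4 + x5 + x6 + x7 + x8 + x9'
text.fit <- cfa(model = text.model, data = HolzingerSwineford1939)

# Extract only the four indices discussed in these exercises
fitMeasures(text.fit, c("cfi", "tli", "rmsea", "srmr"))
```

Calling fitMeasures() with no second argument returns the full set of available indices.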
# Load the lavaan library
library(lavaan)

# Load the data and define the model
data(PoliticalDemocracy)
politics.model <- 'poldemo60 =~ y1 + y2 + y3 + y4'

# Analyze the model with cfa()
politics.fit <- cfa(model = politics.model, data = PoliticalDemocracy)

# Summarize the model with standardized loadings and fit indices
summary(politics.fit, standardized = TRUE, fit.measures = TRUE)

lavaan 0.6-11 ended normally after 26 iterations

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                         8
  Number of observations                            75

Model Test User Model:

  Test statistic                                10.006
  Degrees of freedom                                 2
  P-value (Chi-square)                           0.007

Model Test Baseline Model:

  Test statistic                               159.183
  Degrees of freedom                                 6
  P-value                                        0.000

User Model versus Baseline Model:

  Comparative Fit Index (CFI)                    0.948
  Tucker-Lewis Index (TLI)                       0.843

Loglikelihood and Information Criteria:

  Loglikelihood user model (H0)               -704.138
  Loglikelihood unrestricted model (H1)       -699.135

  Akaike (AIC)                                1424.275
  Bayesian (BIC)                              1442.815
  Sample-size adjusted Bayesian (BIC)         1417.601

Root Mean Square Error of Approximation:

  RMSEA                                          0.231
  90 Percent confidence interval - lower         0.103
  90 Percent confidence interval - upper         0.382
  P-value RMSEA <= 0.05                          0.014

Standardized Root Mean Square Residual:

  SRMR                                           0.046

Parameter Estimates:

  Standard errors                             Standard
  Information                                 Expected
  Information saturated (h1) model          Structured

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  poldemo60 =~
    y1                1.000                               2.133    0.819
    y2                1.404    0.197    7.119    0.000    2.993    0.763
    y3                1.089    0.167    6.529    0.000    2.322    0.712
    y4                1.370    0.167    8.228    0.000    2.922    0.878

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .y1                2.239    0.512    4.371    0.000    2.239    0.330
   .y2                6.412    1.293    4.960    0.000    6.412    0.417
   .y3                5.229    0.990    5.281    0.000    5.229    0.492
   .y4                2.530    0.765    3.306    0.001    2.530    0.229
    poldemo60         4.548    1.106    4.112    0.000    1.000    1.000

Our standardized loadings indicate the items measure the latent variable well, but the fit indices are a mix of good values (high CFI, low SRMR) and bad values (low TLI, high RMSEA).
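If you prefer the standardized estimates as a data frame (convenient for filtering or plotting) rather than as extra columns in the summary() output, lavaan's standardizedSolution() function returns them directly; its est.std column corresponds to Std.all. A sketch with the political democracy fit:

```r
# Load the lavaan library
library(lavaan)
data(PoliticalDemocracy)
politics.model <- 'poldemo60 =~ y1 + y2 + y3 + y4'
politics.fit <- cfa(model = politics.model, data = PoliticalDemocracy)

# est.std holds the completely standardized estimates
# (the Std.all column from summary())
standardizedSolution(politics.fit)
```

Because the result is an ordinary data frame, you can, for example, subset it to the loading rows with `subset(standardizedSolution(politics.fit), op == "=~")`.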