Analyze a Latent Heywood Case

A local animal shelter has designed a survey to measure the impact of their Adopt Me program. Viewers rated each dog's picture, background story, and other characteristics to indicate the "adoptableness" of each animal. The adoptsurvey data contains the six items they rated: pictures, background, and loveskids measure a "good story" latent variable, while energy, wagstail, and playful measure an "in person" latent variable. You will build a two-factor model of their survey and examine it for Heywood cases. The lavaan library has been loaded for you.

* The goodstory latent is measured by pictures, background, and loveskids.
* The inperson latent is measured by energy, wagstail, and playful.
* Analyze the two-factor model with the cfa() function.

# Look at the data
head(adoptsurvey)

# Build the model
adopt.model <- 'goodstory =~ pictures + background + loveskids
                inperson =~ energy + wagstail + playful'

# Analyze the model
adopt.fit <- cfa(model = adopt.model, data = adoptsurvey)

Warning message:
lavaan WARNING: covariance matrix of latent variables is not positive definite; use lavInspect(fit, "cov.lv") to investigate.

------------------------------------------------------------------------------------------------------------------------

Fix the Latent Heywood Model

In the last exercise, you found that the adoption survey produced a correlation greater than 1 between the latent variables. You can fix this Heywood case by collapsing the two latent variables into one, then analyze and summarize the merged model to see whether combining the two factors has solved the problematic correlation. (A quick way to confirm the offending correlation first is sketched below.)

* Change the model to create only one goodstory factor that is measured by all six manifest variables in the adoptsurvey dataset.
* Analyze the model with the cfa() function and confirm that it produces no warning messages.
* Run summary() on the fitted model to view the final model fit.
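Before refitting, you can follow the warning's own suggestion and inspect the latent covariance matrix directly. A minimal sketch, assuming adopt.fit is the two-factor fit analyzed above:

# Model-implied covariances of the latent variables
lavInspect(adopt.fit, "cov.lv")

# The same matrix on the correlation scale; an off-diagonal value above 1 confirms the latent Heywood case
lavInspect(adopt.fit, "cor.lv")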
# Look at the data
head(adoptsurvey)

# Edit the original model
adopt.model <- 'goodstory =~ pictures + background + loveskids + energy + wagstail + playful'

# Analyze the model
adopt.fit <- cfa(model = adopt.model, data = adoptsurvey)

# Look for Heywood cases
summary(adopt.fit, standardized = TRUE, fit.measures = TRUE)

lavaan 0.6-11 ended normally after 49 iterations

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                        12
  Number of observations                           100

Model Test User Model:
  Test statistic                                27.071
  Degrees of freedom                                 9
  P-value (Chi-square)                           0.001

Model Test Baseline Model:
  Test statistic                                74.694
  Degrees of freedom                                15
  P-value                                        0.000

User Model versus Baseline Model:
  Comparative Fit Index (CFI)                    0.697
  Tucker-Lewis Index (TLI)                       0.495

Loglikelihood and Information Criteria:
  Loglikelihood user model (H0)              -1672.246
  Loglikelihood unrestricted model (H1)      -1658.711
  Akaike (AIC)                                3368.493
  Bayesian (BIC)                              3399.755
  Sample-size adjusted Bayesian (BIC)         3361.856

Root Mean Square Error of Approximation:
  RMSEA                                          0.142
  90 Percent confidence interval - lower         0.082
  90 Percent confidence interval - upper         0.205
  P-value RMSEA <= 0.05                          0.009

Standardized Root Mean Square Residual:
  SRMR                                           0.086

Parameter Estimates:
  Standard errors                             Standard
  Information                                 Expected
  Information saturated (h1) model          Structured

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  goodstory =~
    pictures          1.000                               1.773    0.562
    background        0.892    0.337    2.650    0.008    1.581    0.387
    loveskids         0.547    0.344    1.587    0.112    0.969    0.208
    energy            1.194    0.372    3.214    0.001    2.118    0.537
    wagstail          1.712    0.517    3.310    0.001    3.035    0.597
    playful           0.773    0.312    2.480    0.013    1.371    0.354

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .pictures          6.824    1.323    5.160    0.000    6.824    0.685
   .background       14.168    2.228    6.359    0.000   14.168    0.850
   .loveskids        20.736    3.009    6.891    0.000   20.736    0.957
   .energy           11.051    2.049    5.394    0.000   11.051    0.711
   .wagstail         16.661    3.486    4.779    0.000   16.661    0.644
   .playful          13.128    2.021    6.496    0.000   13.128    0.875
    goodstory         3.143    1.369    2.296    0.022    1.000    1.000

------------------------------------------------------------------------------------------------------------------------

Analyze a Manifest Heywood Case

After reporting your findings, the adoption group recreated their survey to measure two factors: how effective their online story posts were (goodstory) and how much an in-person interaction mattered (inperson). The new data is loaded under adoptsurvey. In this exercise, you will look for a Heywood case on one of the manifest variables, rather than on the latent variable.

* Analyze the updated data adoptsurvey for the two-factor adopt.model with the cfa() function.
* Watch for warnings after the cfa() call has run.
* Use the summary() function to explore which manifest variable is problematic.
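Rather than scanning the summary() output by eye, you can also pull the parameter table and filter it for negative variance estimates. A minimal sketch, assuming the two-factor model has been refit to the new data as adopt.fit, as in the solution that follows:

# Extract the full parameter table from the fitted model
pe <- parameterEstimates(adopt.fit)

# Keep only the variance rows (op is "~~" and lhs equals rhs)
variances <- pe[pe$op == "~~" & pe$lhs == pe$rhs, c("lhs", "est")]

# Any negative estimate here flags a Heywood case
variances[variances$est < 0, ]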
# Build the model
adopt.model <- 'goodstory =~ pictures + background + loveskids
                inperson =~ energy + wagstail + playful'

# Analyze the model and include the data argument
adopt.fit <- cfa(model = adopt.model, data = adoptsurvey)

Warning message:
lavaan WARNING: some estimated ov variances are negative

# Summarize the model to view the negative variances
summary(adopt.fit, standardized = TRUE, fit.measures = TRUE, rsquare = TRUE)

lavaan 0.6-11 ended normally after 300 iterations

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                        13
  Number of observations                           100

Model Test User Model:
  Test statistic                                 7.134
  Degrees of freedom                                 8
  P-value (Chi-square)                           0.522

Model Test Baseline Model:
  Test statistic                                25.380
  Degrees of freedom                                15
  P-value                                        0.045

User Model versus Baseline Model:
  Comparative Fit Index (CFI)                    1.000
  Tucker-Lewis Index (TLI)                       1.156

Loglikelihood and Information Criteria:
  Loglikelihood user model (H0)              -1649.956
  Loglikelihood unrestricted model (H1)      -1646.389
  Akaike (AIC)                                3325.912
  Bayesian (BIC)                              3359.779
  Sample-size adjusted Bayesian (BIC)         3318.722

Root Mean Square Error of Approximation:
  RMSEA                                          0.000
  90 Percent confidence interval - lower         0.000
  90 Percent confidence interval - upper         0.109
  P-value RMSEA <= 0.05                          0.686

Standardized Root Mean Square Residual:
  SRMR                                           0.050

Parameter Estimates:
  Standard errors                             Standard
  Information                                 Expected
  Information saturated (h1) model          Structured

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  goodstory =~
    pictures          1.000                               1.360    0.437
    background        1.471    0.763    1.928    0.054    2.000    0.521
    loveskids         1.746    0.892    1.958    0.050    2.375    0.501
  inperson =~
    energy            1.000                               0.208    0.058
    wagstail         45.266 1090.315    0.042    0.967    9.409    1.969
    playful           0.869    1.110    0.783    0.434    0.181    0.054

Covariances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  goodstory ~~
    inperson         -0.014    0.332   -0.041    0.967   -0.048   -0.048

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .pictures          7.814    1.514    5.162    0.000    7.814    0.809
   .background       10.762    2.695    3.993    0.000   10.762    0.729
   .loveskids        16.791    3.936    4.266    0.000   16.791    0.749
   .energy           12.642    2.066    6.119    0.000   12.642    0.997
   .wagstail        -65.684 2124.546   -0.031    0.975  -65.684   -2.875
   .playful          11.148    1.760    6.335    0.000   11.148    0.997
    goodstory         1.850    1.310    1.411    0.158    1.000    1.000
    inperson          0.043    1.046    0.041    0.967    1.000    1.000

R-Square:
                   Estimate
    pictures          0.191
    background        0.271
    loveskids         0.251
    energy            0.003
    wagstail             NA
    playful           0.003

The variance for the wagstail variable is negative, which is a Heywood case.

------------------------------------------------------------------------------------------------------------------------

Fix the Manifest Heywood Model

To fix the warning from the last model, you can use the var() function to calculate the variance of the manifest variable that was estimated as negative. Add a new line to the model specification that sets the variance of that manifest variable, then reanalyze the model to determine whether the new adoption survey can be fit to a two-factor model. You should see that the model no longer produces warnings once the variance of the problem manifest variable is fixed.

* Update the summary() function to view the R-squared values.
* Use the var() function to calculate the variance of the negative manifest variable in adoptsurvey.
* Update the model code to set that variance to a specific number.
* Analyze and summarize the updated model with R-squared values.
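The exercise's fix, shown in the code that follows, sets the residual variance of the problem variable to its sample variance. As an aside, lavaan's model syntax also supports simple inequality constraints on labeled parameters, which offers another way to keep a residual variance from going negative. A sketch of that alternative (the label v_wt and the bound 0.001 are arbitrary choices, not part of the exercise):

# Alternative sketch: constrain the wagstail residual variance to stay positive
adopt.model.alt <- 'goodstory =~ pictures + background + loveskids
                    inperson =~ energy + wagstail + playful
                    wagstail ~~ v_wt*wagstail
                    v_wt > 0.001'

adopt.fit.alt <- cfa(model = adopt.model.alt, data = adoptsurvey)

Whether fixing or constraining the variance is preferable depends on the data; the rest of this exercise uses the fixed-variance approach.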
# Summarize the model to view the negative variances
summary(adopt.fit, standardized = TRUE, fit.measures = TRUE, rsquare = TRUE)

# View the variance of the problem manifest variable
var(adoptsurvey$wagstail)
[1] 23.07446

# Update the model using 5 decimal places
adopt.model2 <- 'goodstory =~ pictures + background + loveskids
                 inperson =~ energy + wagstail + playful
                 wagstail ~~ 23.07446*wagstail'

# Analyze the updated model
adopt.fit2 <- cfa(model = adopt.model2, data = adoptsurvey)

# Summarize the updated model to confirm the negative variance is gone
summary(adopt.fit2, standardized = TRUE, fit.measures = TRUE, rsquare = TRUE)

lavaan 0.6-11 ended normally after 66 iterations

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                        12
  Number of observations                           100

Model Test User Model:
  Test statistic                                 8.493
  Degrees of freedom                                 9
  P-value (Chi-square)                           0.485

Model Test Baseline Model:
  Test statistic                                25.380
  Degrees of freedom                                15
  P-value                                        0.045

User Model versus Baseline Model:
  Comparative Fit Index (CFI)                    1.000
  Tucker-Lewis Index (TLI)                       1.081

Loglikelihood and Information Criteria:
  Loglikelihood user model (H0)              -1650.635
  Loglikelihood unrestricted model (H1)      -1646.389
  Akaike (AIC)                                3325.270
  Bayesian (BIC)                              3356.532
  Sample-size adjusted Bayesian (BIC)         3318.633

Root Mean Square Error of Approximation:
  RMSEA                                          0.000
  90 Percent confidence interval - lower         0.000
  90 Percent confidence interval - upper         0.108
  P-value RMSEA <= 0.05                          0.664

Standardized Root Mean Square Residual:
  SRMR                                           0.058

Parameter Estimates:
  Standard errors                             Standard
  Information                                 Expected
  Information saturated (h1) model          Structured

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  goodstory =~
    pictures          1.000                               1.344    0.432
    background        1.461    0.758    1.928    0.054    1.964    0.511
    loveskids         1.818    0.947    1.919    0.055    2.444    0.516
  inperson =~
    energy            1.000                               0.959    0.269
    wagstail          1.391    2.244    0.620    0.535    1.334    0.268
    playful           0.807    1.640    0.492    0.623    0.774    0.231

Covariances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  goodstory ~~
    inperson         -0.077    0.450   -0.172    0.864   -0.060   -0.060

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .wagstail         23.074                              23.074    0.928
   .pictures          7.857    1.510    5.203    0.000    7.857    0.813
   .background       10.906    2.672    4.082    0.000   10.906    0.739
   .loveskids        16.461    4.103    4.012    0.000   16.461    0.734
   .energy           11.765    2.683    4.385    0.000   11.765    0.928
   .playful          10.582    2.082    5.084    0.000   10.582    0.946
    goodstory         1.807    1.296    1.395    0.163    1.000    1.000
    inperson          0.920    2.209    0.416    0.677    1.000    1.000

R-Square:
                   Estimate
    wagstail          0.072
    pictures          0.187
    background        0.261
    loveskids         0.266
    energy            0.072
    playful           0.054

------------------------------------------------------------------------------------------------------------------------

Basic SEM Diagram

The adoption group members are not statisticians, so a picture of the model will help them understand your work. The previous two-factor model of goodstory and inperson has been loaded for you. Create a basic plot of the model using the semPlot library and the semPaths() function with the default output. The model plot should show two circles for the latent variables and six squares for the manifest variables.

* Load the semPlot library.
* Use the semPaths() function with the object argument set to the saved, analyzed model adopt.fit.
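If you are working outside the course environment, semPlot may need a one-time install before it can be loaded (this assumes an internet connection; in the exercise the package is already available):

# One-time install of semPlot, then load it for the session
install.packages("semPlot")
library(semPlot)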
# Load the semPlot library
library(semPlot)

# Create a default picture
semPaths(adopt.fit)

------------------------------------------------------------------------------------------------------------------------

Edit the Layout

The tree layout is the default view in semPaths(), with the rotation set to 1. Setting the rotation to 2 often helps non-statisticians understand a model because it reads left to right: the latent variables on the left predict the responses to the manifest variables on the right. Update the model picture to use these settings.

* Add the layout argument to the semPaths() function.
* Add the rotation argument to the semPaths() function.
* Set layout equal to "tree" and rotation equal to 2.

# Load the semPlot library
library(semPlot)

# Update the default picture
semPaths(object = adopt.fit, layout = "tree", rotation = 2)

------------------------------------------------------------------------------------------------------------------------

Edit the Labels

The adoption group wants to refine their survey for future use. To help them understand the results, you can add labels to the paths to indicate which items were most related to their latent variables, and shade those paths by strength to add more data visualization to the model picture. You should see that the goodstory manifest variables load much more strongly than the inperson variables.

* Use the whatLabels argument to show the standardized loadings.
* Use the what argument to color by the standardized loadings.
* Use the edge.color argument to color the model blue.

# Load the library
library(semPlot)

# Update the default picture
semPaths(object = adopt.fit, layout = "tree", rotation = 2,
         whatLabels = "std", edge.label.cex = 1,
         what = "std", edge.color = "blue")
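To hand the finished diagram to the adoption group, you can write it to an image file with a standard R graphics device. A minimal sketch; the file name and dimensions here are only examples:

# Open a PNG device, draw the labeled diagram, then close the device to write the file
png("adopt_model.png", width = 800, height = 500)
semPaths(object = adopt.fit, layout = "tree", rotation = 2,
         whatLabels = "std", edge.label.cex = 1,
         what = "std", edge.color = "blue")
dev.off()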