Analyze a Latent Heywood Case

A local animal shelter has designed a survey to measure the impact of their Adopt Me program. Viewers rated each dog's picture, background story, and other characteristics to indicate the "adoptableness" of each animal. The adoptsurvey data contains the six items they rated: pictures, background, and loveskids measure a "good story" latent variable, while energy, wagstail, and playful measure an "in person" latent variable. You will build a two-factor model of their survey and examine it for Heywood cases. The lavaan library has been loaded for you.

* The goodstory latent is measured by pictures, background, and loveskids.
* The inperson latent is measured by energy, wagstail, and playful.
* Analyze the two-factor model with the cfa() function.

# Look at the data
head(adoptsurvey)

# Build the model
adopt.model <- 'goodstory =~ pictures + background + loveskids
                inperson =~ energy + wagstail + playful'

# Analyze the model
adopt.fit <- cfa(model = adopt.model, data = adoptsurvey)

Warning message:
lavaan WARNING: covariance matrix of latent variables is not positive definite; use lavInspect(fit, "cov.lv") to investigate.

------------------------------------------------------------------------------------------------------------------------

Fix the Latent Heywood Model

In the last exercise, you found that the adoption survey produced a correlation greater than 1 between the latent variables. You can fix this Heywood case by collapsing the two latent variables into one, then analyze and summarize the merged model to see whether combining the two factors has solved the problematic correlation. (A quick way to confirm the offending correlation first is sketched below.)

* Change the model to create only one goodstory factor that is measured by all six manifest variables in the adoptsurvey dataset.
* Analyze the model with the cfa() function and confirm that it produces no warning messages.
* Run summary() on the fitted model to view the final model fit.
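Before refitting, you can follow the warning's own suggestion and inspect the latent covariance matrix directly. A minimal sketch, assuming adopt.fit is the two-factor fit analyzed above:

# Model-implied covariances of the latent variables
lavInspect(adopt.fit, "cov.lv")

# The same matrix on the correlation scale; an off-diagonal value above 1 confirms the latent Heywood case
lavInspect(adopt.fit, "cor.lv")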
# Look at the data
head(adoptsurvey)

# Edit the original model
adopt.model <- 'goodstory =~ pictures + background + loveskids + energy + wagstail + playful'

# Analyze the model
adopt.fit <- cfa(model = adopt.model, data = adoptsurvey)

# Look for Heywood cases
summary(adopt.fit, standardized = TRUE, fit.measures = TRUE)

lavaan 0.6-11 ended normally after 49 iterations

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                        12
  Number of observations                           100

Model Test User Model:
  Test statistic                                27.071
  Degrees of freedom                                 9
  P-value (Chi-square)                           0.001

Model Test Baseline Model:
  Test statistic                                74.694
  Degrees of freedom                                15
  P-value                                        0.000

User Model versus Baseline Model:
  Comparative Fit Index (CFI)                    0.697
  Tucker-Lewis Index (TLI)                       0.495

Loglikelihood and Information Criteria:
  Loglikelihood user model (H0)              -1672.246
  Loglikelihood unrestricted model (H1)      -1658.711
  Akaike (AIC)                                3368.493
  Bayesian (BIC)                              3399.755
  Sample-size adjusted Bayesian (BIC)         3361.856

Root Mean Square Error of Approximation:
  RMSEA                                          0.142
  90 Percent confidence interval - lower         0.082
  90 Percent confidence interval - upper         0.205
  P-value RMSEA <= 0.05                          0.009

Standardized Root Mean Square Residual:
  SRMR                                           0.086

Parameter Estimates:
  Standard errors                             Standard
  Information                                 Expected
  Information saturated (h1) model          Structured

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  goodstory =~
    pictures          1.000                               1.773    0.562
    background        0.892    0.337    2.650    0.008    1.581    0.387
    loveskids         0.547    0.344    1.587    0.112    0.969    0.208
    energy            1.194    0.372    3.214    0.001    2.118    0.537
    wagstail          1.712    0.517    3.310    0.001    3.035    0.597
    playful           0.773    0.312    2.480    0.013    1.371    0.354

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .pictures          6.824    1.323    5.160    0.000    6.824    0.685
   .background       14.168    2.228    6.359    0.000   14.168    0.850
   .loveskids        20.736    3.009    6.891    0.000   20.736    0.957
   .energy           11.051    2.049    5.394    0.000   11.051    0.711
   .wagstail         16.661    3.486    4.779    0.000   16.661    0.644
   .playful          13.128    2.021    6.496    0.000   13.128    0.875
    goodstory         3.143    1.369    2.296    0.022    1.000    1.000

------------------------------------------------------------------------------------------------------------------------

Analyze a Manifest Heywood Case

After reporting your findings, the adoption group recreated their survey to measure two factors: how effective their online story posts were (goodstory) and how much an in-person interaction mattered (inperson). The new data is loaded under adoptsurvey. In this exercise, you will look for a Heywood case on one of the manifest variables, rather than on the latent variable.

* Analyze the updated data adoptsurvey for the two-factor adopt.model with the cfa() function.
* Watch for warnings after the cfa() call has run.
* Use the summary() function to explore which manifest variable is problematic.
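Rather than scanning the summary() output by eye, you can also pull the parameter table and filter it for negative variance estimates. A minimal sketch, assuming the two-factor model has been refit to the new data as adopt.fit, as in the solution that follows:

# Extract the full parameter table from the fitted model
pe <- parameterEstimates(adopt.fit)

# Keep only the variance rows (op is "~~" and lhs equals rhs)
variances <- pe[pe$op == "~~" & pe$lhs == pe$rhs, c("lhs", "est")]

# Any negative estimate here flags a Heywood case
variances[variances$est < 0, ]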
# Build the model
adopt.model <- 'goodstory =~ pictures + background + loveskids
                inperson =~ energy + wagstail + playful'

# Analyze the model and include the data argument
adopt.fit <- cfa(model = adopt.model, data = adoptsurvey)

Warning message:
lavaan WARNING: some estimated ov variances are negative

# Summarize the model to view the negative variances
summary(adopt.fit, standardized = TRUE, fit.measures = TRUE, rsquare = TRUE)

lavaan 0.6-11 ended normally after 300 iterations

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                        13
  Number of observations                           100

Model Test User Model:
  Test statistic                                 7.134
  Degrees of freedom                                 8
  P-value (Chi-square)                           0.522

Model Test Baseline Model:
  Test statistic                                25.380
  Degrees of freedom                                15
  P-value                                        0.045

User Model versus Baseline Model:
  Comparative Fit Index (CFI)                    1.000
  Tucker-Lewis Index (TLI)                       1.156

Loglikelihood and Information Criteria:
  Loglikelihood user model (H0)              -1649.956
  Loglikelihood unrestricted model (H1)      -1646.389
  Akaike (AIC)                                3325.912
  Bayesian (BIC)                              3359.779
  Sample-size adjusted Bayesian (BIC)         3318.722

Root Mean Square Error of Approximation:
  RMSEA                                          0.000
  90 Percent confidence interval - lower         0.000
  90 Percent confidence interval - upper         0.109
  P-value RMSEA <= 0.05                          0.686

Standardized Root Mean Square Residual:
  SRMR                                           0.050

Parameter Estimates:
  Standard errors                             Standard
  Information                                 Expected
  Information saturated (h1) model          Structured

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  goodstory =~
    pictures          1.000                               1.360    0.437
    background        1.471    0.763    1.928    0.054    2.000    0.521
    loveskids         1.746    0.892    1.958    0.050    2.375    0.501
  inperson =~
    energy            1.000                               0.208    0.058
    wagstail         45.266 1090.315    0.042    0.967    9.409    1.969
    playful           0.869    1.110    0.783    0.434    0.181    0.054

Covariances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  goodstory ~~
    inperson         -0.014    0.332   -0.041    0.967   -0.048   -0.048

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .pictures          7.814    1.514    5.162    0.000    7.814    0.809
   .background       10.762    2.695    3.993    0.000   10.762    0.729
   .loveskids        16.791    3.936    4.266    0.000   16.791    0.749
   .energy           12.642    2.066    6.119    0.000   12.642    0.997
   .wagstail        -65.684 2124.546   -0.031    0.975  -65.684   -2.875
   .playful          11.148    1.760    6.335    0.000   11.148    0.997
    goodstory         1.850    1.310    1.411    0.158    1.000    1.000
    inperson          0.043    1.046    0.041    0.967    1.000    1.000

R-Square:
                   Estimate
    pictures          0.191
    background        0.271
    loveskids         0.251
    energy            0.003
    wagstail             NA
    playful           0.003

The variance for the wagstail variable is negative, which is a Heywood case.

------------------------------------------------------------------------------------------------------------------------

Fix the Manifest Heywood Model

To fix the warning from the last model, you can use the var() function to calculate the variance of the manifest variable that was estimated as negative. Add a new line to the model specification that sets the variance of that manifest variable, then reanalyze the model to determine whether the new adoption survey can be fit to a two-factor model. You should see that the model no longer produces warnings once the variance of the problem manifest variable is fixed.

* Update the summary() function to view the R-squared values.
* Use the var() function to calculate the variance of the negative manifest variable in adoptsurvey.
* Update the model code to set that variance to a specific number.
* Analyze and summarize the updated model with R-squared values.
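The exercise's fix, shown in the code that follows, sets the residual variance of the problem variable to its sample variance. As an aside, lavaan's model syntax also supports simple inequality constraints on labeled parameters, which offers another way to keep a residual variance from going negative. A sketch of that alternative (the label v_wt and the bound 0.001 are arbitrary choices, not part of the exercise):

# Alternative sketch: constrain the wagstail residual variance to stay positive
adopt.model.alt <- 'goodstory =~ pictures + background + loveskids
                    inperson =~ energy + wagstail + playful
                    wagstail ~~ v_wt*wagstail
                    v_wt > 0.001'

adopt.fit.alt <- cfa(model = adopt.model.alt, data = adoptsurvey)

Whether fixing or constraining the variance is preferable depends on the data; the rest of this exercise uses the fixed-variance approach.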
# Summarize the model to view the negative variances
summary(adopt.fit, standardized = TRUE, fit.measures = TRUE, rsquare = TRUE)

# View the variance of the problem manifest variable
var(adoptsurvey$wagstail)
[1] 23.07446

# Update the model using 5 decimal places
adopt.model2 <- 'goodstory =~ pictures + background + loveskids
                 inperson =~ energy + wagstail + playful
                 wagstail ~~ 23.07446*wagstail'

# Analyze the updated model
adopt.fit2 <- cfa(model = adopt.model2, data = adoptsurvey)

# Summarize the updated model to confirm the negative variance is gone
summary(adopt.fit2, standardized = TRUE, fit.measures = TRUE, rsquare = TRUE)

lavaan 0.6-11 ended normally after 66 iterations

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                        12
  Number of observations                           100

Model Test User Model:
  Test statistic                                 8.493
  Degrees of freedom                                 9
  P-value (Chi-square)                           0.485

Model Test Baseline Model:
  Test statistic                                25.380
  Degrees of freedom                                15
  P-value                                        0.045

User Model versus Baseline Model:
  Comparative Fit Index (CFI)                    1.000
  Tucker-Lewis Index (TLI)                       1.081

Loglikelihood and Information Criteria:
  Loglikelihood user model (H0)              -1650.635
  Loglikelihood unrestricted model (H1)      -1646.389
  Akaike (AIC)                                3325.270
  Bayesian (BIC)                              3356.532
  Sample-size adjusted Bayesian (BIC)         3318.633

Root Mean Square Error of Approximation:
  RMSEA                                          0.000
  90 Percent confidence interval - lower         0.000
  90 Percent confidence interval - upper         0.108
  P-value RMSEA <= 0.05                          0.664

Standardized Root Mean Square Residual:
  SRMR                                           0.058

Parameter Estimates:
  Standard errors                             Standard
  Information                                 Expected
  Information saturated (h1) model          Structured

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  goodstory =~
    pictures          1.000                               1.344    0.432
    background        1.461    0.758    1.928    0.054    1.964    0.511
    loveskids         1.818    0.947    1.919    0.055    2.444    0.516
  inperson =~
    energy            1.000                               0.959    0.269
    wagstail          1.391    2.244    0.620    0.535    1.334    0.268
    playful           0.807    1.640    0.492    0.623    0.774    0.231

Covariances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  goodstory ~~
    inperson         -0.077    0.450   -0.172    0.864   -0.060   -0.060

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .wagstail         23.074                              23.074    0.928
   .pictures          7.857    1.510    5.203    0.000    7.857    0.813
   .background       10.906    2.672    4.082    0.000   10.906    0.739
   .loveskids        16.461    4.103    4.012    0.000   16.461    0.734
   .energy           11.765    2.683    4.385    0.000   11.765    0.928
   .playful          10.582    2.082    5.084    0.000   10.582    0.946
    goodstory         1.807    1.296    1.395    0.163    1.000    1.000
    inperson          0.920    2.209    0.416    0.677    1.000    1.000

R-Square:
                   Estimate
    wagstail          0.072
    pictures          0.187
    background        0.261
    loveskids         0.266
    energy            0.072
    playful           0.054

------------------------------------------------------------------------------------------------------------------------

Basic SEM Diagram

The adoption group members are not statisticians, so a picture of the model will help them understand your work. The previous two-factor model of goodstory and inperson has been loaded for you. Create a basic plot of the model using the semPlot library and the semPaths() function with the default output. The model plot should show two circles for the latent variables and six squares for the manifest variables.

* Load the semPlot library.
* Use the semPaths() function with the object argument set to the saved, analyzed model adopt.fit.
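If you are working outside the course environment, semPlot may need a one-time install before it can be loaded (this assumes an internet connection; in the exercise the package is already available):

# One-time install of semPlot, then load it for the session
install.packages("semPlot")
library(semPlot)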
# Load the semPlot library
library(semPlot)

# Create a default picture
semPaths(adopt.fit)

------------------------------------------------------------------------------------------------------------------------

Edit the Layout

The tree layout is the default view in semPaths(), with the rotation set to 1. Setting the rotation to 2 often helps non-statisticians understand a model because it reads left to right: the latent variables on the left predict the responses to the manifest variables on the right. Update the model picture to use these settings.

* Add the layout argument to the semPaths() function.
* Add the rotation argument to the semPaths() function.
* Set layout equal to "tree" and rotation equal to 2.

# Load the semPlot library
library(semPlot)

# Update the default picture
semPaths(object = adopt.fit, layout = "tree", rotation = 2)

------------------------------------------------------------------------------------------------------------------------

Edit the Labels

The adoption group wants to refine their survey for future use. To help them understand the results, you can add labels to the paths to indicate which items were most related to their latent variables, and shade those paths by strength to add more data visualization to the model picture. You should see that the goodstory manifest variables load much more strongly than the inperson variables.

* Use the whatLabels argument to show the standardized loadings.
* Use the what argument to color by the standardized loadings.
* Use the edge.color argument to color the model blue.

# Load the library
library(semPlot)

# Update the default picture
semPaths(object = adopt.fit, layout = "tree", rotation = 2,
         whatLabels = "std", edge.label.cex = 1,
         what = "std", edge.color = "blue")
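To hand the finished diagram to the adoption group, you can write it to an image file with a standard R graphics device. A minimal sketch; the file name and dimensions here are only examples:

# Open a PNG device, draw the labeled diagram, then close the device to write the file
png("adopt_model.png", width = 800, height = 500)
semPaths(object = adopt.fit, layout = "tree", rotation = 2,
         whatLabels = "std", edge.label.cex = 1,
         what = "std", edge.color = "blue")
dev.off()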