Using Structural Equation Modeling to Examine the Influence of Social, Behavioral, and Nutritional Variables on Health Outcomes Based on NHANES Data: Addressing Complex Design, Nonnormally Distributed Variables, and Missing Information

https://doi.org/10.1093/cdn/nzz010
Under a Creative Commons license
open access

Abstract

Background

Structural equation modeling (SEM) is a multivariate analysis method for exploring relations between latent constructs and measured variables. As a theory-guided approach, SEM estimates directional pathways in complex models based on longitudinal or cross-sectional data, which makes it useful where randomized controlled trials would be either unethical or cost-prohibitive. However, the method is infrequently used in nutrition research, despite recommendations by epidemiologists for its increased use.

Objectives

The aim of this study was to explore 3 key methodologic areas for consideration by researchers when conducting SEM with complex survey datasets: the use of sampling weights, treatment of missing data, and model estimation techniques.

Methods

Using data from NHANES waves 2005–2010, we developed an SEM to estimate the relation between the latent construct of depression and the measured variables of food security, tobacco use (serum cotinine), and age. We used a hierarchic approach to compare 5 model iterations: 1 and 2) complete cases without and with the application of sampling weights; 3) a dataset with missingness applied, to test the accuracy of multiple imputation (MI); 4) the full NHANES dataset with imputed data and sampling weights; and 5) a final respecified model. Each iteration was estimated with both maximum likelihood (ML) and quasimaximum likelihood with the Satorra-Bentler correction (QML) to compare path coefficients, standard errors, and model fit statistics.
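
For readers unfamiliar with MI, the sketch below illustrates how a path coefficient estimated separately on each imputed dataset is commonly combined into a single estimate and standard error with Rubin's rules. The abstract does not state how the authors pooled their results, so the pooling rule is an assumption here; the function name `pool_rubin` and the numbers in the usage line are hypothetical and are not taken from the study.

```python
from math import sqrt
from statistics import mean, variance


def pool_rubin(estimates, std_errors):
    """Pool one parameter across m imputed datasets using Rubin's rules.

    estimates  -- the coefficient estimated on each imputed dataset
    std_errors -- the matching standard errors
    Returns (pooled_estimate, pooled_standard_error).
    """
    m = len(estimates)
    if m < 2 or m != len(std_errors):
        raise ValueError("need at least 2 imputations and matching lengths")

    pooled = mean(estimates)                     # average point estimate
    within = mean([se ** 2 for se in std_errors])  # mean sampling variance
    between = variance(estimates)                # variance across imputations (m - 1 denominator)
    total = within + (1 + 1 / m) * between       # Rubin's total variance
    return pooled, sqrt(total)


# Hypothetical coefficients from 3 imputed datasets (illustrative values only)
estimate, std_error = pool_rubin([0.30, 0.34, 0.32], [0.05, 0.05, 0.05])
```

The pooled standard error exceeds the per-dataset standard error whenever the estimates vary across imputations, which is how MI carries the uncertainty caused by missing values into the final inference.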

Results

Path coefficients differed by 15.68% to 19.17% across model iterations. Nearly one-third of the cases had missing values, and MI reliably imputed values, allowing all cases to be represented in the final model iterations. QML provided better model fit statistics in all iterations.

Conclusions

Nutrition epidemiologists should use complex weights, MI, and QML as a best-practices approach to SEM when conducting analyses with complex design survey data.

Keywords:

Structural equation modeling
multiple imputation
complex survey design
quasi-maximum likelihood
NHANES

Abbreviations used:

CFI, comparative fit index
EM, expectation maximization algorithm
FIML, full information maximum likelihood
MAR, missing at random
MCAR, missing completely at random
MI, multiple imputation
ML, maximum likelihood
NMAR, not missing at random
PHQ, Patient Health Questionnaire (NHANES depression screener questionnaire variable prefix: DPQ)
PSU, primary sampling unit
QML, quasimaximum likelihood with Satorra-Bentler correction
RMSEA, root mean square error of approximation
SEM, structural equation modeling
SRMR, standardized root mean square residual
TLI, Tucker-Lewis Index.

Research reported in this paper was supported by the National Institute of General Medical Sciences of the National Institutes of Health under Award Number P20GM109097. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

Author disclosures: the authors declare no conflicts of interest.