On the Asymptotic Relative Efficiency of Planned Missingness Designs

Psychometrika

Abstract

In planned missingness (PM) designs, certain data are set a priori to be missing. PM designs can increase validity and reduce cost; however, little is known about the loss of efficiency that accompanies these designs. The present paper compares PM designs to reduced sample (RN) designs that have the same total number of data points concentrated in fewer participants. In four studies, we consider models for both observed and latent variables, designs that do or do not include an “X set” of variables with complete data, and a full range of between- and within-set correlation values. All results are obtained using asymptotic relative efficiency formulas, and thus no data are generated; this novel approach allows us to examine whether PM designs have theoretical advantages over RN designs, with the impact of sampling error removed. Our primary findings are that (a) in manifest variable regression models, estimates of regression coefficients have much lower relative efficiency in PM designs as compared to RN designs, (b) relative efficiency of factor correlation or latent regression coefficient estimates is maximized when the indicators of each latent variable come from different sets, and (c) the addition of an X set improves efficiency in manifest variable regression models only for the parameters that directly involve the X-set variables, but it substantially improves efficiency of most parameters in latent variable models. We conclude that PM designs can be beneficial when the model of interest is a latent variable model; recommendations are made for how to optimize such a design.

Notes

  1. For between-set correlation values of .5 or lower, all within-set correlation values (between .01 and .99) were possible. Above .5, each .01 increment in between-set correlation value corresponded to a .02 increment in the minimum within-set correlation value (i.e., when the between-set correlation was .51, the minimum within-set correlation was .02; when the between-set correlation was .99, the minimum within-set correlation was .98).

  2. The finding that the ARE of regression coefficients and intercepts is the same falls in line with previous evidence that when exogenous variables have complete data, information loss may be characterized more simply. Savalei and Rhemtulla (2011) reported that when a single complete variable predicts a single dependent variable with 20% missing data, both regression coefficients and intercepts have 20% missing information, regardless of the correlation between the two variables. In Model 5A, we do see that ARE depends on correlation strength, but the effect is the same for both parameters.

  3. The addition of an X-set indicator to each factor also changes the model simply by introducing a fourth indicator to each factor. To investigate whether the observed benefits of the X set are due to having four indicators rather than three, we also ran a version of Model 6 that used a 4-form design (sets A–D) where 25% of the sample was missing each item set. Compared to this PM model, the PM model 6X had factor loadings with .05 higher RE, factor variances with .20 higher RE, and factor covariances that had between .02 and .15 higher RE (increasing with increasing factor correlations). Thus, though the 4-form PM design performed better than the 3-form PM design with no X set, the 3-form design with X set outperformed both of these.
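
The grid constraint described in Note 1 can be sketched in a few lines of Python. This is only one reading of the note: the function name `min_within_set_correlation` and the closed form max(.01, 2r − 1), where r is the between-set correlation, are assumptions inferred from the two endpoints the note gives (.51 → .02 and .99 → .98), not a formula stated in the paper. Correlations are passed as integer hundredths to avoid floating-point drift on the .01 grid.

```python
def min_within_set_correlation(between_hundredths: int) -> float:
    """Smallest within-set correlation on the grid for a given between-set value.

    `between_hundredths` is the between-set correlation times 100 (1..99).
    Assumed rule (inferred from Note 1): the floor is .01 up to a between-set
    correlation of .50, then rises by .02 for each further .01 step.
    """
    if not 1 <= between_hundredths <= 99:
        raise ValueError("between-set correlation must lie in .01 to .99")
    return max(1, 2 * between_hundredths - 100) / 100


if __name__ == "__main__":
    # Print the assumed lower bound at a few between-set correlations.
    for b in (10, 50, 51, 75, 99):
        print(b / 100, min_within_set_correlation(b))
```

Under this reading, the range of admissible within-set values shrinks steadily once the between-set correlation exceeds .5, until only .98 and .99 remain at a between-set correlation of .99.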

References

  • Arbuckle, J. L. (1996). Full information estimation in the presence of incomplete data. In G. A. Marcoulides & R. E. Schumacker (Eds.), Advanced structural equation modeling: Issues and techniques (pp. 243–277). Mahwah: Lawrence Erlbaum Associates Inc.

  • Arminger, G., & Sobel, M. E. (1990). Pseudo-maximum likelihood estimation of mean and covariance structures with missing data. Journal of the American Statistical Association, 85, 195–203.

  • Bentler, P. M. (2007). EQS 6 structural equations program manual. Encino: Multivariate Software.

  • Bentler, P. M., & Lee, S.-Y. (1978). Matrix derivatives with chain rule and rules for simple, Hadamard, and Kronecker products. Journal of Mathematical Psychology, 17, 255–262.

  • Bunting, B. P., Adamson, G., & Mulhall, P. K. (2002). A Monte Carlo examination of an MTMM model with planned incomplete data structures. Structural Equation Modeling, 9, 369–389. doi:10.1207/S15328007SEM0903_4.

  • Enders, C. K. (2010). Applied missing data analysis. New York: Guilford Press.

  • Graham, J. W., Hofer, S. M., & Piccinin, A. M. (1994). Analysis with missing data in drug prevention research. In L. M. Collins & L. Seitz (Eds.), Advances in data analysis for prevention intervention research: National Institute on Drug Abuse Research monograph series No. 142 (pp. 13–63). Washington, DC: National Institute on Drug Abuse.

  • Graham, J. W., Hofer, S. M., & MacKinnon, D. P. (1996). Maximizing the usefulness of data obtained with planned missing value patterns: An application of maximum likelihood procedures. Multivariate Behavioral Research, 31, 197–218. doi:10.1207/s15327906mbr3102_3.

  • Graham, J. W., Taylor, B. J., & Cumsille, P. E. (2001). Planned missing data designs in the analysis of change. In L. M. Collins & A. G. Sayer (Eds.), New methods for the analysis of change (pp. 335–353). Washington, DC: American Psychological Association. doi:10.1037/10409-011.

  • Graham, J. W., Taylor, B. J., Olchowski, A. E., & Cumsille, P. E. (2006). Planned missing data designs in psychological research. Psychological Methods, 11, 323–343. doi:10.1037/1082-989X.11.4.323.

  • Harel, O., Stratton, J., & Aseltine, R. (2011). Designed missingness to better estimate efficacy of behavioral studies. Technical Report (11–15), The Department of Statistics, University of Connecticut.

  • Little, R. J. A., & Rubin, D. B. (2002). Statistical analysis with missing data (2nd ed.). Hoboken, NJ: Wiley.

  • Magnus, J. R., & Neudecker, H. (1999). Matrix differential calculus with applications in statistics and econometrics. New York: Wiley.

  • McArdle, J. J., & Woodcock, R. W. (1997). Expanding test-retest designs to include developmental time-lag components. Psychological Methods, 2, 403–435. doi:10.1037/1082-989X.2.4.403.

  • Mistler, S. A., & Enders, C. K. (2012). Planned missing data designs for developmental research. In B. Laursen, T. D. Little, & N. A. Card (Eds.), Handbook of Developmental Research Methods (pp. 742–754). New York: Guilford Press.

  • Mooijaart, A., & Bentler, P. M. (1991). Robustness of normal theory statistics in structural equation models. Statistica Neerlandica, 45, 159–170.

  • Nel, D. G. (1980). On matrix differentiation in statistics. South African Statistical Journal, 14, 137–193.

  • Orchard, T., & Woodbury, M. A. (1972). A missing information principle: Theory and applications. Paper presented at the Sixth Berkeley Symposium on Mathematical Statistics and Probability. Berkeley, CA: University of California.

  • R Development Core Team (2011). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, URL http://www.R-project.org/. Accessed 1 Jan 2014.

  • Raghunathan, T. E., & Grizzle, J. E. (1995). A split questionnaire survey design. Journal of the American Statistical Association, 90, 54–63. doi:10.1080/01621459.1995.10476488.

  • Raykov, T., Marcoulides, G. A., & Patelis, T. (2013). Saturated versus identified models: A note on their distinction. Educational and Psychological Measurement, 73, 162–168.

  • Revilla, M., & Saris, W. E. (2013). The split-ballot multitrait-multimethod approach: Implementation and problems. Structural Equation Modeling, 20, 27–46. doi:10.1080/10705511.2013.742379.

  • Saris, W. E., Satorra, A., & Coenders, G. (2004). A new approach to evaluating the quality of measurement instruments: The split-ballot MTMM design. Sociological Methodology, 34, 311–347. doi:10.1111/j.0081-1750.2004.00155.x.

  • Savalei, V. (2010). Expected vs. observed information in SEM with incomplete normal and nonnormal data. Psychological Methods, 15, 352–367. doi:10.1037/a0020143.

  • Savalei, V., & Rhemtulla, M. (2011). Properties of local and global measures of fraction of missing information: Some explorations. Talk presented at the 76th Annual and the 17th International Meeting of the Psychometric Society, Hong Kong.

  • Savalei, V., & Rhemtulla, M. (2012). On obtaining estimates of the fraction of missing information from full information maximum likelihood. Structural Equation Modeling, 19, 477–494.

  • Shoemaker, D. M. (1973). Principles and procedures of multiple matrix sampling. Cambridge: Ballinger Publishing Company.

  • Sirotnik, K., & Wellington, R. (1977). Incidence sampling: An integrated theory for “matrix sampling”. Journal of Educational Measurement, 14, 343–399. doi:10.1111/j.1745-3984.1977.tb00050.x.

  • Thomas, N., Raghunathan, T. E., Schenker, N., Katzoff, M. J., & Johnson, C. L. (2006). An evaluation of matrix sampling methods using data from the National Health and Nutrition Examination Survey. Survey Methodology, 32, 217.

  • Wacholder, S., Carroll, R. J., Pee, D., & Gail, M. H. (1994). The partial questionnaire design for case–control studies. Statistics in Medicine, 13, 623–634.

  • Yuan, K., & Bentler, P. M. (2000). Three likelihood-based methods for mean and covariance structure analysis with nonnormal missing data. Sociological Methodology, 30, 165–200. doi:10.1111/0081-1750.00078.

Acknowledgments

We thank Kristopher Preacher and Alex Schoemann for comments on an earlier draft of this manuscript. This research was supported in part by a Banting Postdoctoral Fellowship from the Social Sciences and Humanities Research Council of Canada (SSHRC) to Mijke Rhemtulla, an SSHRC Grant to Victoria Savalei, and NSF Grant NSF0066969 to Todd Little (W. Wu, co-PI).

Corresponding author

Correspondence to Mijke Rhemtulla.

Appendix A: Translating Relative Efficiency into Change in Power

Let \(\theta \) be the parameter of interest. We want to test the null \(H_0 :\theta \le 0\) against the alternative \(H_1:\theta >0\). Let \({\hat{\theta }}_\mathrm{CD}\) be the complete data estimator based on a given \(n\). We assume that the sampling distribution of this estimator is approximately \(N\left( {\theta ,\frac{1}{n}\mathrm{SE}_{\theta ,\mathrm{CD}}^2}\right) \). Similarly, let \({\hat{\theta }}_\mathrm{PM}\) be the incomplete data estimator (from a PM design) based on a given \(n\), with the approximate sampling distribution \(N\left( {\theta ,\frac{1}{n}\mathrm{SE}_{\theta ,\mathrm{PM}}^2}\right) \). Without loss of generality, and for simplicity, we assume \(n=1\) below (none of the relevant equations depend on \(n\)). Additionally, we assume for simplicity that standard errors are known, to avoid complicating the expressions with the “hat” notation. The expressions are then approximate and become exact asymptotically. Finally, the most serious but necessary simplifying assumption is that the variance of the estimator does not depend on the population value; i.e., \(\mathrm{SE}_{\theta ,\mathrm{CD}}=\mathrm{SE}_\mathrm{CD}\) and \(\mathrm{SE}_{\theta ,\mathrm{PM}}=\mathrm{SE}_\mathrm{PM}\) for all values of \(\theta \).

The Wald test statistic is given by \(z_\mathrm{CD}=\frac{{\hat{\theta }}_\mathrm{CD}}{\mathrm{SE}_\mathrm{CD}}\), and its critical value for a one-tailed test at a given significance level \(\alpha \) is \(z_\mathrm{CRIT}=\Phi ^{-1}(1-\alpha )\). Suppose the true value is \(\theta _0\). Then, power is given by

$$\begin{aligned} \pi _{\theta _0 ,\mathrm{CD}}&= 1-\Pr \left( {z_\mathrm{CD} \le z_\mathrm{CRIT} |\theta =\theta _0 } \right) =1-\Pr \left( {\frac{\hat{{\theta }}_\mathrm{CD} }{\mathrm{SE}_\mathrm{CD} }\le z_\mathrm{CRIT} |\theta =\theta _0 } \right) \\&=1-\Pr \left( {\frac{\hat{{\theta }}_\mathrm{CD} -\theta _0 }{\mathrm{SE}_\mathrm{CD} }\le z_\mathrm{CRIT} -\frac{\theta _0 }{\mathrm{SE}_\mathrm{CD} }|\theta =\theta _0 } \right) =1-\Phi \left( {z_\mathrm{CRIT} -\frac{\theta _0 }{\mathrm{SE}_\mathrm{CD} }} \right) . \end{aligned}$$
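
As a numerical illustration of this power expression, the following is a minimal Python sketch. The function name `wald_power` and the example values (θ₀ = 0.3, SE = 0.1, α = .05) are assumptions chosen for illustration and do not come from the paper; Φ and Φ⁻¹ are taken from the standard library's `statistics.NormalDist`.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: .cdf is Phi, .inv_cdf is Phi^{-1}


def wald_power(theta0: float, se: float, alpha: float = 0.05) -> float:
    """Power of the one-tailed Wald test: 1 - Phi(z_crit - theta0 / se).

    Assumes, as in the appendix, a known standard error that does not
    depend on the population value theta0.
    """
    if se <= 0:
        raise ValueError("standard error must be positive")
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha)  # critical value at level alpha
    return 1 - _STD_NORMAL.cdf(z_crit - theta0 / se)


if __name__ == "__main__":
    # Hypothetical effect and standard error, for illustration only.
    print(round(wald_power(theta0=0.3, se=0.1), 3))
```

When θ₀ = 0 the function returns α, as expected for a test of nominal size α, and power increases with the ratio θ₀/SE.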

It follows that

$$\begin{aligned} 1-\pi _{\theta _0 ,\mathrm{CD}}&= \Phi \left( {z_\mathrm{CRIT} -\frac{\theta _0 }{\mathrm{SE}_\mathrm{CD} }} \right) \\ \Phi ^{-1}\left( {1-\pi _{\theta _0 ,\mathrm{CD}} } \right)&= z_\mathrm{CRIT} -\frac{\theta _0 }{\mathrm{SE}_\mathrm{CD} }=\Phi ^{-1}(1-\alpha )-\frac{\theta _0 }{\mathrm{SE}_\mathrm{CD} }\\ \mathrm{SE}_\mathrm{CD}&= \frac{\theta _0 }{\Phi ^{-1}(1-\alpha )-\Phi ^{-1}\left( {1-\pi _{\theta _0 ,\mathrm{CD}} } \right) }. \end{aligned} \tag{2}$$

Similarly, for incomplete data,

$$\begin{aligned} \mathrm{SE}_\mathrm{PM} =\frac{\theta _0 }{\Phi ^{-1}(1-\alpha )-\Phi ^{-1}\left( {1-\pi _{\theta _0 ,\mathrm{PM}} } \right) }. \end{aligned} \tag{3}$$

The square root of the (adjusted) relative efficiency (RE or ARE; recall that these are identical for PM designs) is the ratio of (2) to (3), and plugging these expressions in, we obtain

$$\begin{aligned} \sqrt{\mathrm{ARE}_\theta }=\frac{\mathrm{SE}_\mathrm{CD} }{\mathrm{SE}_\mathrm{PM} }=\frac{\Phi ^{-1}(1-\alpha )-\Phi ^{-1}\left( {1-\pi _{\theta _0 ,\mathrm{PM}} } \right) }{\Phi ^{-1}(1-\alpha )-\Phi ^{-1}\left( {1-\pi _{\theta _0 ,\mathrm{CD}} } \right) }. \end{aligned}$$

After some algebraic manipulation, we obtain the final expression for the power of the incomplete data design as a function of relative efficiency and the power of the complete data design:

$$\begin{aligned}&\left[ {\Phi ^{-1}(1-\alpha )-\Phi ^{-1}\left( {1-\pi _{\theta _0 ,\mathrm{CD}} } \right) } \right] \sqrt{\mathrm{ARE}_\theta }=\Phi ^{-1}(1-\alpha )-\Phi ^{-1}\left( {1-\pi _{\theta _0 ,\mathrm{PM}} } \right) \\&\Phi ^{-1}(1-\alpha )-\left[ {\Phi ^{-1}(1-\alpha )-\Phi ^{-1}\left( {1-\pi _{\theta _0 ,\mathrm{CD}} } \right) } \right] \sqrt{\mathrm{ARE}_\theta }=\Phi ^{-1}\left( {1-\pi _{\theta _0 ,\mathrm{PM}} } \right) \\&\pi _{\theta _0 ,\mathrm{PM}} =1-\Phi \left\{ {\Phi ^{-1}(1-\alpha )-\left[ {\Phi ^{-1}(1-\alpha )-\Phi ^{-1}\left( {1-\pi _{\theta _0 ,\mathrm{CD}} } \right) } \right] \sqrt{\mathrm{ARE}_\theta }} \right\} . \end{aligned}$$

Or equivalently,

$$\begin{aligned} \pi _{\theta _0 ,\mathrm{PM}} =1-\Phi \left\{ {\Phi ^{-1}(1-\alpha )(1-\sqrt{\mathrm{ARE}_\theta })+\Phi ^{-1}\left( {1-\pi _{\theta _0 ,\mathrm{CD}} } \right) \sqrt{\mathrm{ARE}_\theta }} \right\} . \end{aligned}$$
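
The final expression is straightforward to evaluate numerically. The sketch below is a minimal Python illustration under the appendix's assumptions (one-tailed test, known standard errors that do not depend on θ); the function name `pm_power` and the example inputs (complete-data power of .80, ARE of .50, α = .05) are hypothetical and not taken from the paper.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: .cdf is Phi, .inv_cdf is Phi^{-1}


def pm_power(power_cd: float, are: float, alpha: float = 0.05) -> float:
    """Power under a PM design, given complete-data power and ARE.

    Implements
        1 - Phi{ Phi^{-1}(1 - alpha) * (1 - sqrt(ARE))
                 + Phi^{-1}(1 - power_cd) * sqrt(ARE) }.
    """
    if not 0.0 < power_cd < 1.0:
        raise ValueError("power_cd must lie strictly between 0 and 1")
    if are < 0.0:
        raise ValueError("ARE must be non-negative")
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha)    # Phi^{-1}(1 - alpha)
    z_beta = _STD_NORMAL.inv_cdf(1 - power_cd)  # Phi^{-1}(1 - pi_CD)
    root_are = are ** 0.5
    return 1 - _STD_NORMAL.cdf(z_alpha * (1 - root_are) + z_beta * root_are)


if __name__ == "__main__":
    # Hypothetical inputs: 80% complete-data power, ARE of one half.
    print(round(pm_power(power_cd=0.80, are=0.50), 3))
```

Two limiting cases serve as a sanity check: with ARE = 1 the function returns the complete-data power unchanged, and as ARE approaches 0 the power falls to α.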

Cite this article

Rhemtulla, M., Savalei, V. & Little, T.D. On the Asymptotic Relative Efficiency of Planned Missingness Designs. Psychometrika 81, 60–89 (2016). https://doi.org/10.1007/s11336-014-9422-0

  • DOI: https://doi.org/10.1007/s11336-014-9422-0
