Validation of Multilevel Constructs: Validation Methods and Empirical Findings for the EDI

Social Indicators Research

Abstract

The purposes of this paper are to highlight the foundations of multilevel construct validation, describe two methodological approaches and associated analytic techniques, and then apply these approaches and techniques to the multilevel construct validation of a widely used school readiness measure called the Early Development Instrument (EDI; Janus and Offord 2007). Validation evidence is presented regarding the multilevel covariance structure of the EDI, the appropriateness of aggregation to classroom and neighbourhood levels, and the effects of teacher and classroom characteristics on these structural patterns. The results are then discussed in the context of the theoretical framework of the EDI, with suggestions for future validation work.

References

  • Andrich, D., & Styles, I. (2004). Report on the psychometric properties of the Early Development Instrument (EDI) using the Rasch model. A technical paper commissioned for the development of the Australian Early Development Instrument (AEDI), Murdoch University.

  • Beswick, J. F., Sloat, E. A., & Willms, J. D. (2004). A comparative study of bias in teacher ratings of language and emergent literacy skills. Unpublished manuscript.

  • Blatchford, P., Bassett, P., Goldstein, H., & Martin, C. (2003). Are class size differences related to pupils’ educational progress and classroom processes? Findings from the institute of education class size study of children aged 5 to 7 years old. British Educational Research Journal, 29, 709–730.

  • Blatchford, P., Russell, A., Bassett, P., Brown, P., & Martin, C. (2007). The effect of class size on the teaching of pupils aged 7–11 years old. School Effectiveness and School Improvement, 18, 147–172.

  • Bliese, P. D. (2000). Within-group agreement, non-independence, and reliability: Implications for data aggregation and analysis. In K. J. Klein & S. J. Kozlowski (Eds.), Multilevel theory, research, and methods in organizations (pp. 349–381). San Francisco: Jossey-Bass.

  • Bliese, P. D., & Halverson, R. R. (1998). Group size and measures of group-level properties: An examination of eta-squared and ICC values. Journal of Management, 24, 157–172.

  • Bollen, K., & Lennox, R. (1991). Conventional wisdom on measurement: A structural equation perspective. Psychological Bulletin, 110, 305–314.

  • Brinkman, S., Silburn, S., Lawrence, D., Goldfeld, S., Sayers, M., & Oberklaid, F. (2007). Investigating the validity of the Australian Early Development Index. Early Education and Development, 18, 427–451.

  • Bronfenbrenner, U. (1977). Toward an experimental ecology of human development. American Psychologist, 32, 513–531.

  • Bryk, A. S., & Raudenbush, S. W. (1992). Hierarchical linear models. Thousand Oaks, CA: Sage.

  • Byrne, B. M. (1998). Structural equation modeling with LISREL, PRELIS, and SIMPLIS: Basic concepts, applications, and programming. Mahwah, NJ: Erlbaum.

  • Carpiano, R. M., Lloyd, J. E. V., & Hertzman, C. (2009). Concentrated affluence, concentrated disadvantage, and children’s readiness for school: A population-based, multi-level investigation. Social Science and Medicine, 69, 420–432.

  • Chan, D. (1998). Functional relations among constructs in the same content domain at different levels of analysis: A typology of composition models. Journal of Applied Psychology, 83, 234–246.

  • Chen, G., Mathieu, J. E., & Bliese, P. D. (2004a). A framework for conducting multi-level construct validation. In F. J. Yammarino & F. Dansereau (Eds.), Multi-level issues in organizational behavior and processes (pp. 273–303). The Netherlands: Elsevier.

  • Chen, G., Mathieu, J. E., & Bliese, P. D. (2004b). Validating frogs and ponds in multi-level contexts: Some afterthoughts. In F. J. Yammarino & F. Dansereau (Eds.), Multi-level issues in organizational behavior and processes (pp. 335–343). The Netherlands: Elsevier.

  • Cizek, G. J., & Bunch, M. B. (2007). Standard setting: A guide to establishing and evaluating performance standards on tests. Thousand Oaks, CA: Sage Publications, Inc.

  • Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Mahwah, NJ: Erlbaum.

  • Cronbach, L., & Meehl, P. (1955). Construct validity in psychological tests. Psychological Bulletin, 52, 281–302.

  • Dansereau, F., Alutto, J. A., & Yammarino, F. J. (1984). Theory testing in organizational behavior: The “varient” approach. Englewood Cliffs, NJ: Prentice-Hall.

  • Dansereau, F., Cho, J., & Yammarino, F. J. (2006). Avoiding the fallacy of the wrong level. Group and Organization Management, 31, 536–577.

  • Dansereau, F., & Yammarino, F. J. (2000). Within and between analysis: The varient paradigm as an underlying approach to theory building and testing. In K. J. Klein & S. J. Kozlowski (Eds.), Multilevel theory, research, and methods in organizations (pp. 425–466). San Francisco: Jossey-Bass.

  • Dansereau, F., & Yammarino, F. J. (2004). Overview: Multi-level issues in organizational behavior and processes. In F. J. Yammarino & F. Dansereau (Eds.), Multi-level issues in organizational behavior and processes (pp. xiii–xxxiii). The Netherlands: Elsevier.

  • Dansereau, F., & Yammarino, F. J. (2006). Is more discussion about levels of analysis really necessary? When is such discussion sufficient? The Leadership Quarterly, 17, 537–552.

  • Diez-Roux, A. V. (1998). Bringing context back into epidemiology: Variables and fallacies in multilevel analysis. American Journal of Public Health, 88, 216–222.

  • Duku, E., & Janus, M. (2004). Stability and reliability of the Early Development Instrument: A population-based measure for communities (EDI). Department of Psychiatry and Biobehavioural Sciences, McMaster University, Canada. Annual Research Day.

  • Duncan, G. J., Claessens, A., Huston, A. C., Pagani, L. S., Engel, M., Sexton, H., et al. (2007). School readiness and later achievement. Developmental Psychology, 43, 1428–1446.

  • Ellwein, M. C., Walsh, D. J., Eads, G. M., & Miller, A. (1991). Using readiness tests to route kindergarten students: The snarled intersection of psychometrics, policy, and practice. Educational Evaluation and Policy Analysis, 13, 159–175.

  • Finn, J. D., Pannozzo, G. M., & Achilles, C. M. (2003). The “whys” of class size: Student behavior in small classes. Review of Educational Research, 73, 321–368.

  • Forer, B. (2009). Validation of multilevel constructs: Methods and empirical findings for the Early Development Instrument. Unpublished doctoral dissertation, University of British Columbia.

  • Forget-Dubois, N., Lemelin, J.-P., Boivin, M., Ginette, D., Séguin, J. R., Vitaro, F., et al. (2007). Predicting early school achievement with the EDI: A longitudinal population-based study. Early Education and Development, 18, 405–426.

  • Gorsuch, R. L. (1983). Factor analysis (2nd ed.). Hillsdale, NJ: Erlbaum.

  • Griffith, J. (2002). Is quality/effectiveness an empirically demonstrable school attribute? Statistical aids for determining appropriate levels of analysis. School Effectiveness and School Improvement, 13, 91–122.

  • Grilli, L., & Rampichini, C. (2007). Multilevel factor models for ordinal variables. Structural Equation Modeling, 14, 1–25.

  • Guhn, M., & Goelman, H. (2011). Bioecological theory, early child development, and the validation of the population-level Early Development Instrument. Social Indicators Research. doi:10.1007/s11205-011-9842-5.

  • Guhn, M., Janus, M., & Hertzman, C. (2007). The Early Development Instrument: Translating school readiness assessment into community actions and policy planning. Early Education and Development, 18, 369–374.

  • Hofmann, D. A., & Jones, L. M. (2004). Some foundational and guiding questions for multi-level construct validation. In F. J. Yammarino & F. Dansereau (Eds.), Multi-level issues in organizational behavior and processes (pp. 305–315). The Netherlands: Elsevier.

  • Hox, J. (2002). Multilevel analysis: Techniques and applications. Mahwah, NJ: Erlbaum.

  • Hu, L. T., & Bentler, P. M. (1995). Evaluating model fit. In R. H. Hoyle (Ed.), Structural equation modeling: Concepts, issues, and applications (pp. 76–99). Thousand Oaks, CA: Sage.

  • Hymel, S., LeMare, L., & McKee, W. (2011). The Early Development Instrument (EDI): An examination of convergent and discriminant validity. Social Indicators Research. doi:10.1007/s11205-011-9845-2.

  • Janus, M. (2001). Validation of a teacher measure of school readiness with parent and child-care provider reports. Department of Psychiatry and Biobehavioural Sciences, McMaster University, Canada. Annual Research Day.

  • Janus, M. (2002). Validation of the Early Development Instrument in a sample of First Nations children. Department of Psychiatry and Biobehavioural Sciences, McMaster University, Canada. Annual Research Day.

  • Janus, M., Brinkman, S., Duku, E., Hertzman, C., Santos, R., Sayers, M., et al. (2007). The Early Development Instrument: A population-based measure for communities. A handbook on development, properties, and use. Hamilton, ON: Offord Centre for Child Studies.

  • Janus, M., Harren, T., & Duku, E. (2004). Neighbourhood perspective on school readiness in kindergarten, academic testing in Grade 3, and affluence levels. Paper presented at the McMaster University Psychiatry Research Day, Hamilton, ON.

  • Janus, M., & Offord, D. (2000). Readiness to learn at school. ISUMA Canadian Journal of Policy Research, 1, 71–75.

  • Janus, M., & Offord, D. (2007). Development and psychometric properties of the Early Development Instrument (EDI): A measure of children’s school readiness. Canadian Journal of Behavioural Science, 39, 1–22.

  • Janus, M., Offord, D., & Walsh, C. (2001). Population-level assessment of readiness to learn at school for 5-year-olds in Canada: Relation to child and parent measures. Presented at the Society for Research on Child Development conference, Minneapolis.

  • Janus, M., Walsh, C., & Duku, E. (2005). Early development instrument: Factor structure, sub-domains and Multiple Challenge Index. Department of Psychiatry and Biobehavioural Sciences, McMaster University, Annual Research Day.

  • Janus, M., Walsh, C., Viveiros, H., Duku, E., & Offord, D. (2003). School readiness to learn and neighbourhood characteristics. Poster presented at the Biennial meeting of the Society for Research on Child Development, Tampa, FL.

  • Janus, M., Willms, J. D., & Offord, D. R. (2000). Psychometric properties of the Early Development Instrument (EDI): A teacher-completed measure of children’s readiness to learn at school entry. Unpublished manuscript.

  • Kane, M. T. (2006). Validation. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 17–64). Westport, CT: American Council on Education/Praeger.

  • Keating, D. P. (2007). Formative evaluation of the Early Development Instrument: Progress and prospects. Early Education and Development, 18(3), 561–570.

  • Keating, D. P., & Hertzman, C. (Eds.). (1999). Developmental health and the wealth of nations. New York: Guilford.

  • Kershaw, P., & Forer, B. (2006). What are the social and economic indicators of nurturing neighborhoods? Poster presented at the American Educational Research Association conference, San Francisco, California.

  • Kershaw, P., Irwin, L., Trafford, K., & Hertzman, C. (2005). The British Columbia atlas of child development. Vancouver, BC: Human Early Learning Partnership.

  • Kim, K. (2004). An additional view of conducting multi-level construct validation. In F. J. Yammarino & F. Dansereau (Eds.), Multi-level issues in organizational behavior and processes (pp. 317–333). The Netherlands: Elsevier.

  • Kim, J., & Suen, H. K. (2003). Predicting children’s academic achievement from early assessment scores: A validity generalization study. Early Childhood Research Quarterly, 18, 547–566.

  • Klein, K. J., Bliese, P. D., Kozlowski, S. W. J., Dansereau, F., Gavin, M. B., Griffin, M. A., et al. (2000). Multilevel analytic techniques: Commonalities, differences, and continuing questions. In K. J. Klein & S. W. J. Kozlowski (Eds.), Multilevel theory, research, and methods in organizations (pp. 512–553). San Francisco: Jossey-Bass.

  • Klein, K. J., Dansereau, F., & Hall, R. J. (1994). Levels issues in theory development, data collection, and analysis. Academy of Management Review, 19, 195–229.

  • Kreiger, J. (2003). Class size reduction: Implementation and solutions. Paper presented at the SERVE Research and Policy Class Size Symposium, Raleigh, SC.

  • LaPointe, V. R. (2006). Conceptualizing and examining the impact of neighbourhoods on the school readiness of kindergarten children in British Columbia. Unpublished Doctoral Dissertation, Vancouver, The University of British Columbia.

  • LaPointe, V. R., Ford, L., & Zumbo, B. D. (2007). Examining the relationship between neighbourhood environment and school readiness for kindergarten children. Early Education and Development, 18, 473–496.

  • Linn, R. L. (2009). The concept of validity in the context of NCLB. In R. W. Lissitz (Ed.), The concept of validity: Revisions, new directions and applications (pp. 195–212). Charlotte, NC: IAP-Information Age Publishing, Inc.

  • Lissitz, R. W. (2009). The concept of validity: Revisions, new directions and applications. Charlotte, NC: IAP-Information Age Publishing, Inc.

  • Lloyd, J. E. V., & Hertzman, C. (2009). From Kindergarten readiness to fourth-grade assessment: Longitudinal analysis with linked population data. Social Science and Medicine, 68, 111–123.

  • Mashburn, A. J., Hamre, B. K., Downer, J. T., & Pianta, R. C. (2006). Teacher and classroom characteristics associated with teachers’ ratings of prekindergartners’ relationships and behaviors. Journal of Psychoeducational Assessment, 24, 367–380.

  • Meisels, S. J. (1997). Using Work Sampling in authentic performance assessments. Educational Leadership, 54, 60–65.

  • Meisels, S. J. (1999). Assessing readiness. In R. C. Pianta & M. J. Cox (Eds.), The transition to kindergarten (pp. 39–66). Baltimore, MD: Paul H. Brookes.

  • Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. 13–103). New York: Macmillan.

  • Muthen, B. O. (1994). Multilevel covariance structure analysis. Sociological Methods and Research, 22, 376–398.

  • Muthen, L. K., & Muthen, B. O. (2006). Mplus User’s Guide, version 4. Los Angeles: Muthen and Muthen.

  • National Research Council Institute of Medicine. (2000). From neurons to neighbourhoods: The science of early childhood development. In J. P. Shonkoff & D. A. Phillips (Eds.), Committee on integrating the science of early childhood development (pp. 328–336). Washington, D.C.: National Academy Press.

  • O’Brien, R. M. (1990). Estimating the reliability of aggregate-level variables based on individual-level characteristics. Sociological Methods and Research, 18, 473–504.

  • Ostroff, C. (1992). The relationship between satisfaction, attitudes and performance: An organizational-level analysis. Journal of Applied Psychology, 77, 963–974.

  • Pedder, D. (2006). Are small classes better? Understanding relationships between class size, classroom processes, and pupils’ learning. Oxford Review of Education, 32, 213–234.

  • Pellegrini, A., & Blatchford, P. (2000). The child at school: Interactions with peers and teachers. London: Edward Arnold.

  • Province of British Columbia (2008). Strategic plan 2008/09–2010/11. Retrieved from http://www.bcbudget.gov.bc.ca/2008/stplan/2008_Strategic_Plan.pdf on September 18, 2008.

  • Rimm-Kaufman, S. E., & Pianta, R. C. (2000). An ecological perspective on the transition to kindergarten: A theoretical framework to guide empirical research. Journal of Applied Developmental Psychology, 21, 491–511.

  • Robinson, W. S. (1950). Ecological correlations and the behavior of individuals. American Sociological Review, 15, 351–357.

  • Rowan, B., Raudenbush, S. W., & Kang, S. K. (1991). School climate in secondary schools. In S. W. Raudenbush & J. D. Willms (Eds.), Schools, classrooms, and pupils: International studies of schooling from a multilevel perspective (pp. 203–223). San Diego: Academic.

  • Satorra, A., & Bentler, P. M. (1999). A scaled difference Chi-square test statistic for moment structure analysis. Technical Report, University of California, Los Angeles. http://preprints.stat.ucla.edu/260/chisquare.pdf.

  • Stevens, J. P. (2009). Applied multivariate statistics for the social sciences (5th ed.). New York: Routledge.

  • Vygotsky, L. S. (1978). Mind in society: The development of higher psychological processes. M. Cole, V. John-Steiner, S. Scribner, & E. Souberman (Eds.), Cambridge, MA: Harvard University Press.

  • Zumbo, B. D. (2007). Validity: Foundational issues and statistical methodology. In C. R. Rao & S. Sinharay (Eds.), Handbook of Statistics, Vol. 26: Psychometrics (pp. 45–79). Elsevier Science B.V.: The Netherlands.

  • Zumbo, B. D. (2009). Validity as contextualized and pragmatic explanation, and its implications for validation practice. In R. W. Lissitz (Ed.), The concept of validity: Revisions, new directions and applications (pp. 65–82). Charlotte, NC: IAP-Information Age Publishing, Inc.

  • Zumbo, B. D., & Forer, B. (2011). Testing and measurement from a multilevel view: Psychometrics and validation. In J. Bovaird, K. Geisinger, & C. Buckendahl (Eds.), High stakes testing in education: Science and practice in K-12 settings (Festschrift to Barbara Plake). American Psychological Press: Washington, D.C.

Author information

Corresponding author

Correspondence to Barry Forer.

Appendices

Appendix A

Steps in the Calculation of the Root Mean Square Residual–Proportion (RMR-P) Fit Statistic

The calculation of the RMR-P follows the same basic procedural steps used to calculate the Standardized Root Mean Square Residual fit statistic (SRMR; Hu and Bentler 1995), which is based on residual covariances.

  1. The calculation begins with the residual proportions that result from conducting a CFA with categorical items. For each pair of items, there will be c₁ × c₂ residual proportions, where c₁ and c₂ are the numbers of categories in the first and second items of the pair. For example, for two items each with three categories, there will be nine residual proportions per item pair. For a domain with x items, there are x(x − 1)/2 item pairs. Therefore, for the Emotional Maturity domain, which has 30 items each with three categories, there will be 435 item pairs, each with nine residual proportions, for a total of 3,915 data points.

  2. Because the residual proportions for any item pair necessarily sum to zero, their simple average is always zero and cannot indicate misfit. The relative size of the residuals is therefore assessed by squaring them and later taking the square root. The next step is thus to square all of the residual proportions.

  3. The mean of all the squared residuals is then calculated.

  4. Finally, the square root of the mean squared residual is taken, resulting in the unscaled RMR-P statistic.

  5. The theoretical maximum value of the unscaled RMR-P depends on the number of categories in each item (as shown below). Because some EDI items are binary and others are three-category ordinal, a scaling factor needs to be applied to set a common scale for the RMR-P for each domain. The calculation of this scaling factor is as follows:

    a. For any item pair, regardless of the number of categories, the maximum sum of the squared residual proportions is 2. This follows mathematically from the observations that the sum of all observed proportions must equal 1, and the sum of all residual proportions must equal 0. The poorest possible model fit occurs when one residual proportion is equal to 1, a second residual proportion is equal to −1, and the rest of the residuals are equal to 0. In this case, the sum of the squared residual proportions is equal to 2 [i.e., 1² + (−1)²]. Any other combination of possible residual proportions will result in a lower sum of squares.

    b. For a domain with only binary items (i.e., Language and Cognitive Development), the maximum unscaled RMR-P is 0.71. This is because, in step 3, the maximum mean of the squared residuals would be 0.5 (a maximum sum of 2 spread over the four squared residual proportions of each item pair); the square root of 0.5 (step 4) is 0.71. For a domain with only three-category items (i.e., Social Competence, Emotional Maturity, and Communication and General Knowledge), the maximum unscaled RMR-P is 0.47, based on a maximum mean of the squared residuals of 0.22 (a maximum sum of 2 spread over the nine squared residual proportions of each item pair); the square root of 0.22 is 0.47.

    c. The only EDI domain with a mixture of binary and three-category items is Physical Health and Well-being. The maximum unscaled RMR-P for this domain is 0.59, based on weighting the number of item pairs with four (2 × 2), six (2 × 3), and nine (3 × 3) residual proportions.

    d. Once the maximum unscaled RMR-P has been calculated for each domain, these maxima can be used as scaling factors, so that for all domains the RMR-P statistic ranges from 0 to 1. The unscaled RMR-P statistic resulting from step 4 is therefore divided by the appropriate scaling factor for each domain.

    e. These final RMR-P statistics are now on the same scale used for the SRMR. This does not mean that RMR-P scores are in any way standardized, but it does result in a metric with the same range as the SRMR. On the basis of this similar metric, and given that the SRMR fit statistic is itself not based on any theoretical distribution, the same conventional .05 cutoff is then applied to the scaled RMR-P statistic when assessing model fit.
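As an illustration only, the steps above can be sketched in Python. The function names and the nested-list input format are hypothetical and are not taken from the original analysis, in which the residual proportions come from the output of a categorical CFA. The handling of mixed domains assumes that the weighting described in step 5c amounts to pooling all cells across item pairs, so that the maximum mean squared residual is twice the number of item pairs divided by the total number of cells. This reproduces the 0.71 and 0.47 maxima for the all-binary and all-three-category cases, but it is an interpretation rather than a documented procedure.

```python
import math
from itertools import combinations


def pairs_and_cells(categories):
    """Step 1: count item pairs and residual proportions (cells) in a domain.

    `categories` lists the number of response categories of each item,
    e.g. [3] * 30 for a domain of 30 three-category items.
    """
    pairs = list(combinations(categories, 2))
    n_cells = sum(c1 * c2 for c1, c2 in pairs)
    return len(pairs), n_cells


def unscaled_rmr_p(residual_tables):
    """Steps 2-4: square every residual proportion, average, take the root.

    `residual_tables` holds one flat list of residual proportions per
    item pair (c1 * c2 values each).
    """
    flat = [r for table in residual_tables for r in table]
    mean_square = sum(r * r for r in flat) / len(flat)
    return math.sqrt(mean_square)


def max_unscaled_rmr_p(categories):
    """Step 5a-5c: theoretical maximum of the unscaled statistic.

    Assumption: each item pair contributes at most 2 to the sum of
    squared residuals, and the mean is taken over all pooled cells.
    """
    n_pairs, n_cells = pairs_and_cells(categories)
    return math.sqrt(2 * n_pairs / n_cells)


def scaled_rmr_p(residual_tables, categories):
    """Step 5d: divide by the domain maximum so the range is 0 to 1."""
    return unscaled_rmr_p(residual_tables) / max_unscaled_rmr_p(categories)
```

For a domain of 30 three-category items, `pairs_and_cells([3] * 30)` gives 435 item pairs and 3,915 cells, matching the counts in step 1. The scaled value returned by `scaled_rmr_p` would then be compared with the .05 cutoff described in step 5e.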

About this article

Cite this article

Forer, B., Zumbo, B.D. Validation of Multilevel Constructs: Validation Methods and Empirical Findings for the EDI. Soc Indic Res 103, 231–265 (2011). https://doi.org/10.1007/s11205-011-9844-3

  • DOI: https://doi.org/10.1007/s11205-011-9844-3
