Abstract
The purposes of this paper are to highlight the foundations of multilevel construct validation, to describe two methodological approaches and their associated analytic techniques, and to apply these approaches and techniques to the multilevel construct validation of a widely used school readiness measure, the Early Development Instrument (EDI; Janus and Offord 2007). Validation evidence is presented regarding the multilevel covariance structure of the EDI, the appropriateness of aggregation to the classroom and neighbourhood levels, and the effects of teacher and classroom characteristics on these structural patterns. The results are then discussed in the context of the theoretical framework of the EDI, with suggestions for future validation work.
References
Andrich, D., & Styles, I. (2004). Report on the psychometric properties of the Early Development Instrument (EDI) using the Rasch model. A technical paper commissioned for the development of the Australian Early Development Instrument (AEDI), Murdoch University.
Beswick, J. F., Sloat, E. A., & Willms, J. D. (2004). A comparative study of bias in teacher ratings of language and emergent literacy skills. Unpublished manuscript.
Blatchford, P., Bassett, P., Goldstein, H., & Martin, C. (2003). Are class size differences related to pupils’ educational progress and classroom processes? Findings from the institute of education class size study of children aged 5 to 7 years old. British Educational Research Journal, 29, 709–730.
Blatchford, P., Russell, A., Bassett, P., Brown, P., & Martin, C. (2007). The effect of class size on the teaching of pupils aged 7–11 years old. School Effectiveness and School Improvement, 18, 147–172.
Bliese, P. D. (2000). Within-group agreement, non-independence, and reliability: Implications for data aggregation and analysis. In K. J. Klein & S. W. J. Kozlowski (Eds.), Multilevel theory, research, and methods in organizations (pp. 349–381). San Francisco: Jossey-Bass.
Bliese, P. D., & Halverson, R. R. (1998). Group size and measures of group-level properties: An examination of eta-squared and ICC values. Journal of Management, 24, 157–172.
Bollen, K., & Lennox, R. (1991). Conventional wisdom on measurement: A structural equation perspective. Psychological Bulletin, 110, 305–314.
Brinkman, S., Silburn, S., Lawrence, D., Goldfeld, S., Sayers, M., & Oberklaid, F. (2007). Investigating the validity of the Australian Early Development Index. Early Education and Development, 18, 427–451.
Bronfenbrenner, U. (1977). Toward an experimental ecology of human development. American Psychologist, 32, 513–531.
Bryk, A. S., & Raudenbush, S. W. (1992). Hierarchical linear models. Thousand Oaks, CA: Sage.
Byrne, B. M. (1998). Structural equation modeling with LISREL, PRELIS, and SIMPLIS: Basic concepts, applications, and programming. Mahwah, NJ: Erlbaum.
Carpiano, R. M., Lloyd, J. E. V., & Hertzman, C. (2009). Concentrated affluence, concentrated disadvantage, and children’s readiness for school: A population-based, multi-level investigation. Social Science and Medicine, 69, 420–432.
Chan, D. (1998). Functional relations among constructs in the same content domain at different levels of analysis: A typology of composition models. Journal of Applied Psychology, 83, 234–246.
Chen, G., Mathieu, J. E., & Bliese, P. D. (2004a). A framework for conducting multi-level construct validation. In F. J. Yammarino & F. Dansereau (Eds.), Multi-level issues in organizational behavior and processes (pp. 273–303). The Netherlands: Elsevier.
Chen, G., Mathieu, J. E., & Bliese, P. D. (2004b). Validating frogs and ponds in multi-level contexts: Some afterthoughts. In F. J. Yammarino & F. Dansereau (Eds.), Multi-level issues in organizational behavior and processes (pp. 335–343). The Netherlands: Elsevier.
Cizek, G. J., & Bunch, M. B. (2007). Standard setting: A guide to establishing and evaluating performance standards on tests. Thousand Oaks, CA: Sage Publications, Inc.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Mahwah, NJ: Erlbaum.
Cronbach, L., & Meehl, P. (1955). Construct validity in psychological tests. Psychological Bulletin, 52, 281–302.
Dansereau, F., Alutto, J. A., & Yammarino, F. J. (1984). Theory testing in organizational behavior: The “varient” approach. Englewood Cliffs, NJ: Prentice-Hall.
Dansereau, F., Cho, J., & Yammarino, F. J. (2006). Avoiding the fallacy of the wrong level. Group and Organization Management, 31, 536–577.
Dansereau, F., & Yammarino, F. J. (2000). Within and between analysis: The varient paradigm as an underlying approach to theory building and testing. In K. J. Klein & S. W. J. Kozlowski (Eds.), Multilevel theory, research, and methods in organizations (pp. 425–466). San Francisco: Jossey-Bass.
Dansereau, F., & Yammarino, F. J. (2004). Overview: Multi-level issues in organizational behavior and processes. In F. J. Yammarino & F. Dansereau (Eds.), Multi-level issues in organizational behavior and processes (pp. xiii–xxxiii). The Netherlands: Elsevier.
Dansereau, F., & Yammarino, F. J. (2006). Is more discussion about levels of analysis really necessary? When is such discussion sufficient? The Leadership Quarterly, 17, 537–552.
Diez-Roux, A. V. (1998). Bringing context back into epidemiology: Variables and fallacies in multilevel analysis. American Journal of Public Health, 88, 216–222.
Duku, E., & Janus, M. (2004). Stability and reliability of the Early Development Instrument: A population-based measure for communities (EDI). Department of Psychiatry and Biobehavioural Sciences, McMaster University, Canada. Annual Research Day.
Duncan, G. J., Claessens, A., Huston, A. C., Pagani, L. S., Engel, M., Sexton, H., et al. (2007). School readiness and later achievement. Developmental Psychology, 43, 1428–1446.
Ellwein, M. C., Walsh, D. J., Eads, G. M., & Miller, A. (1991). Using readiness tests to route kindergarten students: The snarled intersection of psychometrics, policy, and practice. Educational Evaluation and Policy Analysis, 13, 159–175.
Finn, J. D., Pannozzo, G. M., & Achilles, C. M. (2003). The “whys” of class size: Student behavior in small classes. Review of Educational Research, 73, 321–368.
Forer, B. (2009). Validation of multilevel constructs: Methods and empirical findings for the Early Development Instrument. Unpublished doctoral dissertation, University of British Columbia.
Forget-Dubois, N., Lemelin, J.-P., Boivin, M., Ginette, D., Séguin, J. R., Vitaro, F., et al. (2007). Predicting early school achievement with the EDI: A longitudinal population-based study. Early Education and Development, 18, 405–426.
Gorsuch, R. L. (1983). Factor analysis (2nd ed.). Hillsdale, NJ: Erlbaum.
Griffith, J. (2002). Is quality/effectiveness an empirically demonstrable school attribute? Statistical aids for determining appropriate levels of analysis. School Effectiveness and School Improvement, 13, 91–122.
Grilli, L., & Rampichini, C. (2007). Multilevel factor models for ordinal variables. Structural Equation Modeling, 14, 1–25.
Guhn, M., & Goelman, H. (2011). Bioecological theory, early child development, and the validation of the population-level Early Development Instrument. Social Indicators Research. doi:10.1007/s11205-011-9842-5.
Guhn, M., Janus, M., & Hertzman, C. (2007). The Early Development Instrument: Translating school readiness assessment into community actions and policy planning. Early Education and Development, 18, 369–374.
Hofmann, D. A., & Jones, L. M. (2004). Some foundational and guiding questions for multi-level construct validation. In F. J. Yammarino & F. Dansereau (Eds.), Multi-level issues in organizational behavior and processes (pp. 305–315). The Netherlands: Elsevier.
Hox, J. (2002). Multilevel analysis: Techniques and applications. Mahwah, NJ: Erlbaum.
Hu, L. T., & Bentler, P. M. (1995). Evaluating model fit. In R. H. Hoyle (Ed.), Structural equation modeling: Concepts, issues, and applications (pp. 76–99). Thousand Oaks, CA: Sage.
Hymel, S., LeMare, L., & McKee, W. (2011). The Early Development Instrument (EDI): An examination of convergent and discriminant validity. Social Indicators Research. doi:10.1007/s11205-011-9845-2.
Janus, M. (2001). Validation of a teacher measure of school readiness with parent and child-care provider reports. Department of Psychiatry and Biobehavioural Sciences, McMaster University, Canada. Annual Research Day.
Janus, M. (2002). Validation of the Early Development Instrument in a sample of First Nations children. Department of Psychiatry and Biobehavioural Sciences, McMaster University, Canada. Annual Research Day.
Janus, M., Brinkman, S., Duku, E., Hertzman, C., Santos, R., Sayers, M., et al. (2007). The Early Development Instrument: A population-based measure for communities. A handbook on development, properties, and use. Hamilton, ON: Offord Centre for Child Studies.
Janus, M., Harren, T., & Duku, E. (2004). Neighbourhood perspective on school readiness in kindergarten, academic testing in Grade 3, and affluence levels. Paper presented at the McMaster University Psychiatry Research Day, Hamilton, ON.
Janus, M., & Offord, D. (2000). Readiness to learn at school. ISUMA Canadian Journal of Policy Research, 1, 71–75.
Janus, M., & Offord, D. (2007). Development and psychometric properties of the Early Development Instrument (EDI): A measure of children’s school readiness. Canadian Journal of Behavioural Science, 39, 1–22.
Janus, M., Offord, D., & Walsh, C. (2001). Population-level assessment of readiness to learn at school for 5-year-olds in Canada: Relation to child and parent measures. Presented at the Society for Research on Child Development conference, Minneapolis.
Janus, M., Walsh, C., & Duku, E. (2005). Early development instrument: Factor structure, sub-domains and Multiple Challenge Index. Department of Psychiatry and Biobehavioural Sciences, McMaster University, Annual Research Day.
Janus, M., Walsh, C., Viveiros, H., Duku, E., & Offord, D. (2003). School readiness to learn and neighbourhood characteristics. Poster presented at the Biennial meeting of the Society for Research on Child Development, Tampa, FL.
Janus, M., Willms, J. D., & Offord, D. R. (2000). Psychometric properties of the Early Development Instrument (EDI): A teacher-completed measure of children’s readiness to learn at school entry. Unpublished manuscript.
Kane, M. T. (2006). Validation. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 17–64). Westport, CT: American Council on Education/Praeger.
Keating, D. P. (2007). Formative evaluation of the Early Development Instrument: Progress and prospects. Early Education and Development, 18(3), 561–570.
Keating, D. P., & Hertzman, C. (Eds.). (1999). Developmental health and the wealth of nations. New York: Guilford.
Kershaw, P., & Forer, B. (2006). What are the social and economic indicators of nurturing neighborhoods? Poster presented at the American Educational Research Association conference, San Francisco, California.
Kershaw, P., Irwin, L., Trafford, K., & Hertzman, C. (2005). The British Columbia atlas of child development. Vancouver, BC: Human Early Learning Partnership.
Kim, K. (2004). An additional view of conducting multi-level construct validation. In F. J. Yammarino & F. Dansereau (Eds.), Multi-level issues in organizational behavior and processes (pp. 317–333). The Netherlands: Elsevier.
Kim, J., & Suen, H. K. (2003). Predicting children’s academic achievement from early assessment scores: A validity generalization study. Early Childhood Research Quarterly, 18, 547–566.
Klein, K. J., Bliese, P. D., Kozlowski, S. W. J., Dansereau, F., Gavin, M. B., Griffin, M. A., et al. (2000). Multilevel analytic techniques: Commonalities, differences, and continuing questions. In K. J. Klein & S. W. J. Kozlowski (Eds.), Multilevel theory, research, and methods in organizations (pp. 512–553). San Francisco: Jossey-Bass.
Klein, K. J., Dansereau, F., & Hall, R. J. (1994). Levels issues in theory development, data collection, and analysis. Academy of Management Review, 19, 195–229.
Kreiger, J. (2003). Class size reduction: Implementation and solutions. Paper presented at the SERVE Research and Policy Class Size Symposium, Raleigh, NC.
LaPointe, V. R. (2006). Conceptualizing and examining the impact of neighbourhoods on the school readiness of kindergarten children in British Columbia. Unpublished Doctoral Dissertation, Vancouver, The University of British Columbia.
LaPointe, V. R., Ford, L., & Zumbo, B. D. (2007). Examining the relationship between neighbourhood environment and school readiness for kindergarten children. Early Education and Development, 18, 473–496.
Linn, R. L. (2009). The concept of validity in the context of NCLB. In R. W. Lissitz (Ed.), The concept of validity: Revisions, new directions and applications (pp. 195–212). Charlotte, NC: IAP-Information Age Publishing, Inc.
Lissitz, R. W. (2009). The concept of validity: Revisions, new directions and applications. Charlotte, NC: IAP-Information Age Publishing, Inc.
Lloyd, J. E. V., & Hertzman, C. (2009). From Kindergarten readiness to fourth-grade assessment: Longitudinal analysis with linked population data. Social Science and Medicine, 68, 111–123.
Mashburn, A. J., Hamre, B. K., Downer, J. T., & Pianta, R. C. (2006). Teacher and classroom characteristics associated with teachers’ ratings of prekindergartners’ relationships and behaviors. Journal of Psychoeducational Assessment, 24, 367–380.
Meisels, S. J. (1997). Using Work Sampling in authentic performance assessments. Educational Leadership, 54, 60–65.
Meisels, S. J. (1999). Assessing readiness. In R. C. Pianta & M. J. Cox (Eds.), The transition to kindergarten (pp. 39–66). Baltimore, MD: Paul H. Brookes.
Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. 13–103). New York: Macmillan.
Muthen, B. O. (1994). Multilevel covariance structure analysis. Sociological Methods and Research, 22, 376–398.
Muthen, L. K., & Muthen, B. O. (2006). Mplus User’s Guide, version 4. Los Angeles: Muthen and Muthen.
National Research Council Institute of Medicine. (2000). From neurons to neighbourhoods: The science of early childhood development. In J. P. Shonkoff & D. A. Phillips (Eds.), Committee on integrating the science of early childhood development (pp. 328–336). Washington, D.C.: National Academy Press.
O’Brien, R. M. (1990). Estimating the reliability of aggregate-level variables based on individual-level characteristics. Sociological Methods and Research, 18, 473–504.
Ostroff, C. (1992). The relationship between satisfaction, attitudes and performance: An organizational-level analysis. Journal of Applied Psychology, 77, 963–974.
Pedder, D. (2006). Are small classes better? Understanding relationships between class size, classroom processes, and pupils’ learning. Oxford Review of Education, 32, 213–234.
Pellegrini, A., & Blatchford, P. (2000). The child at school: Interactions with peers and teachers. London: Edward Arnold.
Province of British Columbia (2008). Strategic plan 2008/09 – 2010/11. Retrieved from http://www.bcbudget.gov.bc.ca/2008/stplan/2008_Strategic_Plan.pdf on September 18, 2008.
Rimm-Kaufman, S. E., & Pianta, R. C. (2000). An ecological perspective on the transition to kindergarten: A theoretical framework to guide empirical research. Journal of Applied Developmental Psychology, 21, 491–511.
Robinson, W. S. (1950). Ecological correlations and the behavior of individuals. American Sociological Review, 15, 351–357.
Rowan, B., Raudenbush, S. W., & Kang, S. K. (1991). School climate in secondary schools. In S. W. Raudenbush & J. D. Willms (Eds.), Schools, classrooms, and pupils: International studies of schooling from a multilevel perspective (pp. 203–223). San Diego: Academic.
Satorra, A., & Bentler, P. M. (1999). A scaled difference Chi-square test statistic for moment structure analysis. Technical Report, University of California, Los Angeles. http://preprints.stat.ucla.edu/260/chisquare.pdf.
Stevens, J. P. (2009). Applied multivariate statistics for the social sciences (5th ed.). New York: Routledge.
Vygotsky, L. S. (1978). Mind in society: The development of higher psychological processes. M. Cole, V. John-Steiner, S. Scribner, & E. Souberman (Eds.), Cambridge, MA: Harvard University Press.
Zumbo, B. D. (2007). Validity: Foundational issues and statistical methodology. In C. R. Rao & S. Sinharay (Eds.), Handbook of Statistics, Vol. 26: Psychometrics (pp. 45–79). The Netherlands: Elsevier Science B.V.
Zumbo, B. D. (2009). Validity as contextualized and pragmatic explanation, and its implications for validation practice. In R. W. Lissitz (Ed.), The concept of validity: Revisions, new directions and applications (pp. 65–82). Charlotte, NC: IAP-Information Age Publishing, Inc.
Zumbo, B. D., & Forer, B. (2011). Testing and measurement from a multilevel view: Psychometrics and validation. In J. Bovaird, K. Geisinger, & C. Buckendahl (Eds.), High stakes testing in education—science and practice in K-12 settings (Festschrift to Barbara Plake). Washington, D.C.: American Psychological Press.
Appendices
Appendix A
Steps in the Calculation of the Root Mean Square Residual–Proportion (RMR-P) Fit Statistic
The calculation of the RMR-P follows the same basic procedural steps used to calculate the Standardized Root Mean Square Residual fit statistic (SRMR; Hu and Bentler 1995), which is based on residual covariances. The RMR-P is instead based on residual proportions.
1. The calculation begins with the residual proportions that result from conducting a CFA with categorical items. For each pair of items, there will be c1 × c2 residual proportions, where c1 and c2 are the numbers of categories in items 1 and 2. For example, for two items each with three categories, there will be nine residual proportions per item pair. For a domain with x items, there are x(x − 1)/2 item pairs. The Emotional Maturity domain, for example, has 30 items each with three categories, so there are 435 item pairs, each with nine residual proportions, for a total of 3,915 data points.
2. Because the residual proportions for any item pair necessarily sum to zero, their size is assessed by squaring them and later taking the square root. The next step is therefore to square all of the residual proportions.
3. The mean of all the squared residuals is then calculated.
4. Finally, the square root of the mean squared residual is taken, resulting in the unscaled RMR-P statistic.
5. The theoretical maximum value of the unscaled RMR-P depends on the number of categories in each item (see steps a to c). Because some EDI items are binary and others are three-category ordinal, a scaling factor needs to be applied to set a common scale for the RMR-P for each domain. The calculation of this scaling factor is as follows:
   a. For any item pair, regardless of the number of categories, the maximum sum of the squared residual proportions is 2. This follows mathematically from the observations that the sum of all observed proportions must equal 1, and the sum of all residual proportions must equal 0. The poorest possible model fit occurs when one residual proportion is equal to 1, a second residual proportion is equal to −1, and the rest of the residuals are equal to 0. In this case, the sum of the squared residual proportions is equal to 2 [i.e., 1² + (−1)²]. Any other combination of possible residual proportions will result in a lower sum of squares.
   b. For a domain with only binary items (i.e., Language and Cognitive Development), the maximum unscaled RMR-P is 0.71. This is because in step 3, the maximum mean of the squared residuals would be 0.5 (a maximum sum of 2 over the four squared residual proportions of each item pair); the square root of 0.5 (step 4) is 0.71. For a domain with only three-category items (i.e., Social Competence, Emotional Maturity, and Communication and General Knowledge), the maximum unscaled RMR-P is 0.47, based on a maximum mean of the squared residuals of 0.22 (a maximum sum of 2 over the nine squared residual proportions of each item pair); the square root of 0.22 is 0.47.
   c. The only EDI domain with a mixture of binary and three-category items is Physical Health and Well-being. The maximum unscaled RMR-P for this domain is 0.59, based on weighting the number of item pairs with four (2 × 2), six (2 × 3), and nine (3 × 3) residual proportions.
   d. Having calculated the maximum unscaled RMR-P for each domain, these maxima can then be used as scaling factors, so that for all domains the RMR-P statistic ranges from 0 to 1. The unscaled RMR-P statistics resulting from step 4 are therefore divided by the appropriate scaling factor for each domain.
   e. These final RMR-P statistics are now on the same scale used for the SRMR. This does not mean that RMR-P scores are in any way standardized, but it does result in a metric with the same range as the SRMR. On the basis of this similar metric, and given that the SRMR fit statistic is itself not based on any theoretical distribution, the same conventional 0.05 cutoff is then applied to the scaled RMR-P statistic when assessing model fit.
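The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names and the example residual table are hypothetical, and the domain maximum assumes that the "weighting" in step 5c means pooling all item pairs, with each pair contributing at most 2 to the sum of squares across its c1 × c2 residual proportions. Under that assumption the binary-only and three-category-only maxima reduce to the square roots of 2/4 and 2/9 given in step 5b.

```python
import math
from itertools import combinations


def unscaled_rmr_p(residual_tables):
    """Steps 2-4: square each residual proportion, average, take the root.

    residual_tables holds one c1 x c2 table (list of rows) per item pair.
    """
    squared = [r * r for table in residual_tables for row in table for r in row]
    return math.sqrt(sum(squared) / len(squared))


def max_unscaled_rmr_p(category_counts):
    """Step 5a-c: largest possible unscaled value for a domain.

    category_counts lists the number of response categories of each item.
    Assumption: every item pair reaches the worst-case sum of squares of 2,
    and the mean is taken over all residual proportions in the domain.
    """
    pairs = list(combinations(category_counts, 2))
    total_cells = sum(c1 * c2 for c1, c2 in pairs)
    return math.sqrt(2 * len(pairs) / total_cells)


def scaled_rmr_p(residual_tables, category_counts):
    """Step 5d: divide the unscaled statistic by the domain maximum."""
    return unscaled_rmr_p(residual_tables) / max_unscaled_rmr_p(category_counts)


if __name__ == "__main__":
    # Domain maxima for binary-only and three-category-only domains (step 5b).
    print(round(max_unscaled_rmr_p([2, 2, 2]), 2))  # 0.71
    print(round(max_unscaled_rmr_p([3] * 30), 2))   # 0.47

    # Hypothetical residual table for a single pair of binary items.
    # The four residual proportions sum to zero, as step 2 requires.
    example = [[[0.02, -0.02], [-0.02, 0.02]]]
    print(scaled_rmr_p(example, [2, 2]))
```

In practice the residual tables would be read from the output of the software used to fit the categorical CFA; the scaled value would then be compared against the 0.05 cutoff described in step 5e.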
Cite this article
Forer, B., Zumbo, B.D. Validation of Multilevel Constructs: Validation Methods and Empirical Findings for the EDI. Soc Indic Res 103, 231–265 (2011). https://doi.org/10.1007/s11205-011-9844-3
DOI: https://doi.org/10.1007/s11205-011-9844-3