A comparison between traditional methods and multilevel regression for the analysis of multicenter intervention studies

https://doi.org/10.1016/S0895-4356(03)00007-6

Abstract

This article reviews three traditional methods for the analysis of multicenter trials with persons nested within clusters (i.e., centers), namely naïve regression (persons as units of analysis), fixed effects regression, and the use of summary measures (clusters as units of analysis), and compares these methods with multilevel regression. The comparison is made for continuous (quantitative) outcomes and is based on the estimator of the treatment effect and its standard error, because these are usually of main interest in intervention studies. When the results of the experiment have to be valid for some larger population of centers, the centers in the intervention study have to represent a random sample from this population, and multilevel regression may be used. It is shown that the treatment effect, and especially its standard error, are generally incorrectly estimated by the traditional methods, which should, therefore, not in general be used as an alternative to multilevel regression.

Introduction

In the health and medical sciences, experiments are conducted to compare different treatments in terms of outcome variables measuring the health or behavior of individuals. In this article we focus on the situation where the data obtained have a nested or hierarchical structure, which means that individuals are nested within clusters. For example, in a clinical trial on the effect of different antipsychotics on mental health, patients were nested within centers [1]. In a trial where a new approach for the detection and management of hypertension was studied, patients were nested within family practices [2]. Children were nested within villages in a study on the effect of vitamin A supplementation on childhood mortality in North Sumatra [3], and in a smoking prevention intervention, pupils were nested within classes within schools [4], [5]. Outcomes of individuals within the same cluster are likely to be correlated; that is, there will be intracluster correlation.
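To make the notion of intracluster correlation concrete: under a random-intercept model $y_{ij} = \beta_0 + u_j + e_{ij}$ with equal cluster sizes, the intracluster correlation is $\rho = \sigma_u^2 / (\sigma_u^2 + \sigma_e^2)$. With the variance components later used for the generated data ($\sigma_u^2 = 0.16$, $\sigma_e^2 = 1.72$), $\rho \approx 0.085$. The minimal sketch below estimates $\rho$ from data with the classical one-way ANOVA (method-of-moments) estimator rather than REML; the function name is hypothetical.

```python
import numpy as np

def anova_icc(y):
    """One-way ANOVA estimate of the intracluster correlation.

    y: 2-D array of shape (n2, n1), one row per cluster, equal cluster sizes n1.
    """
    y = np.asarray(y, dtype=float)
    n2, n1 = y.shape
    cluster_means = y.mean(axis=1)
    grand_mean = y.mean()
    # Within- and between-cluster mean squares.
    msw = ((y - cluster_means[:, None]) ** 2).sum() / (n2 * (n1 - 1))
    msb = n1 * ((cluster_means - grand_mean) ** 2).sum() / (n2 - 1)
    # Moment estimates of the variance components; sigma_u^2 is truncated at zero.
    sigma_u2 = max((msb - msw) / n1, 0.0)
    sigma_e2 = msw
    return sigma_u2 / (sigma_u2 + sigma_e2)

# Two tiny clusters: MSW = 2 and MSB = 16, so sigma_u^2 = (16 - 2) / 2 = 7
# and the estimated ICC is 7 / (7 + 2).
print(anova_icc([[1, 3], [5, 7]]))  # 7/9, about 0.778
```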

Data from a smoking prevention intervention [4], [5] will be used in this article. To keep things simple we will ignore the nesting of classes within schools leaving two levels of nesting: pupils within classes. Similar but more complicated results hold for three levels of nesting. Of course, the methods presented and conclusions drawn in this article are valid for any kind of experiment where persons are nested within clusters, for instance, multicenter clinical trials with patients nested within clinics. Thus, the reader may replace the words smoking prevention intervention, pupil, and class used in this article by terminology from his/her field of science.

The effect of the smoking prevention intervention on smoking behavior can be estimated and tested with regression, in which the outcome variable is regressed on treatment condition and relevant covariates. In the literature, several types of regression are being used for nested experimental data. Three traditional regression methods are naive regression, fixed effects regression, and regression of summary measures. In the naive regression, pupils are the unit of analysis and their nesting within classes, that is, the dependency among the outcomes of pupils within a class, is ignored. In fixed effects regression, classes are treated as fixed, and their differences are taken into account by dummy coding in the regression equation. Treating classes as fixed implies that statistical inference only takes sampling error at the pupil level into account, not sampling error at the class level, and conclusions are, therefore, limited to the classes in the study. The summary measures method is based upon aggregation of pupil level data within the same treatment condition to the class level, and classes are thus the unit of analysis.
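The three traditional estimators can be written down directly. Below is a minimal NumPy sketch for randomization within classes (Design 1), assuming equal class sizes, outcomes stored as an array of shape (n2, n1) with one row per class, and the (−1, +1) treatment code; all function names are hypothetical. Fixed effects regression is shown with one dummy per class and no common intercept; it cannot be fitted this way under class-level randomization, where the treatment code is constant within a class and therefore collinear with the class dummies.

```python
import numpy as np

def ols(X, y):
    """OLS coefficients and their conventional (independent-errors) standard errors."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (X.shape[0] - X.shape[1])
    return beta, np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))

def naive(y, x):
    """Pupils are the units; the nesting within classes is ignored."""
    X = np.column_stack([np.ones(y.size), x.ravel()])
    beta, se = ols(X, y.ravel())
    return beta[1], se[1]

def fixed_effects(y, x):
    """One dummy per class (no common intercept) plus the treatment code."""
    n2, n1 = y.shape
    dummies = np.kron(np.eye(n2), np.ones((n1, 1)))  # row j*n1 + i belongs to class j
    X = np.column_stack([x.ravel(), dummies])
    beta, se = ols(X, y.ravel())
    return beta[0], se[0]

def summary_measures(y, x):
    """Classes are the units: one contrast (mean_T - mean_C) / 2 per class."""
    d = np.array([(yj[xj == 1].mean() - yj[xj == -1].mean()) / 2
                  for yj, xj in zip(y, x)])
    return d.mean(), d.std(ddof=1) / np.sqrt(d.size)

# Toy Design 1 data: 2 classes of 4 pupils, two treated (+1) and two controls (-1) each.
y = np.array([[3.0, 1.0, 4.0, 2.0],
              [6.0, 5.0, 8.0, 3.0]])
x = np.array([[1, -1, 1, -1],
              [1, 1, -1, -1]])
for method in (naive, fixed_effects, summary_measures):
    print(method.__name__, method(y, x))
```

In this balanced toy layout the treatment code sums to zero within every class, so all three point estimates coincide (0.5 here); the methods differ only in how they compute the standard error.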

Multilevel regression [6], [7], [8], [9] treats pupils as the unit of analysis, but also takes into account the dependence of outcomes of pupils nested within the same class. The multilevel regression model is also referred to as mixed effects regression, random coefficient model [10], or hierarchical linear model [11], and assumes the classes and pupils to represent random samples from some population of classes and pupils within classes, respectively. Under this assumption class and pupil effects must be treated as random effects in the regression model, while treatment condition and covariates may be included as fixed effects.
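A random-intercept multilevel model implies that pupils in the same class share the covariance $\sigma_u^2$, so a class of $n_1$ pupils has covariance matrix $\sigma_e^2 I + \sigma_u^2 J$, with $J$ the all-ones matrix. Given the variance components, the fixed effects follow from generalized least squares (GLS). The sketch below treats the variance components as known (in practice they are estimated, for example by REML); the function name and toy numbers are hypothetical. For equal class sizes and a treatment code that is constant within classes, GLS reproduces the OLS point estimate, and with the (−1, +1) coding the standard error of $\beta_1$ reduces to $\sqrt{(\sigma_u^2 + \sigma_e^2/n_1)/n_2}$.

```python
import numpy as np

def gls_random_intercept(y, X, sigma_u2, sigma_e2):
    """GLS for a random-intercept model with known variance components.

    y: array (n2, n1) of outcomes, one row per class (equal class sizes).
    X: array (n2, n1, p) of fixed-effect covariates (e.g. intercept, treatment).
    Returns the fixed-effect estimates and their standard errors.
    """
    n2, n1, p = X.shape
    # Compound-symmetry covariance shared by every class.
    V_inv = np.linalg.inv(sigma_e2 * np.eye(n1) + sigma_u2 * np.ones((n1, n1)))
    XtVX = np.zeros((p, p))
    XtVy = np.zeros(p)
    for Xj, yj in zip(X, y):
        XtVX += Xj.T @ V_inv @ Xj
        XtVy += Xj.T @ V_inv @ yj
    cov = np.linalg.inv(XtVX)
    return cov @ XtVy, np.sqrt(np.diag(cov))

# Toy class-randomized layout: 4 classes of 3 pupils, treatment constant within a class.
xc = np.array([1.0, -1.0, 1.0, -1.0])
X = np.stack([np.ones((4, 3)), np.repeat(xc[:, None], 3, axis=1)], axis=-1)
y = np.array([[3.1, 2.8, 3.5], [2.2, 2.9, 2.4], [3.3, 2.6, 3.0], [2.0, 2.7, 2.5]])
beta, se = gls_random_intercept(y, X, sigma_u2=0.16, sigma_e2=1.72)
print(beta, se)
```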

Ideally, the aim of smoking-prevention interventions should be to produce results not only valid for the classes involved in the experiment, but also for a larger population of classes. In that case, the classes involved in the trial have to represent a random sample from the population of classes, and multilevel analysis is a suitable method of analysis. In practice, there may be good reasons for treating classes as fixed, for instance, when the number of classes in the trial is very small, say, fewer than 10 [9], [12]. In this article, however, we will focus on the situation where the classes involved in the trial are treated as a reasonably large random sample from a much larger population of classes.

Multilevel regression is more complex than the more traditional methods, and consequently, investigators may still want to use these traditional methods, even if they want to generalize the results from their trial to all classes in the population. Therefore, a comparison between the traditional methods and multilevel regression in the context of nested experimental data is relevant. In this article, the relationship between the four methods will be discussed, and it will be shown under which circumstances the traditional methods are acceptable, and when and how they may lead to incorrect results. The comparison made in this article is based on a few regression equations and an illustrative example for (a) the estimator of the treatment effect, and (b) its squared standard error, because these two are of main interest in intervention studies. The comparison is made for continuous outcomes, two levels of nesting, and with randomization at either level. For randomization at the class level, classes will be randomly allocated to the treatment conditions, and all pupils within each class receive the same treatment. For randomization at the pupil level, half of the pupils within each class will be randomly assigned to the treatment group while the others will be allocated to the control group.
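The two allocation schemes can be sketched as follows, using the (−1, +1) treatment code. The function names are hypothetical, and the equal split of classes over the two conditions in class-level randomization is an assumption of this sketch.

```python
import numpy as np

def randomize_pupils(n2, n1, rng):
    """Pupil-level randomization: in every class a random half gets +1, the rest -1."""
    if n1 % 2:
        raise ValueError("this sketch assumes an even class size n1")
    base = np.repeat([1, -1], n1 // 2)
    return np.array([rng.permutation(base) for _ in range(n2)])  # shape (n2, n1)

def randomize_classes(n2, n1, rng):
    """Class-level randomization: a random half of the classes gets +1."""
    if n2 % 2:
        raise ValueError("this sketch assumes an even number of classes n2")
    per_class = rng.permutation(np.repeat([1, -1], n2 // 2))
    # Every pupil receives the code of his or her class.
    return np.repeat(per_class[:, None], n1, axis=1)  # shape (n2, n1)

rng = np.random.default_rng(0)
x1 = randomize_pupils(70, 12, rng)
x2 = randomize_classes(70, 12, rng)
print(x1.sum(axis=1).max(), x2[:, 0].sum())  # 0 0: balanced within / across classes
```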

Part of the comparison has already been made by others, but has been published in fragments across various articles [13], [14], [15], [16], [17], [18], [19], [20]. In the present article, these results will be presented systematically, and some gaps in knowledge will be filled. Again, we want to stress that in this article multilevel regression and more traditional methods for experimental data with one posttreatment measurement per person are presented, assuming that the assignment of persons to different conditions is under experimental control. Multilevel regression may also be used for observational and/or longitudinal studies [21], [22].

The remainder of this article is organized as follows. In Section 2 an example data set of a smoking prevention intervention and two different designs for such trials are given. Naive regression, fixed effects regression, and regression of summary measures are presented in Section 3. Section 4 focuses on multilevel regression. In Section 5, the four methods are used to analyze generated data sets, and it is shown that these methods lead to different results. This difference in results will also be explained using a few simple mathematical expressions in the appendix. In Sections 3 to 5 we assume equal class sizes and no covariates; in Section 6 these assumptions will be relaxed. In Section 7 some conclusions will be presented.

Section snippets

Designs and example data set

In principle, randomization and implementation of the two treatments may be done at either level of the hierarchy. Thus, two different designs may be distinguished: Design 1, where randomization is done at the pupil level within each class, and Design 2, where randomization is done at the class level. The latter is often referred to as cluster randomization. For equal class sizes we have a sample of $n_2$ classes and $n_1$ individuals per class. In Design 1, $\tfrac{1}{2}n_1$ pupils per class are randomized to
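Why the choice of design matters for the standard error can be seen from a standard result for Design 2 under a random-intercept model with equal class sizes. Each class mean has variance $\sigma_u^2 + \sigma_e^2/n_1$, whereas a naive analysis of all $n_1 n_2$ pupils implicitly uses $\sigma_u^2 + \sigma_e^2$ per pupil. The ratio of the true to the naively assumed sampling variance is therefore $(\sigma_e^2 + n_1\sigma_u^2)/(\sigma_u^2 + \sigma_e^2) = 1 + (n_1 - 1)\rho$, the familiar design effect, with $\rho$ the intracluster correlation. A minimal sketch with hypothetical function names that checks the two forms agree:

```python
def design_effect(n1, icc):
    """Design effect 1 + (n1 - 1) * icc for equal class sizes n1 (class randomization)."""
    return 1.0 + (n1 - 1) * icc

def variance_ratio(n1, sigma_u2, sigma_e2):
    """True vs. naively assumed variance of the contrast: (sigma_e2 + n1*sigma_u2) / (sigma_u2 + sigma_e2)."""
    return (sigma_e2 + n1 * sigma_u2) / (sigma_u2 + sigma_e2)

icc = 0.16 / (0.16 + 1.72)  # variance components used for the generated data
print(round(design_effect(12, icc), 3))          # 1.936
print(round(variance_ratio(12, 0.16, 1.72), 3))  # 1.936
```

With these values a naive standard error would be roughly $\sqrt{1.936} \approx 1.39$ times too small.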

Traditional methods

Three more traditional regression methods for the analysis of multicenter trial data are naïve regression, fixed effects regression, and regression of summary measures. These methods are presented in this section.

Design 1: randomization at the pupil level

In multilevel modeling, regression equations are formulated for each level (pupil, class) of the multilevel data structure, and are then combined into a single equation. For randomization at the pupil level, the pupil-level equation is given by

$$y_{ij} = \beta_{0j} + \beta_{1j} x_{ij} + e_{ij},$$

where $e_{ij}$ is a random error term at the pupil level, and $i$ and $j$ refer to pupil and class, respectively. Again, the (−1, +1) coding scheme for $x_{ij}$ was used, because of the advantages mentioned in Section 3.1. $\beta_{0j}$ is the mean of $y_{ij}$
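Writing out the class-level equations makes the combination step explicit. The snippet breaks off before they are given, so the form below is an assumption: the usual random intercept and random slope, with variances denoted $\sigma_{u0}^2$ and $\sigma_{u1}^2$ as in the parameter values for the generated data:

$$\beta_{0j} = \beta_0 + u_{0j}, \qquad \beta_{1j} = \beta_1 + u_{1j}, \qquad \operatorname{Var}(u_{0j}) = \sigma_{u0}^2, \quad \operatorname{Var}(u_{1j}) = \sigma_{u1}^2,$$

$$y_{ij} = \beta_0 + \beta_1 x_{ij} + u_{0j} + u_{1j} x_{ij} + e_{ij}.$$

With $\tfrac{1}{2}n_1$ treated and $\tfrac{1}{2}n_1$ control pupils in every class and $u_{1j}$ independent of the pupil-level errors, the within-class contrast $d_j = (\bar y_{j,T} - \bar y_{j,C})/2$ no longer contains $u_{0j}$, and averaging it over classes gives

$$d_j = \beta_1 + u_{1j} + \tfrac{1}{2}\left(\bar e_{j,T} - \bar e_{j,C}\right), \qquad \hat\beta_1 = \frac{1}{n_2}\sum_{j=1}^{n_2} d_j, \qquad \operatorname{Var}(\hat\beta_1) = \frac{1}{n_2}\left(\sigma_{u1}^2 + \frac{\sigma_e^2}{n_1}\right).$$

For $n_2 = 70$, $n_1 = 12$, $\sigma_{u1}^2 = 0.06$ and $\sigma_e^2 = 1.72$ this variance is about $0.0029$, a standard error of about $0.054$.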

Comparison of the four methods

For illustrative purposes we generated a data set with $n_2 = 70$ classes with $n_1 = 12$ pupils each for each level of randomization. We used the parameter values $\beta_0 = 2.34$, $\beta_1 = 0.12$, $\sigma_u^2 = 0.16$ and $\sigma_e^2 = 1.72$. For randomization at the pupil level the variance $\sigma_u^2$ was split into $\sigma_{u0}^2 = 0.1$ and $\sigma_{u1}^2 = 0.06$. These two data sets were analyzed with multilevel regression, naive regression, fixed effects regression, and regression of summary measures. REML estimation as implemented in the computer program MLwiN for
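Data sets of this kind can be generated with a few lines of NumPy. This sketch assumes normal random effects, independent $u_{0j}$ and $u_{1j}$ for pupil-level randomization, balanced allocation (6 treated and 6 control pupils per class; 35 treated and 35 control classes), and the (−1, +1) code; the seed and function names are hypothetical. It does not reproduce the article's data sets or its MLwiN results, and the multilevel fit itself is not shown. For class-level randomization it compares naive regression with summary measures.

```python
import numpy as np

# Parameter values listed in the text.
N2, N1 = 70, 12                    # classes, pupils per class
BETA0, BETA1 = 2.34, 0.12          # intercept, treatment effect (+/-1 coding)
SIGMA_U2, SIGMA_E2 = 0.16, 1.72    # class- and pupil-level variances
SIGMA_U02, SIGMA_U12 = 0.10, 0.06  # pupil-level randomization: intercept and slope variances

def simulate_design1(rng):
    """Randomization at the pupil level: N1/2 treated pupils in every class."""
    x = np.array([rng.permutation(np.repeat([1, -1], N1 // 2)) for _ in range(N2)])
    u0 = rng.normal(0.0, np.sqrt(SIGMA_U02), (N2, 1))  # class intercept deviations
    u1 = rng.normal(0.0, np.sqrt(SIGMA_U12), (N2, 1))  # class slope deviations
    e = rng.normal(0.0, np.sqrt(SIGMA_E2), (N2, N1))
    return BETA0 + u0 + (BETA1 + u1) * x + e, x

def simulate_design2(rng):
    """Randomization at the class level: N2/2 treated classes."""
    xc = rng.permutation(np.repeat([1, -1], N2 // 2))
    x = np.repeat(xc[:, None], N1, axis=1)
    u = rng.normal(0.0, np.sqrt(SIGMA_U2), (N2, 1))
    e = rng.normal(0.0, np.sqrt(SIGMA_E2), (N2, N1))
    return BETA0 + BETA1 * x + u + e, x

def naive_and_summary(y, x):
    """Class randomization only: (estimate, SE) with pupils vs. class means as units.

    Relies on sum(x) == 0 and x**2 == 1, which hold for the balanced allocation above.
    """
    n2 = y.shape[0]
    b_naive = (x * y).sum() / x.size
    r = y - (y.mean() + b_naive * x)
    se_naive = np.sqrt((r ** 2).sum() / (x.size - 2) / x.size)
    m, xc = y.mean(axis=1), x[:, 0]  # one mean and one code per class
    b_sum = (xc * m).sum() / n2
    r_m = m - (m.mean() + b_sum * xc)
    se_sum = np.sqrt((r_m ** 2).sum() / (n2 - 2) / n2)
    return (b_naive, se_naive), (b_sum, se_sum)

rng = np.random.default_rng(2003)
y1, x1 = simulate_design1(rng)
y2, x2 = simulate_design2(rng)
print(naive_and_summary(y2, x2))
```

With equal class sizes the two point estimates for class-level randomization are algebraically identical (both are half the difference between treated and control means), so the methods can disagree only in their standard errors.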

Generalization to more complex regression models

The results in the previous section are limited to equal class sizes and regression models with no covariates. Equal class sizes may not be feasible in practice, and covariates often have to be included in the regression model. In this section, these restrictions will be relaxed one at a time. The comparisons are based upon analysis of the TVSFP data, restricted to the Los Angeles pupils in the media or no-treatment control group. Two levels of nesting are taken into account: pupils
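With unequal class sizes the traditional estimators no longer coincide even in their point estimates. Naive regression weights each class by its number of pupils, an unweighted summary-measures analysis weights every class equally, and GLS under a random-intercept model with known variance components weights each class mean by $n_j/(\sigma_e^2 + n_j\sigma_u^2)$, which lies between these two extremes. The toy numbers below are made up for class-level randomization (they are not the TVSFP data), and the function names are hypothetical.

```python
import numpy as np

def pupil_weighted_mean(classes):
    """Mean over all pupils, so larger classes count more (as in naive regression)."""
    return np.concatenate(classes).mean()

def class_mean_average(classes):
    """Unweighted mean of the class means (as in an unweighted summary-measures analysis)."""
    return np.mean([c.mean() for c in classes])

def random_intercept_mean(classes, sigma_u2, sigma_e2):
    """Class means weighted by n_j / (sigma_e2 + n_j * sigma_u2), the GLS weights
    of a random-intercept model with known variance components."""
    w = np.array([c.size / (sigma_e2 + c.size * sigma_u2) for c in classes])
    m = np.array([c.mean() for c in classes])
    return (w * m).sum() / w.sum()

# Made-up outcomes: treated classes of sizes 2 and 4, control classes of sizes 3 and 3.
treated = [np.array([1.0, 1.0]), np.array([3.0, 3.0, 3.0, 3.0])]
control = [np.zeros(3), np.zeros(3)]
for f in (pupil_weighted_mean, class_mean_average):
    print(f.__name__, f(treated) - f(control))  # about 2.333 and 2.0
print(random_intercept_mean(treated, 0.16, 1.72)
      - random_intercept_mean(control, 0.16, 1.72))  # about 2.27
```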

Conclusions

In this study four methods for the analysis of multilevel experimental data were compared: multilevel analysis, naive regression (persons as unit of analysis), fixed-effects regression, and the use of summary measures (clusters as unit of analysis). It was assumed that the conditions for random sampling of clusters from a larger population of clusters were satisfied, so that the experimental results were not only valid for the clusters in the study, but could also be generalized to the

Acknowledgements

We wish to thank Brian R. Flay for his permission to use the TVSFP data, which were collected with funding from the National Institute on Drug Abuse, Grant 1-R01-DA03468 to Brian R. Flay, W. B. Hansen, and C. A. Johnson. We wish to thank Hubert J. A. Schouten and Martin H. Prins for their comments on this article.

References (38)

  • A.S. Bryk et al. Hierarchical linear models (1992).
  • S. Senn. Some controversies in planning and analyzing multi-centre trials. Stat Med (1998).
  • S.W. Raudenbush. Hierarchical linear models and experimental design.
  • A.L. Gould. Multi-centre trial analysis revisited. Stat Med (1998).
  • B. Jones et al. A comparison of various estimator of treatment difference for a multi-centre clinical trial. Stat Med (1998).
  • M. Parzen et al. Does clustering affect the usual test statistics of no treatment effect in a randomized clinical trial? Biometrical J (1998).
  • D.D. Dunlop. Regression for longitudinal data: a bridge from least squares regression. Am Stat (1994).
  • K.D. Hopkins. The unit of analysis: group means versus individual observations. Am Educ Res J (1982).
  • R.S. Barcikowski. Statistical power with group mean as the unit of analysis. J Educ Statistics (1981).