A comparison between traditional methods and multilevel regression for the analysis of multicenter intervention studies

doi:10.1016/S0895-4356(03)00007-6

Journal of Clinical Epidemiology

Volume 56, Issue 4, April 2003, Pages 341-350

https://doi.org/10.1016/S0895-4356(03)00007-6

Abstract

This article reviews three traditional methods for the analysis of multicenter trials with persons nested within clusters, i.e., centers, namely naïve regression (persons as units of analysis), fixed effects regression, and the use of summary measures (clusters as units of analysis), and compares these methods with multilevel regression. The comparison is made for continuous (quantitative) outcomes, and is based on the estimator of the treatment effect and its standard error, because these usually are of main interest in intervention studies. When the results of the experiment have to be valid for some larger population of centers, the centers in the intervention study have to represent a random sample from this population and multilevel regression may be used. It is shown that the treatment effect, and especially its standard error, are generally incorrectly estimated by the traditional methods, which should therefore not in general be used as an alternative to multilevel regression.

Introduction

In the health and medical sciences, experiments are conducted to compare different treatments in terms of outcome variables measuring the health or behavior of individuals. In this article we focus on the situation where the data obtained have a nested or hierarchical structure, which means that individuals are nested within clusters. For example, in a clinical trial on the effect of different antipsychotics on mental health, patients were nested within centers [1]. In a trial where a new approach for the detection and management of hypertension was studied, patients were nested within family practices [2]. Children were nested within villages in a study on the effect of vitamin A supplementation on childhood mortality in north Sumatra [3], and in a smoking prevention intervention, pupils were nested within classes within schools [4], [5]. Outcomes of individuals within the same cluster are likely to be correlated, that is, there will be intracluster correlation.
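The size of this dependence is commonly summarized by the intracluster correlation coefficient, the share of the total outcome variance that lies between clusters. The following is a minimal sketch under the usual two-level variance decomposition, not a computation taken from the article. The function names are hypothetical, the variance values are the ones the article reports for its generated data (σ_u² = 0.16, σ_e² = 1.72, 12 pupils per class), and the design effect is the standard textbook variance inflation factor for equal cluster sizes with treatment assigned at the cluster level.

```python
def intracluster_correlation(var_between: float, var_within: float) -> float:
    """Share of total outcome variance that lies between clusters."""
    total = var_between + var_within
    if total <= 0:
        raise ValueError("total variance must be positive")
    return var_between / total


def design_effect(icc: float, cluster_size: int) -> float:
    """Standard variance inflation factor for equal cluster sizes when
    treatment is assigned at the cluster level (not taken from the article)."""
    return 1.0 + (cluster_size - 1) * icc


# Variance components reported for the article's generated data.
icc = intracluster_correlation(var_between=0.16, var_within=1.72)
deff = design_effect(icc, cluster_size=12)
print(f"ICC = {icc:.4f}, design effect = {deff:.3f}")
```

Even a modest intracluster correlation inflates the variance of a treatment effect estimate noticeably once it is multiplied by the cluster size, which is why ignoring the nesting matters.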

Data from a smoking prevention intervention [4], [5] will be used in this article. To keep things simple we will ignore the nesting of classes within schools leaving two levels of nesting: pupils within classes. Similar but more complicated results hold for three levels of nesting. Of course, the methods presented and conclusions drawn in this article are valid for any kind of experiment where persons are nested within clusters, for instance, multicenter clinical trials with patients nested within clinics. Thus, the reader may replace the words smoking prevention intervention, pupil, and class used in this article by terminology from his/her field of science.

The effect of the smoking prevention intervention on smoking behavior can be estimated and tested with regression, in which the outcome variable is regressed on treatment condition and relevant covariates. In the literature, several types of regression are being used for nested experimental data. Three traditional regression methods are naive regression, fixed effects regression, and regression of summary measures. In the naive regression, pupils are the unit of analysis and their nesting within classes, that is, the dependency among the outcomes of pupils within a class, is ignored. In fixed effects regression, classes are treated as fixed, and their differences are taken into account by dummy coding in the regression equation. Treating classes as fixed implies that statistical inference only takes sampling error at the pupil level into account, not sampling error at the class level, and conclusions are, therefore, limited to the classes in the study. The summary measures method is based upon aggregation of pupil level data within the same treatment condition to the class level, and classes are thus the unit of analysis.
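Two of these traditional methods can be sketched on simulated data with class-level randomization. This is an illustrative sketch, not the authors' analysis: the function names, the random seed, and the use of NumPy least squares are assumptions, while the parameter values (70 classes of 12 pupils, β₀ = 2.34, β₁ = 0.12, σ_u² = 0.16, σ_e² = 1.72) and the (−1, +1) treatment coding are taken from the article. Fixed effects regression is left out here because, with treatment assigned per class, class dummies are collinear with the treatment indicator.

```python
import numpy as np


def simulate_class_randomized(n_classes=70, n_pupils=12, beta0=2.34, beta1=0.12,
                              var_u=0.16, var_e=1.72, seed=0):
    """Simulate outcomes with a random class effect and class-level treatment."""
    rng = np.random.default_rng(seed)
    # (-1, +1) coding, half of the classes in each condition.
    x_class = rng.permutation(np.repeat([-1.0, 1.0], n_classes // 2))
    u = rng.normal(0.0, np.sqrt(var_u), size=n_classes)              # class effects
    e = rng.normal(0.0, np.sqrt(var_e), size=(n_classes, n_pupils))  # pupil errors
    y = beta0 + beta1 * x_class[:, None] + u[:, None] + e
    return x_class, y


def ols_slope_and_se(x, y):
    """Ordinary least squares of y on an intercept and x; returns slope and its SE."""
    x = np.asarray(x, dtype=float).ravel()
    y = np.asarray(y, dtype=float).ravel()
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return coef[1], np.sqrt(cov[1, 1])


x_class, y = simulate_class_randomized()

# Naive regression: pupils are the units of analysis, nesting is ignored.
x_pupil = np.repeat(x_class, y.shape[1])  # same row-major order as y.ravel()
b_naive, se_naive = ols_slope_and_se(x_pupil, y)

# Summary measures: class means are the units of analysis.
b_summary, se_summary = ols_slope_and_se(x_class, y.mean(axis=1))

print(f"naive:   b1 = {b_naive:.4f}, SE = {se_naive:.4f}")
print(f"summary: b1 = {b_summary:.4f}, SE = {se_summary:.4f}")
```

With equal class sizes the two point estimates coincide, so any difference between the methods in this setting shows up in the standard error rather than in the estimated treatment effect.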

Multilevel regression [6], [7], [8], [9] treats pupils as the unit of analysis, but also takes into account the dependence of outcomes of pupils nested within the same class. The multilevel regression model is also referred to as mixed effects regression, random coefficient model [10], or hierarchical linear model [11], and assumes the classes and pupils to represent random samples from some population of classes and pupils within classes, respectively. Under this assumption class and pupil effects must be treated as random effects in the regression model, while treatment condition and covariates may be included as fixed effects.
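As a sketch of what this means in equations, the two-level model for randomization at the pupil level can be written as follows. The pupil-level equation and the symbols σ_e², σ_u0², and σ_u1² appear in the article; the class-level equations, the combined form, and the normality assumptions are the standard formulation and are assumed here rather than quoted. Any covariance between the two class-level effects is left unspecified.

```latex
\begin{align*}
y_{ij}     &= \beta_{0j} + \beta_{1j} x_{ij} + e_{ij}, & e_{ij} &\sim N(0, \sigma_e^2) \\
\beta_{0j} &= \beta_0 + u_{0j},                        & u_{0j} &\sim N(0, \sigma_{u0}^2) \\
\beta_{1j} &= \beta_1 + u_{1j},                        & u_{1j} &\sim N(0, \sigma_{u1}^2) \\
y_{ij}     &= \beta_0 + \beta_1 x_{ij} + u_{0j} + u_{1j} x_{ij} + e_{ij}
\end{align*}
```

Here $\beta_0$ and $\beta_1$ are the fixed effects, and $u_{0j}$, $u_{1j}$, and $e_{ij}$ are the random effects at the class and pupil level. For randomization at the class level the treatment indicator is constant within a class, so under the same assumptions the model reduces to $y_{ij} = \beta_0 + \beta_1 x_j + u_j + e_{ij}$ with a single class-level variance $\sigma_u^2$.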

Ideally, the aim of smoking-prevention interventions should be to produce results not only valid for the classes involved in the experiment, but also for a larger population of classes. In that case, the classes involved in the trial have to represent a random sample from the population of classes, and multilevel analysis is a suitable method of analysis. In practice, there may be good reasons for treating classes as fixed, for instance, when the number of classes in the trial is very small, say less than 10 [9], [12]. In this article, however, we will focus on the situation where the classes involved in the trial are treated as a not too small random sample from a much larger population of classes.

Multilevel regression is more complex than the more traditional methods, and consequently, investigators may still want to use these traditional methods, even if they want to generalize the results from their trial to all classes in the population. Therefore, a comparison between the traditional methods and multilevel regression in the context of nested experimental data is relevant. In this article, the relationship between the four methods will be discussed, and it will be shown under which circumstances the traditional methods are acceptable, and when and how they may lead to incorrect results. The comparison made in this article is based on a few regression equations and an illustrative example for (a) the estimator of the treatment effect, and (b) its squared standard error, because these two are of main interest in intervention studies. The comparison is made for continuous outcomes, two levels of nesting, and with randomization at either level. For randomization at the class level, classes will be randomly allocated to the treatment conditions, and all pupils within each class receive the same treatment. For randomization at the pupil level, half of the pupils within each class will be randomly assigned to the treatment group while the others will be allocated to the control group.
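The two randomization schemes can be sketched as follows. This is a minimal illustration assuming equal class sizes, an even number of classes and pupils, and the (−1, +1) treatment coding used later in the article; the function names and the seed are hypothetical.

```python
import numpy as np


def randomize_pupil_level(n_classes: int, n_pupils: int, rng) -> np.ndarray:
    """Within every class, half of the pupils get +1 and half get -1."""
    if n_pupils % 2:
        raise ValueError("n_pupils must be even for a balanced split")
    half_split = np.repeat([-1, 1], n_pupils // 2)
    return np.stack([rng.permutation(half_split) for _ in range(n_classes)])


def randomize_class_level(n_classes: int, n_pupils: int, rng) -> np.ndarray:
    """Whole classes are allocated; all pupils in a class share one condition."""
    if n_classes % 2:
        raise ValueError("n_classes must be even for a balanced split")
    x_class = rng.permutation(np.repeat([-1, 1], n_classes // 2))
    return np.repeat(x_class[:, None], n_pupils, axis=1)


rng = np.random.default_rng(0)
x_pupil_level = randomize_pupil_level(70, 12, rng)  # rows: classes, columns: pupils
x_class_level = randomize_class_level(70, 12, rng)

print(x_pupil_level[0])  # mixed -1/+1 within the first class
print(x_class_level[0])  # a single condition for the whole first class
```

Both functions return a classes-by-pupils matrix of treatment codes, so the same downstream analysis code can be applied to either scheme.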

Part of the comparison has already been made by others, but has been published fragmentarily in various articles [13], [14], [15], [16], [17], [18], [19], [20]. In the present article, these results will be presented systematically, and some gaps in knowledge will be filled. Again, we want to stress that in this article multilevel regression and more traditional methods for experimental data with one posttreatment measurement per person are presented, assuming that the assignment of persons to different conditions is under experimental control. Multilevel regression may also be used for observational and/or longitudinal studies [21], [22].

The remainder of this article is organized as follows: in Section 2 an example data set of a smoking prevention intervention and two different designs for such trials are given. Naive regression, fixed effects regression, and regression of summary measures are presented in Section 3. Section 4 focuses on multilevel regression. In Section 5, the four methods are used to analyze generated data sets, and it is shown that these methods lead to different results. This difference in results will also be explained using a few simple mathematical expressions in the appendix. In Sections 3 to 5 we assume equal class sizes and no covariates, but in Section 6 these assumptions will be relaxed. In Section 7 some conclusions will be presented.

Section snippets

Designs and example data set

In principle, randomization and implementation of the two treatments may be done at either level of the hierarchy. So two different designs may be distinguished: Design 1, where randomization is done at the pupil level within each class, and Design 2, where randomization is done at the class level. The latter is often referred to as cluster randomization. For nonvarying class sizes we have a sample of $n_{2}$ classes and $n_{1}$ individuals per class. In Design 1, $\tfrac{1}{2} n_{1}$ pupils per class are randomized to

Traditional methods

Three more traditional regression methods for the analysis of multicenter trial data are naïve regression, fixed effects regression, and regression of summary measures. These methods are presented in this section.

Design 1: randomization at the pupil level

In multilevel modeling, regression equations are formulated for each level (pupil, class) of the multilevel data structure, and are then combined into a single equation. For randomization at the pupil level, the pupil level equation is given by: $y_{ij} = \beta_{0j} + \beta_{1j} x_{ij} + e_{ij}$, where $e_{ij}$ is a random error term at the pupil level, and $i$ and $j$ refer to pupil and class, respectively. Again, the (−1, +1) coding scheme for $x_{ij}$ was used, because of the advantages mentioned in Section 3.1. $\beta_{0j}$ is the mean of $y_{ij}$

Comparison of the four methods

For illustrative purposes we generated a data set with n₂ = 70 classes with n₁ = 12 pupils each for each level of randomization. We used the parameter values β₀ = 2.34, β₁ = 0.12, σ_u² = 0.16 and σ_e² = 1.72. For randomization at the pupil level the variance σ_u² was split up into σ_u0² = 0.1, σ_u1² = 0.06. These two data sets were analyzed with multilevel regression, naive regression, fixed effects regression, and regression of summary measures. REML estimation as implemented in the computer program MLwiN for

Generalization to more complex regression models

The results in the previous section are limited to equal class sizes and regression models with no covariates. Equal class sizes may not be feasible in practice, and often covariates have to be included in the regression model. In this section, these restrictions will be relaxed one at a time. The comparisons are based upon analysis of the TVSFP data, with restriction to the Los Angeles pupils in the media or no-treatment control group. Two levels of nesting are taken into account: pupils

Conclusions

In this study four methods for the analysis of multilevel experimental data were compared: multilevel analysis, naive regression (persons as unit of analysis), fixed-effects regression, and the use of summary measures (clusters as unit of analysis). It was assumed that the conditions for random sampling of clusters from a larger population of clusters were satisfied, so that the experimental results were not only valid for the clusters in the study, but could also be generalized to the

Acknowledgements

We wish to thank Brian R. Flay for his permission to use the TVSFP data, which were collected with funding from the National Institute on Drug Abuse, Grant 1-R01-DA03468 to Brian R. Flay, W. B. Hansen, and C. A. Johnson. We wish to thank Hubert J. A. Schouten and Martin H. Prins for their comments on this article.

References (38)

A. Sommer et al. Impact of vitamin A supplementation on childhood mortality. A randomized controlled community trial. Lancet (1986)
B.R. Flay et al. The television school and family smoking prevention and cessation project I. Theoretical basis and program development. Prev Med (1988)
B.R. Flay et al. The television, school, and family smoking prevention and cessation project. VIII. Student outcomes and mediating variables. Prev Med (1995)
D. Hedeker et al. Random regression models for multicenter clinical trials data. Psychopharmacol Bull (1991)
M.J. Bass et al. Do family physicians need medical assistance to detect and manage hypertension? Can Med Assoc J (1986)
H. Goldstein. Multilevel statistical models (1995)
J.J. Hox. Multilevel analysis: Techniques and applications (2002)
I. Kreft et al. Introducing multilevel modelling (1998)
T.A.B. Snijders et al. Multilevel analysis: an introduction to basic and advanced multilevel modelling (1999)
N.T. Longford. Random coefficient models (1995)
A.S. Bryk et al. Hierarchical linear models (1992)
S. Senn. Some controversies in planning and analyzing multi-centre trials. Stat Med (1998)
S.W. Raudenbush. Hierarchical linear models and experimental design
A.L. Gould. Multi-centre trial analysis revisited. Stat Med (1998)
B. Jones et al. A comparison of various estimators of treatment difference for a multi-centre clinical trial. Stat Med (1998)
M. Parzen et al. Does clustering affect the usual test statistics of no treatment effect in a randomized clinical trial? Biometrical J (1998)
D.D. Dunlop. Regression for longitudinal data: a bridge from least squares regression. Am Stat (1994)
K.D. Hopkins. The unit of analysis: group means versus individual observations. Am Educ Res J (1982)
R.S. Barcikowski. Statistical power with group mean as the unit of analysis. J Educ Statistics (1981)
