Review
Statistical procedures for analyzing mental health services data
Introduction
A large body of research has examined variables associated with the previous use of mental health services, using various conceptual frameworks (Bruce et al., 2002). Among large-scale community surveys, recent results have demonstrated that mental health service use is significantly associated with a number of variables including demographic characteristics, attitudes toward treatment, mental health diagnoses, and access variables (Bland et al., 1997, Kessler et al., 1998, Lin and Parikh, 1999, Parslow and Jorm, 2000, Lewis et al., 2005, Oliver et al., 2005, Wang et al., 2005, Elhai et al., 2006a, Elhai and Ford, 2007).
Several recent reviews have discussed important methodological issues that have limited the literature on mental health service use, including design-specific problems in querying about service use and in measuring utilization (Walker et al., 2004, Elhai et al., 2005). Beyond these methodological and design issues, however, important data analysis issues also warrant consideration. The current article briefly presents the problems inherent in analyzing data on mental health service use and costs, and discusses in non-technical terms several alternative statistical methods that represent the state of the art in handling such data, followed by an empirical comparison of the performance of these methods.
Complexities in mental health service use and cost data
Mental health services researchers often gather data on the intensity of services used by participants (typically in the form of visit counts), and sometimes the resulting costs incurred (in dollars). Such data are most often gathered over a recent time period (e.g., past 12 months), since research demonstrates that subjects' recall accuracy substantially decreases when estimating visit counts over longer time frames (Roberts et al., 1996). Medical chart reviews also tend to focus on short time
Data transformations
Perhaps as a result of these data problems, the analyses presented in mental health service use studies most often rely on logistic regression, with visit counts and costs reduced to dichotomous categories (e.g., “use”/“non-use”; “0–9 visits”/“10 or more visits”; “above”/“below median costs”). While this approach may seem to solve the problems discussed above, because logistic regression does not share the restrictive assumptions of linear regression, new problems are
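As a rough illustration of the dichotomous codings named above, the following sketch recodes a small set of visit counts in three ways. The counts are made up for illustration, the function name `dichotomize` is hypothetical, and the median split is applied here to visits rather than costs purely for brevity.

```python
from statistics import median


def dichotomize(visits):
    """Return three binary codings of a list of visit counts."""
    med = median(visits)
    return {
        # "use" (1) versus "non-use" (0)
        "any_use": [int(v > 0) for v in visits],
        # "10 or more visits" (1) versus "0-9 visits" (0)
        "ten_or_more": [int(v >= 10) for v in visits],
        # above (1) versus at or below (0) the sample median
        "above_median": [int(v > med) for v in visits],
    }


# Made-up visit counts for eight hypothetical participants.
visits = [0, 0, 0, 1, 2, 3, 12, 40]
coded = dichotomize(visits)

for name, values in coded.items():
    print(name, values)
```

In this toy sample, the participants with 12 and 40 visits receive identical codes under all three schemes, so the recoded variables no longer distinguish between them.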
Count regression models
When analyzing predictors of such skewed service use and cost data, the best solution is to use a non-linear count regression model. Such models require that the dependent variable be a non-negative integer and, as in ordinary linear regression, that the predictor variables be continuously scaled, binary-coded, or a mixture of the two. Count models are estimated by maximum likelihood procedures, and implement transformations to make the non-linear count dependent variable linear. Count models are specific
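To make the idea of maximum likelihood estimation for a count outcome more concrete, here is a minimal sketch of a Poisson regression with one predictor, fitted by Newton-Raphson using only the Python standard library. It assumes the usual log link, log(E[y]) = b0 + b1*x. The function name `fit_poisson` and the data are hypothetical, and this is not the software or the dataset used in the article.

```python
import math


def fit_poisson(x, y, iterations=50, tol=1e-10):
    """Fit log(E[y]) = b0 + b1*x by maximizing the Poisson log-likelihood."""
    # Start from the intercept-only solution (log of the sample mean).
    b0, b1 = math.log(sum(y) / len(y)), 0.0
    for _ in range(iterations):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Gradient of the log-likelihood with respect to b0 and b1.
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for xi, yi, mi in zip(x, y, mu))
        # Entries of the 2x2 information matrix (negative Hessian).
        h00 = sum(mu)
        h01 = sum(mi * xi for xi, mi in zip(x, mu))
        h11 = sum(mi * xi * xi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system explicitly.
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + step0, b1 + step1
        if abs(step0) + abs(step1) < tol:
            break
    return b0, b1


# Made-up data: x is a binary predictor, y is a visit count.
x = [0, 0, 0, 0, 1, 1, 1, 1]
y = [0, 1, 0, 3, 2, 5, 0, 9]
b0, b1 = fit_poisson(x, y)

# exp(b1) is the ratio of expected counts between the two groups.
print(round(b0, 4), round(b1, 4), round(math.exp(b1), 4))
```

With a single binary predictor, the fitted values reproduce the two group means (1.0 and 4.0 in this made-up sample), so exp(b1) equals their ratio. In practice one would use a dedicated statistical package rather than hand-written optimization.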
Decisions in analyzing count regression models
In Fig. 1, we present a flowchart to assist the reader in selecting the most appropriate regression model, given characteristics of the dependent variable.
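Before choosing among count models, it can help to describe the dependent variable. The sketch below is not a reproduction of the flowchart in Fig. 1. It simply computes a few descriptive quantities that are commonly examined: the mean and variance (a Poisson model assumes they are equal), and the observed share of zeros next to the share a Poisson distribution with the same mean would imply. The function name `count_diagnostics` and the data are hypothetical.

```python
import math
from statistics import mean, variance


def count_diagnostics(counts):
    """Summarize a count variable before choosing a regression model."""
    m = mean(counts)
    v = variance(counts)  # sample variance
    return {
        "mean": m,
        "variance": v,
        # Values well above 1 suggest overdispersion relative to Poisson.
        "dispersion_ratio": v / m if m > 0 else float("nan"),
        "observed_zero_share": counts.count(0) / len(counts),
        # Share of zeros implied by a Poisson distribution with mean m.
        "poisson_zero_share": math.exp(-m),
    }


# Made-up visit counts for ten hypothetical participants.
counts = [0, 0, 0, 0, 0, 0, 1, 2, 3, 14]
summary = count_diagnostics(counts)

for name, value in summary.items():
    print(name, round(value, 3))
```

These figures are only descriptive. Formal tests of overdispersion and of excess zeros are available in the statistical packages that implement these models.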
At the time of this writing, two statistical packages include standard modules for Poisson, negative binomial, and the zero-inflated and zero-truncated methods: Stata (Statacorp, 2005) and LIMDEP (Econometric Software, 2002). Gauss (Aptech Systems Inc., 2005) offers (but does not include as standard) a Maximum Likelihood application which
Applying the models to a dataset of mental health visit counts
Recently, we examined mental health treatment use intensity among 186 Midwestern U.S. primary care patients (Elhai et al., 2006b). We assessed the relationship of gender, attitudes toward mental health treatment, violent-crime and non-crime trauma frequency (log-transformed due to substantial skewness), and a probable posttraumatic stress disorder (PTSD) diagnosis with self-reported mental health visit counts from the past 6 months. We now present a comparison of the above-mentioned statistical
Conclusions
This paper presented a review of the data analysis problems that are inherent when analyzing mental health service use data. Several solutions were presented, including Poisson and negative binomial, zero-inflated, and zero-truncated regression models. Quite different results were observed when alternative statistical solutions were used to handle a typical dataset with mental health service use as the outcome variable. The results demonstrate the potential danger of using analytic methods
References (39)
- et al. Mental health service use among American Red Cross disaster workers responding to the September 11, 2001 U.S. terrorist attacks. Psychiatry Research (2006)
- et al. Overdispersion tests for truncated Poisson regression models. Journal of Econometrics (1992)
- et al. Sociodemographic, clinical, and attitudinal characteristics of the untreated depressed in Ontario. Journal of Affective Disorders (1999)
- et al. Comparison of self-reported and medical record health care utilization measures. Journal of Clinical Epidemiology (1996)
- et al. Assessing population need for mental health care: a review of approaches and predictors. Mental Health Services Research (2004)
- Gauss (2005)
- Predicting the use of outpatient mental health services: do modeling approaches make a difference. Inquiry (2002)
- et al. Help-seeking for psychiatric disorders. Canadian Journal of Psychiatry (1997)
- et al. Barriers to reducing burden of affective disorders. Mental Health Services Research (2002)
- et al. Medical service utilization by veterans seeking help for posttraumatic stress disorder. American Journal of Psychiatry (2002)