Introduction

Mental health services and addiction research often rely on longitudinal studies to evaluate intervention effects. Some of these studies have long duration with many assessments (i.e., more than 10 repeated measurements) (Harrison et al. 2001; Hartman et al. 2008). Recently, with the rapid development of technology-based behavioral health interventions, technological tools, such as mobile devices (e.g., wearable sensors, smartphones) capture real-world and real-time assessments (e.g., ecological momentary assessment) data with abundant observations (Marsch 2012; Ramsey 2015). These types of data (data from long-duration studies, and data intensively collected using mobile devices), termed intensive longitudinal data (Walls et al. 2006), may exhibit complex patterns of change (e.g., highly fluctuating with irregular ups and downs), and the relationships between the outcome variable and covariates may change over time.Footnote 1

Intensive longitudinal data require a new modeling approach. Traditional statistical methods, such as mixed-effects, multilevel, or growth curve models, are well suited for reasonably short longitudinal studies but may not be adequate for intensive longitudinal data. Traditional methods using parametric models may not fully capture a complex pattern of change over time. Analysts must impose a growth pattern or shape of change using limited parametric forms (i.e., polynomials of different degrees in time such as linear, quadratic, or cubic) to approximate the observed shape of the growth trajectory or pattern of change. But many courses of change with irregular patterns cannot be well approximated by these polynomials (Fitzmaurice et al. 2011). Moreover, not only the trajectory shape of an outcome and the predictors may change over time, but also the relationship between them, that is, the effect of a predictor on outcome, may change or evolve over time. The parametric model with pre-specified forms, such as curvilinear trend and/or interactive models with time, may be able to capture some of this changing association, but they may not be flexible enough to reflect the whole picture of this dynamic and time-varying association, and thus result in a poorly specified model of change. Recently, a new approach, the time-varying effect model (TVEM) (Hastie and Tibshirani 1993; Tan et al. 2012; Yang et al. 2012) and corresponding software, TVEM SAS macro (Yang et al. 2012) have been developed to address these challenges. This approach is more suitable to model both more frequently collected data (high intensity) and data from a long duration study with many assessment points. The purpose of this paper is to introduce TVEM to mental health services and addiction researchers and to illustrate its application using data from a longitudinal study with long duration: New Hampshire Dual Diagnosis Study with 16-year follow-up and 20 assessment points.

Conceptual Overview

The time-varying effect model (TVEM) is a semi-parametric regression model. Other non-parametric or semi-parametric models (e.g., locally weighted scatterplot smoothing, thin-plate smoothing splines, and generalized additive models) (Hastie and Tibshirani 1993) can be used to fit curves or non-linear relationships; nevertheless, none of these approaches is as general as the time-varying effect model (TVEM). For the simple TVEM model with a continuous outcome variable, yij, and a single time-varying covariate, xij, the model can be written as (Tan et al. 2012):

$$y_{ij} = \beta_{0} \left( {t_{ij} } \right) + \beta_{1} \left( {t_{ij} } \right)x_{ij} + \varepsilon_{ij}$$

where i = 1, …, n denotes the number of subjects and j = 1, …, m denote the number of assessments over time. \(\beta_{0} \left( {t_{ij} } \right)\) and \(\beta_{1} \left( {t_{ij} } \right)\) are the intercept and the slope functions, which are time-specific and may have different values at different points in time. TVEM differs from a standard regression model in two ways: (1) \(\beta_{0} \left( {t_{ij} } \right)\) and \(\beta_{1} \left( {t_{ij} } \right)\) are functions, not fixed regression coefficients, and they are allowed to change as a function of time. (2) TVEM does not force a functional relationship between any covariate and outcome into a pre-specified form; rather, it allows the shape of the relationship to take nearly any form, as revealed directly from the data. Generalized TVEM also accommodates binary, ordinal, count, and zero-inflated count outcomes. Multiple covariates, both time-varying (e.g., work status) and time-constant (e.g., gender), can be included in the model. The effect of both time-varying and time-constant covariates on outcome can be treated as either static (fixed) or dynamic (time-specific) depending on model fit. The time-specific effect is displayed graphically as confidence intervals, rather than as p-values, across the time continuum, and the time-static effect is tested with a single p value.

TVEM can be estimated using a range of non-parametric techniques, which can be broadly divided into two main strands: kernel-based methods and spline-based methods (Fitzmaurice et al. 2011). The SAS macro suites for TVEM implemented spline-based methods for model estimation. The mathematics underpinning spline-based methods is complex (Fitzmaurice et al. 2011; Keele 2008; Ruppert et al. 2003; Tan et al. 2012), but the idea of spline-based regression is not so complex. These models can be viewed as piecewise regressions with many pieces, defined by an appropriate number of knots (sometimes called a broken-stick). The model is fitted with a series of piecewise polynomials (usually cubic) for each small interval separated by knots. The key to these techniques is balancing the trade-off between goodness-of-fit (better approximation) and smoothness (more parsimonious or less fluctuation) of the fitted regression coefficient functions. A model with too many knots may fit a curve well, but be too fluctuating and result in an over-fitting; a model with too few knots may have sufficient smoothness but a poor fit.

Of many possible spline-based methods, the SAS macro suites for TVEM implemented basis-spline (B-spline) and penalized spline (P-spline) (Tan et al. 2012). B-spline approaches balance the trade-off between smoothness and the fit of the model by determining an optimal number of knots and the locations of knots in terms of model fit indices. The P-spline-based approach in TVEM estimates the regression functions using a truncated power basis and balances trade-off by adding a roughness penalty term in the model for over-fitting, thus the name penalized spline (P-spline) model. Specifically, in the P-spline regression model, variations, nonlinearity, or roughness in functional relationship between a predictor and an outcome are defined by coefficients of the piecewise polynomial functions. Instead of choosing optimal numbers and locations of knots, the P-spline approach allows a relatively large number of knots, but constrains their influence by shrinking the coefficients of the piecewise polynomial functions toward zero with an added roughness penalty term. The penalty term is governed by a smoothing parameter, which insures that the fitted curve is sufficiently smooth but not over-fitted. When the error term, ei, in the spline model is assumed to have a normal distribution, the P-spline regression model can be represented as a linear mixed-effects model.Footnote 2 The smoothing parameter in the P-spline model and the two variance components (variance of the residuals and variance of random effect) in the mixed-effects model has an intrinsic relationship. Restricted maximum likelihood (REML) estimation of these two variance components in the mixed-effects model would automatically optimize these two variance components, thus creating a smoothing parameter for the spline model (Fitzmaurice et al. 2011; Ruppert et al. 2003; Tan et al. 2012). An important practical implication of this conversion is that the analyst can use existing software (e.g. SAS PROC MIXED, or SAS PROC GLIMMIX) to estimate this rather complex P-spline model. In fact, the P-spline-based method is very popular due to its connection with mixed-effects models with an automatic smoothing property.

Application of TVEM

Description of Study and Data

Data for this illustrative study are from the New Hampshire Dual Diagnosis Study, a prospective 16-year study of clients with severe mental illnesses and co-occurring substance use disorders (Drake et al. 1998). Participants were 223 clients from seven of New Hampshire’s 10 community mental health centers recruited between 1989 and 1992. The first 3 years involved a randomized controlled trial of case management models for co-occurring psychiatric and substance use disorders. Participants were subsequently released from their experimental conditions and followed naturalistically through the end of 16 years. One hundred twelve participants remained in the study at 16 years. Assessments were conducted at baseline, every 6 months for the first 3 years and yearly thereafter for a total of 20 data points. The outcome for this analysis was the general life satisfaction rating score (ranging from 1 to 7) from the Quality of Life Inventory (Lehman 1988). The time-varying predictor was work status (working or not) at each assessment point, and the time-constant covariate was psychiatric diagnosis at baseline (bipolar vs. schizophrenia and schizoaffective disorders). Because we chose these variables to illustrate the method, not the content, we did not consider other background covariates.

Analytic Approach

First, we inspected the observed temporal pattern of the relationship over 16 years between average general life satisfaction score and work status (Fig. 1, top) and diagnosis status (Fig. 1, bottom). Second, for comparison, we fitted a parametric linear mixed-effects model to the data. Finally, we applied TVEM to the data using the TVEM SAS macro suites (Yang et al. 2012).

Fig. 1
figure 1

Observed longitudinal association between general life satisfaction and work status over 16 years (top, 20 assessment points), and between general life satisfaction and diagnosis over 16 years (bottom, 20 assessment points)

Results

The observed relationship between general life satisfaction and work status (Fig. 1, top) shows that work was positively associated with general life satisfaction at most times over 16 years; but at a few assessment points (around the middle of the study period from 7 to 11 years), the two variables were either not associated or even negatively associated (but non-significant based on a point correlation analysis). Thus, the observed effect of work status on general life satisfaction was not constant, whereas the relationship between general life satisfaction and baseline diagnosis (Fig. 1, bottom) was essentially constant over time (except at one assessment point).

To model these temporal relationships, we started by fitting a series of conventional linear mixed-effects models with different degrees of polynomials in time and interactions between work status, diagnosis and time. Based on the best model fit (likelihood ratio test, AIC and BIC in Table 1), our final mixed-effects model included the intercept, work status (time-varying), diagnosis (static), and the linear time trend as fixed effects, and the intercept and linear time trend as random effects. The results showed that work was marginally associated with better general life satisfaction (slope = 0.09 and p = 0.0560). Bipolar diagnosis (in comparison with schizophrenia spectrum) was negatively associated with general life satisfaction (slope = −0.33 and p = 0.0355). The effect of the linear time trend was also positive and significant (slope = 0.03 and p < 0.001) (Table 1).

Table 1 The relationship between both time-invarying (diagnosis) and time-varying (working status) covariates and outcome (general life satisfaction): results of the mixed-effects linear model

To re-analyze the data using TVEM, we used both B-spline and P-spline for model estimation. The two methods yielded similar results. B-spline was numerically more stable in computation (Yang et al. 2012); P-spline was easy to use due to its automatic smoothing property with a roughness penalty, thus avoiding a decision regarding the appropriate number of knots. We fitted the model using the B-spline method first. Any statistical model specification should be guided by both theory and the model fit. In this study, the theoretical reason for treating the effect of the predictor/covariate as either time-constant or time-varying wasn’t clear. Because the plots in Fig. 1 suggested that the effect of diagnosis was essentially constant, the effect of work status changed over time, and the intercept (or average score of general life satisfaction measured at each assessment point) changed over time, we specified the effect of diagnosis as time-constant and the intercept and the effect of work status as time-varying in our initial model. For time-varying effects (intercept and slope function of work), we needed to specify the number of knots (assuming knots were equally spaced along the time-continuum) and to choose the optimal number of knots in terms of model fit. We started with zero knots for both the intercept and the slope function of work status, and increased the number of knots until we got the best model fit. Our optimal model used zero knots for the intercept and two knots for the slope function of work status. It yielded model fit indices of AIC = 1750.03 and BIC = 1816.07 (the smallest, or best fits, among a series of competing models).

After we chose the optimal numbers of knots for the intercept and the slope function of work, and to insure that the effect of diagnosis was indeed time-constant as the plot suggested, we refitted the model by allowing the effect of diagnosis to be time-varying. The model fit index, AIC = 1749.16, was essentially the same, but the BIC = 1839.21 was much worse.Footnote 3 Therefore, using the B-spline method, we determined that the model with a constant effect of diagnosis, a time-varying intercept, and a time-varying effect of work status on general life satisfaction was our best model.

We then refitted the model using P-spline. With P-spline, because we did not have to identify the best-fitted model by changing the number of knots each time, we treated the effect of diagnosis as time-constant and chose five knots for both the intercept and the slope function of work status, which yielded similar results to our B-spline analysis. Thus, our final model was:

$${\text{Gen}} . {\text{ Life satisfaction}}_{ij} = \beta_{0} \left( {t_{ij} } \right) + \beta_{1} \left( {t_{ij} } \right){\text{work status}}_{ij} + \beta_{2} {\text{diagnosis}}_{i} + \varepsilon_{ij}$$

This is a semi-parametric model. The effect of diagnosis was treated as a constant and estimated parametrically. The effect of work status was treated as time-specific and estimated non-parametrically.

The parametric effect is reported as usual, with the estimated parameter and usual significance testing. The non-parametric effect is reported graphically with confidence intervals. The effect of diagnosis (bipolar = 1) was significant (slope = −0.34, p < 0.001) and consistent with the linear mixed-effects model (slope = −0.33, p < 0.0355). The effect of work status is shown graphically with confidence intervals at the bottom of Fig. 2. Unlike the result from the linear mixed model, the effect of work status was not constant: Being employed was significantly associated with improvement of general life satisfaction in the early years (prior to year 6) and the late years (year 11 to 16) of the study. However, work status was not related to general life satisfaction in the middle years (year 6 to year 11), as indicated by confidence intervals containing the zero line. The coefficient curve at the top of Fig. 2 is for the estimated time-dependent intercept, showing that average general life satisfaction gradually improved over years, analogous to a polynomial time-effect in the linear mixed-effects model.

Fig. 2
figure 2

Estimated time-varying intercepts: average general life satisfaction (top) and time-varying slopes: changing in relationship between general life satisfaction and work status (bottom) with 95 % confidence interval over 16 years (20 assessment points)

Discussion

Longitudinal data with long duration and high frequency of measurement are becoming common in mental health services and addiction research. These data may have complex (irregular) patterns of change over time, and the relationships between the variables may also vary over time. Existing methods may not adequately capture this complexity. TVEM fills this gap by not only permitting nearly any shape of change but also by allowing the effect of predictors to be time-varying. By applying TVEM to the New Hampshire Dual Diagnosis 16-year follow-up study for participants with severe mental illness and substance abuse, we discovered that the effect of work status on general life satisfaction was positive overall but not constant. It changed in both magnitude and direction over 16 years. Using conventional mixed-effects models, we could only conclude that being employed was marginally associated with greater general life satisfaction over 16 years. In fact, this claim was not true for every assessment period: in some years, the effect was positive and strong; in some years, the effect was positive but weak; and in other years, the effect was nil or even non-significantly negative. Using conventional models, we could not fully characterize this dynamic relationship.

This study has several caveats. First, TVEM is particularly suitable for intensively collected data, for example, using passive sensing devices or ecological momentary assessment data. Nevertheless, TVEM is equally applicable for data with long duration and many assessment points, because individual variables may exhibit complex and irregular patterns of change, and the relationships among variables may also change over time. Some statisticians consider more than 10 repeated measurements for participants as intensive longitudinal data (Tan et al. 2012; Walls et al. 2006). Our data from a 16-year follow-up study with 20 data points showed that both the trajectory shape of the outcome and the relationships among the variables changed over time. We have several current studies using mobile technology, which will provide more opportunities to model intensively collected data using TVEM.

Second, non-parametric regression approaches do not produce a single parameter estimate that summarizes the relationship between an outcome and a predictor. Instead, they produce a plot of the estimated relationship between the two variables with confidence intervals around it. Readers used to a single regression coefficient and corresponding p-value may find this disconcerting at first, but they should recognize that a single parameter cannot describe a statistical relationship that changes over time. While the absence of parameters does not allow for easy interpretation, a plot with confidence intervals often does a far better job of summarizing a statistical relationship. With a plot, the scale of both variables should be obvious, and the eye can immediately discern the strength of the relationship along with the level of statistical precision when confidence bands are included (Keele 2008).

Third, TVEM and its corresponding software have not yet incorporated random effects or correlation structures into the model. Because the error structure for intensive longitudinal data is more complex than allowed in conventional mixed-effects models, more research is needed in this area. The current version of TVEM uses robust sandwich estimation for standard errors, which implicitly accounts for correlations due to repeated measurement.

The current version of TVEM also uses all available data for every individual over time. The impact of missing data on the robustness of model estimate is, however, unknown and awaits future research (Shiyko et al. 2014; Tan et al. 2012). To verify whether or not the result of TVEM analysis was partially due to attrition over time, we repeated our analysis using 112 participants who remained in the study for 16 years, and we found a similar pattern of results for the effect of work status on general life satisfaction (the effect increases in size, decreases in size, and then increases in size again); yet the effect was no longer negative in the middle years of the study, and the confidence interval became wider. A wider confidence interval may be due to reduced sample size. The fact that the effect was no longer negative (even though it was the smallest in size) in the midway indicated that participants who dropped out of the middle years of the study might be different from the remainders in relation to the study outcome. Further investigation is needed to determine why the effect for dropouts may be different, and the reasons for their dropouts.

Fourth, to focus on key features of TVEM and key differences between TVEM and alternative approaches, and to avoid confusion over issues that would exist in both the TVEM and other approaches, we included only two covariates, a time-constant covariate (diagnosis) and a time-varying covariate (work status) to make presentation simpler. We did not control for other potential covariates and possible confounders (both time-constant and time-varying confounders). Thus the results from this analysis should not in any way be considered causal.

Fifth, TVEM uses non-parametric smoothing techniques for model estimation, which are complex mathematically. Our paper targets non-statistician practitioners in mental health services and addiction research, and consequently the presentation was conceptual. Readers interested in more technical details should consult the references cited in this paper.

Finally, TVEM is one non-parametric approach to analyze intensive longitudinal data. An alternative non-parametric approach, the penalized functional regression (Goldsmith et al. 2011) also captures longitudinal patterns of any shape and assesses time-specific relationships between variables. Other approaches, such as machine learning techniques (Hastie et al. 2009), popular among computer scientists and some statisticians, can also be applied to large data with high intensity. These techniques are, however, usually beyond the reach of social scientists and applied mental health services and addiction researchers.

Summary and Conclusion

With the rapid development of methods for electronic data capture, longitudinal data with a long time series and/or high frequency of measurements have become common in mental health services and addiction research. These data typically exhibit complex patterns of change, and the relationships between variables may also change over time. Existing statistical methods, such as mixed-effects, multilevel, and growth curve models, are not flexible enough to fully capture this complexity and its dynamics. TVEM, using non-parametric smoothing techniques, permits modeling complex shapes of change and dynamic (changing) relationships between a predictor and an outcome over time. TVEM provides an opportunity for more accurate modeling of intensive longitudinal data and should be considered whenever this type of data has been collected.