Appendix A. Information about Multiple Imputation
The TRAILS sample of 2229 pre-adolescents was reduced to 1065 by the use of a peer subsample and further to 701 by attrition over 10 years, resulting in 68.6% missing data. The 701 participants who were included in the peer subsample and participated in the fifth wave differed from the other TRAILS respondents (N = 2229 – 701 = 1528) on several T1 characteristics. Participants in the subsample were more often girls than those not in the subsample (59.9% vs. 46.5%; χ2(1) = 34.43, p < 0.001). They had a higher socioeconomic status (M = 0.2, SD = 0.74) than those not included (M = –0.16, SD = 0.8; t(1447.6) = –10.54, p < 0.001), higher WISC scores (M = 102.35, SD = 13.28 vs. M = 94.81, SD = 15.15; t(1537.2) = –11.89, p < 0.001), and higher academic achievement (M = 3.91, SD = 0.8 vs. M = 3.48, SD = 0.9; t(1327.7) = –10.39, p < 0.001). Missing values occurred mainly in peer acceptance, peer rejection, and educational attainment.
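The fractional degrees of freedom reported above (for example, t(1447.6)) are characteristic of Welch's unequal-variance t test. The appendix does not name the test, so reading these as Welch tests is our assumption. The sketch below computes Welch's t and the Welch–Satterthwaite degrees of freedom from group summary statistics, and also computes the share of the 2229 T1 respondents who are not in the analyzed subsample of 701. The function name is hypothetical, and the group sizes in the usage line are made up because the appendix does not report the n behind each comparison.

```python
import math


def welch_t(m1, sd1, n1, m2, sd2, n2):
    """Welch's t statistic and Welch-Satterthwaite df from group summary statistics."""
    v1 = sd1 ** 2 / n1  # squared standard error of the group 1 mean
    v2 = sd2 ** 2 / n2  # squared standard error of the group 2 mean
    t = (m1 - m2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Share of the 2229 T1 respondents not in the analyzed subsample of 701.
print(f"missing: {(2229 - 701) / 2229:.1%}")

# WISC comparison (not included vs. included); group sizes are HYPOTHETICAL.
t, df = welch_t(94.81, 15.15, 1520, 102.35, 13.28, 700)
print(f"t({df:.1f}) = {t:.2f}")
```

Because the variances and group sizes differ, the resulting degrees of freedom fall between the smaller group's n − 1 and n1 + n2 − 2, which is why such values are not whole numbers.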
Missing data were handled with Multivariate Imputation by Chained Equations (MICE) in R (Van Buuren, 2018). MICE creates multiple completed datasets by replacing each missing value with an estimate generated by the imputation method selected for that variable. Each imputed dataset is analyzed separately, and the results are then pooled into a single set of estimates. Pooling follows Rubin's rules, which take into account the number of imputations and the additional variance caused by the missing data and the imputation (Van Buuren, 2018).
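Rubin's rules, mentioned above, combine the m per-dataset estimates into one pooled estimate whose variance adds the between-imputation spread to the average within-imputation variance. The sketch below shows the standard formulas for a single scalar parameter; the function name and the toy numbers are hypothetical, and the degrees of freedom returned are Rubin's large-sample version without any small-sample adjustment.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool m estimates and their squared standard errors with Rubin's rules.

    Returns (pooled estimate, pooled standard error, Rubin's large-sample df).
    """
    m = len(estimates)
    q_bar = mean(estimates)             # pooled point estimate
    u_bar = mean(variances)             # average within-imputation variance
    b = variance(estimates)             # between-imputation variance (divisor m - 1)
    t = u_bar + (1 + 1 / m) * b         # total variance
    lam = (1 + 1 / m) * b / t           # share of total variance due to missing data
    df = math.inf if lam == 0 else (m - 1) / lam ** 2
    return q_bar, math.sqrt(t), df


# Toy regression coefficient from m = 3 imputed datasets (made-up numbers).
est, se, df = pool_rubin([1.0, 1.2, 0.8], [0.04, 0.05, 0.03])
print(f"pooled = {est:.3f}, SE = {se:.3f}, df = {df:.1f}")
```

The factor (1 + 1/m) is the term through which the number of imputations enters: with few imputations, the between-imputation variance is itself estimated imprecisely and is inflated accordingly.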
To impute the data as accurately as possible, a separate imputation model was specified for each incomplete variable. The choice of model rested on two considerations: (1) which predictor variables to include in the imputation model, and (2) the type of the variable to be imputed.
With respect to the first consideration, every imputation model included all variables used in the analyses of the imputed data, including interactions. In addition, the models contained variables related to non-response or to the incomplete variables (Van Buuren, 2018). Specifically, the imputation models included the available variables that correlated 0.3 or higher with the incomplete variables: peer acceptance at T2, peer rejection at T2, academic achievement at T1, school advice at T1, and educational attainment at T2, T3, T4, and T6.
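The predictor-selection rule above can be sketched as a predictor matrix: each incomplete variable gets every variable correlating at least 0.3 with it, plus the analysis-model variables regardless of correlation. In R, `mice::quickpred()` with its `mincor` argument implements a related rule (it also screens correlations with the missingness indicators), so this Python version is a simplified analogue, not the authors' code. The function name and its `always` argument are hypothetical.

```python
import numpy as np
import pandas as pd


def predictor_matrix(data: pd.DataFrame, min_cor: float = 0.3, always=()) -> pd.DataFrame:
    """Return a 0/1 matrix: rows are incomplete variables, columns candidate predictors.

    A column enters a row's imputation model when the absolute pairwise-complete
    Pearson correlation is at least min_cor, or when it is listed in `always`
    (e.g. variables from the analysis model).
    """
    cols = list(data.columns)
    r = data.corr().abs().to_numpy()        # pairwise-complete correlations
    pred = (r >= min_cor).astype(int)       # NaN correlations compare as False
    for name in always:
        pred[:, cols.index(name)] = 1       # analysis variables are always predictors
    np.fill_diagonal(pred, 0)               # a variable never predicts itself
    matrix = pd.DataFrame(pred, index=cols, columns=cols)
    incomplete = [c for c in cols if data[c].isna().any()]
    return matrix.loc[incomplete]


# Toy data: y is incomplete and strongly related to x; z is only weakly related.
df = pd.DataFrame({
    "x": [1, 2, 3, 4, 5, 6],
    "y": [2, 4, 6, 8, 10, np.nan],
    "z": [1, -1, -1, 1, 1, -1],
})
print(predictor_matrix(df))
print(predictor_matrix(df, always=["z"]))
```

In the toy data, z correlates below 0.3 with y and is excluded by the threshold alone, but it enters once it is listed as an analysis variable.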
With respect to the second consideration, the imputation method was matched to the type of variable: categorical variables were imputed with logistic regression models and continuous variables with predictive mean matching (Van Buuren, 2018). Interactions were included as predictors but were not imputed themselves; instead, they were recomputed from the imputed main variables as soon as these had been imputed. Moreover, an interaction was not used to impute either of the two variables that make it up.
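Predictive mean matching and the recomputation of interactions described above can be sketched as follows. This is a simplified version: it fits ordinary least squares without the Bayesian parameter draw that mice's default `pmm` method performs, and it uses a donor pool of k = 5 (also mice's default number of donors). The function name and the simulated data are hypothetical. Each missing value is replaced by an actually observed value from a case with a similar predicted mean, and the interaction is computed only after the main effect has been completed, without serving as a predictor of that main effect.

```python
import numpy as np


def pmm_impute(y, X, k=5, rng=None):
    """Fill NaNs in y by predictive mean matching (simplified sketch).

    y: 1-D float array with NaNs; X: 2-D array of complete predictors.
    Fits OLS on the observed cases (no Bayesian parameter draw), then gives each
    missing case the observed y of a random donor among the k observed cases
    whose predicted means are closest to its own.
    """
    rng = np.random.default_rng(rng)
    obs = ~np.isnan(y)
    X1 = np.column_stack([np.ones(len(y)), X])               # add an intercept column
    beta, *_ = np.linalg.lstsq(X1[obs], y[obs], rcond=None)
    yhat = X1 @ beta                                          # predicted means, all cases
    out = y.copy()
    for i in np.flatnonzero(~obs):
        dist = np.abs(yhat[obs] - yhat[i])
        donors = np.argsort(dist)[:k]                         # k nearest observed cases
        out[i] = y[obs][rng.choice(donors)]                   # copy a real observed value
    return out


# Simulated data (hypothetical): main effects a and b, outcome depends on both.
rng = np.random.default_rng(1)
n = 200
a = rng.normal(size=n)
b = rng.normal(size=n)
outcome = 1.0 + 0.5 * a - 0.3 * b + rng.normal(scale=0.5, size=n)
a_miss = np.where(rng.random(n) < 0.3, np.nan, a)

# Impute the main effect a without using the a*b interaction as a predictor ...
a_imp = pmm_impute(a_miss, np.column_stack([b, outcome]), rng=2)
# ... then recompute the interaction from the completed main effects.
a_x_b = a_imp * b
```

Because imputed values are borrowed from observed cases, they always stay within the observed range, which is one reason predictive mean matching suits skewed continuous variables.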
It is generally advised to impute raw item scores rather than derived scale scores. However, the parental acceptance scale consists of 36 items (18 for the mother and 18 for the father), and the parental rejection scale consists of 24 items (12 for the mother and 12 for the father). Because respondents with a missing value on one of these items generally had missing values on most of them, items within a scale could not predict each other. Therefore, the scale scores (item means) were imputed instead of the item scores. Furthermore, SES was entered into the imputation model as a scale score, because SES had no missing values and because parental occupation, being a non-numeric variable, is difficult to impute.
For imputation, peer acceptance and peer rejection were logit-transformed to correct for skewness. Peer acceptance is the proportion of best-friend nominations, ranging from 0 to 1 with a mean of 0.3. Peer rejection is the proportion of dislike nominations, ranging from 0 to 1 with a mean of 0.1. Because the logit is defined only for values strictly between 0 and 1, scores of 0 were set to 0.005 before transformation. After imputation, the imputed variables were transformed back to their original scale for use in the analyses.
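The transformation described above can be sketched as a pair of functions, assuming the peer scores are held as NumPy arrays of proportions; the function and constant names are hypothetical. Note that the back-transformation returns 0.005, not 0, for cases that were originally 0; the appendix does not say whether such values were reset after imputation.

```python
import numpy as np

ZERO_REPLACEMENT = 0.005  # value used in place of 0, as described in the text


def to_logit(p):
    """Logit of proportions; zeros are first replaced because logit(0) = -inf."""
    p = np.asarray(p, dtype=float)
    p = np.where(p == 0, ZERO_REPLACEMENT, p)
    return np.log(p / (1 - p))          # still undefined at p = 1


def from_logit(z):
    """Inverse logit (logistic function): back to the 0-1 proportion scale."""
    return 1 / (1 + np.exp(-np.asarray(z, dtype=float)))


peer_acceptance = np.array([0.0, 0.1, 0.3, 0.8])   # toy proportions
z = to_logit(peer_acceptance)                        # scale used during imputation
print(from_logit(z))
```

The logit stretches values near 0 apart, which reduces the pile-up of low scores that makes the raw proportions skewed.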
Although the default number of imputations in mice is 5, a common recommendation is to use approximately as many imputations as the percentage of missing data. However, increasing the number of imputations usually does not change the conclusions drawn from the pooled estimates (Van Buuren, 2018). Therefore, with 68.6% missing data overall, we used 50 imputations rather than the approximately 69 that this rule would suggest.
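Rubin's relative-efficiency formula, 1 / (1 + γ/m) with γ the fraction of missing information, offers one way to see why 50 imputations are close to enough. The appendix reports no fraction of missing information, so the sketch plugs in 0.686 as a deliberately pessimistic stand-in; that value is our assumption, and the function name is hypothetical.

```python
def relative_efficiency(m, fmi):
    """Efficiency of m imputations relative to infinitely many: 1 / (1 + fmi / m)."""
    return 1 / (1 + fmi / m)


FMI = 0.686  # ASSUMED stand-in for the fraction of missing information
for m in (5, 20, 50, 69):
    print(f"m = {m:2d}: relative efficiency = {relative_efficiency(m, FMI):.4f}")
```

Under this assumption, moving from 50 to 69 imputations raises efficiency only from about 0.986 to about 0.990.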
The 50 multiply imputed datasets were imported into SPSS, where the logit transformation of the peer variables was reversed. The interaction variables used in the imputations were discarded, and new interactions were computed from variables centered on their pooled means. Each imputed dataset was analyzed separately, and SPSS pooled the estimates. Estimates that SPSS does not pool automatically (ANOVA F tests, χ2 tests, SDs, ORs, and model fit indices) were pooled in R by exporting the SPSS results and applying pooling functions. In these pooling functions, the degrees of freedom, which by default depend only on the number of imputations, were adjusted for sample size. Table 4 shows the descriptive statistics for the incomplete and the multiply imputed samples. The largest change after multiple imputation occurred in the outcome variable, educational attainment. In the incomplete sample, 9.6% of the early adults reached lower vocational education, 24.8% higher vocational education, 41.5% University of Applied Sciences, and 24.1% university; in the multiply imputed sample, these percentages were 19.0%, 25.0%, 37.2%, and 18.8%, respectively.
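The appendix does not name the sample-size adjustment to the degrees of freedom used in pooling. We assume it is the Barnard–Rubin small-sample correction, which R's mice package applies in `pool()` when given the complete-data degrees of freedom through its `dfcom` argument. The sketch below implements that correction for one pooled parameter; the function name and the toy inputs are hypothetical.

```python
import math


def barnard_rubin_df(m, b, t, df_com):
    """Small-sample degrees of freedom for a pooled estimate (Barnard & Rubin).

    m: number of imputations; b: between-imputation variance;
    t: total variance from Rubin's rules; df_com: complete-data df (e.g. n - k - 1).
    """
    lam = max((1 + 1 / m) * b / t, 1e-4)    # floor avoids division by zero when b = 0
    df_old = (m - 1) / lam ** 2             # Rubin's large-sample df
    if math.isinf(df_com):
        return df_old
    df_obs = (df_com + 1) / (df_com + 3) * df_com * (1 - lam)
    return df_old * df_obs / (df_old + df_obs)


# Toy values: m = 50 imputations, complete-data df of 2000 (hypothetical).
print(barnard_rubin_df(50, b=0.004, t=0.01, df_com=2000))
```

Without the adjustment, the degrees of freedom grow with the number of imputations and can exceed the complete-data degrees of freedom; the combined value stays below both the large-sample value and the degrees of freedom supported by the observed data.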
Table 4
Descriptive statistics for the incomplete and multiply imputed samples
Variable | Incomplete: M (SD) or % | Min | Max | N | Imputed: M (SD) or % | Min | Max | N |
WISC score | 97.19 (15.00) | 45 | 149 | 2220 | 97.16 (15.00) | 45 | 149 | 2229 |
Socioeconomic status | –0.05 (0.80) | –1.94 | 1.73 | 2187 | –0.05 (0.80) | –1.94 | 1.73 | 2229 |
Parental acceptance | 3.21 (0.50) | 1.17 | 4.00 | 2206 | 3.21 (0.50) | 1.17 | 4.00 | 2229 |
Parental rejection | 1.48 (0.31) | 1.00 | 3.47 | 2205 | 1.48 (0.31) | 1.00 | 3.47 | 2229 |
Peer acceptance | 0.29 (0.16) | 0.00 | 0.80 | 1064 | 0.28 (0.16) | 0.00 | 0.80 | 2229 |
Peer rejection | 0.13 (0.13) | 0.00 | 0.85 | 1064 | 0.13 (0.14) | 0.00 | 0.85 | 2229 |
Gender | 50.7% girls | | | 2229 | 50.7% girls | | | 2229 |
| 49.3% boys | | | | 49.3% boys | | | |
Educational attainment | 9.6% Lower vocational education | | | 1430 | 19.0% Lower vocational education | | | 2229 |
| 24.8% Higher vocational education | | | | 25.0% Higher vocational education | | | |
| 41.5% University of Applied Sciences | | | | 37.2% University of Applied Sciences | | | |
| 24.1% University | | | | 18.8% University | | | |