Introduction
Mentalizing refers to the capacity to understand the self and others in terms of intentional mental states such as feelings, desires, attitudes and goals (Fonagy and Target, 2006). Mentalizing difficulties are an established correlate of eating disorders across the age range and may represent a target for prevention and treatment (Jewell et al., 2016; Simonsen et al., 2020). However, the precise mechanisms by which mentalizing and eating disorders are related are not yet clear, and the task of identifying them is made harder by several key issues. Firstly, mentalizing is a complex, multidimensional concept (Luyten et al., 2020), and eating disorders themselves are heterogeneous, encompassing a broad spectrum of symptomology across different disorders (American Psychiatric Association, 2013). Moreover, the ability of the field to make progress is constrained by the availability of adequate measurement tools, and the validity of mentalizing measures in eating disorder populations is low (Bizzi et al., 2023).
At the level of theory, the ability to mentalize is an evolutionarily prewired capacity which requires considerable environmental input to develop fully (Luyten et al.,
2020). Initial iterations of the theory emphasised early attachment experiences as the crucible in which a sense of self and emotional agency develop through mentalizing (Fonagy et al.,
2002; Fonagy and Bateman,
2019), initially via ‘marked mirroring’ exchanges with the primary caregiver (Gergely & Watson,
1996). Parents who are able to mentalize their child’s internal mental states, and convey these to the child through contingent and marked affective displays, were assumed to help their child to develop secure attachment representations, mentalizing ability and adaptive emotional regulation (Luyten et al.,
2020). By contrast, consistently mis-attuned caregiving was theorised to lead to mentalizing difficulties, which then served as a vulnerability for psychopathology, including eating disorders, when faced with later developmental challenges (Fonagy et al.,
2002). This core theory was adapted for eating disorders by Robinson et al. (
2019) to include genetic predisposition, attachment disturbance and mentalizing dysfunction in a proposed aetiological pathway. However, evidence of an association between early attachment security and later eating pathology is lacking (Jewell et al.,
2023a). More recent iterations of mentalizing theory have de-emphasized the role of early attachment (Luyten et al.,
2020), and there has been a focus on the role of mentalizing as a transdiagnostic factor that may relate to outcomes of psychotherapy (Luyten et al.,
2024).
Increasing evidence suggests that reflective function (RF), which refers to the ability to mentalize in the context of attachment relationships (Fonagy et al.,
1991), may indeed play a role in the process and outcome of eating disorders treatment. The relevance of attachment to mentalizing relates to the historic theoretical emphasis on mentalizing as a skill learnt in the context of attachment relationships (Fonagy et al.,
2002), as well as the initial work to develop measures of RF based on attachment interview transcripts (Fonagy et al.,
1991). Both Katznelson et al. (
2020) and Jewell et al. (
2023b) found baseline RF to predict later therapeutic alliance scores in adult and adolescent eating disorders samples respectively, whilst RF has also predicted treatment outcome in adults with eating disorders (Kuipers et al.,
2017). However, the assessment of RF through available observer-rated measures such as the RF Scale (Fonagy et al.,
1998) applied to the Adult Attachment Interview (George et al.,
1985) or transcripts of therapy sessions (e.g. Talia et al.,
2019) is time-consuming, reducing the feasibility of evaluating mentalization in either routine clinical practice or research settings such as large-scale epidemiological studies (Fonagy et al.,
2016). Consequently, there has been a need to develop valid and reliable self-report measures of RF.
The Reflective Function Questionnaire (RFQ) (Fonagy et al., 2016) was developed to fill this need. In devising the measure, Fonagy et al. (2016) initially developed a scoring method in which some items were scored so that the highest scores fell at the extreme end of the Likert scale (polar method), whereas other items were scored so that the highest scores fell in the middle of the scale (median method). This approach was taken since, depending on item wording, higher RF might sometimes be reflected in responses in the middle of the Likert scale. This scoring method was used in the first published evaluation of the RFQ, the 46-item RFQ for Youth (RFQ-Y) (Ha et al., 2013), in which half the items were scored using the median method (Scale A) and half by the polar method (Scale B). However, the median scoring approach is psychometrically sub-optimal and was removed from the eight-item adult RFQ8 in its final published version (Fonagy et al., 2016). Meanwhile, to improve the psychometric properties of the RFQ-Y, Sharp et al. (2022) refined the measure using the polar-scored items from Scale B, resulting in the five-item RFQY-5, which demonstrated adequate fit for a unidimensional factor structure, good internal consistency and construct validity. However, whilst the RFQY-5 was validated in an adolescent psychiatric sample, it has yet to be evaluated in an eating disorder sample. Such an evaluation is particularly important given that the starvation state in restrictive eating disorders is known to impact on theory of mind (Bora & Köse, 2016), and may therefore impact RF. Moreover, as yet there has been no evaluation of the sensitivity to change of the RFQY-5.
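The contrast between polar and median scoring can be illustrated with a minimal Python sketch. It assumes a hypothetical 7-point Likert scale, and the function names `polar_score` and `median_score` are illustrative. It is not the published RFQ scoring key, which should be taken from the original sources.

```python
def polar_score(response: int, points: int = 7, reverse: bool = False) -> int:
    """Polar scoring: the score rises monotonically toward one end of the scale.

    The 7-point default is an assumption made for illustration only.
    """
    if not 1 <= response <= points:
        raise ValueError("response outside the Likert range")
    return points + 1 - response if reverse else response


def median_score(response: int, points: int = 7) -> float:
    """Median scoring: the score peaks at the scale midpoint and
    falls off symmetrically toward both extremes."""
    if not 1 <= response <= points:
        raise ValueError("response outside the Likert range")
    midpoint = (points + 1) / 2
    return (points - 1) / 2 - abs(response - midpoint)
```

Under these assumptions, a response of 4 on a median-scored item receives the maximum score, whereas responses of 1 and 7 both receive the minimum. On a polar-scored item the same responses are ordered from lowest to highest, or the reverse for a reverse-keyed item.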
In summary, there is a need for a brief, psychometrically-sound self-report measure of adolescent RF to be evaluated in an eating disorder sample. The present study therefore assesses the psychometric properties of the RFQY-5 (Sharp et al.,
2022) in a secondary analysis of data derived from a prospective observational study of family therapy treatment in adolescents with restrictive eating disorders (Jewell et al.,
2023b).
Hypotheses
We tested the following hypotheses:
1) Structural validity: confirmatory factor analysis will demonstrate a unidimensional structure of the RFQY-5 within established parameters of model fit (see Methods for details).
2) Internal consistency: the RFQY-5 will demonstrate acceptable internal consistency, with Cronbach’s alpha and McDonald’s omega values > 0.7.
3) Construct validity: the RFQY-5 will be significantly, and negatively, correlated with specific scales of the Difficulties in Emotion Regulation Scale (DERS) (Gratz & Roemer, 2004), as found in the Sharp et al. (2022) study. Specifically, we predict medium-sized correlations (Pearson’s r or Spearman’s rho of around 0.3 in magnitude) with the Impulse Control Difficulties and Lack of Emotional Clarity scales, and a large correlation (r or rho > 0.5 in magnitude) with the Lack of Emotional Awareness scale. This was hypothesised since the RFQY-5 item content overlaps with these scales, with some highly similar items contained in the Lack of Emotional Awareness scale. In addition, reflective functioning and emotional regulation are theoretically related concepts (Luyten et al., 2020), with evidence of an association between them across clinical and non-clinical samples (e.g. Sharp et al., 2011; Schwarzer et al., 2021).
4) Sensitivity to change: we predict a significant increase in RF scores from baseline (T1) to nine months later (T2), assessed using a paired t-test. Whilst improving RF is not a specific treatment target of family therapy for eating disorders, we hypothesised that the impact of starvation (such as psychological rigidity and reduced theory of mind) would lead to lower RF at baseline; by contrast, we hypothesised that weight restoration plus psychological progress towards eating disorder recovery would lead to increased RF at nine months.
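The internal consistency criterion in hypothesis 2 can be made concrete with a minimal sketch of Cronbach’s alpha. It uses the standard formula, alpha = k/(k − 1) × (1 − sum of item variances / variance of total scores). The function name `cronbach_alpha` and the data layout are assumptions made for illustration, and the sketch does not cover McDonald’s omega.

```python
from statistics import variance


def cronbach_alpha(items: list[list[float]]) -> float:
    """Cronbach's alpha for a set of items.

    items[i][p] is the score of respondent p on item i (an assumed layout).
    Sample variances (n - 1 denominator) are used throughout.
    """
    k = len(items)
    if k < 2:
        raise ValueError("at least two items are required")
    n = len(items[0])
    if any(len(scores) != n for scores in items):
        raise ValueError("all items need the same number of respondents")
    # Total score per respondent, summed across items.
    totals = [sum(per_person) for per_person in zip(*items)]
    item_variance_sum = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))
```

Applied to a five-item measure, the resulting value would be compared against the 0.7 threshold stated above. Items that covary perfectly give an alpha of 1, and items that do not covary at all give an alpha of 0.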
Method
Data for this study were drawn from Jewell et al.’s (
2023b) study in which 192 adolescents were recruited through consecutive sampling into a study of predictors of outcome in family therapy within specialist community eating disorder treatment centers. Adolescent participants were aged between 10 and 17 years and were diagnosed by a clinician with anorexia nervosa (AN) or a restrictive Other Specified Feeding or Eating Disorder (OSFED) using DSM-5 criteria (American Psychiatric Association,
2013). Recruitment took place across three specialist community eating disorder centers in England. Participants had a mean age of 14.7 years, were 88.4% female, and 79.8% were of White British ethnicity. Family therapy for adolescent anorexia nervosa (FT-AN) was offered to families as part of routine treatment in line with national guidelines recommending this as the first-line treatment for AN and similar presentations in adolescence (National Institute for Health and Care Excellence,
2017). Participants were recruited at the start of their family therapy treatment and completed a range of self-report measures of attachment, mentalization and emotion regulation difficulties. Full details on inclusion criteria, ethical approval, original sample size calculation and recruitment (including flowchart) are included in the original paper. In brief, the study received ethical approval, and all participants provided written consent. Participants could withdraw from the study at any time without providing a reason.
The final sample for the present study comprises 171 adolescents for whom Time 1 data were available, out of 192 participants recruited to the Jewell et al. (
2023b) study. Exclusions were as follows: 15 adolescents did not complete any measures for the study; two participants withdrew their consent; four were recruited to the study in error without meeting inclusion criteria.
For the present study, we consulted the latest COSMIN guidelines (Mokkink et al.,
2019) on sample sizes for studies evaluating psychometric properties. COSMIN criteria for ‘very good’ design were satisfied by a sample > 100 for structural validity of a five-item measure, internal consistency and construct validity, and by a sample > 50 for sensitivity to change. To maximize the available sample size, our analyses included both male and female participants, and these findings are reported in the main body of this paper. However, given our largely female sample, as a secondary analysis we also repeated all analyses on a female-only sample and report these findings as an appendix.
Data Analytic Strategy
Primary analyses were conducted in SPSS version 28.0 with Hayes’ omega macro to calculate omega (Hayes & Coutts, 2020). Factor analyses were conducted in Mplus version 8.6 (Muthén & Muthén, 2017). The best fitting model was determined by examining the following fit indices: the root-mean-square error of approximation (RMSEA), with values of less than 0.08 indicating reasonable fit and values above 0.10 suggesting poor fit (Browne & Cudeck, 1993); the comparative fit index (CFI; Bentler, 1990), with values between 0.95 and 1.00 indicating excellent fit and values between 0.90 and 0.95 indicating acceptable fit (Hu & Bentler, 1999); and the standardized root-mean-square residual (SRMR), with values less than 0.08 indicating acceptable fit (Hu & Bentler, 1999). Results of chi-square tests are reported; however, the chi-square test is sensitive to sample size, so these results were not given as much weight as the fit indices listed above (Fan et al., 1999). Sensitivity to change was assessed using paired sample t-tests. The normality of data and presence of outliers were inspected visually using histograms and Q-Q plots, as well as through checking of values for skewness and kurtosis. Values for skewness and kurtosis for the RFQY-5 at Time 1 and Time 2 were within acceptable limits for normality. Values for two subscales of the DERS were outside of acceptable limits; therefore, non-parametric tests were used for correlations.
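The fit-index thresholds described above can be summarised in a short Python sketch. The function name `classify_fit` and its labels are hypothetical. The text does not classify RMSEA values between 0.08 and 0.10 or CFI values below 0.90, so the labels for those ranges, and the assignment of a CFI of exactly 0.95 to the “excellent” band, are assumptions.

```python
def classify_fit(rmsea: float, cfi: float, srmr: float) -> dict[str, str]:
    """Label each fit index using the cut-offs quoted in the text."""
    if rmsea < 0.08:
        rmsea_label = "reasonable"
    elif rmsea > 0.10:
        rmsea_label = "poor"
    else:
        # 0.08 to 0.10 is not labelled in the text (assumption).
        rmsea_label = "unclassified"

    if cfi >= 0.95:
        cfi_label = "excellent"
    elif cfi >= 0.90:
        cfi_label = "acceptable"
    else:
        # Below 0.90 is not labelled in the text (assumption).
        cfi_label = "below acceptable"

    srmr_label = "acceptable" if srmr < 0.08 else "not acceptable"
    return {"RMSEA": rmsea_label, "CFI": cfi_label, "SRMR": srmr_label}
```

For example, purely illustrative values of RMSEA = 0.05, CFI = 0.96 and SRMR = 0.04 would be labelled reasonable, excellent and acceptable respectively. These are not results from the present study.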
Discussion
Our findings point to the RFQY-5 having inadequate psychometric properties in adolescents with restrictive eating disorders. Whilst convergent validity with two scales from the DERS (Gratz & Roemer,
2004) was demonstrated, we found the structural validity and internal reliability to be below accepted threshold values. In addition, the RFQY-5 was not sensitive to change between baseline and nine months. This raises the question: what explains this sub-optimal performance in adolescents with eating disorders?
We suggest that our findings be considered through the lens of ‘trait vs. state’. The internal reliability of the RFQY-5 in our sample (alpha = 0.63) was lower than the satisfactory value reported by Sharp et al. (2022) (alpha = 0.75), even taking into account adjusted internal reliability standards for very short measures (Vaske et al., 2017). Our table of inter-item correlations (Table 2) identifies Item 4 (“I’m often curious about the meaning behind others’ actions”) as performing poorly, since it correlates significantly with only one other RFQY-5 item. In terms of states, it is possible that the impact of starvation is at play. Bora and Köse (2016) found meta-analytic evidence of impaired theory of mind in individuals with anorexia nervosa (AN), with larger effect sizes for participants with acute compared with recovered AN. It is thus possible that adolescents who are acutely unwell report lower curiosity about others’ mental states, and this might change with physical recovery. However, by nine months, there was no significant change in scores on the RFQY-5, raising two possibilities: firstly, that nine months is too short an interval for changes in mentalizing to be apparent on a self-reported measure; secondly, that the RFQY-5 items are insensitive to change and might capture trait-like aspects of adolescent mentalizing. Relatedly, one must consider that rates of autism are raised in adolescent AN (Westwood et al., 2018) and this might also influence measurement variance of the RFQY-5 in this population. Finally, the RFQY-5 is regarded as an evaluation of a person’s capacity for mentalizing other minds (Sharp et al., 2022). It is possible that the measurement variance identified in this study is explained by differences in mentalizing of others’ minds in adolescents with AN relative to their peers. Future studies could utilise case-control designs to investigate mentalizing of both the self and the other, to better understand the extent to which mentalizing within this population may differ from controls.
Our study has several strengths and limitations which should be borne in mind. Our sample is representative of UK clinic-attending adolescents with restrictive eating disorders, and the sample size is sufficient to undertake our analyses. However, the majority of participants were female and White British, precluding conclusions about how the RFQY-5 might perform in other groups or indeed nationalities. In addition, the five items were extracted from responses to the full 46-item RFQ-Y rather than being administered on their own, which is not ideal. A future validation study should administer only the five items, given the known effects that items have on each other when administered together.
An important implication of our study is that clinicians and researchers seeking a psychometrically robust self-report measure of reflective function in adolescents with eating disorders currently do not have strong evidence for the validity and reliability of any measure. One option is to utilise the full-length RFQ-Y (Ha et al.,
2013), which demonstrated acceptable internal reliability in this population, as reported in Jewell et al. (
2023b). In this present study, our focus was on evaluating the psychometric properties of the 5-item version due to its brevity, and we were unable to do a full psychometric evaluation of the 46 items since standards recommend a larger sample size than we have available for a measure of that length. However, Sharp et al. (
2022) have highlighted some of the shortcomings of the 46-item RFQ-Y, such as its sub-optimal scoring method.
Our findings highlight the need to develop valid and reliable measures of mentalizing in adolescents and adults with eating disorders. Existing measures which could be investigated in this population include the Certainty About Mental States Questionnaire (Müller et al.,
2023) and the Mentalization Scale (Dimitrijević et al.,
2018). The RFQ8 (Fonagy et al.,
2016) represents another potential choice, but the current scoring method, in which several items contribute to two different scales, is not ideal, and its factor structure is unclear, with recent research suggesting a unidimensional factor structure, rather than the two-factor structure reported initially (Müller et al.,
2022; Woźniak-Prus et al.,
2022). We also recommend that clinicians and researchers carefully consider the item content of available measures when choosing instruments and interpreting results. The item coverage provided by a self-report measure such as the RFQ in any of its forms is unlikely to capture differences in mentalizing ‘modes’ (Fonagy et al.,
2002; Luyten et al.,
2020) that have been seen as clinically relevant in eating disorders (Robinson et al.,
2019). For the adolescent eating disorders field, recent qualitative data supports the idea that mentalizing changes are important in the recovery process for adolescents with AN (Baudinet et al.,
2023). Further investigation of the role of mentalizing in recovery via quantitative means will require adequate measurement tools that are sensitive to change.
In summary, there is a need for brief, psychometrically robust measures of reflective function in eating disorder samples. Our study suggests that the RFQY-5 does not perform adequately in this population, despite having performed well in a prior study by Sharp et al. (
2022) that included adolescent community and psychiatric inpatient samples. Future validation studies should include not only self-report but also task-based and observer-rated measures that can assess mentalizing in ecologically valid ways, as well as capturing the more nuanced aspects of mentalizing that may be seen as clinically important. The development of new measures should involve those with lived experience of eating disorders to ensure that tasks and items are salient. Validation studies should recruit diverse samples and assess measurement invariance across cultures, ethnicities, and genders, as well as in autistic individuals.