Background
The patient’s perspective is an essential parameter in outcome-related research [1, 2]. The focus on outcomes assessment has moved beyond simple evaluation of patient satisfaction to a process designed to assist clinicians and researchers in more accurately capturing how patients perceive the impact of the disease and care on dimensions of health status [3, 4]. Patient-Reported Outcome Measures (PROMs) have been developed to quantify patient perceptions of various health status dimensions (e.g., symptom status, physical function, mental health, social function, overall wellbeing) to help inform research and clinical practice efforts to improve healthcare quality [3–5]. Therefore, PROM utilization can serve as a valuable tool for patient assessment in healthcare [3–7], including following surgical treatment and throughout the continuum of care in orthopedic clinical research [1, 6, 7].
Various foot and ankle-related multidimensional PROMs, such as the Foot Function Index (FFI), revised Foot Function Index (FFI-R), Foot and Ankle Ability Measure (FAAM), Foot and Ankle Disability Index (FADI), Foot and Ankle Outcome Score (FAOS), American Orthopedic Foot & Ankle Society scales (AOFAS), and Ankle Osteoarthritis Scale (AOS), have been commonly used in research and practice [1, 7]. The FFI has been used across various foot-related pathological conditions and has broad appeal to clinicians and research scientists [7–10], which has been supported by its translation into several languages (i.e., Korean [11], Danish [12], Italian [13], Brazilian Portuguese [14], Dutch [8], Spanish [15], German [16], and French [17]). Previous FFI research on the measurement properties of the scale has used Classical Test Theory (CTT) approaches [8, 18] and Rasch analysis [19, 20]. Additional psychometric evaluation identified a need to include additional items (i.e., psychosocial activity and quality of life in foot health), which led to the development of the FFI-R Short version (FFI-RS) [20].
The use of Rasch model psychometric information (e.g., fit statistics, item-person map, rating scale function) helps to provide a more accurate evaluation of the items and measurement of a scale [19–22]. For example, Rasch analysis provides information on how well an item assesses the underlying construct, the possibility of an item's redundancy with other items in the scale, and the acceptability of the response categories. The use of Rasch analysis can help reduce some of the limitations of CTT (e.g., dependency [23], item difficulty) because it produces item statistics independent from the samples and person statistics independent from the items [24], which has led to an increasing use of Rasch analysis in the development and assessment of clinical instruments for healthcare [25]. Traditionally, the Rasch model includes two facets: item difficulty and person ability; if another facet is added, a many-faceted Rasch model is used [26, 27]. For example, when the five subscales of the FFI-RS (i.e., pain, stiffness, difficulty, activity limitation, and social issues) are incorporated in the modeling, the two-faceted model turns into a three-faceted model (i.e., item, person, and subscale). The many-faceted Rasch model estimates subscale difficulty while simultaneously “controlling for” it in the estimation of the other two facets, item difficulty and person ability.
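As a sketch of the models described above, and assuming the standard logit parameterization, the two-facet model and a three-facet rating scale extension can be written as shown below. The symbols are illustrative assumptions and are not taken from the FFI-RS analysis: \theta_n is the measure of person n, \delta_i the difficulty of item i, \gamma_s the difficulty of subscale s, and \tau_k the threshold between response categories k-1 and k.

```latex
% Two-facet (person, item) dichotomous Rasch model:
% log-odds that person n endorses item i
\[
\ln\!\left(\frac{P_{ni}}{1 - P_{ni}}\right) = \theta_n - \delta_i
\]

% Three-facet (person, item, subscale) rating scale form:
% log-odds of responding in category k rather than k-1
\[
\ln\!\left(\frac{P_{nisk}}{P_{nis(k-1)}}\right) = \theta_n - \delta_i - \gamma_s - \tau_k
\]
```

In this form the subscale term enters additively on the same logit scale as the person and item terms, which is what allows all facets to be placed on a single map.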
While the FFI-RS was developed with Rasch analysis, there are limitations with the initial analysis approach and the sample utilized. For example, it may be beneficial to use a larger, more diverse sample which better represents the patient population (e.g., males and females, various age groups) in which the scale is used [28]. Further, it would be valuable to use many-faceted Rasch analysis to simultaneously estimate item difficulty, person ability, and subscales of the FFI-RS to better understand the validity of the FFI-RS from a multi-dimensional perspective. Therefore, the purpose of this study was to evaluate the psychometric properties of the FFI-RS as a patient-reported measure of foot function, using a many-faceted Rasch model with a large and diverse sample of responses from patients who completed the scale.
Discussion
The purpose of this study was to evaluate the FFI-RS using the many-faceted Rasch model with a more extensive and diverse patient sample. Our findings provide further support for the sound psychometric properties of the FFI-RS. The FFI-RS item estimates demonstrate appropriate fit and placement for the same metrics. The Rasch analysis resulted in two items (Q26 and Q29) being removed from the scale, with the final model retaining 32 items that had acceptable Infit and Outfit statistics.
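To illustrate what Infit and Outfit statistics summarize, the following is a minimal sketch using the conventional mean-square definitions for a single item. It assumes a simplified dichotomous (0/1) Rasch model rather than the 4-point many-faceted model used in this study, and it assumes person measures and item difficulty have already been estimated. The function names and example values are hypothetical.

```python
import math


def rasch_probability(theta, delta):
    """Probability that a person at measure theta endorses an item of difficulty delta."""
    return 1.0 / (1.0 + math.exp(-(theta - delta)))


def item_fit(responses, thetas, delta):
    """Return (infit, outfit) mean squares for one dichotomous item.

    responses: observed 0/1 responses, one per person
    thetas:    person measures in logits (assumed already estimated)
    delta:     item difficulty in logits (assumed already estimated)
    """
    sq_resid_sum = 0.0  # sum of squared residuals
    var_sum = 0.0       # sum of model variances
    z2_sum = 0.0        # sum of squared standardized residuals
    for x, theta in zip(responses, thetas):
        p = rasch_probability(theta, delta)
        var = p * (1.0 - p)
        sq_resid = (x - p) ** 2
        sq_resid_sum += sq_resid
        var_sum += var
        z2_sum += sq_resid / var
    infit = sq_resid_sum / var_sum    # information-weighted mean square
    outfit = z2_sum / len(responses)  # unweighted mean square
    return infit, outfit


# Hypothetical data: five persons, one item located at 0 logits.
thetas = [-2.0, -1.0, 0.0, 1.0, 2.0]
expected_pattern = item_fit([0, 0, 1, 1, 1], thetas, 0.0)
reversed_pattern = item_fit([1, 1, 0, 0, 0], thetas, 0.0)
```

Mean squares near 1 indicate responses about as variable as the model predicts. Because Outfit is unweighted, it is more sensitive to unexpected responses from persons located far from the item, whereas Infit weights responses by their model variance.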
Our findings differed from prior research, possibly because of differences in the number and composition of research participants in each study, which may have led to different final model solutions. For example, a previous study [28] had a sample size of 92 patients from a Veterans Administration Hospital podiatry clinic in the Midwest, while our sample included 2,184 participants from an international surgical database. Rasch analysis with smaller sample sizes, like many other statistical analysis procedures, may be less powerful for fit analysis, may be more likely to skew the estimates due to larger standard errors, and may offer less robust estimates [35]. Therefore, it is possible that sample size differences impacted model fit.
Our findings provide novel insight into the response scale for the FFI-RS. The 4-point Likert rating scale met the following requirements: (a) observations were distributed across the four categories, with a slightly positive skew; (b) the average logit measures and category thresholds increased with each successive category; (c) the Outfit mean square residuals were less than 2 for each scale category. Therefore, our findings indicated that the 4-point Likert scale operated effectively in the many-faceted Rasch analysis.
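The three rating scale criteria above can be expressed as simple checks. The sketch below is illustrative only: the function name, the input values, and the minimum count per category (set here to 10 as an assumption, not a value reported in this study) are hypothetical, while the Outfit cutoff of 2 follows the criterion stated above.

```python
def check_rating_scale(counts, avg_measures, thresholds, outfit_mnsq,
                       min_count=10, max_outfit=2.0):
    """Check rating scale category diagnostics.

    counts:       observations per category, lowest category first
    avg_measures: average logit measure of persons observed in each category
    thresholds:   category thresholds in logits (one fewer than categories)
    outfit_mnsq:  Outfit mean square per category
    min_count:    assumed minimum observations per category (hypothetical)
    max_outfit:   Outfit mean square cutoff
    """
    def strictly_increasing(values):
        return all(a < b for a, b in zip(values, values[1:]))

    return {
        "enough_observations": all(c >= min_count for c in counts),
        "avg_measures_advance": strictly_increasing(avg_measures),
        "thresholds_advance": strictly_increasing(thresholds),
        "outfit_acceptable": all(m < max_outfit for m in outfit_mnsq),
    }


# Hypothetical values for a 4-category scale (not results from this study).
result = check_rating_scale(
    counts=[900, 600, 400, 284],
    avg_measures=[-1.2, -0.3, 0.4, 1.1],
    thresholds=[-0.9, 0.1, 0.8],
    outfit_mnsq=[1.0, 0.9, 1.0, 1.2],
)
scale_ok = all(result.values())
```

A scale would be flagged for review if any check failed, for example if adjacent thresholds were disordered, which could suggest collapsing categories.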
Rasch analysis also has the advantage of individually measuring item difficulty [36]. Item difficulty reflects how commonly an item is endorsed (i.e., selected as impaired) by respondents: less difficult items are endorsed more frequently, and more difficult items are endorsed less frequently. Our findings indicate that the “Difficulty” subscale contains both the most and least difficult items for respondents: 1) the most difficult item (i.e., the least commonly endorsed by patients as impaired) was “Walking with assistive devices?”; 2) the least difficult item (i.e., the most commonly endorsed by patients as impaired) was “Running?”. In short, the FFI-RS “Difficulty” subscale contains both the item least likely (i.e., “Walking with assistive devices”) and the item most likely (i.e., “Running”) to be reported as impaired by respondents with a foot/ankle pathology. Further research would be valuable to investigate response patterns based on pathology, symptom or injury severity, and population types (e.g., elite athletes, sedentary patients, different age groups).
Our results also include novel findings from the facet map, which displays the difficulty distribution of the FFI-RS items and subscales, the distribution of participants' foot function levels, and the relative positions of individuals' foot function levels, items, and subscales. Gaps at the ends of the facet map indicate that the FFI-RS does not provide sufficient content to measure individuals with the highest and lowest levels of foot function. On the logit scale, items measuring persons at the same location may presumably be removed without a major loss of content information. The facet map findings, along with the item difficulty analysis, support further scale modification, which could involve developing new items or removing or modifying redundant FFI-RS items within subscales. The goal would be to develop and evaluate an item pool which fully measures the intended constructs across relevant patient subgroups in future studies.
Our study of the FFI-RS used a larger, more heterogeneous sample (e.g., wide age range [i.e., 12 to 90 years old], larger inclusion of female participants) of individuals who sought patient care services for pain or pathology. Thus, our study likely includes a highly generalizable sample of the patient population and novel analysis not previously conducted; however, the use of the dataset from the SOS database and our work does have limitations. First, longitudinal data and a healthy sample of respondents were not included in this study. Future research is needed to assess the longitudinal properties of the scale and to determine whether the same results are obtained when healthy samples are included, which would indicate whether the FFI-RS can be used to differentiate between injured and healthy patients and guide return to activity or patient discharge decisions. Second, item level bias due to age, sex, or ethnicity differences may be a concern; further analysis across other relevant subgroups (e.g., different levels of physical activity or education, injury conditions, interventions) would also be valuable for assessing scale psychometric properties. Therefore, multi-group testing to assess differences across other demographic variables would be valuable.
Our analysis also did not exhaust all psychometric testing that would be valuable. For example, the lack of longitudinal data prevents testing of responsiveness (e.g., minimal clinically important differences) or test–retest reliability, while the lack of available demographic data (e.g., injury type) prevents multi-group analysis. Lastly, the absence of questions that capture a wider range of foot function is a limitation in our findings. Future research should include various relevant patient populations (e.g., elite athletes, recreational athletes), age groups, pathology/conditions, interventions, and demographic variables (e.g., education levels, ethnicities) to establish the best item pool (i.e., item difficulty range to adequately capture foot function capacity), which would provide the most valuable information to clinicians while limiting response burden across the spectrum of patients who could use the scale.