Background
Static foot assessment is a common approach in clinical practice for classifying foot type, with a view to identifying possible aetiological factors relating to injury and prescribing therapeutic interventions [1, 2]. This approach is underpinned by a contextual model of the foot whereby structural alignment, or position of the foot, is used to infer characteristics of dynamic foot function and, theoretically, to establish the injury mechanisms leading to pathology [3-5]. This model of foot function is derived primarily from the work of Root et al. [6, 7], who proposed static assessment measures to enable clinicians to identify deviations from an idealised ‘normal’ foot. A lack of empirical evidence and concerns about the reliability [8] and validity [9, 10] of this work have led to a move away from this approach and to the development of new foot function paradigms [11, 12]. However, despite these more contemporary approaches, the premise of being able to categorise the foot based upon its anatomical characteristics remains appealing, and static foot assessment therefore remains common. As such, numerous foot classification measures have been developed over the past three decades [1, 13, 14]. The majority of these measures, whether anthropometric (rearfoot angle (RFA), medial longitudinal arch angle (MLAA), navicular drop (ND)), footprint-based (arch index, malleolar valgus index) or radiographic, provide only a uni-planar assessment of foot posture. In contrast, the Foot Posture Index (FPI-6) [13] is a multi-planar tool that combines sagittal, frontal and transverse plane assessments of the foot and has gained popularity over the past decade.
While there is a plethora of literature exploring the reliability of different foot classification measures, there has been little work exploring the level of agreement between them. The validity of common static foot measures, specifically the FPI-6, navicular height and Arch Index, was reported in a cohort of older adults [15]. Moderate to strong correlations between clinical measures were reported, with normalised navicular height and the FPI-6 demonstrating the strongest association (r = -.74). Similarly, significant associations (p ≤ .01) and moderate to strong correlations (r = .42) were reported between clinical and radiographic measures. These findings are supported by the work of Murley et al. [14], who reported moderate to strong (r = .24 to .70) relationships between clinical and radiographic measures. Further work [16] examining the association between footprint indices (malleolar valgus index and Arch Index) and navicular measures (navicular drift and drop) reported significant correlations between the malleolar valgus index and ND in single-leg (r = .61, p < .001) and bipedal stance (r = .66, p < .001). A significant correlation was also reported between the Arch Index and navicular drift during single-leg stance (r = .43, p = .029). These studies have all included footprint-based tools, and the majority have included radiographic measures. The use of footprint indices is contentious [17] owing to a lack of construct validity [18, 19], while radiographic measures require specialised equipment and expose the patient to radiation. Furthermore, all of the cited studies explored the relationship between raw scores rather than agreement on the category into which the foot is classified. Because cut-off points for foot classification differ between metrics, these studies offer little indication of agreement across measures. Assessing the level of agreement would shed light on how consistently the foot is classified by different measures and on the extent to which different foot classification measures are analogous. Information of this kind may, in turn, help to develop a more standardised approach to static foot classification.
Without doubt, access to simple, quick and safe methods of assessing the foot is important [20], but, given the number of measures available, there is a need to examine current measures to ensure that appropriate techniques are used. Consistent, credible and standardised measures are fundamental to informing practitioners involved in foot assessment and care delivery. Equally, the varied use of clinical measures in research studies hampers the pooling and systematic analysis of research data and the translation of research findings into clinical practice. Establishing agreement between common measures will help inform debate about the suitability of current measures and ultimately encourage a more standardised approach to clinical practice. Therefore, the aim of this study was to determine the level of agreement between commonly used measures of foot classification.
Results
Test scores, intra-rater reliability and within-measure classification agreement are displayed in Table 1. The intra-rater reliability of the FPI-6 (ICC(3,1) = .93), RFA (ICC(3,1) = .93) and MLAA (ICC(3,1) = .91) was almost perfect, whereas ND demonstrated fair reliability (Kw = .40). Agreement for foot classification across the two test sessions was almost perfect for the FPI-6 (Kw = .92) and MLAA (Kw = .92), moderate for the RFA (Kw = .60) and fair for ND (Kw = .40) (Table 1).
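In Shrout and Fleiss's notation, ICC(3,1) is the two-way mixed-effects, consistency, single-measurement intraclass correlation. A minimal sketch of that computation from a two-way ANOVA decomposition follows, assuming a complete participants × sessions matrix with no missing values; the function name and example scores are hypothetical, and this is not necessarily the software the authors used.

```python
def icc_3_1(scores):
    """ICC(3,1) after Shrout and Fleiss: two-way mixed effects, consistency,
    single measurement. `scores` holds one row per participant and one
    column per session (e.g. test, retest); no missing values allowed."""
    n = len(scores)  # participants
    k = len(scores[0])  # sessions
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need a complete matrix with >= 2 participants and sessions")

    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Two-way ANOVA sums of squares.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between participants
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between sessions
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols  # residual

    bms = ss_rows / (n - 1)  # between-participants mean square
    ems = ss_error / ((n - 1) * (k - 1))  # residual mean square
    return (bms - ems) / (bms + (k - 1) * ems)


# Hypothetical FPI-6 totals: [test, retest] for six participants.
fpi6_sessions = [[4, 3], [7, 6], [0, 1], [9, 9], [2, 2], [5, 4]]
print(f"ICC(3,1) = {icc_3_1(fpi6_sessions):.2f}")
```

Because the between-session sum of squares is removed from the error term, a uniform shift between sessions (for example, every retest score one point lower) does not lower ICC(3,1); an absolute-agreement form such as ICC(2,1) would penalise it.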
Table 1
Test scores (mean (SD)), intra-rater reliability and agreement for the foot classification measures
Measure | Test | Retest | ICC(3,1) | Kw |
FPI-6 | 4 (4)a | 3 (4)a | .93 | .92 |
RFA (°) | -3 (3) | -3 (3) | .93 | .60 |
MLAA (°) | 136 (10) | 136 (9) | .91 | .92 |
ND (mm) | 7 (3) | 6 (3) | .40b | .40 |
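Within-measure classification agreement (the Kw column of Table 1) compares each participant's test and retest categories with a weighted kappa. The section does not state which weighting scheme was used, so the sketch below assumes linear weights over the ordered categories supinated < neutral < pronated, with quadratic weights as an option; the function name and example classifications are hypothetical.

```python
CATEGORIES = ("supinated", "neutral", "pronated")  # ordinal order assumed


def weighted_kappa(first, second, categories=CATEGORIES, scheme="linear"):
    """Cohen's weighted kappa for two ordinal classifications of the same
    participants (e.g. test and retest). `scheme` is "linear" or "quadratic"."""
    if len(first) != len(second) or not first:
        raise ValueError("need two non-empty classification lists of equal length")
    if scheme not in ("linear", "quadratic"):
        raise ValueError("scheme must be 'linear' or 'quadratic'")
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    n = len(first)

    # Observed joint proportions (rows: first rating, columns: second rating).
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(first, second):
        observed[index[a]][index[b]] += 1 / n
    row_marg = [sum(observed[i]) for i in range(k)]
    col_marg = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def weight(i, j):
        """Disagreement weight: 0 on the diagonal, 1 for the most distant pair."""
        d = abs(i - j) / (k - 1)
        return d if scheme == "linear" else d ** 2

    obs_dis = sum(weight(i, j) * observed[i][j] for i in range(k) for j in range(k))
    exp_dis = sum(weight(i, j) * row_marg[i] * col_marg[j]
                  for i in range(k) for j in range(k))
    if exp_dis == 0:
        raise ValueError("kappa is undefined when chance disagreement is zero")
    return 1 - obs_dis / exp_dis


# Hypothetical test and retest classifications for six participants.
test = ["neutral", "pronated", "neutral", "supinated", "neutral", "pronated"]
retest = ["neutral", "pronated", "pronated", "supinated", "neutral", "neutral"]
print(f"Kw = {weighted_kappa(test, retest):.2f}")
```

With linear weights a neutral/pronated disagreement counts half as much as a supinated/pronated one; with quadratic weights it counts a quarter as much.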
The number of participants classified as having pronated, neutral or supinated feet by each measure, and the between-measure agreement, are detailed in Table 2. Using the FPI-6, 53% of participants were classified as having a neutral foot type, 40% a pronated foot type and 7% a supinated foot type. With the MLAA, 73% of participants had a neutral foot type, 20% a pronated foot type and 7% a supinated foot type. When using the RFA, 33% of participants had a neutral foot type, 67% a pronated foot type and none a supinated foot type. Seventy-three percent of participants were classified as having a neutral foot type using ND, with 17% having a pronated and 10% a supinated foot type. There was moderate agreement between the foot classification measures (Kf = .58).
Table 2
Number of participants classified as having pronated, neutral and supinated feet by each of the static foot classification measures, and Fleiss kappa statistic (Kf)
FPI-6 | 5 | 23 | 2 |
RFA | 10 | 20 | 0 |
MLAA | 6 | 22 | 2 |
ND | 5 | 22 | 3 |
Kf | .58 | | |
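Fleiss' kappa extends chance-corrected agreement to more than two raters; here the "raters" are the four measures, each assigning every participant to one of three categories. It needs the per-participant breakdown (how many measures placed each participant in each category), which cannot be recovered from the column totals in Table 2 alone. The sketch below implements the standard formula; the function name, data layout and example counts are hypothetical.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa. counts[i][j] = number of measures that placed participant i
    in category j; every row must sum to the same number of measures m >= 2."""
    n_subjects = len(counts)
    m = sum(counts[0])
    if m < 2 or any(sum(row) != m for row in counts):
        raise ValueError("every participant needs the same number (>= 2) of ratings")
    k = len(counts[0])

    # Overall proportion of all assignments falling in each category.
    p = [sum(row[j] for row in counts) / (n_subjects * m) for j in range(k)]
    # Per-participant agreement: share of measure pairs that agree.
    p_i = [(sum(c * c for c in row) - m) / (m * (m - 1)) for row in counts]

    p_bar = sum(p_i) / n_subjects  # mean observed agreement
    p_e = sum(pj * pj for pj in p)  # agreement expected by chance
    if p_e == 1:
        raise ValueError("kappa is undefined when every rating is in one category")
    return (p_bar - p_e) / (1 - p_e)


# Hypothetical counts for five participants rated by four measures,
# columns ordered [pronated, neutral, supinated].
counts = [
    [0, 4, 0],  # all four measures agree: neutral
    [2, 2, 0],
    [1, 3, 0],
    [0, 3, 1],
    [3, 1, 0],
]
print(f"Kf = {fleiss_kappa(counts):.2f}")
```

Fleiss' kappa treats the categories as nominal, so a pronated/supinated disagreement is penalised no more than a pronated/neutral one.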
Discussion
Static foot assessment is commonly undertaken to inform clinical management, both to identify possible aetiological factors of injury and to guide the prescription of therapeutic interventions such as foot orthoses [1, 2]. Consistent, credible and standardised foot measures are key to informing clinical decision making, but inconsistencies between measures and their outcome scores pose challenges for practitioners. The aim of this study was to determine the level of agreement between commonly used foot classification measures. Within-measure agreement for foot classification was first assessed from the test and retest classifications for each measure. The FPI-6 and MLAA were the most consistent methods for classifying the foot across the two sessions (Kw = .92), whilst agreement for the RFA was lower but still moderate (Kw = .60). In contrast, ND was the least consistent measure for classifying the foot across sessions (Kw = .40). The assessment of ND has gained popularity as a simple and quick clinical measure [28] despite conflicting opinion on its reliability [28, 29]. The findings from this study highlight concerns about the use of ND as a stand-alone test for foot classification and reiterate concerns about its reliability. Difficulties in palpating the navicular tuberosity and the sub-talar joint pose challenges, and the findings from this study suggest that, used as an independent measure, ND may misclassify patients, which calls the purpose of the measure into question.
Agreement between measures for classifying participants’ feet was moderate (Kf = .58). This indicates that the measures did not consistently place participants’ feet into the same category (pronated, neutral or supinated). The level of agreement reported here is lower than in previous studies [15, 16], but this disparity is unsurprising given that those studies used different measures to classify foot structure and different approaches to statistical analysis. Because each measure in this study captures a different construct, our finding may not be surprising; nevertheless, it remains a concern, as all of the measures purport to classify the foot into the same three categories. The moderate level of agreement may reflect the fact that the three-dimensional structure of the foot cannot be represented by a single uni-planar measure. Our findings reiterate current opinion that static foot measures are of limited clinical value [20, 30] and that appropriate cut-off boundaries for foot classification using reliable measures are required to increase the consistency with which the foot is classified. Furthermore, the limited agreement across measures suggests that pooling data across studies that use different foot classification tools should be undertaken with caution, as the measures tested are not analogous in the way they classify the foot. One factor that would influence the level of agreement between the measures is the classification boundaries used to categorise the foot into pronated, neutral and supinated groups for each measure. The boundaries used within this study for each measure are consistent with those commonly reported within the literature [12, 21, 23, 24]. However, only the cut-off boundaries for the FPI-6 are clearly based on normative data. Thus, future work is required to determine normative values for the interpretation of static foot classification measures and to better understand how static measures relate to dynamic function and injury.
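How strongly classification depends on the chosen boundaries can be seen in a small sketch. It collapses an FPI-6 total (range -12 to +12) into three categories, assuming a neutral band of 0 to +5 with lower totals treated as supinated and higher totals as pronated. These defaults are an assumption, since the boundaries actually used in this study are not restated in this section; the function name and the totals are hypothetical.

```python
def classify_fpi6(total, neutral_low=0, neutral_high=5):
    """Collapse an FPI-6 total into three categories. The default boundaries
    (0 to +5 neutral) are an assumption, not necessarily those of this study."""
    if not -12 <= total <= 12:
        raise ValueError("FPI-6 totals range from -12 to +12")
    if total < neutral_low:
        return "supinated"
    if total > neutral_high:
        return "pronated"
    return "neutral"


# Hypothetical FPI-6 totals for ten participants.
totals = [-2, 0, 3, 5, 6, 7, 9, 4, 1, 8]

baseline = [classify_fpi6(t) for t in totals]
# Shift only the neutral/pronated boundary from +5 to +7.
shifted = [classify_fpi6(t, neutral_high=7) for t in totals]

changed = sum(a != b for a, b in zip(baseline, shifted))
print(f"{changed} of {len(totals)} participants change category")
```

Moving a single boundary by two points reclassifies the participants with totals of +6 and +7 from pronated to neutral without any change in the underlying scores, which is one way in which differing boundaries can depress between-measure agreement.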
Based on the data presented in this study, it is our opinion that navicular drop is not an acceptable measure for characterising the foot. Individual measures of foot dimensions (such as navicular drop) may be useful for the clinical assessment of specific anatomical sites, but it is important to take into account reported findings on the validity, reliability and responsiveness of the measures. The MLAA emerged as the most robust of the uni-planar measures, combining almost perfect reliability, almost perfect within-measure agreement for foot classification and broader foot classification boundaries. The FPI-6, the only multi-planar measure used in this study, demonstrated excellent reliability and agreement across sessions. This measure appears to be a robust and reliable means of static foot assessment and to offer a more valid approach to assessing static foot structure.
There are some limitations to this work that must be acknowledged. There was a short interval between test and retest measurements, which may have led to a learning or memory effect. To reduce potential learning or memory effects, a second rater recorded all scores so that the primary rater was blinded to the measured scores. The similarity between the reliability coefficients reported in this study and those previously reported in the literature, where longer intervals between test and retest measurements were used, suggests that no obvious learning effects took place. An additional limitation was the sample recruited into the study. The participants were healthy and free from pathology, which may limit the external validity of the findings. The recruitment of a healthy population is also likely to have influenced the reliability coefficients reported in this study. It is acknowledged that reliability is a product of a number of factors, including the participants, assessor, measure and environment. As such, changes in any one of these factors are likely to alter the reliability of the measures reported in this work. We also acknowledge that there are a number of measures that we have not been able to consider in our study.
Acknowledgements
The authors would like to thank Mr Simbarashe Tanyanyiwa for his assistance during data collection.