Introduction

Functional capacity evaluations (FCEs) are standardized batteries of tests that together form an evaluation of the capacity to perform work-related activities. FCEs are used in occupational, insurance, and rehabilitation medicine to evaluate work ability. Earlier studies show evidence of reliability and of some aspects of validity, depending on the FCE protocol [1]. Worldwide, multiple FCEs exist that use different protocols from different providers, all of which claim to measure the same construct, namely functional capacity. However, the concurrent validity of these FCE protocols is moderate to poor [2–4]. In addition, when the same protocol is administered in a different environment, different results appear [5]. Differences between approaches to FCE may include variations in the number of measurements obtained, the degree of standardization, or the clarity of the concepts and underlying theories [6]. A possible explanation for the variation between results, besides the points addressed above, is the lack of consensus on operational definitions. Different authors have previously addressed this issue [3, 7, 8]. One study addressed the problem of confused definitions of terms and confusion in the conceptual framework, and resulted in recommendations on how to use operational definitions in the field of work-related assessments [7]. A different study addressed the presumed difference between a kinesiophysical approach to FCE (the evaluator terminates a test when the maximum is reached) and a psychophysical approach (the patient terminates the test when an acceptable maximum is reached). That study, however, found no differences between the test termination criteria and concluded that the presumed difference may be due to a lack of clarity in operational definitions [3].
Others found inconsistencies in the use of terms such as physical functioning, functional ability, physical ability, physical activity, activity, capacity, performance, functional status, and functional limitations, and concluded that consensus was needed [7, 8]. Several authors have proposed using the World Health Organization's International Classification of Functioning, Disability and Health (ICF) to classify work-related definitions in a framework that has worldwide consensus [9–11]. All proposed the ICF because it treats functioning as part of a biopsychosocial understanding of health in which physical and behavioral functions are in dynamic interaction with each other.

The ICF is a classification system constructed by the World Health Organization (WHO) that aims to provide a universal classification of disability and functioning for use in health and health-related sectors. The aims of the ICF are: to provide a scientific basis for understanding and studying health and health-related states, outcomes and determinants; to establish a common language for describing health and health-related states in order to improve communication between different users; to permit comparison of data across countries, health care disciplines, services and time; and to provide a systematic coding scheme for health information systems. The ICF provides a model in which functioning depends on six interrelated components: disease and disorder; functions and structures; activities, or limitations in an individual's performance of a task or action; participation, or limitations in involvement in a life situation; environmental factors; and personal factors [12]. The purposes of the ICF are close to the purpose of this research, and the ICF may therefore be suitable as a conceptual framework. A difficulty in interpreting definitions within this model may be that the ICF is generic to all health-related topics and may not be sufficiently operationally defined for use in specific fields such as FCE. Therefore, with regard to operational definitions in FCE, definitions on which experts widely agree may be very valuable, because integrating the knowledge of researchers and clinicians can form a solid basis. Clear operational definitions may help establish a common language and improve comparison of data. The objectives of this study were to reach consensus on operational definitions used in FCE and to reach consensus on a conceptual framework in which FCE can be classified.

Methods

Study Outline

To reach consensus on operational definitions used in FCE and on a conceptual framework, a Delphi study design was used. In total, three Delphi rounds were held, preceded by a focus group meeting. The steps followed were adapted from Fowles [13] and are presented in a flow chart in Fig. 1. As a first step in constructing the questionnaire, the authors predefined operational definitions that were frequently used in the international literature or defined by a dictionary. These predefined definitions were sent to Dutch FCE experts, after which a Focus Group meeting was held with these experts to construct a semi-structured questionnaire that addressed all relevant objectives. This formed the basis of the first round questionnaire. Consensus was operationally defined as agreement among 75% or more of the participants [14]. All questions on which no consensus was reached in the first round were adapted and rewritten by the authors, based on the experts' recommendations, and presented in the second round. An additional third round was held to address definitions on which no consensus was reached in the first two rounds. All questionnaires were sent by e-mail. Experts were given 2 weeks to complete the questionnaire and return it by e-mail or fax. E-mail reminders were sent after the first and after the second week.
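The consensus rule and the carry-over of unresolved items to the next round can be illustrated with a minimal sketch. It assumes that agreement is computed over the experts who answered an item, which the text does not state explicitly; the function names and the vote counts in the usage note are hypothetical and are not taken from the study data.

```python
from fractions import Fraction

# 75% agreement threshold, as stated in the text [14].
CONSENSUS_THRESHOLD = Fraction(3, 4)


def has_consensus(agree: int, respondents: int) -> bool:
    """Return True when at least 75% of respondents agreed with an item.

    Assumption: the denominator is the number of experts who answered
    the item, not the number of experts invited.
    """
    if respondents <= 0:
        raise ValueError("respondents must be positive")
    if not 0 <= agree <= respondents:
        raise ValueError("agree must lie between 0 and respondents")
    # Exact rational arithmetic avoids floating-point edge cases at 75%.
    return Fraction(agree, respondents) >= CONSENSUS_THRESHOLD


def split_items(votes: dict[str, tuple[int, int]]) -> tuple[list[str], list[str]]:
    """Split items into those accepted and those carried to the next round.

    `votes` maps an item label to (number agreeing, number responding).
    """
    accepted, carried = [], []
    for item, (agree, respondents) in votes.items():
        if has_consensus(agree, respondents):
            accepted.append(item)
        else:
            carried.append(item)
    return accepted, carried
```

With hypothetical counts, `split_items({"item A": (18, 21), "item B": (12, 21)})` accepts item A (18 of 21 is about 86%) and carries item B (12 of 21 is about 57%) forward for rewording in the next round.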

Fig. 1 Flow chart of Delphi Survey

Participants: The Expert Panel

The Focus Group, which was held prior to the Delphi Survey, consisted of six Dutch FCE experts and one ICF expert. The aim of the Focus Group was to construct a first round questionnaire and to select experts in the field of FCE. Experts were invited so as to represent a variety of expertise in FCE. They represented clinical practice, research or FCE providers, and were working in insurance, rehabilitation, occupational medicine, and education. Experts were selected if they met either of the following criteria: at least one international publication as first author and one as co-author in the field of FCE; or development of an FCE that was the subject of investigation in at least one publication in the international literature. The authors consulted the Medline database to identify potential participants. Additionally, Focus Group members and invited experts were sent a list of all potentially eligible experts and were asked whether anyone who met the inclusion criteria was missing from the invitation list. Experts who were willing to participate signed and returned an informed consent form. A total of 33 potentially eligible experts from North America, Australia, Asia and Europe were identified and invited to participate in this study. Anonymity of experts was guaranteed. All correspondence concerning the study was collected by the author’s secretary, and the authors were blinded to the source of the results. Data analyses were thus performed anonymously.

First Round

The first round questionnaire was semi-structured and consisted of two sections comprising 30 questions. The purpose of the first round was to explore the experts’ opinions about definitions of FCE-related terms and to explore whether a conceptual framework could be used to classify terms. Additionally, experts were asked to provide definitions of terms beyond those already addressed. The first section addressed the place of FCE in a conceptual framework. The second section addressed operational definitions of FCE-related terms. The content of the first round is presented in the Appendix. The questionnaire took approximately 1 h to complete.

Second Round

Based on the results of the first round, the second round questionnaire was constructed (Appendix). This questionnaire contained 21 fully structured questions in two sections. The first section addressed the place of FCE within the ICF. Experts were asked whether or not they concurred with FCE-related definitions as predefined by the ICF and whether or not they agreed with certain statements used in FCE language. In the second section, experts were asked to agree or disagree with operational definitions that are commonly used in FCE. Terms proposed in the first round were included in the second round.

Third Round

A third round with nine questions was held to clarify constructs on which no consensus had been reached (Appendix). This questionnaire contained questions that each proposed the two definitions that had received the most support in the second round. Additionally, it contained questions concerning the place of FCE in the ICF. Experts could ‘agree’ or ‘disagree’ with a statement, or choose the one definition that in their opinion should be used in FCE.

Results

A total of 33 potentially eligible experts from six different countries were identified and invited to participate in this Delphi Survey. A total of 22 experts responded to this invitation and signed and returned informed consent. There were 11 non-responders (33%). Included experts (18 researchers, 4 developers) were from Australia (n = 3), Europe (n = 10) and North America (n = 9).

First Round

Of all included experts, 95% returned the questionnaire within two weeks (21 out of 22). The items on which consensus was met are presented in Tables 1 and 2. While experts agreed upon the use of the ICF as a conceptual framework, there was at this stage no consensus on how to do this. The expert panel proposed a wide variety of additional definitions of terms, indicating that experts use different terms and different definitions of terms. Three items were accepted concerning the conceptual framework (Table 1) and three definitions were accepted (Table 2). As a result of the first round, the authors chose to exclude further questions concerning work performance, work ability, work tolerance, malingering and aggravation. The authors did this because the experts agreed that these terms are complex and extensive and should be researched separately from this study (see Table 3).

Table 1 Items in which consensus was met concerning FCE within the framework of the ICF (agreement ≥ 75%)
Table 2 Items in which consensus was met concerning operational definitions of FCE related terms (agreement ≥ 75%)
Table 3 FCE related items in which no consensus was met

Second Round

The response rate in the second round was 82% (18 out of 22). Experts reached consensus on the definitions as they were predefined within the ICF. Five items were accepted concerning the conceptual framework (Table 1) and three definitions were accepted (Table 2).
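The response rates reported for the first two rounds follow directly from the counts given in the text. The short sketch below reproduces that arithmetic; the function name is hypothetical, and rounding to the nearest whole percent is an assumption about how the reported figures were derived.

```python
def response_rate(returned: int, total: int) -> int:
    """Percentage of questionnaires returned, rounded to a whole percent.

    Assumption: reported percentages were rounded to the nearest integer.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= returned <= total:
        raise ValueError("returned must lie between 0 and total")
    return round(100 * returned / total)


# Counts stated in the text: 22 experts were included in the panel.
first_round = response_rate(21, 22)   # 21 of 22 returned the questionnaire
second_round = response_rate(18, 22)  # 18 of 22 returned the questionnaire
print(first_round, second_round)      # prints "95 82"
```

The same function applied to the invitation stage, `response_rate(11, 33)`, gives the 33% share of non-responders.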

Third Round

The response rate in the third round was 82%. Consensus was reached on five out of nine questions. Results of the third round are presented in Tables 1 and 2. After the third round, consensus had been reached on 19 items. Nine items represented operational definitions and ten items concerned the place of FCE within the framework of the ICF.

No consensus was reached on nine definitions, of which five were excluded as a result of the first round. Definitions for which no consensus was reached after the third round were: FCE; Physical Capacity Evaluation; recovery; and ability. All excluded definitions are presented in Table 3.

Discussion

One of the main results of this study was that experts agreed on using the ICF as a conceptual framework for FCE and that experts concurred with the definitions of terms as defined in the ICF. The study results provide more insight into the definitions that are frequently used in FCE and may thereby contribute to the psychometric characteristics of FCEs. Interestingly, no consensus was reached on the term FCE itself. Even though consensus was reached on the different terms that comprise FCE, no consensus was reached on one single definition of FCE. It appears that the combination of terms is interpreted differently from the individual terms. After elimination of optional definitions during three rounds, two definitions remained. The two definitions with the highest scores were:

  1. A FCE is an evaluation designed to document and to describe a person’s current safe work ability from a physical ability and motivational perspective with consideration given to any existing medical, impairment and/or pain syndromes. (38% agreement)

  2. A FCE is an evaluation of capacity of activities that is used to make recommendations for participation in work while considering the person’s body functions and structures, environmental factors, personal factors and health status. (63% agreement)

In both definitions, multiple biopsychosocial factors such as personal and contextual factors are taken into consideration. Moreover, both definitions describe a similar purpose of FCE, namely to evaluate ability or participation for work. It remains unclear whether the two definitions can be compared with each other on outcome, because no consensus was reached on the term ability. The definitions may not be mutually exclusive, and some experts stated that they may even be complementary. If the ICF were to be used as a framework for FCE, the second definition seems preferable because all of its terms are defined within the ICF. However, as authors of this study, we excluded ourselves as participants. Therefore, based on the predefined methodology of this study, we cannot recommend one definition over the other. Thus, it is recommended that in future studies researchers provide the definition of FCE they used.

Earlier research into the psychometric properties of functional tests recommended that all test selection be based on the properties of safety, reliability, validity, practicality, and utility [15]. Safety, for example, has previously been an object of discussion, mainly because of the lack of an agreed operational definition of safety and of injury [16–18]. Above all, therefore, the properties mentioned can only be applied when measurement instruments are placed and described in the context for which they are intended and when operational definitions are clear. This is a crucial point in many health sectors, in which researchers from different areas conflict with each other because of a lack of consensus on terminology. This, in turn, makes it impossible to compare or interpret data correctly and seriously hinders progress in the field. The ICF may in this case offer a framework in which multiple fields of work can classify definitions [12]. The ICF, however, is universal to all health-related sectors and should in most cases be further operationally defined to be of use in specific sectors.

A difficulty in this study was, as mentioned above, that the ICF is generic to all health-related sectors and not specific to any field of work in particular. FCE development evolved in the 1980s, 20 years before the introduction of the ICF in 2001 [12]. This made it difficult to classify FCE terminology post hoc in a framework that was introduced after FCE itself. Another difficulty in reaching consensus was the diversity of work disciplines and health disciplines involved in FCE. Therefore, an expert group was selected that consisted of persons working in insurance medicine, rehabilitation medicine, occupational medicine, and education. Because FCE is used by different disciplines, terminology has evolved over the past decades into a jumble of terms, with different health care providers using different terms. In the 1980s, FCE developers and researchers were strongly influenced by the biomedical model. The term capacity, for example, was first defined as ‘physical abilities maximums’ [19]. This dualistic approach formed the basis for categorizing physical and psychosocial factors influencing the individual based on body functions and structures. This approach excludes contextual or personal factors by stating that functioning is no more than the sum of different body functions or structures. In the first round, experts agreed that the ICF is a potentially useful classification system for FCE and related terminology. The experts disagreed, however, on how to classify the terms ‘capacity’ and ‘performance’. There appeared to be a rather strict separation between biomedically oriented and biopsychosocially oriented experts. Where the biomedically oriented experts defined capacity as “the maximal limits of the anatomical system”, the biopsychosocially oriented experts [20] disagreed, arguing that the maximal limits of the anatomical system cannot be measured and that capacity concerns functioning, not body functions or structures.
The latter agreed with a definition such as ‘the highest probable level of functioning’. The main result of the first round was therefore that experts agreed that the ICF provides a useful framework but did not agree on how to classify definitions within the ICF.

One objective of the second round was to confront experts with this contrast. The authors constructed a questionnaire to address these issues. All experts were asked whether or not they concurred with the definitions of capacity and performance as predefined within the ICF. At least 79% of all experts concurred with these definitions. Some experts who did not concur gave as their reason that “FCE terminology had already been developed in the biomedical context and was not incorporated in the ICF model.”

A general weakness of this study may be selection bias in the included experts. Some experts may have dropped out or declined to participate because of negative feelings about the study. The response rate of the experts who agreed to participate, however, was above 80% in all three rounds. Moreover, Delphi studies have been found to be an effective way to gain and measure group consensus in healthcare [21]. To reduce the risk of excluding experts who should have been included in this study, the focus group, which was held prior to the Delphi Survey, was asked whether any experts should be invited who had not been identified by the authors in the Medline database. This resulted in three additional experts. The same question was asked of all experts when the first invitation was sent, which resulted in two additional experts. Nevertheless, experts could have been missed, which may have led to selection bias. Another source of selection bias was that two of the authors of this study (RS; MR) met the inclusion criteria for experts but were not included in the expert panel. A strength of this study was that experts did not interact directly with each other, which prevented the social processes or contamination that can occur in group processes. Where single experts may suffer from biases and group meetings from ‘follow the leader’ tendencies, a Delphi method was assumed to be the most appropriate technique for this consensus study [22].

In conclusion, the results of this study show that consensus was reached on a large part of the operational definitions in FCE. This may enable researchers as well as clinicians to improve communication and to better interpret data and patient outcomes. In this study, consensus was reached on using the ICF as a conceptual framework to classify FCE terminology, and experts agreed to use the predefined terms of the ICF. Consensus was reached on 19 statements and definitions in total. No consensus was reached on a definition of FCE, for which two potentially eligible definitions remained. It is recommended that authors state the definitions they use in future research, to permit comparison of data and to support the use of a common language.