Introduction
Burn scars are known to impair quality of life through an array of functional, cosmetic, and psychological problems related to scarring [1‐3]. Several appropriate instruments that have been tested and validated are available to evaluate scar quality [4‐6]. Scar assessment scales are often used because they are easily accessible and free of charge [7, 8].
In 2004, the Patient and Observer Scar Assessment Scale (POSAS) was introduced [9], which aimed at measuring the quality of scar tissue. The POSAS consists of an Observer and a Patient Scale and includes a comprehensive list of items, based on clinically relevant scar characteristics [10]. The observer scores six items: vascularization, pigmentation, thickness, surface roughness, pliability, and surface area. The patient scores six items: pain, pruritus, color, thickness, relief, and pliability (see “Appendix”) [10].
All included items are scored on the same polytomous 10-point scale, in which a score of 1 is given when the scar characteristic is comparable to ‘normal skin’ and a score of 10 reflects the ‘worst imaginable scar’. All items are summed to give a total scar score, and therefore, a higher score represents a poorer scar quality.
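As a minimal sketch of the scoring rule just described, the total score of one scale is the plain sum of its six item scores, each on the 1 to 10 scale. The function name `posas_total` and the example ratings are hypothetical and serve only as illustration.

```python
def posas_total(item_scores):
    """Sum six POSAS item scores (each an integer from 1 to 10).

    Hypothetical helper for illustration; not part of any published tool.
    """
    if len(item_scores) != 6:
        raise ValueError("expected six item scores")
    for score in item_scores:
        if not 1 <= score <= 10:
            raise ValueError("each item score must lie between 1 and 10")
    return sum(item_scores)


# Made-up observer ratings for one scar, for illustration only.
example = [3, 2, 4, 3, 5, 2]
# The total ranges from 6 (all items like normal skin) to 60 (worst imaginable scar).
total = posas_total(example)
```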
Studies that compared the POSAS with the widely used Vancouver Scar Scale revealed that the former was more reliable than the latter [9, 11]. At present, the POSAS is being used to evaluate the rehabilitation process in different types of injury [11‐19] and has been advocated by many for scar assessment [2, 8, 11, 20].
Currently, all available scar assessment scales, including the POSAS, have been constructed and tested following the principles of classical test theory (CTT). However, modern test theories are considered superior to CTT because they make stronger assumptions and provide stronger findings. For this reason, the Rasch measurement model, one of the item response theory (IRT) models, is nowadays frequently applied in quality-of-life research [21‐26]. Use of Rasch methodology involves a rigorous and extensive analysis of the data and provides additional psychometric information that cannot be obtained through the CTT approach. The data are tested for fit to the Rasch model, allowing for a detailed examination of the internal construct validity of the scale, including properties such as reliability and ordering of the categories. The analysis also determines whether a scale is unidimensional, which is required to justify summation of scores. When the data fit the model, raw scores can be transformed from their original ordinal scale into linear, interval-level measures, which allows the application of parametric statistics.
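For orientation, one common polytomous form of the Rasch model (the Andrich rating scale parameterization) is sketched below. The text does not state which parameterization was fitted, so the notation is an assumption: $P_{nik}$ is the probability that person $n$ is scored in category $k$ of item $i$, $\theta_n$ is the person measure, $\delta_i$ is the item difficulty, and $\tau_k$ is the threshold between categories $k-1$ and $k$. In a partial credit formulation the thresholds would be item-specific ($\tau_{ik}$).

```latex
\ln\!\left(\frac{P_{nik}}{P_{ni(k-1)}}\right) = \theta_n - \delta_i - \tau_k
```

Because the left-hand side is a log-odds, person and item measures are expressed in logits, which is what allows ordinal raw scores to be re-expressed on an interval scale when the data fit the model.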
After several years of using the POSAS for burn scar evaluation, it became appropriate to subject this tool to modern test theories. For this reason, we decided to apply the Rasch model [27] to our data.
Discussion
Modern test theory analysis of a scar assessment scale is mandatory to improve the evidence base in scar treatment research. Under the thorough and stringent Rasch analysis, the POSAS questionnaire generally performed adequately on burn scars, except for the item surface area. The person reliability of the Observer Scale is just above 0.8 and that of the Patient Scale nearly 0.8, which is the lower limit of reliability required for serious decision making. This can be explained by the limited range in scar quality in our sample. The item reliability for this sample of patients is very good despite the small number of items. Both scales can differentiate three statistically distinct levels of scar quality, for instance good, intermediate, and bad scars.
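The link between a reliability near 0.8 and three distinguishable levels can be illustrated with the separation and strata formulas commonly used in Rasch analysis. This is a sketch under the assumption that these conventional formulas apply; the text does not state how the number of levels was derived, and the function names are hypothetical.

```python
import math


def separation_index(reliability):
    """Separation G = sqrt(R / (1 - R)) for a reliability R in [0, 1)."""
    if not 0 <= reliability < 1:
        raise ValueError("reliability must be in [0, 1)")
    return math.sqrt(reliability / (1 - reliability))


def strata(reliability):
    """Number of statistically distinct levels, H = (4G + 1) / 3."""
    return (4 * separation_index(reliability) + 1) / 3


# A reliability of 0.8 gives a separation close to 2 and about 3 strata.
levels = strata(0.8)
```

Under these formulas, a reliability slightly below 0.8 yields slightly fewer than three strata, which is consistent with 0.8 being treated as a lower limit.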
The items of the POSAS and other scar assessment scales are intended to measure a single variable (often referred to as a unidimensional variable), namely ‘scar quality’. Rasch factor analysis identified no substantial additional dimension, and therefore, the Observer and Patient Scales are suitable unidimensional questionnaires for the evaluation of burn scars. However, the dimensionality investigation of the Patient Scale did show an interesting structure (data not shown): the items pain and pruritus and the items thickness, surface roughness, and pliability can be interpreted as subdimensions in scar evaluation. The items pain and pruritus are typical neurological sensations of a scar, and thickness, surface roughness, and pliability can be considered as tactile characteristics.
The Wright map of the Patient Scale shows that the items pain and pruritus have a high item difficulty without overlap, meaning that the patients assess pain and pruritus as the most severe symptoms in relation to their scar. The item maps of both the Observer and the Patient Scale show some overlap of the item difficulties. Theoretically, overlapping items should be reduced, and new items should be included to fill the gaps in the map, resulting in a more even spread of item locations. However, the selection of items has to be considered from a clinical viewpoint: from that perspective, all items appear to be relevant as they relate to the complaints and problems of patients that dictate possible interventions. Moreover, most other scar assessment scales include comparable sets of items.
Tables 4 and 5 show that the category frequencies are highly skewed to the lower end. The distribution of patient measures in Figs. 1 and 2, however, is not skewed, probably because the lower categories of the items are uniformly used.
The most remarkable finding, from a clinical perspective, was the functioning of the item surface area in the Observer Scale. The measures of all the items of the Observer Scale fit the Rasch model, except for this item. Many scars tend to contract, leading to a significant reduction in the surface area, which is one of the most mutilating and disturbing problems for burn patients. Surface area was implemented in the second version of the POSAS by our group because of its clinical relevance [10]. Linear regression of this item on linear scars revealed that surface area significantly influenced the general opinion of the observer. Apparently, the surface area remains difficult to assess because the scar changes over time and the original surface area can only be estimated for burn scars. For linear scars, the situation is different because usually a linear scar is a thin line immediately post-surgery. These scars may tend to broaden, which can easily be recognized. These findings suggest ‘differential item functioning’ (DIF) of the item surface area on different scar types, which could not be studied in this sample.
The Patient Scale fit statistics revealed an adequate fit for clinical observations, although the items pain and pruritus did show high infit and outfit mean-square values, indicating that the responses to these items are often erratic or difficult for the model to predict.
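To make the two fit statistics concrete, the sketch below computes infit and outfit mean squares for a single item from observed responses, model-expected responses, and model variances. It assumes the expected values and variances have already been obtained from a fitted model; the function name and the toy numbers are hypothetical and are not data from this study.

```python
def fit_mean_squares(observed, expected, variance):
    """Return (infit, outfit) mean squares for one item.

    observed: response of each person to the item
    expected: model-expected response for the same persons
    variance: model variance of each response (must be positive)
    """
    if not observed or not (len(observed) == len(expected) == len(variance)):
        raise ValueError("inputs must be non-empty and of equal length")
    sq_resid = [(x - e) ** 2 for x, e in zip(observed, expected)]
    # Outfit: unweighted mean of the squared standardized residuals.
    outfit = sum(r / w for r, w in zip(sq_resid, variance)) / len(sq_resid)
    # Infit: squared residuals weighted by the model variance (information).
    infit = sum(sq_resid) / sum(variance)
    return infit, outfit


# Toy dichotomous example in which the residuals match the model variance exactly,
# so both statistics equal 1.0, the value expected under the model.
infit, outfit = fit_mean_squares([1, 0, 1, 1], [0.5] * 4, [0.25] * 4)
```

Values well above 1 indicate more variation in the responses than the model predicts, which corresponds to the erratic responses described for pain and pruritus.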
The category rating scale of the Observer Scale is working well. The clinicians can discriminate the 10 levels, although the fifth category is masked by categories 4 and 6 in the category probability curves. Partial credit analyses of the item surface area showed moderately disordered category probability curves. The categories of the Patient Scale are less well ordered, indicating that the patients are not able to discriminate the current 10 levels of the scale. After the number of categories was reduced, the ordering of the categories was restored. The use of five categories for the Patient Scale should be studied in further scar research before definitively moving away from the use of ten categories.
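One simple way to reduce ten categories to five is to merge adjacent pairs, as sketched below. The text does not report which categories were combined, so this pairwise scheme and the function name `collapse_category` are purely hypothetical illustrations of the idea.

```python
def collapse_category(score, width=2):
    """Map a 1-10 rating onto a coarser scale by merging adjacent categories.

    With width=2 (an assumed scheme): 1-2 -> 1, 3-4 -> 2, ..., 9-10 -> 5.
    """
    if not 1 <= score <= 10:
        raise ValueError("score must lie between 1 and 10")
    return (score - 1) // width + 1


# Recode every possible rating to show the resulting five-category scale.
collapsed = [collapse_category(s) for s in range(1, 11)]
```

Any such recoding would need to be checked again for category ordering and model fit before it replaces the original scale.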
Predictive validity could be confirmed for the Observer Scale by a good correlation between the clinicians’ input and the overall opinion on the scar. For the Patient Scale, however, the correlation was only moderate. We believe that this can be explained by the limited validity of the patient’s overall opinion on the scar. In our experience, responses to general questions depend on the patient’s current status and are influenced by other aspects such as emotions, functional impairment, or quality of life.
No other study has analyzed the POSAS using the Rasch model. Nevertheless, Lindeboom et al. studied photographs of linear scars using a modified Observer Scale, which related the category scoring to clinical descriptions of the scars [33]. For instance, the item pigmentation showed increasing category scoring with the lowest score for normal skin, followed by hypopigmentation and ending with hyperpigmentation. This implies that hypopigmentation is less severe than hyperpigmentation. The outcome and fit of this item will be highly dependent on the ratio of darker-skinned people to Caucasians within the sample. The item pliability was excluded from further analysis because of a low reliability between the four raters. As mentioned by these authors, pliability could not be assessed adequately from photographs. They showed an overall misfit of the data to the measurement model and suggested revising the item categories and weighting the items. However, in our large data set obtained from clinical observations, we found no disordered categories in the original Observer Scale, except for the item surface area. Our clinicians could discriminate all ten levels, and the category scale was working well. Therefore, we feel that it is premature to advise changing the Observer Scale on the basis of a relatively small study that analyzed photographs of relatively small linear scars.
In conclusion, this study revealed several valuable insights into the psychometric properties of the POSAS. We confirmed that the scale is reliable and found that it provides a unidimensional measure of scar quality. For burn scars, all items, except surface area, showed a good fit to the stringent Rasch model. We feel that the functioning of this item is highly dependent on the type of scar being assessed. Therefore, the presence of differential item functioning should be investigated in another sample of POSAS scores obtained from different scar types. Research should also focus on category functioning of the Patient Scale. Small adjustments of the POSAS may be considered in the future only when extensive analysis has revealed that they will lead to superior clinimetric properties of this scale.