Consistency of extreme response style and non-extreme response style across traits

https://doi.org/10.1016/j.jrp.2012.10.010

Abstract

The consistency of extreme response style (ERS) and non-extreme response style (NERS) across the latent variables assessed in an instrument is investigated. Analyses were conducted on several PISA 2006 attitude scales and the German NEO-PI-R. First, a mixed partial credit model (PCM) and a constrained mixed PCM were compared regarding model fit. If the constrained mixed PCM fit better, latent classes differed only in their response styles but not in the latent variable. For scales where this was the case, participants’ membership in the NERS or ERS class on each scale was entered into a latent class analysis (LCA). For both instruments, this second order LCA revealed that the response style was consistent for the majority of the participants across latent variables.

Highlights

► We investigated the consistency of response styles across traits.
► We analyzed data from PISA 2006 attitude scales and the NEO-PI-R.
► Latent classes of extreme responders and non-extreme responders were found.
► Latent class analyses showed that response styles are consistent for most respondents.

Introduction

The aim of this paper is to analyze the consistency of response styles across the different latent variables assessed in an instrument. Response styles occur in many questionnaires employing Likert-type scales. However, it is unclear whether participants use the same response style throughout the instrument, independently of the trait being assessed, or whether there is a relationship between the occurrence of response styles and the trait. This paper tries to elucidate the consistency of response styles using mixed Rasch models, in which participants are allocated to response style classes for each of the scales, and a latent class analysis, in which the consistency of membership in a certain response style class is investigated. In the following, first a definition of response styles is provided and the importance of considering response styles is addressed. Second, the use of mixed Rasch models to investigate response styles is explained. Third, existing research on the stability and consistency of response styles is summarized and our approach to investigating the consistency of response styles is described.

The term response style refers to systematic individual differences in response scale use that are independent of item content and the respondent’s trait level. Thus, an individual’s response style characterizes his or her tendency to prefer certain response categories over others. Response styles that have been shown to occur frequently are acquiescence response style, the tendency to agree with items, disacquiescence response style, the tendency to disagree with items, extreme response style (ERS), the tendency to prefer extreme response categories, and midpoint responding, the tendency to choose the middle category of a response scale (see Baumgartner and Steenkamp (2001) for a detailed summary of common response styles). Importantly, in all cases, the response style tendency is characterized by its occurrence irrespective of the item’s content and the person’s standing on the trait being assessed by the item.

The pervasiveness of these response styles has been shown in a wide variety of self-report questionnaires using Likert-type response scales. For example, Rost et al. (1997) and Austin et al. (2006) found ERS in the German and English NEO-FFI (Borkenau & Ostendorf, 1993; Costa & McCrae, 1992) using mixed Rasch models. Eid and Rauber’s (2000) mixed Rasch analysis of a leadership performance scale resulted in two subgroups of participants, one that preferred extreme categories and one that used the response scale evenly. Buckley (2009) showed the occurrence of acquiescence response style, disacquiescence response style, extreme response style, and non-contingent responding (i.e., inconsistent responses to similar items) in several attitude scales included in the Programme for International Student Assessment (PISA) 2006 student questionnaire (OECD, 2006).

Concerning the relationship between response styles and personality, Austin et al. (2006) found that persons employing ERS had higher extraversion and conscientiousness scores as measured by the NEO-FFI (however, see Paulhus, 1991). Naemi, Beal, and Payne (2009) showed that ERS and peer-ratings of intolerance of ambiguity, simplistic thinking, and decisiveness were positively associated. Individual differences in response styles appear to be influenced by other factors as well. For example, Eid and Rauber (2000) reported that women had a higher probability of being allocated to the ERS group compared to men. Van Herk, Poortinga, and Verhallen (2004) showed that the occurrence of ERS and acquiescence response style differed between six European countries. Both response styles were more pronounced in Mediterranean than in Northwestern Europe. Johnson, Kulesa, Cho, and Shavitt (2005) analyzed data from 19 countries and found that the cultural dimensions power distance and masculinity were positively related to ERS whereas they were negatively related to acquiescence response style. Smith (2004) showed a relationship between acquiescence response style and nations that are high on family collectivism.

This paper focuses on response styles that differ in the degree of extremity of the preferred response. These are ERS and its opposite, namely a response style characterized by the avoidance of extreme response categories, called non-extreme response style (NERS) in the following, as well as midpoint responding. Note that NERS is not the same as midpoint responding, since midpoint responding is defined as an explicit preference for the middle category, while respondents employing NERS can prefer all moderate categories, including but not limited to the middle category. With the widely used response format strongly disagree – disagree – neutral – agree – strongly agree, we would therefore expect ERS respondents to favor strongly disagree and strongly agree, midpoint responders to favor neutral, and NERS respondents to favor any of the moderate categories disagree, neutral, or agree, irrespective of their true trait level.

The importance of considering response styles is illustrated by Austin et al. (2006) who showed that sum scores may be distorted when participants employ different response styles. In particular, ERS participants received more extreme trait scores compared to other participants. Thus, comparisons of sum scores across subgroups of participants may be rendered invalid by response styles. Furthermore, as Buckley (2009) pointed out, when attitudinal data obtained from international educational assessments such as PISA are used in secondary analyses, conclusions may be erroneous if individual and cross-cultural differences in response styles are not taken into account. Thus, the aim of this paper is to explore the consistency of response styles across traits in an instrument using several attitude scales from the PISA 2006 student questionnaire and a widespread personality questionnaire, the NEO-PI-R.

In line with other studies on response styles (e.g., Austin et al., 2006; Eid & Rauber, 2000; Rost et al., 1997), mixed Rasch models (Rost, 1990, 1991) will be used to identify subgroups of participants that differ in their response style. Mixed Rasch models combine latent class analysis (LCA) with Rasch models. Both qualitative differences between subgroups of participants (as in an LCA) and quantitative differences between participants within a subgroup (as in a Rasch model) can be analyzed simultaneously. Thus, in a mixed Rasch model, the Rasch model holds within each latent class, but item parameters vary between latent classes.

The least restrictive mixed model for polytomous data (mixed partial credit model; Rost, 1991) extends the partial credit model (PCM; Masters, 1982) by incorporating class-specific parameters. According to Rost (1990, 1991, 2004), it is defined as

$$P(X_{vi}=x)=\sum_{g=1}^{G}\pi_g\,\frac{\exp(x\theta_{vg}-\sigma_{ixg})}{\sum_{s=0}^{m}\exp(s\theta_{vg}-\sigma_{isg})}\qquad(1)$$

with $\sigma_{ixg}=\sum_{s=1}^{x}\tau_{isg}$ for all $i$, $x$, and $g$, and two side conditions, $\sum_{i=1}^{k}\sum_{x=1}^{m}\tau_{ixg}=0$ for all $g$, and $\sigma_{i0g}=0$ for all $i$ and $g$ (explanation below). The probability of person $v$ endorsing category $x$ on item $i$ with $m+1$ categories ($x=0,\ldots,m$) is denoted by $P(X_{vi}=x)$. The class size is given by $\pi_g$ ($0<\pi_g<1$) with the constraint $\sum_{g=1}^{G}\pi_g=1$, the latent classes $g$ being mutually exclusive.
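To make the mixture in Eq. (1) concrete, the category response probabilities can be computed directly from the parameters. The following sketch uses illustrative parameter values only (not estimates from the paper): it evaluates a class-specific PCM and the class-size-weighted mixture.

```python
import math

def pcm_category_probs(theta, thresholds):
    """Category probabilities under a partial credit model.

    theta: person trait value in logits.
    thresholds: threshold parameters tau_1..tau_m for one item,
    so categories run from 0 to m.
    """
    m = len(thresholds)
    # sigma_x = cumulative sum of thresholds up to category x (sigma_0 = 0)
    sigmas = [0.0]
    for tau in thresholds:
        sigmas.append(sigmas[-1] + tau)
    numerators = [math.exp(x * theta - sigmas[x]) for x in range(m + 1)]
    total = sum(numerators)
    return [n / total for n in numerators]

def mixed_pcm_probs(thetas_by_class, thresholds_by_class, class_sizes):
    """Marginal category probabilities under a mixed PCM as in Eq. (1):
    a class-size-weighted mixture of class-specific PCMs."""
    m = len(thresholds_by_class[0])
    mixed = [0.0] * (m + 1)
    for pi_g, theta, thresholds in zip(class_sizes, thetas_by_class, thresholds_by_class):
        probs = pcm_category_probs(theta, thresholds)
        for x in range(m + 1):
            mixed[x] += pi_g * probs[x]
    return mixed
```

Note that widely spaced thresholds concentrate probability on the middle categories, while narrowly spaced thresholds shift probability toward the outer categories, which anticipates the NERS/ERS contrast discussed below.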

In Eq. (1), θvg is the individual person parameter indicating the trait level of person v in latent class g. In analogy to the item difficulty in the dichotomous Rasch model, the item location in the PCM, which can be computed as the mean of the thresholds (see below), can be interpreted as the mean endorsement difficulty of the item. Thus, items with higher item locations are more difficult to endorse (i.e., higher trait levels are necessary to endorse response categories stating agreement) and items with lower item locations are easier to endorse. To illustrate, the gray line in Fig. 1a shows the item locations for the five items on the PISA 2006 student questionnaire scale instrumental motivation in science. As in the dichotomous Rasch model (Rasch, 1960), item location and person parameters are represented on the same latent trait with scale values in units of logits (depicted on the y-axis), which usually range between −3 and 3 (Embretson & Reise, 2000). For instance, in Fig. 1a, item 1 has a lower item location parameter (i.e., higher endorsement probability; 0.38 logits) than item 2 (0.96 logits).

Threshold parameters govern the responses in each item’s categories. A threshold parameter indicates the trait level at which a respondent is equally likely to answer in two adjacent categories. For implementation in the model in Eq. (1), these threshold parameters are cumulated into item parameters $\sigma_{ixg}=\sum_{s=1}^{x}\tau_{isg}$ over all thresholds that the participant’s response $x$ exceeded. In Fig. 1a, the black lines show the three threshold parameters for each of the five items on instrumental motivation in science. For example, the solid black line in Fig. 1a is the threshold between categories 1 (strongly disagree) and 2 (disagree). For item 1 it is located at about −4.2 logits. Since thresholds and trait values are estimated on the same logit scale, this indicates that a person with a trait value of −4.2 logits is equally likely to choose either strongly disagree or disagree. Likewise, respondents with trait values between two thresholds are most likely to respond in the corresponding category: respondents with trait values between −4.2 and about 0.5 (threshold 2 in Fig. 1a) will most likely choose disagree. Given the probabilistic nature of the model, these are statements about the most likely response for each person, not deterministic predictions.
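The relationship between thresholds and the most likely response can be sketched directly: with ordered thresholds, the modal category equals the number of thresholds the trait value exceeds. The threshold values below are hypothetical, loosely based on the approximate figures read from Fig. 1a; only the first two correspond to values stated in the text.

```python
def most_likely_category(theta, thresholds):
    """Modal response category under a PCM with ordered thresholds:
    the number of thresholds (in logits) that the trait value exceeds."""
    return sum(1 for tau in sorted(thresholds) if theta > tau)

# Hypothetical thresholds resembling item 1 in Fig. 1a (logits);
# the third value is an assumption for illustration.
item1_thresholds = [-4.2, 0.5, 3.0]

# A trait value between the first two thresholds makes category 1
# ("disagree") the most likely response, as described in the text.
```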

With the norming condition $\sum_{i=1}^{k}\sum_{x=1}^{m}\tau_{ixg}=0$ within each class $g$, the mean of all item locations within each class is effectively set to 0. A further condition ($\sigma_{i0g}=0$ for all $i$ and $g$; Rost, 1991) allows the index for the response categories $x$ to be used for the notation of the thresholds $\tau_{isg}$ as well.

In a mixed PCM with more than one latent class ($g > 1$ in Eq. (1)), the PCM holds within each latent class, but item parameters may differ between the classes. Item parameter invariance between samples is a property of unidimensional traits and can, for example, be tested by examining the homogeneity of scales (Andersen, 1973). With item location and threshold parameters differing between classes, the latent variables measured in such classes strictly speaking have different meanings, i.e., different traits are measured in latent classes with differing item parameters. The differences between latent classes can be interpreted as content-related (e.g., different traits are being measured) as well as content-unrelated (e.g., differences in response scale usage).

For the examination of the consistency of response styles, it is desirable to ensure that participants differ solely in their response scale use on the scales under investigation, but not in the trait being assessed, their understanding of the items’ content, or other factors that might influence the choice of a response category. The central idea of the approach presented in this paper is to differentiate between differences in item locations, which are interpreted as capturing different traits, and differences between classes in threshold parameters, which reflect different response styles while responses can be assumed to lie on the same latent trait. Whether the latent classes are homogeneous regarding the trait being measured and differ only in response styles can be tested by model comparisons between a regular mixed PCM as described above and a constrained mixed PCM. Instead of estimating all parameters (locations, thresholds) freely for each class as in the unconstrained mixed PCM, in the constrained mixed PCM item locations are restricted to be equal between latent classes, yielding $\sigma_{ixg}=\sigma_{ix}$ in the model in Eq. (1).

Since all parameters are estimated freely in the unconstrained mixed PCM, the resulting latent classes can differ regarding response styles as well as other factors such as the trait being measured. With the equality constraint imposed on the item location parameters in the constrained mixed PCM, homogeneous latent classes can be assumed which can only differ in the distribution of the threshold parameters $\tau_{ixg}$ characterizing different response styles. This is illustrated in Fig. 1a and Fig. 1b, which show the characteristic difference in threshold parameters between NERS and ERS for the PISA 2006 attitude scale instrumental motivation in science. For the NERS group (Fig. 1a), thresholds are widely spaced, while for the ERS group (Fig. 1b), the three thresholds are close together. Due to the equality constraint implemented in the constrained mixed PCM, the location parameters (gray lines in Fig. 1a and Fig. 1b) are the same for both classes. Thus, the classes only differ in the distribution of their threshold parameters. For participants allocated to the NERS group, the trait level necessary to choose one of the outer categories (strongly disagree or strongly agree) is more extreme than for participants allocated to the ERS group. For example, on item 5 of instrumental motivation in science, a NERS person would need a trait value of about 6 for strongly agree to be the most likely category, while for an ERS person a trait value of about 2 would suffice. Thus, participants in the NERS group can be interpreted as respondents who prefer middle categories, while participants in the ERS group can be interpreted as participants who prefer extreme categories.

If the constrained mixed PCM holds for observed data, confirming trait homogeneity between the latent classes, trait values are directly comparable between latent classes. By constraining item location parameters to be equal it is ensured that trait values are on the same scale while potential differences in response style use are captured by the threshold parameters. Thus, trait values based on the constrained mixed PCM are corrected for response styles (see Rost et al., 1997). Since sum scores may be affected by different response styles, only trait values from a constrained mixed PCM should be used to compare the trait levels of respondents from different latent classes (i.e., response styles).

In sum, the approach taken in this paper to operationalize response styles is to compare a mixed PCM and a constrained mixed PCM regarding model fit for the scales under investigation. If the assumption of trait homogeneity between the latent classes holds, they only differ with respect to their response scale usage. Scales in which this is the case will be included in the analysis of the consistency of response styles across traits presented below.
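The model comparison step can be sketched as an information-criterion check; the BIC is one common criterion for comparing mixed Rasch models (lower is better). The log-likelihoods and parameter counts below are placeholders, not values from the analyses reported here.

```python
import math

def bic(log_likelihood, n_params, n_obs):
    """Bayesian Information Criterion: lower values indicate better fit
    after penalizing model complexity."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)

def prefer_constrained(ll_unconstrained, k_unconstrained,
                       ll_constrained, k_constrained, n_obs):
    """True if the constrained mixed PCM (equal item locations across
    classes) is preferred by BIC, supporting the interpretation that the
    latent classes differ only in response style."""
    return (bic(ll_constrained, k_constrained, n_obs)
            <= bic(ll_unconstrained, k_unconstrained, n_obs))
```

Because the constrained model estimates fewer parameters, it can be preferred by BIC even with a somewhat lower log-likelihood, as in the placeholder comparison below.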

Several studies have explored the stability of response styles longitudinally and across traits. Regarding the longitudinal stability of response styles, Bachman and O’Malley (1984) reported high reliability estimates for an agreement and an extreme responding index for a follow-up period of up to four years across five questionnaire forms. Participants in Weijters, Geuens, and Schillewaert’s (2010b) study filled out two different online questionnaires with a one-year interval between data collections. Weijters, Geuens et al. (2010b) analyzed the stability of four response styles (acquiescence response style, disacquiescence response style, extreme response style, and midpoint responding) using a second order measurement model that included time-specific response style factors for the two waves and second order time-invariant response style factors. They found that more than half of the variance in the time-specific response style factors was explained by their respective time-invariant response style factor, supporting a high stability of the four response styles over a one-year period.
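An extreme responding index of the kind used in such longitudinal studies can be as simple as the proportion of responses falling in the outermost categories. The following is a generic count-based sketch in that spirit, not Bachman and O’Malley’s (1984) exact index.

```python
def extreme_responding_index(responses, n_categories=5):
    """Proportion of responses in the two outermost categories of a
    Likert scale coded 1 (strongly disagree) to n_categories
    (strongly agree)."""
    extremes = sum(1 for r in responses if r in (1, n_categories))
    return extremes / len(responses)
```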

Concerning the consistency of response styles across traits within a questionnaire, Austin et al. (2006) found that membership in either the ERS or NERS latent class in (unconstrained) mixed Rasch models correlated significantly and positively between neuroticism, extraversion, agreeableness, and conscientiousness, indicating that participants applied the same response style over the course of the NEO-FFI. Similarly, using correlations between class memberships derived from mixed Rasch models as well, Hernández, Drasgow, and González-Romá (2004) reported that about 49% of their participants were consistently allocated to the class avoiding the middle category across the traits assessed in the 16PF Questionnaire (Cattell, Cattell, & Cattell, 1993), though participants demonstrating a preference for the middle category did not do so consistently. Furthermore, Weijters, Geuens, and Schillewaert (2010a) used structural equation modeling to show that acquiescence response style and extreme response style were mostly consistent across a random sample of items taken from marketing and attitude scales. They found that response styles were best modeled using a tau-equivalent factor model with a time-invariant autoregressive coefficient, indicating that the effect of the two response styles generalized across independent item sets.

In this paper, we take an alternative approach to testing the consistency of response styles across the traits assessed in a questionnaire, namely a second order latent class analysis (Keller & Kempf, 1997). That is, mixed Rasch models will be computed first to allocate participants to different response styles. Then, a latent class analysis will be computed using the response style assignments resulting from the mixed Rasch models. In the following, analyses conducted on several PISA 2006 attitude scales (study 1) and the NEO-PI-R (study 2) will be reported. The results from both studies will be discussed in the general discussion.
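The two-step logic — dichotomous response-style assignments per scale, then a latent class analysis over those assignments — can be sketched with a basic EM algorithm for a latent class model with binary indicators. This is a minimal illustration of the idea under a conditional-independence assumption, not the estimation procedure used in the paper.

```python
import random

def lca_em(data, n_classes=2, n_iter=200, seed=0):
    """EM for a latent class model with binary indicators.

    data: one row per person, one 0/1 entry per scale
    (e.g. 1 = allocated to ERS, 0 = allocated to NERS).
    Returns class sizes and class-specific probabilities of an
    ERS assignment on each scale.
    """
    rng = random.Random(seed)
    n, k = len(data), len(data[0])
    sizes = [1.0 / n_classes] * n_classes
    # Random start for P(indicator = 1 | class)
    probs = [[rng.uniform(0.25, 0.75) for _ in range(k)]
             for _ in range(n_classes)]
    for _ in range(n_iter):
        # E-step: posterior class membership for each person
        post = []
        for row in data:
            joint = []
            for g in range(n_classes):
                p = sizes[g]
                for j, x in enumerate(row):
                    p *= probs[g][j] if x == 1 else 1.0 - probs[g][j]
                joint.append(p)
            total = sum(joint)
            post.append([p / total for p in joint])
        # M-step: update class sizes and conditional probabilities
        for g in range(n_classes):
            weight = sum(post[i][g] for i in range(n))
            sizes[g] = weight / n
            for j in range(k):
                probs[g][j] = sum(post[i][g] * data[i][j]
                                  for i in range(n)) / weight
    return sizes, probs
```

On data in which most respondents carry the same assignment across all scales, such a model recovers one class with high ERS probabilities on every scale and one with low probabilities — the pattern that would indicate response-style consistency.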

Sample

In study 1, data from the German students taking part in the PISA 2006 assessment (“PISA sample”) was analyzed. Only the German PISA 2006 sample (as opposed to all the countries taking part in PISA 2006) was used to prevent cross-cultural differences in response styles (e.g., Johnson et al., 2005) from contaminating the analyses. For an investigation of

Sample

The sample in study 2 (“NEO sample”) consisted of the non-clinical standardization sample (N = 11,724; 64.0% women) for the German NEO-PI-R (Ostendorf & Angleitner, 2004). Participants were between 16 and 91 years old (M = 29.92, SD = 12.08). The sample was randomly divided into two halves, allowing the results obtained using the first half to be validated with the second half.

Instrument

Participants filled out the German NEO-PI-R (Ostendorf & Angleitner, 2004). The NEO-PI-R assesses the Big Five personality

General discussion

In this paper, the consistency of two response styles, NERS and ERS, was investigated across latent variables in several PISA 2006 attitude scales and the NEO-PI-R. For the majority of the participants in both instruments the response style occurred consistently independently of the trait that was being assessed. In the following, first the modeling of response styles suggested in this paper will be discussed. Then the implications of the occurrence and consistency of response styles in the two

Acknowledgments

The authors would like to express their gratitude to Dr. Fritz Ostendorf for providing the data for the standardization sample of the German NEO-PI-R. The authors further wish to thank Prof. Dr. Matthias Ziegler for his valuable comments on an earlier version of this paper.

References (38)

  • Bozdogan, H. (1987). Model selection and Akaike’s Information Criterion (AIC): The general theory and its analytical extensions. Psychometrika.
  • Buckley, J. (2009). Cross-national response styles in international educational assessments: Evidence from PISA 2006...
  • Cattell, R. B., et al. (1993). 16PF fifth edition questionnaire.
  • Costa, P. T., et al. (1992). Revised NEO personality inventory (NEO-PI-R) and NEO Five-Factor inventory (NEO-FFI).
  • Eid, M., et al. (2000). Detecting measurement invariance in organizational surveys. European Journal of Psychological Assessment.
  • Embretson, S. E., et al. (2000). Item response theory for psychologists.
  • Frey, A., Taskinen, P., Schütte, K., Prenzel, M., Artelt, C., Baumert, J., et al. (2009). PISA 2006 Skalenhandbuch:...
  • Hernández, A., et al. (2004). Investigating the functioning of a middle category by means of a mixed-measurement model. Journal of Applied Psychology.
  • Johnson, T., et al. (2005). The relation between culture and response styles: Evidence from 19 countries. Journal of Cross-Cultural Psychology.