Caring about carelessness: Participant inattention and its effects on research

https://doi.org/10.1016/j.jrp.2013.09.008

Highlights

  • We examined the effects of inattentive responding on data quality.

  • Multiple indicators formed latent classes of inattention replicating prior work.

  • Across multiple samples, 3–9% of respondents were highly inattentive.

  • Inattentive responding obscured regression and experimental results.

  • Screening out inattentive respondents improved statistical power.

Abstract

The current studies examined the adverse effects of inattentive responding on compliance with study tasks, data quality, correlational analyses, experimental manipulations, and statistical power. Results suggested that 3–9% of respondents engaged in highly inattentive responding, forming latent classes consistent with prior work; these classes converged across existing indices (e.g., long-string index, multivariate outliers, even–odd consistency, psychometric synonyms and antonyms) and new measures of inattention (the Attentive Responding Scale and the Directed Questions Scale). Inattentive respondents provided self-report data of markedly poorer quality, sufficient to obscure meaningful regression results as well as the effects of experimental manipulations. Screening out inattentive respondents improved statistical power, helping to mitigate the notable drops in power and estimated effect sizes caused by inattention.

Introduction

An oft-neglected issue underlying much research is that not all respondents pay sufficient attention when completing self-report measures. Such responding could introduce error into a dataset, potentially decreasing power and obscuring results. Inattention is sometimes blatant and easy to address (e.g., removing participants from analyses if they exhibit suspiciously fast reaction times or below-chance performance on a task). This logic, however, is only rarely applied to inattentive responding on self-report scales, as this form of inattention is more subtle and therefore more difficult to measure. The current studies were designed to identify excessive inattention using a multi-method approach, exploring its impact on: (1) compliance with common study tasks, (2) quality of self-report data, (3) correlational and experimental analyses, and (4) statistical power, in order to determine the potential scope of this problem and the degree to which addressing it might improve statistical analyses. Estimated rates of inattention in the existing literature have varied widely, from 3% to 46% of respondents (e.g., Berry et al., 1992, Johnson, 2005, Meade and Craig, 2012, Oppenheimer et al., 2009). In part, this wide range of estimates is due to a lack of clarity on how best to measure inattentive responding and on what thresholds correspond to unacceptably error-ridden data. The present research used indicators of non-compliance, data quality, and statistical power as criteria for comparing methods of measuring inattention and for establishing concrete, practical thresholds that researchers can use to screen their data.

It is common for a small portion of participants to exhibit poor attention and effort in research. For example, subjects with excessively short response latencies on implicit measures like the Implicit Association Test (IAT; Greenwald, Nosek, & Banaji, 2003) are routinely excluded from analyses. Although such practices are common in research using reaction time paradigms and experimental manipulations, this logic has not typically been extended to research utilizing self-report methods. This discrepancy is likely not because researchers believe that experimental manipulations or reaction time measures are more prone to non-compliance than self-report measures; rather, it is simply easier to identify non-compliance on such tasks. As researchers typically do not screen for inattention on self-report scales, the prevalence and impact of such problematic responding are largely unknown.

Although a number of constructs are occasionally lumped together under the heading of validity scales (e.g., socially desirable responding, faking good, faking bad, random responding), the present research focuses on a specific form of invalidity: inattention when completing self-report measures. This form of inattention is distinct from other types of invalidity. For instance, the response sets of faking good, faking bad and social desirability imply a motivation to present oneself in a particular manner. Ironically, these forms of invalidity may be negatively related to inattentive responding because presenting oneself in a particular manner requires carefully attending to questions (Meade & Craig, 2012). In contrast, inattentive responding corresponds to a lack of motivation to present oneself in a certain manner, and should therefore contribute little more than error variance to analyses. Extreme levels of inattention conceptualized in this manner are consistent with the extremely inattentive latent class identified by Meade and Craig (2012) as comprising approximately 9% of an undergraduate sample, and with what Nichols, Greene, and Schmolck (1989) called “content nonresponsivity.” Although inattention could be correlated with individual differences, in the current study we view inattentive responding as a proximal behavior enacted during the completion of research studies. We therefore conceptualize it as more of a transitory (state) phenomenon, allowing for the possibility that the same individual might provide high levels of attention in one study (e.g., a short and particularly interesting study) but insufficient levels of attention in other studies.

Much of the work examining inattentive responding on self-report measures has been conducted in the development of clinical assessment batteries like the Personality Assessment Inventory (PAI; Morey, 1991) and the Minnesota Multiphasic Personality Inventory (MMPI; Butcher, Dahlstrom, Graham, Tellegen, & Kaemmer, 1989). The length of these inventories (typically containing several hundred items) demands high levels of sustained attention from respondents, necessitating the development of scales designed to identify problematic responding. Two types of validity scales within these clinical batteries (infrequency and inconsistency scales) assess inattentive responding, with a focus on identifying extreme (and therefore problematic) levels. Infrequency scales (e.g., the infrequency scale of the PAI, the “bogus” items of Meade & Craig, 2012) are made up of items that elicit nearly identical (highly skewed) responses from most respondents (e.g., “I have been to every country in the world”). Respondents receive higher scores on infrequency scales for each increasingly unlikely response across the set of items, and cut-scores are used to identify inattentive responding excessive enough to yield an invalid protocol. Inconsistency scales (e.g., the VRIN scale of the MMPI, the inconsistency scale of the PAI) are made up of pairs of items with nearly identical content that are presented in opposing halves of a survey (e.g., “I am an active person” paired with “I have an active lifestyle”). Absolute differences in responses are summed across the item pairs so that higher scores reflect more inconsistent responding. Thus, although these scales make use of self-report items, they do not ask subjects to report on their own levels of attention but instead use their responding behavior on a set of heterogeneous items to indirectly assess their attentiveness to item content. Such scales are effective at distinguishing randomly generated data from actual data (e.g., Bruehl et al., 1998, Pinsoneault, 2005), but have only rarely been implemented outside of the clinical instruments for which they were designed (e.g., Saavedra, Chapman, & Rogge, 2010).
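
To make the scoring logic concrete, the following sketch (in Python) computes both types of scores from a small response matrix. The item names, response scale, and cutoff are purely illustrative assumptions, not the actual PAI or MMPI scoring rules.

    import pandas as pd

    # Hypothetical responses on a 1-5 Likert scale; column names are invented.
    df = pd.DataFrame({
        "infreq_1": [1, 5, 1],          # e.g., "I have been to every country in the world"
        "infreq_2": [2, 4, 1],          # another item almost everyone answers the same way
        "active_person": [4, 2, 5],     # "I am an active person"
        "active_lifestyle": [4, 5, 4],  # "I have an active lifestyle"
    })

    # Infrequency score: count of improbable endorsements (here, >= 3 on items
    # keyed so that nearly all attentive respondents answer 1 or 2).
    infrequency = (df[["infreq_1", "infreq_2"]] >= 3).sum(axis=1)

    # Inconsistency score: sum of absolute differences across near-identical
    # item pairs; higher scores reflect more inconsistent responding.
    pairs = [("active_person", "active_lifestyle")]
    inconsistency = sum((df[a] - df[b]).abs() for a, b in pairs)

    print(pd.DataFrame({"infrequency": infrequency, "inconsistency": inconsistency}))

In practice, published cut-scores (or sample-based thresholds) would then be applied to flag only the most extreme respondents.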

More recently, Oppenheimer and colleagues (2009) developed the instructional manipulation check (IMC), a single-item measure presented as a separate page in an online survey that uses critical instructions embedded at the end of a lengthy paragraph to assess participants’ attentiveness to instructions. Although the IMC moderated the effectiveness of text-based manipulations, its usefulness is limited by the fact that it measures only one form of inattention (skipping instructions), which is relatively common and therefore identifies a high proportion of participants (35–45%) as inattentive. As Oppenheimer and colleagues note, eliminating that many participants could potentially reduce power and bias results, leading them to suggest using the IMC as an intervention to encourage attentiveness rather than as a measure of inattention. Despite that suggestion, published studies using the IMC have simply excluded high proportions of inattentive respondents (e.g., Simmons & Nelson, 2006, Study 12). This highlights the need for a measure of attention with greater variability and specificity, potentially allowing for a smaller proportion of participants to be identified as excessively inattentive.
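
Scoring an IMC amounts to checking whether the respondent followed the embedded instruction rather than answering the decoy question. The sketch below assumes a hypothetical checklist format and is not Oppenheimer and colleagues’ exact item.

    def passes_imc(selected_options: set) -> bool:
        """Return True if the respondent obeyed the embedded instruction
        (assumed here to be: 'ignore the sports question below and instead
        click the title of the survey'), i.e., selected no checklist options."""
        return len(selected_options) == 0

    # A respondent who skipped the instructions would answer the decoy question.
    print(passes_imc(set()))        # True  -> attentive to the instructions
    print(passes_imc({"soccer"}))   # False -> flagged as having skipped them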

In addition to adding measures in order to identify highly inattentive respondents, researchers can also calculate post hoc indices of inattention using virtually any body of self-report items after data have been collected. Meade and Craig (2012) examined the convergence of several such indices, including highly correlated item-pairs (psychometric synonyms and antonyms assessing the consistency of responding), even–odd consistency (split-half reliabilities measured within respondents across scales), multivariate outlier distances (assessing statistically unlikely response patterns), long string analyses (measuring the tendency to choose identical answers in blocks of items), and time spent on the survey. Latent profile analyses of these indices along with an infrequency scale identified two main types of inattention: one reflecting more general inattentive responding and another marked by subjects frequently selecting the same answer for entire blocks of questions and consequently completing the survey in suspiciously short periods of time. Their analyses suggested an overall frequency of 10–12% of inattentive respondents in a sample of undergraduates.
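
Because these indices are computed from the item responses themselves, they can be added to any dataset after collection. The sketch below illustrates three of them (the long-string index, even–odd consistency, and Mahalanobis distance) on simulated Likert data; the scale groupings and any cutoffs are placeholder assumptions rather than the values used by Meade and Craig (2012) or in the present studies.

    from itertools import groupby

    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(0)
    n_items = 20
    data = pd.DataFrame(rng.integers(1, 6, size=(100, n_items)),
                        columns=[f"item_{i}" for i in range(n_items)])

    # Long-string index: length of the longest run of identical answers in a row.
    long_string = data.apply(lambda row: max(len(list(g)) for _, g in groupby(row)),
                             axis=1)

    # Even-odd consistency: correlate each person's even-item and odd-item half
    # scores across (hypothetical) five-item scales; low values suggest
    # inconsistent, potentially inattentive responding.
    scales = [data.columns[i:i + 5] for i in range(0, n_items, 5)]
    even = pd.concat([data[s[0::2]].mean(axis=1) for s in scales], axis=1)
    odd = pd.concat([data[s[1::2]].mean(axis=1) for s in scales], axis=1)
    even_odd = even.corrwith(odd, axis=1)

    # Mahalanobis distance: flags multivariate outliers, i.e., statistically
    # unlikely patterns of responses across the full item set.
    centered = (data - data.mean()).to_numpy()
    inv_cov = np.linalg.pinv(np.cov(data.T))
    mahalanobis = np.sqrt(np.einsum("ij,jk,ik->i", centered, inv_cov, centered))

    indices = pd.DataFrame({"long_string": long_string,
                            "even_odd": even_odd,
                            "mahalanobis": mahalanobis})
    print(indices.describe())

Survey completion time, when recorded, can be appended to the same data frame and examined alongside these indices.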

Careless or inattentive responding might act as a source of measurement error that could obscure meaningful results. Identifying and removing inattentive respondents before data analysis could therefore offer a relatively easy method of decreasing error variance and increasing statistical power in research using self-report measures. Consistent with this assertion, Oppenheimer et al. (2009) found that previously reported experimental results with two different manipulations did not replicate in a group of participants identified as inattentive, although they did replicate among attentive participants. Similarly, inattentive responding can adversely affect correlational and factor analyses (Johnson, 2005, Meade and Craig, 2012, Woods, 2006), moderating findings even to the point of generating spurious results.
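
A small simulation makes the power argument concrete. Assuming, for illustration, that inattentive respondents contribute pure noise (responses uncorrelated with anything), retaining them attenuates a true correlation, whereas screening them out recovers power even though the sample shrinks. The sample size, effect size, and inattention rate below are arbitrary assumptions.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    n, true_r, prop_inattentive, reps = 150, 0.20, 0.15, 2000
    n_bad = int(n * prop_inattentive)
    n_good = n - n_bad

    hits_full = hits_screened = 0
    for _ in range(reps):
        # Attentive respondents: two measures correlated at true_r.
        good = rng.multivariate_normal([0, 0], [[1, true_r], [true_r, 1]], size=n_good)
        # Inattentive respondents: responses unrelated to each other (pure noise).
        bad = rng.standard_normal((n_bad, 2))
        xy = np.vstack([good, bad])

        _, p_full = stats.pearsonr(xy[:, 0], xy[:, 1])       # everyone retained
        _, p_clean = stats.pearsonr(good[:, 0], good[:, 1])  # inattentive screened out
        hits_full += p_full < .05
        hits_screened += p_clean < .05

    print(f"power with inattentive respondents retained: {hits_full / reps:.2f}")
    print(f"power after screening them out:              {hits_screened / reps:.2f}")

The magnitude of the gain naturally depends on how prevalent the inattentive responding is and on how random (versus systematically patterned) those responses are.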

In contrast to these findings, Piedmont, McCrae, Riemann, and Angleitner (2000) challenged the utility of validity scales as a set, demonstrating that scores on a diverse array of 13 different validity measures (including measures of inattention as well as distinct constructs such as social desirability, faking good, and faking bad) failed to moderate substantive results when formed into a heterogeneous composite. The methodological decisions underlying their non-significant results highlight four fundamental elements of the current approach. First, the current work focused exclusively on inattentive responding, as it is not clear that all forms of invalid responding would have equivalent effects on data quality and statistical power. Consequently, forming composites of disparate validity indices could obscure results. Second, the current work adopted a taxonometric approach focused on identifying only the most extreme inattentive respondents (whose inattention would have the most pernicious effects on data quality). This stands in contrast to prior work that either modeled inattentive responding as a continuous variable or identified large proportions of the sample as problematic (e.g., Oppenheimer et al., 2009, Piedmont et al., 2000), potentially underestimating the effects of inattention on data quality (as only extremely high levels of inattention are likely to introduce sufficient randomness to obscure meaningful results). Third, the current work collected sufficiently large samples so that a reasonable number (e.g., 30–40) of excessively inattentive respondents could be identified and analyzed separately. Finally, the current work addressed a critical gap in the literature by specifically examining the practical utility of screening out extreme inattention (examining its effects on statistical power). Removing inattentive respondents reduces sample size, and prior research has not yet demonstrated that this data-cleaning approach increases statistical power.

The studies presented in this paper sought to extend prior work by exploring the correlates of inattention and examining the effects of inattention on compliance with study tasks, data quality, correlational analyses, experimental manipulations, and statistical power. Toward those ends, the studies also sought to develop effective methods of identifying extreme inattention, establishing practical thresholds of unacceptable inattention based on indices of non-compliance, data quality, and statistical power. Analyses across all three samples (1) examined personality and motivational correlates of inattention, (2) evaluated the convergent validity of various indicators of inattention and non-compliance, (3) replicated the latent classes of inattention from Meade and Craig (2012), (4) examined the impact of inattention on data quality (i.e., internal consistency of scales, substantive correlational and experimental results) and statistical power, and (5) examined the potential gains in power afforded by the use of various inattention indices when cleaning data.

Section snippets

Multi-method approach

One goal of Study 1 was to augment recent work investigating the prevalence of inattentive responding (Meade & Craig, 2012), taking a multi-modal approach that included assessing self-reported responding styles, previously published or new indices of inattention (multivariate distances, long-string indices, psychometric synonyms and antonyms, even–odd consistency, infrequency and inconsistency scales, directed questions), and three indicators of compliance with tasks common to psychological …

Study 2

Taken as a set, the results of Study 1 suggested that the inattentive respondents (identified by the multivariate latent profile analyses, the ARS, or the DQS) demonstrated low levels of effort and compliance with study tasks and seemed to provide poor quality self-report data, even to the point of obscuring regression results found in the attentive respondents. Those results not only replicated the multivariate work of Meade and Craig (2012), but also helped to quantify the impact of …

Effects of inattention on statistical power

The results of Study 2 replicated the findings of Study 1, underscoring the potential effects of inattention on correlational analyses. Study 3 sought to extend this work by examining the effects of inattention on experimental manipulations and the associated power to detect significant effects for those manipulations. Experiments often include relatively subtle manipulations involving alternate forms of instructions, primes, pictures or other stimuli. Inattentive participants may not be …

Study 4

In the previous studies, the various indices of inattention consistently identified approximately 3–9% of the sample as providing highly inattentive responses. Other estimates of inattention rates in the existing literature have varied widely (e.g., 3.5% in Johnson, 2005; 10–12% in Meade & Craig, 2012; 35–46% in Oppenheimer et al., 2009). In part, this wide range of estimates could be due to the diverse and often inconsistent ways that inattention has been conceptualized and measured across …

General discussion

Not all participants read each and every item on a scale carefully before providing a response. Although inattention is likely to add error variance to studies, it has only recently begun to receive more direct methodological attention within the literature (e.g., Meade & Craig, 2012). Furthermore, despite recent work (e.g., Oppenheimer et al., 2009), its full impact on data quality and statistical power has yet to be assessed. Across four studies, we examined the effects of inattention on basic …

Acknowledgments

We thank Soonhee Lee, Elizabeth Baker-Davidson, Maria Saavedra-Finger, Amy Rodrigues, Christine Walsh, Amanda Shaw, and Silvia Marin for helping collect preliminary data on the item pool. We also thank the participants who completed our studies, as well as Harry Reis for his comments on earlier versions of the manuscript.

References

  • Johnson, J. A. (2005). Ascertaining the validity of individual protocols from web-based personality inventories. Journal of Research in Personality.

  • Oppenheimer, D. M., et al. (2009). Instructional manipulation checks: Detecting satisficing to increase statistical power. Journal of Experimental Social Psychology.

  • Robins, R. W., et al. (2001). Personality correlates of self-esteem. Journal of Research in Personality.

  • Bruehl, S., et al. (1998). The variable responding scale for detection of random responding in the Multidimensional Pain Inventory. Psychological Assessment.

  • Berry, D. T. R., et al. (1992). MMPI-2 random responding indices: Validation using a self-report methodology. Psychological Assessment.

  • Butcher, J. N., et al. (1989). Minnesota Multiphasic Personality Inventory-2 (MMPI-2): Manual for administration and scoring.

  • Digman, J. M. (1990). Personality structure: Emergence of the five-factor model. Annual Review of Psychology.

  • Edwards, P., et al. (2002). Increasing response rates to postal questionnaires: Systematic review. British Medical Journal.

  • Funk, J. L., et al. (2007). Testing the ruler with item response theory: Increasing precision of measurement for relationship satisfaction with the Couples Satisfaction Index. Journal of Family Psychology.

  • Gardner, W. L., et al. (1999). “I” value freedom, but “we” value relationships: Self-construal priming mirrors cultural differences in judgment. Psychological Science.

  • Greenwald, A. G., et al. (2003). Understanding and using the Implicit Association Test: I. An improved scoring algorithm. Journal of Personality and Social Psychology.

  • Gross, J. J., et al. (1995). Emotion elicitation using films. Cognition & Emotion.

  • Huang, J. L., et al. (2012). Detecting and deterring insufficient effort responding to surveys. Journal of Business and Psychology.

  • John, O. P., et al. (1991). The Big Five Inventory—Versions 4a and 54.