Ascertaining the validity of individual protocols from Web-based personality inventories☆
Introduction
World Wide Web-based personality measures have become increasingly popular in recent years due to the ease of administering, scoring, and providing feedback over the Internet. Web-based measures allow researchers to collect data inexpensively from large numbers of individuals around the world in a manner that is convenient to both researchers and participants. With this emerging technology, two important questions about Web-based measures have been raised. The first is the degree to which established paper-and-pencil personality measures retain their reliability and validity after being ported to the Web (Kraut et al., 2004). Although this question should be answered empirically for each personality measure concerned, studies to date suggest that personality measures retain their psychometric properties on the Web (Buchanan et al., in press; Gosling et al., 2004).
This article addresses a second kind of validity concern for Web-based measures, protocol validity (Kurtz & Parrish, 2001). The term protocol validity refers to whether an individual protocol is interpretable via the standard algorithms for scoring and assigning meaning. For decades psychologists have realized that even a well-validated personality measure can generate uninterpretable data in individual cases. The introduction of this article first reviews what we know about the impact of three major influences on the protocol validity of paper-and-pencil measures: linguistic incompetence, careless inattentiveness, and deliberate misrepresentation. Next, the introduction discusses why these threats to protocol validity might be more likely to affect Web-based measures than paper-and-pencil measures. The empirical portion of this article provides estimates of the incidence of protocol invalidity for one particular Web-based personality inventory, and compares these estimates to similar data for paper-and-pencil inventories. Finally, the discussion reflects on the significance of protocol invalidity for Web-based measures and suggests strategies for preventing, detecting, and handling invalid protocols.
Section snippets
Three major threats to protocol validity
Researchers have identified three major threats to the validity of individual protocols. These threats can affect protocol validity, regardless of the mode of presentation (paper-and-pencil or Web). The first is linguistic incompetence. A research participant who has a limited vocabulary, poor verbal comprehension, an idiosyncratic way of interpreting item meaning, and/or an inability to appreciate the impact of language on an audience will be unable to produce a valid protocol, even for a
Incidence and detection of invalid protocols for paper-and-pencil inventories
Many of the major personality inventories, e.g., the California Psychological Inventory (CPI; Gough & Bradley, 1996), Hogan Personality Inventory (HPI; Hogan & Hogan, 1992), Multidimensional Personality Questionnaire (MPQ; Tellegen, in press), and Minnesota Multiphasic Personality Inventory (MMPI; Butcher, Dahlstrom, Graham, Tellegen, & Kaemmer, 1989), have built-in protocol validity scales to detect cases in which individuals are not attending to, or are failing to understand, item meanings or are
Linguistic incompetence as a special problem for Web-based measures
Because unregulated Web-based personality measures are readily accessible to non-native speakers from all backgrounds around the world, linguistic competency may be a greater concern for Web-based measures than for paper-and-pencil measures administered to the native-speaking college students often used in research. Non-native speakers may have difficulty with both the literal meanings of items and the more subtle sociolinguistic trait implications of items (Johnson, 1997a). At the time
Summary of the present research plan
The most direct way of assessing protocol validity would be to compare the results of testing (trait level scores, narrative descriptions) with another source of information about personality in which we have confidence (e.g., averaged ratings or the consensus of descriptions from knowledgeable acquaintances—see Hofstee, 1994). Gathering such non-self-report criteria validly over the Internet while protecting anonymity is logistically complex, and ongoing research toward that end is still in
Participants
Before screening for repeat participation, the sample consisted of 23,994 protocols (8,764 male, 15,229 female, 1 unknown) from individuals who completed, anonymously, a Web-based version of the IPIP-NEO (Goldberg, 1999; described below). Reported ages ranged from 10 to 99, with a mean age of 26.2 and SD of 10.8 years. Participants were not actively recruited; they discovered the Web site on their own or by word-of-mouth. Protocols used in the present analyses were collected between August 6,
Duplicate protocols
The SPSS LAG function revealed 747 protocols (sorted first by time and then by nickname) in which all 300 responses were identical to the previous protocol. Also identified were an additional 34 cases in which the first 120 responses were identical. A few additional protocols contained nearly all identical responses (e.g., four protocols contained 299 identical responses, one contained 298 identical responses). Protocols with 298, 299, or 300 identical responses to 300 items (or 118, 119, or 120
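The lag-based comparison described in this snippet can be sketched in Python. This is a minimal illustration under stated assumptions, not the SPSS syntax used in the study: the function name `flag_consecutive_duplicates`, the `(timestamp, nickname, responses)` record layout, and the `min_matches` parameter are all hypothetical, and the sample data are invented for the example.

```python
from typing import List, Sequence, Tuple

# Hypothetical record layout (an assumption): (timestamp, nickname, responses)
Protocol = Tuple[float, str, Sequence[int]]


def flag_consecutive_duplicates(
    protocols: List[Protocol], min_matches: int
) -> List[bool]:
    """Flag protocols whose responses match the immediately preceding one.

    Protocols are sorted by time and then by nickname, following the sort
    order mentioned in the text. A protocol is flagged when at least
    `min_matches` item positions hold the same response as the protocol
    just before it (a lag-1 comparison). The returned flags are aligned
    with the sorted order, not the input order.
    """
    ordered = sorted(protocols, key=lambda p: (p[0], p[1]))
    flags = [False] * len(ordered)
    for i in range(1, len(ordered)):
        prev, curr = ordered[i - 1][2], ordered[i][2]
        # Count positions where both protocols give the same response.
        matches = sum(1 for a, b in zip(prev, curr) if a == b)
        flags[i] = matches >= min_matches
    return flags


# Invented toy data with 5 items instead of 300.
sample = [
    (1.0, "ann", [1, 2, 3, 4, 5]),
    (2.0, "ann", [1, 2, 3, 4, 5]),  # exact resubmission of the first
    (3.0, "bob", [5, 4, 3, 2, 1]),
]
flags = flag_consecutive_duplicates(sample, min_matches=5)
```

Lowering `min_matches` slightly below the number of items would also catch near-duplicates, in the spirit of the 298- and 299-response cases mentioned above.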
Discussion
The present study investigated the degree to which the unique characteristics of a Web-based personality inventory produced uninterpretable protocols. It was hypothesized that the ease of accessing a personality inventory on the Web and the reduced accountability from anonymity might lead to a higher incidence (compared to paper-and-pencil inventories) of four types of problematic protocols. These problems are as follows: (a) the submission of duplicate protocols (some of which might be
Conclusions
Of more substance and practical importance than the specter of radical misrepresentation on Web-based personality measures are issues such as detecting multiple participation and protocols that are completed too carelessly or inattentively to be subjected to normal interpretation. The incidences of (a) repeat participation, (b) selecting the same response category repeatedly without reading the item, and (c) skipping items all exceed the levels found in paper-and-pencil measures. Nonetheless,
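Two of the carelessness indicators named in this snippet, repeated selection of one response category and skipped items, lend themselves to simple per-protocol counts. The sketch below is illustrative only: the function names and the use of `None` for a skipped item are assumptions, and no cutoff values are given because the snippet does not state any.

```python
from itertools import groupby
from typing import Optional, Sequence


def longest_identical_run(responses: Sequence[Optional[int]]) -> int:
    """Length of the longest run of the same answered category in a row.

    Skipped items (None, an assumed encoding) break a run and are never
    counted as part of one.
    """
    longest = 0
    for value, group in groupby(responses):
        if value is not None:
            longest = max(longest, sum(1 for _ in group))
    return longest


def count_skipped(responses: Sequence[Optional[int]]) -> int:
    """Number of items left unanswered (encoded here as None)."""
    return sum(1 for r in responses if r is None)


# Invented toy protocol: three 3s, one skipped item, one 3, two 2s.
toy = [3, 3, 3, None, 3, 2, 2]
run_length = longest_identical_run(toy)
skipped = count_skipped(toy)
```

A researcher would compare these counts against cutoffs chosen for the inventory at hand; the cutoffs themselves are an empirical matter that this sketch does not address.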
Acknowledgments
Some of these findings were first presented in an invited talk to the Annual Joint Bielefeld-Groningen Personality Research Group meeting, University of Groningen, The Netherlands, May 9, 2001. I thank Alois Angleitner, Wim Hofstee, Karen van Oudenhoven-van der Zee, Frank Spinath, and Heike Wolf for their feedback and suggestions at that meeting. Some of the research described in this article was conducted while I was on sabbatical at the Oregon Research Institute, supported by a Research
References (55)
Units of analysis for description and explanation in psychology
Measurement and control of response bias
In defense of traits
- Buchanan, T., Johnson, J. A., & Goldberg, L. R. (in press). Implementing a five-factor personality inventory for use on...
- et al. (1989). Minnesota Multiphasic Personality Inventory-2 (MMPI-2): Manual for administration and scoring.
- et al. (1996). Personality: Individual differences and clinical assessment. Annual Review of Psychology.
- (1966). The scree test for the number of factors. Multivariate Behavioral Research.
- et al. (1992). Revised NEO Personality Inventory (NEO PI-R™) and NEO Five-Factor Inventory (NEO-FFI) professional manual.
- et al. (1997). Stability and change in personality assessment: The revised NEO personality inventory in the year 2000. Journal of Personality Assessment.
A study of faking behavior on a forced choice self-description checklist
Personnel Psychology
How to conduct behavioral research over the Internet
The development of markers for the Big-Five factor structure
Psychological Assessment
A broad-bandwidth, public domain, personality inventory measuring the lower-level facets of several five-factor models
The prediction of semantic consistency in self-descriptions: Characteristics of persons and of terms that affect the consistency of responses to synonym and antonym pairs
Journal of Personality and Social Psychology
Should we trust web based studies? A comparative analysis of six preconceptions about Internet questionnaires
American Psychologist
CPI manual: Third edition
Who should own the definition of personality?
European Journal of Personality
Integration of the big five and circumplex approaches to trait structure
Journal of Personality and Social Psychology
Personality psychology: Back to basics
Hogan Personality Inventory manual
Jackson Vocational Interest Survey manual
The big five trait taxonomy: History, measurement, and theoretical perspectives
☆ Prepared for the special issue of the Journal of Research in Personality 39 (1), February 2005, containing the proceedings of the 2004 meeting of the Association for Research in Personality.