Abstract
The sensitivity and specificity of four outlier scores were studied for four different discordancy tests. The outlier scores were the Mahalanobis distance, a robust version of the Mahalanobis distance, and two measures tailored to discrete data, known as O+ and G+. The discordancy tests were Tukey’s fences (a.k.a. the boxplot), Tukey’s fences with adjustment for skewness (the adjusted boxplot), the generalized extreme studentized deviate (ESD), and the transformed ESD (ESD-T). Outlier scores O+ and G+ performed better than the Mahalanobis distance and its robust version. Discordancy tests ESD-T and the adjusted boxplot were advocated for high specificity, and ESD for high sensitivity.
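To illustrate how an outlier score and a discordancy test combine, the following is a minimal sketch that scores each observation with the squared Mahalanobis distance and flags scores above Tukey’s upper fence. It is not the procedure used in the study: the function names, the simulated data, the plain sample mean and covariance, and the conventional fence multiplier k = 1.5 are all assumptions made for illustration.

```python
import numpy as np


def mahalanobis_scores(X):
    """Squared Mahalanobis distance of each row from the sample mean."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    # Pseudo-inverse guards against a singular sample covariance matrix.
    cov_inv = np.linalg.pinv(cov)
    return np.einsum("ij,jk,ik->i", centered, cov_inv, centered)


def tukey_upper_fence(scores, k=1.5):
    """Upper fence Q3 + k * IQR; k = 1.5 is the conventional (assumed) choice."""
    q1, q3 = np.percentile(scores, [25, 75])
    return q3 + k * (q3 - q1)


def flag_outliers(X, k=1.5):
    """Boolean mask: True where the score exceeds the upper fence."""
    scores = mahalanobis_scores(X)
    return scores > tukey_upper_fence(scores, k)


# Hypothetical data: 200 observations on 5 variables, with one gross outlier.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
X[0] = 10.0

scores = mahalanobis_scores(X)
flags = flag_outliers(X)
print("upper fence:", tukey_upper_fence(scores))
print("number flagged:", int(flags.sum()))
```

Only the upper fence is used because a distance-based score is suspicious only when it is large. Swapping in a different score (for example, a robust distance) or a different test (for example, an ESD procedure) would change one function and leave the rest of the pipeline untouched.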