Visualizations of binary data: A comparative evaluation

https://doi.org/10.1016/S1071-5819(03)00082-X

Abstract

Data visualization has the potential to assist humans in analysing and comprehending large volumes of data, and in detecting patterns, clusters and outliers that are not obvious in non-graphical forms of presentation. For this reason, data visualizations have an important role to play in a diverse range of applied problems, including data exploration and mining, information retrieval, and intelligence analysis. Unfortunately, while many approaches to data visualization are available, there have been few rigorous evaluations of their effectiveness. This paper presents the results of three controlled experiments comparing how well four different visualization approaches help people answer meaningful questions about binary data sets. Two of these visualizations, Chernoff faces and star glyphs, represent objects using simple icon-like displays. The other two visualizations use a spatial arrangement of the objects, based on a model of human mental representation, in which more similar objects are placed nearer each other. One of these spatial displays uses a common features model of similarity, while the other uses a distinctive features model. The first experiment finds that both glyph visualizations lead to slow, inaccurate answers given with low confidence, while the spatial visualizations produce faster and more confident answers that are accurate only when the common features similarity model is used. The second experiment, which considers only the spatial visualizations, supports this finding, with the common features approach again producing more accurate answers. The third experiment measures human performance using the raw data in tabular form, and so allows the benefit of the visualizations to be assessed. This experiment confirms that people are faster, more confident and more accurate when an appropriate visualization of the data is made available.

Introduction

Data visualization techniques aim to present data to people in ways that accurately communicate information, and require minimal effort for comprehension. Good data visualizations can facilitate the efficient examination of large volumes of data, and provide the insight that allows inferences to be made from the observed relationships within the data. Because of this potential, visualizations are commonly applied to problems of data mining and exploration, information retrieval, and the analysis of tactical and strategic intelligence.

It has often been argued that a principled psychological approach to data visualization is warranted (e.g. Chernoff, 1973; Purchase, 1998; Shneiderman, 1998; Ware, 2000). Usually the emphasis is on using perceptual principles to design data displays. It is certainly true that, in order to achieve accuracy and efficiency in comprehension, and avoid distortion of the information, visualizations must be designed to be compatible with human perceptual systems. What is less often acknowledged is the role of more abstract cognitive representational principles, not directly related to perceptual processes, in developing data visualizations (although see Kosslyn, 1994; Lokuge et al., 1996). To allow for effective analysis and manipulation of data, the structure of the information conveyed also needs to be compatible with the representational requirements and preferences of human cognitive processes.

A psychological framework for data visualization that incorporates both perceptual and cognitive components is shown in Fig. 1. As originally argued in Lee and Vickers (1998), the motivation for this framework comes from viewing data visualizations as a ‘channel’ that links information held in an artificial system with human cognitive processes. To the extent that there is representational compatibility between the artificial system and human cognition, and perceptual compatibility between the visualization and human perception, an effective means of conveying information between the two systems may be established. In particular, information represented in the artificial system may be displayed using the data visualization, perceived by the human, and represented mentally. When the human then seeks useful patterns and structures in the visualization, the information is, in effect, subjected to the kinds of inferential cognitive processes that are difficult to implement in artificial systems. Within the framework shown in Fig. 1 there is also the possibility for the human to interact with the information by taking actions that manipulate the data visualization.

On the basis of this psychological framework, Lee and Vickers (1998) suggested that data visualization techniques which perform little or no manipulation of the data before representing it graphically, with the intention of ‘letting the data speak for themselves’, may be prone to error in comprehension and manipulation. On the other hand, they argued, visualizations that restructure the information according to cognitive demands before representing it visually may communicate the information in the raw data more effectively. The primary aim of this paper is to provide a first empirical test of this idea.

Evaluating data visualizations

As Meyer (2000, p. 1840) points out, there are no generally accepted guidelines for the optimal display of data. Part of the problem lies in the lack of empirical evidence for or against different approaches to visualization. Morse et al. (2000) note that, despite the important role that visualizations play in information interfaces, their evaluation is rarely undertaken. Even where evaluations have been attempted, they have often adopted one of two approaches

Four visualizations of binary data

The data used in this study are binary, with objects being defined in terms of the presence or absence of a set of properties or features. While this is clearly a restriction, binary data are an important special case for a number of reasons. There are important properties or features that only exist in binary form, such as gender. There are also many occasions when a variable of interest is a binary quantization of an underlying continuous variable. For example, the distinction between
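As a minimal illustration of binary quantization, the sketch below thresholds a continuous variable into a present/absent feature. The function name `binarize`, the example measurements and the threshold are all hypothetical, chosen only to show the mechanics; they are not taken from the study.

```python
# Sketch: binary quantization of a continuous variable.
# The variable, its values and the threshold are hypothetical
# illustrations, not data from the study.


def binarize(values, threshold):
    """Map each continuous value to 1 (feature present) if it is at or
    above the threshold, and to 0 (feature absent) otherwise."""
    return [1 if v >= threshold else 0 for v in values]


if __name__ == "__main__":
    heights_cm = [152.0, 171.5, 180.2, 165.0]  # hypothetical measurements
    is_tall = binarize(heights_cm, threshold=170.0)
    print(is_tall)  # [0, 1, 1, 0]
```

Any information about how far a value lies from the threshold is discarded, which is exactly the restriction that binary data impose.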

Data sets

Four different binary data sets were constructed to test the visualization types. These related to co-starring movie actors, movie genres, countries and their produce, and animals. In essence, each data set consisted of a set of stimuli and a set of features, with each stimulus defined in terms of the presence or absence of each feature. Table 1 shows the animals data set as a concrete example. Rows represent animals, and columns represent animal features. Each cell contains a
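The stimulus-by-feature structure described above can be sketched as a 0/1 matrix. The sketch assumes each stimulus is supplied as the set of features it possesses; the helper `to_binary_matrix` and the toy animals and features are hypothetical placeholders and do not reproduce Table 1.

```python
# Sketch: building a binary stimulus-by-feature matrix.
# Stimuli, features and memberships are hypothetical placeholders,
# NOT the contents of the paper's Table 1.
from typing import Dict, List, Set, Tuple


def to_binary_matrix(
    stimuli: Dict[str, Set[str]], features: List[str]
) -> Tuple[List[str], List[List[int]]]:
    """Return the stimulus names (row order) and a 0/1 matrix whose
    cell [i][j] is 1 when stimulus i possesses feature j."""
    names = list(stimuli)
    matrix = [[1 if f in stimuli[name] else 0 for f in features] for name in names]
    return names, matrix


if __name__ == "__main__":
    features = ["has_fur", "can_fly", "lays_eggs", "lives_in_water"]
    stimuli = {
        "dog": {"has_fur"},
        "duck": {"can_fly", "lays_eggs", "lives_in_water"},
        "bat": {"has_fur", "can_fly"},
    }
    names, matrix = to_binary_matrix(stimuli, features)
    # One row per stimulus, one column per feature.
    for name, row in zip(names, matrix):
        print(f"{name:5s}", row)
```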

Experiment II

The inferiority of the glyph visualizations found in Experiment I was not entirely unexpected. The finding is consistent with Lee and Vickers' (1998) speculation that raw data should undergo a representational analysis before they are presented. However, the accuracy difference between the common and distinctive spatial visualizations on the global questions was not anticipated. Both visualizations are based on an MDS representation of the similarities in the data, and both the common and
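To make the contrast between the two spatial visualizations concrete, the sketch below computes a common features similarity (the number of features two objects share) and a distinctive features similarity (minus the number of features possessed by only one of the two), converts each to dissimilarities, and embeds them in two dimensions. It is an illustration under stated assumptions, not the authors' procedure: all function names are hypothetical, the similarity-to-dissimilarity conversion is one simple choice, and classical (Torgerson) MDS is swapped in for whichever MDS algorithm the study actually used.

```python
import numpy as np


def common_similarity(X: np.ndarray) -> np.ndarray:
    """Common features model: count of features both objects possess."""
    return X @ X.T


def distinctive_similarity(X: np.ndarray) -> np.ndarray:
    """Distinctive features model: minus the count of features possessed
    by exactly one of the two objects (negative Hamming distance)."""
    return -(X @ (1 - X).T + (1 - X) @ X.T)


def similarity_to_dissimilarity(S: np.ndarray, offset: float) -> np.ndarray:
    """Assumed conversion: d_ij = offset - s_ij, with a zero diagonal."""
    D = offset - S
    np.fill_diagonal(D, 0.0)
    return D


def classical_mds(D: np.ndarray, dims: int = 2) -> np.ndarray:
    """Classical (Torgerson) MDS: eigendecompose the double-centred
    squared dissimilarities and keep the largest eigenvalues."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centring matrix
    B = -0.5 * J @ (D ** 2) @ J                  # double-centred matrix
    eigvals, eigvecs = np.linalg.eigh(B)         # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:dims]     # largest first
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale             # n x dims coordinates


if __name__ == "__main__":
    # Hypothetical 4-object x 5-feature binary matrix (not the study's data).
    X = np.array([
        [1, 1, 0, 0, 0],
        [1, 1, 1, 0, 0],
        [0, 0, 1, 1, 1],
        [0, 0, 0, 1, 1],
    ], dtype=float)
    n_features = X.shape[1]
    D_common = similarity_to_dissimilarity(common_similarity(X), offset=n_features)
    D_distinct = similarity_to_dissimilarity(distinctive_similarity(X), offset=0.0)
    print(np.round(classical_mds(D_common), 2))
    print(np.round(classical_mds(D_distinct), 2))
```

The key design difference is how shared absences are treated: the common features model ignores features that both objects lack, so two objects sharing no present features are maximally dissimilar, whereas the distinctive features model counts only mismatches, so objects that both lack many features can still end up close together.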

Experiment III

From the practical standpoint of recommending a data visualization in applied settings, Experiments I and II both point towards the common spatial visualization as being the best of the four considered. What neither experiment evaluates, however, is whether any visualization is worth using at all. It is possible that even the common spatial visualization does not usefully improve human analysis, in the sense that the raw data themselves may allow equally or more accurate, confident and

General discussion

The results of the three experiments are consistent with Lee and Vickers' (1998) proposition that visualizations presenting the unprocessed raw data do not convey information as effectively as those that restructure data according to cognitive demands. The difference between the two spatial visualizations demonstrates, however, that choosing the appropriate representation is important. In fact, the distinctive spatial visualization may be considered the worst of all the visualizations evaluated,

Acknowledgments

This work was supported by the Australian Defence Science and Technology Organisation. We wish to thank Dimitris Margaritis, Thomas Minka and Chris Woodruff for their assistance and several anonymous reviewers for helpful comments.

References (73)

  • J.D. Carroll (1976). Spatial, non-spatial and hybrid models for scaling. Psychometrika.
  • C. Chatfield et al. (1980). Introduction to Multivariate Analysis.
  • H. Chernoff (1973). The use of faces to represent points in k-dimensional space graphically. Journal of the American Statistical Association.
  • H. Chernoff et al. (1975). Effect on classification error of random permutations of features in representing multivariate data by faces. Journal of the American Statistical Association.
  • J. Cohen (1994). The earth is round (p<0.05). American Psychologist.
  • J.D. Cohen (1997). Drawing graphs to convey proximity: an incremental arrangement method. ACM Transactions on Computer–Human Interaction.
  • J.E. Corter (1996). Tree Models of Similarity and Association.
  • T.F. Cox et al. (1994). Multidimensional Scaling.
  • J. Dowell et al. (1998). Conception of the cognitive engineering design problem. Ergonomics.
  • W. Edwards et al. (1963). Bayesian statistical inference for psychological research. Psychological Review.
  • B. Everitt (1978). Graphical Techniques for Multivariate Data.
  • B.S. Everitt et al. (1991). Applied Multivariate Data Analysis.
  • A. Gelman et al. (1995). Bayesian Data Analysis.
  • D.J. Getty et al. (1979). On the prediction of confusion matrices from similarity judgements. Perception & Psychophysics.
  • I.D. Haskell et al. (1993). Two- and three-dimensional displays for aviation: a theoretical and empirical comparison. International Journal of Aviation Psychology.
  • C. Howson et al. (1993). Scientific Reasoning: The Bayesian Approach.
  • J.E. Hunter (1997). Needed: a ban on the significance test. Psychological Science.
  • R.J.K. Jacob et al. (1976). The face as data display. Human Factors.
  • P.M. Jones et al. (1990). The display of multivariate information: an experimental study of an information integration task. Human Performance.
  • R.E. Kass et al. (1995). Bayes factors. Journal of the American Statistical Association.
  • S.M. Kosslyn (1994). Elements of Graph Design.
  • J.K. Kruschke (1992). ALCOVE: an exemplar-based connectionist model of category learning. Psychological Review.
  • J.B. Kruskal (1964). Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis. Psychometrika.
  • M.D. Lee (2002). A simple method for generating additive clustering models with limited complexity. Machine Learning.
  • M.D. Lee et al. (2002). Extending the ALCOVE model of category learning to featural stimulus domains. Psychonomic Bulletin & Review.
  • M.D. Lee, D. Vickers (1998). Psychological approaches to data visualisation. DSTO Research Report...