Visualizations of binary data: A comparative evaluation
Introduction
Data visualization techniques aim to present data to people in ways that accurately communicate information, and require minimal effort for comprehension. Good data visualizations can facilitate the efficient examination of large volumes of data, and provide the insight that allows inferences to be made from the observed relationships within the data. Because of this potential, visualizations are commonly applied to problems of data mining and exploration, information retrieval, and the analysis of tactical and strategic intelligence.
It has often been argued that a principled psychological approach to data visualization is warranted (e.g. Chernoff, 1973; Purchase, 1998; Shneiderman, 1998; Ware, 2000). Usually the emphasis is on using perceptual principles to design data displays. It is certainly true that, in order to achieve accuracy and efficiency in comprehension, and avoid distortion of the information, visualizations must be designed to be compatible with human perceptual systems. What is less often acknowledged is the role of more abstract cognitive representational principles, not directly related to perceptual processes, in developing data visualizations (although see Kosslyn, 1994; Lokuge et al., 1996). To allow for effective analysis and manipulation of data, the structure of the information conveyed also needs to be compatible with the representational requirements and preferences of human cognitive processes.
A psychological framework for data visualization that incorporates both perceptual and cognitive components is shown in Fig. 1. As originally argued in Lee and Vickers (1998), the motivation for this framework comes from viewing data visualizations as a ‘channel’ that links information held in an artificial system with human cognitive processes. To the extent that there is representational compatibility between the artificial system and human cognition, and perceptual compatibility between the visualization and human perception, an effective means of conveying information between the two systems may be established. In particular, information represented in the artificial system may be displayed using the data visualization, perceived by the human, and represented mentally. The process of the human then seeking useful patterns and structures in the visualization involves, in effect, subjecting the information to the type of inferential cognitive processes that are difficult to implement in artificial systems. Within the framework shown in Fig. 1 there is also the possibility for the human to interact with the information by taking actions that manipulate the data visualization.
On the basis of this psychological framework, Lee and Vickers (1998) suggested that data visualization techniques which perform little or no manipulation of the data before attempting to represent it graphically, with the intention of ‘letting the data speak for themselves’, may be prone to error in comprehension and manipulation. On the other hand, they argued, those visualizations that restructure the information according to cognitive demands before representing it visually may communicate the information in the raw data more effectively. The primary aim of this paper is to provide a first empirical test of this idea.
Section snippets
Evaluating data visualizations
As Meyer (2000, p. 1840) points out, there are no generally accepted guidelines for the optimal display of data. Part of the problem lies in the lack of empirical evidence for or against the use of different approaches to visualization. Despite the important role that visualizations play in information interfaces, Morse et al. (2000) note that the evaluation of data visualizations is rarely undertaken. Even where evaluations have been attempted, they often have adopted one of two approaches
Four visualizations of binary data
The data used in this study are binary, with objects being defined in terms of the presence or absence of a set of properties or features. While this is clearly a restriction, binary data are an important special case for a number of reasons. There are important properties or features that only exist in binary form, such as gender. There are also many occasions when a variable of interest is a binary quantization of an underlying continuous variable. For example, the distinction between
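The idea of a binary feature arising from quantizing a continuous variable can be sketched in a few lines. The values, the threshold, and the function name `binarize` below are hypothetical, chosen only for illustration; they are not taken from the study.

```python
def binarize(values, threshold):
    """Map each continuous value to 1 (feature present) or 0 (feature absent).

    A value counts as 'present' when it is at or above the threshold.
    The threshold is an assumption of this sketch, not a value from the paper.
    """
    return [1 if v >= threshold else 0 for v in values]


# Hypothetical continuous measurements (e.g. heights in cm) and cut-off.
heights_cm = [150.0, 172.5, 181.0, 165.0]
is_tall = binarize(heights_cm, threshold=170.0)
```

Where the cut-off is placed determines which objects share the feature, so the quantization is itself a representational decision.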
Data sets
Four different binary data sets were constructed to test the visualization types. These related to co-starring movie actors, movie genres, countries and their produce, and animals. In essence, each data set consisted of a set of stimuli and a set of features, with each stimulus being defined in terms of the presence or absence of each of the features. Table 1 shows the animals data set as a concrete example. Rows represent animals, and columns represent animal features. Each cell contains a
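A stimulus-by-feature structure of this kind can be sketched as a small binary matrix. The stimuli, features, and values below are hypothetical and are not the contents of Table 1. The two counting functions are one simple way to summarize how two stimuli relate (features both have, and features only one has); they are an assumption of this sketch, not necessarily the similarity measure used in the study.

```python
# Hypothetical toy data for illustration only; this is NOT the paper's Table 1.
stimuli = ["dog", "cat", "eagle"]
features = ["has_fur", "has_wings", "is_pet"]
matrix = [
    [1, 0, 1],  # dog
    [1, 0, 1],  # cat
    [0, 1, 0],  # eagle
]


def common_count(a, b):
    """Number of features present in both stimuli."""
    return sum(1 for x, y in zip(a, b) if x == 1 and y == 1)


def distinctive_count(a, b):
    """Number of features present in exactly one of the two stimuli."""
    return sum(1 for x, y in zip(a, b) if x != y)


def pairwise(rows, measure):
    """Apply a pairwise measure to every pair of rows, returning a square table."""
    return [[measure(a, b) for b in rows] for a in rows]


common = pairwise(matrix, common_count)
distinctive = pairwise(matrix, distinctive_count)
```

In this toy example the dog and cat rows are identical, so they share two features and differ on none, while each differs from the eagle on all three.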
Experiment II
The inferiority of the glyph visualizations found in Experiment I was not entirely unexpected. The finding is consistent with Lee and Vickers' (1998) speculation that raw data should undergo a representational analysis before it is presented. However, the accuracy difference between the common and distinctive spatial visualizations caused by the global questions was not anticipated. Both visualizations are based on an MDS representation of the similarities in the data, and both the common and
Experiment III
From the practical standpoint of recommending a data visualization in applied settings, Experiments I and II both point towards the common spatial visualization as being the best of the four considered. What neither of the experiments evaluates, however, is whether any visualization is worth using at all. It is possible that even the common spatial visualization does not usefully improve human analysis, in the sense that the raw data themselves may allow equally or more accurate, confident and
General discussion
The results of the three experiments are consistent with Lee and Vickers' (1998) proposition that visualizations presenting the unprocessed raw data do not convey information as effectively as those that restructure data according to cognitive demands. The difference between the two spatial visualizations demonstrates, however, that choosing the appropriate representation is important. In fact, the distinctive spatial visualization may be considered the worst of all the visualizations evaluated,
Acknowledgments
This work was supported by the Australian Defence Science and Technology Organisation. We wish to thank Dimitris Margaritis, Thomas Minka and Chris Woodruff for their assistance and several anonymous reviewers for helpful comments.
References (73)
- et al. Empirical studies of information visualization: a meta-analysis. International Journal of Human–Computer Studies (2000)
- et al. Weighting common and distinctive features in perceptual and conceptual judgments. Cognitive Psychology (1984)
- et al. Towards a methodology for developing visualizations. International Journal of Human–Computer Studies (2000)
- Determining the dimensionality of multidimensional scaling representations for cognitive modeling. Journal of Mathematical Psychology (2001)
- et al. Evaluating visualizations: using a taxonomic guide. International Journal of Human–Computer Studies (2000)
- Effective information visualization: a study of graph drawing aesthetics and algorithms. Interacting with Computers (2000)
- et al. What is beautiful is usable. Interacting with Computers (2000)
- et al. Mapping semantic information in virtual space: dimensions, variance and individual differences. International Journal of Human–Computer Studies (2000)
- et al. MAPCLUS: a mathematical programming approach to fitting the ADCLUS model. Psychometrika (1980)
- et al. Bayes and Empirical Bayes Methods for Data Analysis (2000)
- Spatial, non-spatial and hybrid models for scaling. Psychometrika
- Introduction to Multivariate Analysis
- The use of faces to represent points in k-dimensional space graphically. Journal of the American Statistical Association
- Effect on classification error of random permutations of features in representing multivariate data by faces. Journal of the American Statistical Association
- The earth is round (p<0.05). American Psychologist
- Drawing graphs to convey proximity: an incremental arrangement method. ACM Transactions on Computer–Human Interaction
- Tree Models of Similarity and Association
- Multidimensional Scaling
- Conception of the cognitive engineering design problem. Ergonomics
- Bayesian statistical inference for psychological research. Psychological Review
- Graphical Techniques for Multivariate Data
- Applied Multivariate Data Analysis
- Bayesian Data Analysis
- On the prediction of confusion matrices from similarity judgements. Perception & Psychophysics
- Two- and three-dimensional displays for aviation: a theoretical and empirical comparison. International Journal of Aviation Psychology
- Scientific Reasoning: The Bayesian Approach
- Needed: a ban on the significance test. Psychological Science
- The face as data display. Human Factors
- The display of multivariate information: an experimental study of an information integration task. Human Performance
- Bayes factors. Journal of the American Statistical Association
- Elements of Graph Design
- ALCOVE: an exemplar-based connectionist model of category learning. Psychological Review
- Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis. Psychometrika
- A simple method for generating additive clustering models with limited complexity. Machine Learning
- Extending the ALCOVE model of category learning to featural stimulus domains. Psychonomic Bulletin & Review