Abstract
The concept of perceptual independence is ubiquitous in psychology. It addresses the question of whether two (or more) dimensions are perceived independently. Several authors have proposed perceptual independence (or its lack thereof) as a viable measure of holistic face perception (Loftus, Oberg, & Dillon, Psychological Review 111:835–863, 2004; Wenger & Ingvalson, Learning, Memory, and Cognition 28:872–892, 2002). According to this notion, the processing of facial features occurs in an interactive manner. Here, I examine this idea from the perspective of two theories of perceptual independence: the multivariate uncertainty analysis (MUA; Garner & Morton, Definitions, models, and experimental paradigms. Psychological Bulletin 72:233–259, 1969), and the general recognition theory (GRT; Ashby & Townsend, Psychological Review 93:154–179, 1986). The goals of the study were to (1) introduce the MUA, (2) examine various possible relations between MUA and GRT using numerical simulations, and (3) apply the MUA to two consensual markers of holistic face perception—recognition of facial features (Farah, Wilson, Drain, & Tanaka, Psychological Review 105:482–498, 1998) and the composite face effect (Young, Hellawell, & Hay, Perception 16:747–759, 1987). The results suggest that facial holism is generated by violations of several types of perceptual independence. They highlight the important theoretical role played by converging operations in the study of holistic face perception.
Similar content being viewed by others
When presented with a face, one often gets the impression that the eyes, nose, and mouth coalesce into a unified whole. The facial features are glued together so that observers find it difficult to dissect the face into its constituent components. It is this experience that makes faces a popular textbook case of holistic or Gestalt perception. The example illustrates the notion that interactions or interdependencies among sources of information (e.g., facial features) govern perception (Diamond & Carey, 1986; Farah, Wilson, Drain, & Tanaka, 1998; Galton, 1879; Young, Hellawell, & Hay, 1987).
The idea that holistic processing reflects deviations from some type of independence is shared by many face perception models. For instance, facial holism has been defined in terms of violations of: statistical independence (Ellison & Massaro, 1997; Loftus, Oberg, & Dillon, 2004; Macho & Leder, 1998), geometrical independence (Sergent, 1984; Tversky & Krantz, 1969), independence in processing rate (Bradshaw & Wallace, 1971; Wenger & Townsend, 2006), and perceptual independence and separability (Thomas, 2001a; Wenger & Ingvalson, 2002). Some of these studies have recorded violations of independence, whereas others detected no violations. This apparent inconsistency implies that independence is not a unitary concept and that the many phenomena that go under the name of “holistic face perception” might be governed not by one, but by various mechanisms.
The present work examined the notion of holistic face perception from the view point of perceptual independence models (Ashby & Townsend, 1986; Garner & Morton, 1969). The article is structured as follows. In the first section, several conceptual distinctions are made regarding the theoretical construct of perceptual independence. In a second section, two theories of perceptual independence are presented. In a third part, holistic face perception is couched in the different language of each of these theories. In a fourth section, simulations are performed in order to explore possible relations between the two theories. In a fifth and final section, one of the theories is applied to two classical phenomena of holistic face perception.
Perceptual independence
The pioneering theoretical work on perceptual independence has been accomplished by Shepard (1964), Garner (1974), and Townsend (Ashby & Townsend, 1986; Townsend, Hu, & Evans, 1984; Townsend & Spencer-Smith, 2004). A seminal paper by Garner and Morton (1969) outlined many of the pertinent topics and theoretical challenges. The prototypical task considered by these authors is that of processing two sources of information simultaneously, as when perceiving two parts of the same object or listening to two messages with each presented to a different ear. The problem of perceptual independence boils down to the following question: “Does the organism process the two or more sources or channels of information independently?” (p. 234).
Garner and Morton (1969) made several important theoretical distinctions. First, they suggested that normative models should play a pivotal role in the investigation of perceptual independence. The fit between a normative model and the behavioral data is meaningful insofar as the model captures psychological constructs that are made explicit by the researcher. Second, Garner and Morton argued that perceptual independence may not be a unitary concept but, rather, a nomenclature referring to a host of theoretical and operational definitions. For example, there are two major types of measures of perceptual independence. The first is zero correlation, which implies that two variables or sets of events are uncorrelated. The commonly used multiplication of probabilities is an example of zero correlation. It should be stated, though, that correlation and dependence are not the same thing. Correlation as a statistical concept is a special case of stochastic dependence (Townsend & Ashby, 1983). Thus, zero correlation does not imply independence, but independence does imply zero correlation. In many experiments, an exact measure of correlation is not applicable. The second definition of independence—performance parity—implies that performance on two perceptual tasks executed simultaneously is the same as the sum of the performances on each task performed separately. It is important to note that although performance parity is an appealing concept, it starts as a vaguely defined concept until placed meaningfully within an explicit model (see, e.g., Ashby & Townsend, 1986). Interestingly, the measurement theory lying behind conjoint measurement is a rigorous formulation of the general idea (Luce & Tukey, 1964). Although zero correlation and performance parity have been used separately, there exist definite relations between them.
Another important contribution of Garner and Morton (1969) is the application of information theory (Shannon, 1948) to the study of perceptual independence. During the early days of cognitive psychology, information theory played a key role in both theorizing and experimentation (Attneave, 1954; Garner, 1962). However, information models have seen a decline in popularity (Luce, 2003). This is unfortunate, because “the information model is . . . intimately associated with the entire problem of perceptual independence” (Garner & Morton, 1969, p. 239). These models can be considered as a subclass of probability models (Kolmogorov, 1968), since they are applicable to probability distributions.
In the next section I will introduce a model by Garner and Morton (1969) that makes use of information theory. Hence, a brief introduction to information theory is given in Appendix 1 (for a comprehensive review, see Cover & Thomas, 1991; Laming, 2010)
Multivariate uncertainty analysis
Garner’s application of information theory to psychology was largely based on a statistical technique called multivariate uncertainty analysis (Garner, 1962; Garner & McGill, 1956). Because Garner and Morton’s model embedded many of the principles of this approach, I will refer to it as the MUA (i.e., multivariate uncertainty analysis). The MUA provides a powerful theory of perceptual independence in which various types of dimensional interaction are defined. To the best of my knowledge, since its inception, no attempt has been made to apply MUA to empirical data either by Garner himself or by his many disciples (but see, Ashby & Townsend, 1986). One reason might be the demise of information theory in psychology (Laming, 2010; Luce, 2003). The oblivion of the MUA, as the present revival of this model hopes to show, is unjust.
To see how this model works, consider a factorial design with two dimensions A and B that are crossed at two levels 1, 2, resulting in four stimuli: A 1 B 1, A 1 B 2, A 2 B 1, and A 2 B 2. Now, assume a complete identification task with these stimuli in which the observer is asked to report on the level of both dimensions. This complete orthogonal procedure is optimal for detecting violations of perceptual independence (dimensional interactions) (Garner & Morton, 1969).
MUA addresses observable variables and is a parameter-free model that uses information measures (i.e., negative sum of the log-weighted probabilities; see Appendix 1). MUA distinguishes between two stimuli variables (i.e., A, B) and two response variables (i.e., a, b). Lowercase letters s (stimulus) and r (response) are added for the sake of clarity. In this conception, the term T(sA, sB, ra, rb) conveys the total amount of mutual information in the system (see Appendix 1 for more details on mutual information). Partitioning of this term into its constituent components results in the following equation (cf. Garner, 1962, Equations 5. 8; Garner & Morton, 1969, Equation 3):
where T(sA, sB) is the cross contingency or the nonmetric correlation between stimulus variables A and B. This measure is affected by the relative frequencies of the constituent levels. T(ra, sA) and T(rb, sB) are the direct contingencies. They measure the correlation between responses and relevant stimuli. T sB (rb, sA) and T sA (ra, sB) are the partial cross contingencies and measure the correlation between responses and their irrelevant stimuli, with the effect of the relevant stimuli partialled out. These terms are equivalent to partial correlation and reflect the level of (in)dependence between responses and irrelevant stimuli. Garner and Morton often referred to these measures as “error correlation.” T sAsB (ra, rb) is the partial response contingency. It measures the correlation between the two responses, with the mutual information of stimuli partialled out. This term indicates the level of (in)dependence of responses and is determined by either perceptual or response factors. Table 1 presents a list of the MUA components.
Equation 1 provides a complete model of perceptual independence. It defines all the possible informational correlations within the complete orthogonal experiment. Each component represents an exclusive aspect of the global mutual information and invites a unique psychological interpretation. Figure 1 illustrates the MUA graphically. Note that in this model, the mutual-informational terms partial response contingency and partial cross contingencies should be equal to zero if perceptual independence holds. Violations of perceptual independence are indicated whenever one, a subset, or all of these terms are greater than zero.
The general recognition theory
The general recognition theory (GRT; Ashby & Townsend, 1986) resulted from several decades of work on perceptual independence (Garner & Morton, 1969; Tanner, 1956; Townsend, Hu, & Ashby, 1981; Townsend, Hu, & Evans, 1984). The GRT is a multidimensional extension of signal detection theory (Green & Swets, 1966). An important aspect of GRT is its capability of distinguishing between sensory and decisional factors.
Assume a classification task with the four stimuli A 1 B 1, A 1 B 2, A 2 B 1, A 2 B 2. A fundamental assumption in GRT is that each stimulus exerts a trial-by-trial stochastic perceptual effect that is best represented as a multivariate normal distribution. Observers partition the resulting perceptual space into regions and then assign a unique response alternative with each region. Figure 2 illustrates an exemplary GRT space with two dimensions (i.e., x, y) that vary on two levels (i.e., 1, 2). The bivariate distributions are cut across the horizontal plane at a fixed probability value to create the equal-likelihood contours shown.
In its most general form, the GRT does not carry any distributional assumptions. However, the majority of GRT studies assume Gaussian multivariate distributions (Ashby & Lee, 1991; Thomas, 1995, 2001a). Hence, a full parameterized GRT model requires two means (expressed as a mean vector),
two variances, and a correlation parameter (expressed as a covariance matrix):
where the subscripts A and B index the parameters for the marginal distributions on perceptual evidence for dimension A and dimension B, respectively. In addition, when piecewise linear bounds are assumed, two intercepts for each dimension are needed (i.e., γ B1 and γ B2).
Three forms of independence and separability are defined in GRT. Perceptual independence (i.e., PI) refers to a within-stimulus stochastic independence, formally defined as
where A and B denote the stimuli components and x, y refer to the constituent dimensions. \( {f_{{{A_i}{B_j}}}}\left( {x,y} \right) \) is the bivariate probability density distribution representing the perceptual effects elicited by presentation of stimulus A i B j , and \( {g_{{{A_i}{B_j}}}}(x) \) and \( {g_{{{A_i}{B_j}}}}(y) \) are the marginal distributions of the perceptual effects on each perceptual dimension elicited by presentation of stimulus A i B j . In the Gaussian version of the GRT, PI is determined by the value of the coefficient correlation parameter ρ. When ρ is zero, PI holds. Negative or positive values of ρ, imply violations of PI. Perceptual separability (i.e., PS) entails perceptual evidence for one dimension is invariant across levels of the other dimension. Formally,
for i = 1,2 and
for j = 1,2
Decisional separability (i.e., DS) entails that the decision bound for one dimension remains the same across levels of the other dimension. Decisional separability is violated whenever the decision bound for one dimension depends on the level of the other dimension. The construct reflects decisional and motivational aspects of perception and cognition (Maddox, 1992).
The GRT space cannot be observed directly from the empirical identification confusion matrices. The challenge is therefore to make valid inferences regarding violations of PI, PS, and DS on the basis of incomplete information. Ashby and Townsend (1986) developed several nonparametric tests that assess constructs in the GRT. The first is a test of PI called sampling independence (SI):
SI is relevant to PI only if decisional separability holds. That is, if the two decision bounds for each dimension are perpendicular to the dimension axis.
A second test of PI that was proposed by Ashby and Townsend (1986) relies on information theory principles. It uses the contingent mutual information between components A and B in the perceptual distribution associated with the stimulus A i B j . The contingent mutual information term T AiBj equals zero if and only if PI holds. However, Ashby and Townsend did not notice that sampling independence also implies that, at a coarser level. Hence, the two tests should provide comparable results. The informational test has been scarcely used in the GRT literature. It will be used in the present effort in tandem with the SI test.
A third test is called marginal response invariance (MRI), which assesses perceptual separability (PS) and DS. MRI, in an identification-confusion matrix, holds when the probability of correctly recognizing one component does not depend on the physical level of the other. That is for i = 1, 2,
and for j = 1, 2,
Two major parametric approaches to making inferences regarding GRT constructs are available. The first makes use of micro- and macro-signal detection analyses (Kadlec & Townsend, 1992a, 1992b), along with the nonparametric tests. The second approach employs a hierarchical model fitting procedure (Ashby & Lee, 1991; Thomas, 2001b). The two methods have been applied successfully in a number of recent publications and have shown a high level of convergence (Cornes, Donnelly, Godwin, & Wenger, 20111; Thomas, 2001a; Wenger & Ingvalson, 2002, 2003).
Probing facial holism with MUA and GRT
Holistic face perception can be couched in the different languages of MUA and GRT. Consider a factorial design in which two facial features (nose and eyes) are crossed at two levels (short and long) to create four faces. Let us call the attributes of nose A and the attribute of eyes B. Now, ask the observers to report the levels of both features in a complete identification task. This task results in a confusion matrix with errors distributed across the off-diagonal cells. Note that the design, task, and confusion matrices are the same for MUA and GRT. This allows one to analyze the same confusion matrix according to GRT and MUA. If faces are represented holistically (Farah et al., 1998; Tanaka & Farah, 1993), we should expect that at least a subset of the measures should violate perceptual independence in both MUA and GRT. The critical assumption made here is that facial holism is general enough to manifest itself in different theories of perceptual independence.
Application of MUA to holistic face perception
The application of MUA to the perception of faces is a novel contribution of the present study. I start by defining the variables in the model: two facial dimensions (e.g., A = eye type, B = nose shape) and two responses (a = response to eye, and b = response to nose). The partitioning of the confusion matrix into its mutual information components provides a complete model of information transmission.
The cross contingency T(sA, sB) measures the amount of correlation between the facial features (e.g., eyes and nose). When levels of each feature occur equally often (i.e., orthogonally), they are statistically independent [T(sA, sB) = 0]. Unequal frequencies of feature presentations generate a correlation [T(sA, sB)>0]. Several studies have shown that the degree to which two dimensions interact depends, among other factors, on the correlational structure in the stimuli set (Garner, 1974; Melara & Algom, 2003). In the case of faces, the correlation between facial features can result from either an experimental manipulation or a given statistical structure of the environment.
The direct contingencies T(ra, sA) and T(rb, sB) reflect the degree of association between a facial feature (stimulus) and its appropriate response. The partial cross contingencies T sB (rb, sA) and T sA (ra, sB) measure the degree of error correlation between a response to a facial feature (stimulus) and an inappropriate stimulus variable, with the effect of the other stimulus partialled out. In this sense, the term parallels that of partial correlation and reflects the crossing-over from one facial feature (e.g., nose) to another (e.g., eyes). This measure is expected to be larger than zero if faces are processed holistically.
The partial cross contingencies T sB (rb, sA) and T sA (ra, sB) measure the correlation between responses to facial features (e.g., nose) and their irrelevant stimuli (e.g., eyes), with the effect of the relevant stimuli (e.g., nose) partialled out. These terms capture the crossing over from responses to irrelevant stimuli. Ashby and Townsend (1986, Theorem 6) proved that PS and DS mathematically imply that partial cross contingency equals to zero. They have also shown that it is not logically related to true PI, unless failure of the latter causes a failure of PS.
The partial response contingency T sAsB (ra, rb) measures the correlation between responses to facial feature a and responses to facial feature b with the mutual information of the stimulus A and B partialled out. The partial response contingency is related to downstream response selection and decision mechanisms, rather than perceptual mechanisms. This is an important component that has been neglected in previous studies. It might be the case that aspects of performance such as response compatibility (especially when the physical responses to the two dimensions differ), is more related to a kind of response-configurality than to the usual perceptual configurality. Hence, if faces are processed holistically in these senses, the partial response contingency is expected to produce values that are larger than zero.
MUA measures several types of informational interactions between facial features and responses. These terms can be interpreted as neuronal connections that have evolved either phylogenetically or ontogentically within the face recognition system (Bruce, 1986); their aim is to utilize statistical regularities and other structural determinants of successful face recognition in the organism’s environment (Brunswik, 1956; Hancock, Burton, & Bruce, 1996).
Application of GRT to faces
Several authors have harnessed the GRT to model performance in face recognition tasks (Cornes et al., 2011; Fitousi, Wenger, Heide, & der Bittner, 2010; Richler, Gauthier, Wenger, & Palmeri, 2008; Thomas, 2001a; Wenger & Ingvalson, 2002, 2003). The GRT offers a rigorous framework for understandinging the internal representation of faces. For instance, using GRT researchers can distinguish sensory and decisional interactions between facial features. Thomas (Thomas, 2001a, 2001b) has shown that face recognition is characterized by violations of PI, but not violations of DS. These results entail a strong form of holism because violations of PI occur at the individual stimulus level (i.e., face); violations of PS and DS entail weak forms of holism because they occur at the level of stimuli ensemble (O’Toole, Wenger, & Townsend, 2001; Wenger & Ingvalson, 2002). Note that violations of DS reflect the weakest form of holism, because they capture motivational and volitional components.
In a series of studies, Wenger and his colleagues have examined various classical markers of holistic face perception, such as the composite face effect (Richler et al., 2008), the Thatcher illusion (Cornes et al., 2011), and the face inversion effect (Wenger & Ingvalson, 2002, 2003). In all of these studies, face recognition was found to be governed by violations of DS and PS. Notably, violations of PI were minimal and might be ascribed to the different tasks, and stimuli used.
The work of Wenger and his colleagues sheds light on the important role played by decisional components in holistic face perception. The psychological nature of these factors, though, remains to be further elucidated.
Exploring possible relations between MUA and GRT
In this section, numerical simulations are performed in order to examine the presence of possible relations between MUA and GRT. The general strategy has been to (1) generate GRT models out of a known set of parameters (in each model, one parameter is increased incrementally in small steps), (2) produce identification confusion matrices from these GRT models, (3) analyze each matrix according to MUA, (4) plot the GRT parameters as a function of the MUA variable, and (5) discern (if they exist) meaningful relations. The parameter values for the simulations appear in Appendix 2. Note that analytic investigation might be possible (Ashby & Townsend, 1986) but is more complex.
Logically, any of the five MUA mutual information terms could be related to: no one of the GRT constructs (i.e., PI, PS, DS), only one, a subset, or all of them. Here, I report on several simulations that yielded meaningful relations. It is important to note that formal work on GRT (Ashby & Townsend, 1986; Kadlec & Townsend, 1992a, 1992b) has established complex relations between GRT’s constructs and the nonparametric tests. The relations observed in the present simulations may or may not imply relations between MUA and GRT’s nonparametric tests. For example, SI serves as a test of PI; if an MUA construct exhibits some level of correspondence with PI and DS, it is possible, but not certain, that the MUA term is also related to SI.
Partial response contingency
The correspondence between the partial response contingency T sAsB (ra, rb) and violations of PI was tested by generating 32,000 GRT models in which PS and DS held but PI was violated. In a quarter of these models (i.e., 8,000), one correlation coefficient parameter was varied in one of the bivariate distributions. In the remaining three quarters, the parameter was varied in two, three, or four of the bivariate distributions, respectively. In each of the four cases, the correlation coefficient parameter was increased in small steps of 0.00025 from −1 to 1 (including 0, a condition in which PI held). The partial response contingency was computed for each of the GRT models. Figure 3 plots the partial response contingency against the correlation coefficient ρ. The latter varies in one, two, three, and four bivariate distributions (Fig. 3a–d). It is evident that as the coefficient parameter is varied in more bivariate distributions, the relations become more salient and symmetric. When ρ is modulated in all four distributions (Fig. 3d), the relations become very symmetric. As the absolute value of the correlation coefficient increases, the information term increases in a quadratic fashion. The maximum value of correlation (i.e., −1 or 1) amounts to 0.3 bits of information.
The correspondence between partial response contingency and violations of DS was tested by generating 16,000 GRT models in which PI and PS held but DS was violated. In half of these models, DS was violated on one of the dimensions, while in the other half, DS was violated on both dimensions. Violations of DS were varied parametrically by moving the relevant decision bound to the right side of the space in small steps of 0.005. The partial response contingency was computed in each of the resulting GRT models. As can be noted in Fig. 4, the partial response contingency increases as a function of the degree to which DS is violated until it reaches an asymptote.
Partial cross contingency
To examine the relations between the partial cross contingency and violations of PS, I generated 8,000 GRT models in which PI and DS held but PS was violated on dimension A. An additional data set was created by generating 8,000 GRT models in which PI and DS held but PS was violated on dimension B. The degree to which PS was violated in each of these data sets was modulated by increasing or decreasing the value of the location parameter of the relevant bivariate Gaussian distribution (μ A ,μ B ) relative to the other distributions. Small steps of 0.0005 were added incrementally to create each model. The parameter ranged between −2 and +2 and included 0 (i.e., a condition in which PS held). The cross contingency was computed for each of these 16,000 models and plotted against their corresponding GRT location parameter. Figure 5 reveals strong associations between the value of T sA (ra, sB) and violations of PS on dimension A. Similarly, values of T sB (rb, sA) are associated with violations on dimension B.
The relations between the partial cross contingency terms and DS were investigated in a similar fashion. I generated 8,000 GRT models in which DS was violated on dimension A(B). The relevant decision bound was moved to the right side of the space in small steps of 0.005. The cross contingency term was computed for each of these models and plotted against the location parameter representing the degree to which DS was violated. As can be noted in Fig. 6 T sA (ra, sB) is related to violations of DS on dimension A, and T sB (rb, sA) is associated with violations of DS on dimension B.
These results also suggest that when PS and DS hold, the partial cross contingency amounts to zero. Ashby and Townsend noted that the partial cross contingency term is comparable with PS and DS; they provided a mathematical proof that when MRI holds, the partial cross contingency terms amount to zero (1986, see pp. 169–170, Equations 6–9; see also proof of Theorem 6, p. 178). Ashby and Townsend recommended using the partial cross contingency test either as a surrogate to the MRI test or in its own right. The simulations results provide additional support to Ashby and Townsend’s conjecture.
Direct contingency
In the original formulation of the MUA (Garner & Morton, 1969), direct contingency terms T(ra, sA) and T(rb, sB) played no direct role in the assessment of perceptual independence. Yet preliminary simulations pointed to possible relations between violations of PS and direct contingency. To further test this trend, I generated 8,000 GRT models in which PI and DS held but PS was violated on dimension A. An additional 8,000 GRT models were generated in which PI and DS held but PS was violated on dimension B. In these models, PS was modulated by increasing or decreasing the value of the location parameter of the relevant bivariate Gaussian distribution (μ A , μ B ) relative to the other distributions. Small steps of 0.0005 were added incrementally to create each model. The direct contingency was computed for each model and plotted against its corresponding GRT location parameter. As can be noted in Fig. 7a, c, the direct contingency T(ra, sA) is related to violations of PS on dimension A, but not to violations of PS on dimension B. The complementary Fig. 7b, d demonstrate that T(rb, sB) is related to violations of PS on dimension B but not to violations of PS on dimension A. Notably, the direct contingency has a fixed value when PS holds (i.e., the location parameter is at its initial position).
In sum, the simulations demonstrated the presence of some interesting relations between MUA and GRT. It is important to note that these simulations cannot replace a formal treatment. A possible goal for future research is to tie the MUA constructs to GRT’s nonparametric tests. Another aim is to develop an information-theoretic version of the GRT.
Study 1
(In)dependence of schematic facial features
In this section, I apply the MUA to data from two experiments by R. D. Thomas (2001a, 2001b). The stimuli used in her experiments were grayscale schematic faces such as those shown in Fig. 8. In the first experiment (Thomas, 2001b), four observers participated. Two observers were presented with faces varying on eye separation (A, far and close) and mouth width (B, thin and wide) with nose length held at fixed value. The other two observers were presented with faces varying on eyes separation (A) and nose length (B), while the mouth width was held constant. On each trial, the observers were presented with one of four possible faces and judged the value of each constituent feature by giving one of four possible responses in a nontimed manner.
Results
In all the subsequent analyses, statistical inferences were based on methods developed by Attneave (Attneave, 1951). Tests were held with an alpha level of .05, and the unit of measurement was the “bit.” Table 2 summarizes the results. In the text, I report on those measures that reached significance level.
Observer 1 classified faces on the basis of eye separation and mouth width. Direct contingency between mouth width and its appropriate response T(ra, sA) (0.035 bit) and the partial response contingency T sAsB (ra, rB) (0.009) were the only significant terms. Observer 4 classified the same set of faces. Direct contingency T(ra, sA) between eye separation and its appropriate response amounted to 0.421 bit. Direct contingency T(rb, sB) between mouth shape and its appropriate response was 0.08 bit. Observer 2 classified faces that varied on eye separation and nose length. Direct contingency T(ra, sA) between eye separation and its appropriate response amounted to 0.259 bit. The equivalent component T(rb, sB) for nose length was 0.041. Partial response contingency T sAsB (ra, rb) between responses to eye separation and nose length was 0.036 bit. Observer 3 classified the same set of faces as did observer 2. Direct contingency T(ra, sA) between eye separation and its appropriate response amounted to 0.252 bit. The corresponding term for nose length T(rb, sb) was 0.057 bit. Partial response contingency T sAsB (ra, rb) between the responses to eye separation and nose length was 0.015 bit.
In the second experiment by Thomas (2001a), three observers participated. Table 3 presents the mutual information components for this data set. Observer A classified faces on eyes separation and nose length. The direct contingency between eye separation and its appropriate response amounted to 0.276 bit. The equivalent term for the nose length was 0.036 bit. Partial response contingency between responses to eye separation and nose length was 0.4 bit.
Observer B classified the faces on the same facial attributes as observer A. Direct contingency between eye separation T(ra, sA) and its appropriate response was 0.417 bit. Direct contingency between mouth shape and its appropriate response amounted to 0.086 bit. Observer C classified faces that varied on eye separation and mouth shape. Direct contingency between eye separation and its appropriate response was 0.011 bit. The corresponding component for mouth shape was 0.031 bit. The partial response contingency T sAsB (ra, rb) between the responses to eye separation and mouth shape was 0.022 bit.
Taken together, the results from the two studies by Thomas (2001a, 2001b) point to similar conclusions. Five out of seven observers exhibited reliable levels of partial response contingency T sAsB (ra, rb). This was the only component indicating violations of perceptual independence in the MUA. According to these results, holistic processing of faces is generated by interactions between the responses made to facial features.
It might be informative at this stage to contrast the MUA results with those from the GRT. First, let me review the conclusions drawn by Thomas (2001a, 2001b). This author asserted that "SI fails for the features of eye separation and nose length (observers 2 and 3) and is marginally rejected for eye separation and mouth width. This implies either PI or DS" (Thomas, 2001b, p. 642). Thomas then continued to argue that "the failure of SI is likely due to the presence of perceptual dependence . . . rather than a failure of DS“ (p. 642). To support this conclusion, Thomas pointed to the results of the model fitting procedure: "the central source of the improvement in fit for some the more general models is the relaxation of the PI assumption" (p. 642). On the basis of the MRI and macrosignal detection tests, Thomas assumed that PS and DS hold. Her final conclusions were that PS and DS held but PI was violated for three of the four observers. A similar line of reasoning was employed in her second study (Thomas, 2001a), leading to the conclusion that, for two of the three observers, PI is violated while PS and DS hold.
To follow up on Thomas’s conclusions, I tested for PI using the contingent mutual information and sampling independence (SI) tests (Ashby & Townsend, 1986). The results of these analyses are presented in Table 4 and Table 5. Notably, the outcomes of the SI and those of the contingent mutual information tests are remarkably close. The advantage of the contingent mutual information test, though, is that it quantifies the degree of dependency in bits. As can be noted, many of the stimuli failed the test, mostly for observers 1 and 2. These results might lend additional support to Thomas’ conclusions.
Next, I examine the correspondence between the MUA and GRT results with respect to the PS and DS found by Thomas (2001a, 2001b). Recall that Ashby and Townsend (1986) proved that if PS and DS hold, the partial cross contingencies should be equal to zero, a conjecture that has also been supported by the simulations performed in the present work. The fact that no violations of the partial cross contingencies T sB (rb, sA) and T sA (ra, sB) were found corresponds well with the findings of PS and DS in Thomas’s studies (2001a, 2001b). Overall, the results of the GRT and MUA exhibit high degree of agreement.
Study 2
The composite face effect
Composite faces combine the top half of one face with the bottom half of another face (see Fig. 9). When the two halves are aligned, they seem to fuse together to produce a novel face. In the composite face effect (CFE; Young et al., 1987) observers are slower and more error prone to recognize one half of an aligned, as compared with a misaligned, composite face. The degree of this congruency effect depends on the level of the alignment of the test face. As the degree of misalignment increases, the interference gets smaller.
Richler et al. (2008) have assessed the CFE using the complete identification paradigm. In this procedure, observers are presented with a study face, a series of masking patterns, and a test face. Observers indicate whether the top part of the test face is the same as or different from the top part of the study face; a similar judgment is performed with respect to the bottom part. Thus, four types of stimuli (responses) are possible in such an experiment: same-top–same-bottom, same-top–different-bottom, different-top–same-bottom, and different-top–different-bottom. Richler et al. performed GRT analyses on the identification confusion matrices and found that PS and DS were violated, but that PI held (see also Wenger & Ingvalson, 2002, 2003); they also found that the degree of misalignment affected the degree to which DS was violated.
Richler et al.’s (2008) results are inconsistent with those of Thomas (2001a, 2001b), who found violations of PI but no violations of PS and DS. These two patterns suggest opposing forms of facial holism—weak and strong. The chasm can be explained away in two ways. First, it can be explained by the fact that the two studies employed different facial stimuli (i.e., schematic drawings vs. images of real faces) and somewhat different tasks (absolute identification vs. a same–different identification). Second, Richler et al. rejected the possibility that the CFE stems from violations of PI. Their main argument was that the CFE is an across-dimensions (i.e., composite parts) measure that cannot be computed within a single face. These authors reported simulations to support their conjecture. It is worth noting that the CFE itself cannot be generated by violations of PI. However, this does not rule out the logical possibility of within-stimulus violations of PI that are not related to CFE.
Here, I apply the MUA to data from an experiment by Fitousi et al. (2010). This study replicated the core results of Richler et al.’s (2008) Experiment 2. The main difference from the Richler et al. study is that the GRT analyses were held at the level of the individual observer. Four participants preformed in a complete identification task in two conditions. In the first condition, top and bottom parts of the test face were misaligned; in a second condition, the parts were aligned (see Fig. 9).
Results
To apply the MUA to CFE, I assumed two dimensions sA and sB (i.e., top and bottom parts of the composite face) and two responses ra and rb (i.e., response to top part, response to bottom part). The MUA defines all possible mutual information interactions among these four variables. The analyses were held separately on the confusion matrices from the aligned and misaligned conditions. Table 6 and 7 present the results for the aligned and misaligned composite face conditions. As can be noted, in the aligned condition, all four observers exhibited reliable MUA components (excepting the cross contingency, which was virtually zero because the presentation frequencies were equal). Three of the MUA components signal holistic face perception: partial cross contingency for top part, cross contingency for bottom part, and the partial response contingency between stimuli and responses. This means that all observers processed the composite faces holistically.
In contradistinction, only a portion of the misaligned condition components reached significance. For observer 1 only the partial cross contingency T sB (rb, sA) was significant and amounted to 0.009 bit. For observer 2, this component was also significant (0.026 bit), along with the T sAsB (ra, rb) component, which amounted to 0.04. For observer 3, T sB (rb, sA) and T sA (ra, sB) were significant (0.015 and 0.008 bits). For observer 4, only the T sB (rb, sA) was significant (amounting to 0.013 bit). Note that for 3 of the 4 observers the partial response contingency T sAsB (ra, rb) was not significant in the misaligned condition. This might point to the role played by this component in holistic face perception. Recall that in Study 1, the partial response contingency was found to be the only significant MUA component in 5 out of 7 observers.
The GRT analyses performed by Richler et al. (2008) and Fitousi et al. (2010) entail limited violations of PI but considerable violations of DS and PS. The latter violations were more evident in the aligned faces than in the misaligned faces condition. To gain additional insights, I tested the Fitousi et al. data for PI using the SI and the contingent mutual information tests (Ashby & Townsend, 1986). The results of these analyses are presented in Table 8 and Table 9. As has been found in Study 1, the contingent mutual information tests showed remarkable correspondence with the SI results. In the misaligned faces condition, none of the stimuli failed either of the tests. In the aligned faces condition, a minimal number of failures were observed (i.e., 4 out of 16 possible violations). These results lend additional support to Fitousi et al.’s conclusions regarding PI.
Let me turn now to the findings of violations of PS and DS. The MUA analyses of the aligned faces data revealed widespread violations of partial cross contingency terms T sB (rb, sA) and T sA (ra, sB), along with violations of partial response contingency T sAB (ra, sb). In contradistinction, these violations were much less prominent in the misaligned faces data. These outcomes make sense given that the partial cross contingencies are comparable to PS and DS (Ashby & Townsend, 1986). Thus, in combination, these results seem to be highly comparable.
General discussion
The concept of independence has been used in the psychological literature to explain a wide variety of phenomena. The present effort concerned two theories of perceptual independence: the GRT (Ashby & Townsend, 1986) and the MUA (Garner & Morton, 1969). The former has been influential in the domains of perception and cognition (Ashby, 1992), whereas the latter has been much less so. Simulations uncovered possible relations between measures from the two approaches. In addition, MUA was applied to data from face experiments with schematic facial features (Thomas, 2001a, 2001b), and with composite faces (Richler et al., 2008).
Three important outcomes emerged from this study. First, it has been shown that MUA provides a powerful theory of perceptual independence and holistic face perception. Second, various interesting relations between MUA and GRT were demonstrated. This apparent correspondence should be elucidated in future analytical investigations. Third, several informational components of perceptual (in)dependence seem to underlie holistic face perception (see also OToole et al., 2001; Wenger & Ingvalson, 2002).
MUA and the integral–separable distinction
Two phenomenologically distinct classes of objects might exist. The first includes objects that form a Gestalt and resist attempts at dissecting them into their constituent dimensions. The attributes composing such objects are called integral dimensions. With other objects, analysis is rather easy, and their constituent attributes are dubbed separable dimensions (Garner, 1974). The MUA played an important historical role in Garner’s formulation of the separable–integral conception. Garner defined separable dimensions in the following way: “the lines connecting each dimensional input with its appropriate output, labeled ’a,’ [’a’ and ’b’ in Figure 1] are the only connections that should exist, and if these are all that do exist, then we . . . are in effect dealing with separable dimensions” (Garner, 1974, p. 163). Integral dimensions were similarly defined in terms of the remaining three types of connections constructing the MUA model.
According to Garner’s principle of converging operation (Garner, 1974; Garner, Hake, & Eriksen, 1956), the classification of dimensions as separable or integral should rely on outcomes from several experimental tests. The present effort capitalized on this principle. In a similar fashion, other authors have developed models that relate GRT to Garner’s speeded classification task—an operational measure of dimensional interaction (Ashby & Maddox, 1994; Fific, Nosofsky, & Townsend, 2008; Maddox, 1992; Nosofsky & Palmeri, 1997). Recently, Fitousi and Wenger (2013) have examined the independence of facial identity and expression from the view point of several indices of perceptual independence and separability, demonstrating a high degree of convergence.
Perceptual independence and holism
Students of perception often consider faces to be the ultimate Gestalt stimuli (Palmer, 1999). Gestalt theories of grouping centered on the articulation of the principle of Pragnanz, according to which perceptual organization provides the observer with a “simpler” percept than the ungrouped percept. However, these theories have not made clear statements about how this process takes place. The idea that grouping of facial features is a result of violations of some type of perceptual independence is compelling. However, in itself, perceptual independence cannot serve as a complete theory of holism or configurality. At best, this concept allows the theorist to define the primitive building blocks of holism. It is the theorist’s job to explain what purpose these violations serve and what are the processes and states that are involved in their emergence. These goals should be attained by future theories of holistic processing and configurality.
References
Ashby, F. G. (1992). Multivariate probability distributionsMultivariate probability distributions. In F. G. Ashby (Ed.), Multidimensional models of perception and cognition (pp. 1–34). Hillsdale: Erlbaum.
Ashby, F. G., & Lee, W. W. (1991). Predicting similarity and categorization from identification. Journal of Experimental Psychology: Genera, 120, 150–172.
Ashby, F. G., & Maddox, W. T. (1994). A response time theory of separability and integrality in speeded classification. Journal of Mathematical Psychology, 38, 423–466.
Ashby, F. G., & Townsend, J. T. (1986). Varieties of perceptual independence. Psychological Review, 93, 154–179.
Attneave, F. (1951). Applications of information theory to psychology. New York: Holt.
Attneave, F. (1954). Some informational aspects of visual. Psychological Review, 61, 183–193.
Bradshaw, J. L., & Wallace, G. (1971). Models for the processing and identification of faces. Perception & Psychophysics, 9, 443–448.
Bruce, V., & Young, A. (1986). Understanding face recognition. British Journal of Psychology, 77, 305–327.
Brunswik, E. (1956). Perception and the representative design of psychological experiments. Berkeley: University of California Press.
Cornes, V., Donnelly, N., Godwin, H., & Wenger, M. J. (2011). Perceptual and decisional factors affecting the detection of the Thatcher illusion. Journal of Experimental Psychology. Human Perception and Performance, 37, 645–668.
Cover, T. M., & Thomas, J. A. (1991). Elements of information theory. New York: Wiley.
Diamond, R., & Carey, S. (1986). Why faces are and are not special: An effect of expertise. Journal of Experimental Psychology. General, 115, 107–117.
Ellison, J. W., & Massaro, D. W. (1997). Featural evaluation, integration, and judgment of facial affect. Journal of Experimental Psychology. Human Perception and Performance, 23, 213–226.
Farah, M. J., Wilson, K. D., Drain, M., & Tanaka, J. N. (1998). What is “special” about face perception? Psychological Review, 105, 482–498.
Fific, M., Nosofsky, R., & Townsend, J. (2008). Information processing architecture in multidimensional classification: A validation test of the system factorial technology. Journal of Experimental Psychology: Human, Perception, and Psychophysics, 34, 356–375.
Fitousi, D., & Wenger, M. J. (2013). Variants of independence in the perception of facial identity and expression. Journal of Experimental Psychology. Human Perception and Performance, 39, 133–155.
Fitousi, D., Wenger, MJ., Heide, RV. der Bittner, J. 2010b May. Attentional weighting in configural face processing.Attentional weighting in configural face processing. Poster presented at the 2010 Meeting of the Vision Science Society, Naples, FL
Galton, F. (1879). Composite portraits, made by combining those of many different persons into a single, resultant, figure. Journal of the Anthropological Institute, 8, 132–144.
Garner, W. R. (1962). Uncertainty and structure as psychological concepts. New York: Wiley.
Garner, W. R. (1974). The processing of information and structure. New York: Wiley.
Garner, W. R., Hake, H. W., & Eriksen, C. W. (1956). Operationism and the concept of perception. Psychological Review, 63, 149–159.
Garner, W. R., & McGill, W. J. (1956). Relation between information and variance analyses. Psychometrika, 21, 219–228.
Garner, W. R., & Morton, J. (1969). Perceptual independence: Definitions, models, and experimental paradigms. Psychological Bulletin, 72, 233–259.
Green, D. M., & Swets, J. A. (1966). Signal detection theory and psychophysics. New York: Wiley.
Hancock, P. J. B., Burton, A. M., & Bruce, V. (1996). Face processing: Human perception and principal components analysis. Memory & Cognition, 24, 26–40.
Kadlec, H., & Townsend, J. T. (1992a). Signal detection analysis of dimensional interactionsSignal detection analysis of dimensional interactions. In F. G. Ashby (Ed.), Multidimensional models of perception and cognition (pp. 181–228). Hillsdale: Erlbaum.
Kadlec, H., & Townsend, J. T. (1992b). Implications of marginal and conditional detection parameters for the separabilities and independence of perceptual dimensions. Journal of Mathematical Psychology, 36, 325–374.
Kolmogorov, A. N. (1968). Logical basis for information theory and probability theory. IEEE Transactions Information Theory, 14, 662–664.
Laming, D. (2010). Statistical information and uncertainty: A critique of applications in experimental psychology. Entropy, 12, 720–771.
Loftus, G. R., Oberg, M. A., & Dillon, A. M. (2004). Linear theory, dimensional theory, and the face-inversion effect. Psychological Review, 111, 835–863.
Luce, R. D. (2003). Whatever happened to information theory in psychology? Review of General Psychology, 7, 183–188.
Luce, R. D., & Tukey, J. W. (1964). Simultaneous conjoints measurement: A new type of fundamental measurement. Journal of Mathematical Psychology, 1, 1–27.
Macho, S., & Leder, H. (1998). Your eyes only? a test of interactive influence in the processing of facial features. Journal of Experimental Psychology. Human Perception and Performance, 24, 1486–1500.
Maddox, W. T. (1992). Perceptual and decisional separabilityPerceptual and decisional separability. In F. G. Ashby (Ed.), Multidimensional models of perception and cognition (pp. 147–180). Hillsdale: Erlbaum.
Melara, R. D., & Algom, D. (2003). Driven by information: a tectonic theory of stroop effects. Psychological Review, 110, 422–471.
Nosofsky, R. M., & Palmeri, T. J. (1997). An exemplar-based random walk model of speeded. Psychological Review, 104, 266–300.
O’Toole, A. J., Wenger, M. J., & Townsend, J. T. (2001). Quantitative Models of Perceiving and Remembering Faces: Precedents and PossibilitiesQuantitative models of perceiving and remembering faces: Precedents and possibilities. In M. J. Wenger & J. T. Townsend (Eds.), Computational, geometric, and process perspectives on facial cognition: Contexts and challenges (pp. 1–38). Mahwah: Erlbaum.
Palmer, S. E. (1999). Vision science: Photons to phenomenology. Cambridge: MAMIT.
Richler, J. J., Gauthier, I., Wenger, M. J., & Palmeri, T. J. (2008). Holistic processing of faces: Perceptual and decisional components. Journal of Experimental Psychology: Learning, Memory, and Cognition, 34, 328–342.
Sergent, J. (1984). An investigation of component and configural processes underlying face recognition. British Journal of Psychology, 75, 221–242.
Shannon, C. E. (1948). A mathematical theory of communication. The Bell System Technical Journal, 27, 379–423.
Shepard, R. N. (1964). Attention and the metric structure of the stimulus space. Journal of Mathematical Psychology, 1, 54–87.
Tanaka, J. W., & Farah, M. J. (1993). Parts and wholes in face recognition. Quarterly Journal of Experimental Psychology, 46A, 225–245.
Tanner, W. P. (1956). Theory of recognition. Journal of the Acoustical Society of America, 28, 882–888.
Thomas, R. D. (1995). Gaussian general recognition theory and perceptual independence. Psychological Review, 102, 192–200.
Thomas, R. D. (2001a). Characterizing perceptual interactions in face identification using multidimensional signal detection theoryCharacterizing perceptual interactions in face identification using multidimensional signal detection theory. In M. J. Wenger & J. T. Townsend (Eds.), Computational, geometric, and process perspectives on facial cognition: Contexts and challenges (pp. 193–228). Mahwah: Erlbaum.
Thomas, R. D. (2001b). Perceptual interactions of facial dimensions in speeded classification and identification. Perception & Psychophysics, 63, 625–650.
Townsend, J. T., & Ashby, F. G. (1983). Stochastic modeling of elementary psychological processes. Cambridge: Cambridge University press.
Townsend, J. T., Hu, G. G., & Ashby, F. G. (1981). A test of visual feature sampling independence with orthogonal straight lines. Psychological Research, 43, 259–275.
Townsend, J. T., Hu, G. G., & Evans, R. J. (1984). Modeling feature perception in brief displays with evidence for positive interdependencies. Perception & Psychophysics, 36, 35–49.
Townsend, J. T., & Spencer-Smith, J. (2004). Two kinds of global perceptual separability and curvatureTwo kinds of global perceptual separability and curvature. In E. Schroger & C. Kaernbach (Eds.), Psychophysics beyond sensation: Laws and invariants of human cognition (pp. 89–109). Mahwah: Erlbaum.
Tversky, A., & Krantz, D. H. (1969). Similarity of schematic faces: A test of interdimensional additivity. Perception & Psychophysics, 5, 124–128.
Wenger, M. J., & Ingvalson, E. M. (2002). A decisional component of holistic encoding. Journal of Experimental Psychology: Learning, Memory, and Cognition, 28, 872–892.
Wenger, M. J., & Ingvalson, E. M. (2003). Preserving informational separability and violating decisional separability in facial perception and recognition. Journal of Experimental Psychology: Learning, Memory, and Cognition, 29, 1106–1118.
Wenger, M. J., & Townsend, J. T. (2006). On the costs and benefits of faces and words. Journal of Experimental Psychology. Human Perception and Performance, 32, 755–779.
Young, A. W., Hellawell, D., & Hay, D. C. (1987). Configurational information in face perception. Perception, 16, 747–759.
Author Note
The author would like to thank Jim Townsend, Leslie Blaha, Mario Fific, and Danny Algom for their insightful comments on earlier versions of the manuscript.
Correspondence regarding this paper should be addressed to Daniel Fitousi, Ariel University, Department of Behavioral Sciences, Israel. danielfi@ariel.ac.il
Author information
Authors and Affiliations
Corresponding author
Appendices
Appendix 1: Independence and information theory
In a system A with N A possible states, we perform a measurement A that can yield one of the possible values \( {a_1},.....{a_N} \). A probability p(a i ) is assigned with each of these states. The average amount of information gained from a measurement that reveals one particular value a i is provided by the entropy or uncertainty H(A) of the system (Cover & Thomas, 1991; Shannon, 1948):
which is the negative sum of the weighted log probabilities. Some refer to this value as the level of “surprisal” one experiences when reading the result of a measurement. Note that the treatment here applies to discrete probabilities. Application to continuous probabilities is possible (see Cover & Thomas, 1991). The mutual information H(A, B) of two discrete systems A and B is defined analogously:
Here, p(a i , b j ) denotes the joint probability that A is in state a i and B is in state b j . The number of possible states N A and N B may not be on par. If the systems A and B are statistically independent, the mutual probabilities factorize, and the mutual information H(A, B) becomes
The mutual information (Kolmogorov, 1968; Shannon, 1948) T(A, B) between the systems A and B is then defined as
The Shannon mutual information provides a test of violations from informational independence.
Commonly used measures of correlation, such as the Pearson correlation or Euclidean distance, reveal only linear dependencies, whereas the mutual information is capable of detecting any functional relation. Hence, when the mutual information of two variables T(A, B) is zero, one can safely conclude that independence holds. However, when the Pearson correlation is zero, independence does not necessarily hold. Thus, the mutual information can be viewed as a generalized measure of correlation that is equivalent to Pearson correlation but sensitive to any form of relations. Note, however, that mutual information can be either equal to or larger than zero, whereas the Pearson correlation can take negative or positive values. The advantages of the mutual information measure offset this limitation.
Appendix 2: Methods of the numerical simulations
Various GRT models were generated with the aim of representing one or a subset of the following violations: PI, PS, and DS. Each GRT model included two dimensions, each with two levels. Bivariate Gaussian distributions with equal marginal variances were assumed in all cases (see the text). This is in accordance with previous studies (Ashby & Lee, 1991; Thomas, 2001b). The parameters of the models are presented in Table 10.
Rights and permissions
About this article
Cite this article
Fitousi, D. Mutual information, perceptual independence, and holistic face perception. Atten Percept Psychophys 75, 983–1000 (2013). https://doi.org/10.3758/s13414-013-0450-0
Published:
Issue Date:
DOI: https://doi.org/10.3758/s13414-013-0450-0