Abstract
A common question in perceptual science is to what extent different stimulus dimensions are processed independently. General recognition theory (GRT) offers a formal framework via which different notions of independence can be defined and tested rigorously, while also dissociating perceptual from decisional factors. This article presents a new GRT model that overcomes several shortcomings of previous approaches by providing a clearer separation between perceptual and decisional processes and a more complete description of such processes. The model assumes that different individuals share similar perceptual representations, but vary in their attention to dimensions and in the decisional strategies they use. We apply the model to the analysis of interactions between identity and emotional expression during face recognition. The results of previous research aimed at this problem have been disparate. Participants identified four faces, which resulted from the combination of two identities and two expressions. An analysis using the new GRT model showed a complex pattern of dimensional interactions. The perception of emotional expression was not affected by changes in identity, but the perception of identity was affected by changes in emotional expression. There were violations of decisional separability of expression from identity and of identity from expression, with the former being more consistent across participants than the latter. One explanation for the disparate results in the literature is that decisional strategies may have varied across studies and influenced the results of tests of perceptual interactions, as previous studies lacked the ability to dissociate between perceptual and decisional interactions.
References
Akaike, H. (1974). A New Look at the Statistical Model Identification. IEEE Transactions on Automatic Control, 19, 716–723.
Ashby, F. G., & Lee, W. W. (1991). Predicting similarity and categorization from identification. Journal of Experimental Psychology: General, 120(2), 150.
Ashby, F. G., & Maddox, W. T. (1994). A response time theory of separability and integrality in speeded classification. Journal of Mathematical Psychology, 38(4), 423–466.
Ashby, F. G., & Soto, F. A. (2014). Multidimensional signal detection theory. In J. R. Busemeyer, J. T. Townsend, Z. Wang, & A. Eidels (Eds.), Oxford handbook of computational and mathematical psychology. New York: Oxford University Press (in press).
Ashby, F. G., & Townsend, J. T. (1986). Varieties of perceptual independence. Psychological Review, 93(2), 154–179.
Ashby, F. G., Waldron, E. M., Lee, W. W., & Berkman, A. (2001). Suboptimality in human categorization and identification. Journal of Experimental Psychology: General, 130(1), 77.
Baudouin, J. Y., Martin, F., Tiberghien, G., Verlut, I., & Franck, N. (2002). Selective attention to facial emotion and identity in schizophrenia. Neuropsychologia, 40(5), 503–511.
Billingsley, P. (2012). Probability and Measure. Hoboken, New Jersey: John Wiley & Sons.
Blais, C., Arguin, M., & Marleau, I. (2009). Orientation invariance in visual shape perception. Journal of Vision, 9(2), 1–23.
Borg, I., & Groenen, P. (2005). Modern Multidimensional Scaling: Theory and Applications. New York: Springer.
Brainard, D. H. (1997). The psychophysics toolbox. Spatial Vision, 10(4), 433–436.
Bruce, V., & Young, A. (1986). Understanding face recognition. British Journal of Psychology, 77(3), 305–327.
Burnham, K. P., & Anderson, D. R. (2004). Multimodel inference: Understanding AIC and BIC in model selection. Sociological Methods and Research, 33(2), 261–304.
Carroll, J. D., & Chang, J. J. (1970). Analysis of individual differences in multidimensional scaling via an N-way generalization of “Eckart-Young” decomposition. Psychometrika, 35(3), 283–319.
Cornes, K., Donnelly, N., Godwin, H., & Wenger, M. J. (2011). Perceptual and decisional factors influencing the discrimination of inversion in the Thatcher illusion. Journal of Experimental Psychology: Human Perception and Performance, 37(3), 645.
D’Errico, J. (2006). Adaptive robust numerical differentiation. MATLAB Central File Exchange. Retrieved April 19, 2014, from http://www.mathworks.com/matlabcentral/fileexchange/file_infos/13490-adaptive-robust-numerical-differentiation
Dailey, M., Cottrell, G. W., & Reilly, J. (2001). California facial expressions, CAFE. Unpublished digital images, University of California, San Diego, Computer Science and Engineering Department.
de Beeck, H. P. O., Haushofer, J., & Kanwisher, N. G. (2008). Interpreting fMRI data: maps, modules and dimensions. Nature Reviews Neuroscience, 9(2), 123–135.
Ekman, P., Friesen, W. V., & Hager, J. (1978). The Facial Action Coding System (FACS): A technique for the measurement of facial action. Palo Alto: Consulting Psychologists.
Ellamil, M., Susskind, J. M., & Anderson, A. K. (2008). Examinations of identity invariance in facial expression adaptation. Cognitive, Affective, and Behavioral Neuroscience, 8(3), 273.
Ennis, D. M., & Ashby, F. G. (2003). Fitting the decision bound models to identification categorization data. Santa Barbara: University of California.
Etcoff, N. L. (1984). Selective attention to facial identity and facial emotion. Neuropsychologia, 22(3), 281–295.
Fitousi, D., & Wenger, M. J. (2013). Variants of independence in the perception of facial identity and expression. Journal of Experimental Psychology: Human Perception and Performance, 39(1), 133–155.
Fox, C. J., & Barton, J. J. S. (2007). What is adapted in face adaptation? The neural representations of expression in the human visual system. Brain Research, 1127, 80–89.
Fox, C. J., Oruç, I., & Barton, J. J. S. (2008). It doesn’t matter how you feel. The facial identity aftereffect is invariant to changes in facial expression. Journal of Vision, 8(3), 11.
Ganel, T., & Goshen-Gottstein, Y. (2004). Effects of familiarity on the perceptual integrality of the identity and expression of faces: The parallel-route hypothesis revisited. Journal of Experimental Psychology: Human Perception and Performance, 30(3), 583–596.
Ganel, T., Valyear, K. F., Goshen-Gottstein, Y., & Goodale, M. A. (2005). The involvement of the “fusiform face area” in processing facial expression. Neuropsychologia, 43(11), 1645–1654.
Garner, W. R. (1974). The processing of information and structure. New York: Erlbaum.
Hartigan, J. A., & Hartigan, P. M. (1985). The dip test of unimodality. The Annals of Statistics, 70–84.
Haxby, J. V., Hoffman, E. A., & Gobbini, M. I. (2000). The distributed human neural system for face perception. Trends in Cognitive Sciences, 4(6), 223–232.
Kadlec, H., & Townsend, J. T. (1992a). Signal detection analysis of multidimensional interactions. In F. G. Ashby (Ed.), Multidimensional Models of Perception and Cognition (pp. 181–231). Hillsdale, NJ: Erlbaum.
Kadlec, H., & Townsend, J. T. (1992b). Implications of marginal and conditional detection parameters for the separabilities and independence of perceptual dimensions. Journal of Mathematical Psychology, 36(3), 325–374.
Kanwisher, N. (2000). Domain specificity in face perception. Nature Neuroscience, 3, 759–763.
Lee, M. D., & Wetzels, R. (2010). Individual differences in attention during category learning. In R. Catrambone & S. Ohlsson (Eds.), Proceedings of the 32nd Annual Conference of the Cognitive Science Society (pp. 387–392). Austin, TX: Cognitive Science Society.
Lehky, S. R. (2000). Fine discrimination of faces can be performed rapidly. Journal of Cognitive Neuroscience, 12(5), 848–855.
Mack, M. L., Richler, J. J., Gauthier, I., & Palmeri, T. J. (2011). Indecision on decisional separability. Psychonomic Bulletin & Review, 18(1), 1–9.
Maddox, W. T., & Ashby, F. G. (1996). Perceptual separability, decisional separability, and the identification-speeded classification relationship. Journal of Experimental Psychology: Human Perception & Performance, 22, 795–817.
Maddox, W. T., Ashby, F. G., & Waldron, E. M. (2002). Multiple attention systems in perceptual categorization. Memory and Cognition, 30, 325–339.
Mestry, N., Wenger, M. J., & Donnelly, N. (2012). Identifying sources of configurality in three face processing tasks. Frontiers in Perception Science, 3, 456.
Navarro, D. J., Griffiths, T. L., Steyvers, M., & Lee, M. D. (2006). Modeling individual differences using Dirichlet processes. Journal of Mathematical Psychology, 50(2), 101–122.
Pell, P. J., & Richards, A. (2013). Overlapping facial expression representations are identity-dependent. Vision Research, 79(7), 1–7.
Preacher, K. J., & Merkle, E. C. (2012). The problem of model selection uncertainty in structural equation modeling. Psychological Methods, 17(1), 1.
Richler, J. J., Gauthier, I., Wenger, M. J., & Palmeri, T. J. (2008). Holistic Processing of Faces: Perceptual & Decisional Components. Journal of Experimental Psychology: Learning, Memory, and Cognition, 34(2), 328–342.
Schweinberger, S. R., Burton, A. M., & Kelly, S. W. (1999). Asymmetric dependencies in perceiving identity and emotion: Experiments with morphed faces. Perception & Psychophysics, 61(6), 1102–1115.
Schweinberger, S. R., & Soukup, G. R. (1998). Asymmetric relationships among perceptions of facial identity, emotion, and facial speech. Journal of Experimental Psychology: Human Perception and Performance, 24(6), 1748–1765.
Silbert, N. H. (2012). Syllable structure and integration of voicing and manner of articulation information in labial consonant identification. The Journal of the Acoustical Society of America, 131(5), 4076–4086.
Silbert, N. H., & Thomas, R. (2013). Decisional separability, model identification, and statistical inference in the general recognition theory framework. Psychonomic Bulletin & Review, 20(1), 1–20.
Soto, F. A., & Wasserman, E. A. (2011). Asymmetrical interactions in the perception of face identity and emotional expression are not unique to the primate visual system. Journal of Vision, 11(3).
Stankiewicz, B. J. (2002). Empirical evidence for independent dimensions in the visual representation of three-dimensional shape. Journal of Experimental Psychology: Human Perception and Performance, 28(4), 913–932.
Thomas, R. (2001). Perceptual interactions of facial dimensions in speeded classification and identification. Attention, Perception, & Psychophysics, 63(4), 625–650.
Thomas, R. D., & Silbert, N. H. (2014). Technical clarification to Silbert and Thomas (2013): “Decisional separability, model identification, and statistical inference in the general recognition theory framework”. Psychonomic Bulletin & Review, 21(2), 574–575.
Ungerleider, L. G., & Haxby, J. V. (1994). “What” and “where” in the human brain. Current Opinion in Neurobiology, 4(2), 157–165.
Vogels, R., Biederman, I., Bar, M., & Lorincz, A. (2001). Inferior temporal neurons show greater sensitivity to nonaccidental than to metric shape differences. Journal of Cognitive Neuroscience, 13(4), 444–453.
Wald, A. (1943). Tests of statistical hypotheses concerning several parameters when the number of observations is large. Transactions of the American Mathematical Society, 54(3), 426–482.
Yankouskaya, A., Booth, D. A., & Humphreys, G. (2012). Interactions between facial emotion and identity in face processing: Evidence based on redundancy gains. Attention, Perception and Psychophysics, 74(8), 1692–1711.
Author Note
Preparation of this article was supported in part by AFOSR grant FA9550-12-1-0355, NIH (NINDS) Grant No. P01NS044393, and by Grant No. W911NF-07-1-0072 from the U.S. Army Research Office through the Institute for Collaborative Biotechnologies. The US government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright annotation thereon. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of the US Government.
Electronic supplementary material
Below is the link to the electronic supplementary material.
Table S1
(PDF 35 kb)
Appendix
Here we prove that the problem of non-identifiability of decisional separability described by Silbert and Thomas (2013) occurs in GRT-wIND only in the special case in which the decision bounds of all participants for each dimension are parallel to each other. We also describe procedures to: (1) estimate the parameters of a GRT-wIND model from identification data using maximum likelihood estimation, (2) run statistical tests for perceptual independence, perceptual separability and decisional separability, and (3) estimate parameters and test different types of independence as in previous applications of GRT.
Identifiability of decisional separability in the 2 × 2 GRT-wIND model
Silbert and Thomas (2013) showed analytically that a failure of decisional separability is non-identifiable in the Gaussian GRT model for a 2 × 2 identification experiment. That is, if the data from an experiment can be fit by a GRT model in which decisional separability fails, then it is always possible to find a different GRT model in which decisional separability holds and that predicts the exact same data pattern. We call this result the Silbert-Thomas non-identifiability, or STn for short.
We start by summarizing the proof offered by Silbert and Thomas (2013). Their theorem states that “Any perceptually separable but decisionally nonseparable configuration can be transformed to a configuration that is perceptually nonseparable, decisionally separable, and equivalent with respect to predicted response probabilities” (p. 17). Thus, they focus on the case in which the original configuration exhibits perceptual separability but violations of decisional separability. However, the more general result is that decisional separability is nonidentifiable in this model. As the authors indicate, “Any arbitrary (and, in general, not perceptually separable) linear bound model without decisional separability can be rotated and sheared to produce a model with decisional separability […], failure of decisional separability is never identifiable in this model” (pp. 4–5).
The proof for this theorem starts with a configuration without decisional separability and that has been translated so that the origin of the xy-plane coincides with the intersection of the two decision bounds h A and h B. The angle between h B and the x-axis is represented by ϕ and the angle between the bounds h A and h B is represented by ω. Decisional separability holds when ϕ = 0 and ω = π/2. Rotation of the original configuration by ϕ degrees brings h B to be parallel to the x-axis (and orthogonal to the y-axis), achieving decisional separability of component B from A. The horizontal shear transformation has the property of changing the angle between all lines in the plane except those parallel to the x-axis. Thus, for any value of ω, a horizontal shear transformation can be found that brings this angle to π/2 while keeping h B parallel to the x-axis, thus achieving decisional separability of component A from B while also keeping decisional separability of component B from A.
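With ϕ and ω measured counterclockwise, the rotation and shear transformations can be represented by the transformation matrices L 1 and L 2, respectively, which combine to produce:
$$ \mathbf{L}={\mathbf{L}}_2{\mathbf{L}}_1=\begin{bmatrix} 1 & -\cot \omega \\ 0 & 1 \end{bmatrix}\begin{bmatrix} \cos \phi & \sin \phi \\ -\sin \phi & \cos \phi \end{bmatrix} $$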
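This is an area-preserving affine transformation. The change-of-variables theorem for densities guarantees that probabilities will be preserved under such a transformation (Billingsley 2012). This means that the predicted probabilities of correct responses in the original configuration and the decisionally-separable configuration are the same, as the values of the integrals involved do not change. The means and covariance matrices in the decisionally-separable configuration can be computed from the original means and covariance matrices by using the formulas:
$$ {\boldsymbol{\mu}}^{*}=\mathbf{L}\boldsymbol{\mu}, \qquad {\boldsymbol{\Sigma}}^{*}=\mathbf{L}\boldsymbol{\Sigma}{\mathbf{L}}^{T} $$
where \( \boldsymbol{\mu} \) and \( \boldsymbol{\Sigma} \) are the mean vector and covariance matrix of a perceptual distribution in the original (translated) configuration.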
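The rotation-plus-shear argument can be checked numerically. The following Python sketch is an illustration only, not the authors' code: the function names, the angles, and the example distribution are hypothetical, and it assumes that ϕ and ω are expressed in radians and measured counterclockwise. It builds the combined transformation, confirms that its determinant is 1, and applies it to the bound directions and to one mean vector and covariance matrix.

```python
import math

def matmul(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def matvec(a, v):
    """Product of a 2x2 matrix and a length-2 vector."""
    return [a[i][0] * v[0] + a[i][1] * v[1] for i in range(2)]

def transpose(a):
    return [[a[j][i] for j in range(2)] for i in range(2)]

def det(a):
    return a[0][0] * a[1][1] - a[0][1] * a[1][0]

def separating_transform(phi, omega):
    """Clockwise rotation by phi followed by a horizontal shear of -cot(omega).

    Assumption: phi (x-axis to h_B) and omega (h_B to h_A) are in radians
    and measured counterclockwise.
    """
    rotation = [[math.cos(phi), math.sin(phi)],
                [-math.sin(phi), math.cos(phi)]]
    shear = [[1.0, -1.0 / math.tan(omega)],
             [0.0, 1.0]]
    return matmul(shear, rotation)

def transform_distribution(L, mean, cov):
    """Return (L mean, L cov L^T) for one perceptual distribution."""
    return matvec(L, mean), matmul(matmul(L, cov), transpose(L))

# Hypothetical angles and perceptual distribution, for illustration only.
phi, omega = math.radians(20.0), math.radians(70.0)
L = separating_transform(phi, omega)

# Direction vectors of the two bounds before the transformation.
h_B_after = matvec(L, [math.cos(phi), math.sin(phi)])
h_A_after = matvec(L, [math.cos(phi + omega), math.sin(phi + omega)])

cov = [[1.0, 0.3], [0.3, 1.0]]
new_mean, new_cov = transform_distribution(L, [1.0, 0.5], cov)

print("det(L) =", det(L))
print("h_B direction after L:", h_B_after)  # y-component should vanish
print("h_A direction after L:", h_A_after)  # x-component should vanish
```

Because the determinant is 1, the determinant of each covariance matrix is also unchanged, which is one quick sanity check on the transformed parameters.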
In the remainder of this section, we show the conditions under which decisional separability is non-identifiable in GRT models with more than one bound per dimension. GRT-wIND and n × m GRT models with n > 2 and m > 2 are special cases of this general class. We start by identifying the conditions under which STn holds for a model with two bounds per dimension. It is then straightforward to see that the same conditions apply for any larger number of bounds per dimension.
Theorem
In a Gaussian GRT model with two dimensions and two linear bounds per dimension, where the ith bound for dimension A is represented as h Ai and the jth bound for dimension B as h Bj , the non-identifiability of decisional separability identified by Silbert and Thomas (2013) is true if and only if h A1 ║ h A2 and h B1 ║ h B2.
Proof
We first prove that if h A1 ║ h A2 and h B1 ║ h B2, then STn holds. As with the proof of STn, we start with a configuration without decisional separability that has been translated so that the origin of the xy-plane coincides with the intersection of h A1 and h B1. We represent the angle between h Bj and the x-axis as ϕ j and the angle between h Ai and h Bj as ω ij. Because h B1 and h B2 are parallel to each other, but not parallel to the x-axis, they intersect the latter at congruent angles; that is, ϕ 1 = ϕ 2. Thus, rotation of the original configuration by ϕ 1 degrees brings both h B1 and h B2 to be parallel to the x-axis and orthogonal to the y-axis, achieving decisional separability of component B from A. After rotation, it is still true that h A1 ║ h A2 and h B1 ║ h B2, because rotation preserves parallelism. This means that ω ij = ω for all i and j. Thus, a single shear transformation can bring this angle to π/2, achieving decisional separability of component A from B while also keeping decisional separability of component B from A.
To complete the proof, we must show that if STn holds, then h A1 ║ h A2 and h B1 ║ h B2. For STn to hold, a decisionally-separable configuration must exist that can be found by applying an affine transformation L to an original configuration without decisional separability. By definition, in this decisionally-separable configuration h A1 ⊥ x, h A2 ⊥ x, h B1 ⊥ y and h B2 ⊥ y. Because two lines that are both perpendicular to a third line are parallel to each other, with all lines in the same plane, h A1 ║ h A2 and h B1 ║ h B2 in the decisionally separable configuration. To go from the decisionally separable configuration to the original configuration, we must apply the transformation L −1. This inverse transformation exists because both shear and rotation are invertible transformations. The inverse of an affine transformation is itself an affine transformation that conserves parallelism, so application of L −1 to the decisionally-separable configuration conserves the property that h A1 ║ h A2 and h B1 ║ h B2. Thus, if STn holds, then bounds must be parallel in the decisionally separable configuration as well as in the original configuration.
This completes the proof for the case in which there are two linear bounds per dimension. A corollary is that for models with more than two bounds per dimension, STn holds if and only if each bound in one dimension is parallel to each of the other bounds in that specific dimension.
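The key step of the proof, that an invertible linear map cannot turn non-parallel bounds into parallel ones, can be illustrated with a short Python sketch. The matrix and angles below are arbitrary, hypothetical values. The check relies on the identity cross(Lu, Lv) = det(L) · cross(u, v) for 2-D vectors, so the cross product of two bound directions is zero after the transformation only if it was zero before.

```python
import math

def cross(u, v):
    """2-D cross product; zero exactly when u and v are parallel."""
    return u[0] * v[1] - u[1] * v[0]

def apply(L, v):
    """Apply the linear part of an affine transformation to a direction."""
    return [L[0][0] * v[0] + L[0][1] * v[1],
            L[1][0] * v[0] + L[1][1] * v[1]]

def direction(angle_deg):
    a = math.radians(angle_deg)
    return [math.cos(a), math.sin(a)]

# Arbitrary invertible matrix (hypothetical values, not fitted to any data).
L = [[1.2, -0.7],
     [0.4, 0.9]]
det_L = L[0][0] * L[1][1] - L[0][1] * L[1][0]

# Two bounds for the same dimension: one parallel pair, one non-parallel pair.
parallel_pair = (direction(35.0), direction(35.0))
skewed_pair = (direction(35.0), direction(50.0))

cross_parallel_after = cross(apply(L, parallel_pair[0]), apply(L, parallel_pair[1]))
cross_skewed_before = cross(skewed_pair[0], skewed_pair[1])
cross_skewed_after = cross(apply(L, skewed_pair[0]), apply(L, skewed_pair[1]))

print("parallel pair after L:", cross_parallel_after)
print("skewed pair before and after L:", cross_skewed_before, cross_skewed_after)
```

Since both bounds of a dimension would need the same orientation in a decisionally separable configuration, the non-zero cross product of the skewed pair shows why no single transformation can achieve that for them.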
Here we have exclusively dealt with part (i) of the theorem proposed by Silbert and Thomas (2013). Part (ii) of this theorem proposes that a configuration with mean shift integrality and decisional separability is unidentifiable from a configuration with perceptual separability and without decisional separability. This theorem also deals with the non-identifiability of decisional separability, so as before it only holds for models with more than one bound per dimension if those bounds are parallel. Furthermore, an additional condition for this theorem to hold is that all covariance matrices in the model must be identical (Thomas and Silbert 2014). This in general is not the case in GRT-wIND or in traditional GRT models for designs larger than 2 × 2, which allow for estimation of different variances and covariances for each perceptual distribution.
In conclusion, STn is not generally true in GRT-wIND or any other model with more than one bound per dimension. The non-identifiability of decisional separability arises in such models only under very specific circumstances.
Maximum likelihood estimation for GRT-wIND
The data from each participant in an identification experiment are summarized in a confusion matrix, with rows corresponding to each stimulus in the experiment, columns corresponding to each response, and response frequencies reported in each cell of the matrix. Let S 1 , S 2 , …, S n denote the n stimuli in an identification experiment and let R 1 , R 2 , …, R n denote the n responses. Let r ij denote the frequency with which the participant responded R j on trials when stimulus S i was presented. Finally, there are N participants in the experiment, indexed by k = 1, 2,…,N. Given a set of parameter values for the model, the likelihood of this confusion data is computed in two steps.
In the first step, the predicted confusion matrix of each participant is computed using standard methods. For example, the predicted probability that a participant responds R j on trials when stimulus S i was presented, denoted by P(R j |S i ), is computed by integrating the volume of the S i perceptual distribution in response region R j . A numerical approximation to this multiple integral can be computed efficiently using Cholesky factorization (Ennis and Ashby 2003; for a tutorial overview, see Ashby and Soto 2014).
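To make the first step concrete, the sketch below approximates one row of a predicted confusion matrix for a 2 × 2 design. It is a rough illustration under stated assumptions, not the numerical integration method of Ennis and Ashby (2003): it swaps in plain Monte Carlo sampling, using a Cholesky factor only to draw correlated Gaussian samples. The bound encoding (coefficients of a linear discriminant whose positive side maps to level 2), the function names, and all parameter values are hypothetical.

```python
import math
import random

def cholesky_2x2(cov):
    """Lower-triangular Cholesky factor of a 2x2 covariance matrix."""
    a = math.sqrt(cov[0][0])
    b = cov[1][0] / a
    c = math.sqrt(cov[1][1] - b * b)
    return [[a, 0.0], [b, c]]

def response_index(x, y, bound_a, bound_b):
    """Map a percept to a response: 0=A1B1, 1=A1B2, 2=A2B1, 3=A2B2.

    Each bound is (bx, by, c); a positive value of bx*x + by*y + c is
    read as level 2 of that dimension (an assumed convention).
    """
    level_a = 1 if bound_a[0] * x + bound_a[1] * y + bound_a[2] > 0 else 0
    level_b = 1 if bound_b[0] * x + bound_b[1] * y + bound_b[2] > 0 else 0
    return 2 * level_a + level_b

def predicted_row(mean, cov, bound_a, bound_b, n_samples=20000, seed=0):
    """Monte Carlo estimate of P(R_j | S_i) for one stimulus S_i."""
    rng = random.Random(seed)
    chol = cholesky_2x2(cov)
    counts = [0, 0, 0, 0]
    for _ in range(n_samples):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        x = mean[0] + chol[0][0] * z1
        y = mean[1] + chol[1][0] * z1 + chol[1][1] * z2
        counts[response_index(x, y, bound_a, bound_b)] += 1
    return [c / n_samples for c in counts]

# Hypothetical stimulus A1B1 with correlated noise and one tilted bound.
row = predicted_row(mean=[0.0, 0.0],
                    cov=[[1.0, 0.4], [0.4, 1.0]],
                    bound_a=(1.0, 0.2, -0.5),   # slightly tilted bound on A
                    bound_b=(0.0, 1.0, -0.5))   # bound on B orthogonal to y
print(row)
```

Repeating the call for each stimulus, with that stimulus's mean and covariance matrix, would fill in the full predicted confusion matrix of one participant.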
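The second step is to compute the log of the likelihood function for participant k which, up to an additive constant that does not depend on the parameters, is:
$$ \log {L}_k={\displaystyle \sum_{i=1}^n{\displaystyle \sum_{j=1}^n{r}_{ijk} \log {P}_k\left({R}_j\Big|{S}_i\right)}} $$
where \( {r}_{ijk} \) is the frequency r ij observed for participant k and \( {P}_k\left({R}_j\Big|{S}_i\right) \) is the corresponding probability predicted for that participant.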
These log-likelihoods are then summed across all participants:
$$ \log L={\displaystyle \sum_{k=1}^N \log {L}_k} \qquad \left(\mathrm{A}5\right) $$
The maximum likelihood estimates of the parameters in a GRT-wIND model are those that maximize the expression in Equation A5.
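The second step can be sketched in a few lines of Python. The function names and the toy matrices are hypothetical, the multinomial constant is dropped because it does not depend on the parameters, and the small probability floor is an added assumption to avoid taking the log of zero.

```python
import math

def participant_log_likelihood(observed, predicted, floor=1e-10):
    """Sum of r_ij * log P(R_j | S_i) over all cells of one confusion matrix.

    `floor` guards against log(0); it is an assumption of this sketch.
    """
    total = 0.0
    for observed_row, predicted_row in zip(observed, predicted):
        for r_ij, p_ij in zip(observed_row, predicted_row):
            if r_ij > 0:
                total += r_ij * math.log(max(p_ij, floor))
    return total

def group_log_likelihood(observed_all, predicted_all):
    """Sum of the participant log-likelihoods across all participants."""
    return sum(participant_log_likelihood(obs, pred)
               for obs, pred in zip(observed_all, predicted_all))

# Toy data: two participants, two stimuli, two responses (hypothetical).
observed_all = [[[8, 2], [3, 7]],
                [[9, 1], [4, 6]]]
predicted_all = [[[0.8, 0.2], [0.3, 0.7]],
                 [[0.85, 0.15], [0.35, 0.65]]]

total_log_likelihood = group_log_likelihood(observed_all, predicted_all)
print(total_log_likelihood)
```

An optimizer would call a function like `group_log_likelihood` repeatedly, recomputing the predicted matrices from the current parameter values on each call.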
Statistical tests of independence with GRT-wIND
The large number of parameters in a GRT-wIND model makes the computational cost of using likelihood ratio tests and model selection procedures prohibitive. Thus, we recommend a deviation from the custom of computing such tests in GRT analyses. The strategy used here consists of fitting the full GRT-wIND model and testing maximum-likelihood parameter estimates against expected values from null hypotheses using a Wald test (Wald 1943).
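Let \( \underset{\bar{\mkern6mu}}{\widehat{\theta}} \) be a column vector containing the maximum likelihood parameter estimates. The Wald test can be used to test any null hypothesis that can be expressed in the form of linear restrictions on the parameters \( \underset{\bar{\mkern6mu}}{\theta} \) estimated by \( \underset{\bar{\mkern6mu}}{\widehat{\theta}} \):
$$ \mathbf{R}\underset{\bar{\mkern6mu}}{\theta}=\underset{\bar{\mkern6mu}}{q} $$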
where R is a matrix with number of columns equal to the number of parameters and number of rows equal to the number of restrictions being tested, and \( \underset{\bar{\mkern6mu}}{q} \) is a column vector with number of rows equal to the number of restrictions being tested. For example, if we wanted to test the hypothesis that \( {\widehat{\theta}}_1=0 \), then R would have a single row (we are testing a single restriction) with a +1 in the first cell of that row and zeros in all other cells, while \( \underset{\bar{\mkern6mu}}{q} \) would have a single cell with a zero in it. If we want to additionally test the hypothesis that \( {\widehat{\theta}}_2-{\widehat{\theta}}_3=10 \), then we would add a second row to R with a +1 in the second column (corresponding to \( +{\widehat{\theta}}_2 \)) and –1 in the third column (corresponding to \( -{\widehat{\theta}}_3 \)), while \( \underset{\bar{\mkern6mu}}{q} \) would now have a second cell with the value 10 in it.
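The example in the previous paragraph translates directly into code. In this minimal Python sketch the parameter estimates are made-up numbers for a model with four parameters, and the helper name is hypothetical. It builds R and q for the two restrictions and computes how far the estimates are from satisfying them.

```python
def restriction_residual(R, theta_hat, q):
    """Return R * theta_hat - q, one entry per restriction."""
    return [sum(r * t for r, t in zip(row, theta_hat)) - q_i
            for row, q_i in zip(R, q)]

# Hypothetical estimates for a model with four parameters.
theta_hat = [0.2, 12.5, 3.0, 0.7]

# Row 1: theta_1 = 0.  Row 2: theta_2 - theta_3 = 10.
R = [[1, 0, 0, 0],
     [0, 1, -1, 0]]
q = [0, 10]

residual = restriction_residual(R, theta_hat, q)
print(residual)
```

A residual of zero in every entry would mean the estimates satisfy the restrictions exactly; the Wald statistic weighs these residuals by their estimated sampling variability.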
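Null hypotheses are tested using the Wald statistic:
$$ W={\left(\mathbf{R}\underset{\bar{\mkern6mu}}{\widehat{\theta}}-\underset{\bar{\mkern6mu}}{q}\right)}^T{\left(\mathbf{R}\widehat{\boldsymbol{\Sigma}}{\mathbf{R}}^T\right)}^{-1}\left(\mathbf{R}\underset{\bar{\mkern6mu}}{\widehat{\theta}}-\underset{\bar{\mkern6mu}}{q}\right) $$
where []T represents matrix transpose and \( \widehat{\boldsymbol{\Sigma}} \) is the estimated covariance matrix of the maximum likelihood estimates. Under the null hypothesis, the statistic W asymptotically has a chi-squared distribution with degrees of freedom equal to the number of restrictions being tested (the length of \( \underset{\bar{\mkern6mu}}{q} \)). Computing W requires \( \widehat{\boldsymbol{\Sigma}} \), which can be estimated using the Hessian \( \mathbf{H} \) of the log-likelihood function at the solution:
$$ \widehat{\boldsymbol{\Sigma}}={\left[-\mathbf{H}\left(\underset{\bar{\mkern6mu}}{\widehat{\theta}}\right)\right]}^{-1} $$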
Usually the Hessian in Eq. A4 can be obtained from the same optimization software that is used to obtain the parameter estimates that maximize the log-likelihood, but better estimates are obtained from numerical differentiation software. In this study, we used the DERIVEST suite (D’Errico 2006) to obtain estimates of the Hessian.
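The whole Wald procedure can be illustrated on a toy problem. The Python sketch below is not the DERIVEST-based procedure used in the study: it substitutes a simple central-difference Hessian, and the model (two independent binomial proportions), the counts, and the function names are all hypothetical. It estimates the covariance matrix as the inverse of the negative Hessian and tests the single restriction θ1 − θ2 = 0.

```python
import math

N1, K1, N2, K2 = 100, 60, 100, 45  # hypothetical counts for a toy model

def log_likelihood(theta):
    """Toy log-likelihood: two independent binomial proportions."""
    p1, p2 = theta
    return (K1 * math.log(p1) + (N1 - K1) * math.log(1.0 - p1)
            + K2 * math.log(p2) + (N2 - K2) * math.log(1.0 - p2))

def numerical_hessian(f, x, step=1e-4):
    """Central-difference Hessian (a simple stand-in for DERIVEST)."""
    n = len(x)
    def shifted(i, j, di, dj):
        y = list(x)
        y[i] += di * step
        y[j] += dj * step
        return f(y)
    return [[(shifted(i, j, 1, 1) - shifted(i, j, 1, -1)
              - shifted(i, j, -1, 1) + shifted(i, j, -1, -1)) / (4.0 * step ** 2)
             for j in range(n)] for i in range(n)]

def inverse(m):
    """Gauss-Jordan inverse with partial pivoting for a small square matrix."""
    n = len(m)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def wald_statistic(R, theta_hat, q, cov):
    """W = (R theta - q)^T (R cov R^T)^-1 (R theta - q)."""
    n = len(theta_hat)
    d = [sum(row[k] * theta_hat[k] for k in range(n)) - q_i
         for row, q_i in zip(R, q)]
    r_cov = [[sum(row[k] * cov[k][j] for k in range(n)) for j in range(n)]
             for row in R]
    middle = [[sum(rc[k] * row[k] for k in range(n)) for row in R]
              for rc in r_cov]
    middle_inv = inverse(middle)
    m = len(d)
    return sum(d[i] * middle_inv[i][j] * d[j] for i in range(m) for j in range(m))

theta_hat = [K1 / N1, K2 / N2]            # closed-form MLEs for this toy model
hessian = numerical_hessian(log_likelihood, theta_hat)
cov = inverse([[-h for h in row] for row in hessian])
W = wald_statistic([[1.0, -1.0]], theta_hat, [0.0], cov)
p_value = math.erfc(math.sqrt(W / 2.0))   # chi-squared upper tail, 1 df only
print(W, p_value)
```

The closed-form p-value used here holds only for one degree of freedom; tests with several restrictions need a general chi-squared tail function.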
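For the 2 × 2 identification design used here, let \( {\mu}_{x\Big|{A}_i{B}_j} \) and \( {\sigma}_{x\Big|{A}_i{B}_j}^2 \) denote the mean and variance along the x-axis (dimension A) of the perceptual distribution for stimulus A i B j, with analogous notation for the y-axis (dimension B), and let \( {\rho}_{A_i{B}_j} \) denote the correlation in that distribution. The restrictions imposed on the model by perceptual separability of dimension A from dimension B are that the marginal means and variances along dimension A do not change with the level of B:
$$ {\mu}_{x\Big|{A}_i{B}_1}={\mu}_{x\Big|{A}_i{B}_2}, \qquad {\sigma}_{x\Big|{A}_i{B}_1}^2={\sigma}_{x\Big|{A}_i{B}_2}^2, \qquad i=1,2 $$
The restrictions imposed on the model by perceptual separability of dimension B from dimension A are the following:
$$ {\mu}_{y\Big|{A}_1{B}_j}={\mu}_{y\Big|{A}_2{B}_j}, \qquad {\sigma}_{y\Big|{A}_1{B}_j}^2={\sigma}_{y\Big|{A}_2{B}_j}^2, \qquad j=1,2 $$
The restrictions imposed on the model by perceptual independence in each of the perceptual distributions are the following:
$$ {\rho}_{A_i{B}_j}=0, \qquad i,j=1,2 $$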
The Wald test allows tests of decisional separability for the whole group or for each participant individually. Here, we focus on the latter kind of test. Testing whether decisional separability of dimension A from dimension B holds in participant k involves a single restriction: the bound that participant k uses for dimension A must be orthogonal to the x-axis.
Testing whether decisional separability of dimension B from dimension A holds in participant k involves the corresponding restriction on the other bound: the bound that participant k uses for dimension B must be orthogonal to the y-axis.
Model fit and selection in the traditional GRT approach
To fit any GRT model to data (e.g., the models in the hierarchy shown in Fig. 5), the confusion matrix from a single participant is used to find the values of the free parameters that maximize Eq. A5.
A popular method to test assumptions about independence and separability is to fit a restricted and an unrestricted version of the model to data. The restricted model contains a number of parameters that are set to values reflecting the assumption under test. For example, testing perceptual independence would require setting all ρ parameters to zero. The same parameters would be free to vary in the unrestricted model. Once both models are fit to data, the likelihood of the data at the solutions (L U and L R for the unrestricted and restricted versions, respectively) can be used to run a likelihood ratio test, by computing the following statistic:
$$ -2\left( \log {L}_R- \log {L}_U\right) $$
which follows a Chi-squared distribution with degrees of freedom equal to the difference in number of free parameters between the two models.
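A likelihood ratio test of this kind can be sketched in standard-library Python. The log-likelihoods and parameter counts below are invented for illustration, and the function names are hypothetical. The chi-squared tail probability is computed with the closed-form recurrence for integer degrees of freedom, so no external statistics package is assumed.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-squared variable with integer df."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        a, total = 1.0, math.exp(-half)               # tail for df = 2
    else:
        a, total = 0.5, math.erfc(math.sqrt(half))    # tail for df = 1
    # Each pass adds the term that raises the degrees of freedom by two.
    while a < df / 2.0 - 1e-9:
        total += math.exp(-half) * half ** a / math.gamma(a + 1.0)
        a += 1.0
    return total

def likelihood_ratio_test(loglik_restricted, loglik_unrestricted,
                          n_params_restricted, n_params_unrestricted):
    """Return the statistic -2(log L_R - log L_U), its df, and the p-value."""
    statistic = -2.0 * (loglik_restricted - loglik_unrestricted)
    df = n_params_unrestricted - n_params_restricted
    return statistic, df, chi2_sf(statistic, df)

# Hypothetical fits: the restricted model fixes four parameters.
statistic, df, p_value = likelihood_ratio_test(-520.4, -515.1, 6, 10)
print(statistic, df, p_value)
```

A small p-value indicates that the restricted model fits significantly worse, which counts as evidence against the assumption that the restrictions encode.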
The likelihood ratio test can only be applied to select between two nested models. To select between two non-nested models, it is possible to use the Akaike information criterion (AIC, Akaike 1974) for model comparison. Here we use a version of AIC corrected for a bias problem present when the number of data points is small compared to the number of free parameters (see Burnham and Anderson 2004):
$$ {\mathrm{AIC}}_C=-2 \log L+2m+\frac{2m\left(m+1\right)}{n^2-m-1} $$
where m is the number of free parameters in the model and \( {n}^2 \) is the number of cells in the confusion matrix. The first two terms in Eq. A7 correspond to the traditional definition of AIC and the last term corresponds to the correction factor. A smaller value of AIC represents a better fit of the model to the data.
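The corrected criterion is simple to compute once a model has been fit. The Python sketch below uses hypothetical values (a 4 × 4 confusion matrix, so 16 cells, and two invented fits); the function name and the guard against a non-positive denominator are additions of this sketch.

```python
def aicc(log_likelihood, n_free_params, n_cells):
    """AIC with the small-sample correction term added.

    n_cells is the number of cells in the confusion matrix (n squared).
    """
    denominator = n_cells - n_free_params - 1
    if denominator <= 0:
        raise ValueError("correction undefined: too many free parameters")
    aic = -2.0 * log_likelihood + 2.0 * n_free_params
    correction = 2.0 * n_free_params * (n_free_params + 1) / denominator
    return aic + correction

# Hypothetical fits of two candidate models to a 4 x 4 confusion matrix.
score_full = aicc(log_likelihood=-515.1, n_free_params=10, n_cells=16)
score_reduced = aicc(log_likelihood=-520.4, n_free_params=6, n_cells=16)
print(score_full, score_reduced)
```

With so few cells the correction term grows quickly as parameters are added, so a model with a slightly lower log-likelihood but fewer parameters can obtain the smaller score.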
In the present study, as in previous model-based applications of GRT (e.g., Ashby and Lee 1991; Ashby et al. 2001; Fitousi and Wenger 2013; Thomas 2001), a hierarchy of models was fit to the data from each participant (see Fig. 5). The procedure starts at the top of the hierarchy and compares nested models through likelihood ratio tests until the test results in a non-significant increase in fit. If more than one candidate model survives this process, the model with the smallest AICC is selected.
Soto, F.A., Vucovich, L., Musgrave, R. et al. General recognition theory with individual differences: a new method for examining perceptual and decisional interactions with an application to face perception. Psychon Bull Rev 22, 88–111 (2015). https://doi.org/10.3758/s13423-014-0661-y
DOI: https://doi.org/10.3758/s13423-014-0661-y