Introduction

Fundamental differences exist between the perception of faces and objects. Faces are perceived more holistically than objects, and the spatial relationships between features are more important in faces than objects (Farah, Wilson, Drain, & Tanaka, 1998; Ge, Wang, McCleery, & Lee, 2006; Tanaka & Sengco, 1997). Similarly, while objects and faces are best recognized when viewed in an upright orientation, inversion produces a substantially greater recognition deficit with faces (Yin, 1969). This selective impairment is likely due to processing differences between faces and objects. Specifically, inversion may impair the ability to accurately encode the holistic or spatial information that is vital to face perception but less relied upon in object perception (Freire, Lee, & Symons, 2000; Kemp, McManus, & Pigott, 1990; Leder & Bruce, 2000; Rossion & Gauthier, 2002; Searcy & Bartlett, 1996).

Holistic processing binds the facial features to their spatial arrangement and the external contour of the face to produce a single integrated face percept (Sergent, 1984). This binding is considered essential to the recognition of both the whole face and its individual facial features (Tanaka & Farah, 1993) and does not appear to apply to inverted faces or objects (Goffaux & Rossion, 2006; Hole, George, & Dunsmore, 1999; reviewed in Maurer, LeGrand, & Mondloch, 2002). For example, viewing facial features in isolation or in the presence of a new configuration decreases feature recognition (Tanaka & Sengco, 1997), suggesting that the encoding of facial features is fundamentally intertwined with their spatial arrangement. This is not true for inverted faces. The recognition of features from an inverted face does not depend on context, implicating a piecemeal approach to processing (i.e. analytical processing).

To date, most research on face perception has sought to understand facial processing by examining how it differs from object perception. Here, we take the opposite approach and ask: when are faces perceived like objects? This question derives from a study by Schwaninger, Ryf, and Hofer (2003) in which participants estimated the distances between facial features. The accuracy of these estimates showed no benefit for upright faces over inverted faces. There was also no benefit for faces compared to lines: when estimation accuracy was compared between facial feature distances and equivalent line lengths, participants displayed a strong face inferiority effect (Suzuki & Cavanagh, 1995), in which estimation error was greater for faces than for lines.

The failure to find a face inversion effect in facial feature distance estimation is surprising given the large body of literature supporting an inversion effect in face perception (e.g., Bartlett & Searcy, 1993; Thompson, 1980; Young, Hellawell, & Hay, 1987) and in face recognition and discrimination (e.g., Freire et al., 2000; Le Grand, Mondloch, Maurer, & Brent, 2001; Searcy & Bartlett, 1996; Tanaka & Farah, 1993; Tanaka & Sengco, 1997; Yin, 1969; Yovel & Kanwisher, 2004). It therefore seems that a fundamental difference exists between the processes used for distance estimation and those used for more general face perception tasks. This discrepancy may be explained by the differences between face and object processing. Specifically, the resilience to inversion of the facial feature distance estimates observed by Schwaninger et al. (2003), together with the benefit for line length estimation, suggests that distance estimation requires analytical processing, a process typically reserved for non-face objects. This would result in sub-optimal performance for the holistically processed upright faces.

It may seem counterintuitive for a disadvantage in distance estimation to occur with faces, especially since face perception is highly sensitive to the spatial relationships between facial features. However, we suggest that distance estimation focuses on absolute distances rather than relative placements and therefore requires a part-based judgment. For example, this kind of facial feature distance estimation may be similar to the estimation of the distance between two points in a bisection task. In a bisection task, participants view a spatial distance that is divided by a central marker into two parts. This central marker is located either on or near the middle of the original spatial distance, and participants judge the equality of the two subdivisions (Levi & Klein, 1992). If the estimation of facial feature distances also uses this method of processing, then this could create an exception to the effect of inversion on the perception of spatial relations in a face.

If distance estimation requires analytical processing, then we can predict a lack of an inversion effect or perhaps even an inversion benefit for tasks requiring analytical processing in faces. Here, we used four experiments to test the ability to discriminate differences in facial feature distances when the faces varied in horizontal or vertical compression. This manipulation primarily altered the horizontal distance between the eyes or the vertical distance between the eyes and the mouth. We converted paired comparison judgments of distance to thresholds for a just-noticeable difference. If such distance comparisons benefitted from holistic processing, then we expected to find a strong inversion effect. Otherwise, there should be no impairment with inversion.

Experiment 1

Experiment 1 assessed individual sensitivity to differences in center-compression and center-expansion, a manipulation that primarily altered the distance between the eyes. These faces are similar to those used by Webster and MacLin (1999) and Rhodes, Jeffery, Watson, Clifford, and Nakayama (2003) and are displayed in Fig. 1. We tested whether the perception of differences in interocular distance is better in upright than in inverted faces. Since contrast-negation impairs the discrimination of differences in facial feature distances (Kemp et al., 1990), we also presented contrast-negated faces. Participants viewed a reference face and used the method of adjustment to make a comparison face just-noticeably more compressed or more expanded.

Fig. 1. Experiment 1 stimuli. (a) The contour of displacement followed by the morphing algorithm. (b) Reference faces produced by the morphing algorithm described in Experiment 1

Method

Participants

Participants included three members of the UCSD Vision Laboratory, and seven University of California, San Diego graduate students. The graduate students participated in exchange for $10 an hour. Vision in all subjects was normal or corrected to normal.

Stimuli

An average Caucasian male face was created from 32 photographs of Caucasian males using the method described in Levin (2000). Using Matlab 7.1, we distorted this average face with a procedure similar to that described in Webster and MacLin (1999), as follows.

We distorted the average face by horizontally expanding or contracting relative to a midpoint between the eyes (x_m, y_m). The amount of pixel displacement was proportional to the horizontal derivative of a circular Gaussian envelope. This caused the displacement to be maximized at one SD away from the midpoint between the eyes. Therefore, when (x_i, y_i) represents a given pixel, the horizontal distance between the pixel and the midpoint is

$$ x_i^{\prime} = x_i - x_m $$

and the vertical distance between the pixel and the midpoint is

$$ y_i^{\prime} = y_i - y_m. $$

In the distorted image, let the shift applied to pixel i be Δx_i. Then,

$$ \text{Formula 1.}\quad \Delta x_i = \alpha \, x_i^{\prime} \exp\left( -\frac{x_i^{\prime 2} + y_i^{\prime 2}}{2\sigma^2} \right) $$

where the amplitude (α) of the distortion varied from –1 to 1 in steps of .04, and σ = 0.18 times the face width. The maximum displacement when α = 1 occurred at x_i – x_m = σ and was 27.3 pixels. This displacement became proportionally smaller for smaller values of α. The algorithm could also create partial pixel displacements through luminance interpolation, allowing fine-grained measurement of a participant’s sensitivity to horizontal displacement.
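As a check on where the displacement peaks, Formula 1 can be differentiated along the horizontal line through the midpoint (vertical distance from the midpoint equal to zero). The short derivation below is a sketch under that assumption. The pixel value of σ is inferred here from the reported 27.3-pixel maximum and is not stated in the original.

$$ \left.\frac{\partial\, \Delta x_i}{\partial x_i^{\prime}}\right|_{y_i^{\prime} = 0} = \alpha \left( 1 - \frac{x_i^{\prime 2}}{\sigma^2} \right) \exp\left( -\frac{x_i^{\prime 2}}{2\sigma^2} \right) = 0 \quad\Rightarrow\quad x_i^{\prime} = \pm\sigma $$

$$ \left| \Delta x_i \right|_{\max} = |\alpha| \, \sigma \, e^{-1/2} \approx 0.607 \, |\alpha| \, \sigma $$

Taken at face value, a maximum of 27.3 pixels at α = 1 would correspond to σ of roughly 45 pixels, and the linear dependence on α is what makes the displacement scale proportionally with the amplitude.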

This differs from the Webster and MacLin (1999) algorithm by including y' as well as x' in determining the displacement. This produced the greatest horizontal displacement between the eyes and less horizontal displacement as the vertical distance from the eyes increased. Therefore, in our algorithm, the horizontal distortion was localized to the region of the eyes, and the external contour of the face remained practically unaltered. Figure 1a displays the contour of the displacements in a maximally compressed face. Using Formula 1, we generated 51 distorted faces that systematically varied in expansion and contraction. Figure 1b displays a sample of the resulting faces. These faces were then inverted and contrast-negated to create two additional face sets.
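The displacement field in Formula 1 can be illustrated with a minimal Python sketch. This is not the authors' Matlab code: the function names are hypothetical, the value of `SIGMA` is an assumed pixel value chosen so that the peak shift lands near the reported 27.3 pixels, and the luminance interpolation needed to resample an actual image is omitted.

```python
import math

def horizontal_shift(x, y, x_m, y_m, alpha, sigma):
    """Horizontal shift (pixels) given by Formula 1 for the pixel at (x, y)."""
    dx = x - x_m  # horizontal distance from the midpoint between the eyes
    dy = y - y_m  # vertical distance from the midpoint between the eyes
    return alpha * dx * math.exp(-(dx * dx + dy * dy) / (2.0 * sigma ** 2))

def distortion_levels(step=0.04):
    """Amplitudes from -1 to 1 inclusive, in equal steps."""
    n = round(2.0 / step) + 1
    return [round(-1.0 + k * step, 10) for k in range(n)]

SIGMA = 45.0  # assumed value in pixels, not reported in the text

levels = distortion_levels()  # one amplitude per distorted face
# Shift one SD to the right of the midpoint, on the eye line, at full amplitude
peak = horizontal_shift(SIGMA, 0.0, 0.0, 0.0, 1.0, SIGMA)
# Same horizontal offset, but two SDs below the eye line: far smaller shift
off_axis = horizontal_shift(SIGMA, 2 * SIGMA, 0.0, 0.0, 1.0, SIGMA)
```

Because the exponent includes the vertical distance as well as the horizontal one, `off_axis` is much smaller than `peak`, which is the sense in which the distortion stays localized to the eye region.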

A reference face was presented in the upper left corner of the display and a comparison face was in the lower right corner. Stimuli were presented in an offset manner to prevent the use of low-level visual cues such as edge matching. Reference stimuli (Fig. 1b) in all three conditions included five different distortions: expanded by 20.7 pixels (high expansion); expanded by 10.9 pixels (moderate expansion); the average; compressed by 10.9 pixels (moderate compression); and compressed by 20.7 pixels (high compression). The initial comparison face was randomly selected to be between four and six distortion steps away from the reference.

The selection of this as a starting point was driven by the desire to improve the efficiency of our methods. By starting the participant near the expected threshold point, we expected to reduce the amount of adjustment needed to determine a just-noticeable difference compared to starting at either of the extremes. The starting distortions were counterbalanced so as to not produce bias across trials. We did not want to start with a comparison face that matched the reference face, since participants could then produce small thresholds through non-vision-based strategies (e.g., making a single increment adjustment).

Stimuli were viewed on a 51-cm Iiyama HM204DT A CRT monitor with a gray background using Matlab 7.1 and the Psychophysics Toolbox (Brainard, 1997; Pelli, 1997). Participants were seated at a distance of 90 cm. Images were 640 × 480 pixels and presented at a resolution of 1,200 × 1,024 pixels. On average, each face encompassed 4.7 × 6.6 degrees of visual angle.
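The relation between viewing distance, on-screen size, and visual angle can be made concrete with a short Python sketch. It uses the standard formula for the angle subtended by a flat stimulus viewed head-on; the function names are illustrative, and the on-screen sizes it yields are derived here rather than reported in the text.

```python
import math

def visual_angle_deg(size_cm, distance_cm):
    """Visual angle (degrees) subtended by a stimulus of the given size."""
    return math.degrees(2.0 * math.atan(size_cm / (2.0 * distance_cm)))

def size_from_angle_cm(angle_deg, distance_cm):
    """Physical size (cm) that subtends the given visual angle."""
    return 2.0 * distance_cm * math.tan(math.radians(angle_deg) / 2.0)

VIEWING_DISTANCE_CM = 90.0  # viewing distance stated in the text

face_width_cm = size_from_angle_cm(4.7, VIEWING_DISTANCE_CM)
face_height_cm = size_from_angle_cm(6.6, VIEWING_DISTANCE_CM)
```

Under these assumptions, a face of 4.7 × 6.6 degrees at 90 cm corresponds to about 7.4 × 10.4 cm on the screen.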

Design

Participants made six just-noticeably compressed and six just-noticeably expanded judgments for each upright, inverted, and contrast-negated reference face. The direction of the judgment, compressed or expanded, was counterbalanced. Face type was pseudo-randomized, and reference distortion order was randomized. For example, a participant made compressed judgments for all the upright reference faces followed by all the inverted reference faces, and then all the contrast-negated reference faces. Then the participant made expanded judgments for all of the contrast-negated faces, followed by all of the inverted faces, and finally all of the upright faces. There were 180 total trials for each subject.

To assess the validity of the thresholds obtained by the method of adjustment, inverted faces were also tested using a staircase procedure. Participants responded to whether the comparison face was more compressed or more expanded than the reference following one of two randomly interleaved staircases. Each staircase used a 4:1 ratio of step sizes, so that the two staircases tracked the 20 and 80% more-compressed points. In one staircase, a response of “more compressed” made the comparison face more expanded by 4 distortion units, and a response of “more expanded” made the comparison face more compressed by 1 distortion unit. In the second staircase, the steps were reversed: a response of “more compressed” made the comparison face more expanded by 1 distortion unit, and a response of “more expanded” made it more compressed by 4 distortion units. Tracking these percentages, rather than 25 and 75%, maximized the efficiency of the staircase procedure. Inverted face stimuli and presentation were the same as in the method of adjustment task. There were 250 total trials: 50 trials with each of the five reference faces (highly expanded, moderately expanded, average, moderately compressed, and highly compressed).
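The logic of the two weighted staircases can be sketched in Python. A staircase settles where its expected step is zero: if p is the probability of a "more compressed" response, then a step of 4 units after "compressed" and 1 unit after "expanded" balances at 4p = 1 − p, that is, p = .20, and the reversed steps balance at p = .80. The simulated observer below is purely hypothetical (a logistic function with assumed parameters), levels are not clamped to the 51-face continuum, and none of the names come from the original experiment code.

```python
import math
import random

def staircase_step(level, response, step_if_compressed, step_if_expanded):
    """Next comparison level; larger levels mean a more compressed comparison."""
    if response == "compressed":
        return level - step_if_compressed  # make the comparison more expanded
    return level + step_if_expanded        # make the comparison more compressed

def target_proportion(step_if_compressed, step_if_expanded):
    """Proportion of "compressed" responses at which the expected step is zero."""
    return step_if_expanded / (step_if_compressed + step_if_expanded)

def p_compressed(level, pse=0.0, spread=3.0):
    """Hypothetical observer: probability of responding "more compressed"."""
    return 1.0 / (1.0 + math.exp(-(level - pse) / spread))

def run_interleaved(n_trials, seed=0):
    """Simulate two randomly interleaved staircases; return visited levels."""
    rng = random.Random(seed)
    steps = {"tracks_20": (4, 1), "tracks_80": (1, 4)}
    levels = {name: 0.0 for name in steps}
    history = {name: [] for name in steps}
    for _ in range(n_trials):
        name = rng.choice(sorted(steps))  # pick a staircase at random
        level = levels[name]
        saw_compressed = rng.random() < p_compressed(level)
        response = "compressed" if saw_compressed else "expanded"
        history[name].append(level)
        levels[name] = staircase_step(level, response, *steps[name])
    return history

history = run_interleaved(4000)
```

In this simulation the levels visited by the first staircase cluster below the observer's point of subjective equality and those visited by the second cluster above it, which is what allows the pair to bracket the psychometric function.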

Procedure

Each method of adjustment trial commenced with a beep. Then, participants viewed a reference face and a beginning comparison face. Participants panned through the continuum of 51 center-compressed and center-expanded comparison faces using a mouse and viewed each comparison face one at a time. The just-noticeably compressed and just-noticeably expanded faces were selected with a click of a mouse button. The next trial began automatically. No feedback was provided.

In the staircase procedure, each trial commenced with a beep. Then, participants viewed a reference face and a comparison face. Participants pressed the left arrow key on the keyboard if the comparison face appeared more compressed and the right arrow key if the comparison face appeared more expanded. The next trial began automatically. No feedback was provided.

Data analysis

Using the data from the method of adjustment, thresholds were obtained by taking the number of distortion steps between just-noticeably compressed (JNC) selections and the just-noticeably expanded (JNE) selections and dividing it by 2. Thus, thresholds represented the shift in distortion steps necessary to perceive a difference between the point of subjective equality for the reference face and the comparison face, regardless of direction (i.e. compression or expansion). This was computed separately for each participant, each face type, and each reference face. Every four distortion steps roughly equaled one pixel of displacement. Distortion thresholds were tested for significance using a 3 × 5 within-subjects ANOVA with face type and reference face as repeated measures variables in SPSS 11.0.1.
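The threshold computation described above can be written as a small Python sketch. The settings in the example are invented for illustration, the function names are hypothetical, and averaging the six settings per direction before taking the half-distance is an assumption, since the text does not say how repeated settings were combined.

```python
from statistics import mean

def adjustment_threshold(jnc_settings, jne_settings):
    """Half the distance, in distortion steps, between mean JNC and JNE settings."""
    return abs(mean(jne_settings) - mean(jnc_settings)) / 2.0

def subjective_equality(jnc_settings, jne_settings):
    """Midpoint between the mean JNC and JNE settings (assumed PSE estimate)."""
    return (mean(jne_settings) + mean(jnc_settings)) / 2.0

# Invented example: positions on the 51-step continuum for one reference face
jnc = [12, 10, 14, 11, 13, 12]  # just-noticeably compressed settings
jne = [30, 28, 32, 29, 31, 30]  # just-noticeably expanded settings

threshold = adjustment_threshold(jnc, jne)
pse = subjective_equality(jnc, jne)
```

With these made-up settings the two means are 18 steps apart, so the threshold is 9 steps on either side of a midpoint at step 21.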

Thresholds for the staircase data were determined for each reference face by fitting the proportion of compressed and expanded responses for each participant and each reference to a logistic function. Psychometric functions were fitted using psignifit version 2.5.6 (see http://bootstrap-software.org/psignifit/), a software package which implements the maximum-likelihood method described by Wichmann and Hill (2001) and runs in Matlab 7.1. These thresholds represented the amount of displacement necessary to perceive a difference regardless of direction. The staircase thresholds were compared to the discrimination thresholds using a paired t test for each reference face in SPSS 11.0.1.
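For readers unfamiliar with maximum-likelihood psychometric fitting, the following Python sketch fits a two-parameter logistic function by Newton-Raphson. It is not psignifit and does not reproduce its lapse-rate parameters, priors, or bootstrap confidence intervals. The function names are hypothetical, and reading the threshold as half the distance between the fitted 20% and 80% points is an assumption rather than something the text specifies.

```python
import math

def fit_logistic(levels, n_compressed, n_total, iterations=50):
    """Maximum-likelihood fit of P = 1 / (1 + exp(-(b0 + b1 * level))).

    n_compressed[i] of n_total[i] responses at levels[i] were "compressed".
    Returns the fitted (b0, b1).
    """
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, k, n in zip(levels, n_compressed, n_total):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = n * p * (1.0 - p)
            g0 += k - n * p          # gradient with respect to b0
            g1 += (k - n * p) * x    # gradient with respect to b1
            h00 += w                 # entries of the negative Hessian
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        if det <= 0.0:
            break
        # Solve the 2x2 Newton system for the parameter update
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < 1e-12:
            break
    return b0, b1

def level_at(p, b0, b1):
    """Stimulus level at which the fitted function equals proportion p."""
    return (math.log(p / (1.0 - p)) - b0) / b1

def half_spread(b0, b1):
    """Half the distance between the fitted 20% and 80% points."""
    return (level_at(0.8, b0, b1) - level_at(0.2, b0, b1)) / 2.0
```

A call such as `fit_logistic(levels, counts, totals)` followed by `half_spread(b0, b1)` would give one threshold per participant and reference face, in the same distortion units as the stimulus levels.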

Results

Face inversion and contrast negation did not clearly affect discrimination, p = .46. The mean threshold in distortion steps was 17 (SE = 1.8) for upright faces, 18 (SE = 2.1) for inverted faces, and 20 (SE = 2.9) for contrast-negated faces. Table 1 contains the mean thresholds and standard deviations for each face type and reference face. Figure 2 displays the threshold profiles for the upright, inverted, and contrast-negated faces.

Table 1 Horizontal discrimination thresholds in Experiment 1

Fig. 2. Discrimination thresholds in Experiment 1. Smaller thresholds indicate better discrimination of horizontal spacing change. Error bars show 95% confidence intervals, n = 10. Upright face discrimination (diamonds), inverted face discrimination (squares), and contrast-negated face discrimination (triangles)

The level of distortion in the reference face had a strong effect on discrimination thresholds, F(4, 36) = 12.2, p < .001. Overall, discrimination was best between the highly expanded faces, and sensitivity decreased roughly linearly as compression increased. This was confirmed by a significant linear trend, F(1, 9) = 18.1, p = .002. A non-significant quadratic trend further demonstrated that there was no benefit for discrimination between natural faces, suggesting the use of non-face specific processing. Discrimination was worst between the maximally compressed faces. These results remained true regardless of the type of face presented, F(8, 72) = 1.71, p = .11.

There was no difference in threshold based on method, all ps > .05, indicating that the method of adjustment produced valid discrimination thresholds. Mean thresholds for inverted faces in the staircase procedure are presented alongside the inverted face thresholds from the method of adjustment in Table 2.

Table 2 Method of adjustment and staircase thresholds for inverted faces in Experiment 1

Discussion

There was no benefit for geometrical discrimination between natural faces and no effect of inversion. Contrast-negation slightly decreased the ability to discriminate, raising the thresholds by a factor of about 1.2. Since the upright, natural faces were the most similar to a prototypical face, they should also be the most likely to engage in holistic processing. If true, then these thresholds should also be smaller due to greater sensitivity to differences in the spatial relations of facial features. The failure to observe such a benefit for upright, natural faces suggests that participants did not engage in holistic processing. Instead, these results are consistent with the predictions from analytical processing.

The type of distortion applied to our faces may, however, limit our ability to detect an inversion effect. There is some evidence that, unlike differences in vertical facial feature distances (e.g., eye-to-mouth distance), the perception of shifts in horizontal spacing is resilient to inversion. Specifically, Malcolm, Leung, and Barton (2004) demonstrated that vertical shifts in mouth location within an inverted face were the least detected type of spacing change. This corresponds well with the results reported by Goffaux and Rossion (2007) in which inversion significantly hindered the perception of difference in vertical but not horizontal facial feature distances. Goffaux and Rossion proposed that vertical information is more sensitive to inversion than horizontal information because the facial features are organized primarily around the vertical axis.

Participants may also have done the task by assessing differences in local features such as the width of the nose or the brightness of the space between the eyes. The task itself is also highly unnatural. If such discrimination were to occur in the natural environment, it would happen between identities not within a single identity. Therefore, the extrinsic nature of the task in Experiment 1 may not have invoked the entire set of face processing mechanisms, preventing the detection of an inversion effect.

To rectify each of these concerns, Experiment 2A sought to replicate the results of Experiment 1, by systematically altering the vertical spacing in faces with different identities. Using multiple identities removed the ability to rely on local cues such as length of the nose, while improving the ecological validity of the task (Fig. 3).

Fig. 3. Upright reference faces (top) and comparison face matches (second row), inverted comparison faces (third row), and contrast-negated comparison faces (bottom) in Experiment 2A

Experiment 2A

Experiment 2A examined the influence of inversion and contrast-negation on the discrimination of faces that varied in eye-to-mouth distance and identity.

Method

Participants

Forty-six undergraduates from the University of California, San Diego, with normal or corrected-to-normal vision participated in exchange for course credit.

Stimuli

We aligned the eyes and mouths of 22 grayscale male Caucasian faces with the eyes and mouth of the averaged male Caucasian face from Experiment 1. Using Matlab 7.3 and the method described in Experiment 1, we then produced 41 levels of distortion for each face ranging in even vertical steps from highly compressed to highly elongated. The algorithm for vertical displacement varied only slightly from Formula 1. No horizontal distances were included in the equation, so the distortion affected each pixel’s y-coordinate alone:

$$ \text{Formula 2.}\quad \Delta y_i = \alpha \, y_i^{\prime} \exp\left( -\frac{y_i^{\prime 2}}{2\sigma^2} \right) $$

where the amplitude (α) of the distortion varied from –1 to 1 in steps of .05 and σ = 0.20 times the face length. This resulted in a maximum displacement of 42.5 pixels when α = 1. From these faces, a continuum of 41 differentially distorted comparison faces was created, in which the adjacent levels of distortion originated from different individual faces.
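A minimal Python sketch of Formula 2 shows how the vertical shift depends only on a pixel's vertical distance from the midpoint, so every pixel in a given row moves by the same amount. The names are hypothetical, and `SIGMA` is an assumed pixel value chosen so that the peak shift lands near the reported 42.5 pixels. It is not taken from the text.

```python
import math

def vertical_shift(y, y_m, alpha, sigma):
    """Vertical shift (pixels) given by Formula 2 for any pixel in row y."""
    dy = y - y_m  # vertical distance from the midpoint
    return alpha * dy * math.exp(-(dy * dy) / (2.0 * sigma ** 2))

def amplitude_levels(step):
    """Amplitudes from -1 to 1 inclusive, in equal steps."""
    n = round(2.0 / step) + 1
    return [round(-1.0 + k * step, 10) for k in range(n)]

SIGMA = 70.0  # assumed value in pixels, not reported in the text

levels = amplitude_levels(0.05)  # one amplitude per level of distortion
# Shift profile for rows 0..210 below the midpoint at full amplitude
profile = [vertical_shift(y, 0.0, 1.0, SIGMA) for y in range(0, 211)]
peak_row = max(range(len(profile)), key=profile.__getitem__)
```

The profile rises from zero at the midpoint, peaks one SD away, and then decays, so rows far from the eye-to-mouth region are left nearly untouched.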

Five reference faces were selected as in Experiment 1 (see Footnote 1). The identity used for the reference faces was different from the identities of the comparison faces. This prevented participants from producing artificially low thresholds by matching the identities. Figure 3 displays the reference faces (top) as well as their comparison face matches (second row).

Each face was contrast-negated with respect to the average pixel intensity of the image, and inverted to create a total of three different face types: upright, inverted, and contrast-negated. All faces were placed into a black oval frame to minimize the effect of different external facial contours and hair.

All stimuli were viewed on a gray background. The images were 320 × 240 pixels displayed at a resolution of 1,200 × 1,024 pixels using Matlab 7.3 and the Psychophysics Toolbox (Brainard, 1997; Pelli, 1997).

Design and procedure

Participants made JNC and just-noticeably elongated (JNL) judgments following the method of adjustment described in Experiment 1. Thus, each participant experienced 60 trials with contrast-negated faces, 60 trials with inverted faces, and 60 trials with upright faces for 180 total trials.

Data analysis

Distortion thresholds were measured in the manner described for Experiment 1. Thresholds were analyzed using a 5 × 3 × 2 mixed-model ANOVA in SPSS 11.0.1, with reference face and face type as the within-subjects factors and judgment order as the between-subjects factor.

Results

There was no detrimental effect of inversion or contrast-negation on the discrimination of differences in facial feature distances (see Footnote 2), p = .81. Experiment 2A thus replicated Experiment 1. There was no main effect of judgment order and no significant interaction involving it; therefore, this factor was removed from the data analysis. Table 3 contains the threshold means and standard deviations for discrimination, and Fig. 4 displays the threshold profiles for each face type.

Fig. 4. Discrimination results for Experiment 2A. Smaller thresholds indicate better discrimination of vertical spacing change. Error bars show 95% confidence intervals, n = 46. Upright face discrimination (diamonds), inverted face discrimination (squares), and contrast-negated face discrimination (triangles)

Table 3 Vertical discrimination thresholds in Experiment 2A

Discrimination varied significantly with the reference face, F(2.43, 109) = 54.6, p < .001. Similar to Experiment 1, a non-significant quadratic trend indicated no benefit for natural faces. Rather, discrimination was best around the maximally elongated reference face and became increasingly worse as compression increased following a significant linear trend, F(1, 45) = 96.7, p < .001. This again suggests that our experience with natural faces does not facilitate discrimination.

There was a trend for an interaction between reference face and face type, F(5.33, 240) = 2.99, p = .07, with a slight benefit for inverted faces (compared to upright faces) at the original and moderately compressed distortions and a mild benefit for upright faces (compared to inverted faces) at the more extreme distortions. Thresholds for the contrast-negated faces generally fell between the upright and inverted face thresholds.

Discussion

These data clearly show that there is no substantial benefit for the perception of facial feature distances in upright faces. There was also no benefit for discrimination between natural faces. Instead, sensitivity decreased as compression increased, and if anything, inversion improved performance with the natural faces. Therefore, Experiment 2A successfully replicated the results from Experiment 1 and supports our original suggestion that the geometrical discrimination task requires part-based processing and not holistic comparisons. Moreover, these results indicate that our inability to find a face inversion effect in Experiment 1 did not result from greater resilience of horizontal manipulations to inversion, as suggested by Goffaux and Rossion (2007). This can be understood if the nature of the task leads participants to treat facial feature distances like the distance between two points in a bisection task, in effect causing each face to be processed like an object. If true, then the effects traditionally observed with a face discrimination task should be absent. Maurer et al. (2002) originally alluded to this idea when discussing pilot results from a face discrimination task that failed to produce an inversion effect. They proposed that the missing inversion effect resulted from the use of an alternative, non-holistic processing strategy in which the distances between facial features were treated as features in their own right.

Given the difference between our results and the face inversion effect found in a standard discrimination task in which participants make same/different judgments (e.g., Freire et al., 2000; Goffaux & Rossion, 2007; Le Grand et al., 2001; Searcy & Bartlett, 1996; Yovel & Kanwisher, 2004), we will next show that engagement in holistic processing is task dependent by having participants engage in a more holistic task: categorization.

Experiment 2B

Experiment 2B considers the effect of inversion on face categorization by having participants categorize faces as elongated or compressed. Face categories develop to describe upright faces, and face-specific (i.e. holistic) processing is essential to the formation of these categories (McKone, Martini, & Nakayama, 2003). Therefore, we expect the categorization of elongation and compression to rely on holistic processing. If true, then we should observe greater sensitivity to variation at the category boundary with upright faces and little sensitivity to such variation with inverted and contrast-negated faces.

Method

Participants and stimuli

The participants in Experiment 2A also participated in Experiment 2B. We used the 41 compressed and elongated faces from Experiment 2A in their upright, inverted, and contrast-negated form. In addition, we created a mask by randomizing the pixels in the average male Caucasian face to avoid discontinuities in average lightness and then placing it in a black oval frame.

Design

Participants categorized the 41 distorted test faces from Experiment 2A. Face presentation order was pseudo-randomized. Although 20 of the 22 original faces contributed both one elongated face and one compressed face to the final set of 41 vertically distorted faces, participants never viewed the same identity twice in a row. As in Experiment 1, face type was pseudo-randomized and participants categorized all of one face type (e.g., upright faces) before categorizing another face type (e.g., inverted). There were 123 categorization trials.

Procedure

Each trial began with a beep, followed by a 500-ms mask. Then, the mask was removed, and participants viewed the to-be-categorized face. If the face appeared elongated, the participant depressed the left arrow key, and if it appeared compressed, the participant depressed the right arrow key. Choice response terminated the trial. No feedback was provided.

Data analysis

Categorization data for each condition were aggregated across participants and assessed using psignifit version 2.5.6 (Wichmann & Hill, 2001) in Matlab 7.3. The psychometric fits for each condition produced estimates of the distortion level that was perceived as elongated 50% of the time, as well as the slope and the 95% confidence interval of the slope at the threshold distortion level. This slope represents the pooled sensitivity to differences in vertical configuration. It is the amount of increase in elongation necessary to elicit a perceptible change in elongation. If that sensitivity is high, then the slope should be large, but if the participants were relatively insensitive to small deviations in vertical spacing, then the slope should be small.
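The link between slope and sensitivity can be made explicit for a logistic psychometric function. The parameterization below, with midpoint μ and scale s, is an assumption made for illustration and may differ from the one psignifit uses internally.

$$ F(x) = \frac{1}{1 + \exp\left( -(x - \mu)/s \right)}, \qquad F^{\prime}(x) = \frac{F(x)\left( 1 - F(x) \right)}{s}, \qquad F^{\prime}(\mu) = \frac{1}{4s} $$

$$ x_{80} - x_{20} = 2 s \ln 4 = \frac{\ln 4}{2 F^{\prime}(\mu)} $$

Under this parameterization, a steeper slope at the 50% point means that a smaller change in distortion level is needed to move responses from 20% to 80%, which is why a larger slope indicates greater sensitivity.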

Results

Participants displayed a clear face inversion effect. The 95% confidence intervals for the slopes of the fits for upright (slope = .03, 95% CI = .0057) and contrast-negated faces (slope = .027, 95% CI = .0051) overlapped with each other and were both significantly steeper than with inverted faces (slope = .018, 95% CI = .0028). Psychometric fits of each face type are plotted in Fig. 5.
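The overlap comparison among the three slopes can be checked with a few lines of Python. Treating each reported "95% CI" value as the half-width of an interval centered on the slope is an assumption; the text does not say whether the values are half-widths or full widths. The helper names are illustrative.

```python
def interval(estimate, half_width):
    """Symmetric interval around an estimate."""
    return (estimate - half_width, estimate + half_width)

def overlaps(a, b):
    """True if two closed intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]

# Slopes and CI values as reported for Experiment 2B
slopes = {
    "upright": interval(0.030, 0.0057),
    "contrast_negated": interval(0.027, 0.0051),
    "inverted": interval(0.018, 0.0028),
}

upright_vs_negated = overlaps(slopes["upright"], slopes["contrast_negated"])
upright_vs_inverted = overlaps(slopes["upright"], slopes["inverted"])
negated_vs_inverted = overlaps(slopes["contrast_negated"], slopes["inverted"])
```

Under the half-width reading, the upright and contrast-negated intervals overlap, while the inverted interval lies entirely below both, matching the pattern described in the text.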

Fig. 5. Proportion of compressed categorization responses for upright, inverted, and contrast-negated faces in Experiment 2B, n = 46. The 20 and 80% compressed thresholds with 95% confidence intervals are shown

Discussion

Unlike Experiment 2A, here we found a clear effect of inversion. Participants were more sensitive to differences in elongation and compression when the faces were upright or contrast-negated rather than inverted. This suggests that the categorization of elongation and compression in faces relies on holistic processing, and that there is a fundamental difference between the processing mechanisms used for categorization compared to geometrical discrimination. If true, then engagement in holistic processing is modulated by not only the stimulus (e.g., face vs house) but also the nature of the task.

It is interesting to note that categorization was relatively unaffected by contrast-negation. This result is unexpected, since contrast-negation generally produces an impairment similar to that of inversion. However, since the focus of the current paper is on inversion, we do not pursue this effect here and instead suggest that future research explore what types of face information are preserved with contrast-negation.

Although Experiments 2A and 2B appear to show a dichotomy between categorization and geometrical discrimination, Barton, Keenan, and Bass (2001) found that the effect of inversion on the discrimination of spatial relations decreases with increased viewing time. Therefore, the unlimited viewing duration used in Experiments 1 and 2A may not have captured the inversion effect. Moreover, it is not uncommon for the effect of inversion to appear only in response times. For example, the original demonstration of the composite-face effect involved simultaneous comparisons and showed a face inversion effect (FIE) and composite-face effect in response times only (Young et al., 1987). Since Experiments 1 and 2A did not measure response times, it is possible that participants did experience an inversion effect, but we did not record it.

Moreover, using sequentially rather than simultaneously presented comparison faces may also improve our ability to detect a FIE in discrimination. Recent experiments on the composite-face effect show that when comparison faces are presented sequentially, participants’ accuracies demonstrate clear composite-face effects (Richler, Gauthier, Wenger, & Palmeri, 2008; Richler, Tanaka, Brown, & Gauthier, 2008). Therefore, Experiment 3 tests discrimination when the reference and comparison faces are presented one at a time and measures response times.

In a review of the FIE, Valentine (1988) cited inconsistencies in the ability to obtain a FIE in matching tasks compared to recognition tasks and thus questioned whether the FIE involves comparison to a memory trace. Wenger and Ingvalson (2002) additionally suggested that holistic processing may develop through the use of memory. Although Jacques, d’Arripe, and Rossion (2007) found evidence for the FIE within 170 ms by using adaptation in an event-related potential design, this does not preclude an effect of inversion in the later processing stages. Therefore, to give our task the best chance of exhibiting a FIE, we inserted delays of 0, 250, and 5,000 ms between the reference and comparison faces. If an inversion effect is obtained, then these delays would also allow us to assess the time course of the impairment and whether it occurs at the iconic store, short-term store or long-term store (Baddeley, 1997).

Experiment 3

Method

Participants

Participants included 20 undergraduates or members of the University of California, San Diego community. Community members participated in exchange for $10 an hour and undergraduates received course credit. Vision in all participants was normal or corrected to normal.

Stimuli

The faces were the same as in Experiment 2A, except there were no contrast-negated faces. In addition, we added a Gaussian blur to the mask used in Experiment 2B. This eliminated the presence of lines that could be used as a reference during the delay period.

Design

The reference and comparison faces were separated by a delay of 0, 250, or 5,000 ms. There were two sessions for this experiment. In each session, faces were presented in one orientation, either upright or inverted. The order of face orientation was counterbalanced between subjects. Trials were blocked by delay, and the order of the delays and reference faces was randomized. The two sessions were separated by at least 1 day. As described in Experiment 1, participants responded to whether the comparison face was more compressed or more elongated than the reference following one of two randomly interleaved staircases. Each staircase contained a 4:1 step size that tracked the 20 and 80% more-compressed points.

Procedure

Participants heard a beep, viewed a 150-ms fixation, then a reference face. After 1,000 ms, a mask replaced the reference face for 0 (no mask), 250, or 5,000 ms, after which a comparison face appeared and remained on display until the participant indicated whether it was more compressed or elongated than the reference face. Participants did this by pressing either the left arrow key or the right arrow key on the computer keyboard. Choice response terminated the trial. There was no feedback. At the end of each block, participants were offered a break. There were 250 trials per block and 750 trials per session.

Data analysis

Response data for each participant were analyzed separately for each combination of variables using Functional Adaptive Sequential Testing (FAST) designed by Vul and MacLeod (2007). Using the individual subject data as a basis for its simulations, FAST identified the parameters of the logistic psychometric function, in particular the threshold parameter, for which the probability of the data was maximized. The threshold was the inter-quartile range expressed in distortion steps. These thresholds were subjected to a 2 × 3 × 5 within-subjects ANOVA in SPSS 11.0.1 with inversion, delay, and reference distortion as repeated measures factors.
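As a worked illustration of the inter-quartile-range threshold, suppose the fitted logistic function has the common two-parameter form below. The symbols μ (location) and s (scale) are assumptions introduced here; the text does not state the parameterization that FAST uses.

```latex
P(x) = \frac{1}{1 + e^{-(x - \mu)/s}}, \qquad
x_p = \mu + s \ln\frac{p}{1 - p}

\mathrm{IQR} = x_{0.75} - x_{0.25} = s\ln 3 - s\ln\tfrac{1}{3} = 2 s \ln 3 \approx 2.20\, s
```

Under this assumed form, the threshold is proportional to the scale parameter, so a shallower psychometric function yields a larger threshold.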

Response data were also combined across participants and analyzed using FAST (Vul & MacLeod, 2007). By computing this for each condition and collapsing data across conditions, we obtained probabilities and consensus discrimination thresholds for models in which inversion, delay time, and reference distortion were and were not factors. Combining the data across participants reduced the impact of outlying individuals on the results and produced a model representative of the population. The effects of inversion and delay on the consensus discrimination thresholds were then assessed using t tests.

Response times were analyzed for each participant to determine the presence of outliers. Any RTs beyond two standard deviations away from the mean of that participant’s data were excluded as outliers. The remaining RTs were averaged for each participant to provide values for each combination of inversion, delay, and reference distortion. To improve the conformity of the RTs to a normal distribution, we analyzed the log of the RT data using a 2 × 3 × 5 repeated measures ANOVA in SPSS 11.0.1 with inversion, delay, and reference distortion as repeated measures factors.
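The trimming and transformation steps described above can be sketched as follows. This is a minimal illustration, not the analysis code used in the study: the function name `log_cell_means`, the use of the sample standard deviation, and the choice to take the log of each cell mean (rather than the mean of logged RTs) are all assumptions.

```python
import math
import statistics
from collections import defaultdict


def log_cell_means(trials, n_sd=2.0):
    """Trim one participant's RTs and return the log mean RT per condition.

    trials: list of (condition, rt) pairs for a single participant.
    RTs further than n_sd standard deviations from that participant's
    overall mean are dropped (assumption: sample SD over all trials).
    """
    all_rts = [rt for _, rt in trials]
    mean = statistics.mean(all_rts)
    sd = statistics.stdev(all_rts)

    # Keep only RTs within the cutoff, grouped by condition.
    cells = defaultdict(list)
    for condition, rt in trials:
        if abs(rt - mean) <= n_sd * sd:
            cells[condition].append(rt)

    # Assumption: average within each cell first, then take the natural log.
    return {cond: math.log(statistics.mean(rts)) for cond, rts in cells.items()}
```

Here a condition label would stand for one combination of inversion, delay, and reference distortion; the resulting values are what a repeated measures ANOVA would then take as input.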

Results

Based on the thresholds obtained from individual subjects, discrimination was unaffected by face inversion. The effect of inversion and the delay by inversion interaction were both non-significant, ps > .05. The main effect of delay, reference distortion, and all remaining interactions also failed to produce significant variation in discrimination, ps > .05, suggesting that our failure to find a face inversion effect in Experiments 1 and 2A did not result from the unlimited viewing duration.

The consensus data further supported these results; inversion did not significantly affect discrimination, t(14) = 0.90, p = .38. Delay length also did not affect discrimination, although there was a strong trend for worse discrimination after a 5,000-ms delay compared to a 250-ms delay, t(9) = 2.13, p = .051. Critically, there was no evidence of an interaction between delay and inversion on face discrimination (Fig. 6a). Therefore, the ability to detect a face inversion effect does not appear to depend on the length of memory storage.

Fig. 6

Experiment 3 discrimination results, n = 20. HE Highly elongated reference, ME moderately elongated reference, O original undistorted reference, MC moderately compressed reference, HC highly compressed reference. a Discrimination thresholds determined by FAST (Vul & MacLeod, 2007) for the pooled subject data. b Average response times when discriminating between geometrically distorted upright and inverted faces, collapsed across delay length. Delay length did not interact with orientation or reference distortion. Upright face discrimination (diamonds), inverted face discrimination (squares)

Examination of response times likewise revealed no evidence that inversion affected the ability to perceive differences in compression or elongation, p = .33. This gives no support for the suggestion that subjects preserve accuracy for inverted faces by making a speed/accuracy trade-off.

Nor was there any superiority in performance for natural faces. Overall, participants responded fastest to the maximally compressed reference faces and significantly slower with the midrange reference faces, following a quadratic trend, F(1, 9) = 7.88, p = .011. This finding is further supported by a significant effect of reference distortion on response times, F(4, 76) = 3.57, p = .01. Participants’ response times varied with delay, F(2, 38) = 45.8, p < .001. Bonferroni comparisons indicated that participants took longer to respond after a 5,000-ms delay compared to a 0- or 250-ms delay, ps < .001.

If upright natural faces were processed more efficiently than inverted natural faces, we might expect to observe an interaction between orientation and reference distortion in the response times. This interaction was indeed significant, F(4, 76) = 2.94, p = .026, but notably, the polarity of the interaction is quite unexpected in that natural upright faces are the most slowly processed (Fig. 6b). On this evidence, the upright natural faces are processed not more but less efficiently. All remaining interactions were non-significant.

Discussion

Our results demonstrated an inverted inversion effect for response times in the geometrical discrimination task. Participants were able to make swifter discriminations between the more natural faces when these faces were inverted rather than upright. This reflects greater efficiency for discrimination between inverted faces with natural feature arrangements, a result that is contrary to the traditional face inversion effect.

Despite the sequential presentation design, neither inversion nor delay affected the discrimination thresholds. The longer response times observed with the 5,000-ms delay likely reflect the extra time needed to retrieve the memory of the reference face from long-term storage. This effect is not surprising and lends support to the validity of our design. More importantly, there was no interaction between delay and inversion. If face inversion affected the ability to transfer holistic information into long-term store, then the response times with a 5,000-ms delay should have been longer for inverted faces compared to upright faces. This was not the case. Therefore, our failure to find a face inversion effect in Experiments 1 and 2A is not due to the lack of a memory component.

In Experiment 2A, it seemed that the geometrical discrimination task induced subjects to shift from holistic processing to an analytical approach. The inverted inversion effect observed here supports this suggestion. There are two potential means for achieving object-based, non-holistic processing in faces. One possibility is that the viewer automatically engages in the type of processing most efficient for the task. Therefore, if geometrical discrimination is best served by a non-holistic strategy, the viewer will automatically use this processing approach. Alternatively, faces may automatically induce holistic processing. In this view, the participant must disengage from holistic processing and initiate a non-holistic processing strategy to complete the discrimination task. Since inversion promotes analytical processing (Goffaux & Rossion, 2006; Hole et al., 1999; Wenger & Ingvalson, 2002), this shift from holistic to part-based processing should be easy with inverted faces, producing faster response times. Upright faces, however, cause viewers to strongly engage in holistic processing. Therefore, if a non-holistic method is more beneficial for the task at hand, then the participant must initiate a shift to the non-holistic process, creating slower response times for upright faces.

Given the benefit of inversion for natural faces, our results best support disengagement from holistic processing rather than automatic engagement of analytical processing. However, since both orientation and distortion influenced response times, the depth of holistic processing also appears to depend upon the degree to which the stimulus appears “face-like”, with more natural faces engaging holistic processing more deeply. If true, then upright natural faces should produce the slowest responses and display the greatest benefit from inversion.

One possible impetus for the switch from holistic to analytical processing is selective attention. By selectively attending to the relevant part of the face (e.g., eye-to-mouth distance), participants may invoke analytical processing. In Experiments 4A and 4B, we test this by asking subjects to perform a same–different task rather than a geometrical discrimination task. A same–different task may encourage holistic processing more than the geometrical discrimination task, since the question of sameness or difference draws attention to the whole face rather than a specific part of the face (e.g., eye-to-mouth distance). Therefore, we would expect participants to display a traditional face inversion effect. This is tested in Experiment 4A. But if participants are asked to explicitly judge whether the eye-to-mouth distances are the same or different, as in Experiment 4B, we may instead observe an inverted inversion effect.

Experiment 4A

Method

Participants

Participants included 15 undergraduates from the University of California, San Diego. Students received course credit in exchange for participation. Vision in all participants was normal or corrected to normal.

Stimuli

The morphing algorithm for vertical displacements described in Experiment 2A was applied to the average Caucasian male face used in Experiment 1 to produce 201 faces: 100 compressed faces, 100 elongated faces, and the original face. The only difference between the morphing procedure in Experiment 2A and its application here is that the amplitude (α) of the distortion varied from –1 to 1 in steps of .01 rather than .04. Using a single male face rather than multiple identities allowed participants to do a same–different task without being told the nature of the difference. Although it is likely that participants learned how the faces varied over the course of several trials, we did not want to overtly direct attention to the nature of the difference since it is as yet unknown how selective attention may influence the task.

The original face was always the reference face and the comparison face was determined by a staircase procedure. All faces were placed into a black oval frame to prevent participants from making judgments based on differences in the external contour (e.g., ear length). Faces were presented in both upright and inverted orientations and were viewed on a gray background. The images were 640 × 480 pixels and displayed at a resolution of 1,024 × 768 pixels on a 51-cm high resolution RGB Sony CRT monitor using Matlab 7.3 and the Psychophysics Toolbox (Brainard, 1997; Pelli, 1997). Participants were seated at a distance of 50 cm.

Design and procedure

An adaptive staircase procedure based on the PEST method (Taylor & Creelman, 1967) was used to obtain each participant’s 80% threshold. Similar to Experiment 1, participants responded to whether the comparison face was more compressed or more elongated than the reference face following one of two randomly interleaved staircases. Each staircase contained a 4:1 step size that tracked the 20 and 80% more-compressed points. A correct response adjusted the distortion level of the comparison face so that it was closer to the reference face by one step size, while an incorrect response shifted the distortion level of the comparison face away from the reference face by three step sizes (i.e. making it easier to discriminate), with maximum step size set at 20 distortion units. Step size was further governed by an acceleration factor of 1.2 and a reversal factor of 1.6. Thus, step size was increased following two consecutive correct or incorrect responses and decreased following a reversal in correctness (i.e. a switch from correct to incorrect or incorrect to correct). Figure 7 shows the trial progression for one subject when tracking the 80% compressed point. In addition, every ten trials contained a catch trial in which the comparison face was maximally compressed or maximally elongated.
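The update rule described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the experiment code: the class name `Staircase`, the initial step of 4 units, the minimum step of 1 unit, the clamping of the level to the range 0 to 100, and the reading of the acceleration and reversal factors as a multiplier and a divisor on the step size are all hypothetical choices, since the text does not specify them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Staircase:
    level: float                  # distance of comparison from reference, in distortion units
    step: float = 4.0             # assumed initial step size
    max_step: float = 20.0        # maximum step size stated in the text
    min_step: float = 1.0         # assumed lower bound on step size
    max_level: float = 100.0      # assumed largest available distortion
    accel: float = 1.2            # acceleration factor from the text
    reversal: float = 1.6         # reversal factor from the text
    last_correct: Optional[bool] = None

    def update(self, correct: bool) -> float:
        """Adjust step size, then move the comparison level; return the new level."""
        if self.last_correct is not None:
            if correct == self.last_correct:
                # Two consecutive responses of the same correctness: grow the step.
                self.step = min(self.step * self.accel, self.max_step)
            else:
                # Reversal in correctness: shrink the step.
                self.step = max(self.step / self.reversal, self.min_step)
        self.last_correct = correct

        # Correct: one step closer to the reference. Incorrect: three steps away.
        delta = -self.step if correct else 3.0 * self.step
        self.level = min(max(self.level + delta, 0.0), self.max_level)
        return self.level
```

Two such objects, one per tracked point, could be interleaved at random across trials; catch trials would bypass the staircase entirely.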

Fig. 7

Trial by trial comparison distortions while tracking the 80% compressed point for upright (line and *) and inverted (line and o) faces in one subject in Experiment 4A. Compression increased with distortion level; 100 represented the original, undistorted face. The same method was used in Experiment 4B

Data analysis

Thresholds were determined for each orientation by fitting the proportion of compressed and elongated responses for each participant to a logistic function. Psychometric functions were fitted using psignifit version 2.5.6 (see http://bootstrap-software.org/psignifit/), a software package which implements the maximum-likelihood method described by Wichmann and Hill (2001) and runs in Matlab 7.3. Thresholds represented the amount of displacement necessary to perceive a difference regardless of direction. Average response times for each participant were trimmed of outliers using the procedure described in Experiment 3. Logging the thresholds and RTs improved conformity to the normal distribution. Therefore, log thresholds and log RTs were analyzed using paired samples t tests with orientation as the independent variable in SPSS 11.0.1; however, we report the original, non-transformed means and standard deviations in the text.
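To illustrate the idea of a maximum-likelihood logistic fit, the sketch below searches a grid of candidate parameters for the pair that maximizes the binomial likelihood of the responses. It is not psignifit and does not reproduce its features (psignifit additionally handles, for example, lapse rates and bootstrap confidence intervals). The function names, the two-parameter form with location `mu` and scale `s`, and the grid search itself are assumptions made for illustration.

```python
import math


def logistic(x, mu, s):
    """Two-parameter logistic; the exponent is clamped to avoid overflow."""
    z = max(-50.0, min(50.0, (x - mu) / s))
    return 1.0 / (1.0 + math.exp(-z))


def neg_log_likelihood(params, data):
    """Binomial negative log-likelihood.

    data: list of (level, n_target_responses, n_trials) triples.
    """
    mu, s = params
    nll = 0.0
    for x, k, n in data:
        # Keep p away from 0 and 1 so the logs stay finite.
        p = min(max(logistic(x, mu, s), 1e-9), 1.0 - 1e-9)
        nll -= k * math.log(p) + (n - k) * math.log(1.0 - p)
    return nll


def fit_logistic(data, mus, scales):
    """Return the (mu, s) pair on the grid with the highest likelihood."""
    candidates = ((mu, s) for mu in mus for s in scales)
    return min(candidates, key=lambda params: neg_log_likelihood(params, data))
```

A call such as `fit_logistic(data, mus, scales)` returns the best-fitting location and scale on the supplied grid; how a threshold is then read off the fitted curve depends on the threshold definition, which this sketch leaves open.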

Results

Participants displayed a clear effect of inversion, t(13) = −4.40, p = .001, with greater sensitivity to differences in the upright faces (M = 18.7, SD = 2.92) than inverted faces (M = 27.4, SD = 4.38). Response times displayed no effect of inversion, p = .22. Mean upright response times equaled 2.24 s (SD = 1.26), and mean response time to inverted faces equaled 2.36 s (SD = 1.35).

Discussion

Participants clearly experienced a face inversion effect. This suggests that when precise judgments are required about facial feature distances, such as in a geometrical discrimination task, participants use an object-based analytical processing strategy rather than holistic processing. But when the task involves a general comparison between faces (e.g., same or different), participants use holistic processing.

Experiment 4B

Methods

Fourteen new undergraduates from the University of California, San Diego, participated in exchange for course credit. Vision in all participants was normal or corrected to normal. The stimuli, design, and procedure were the same as in Experiment 4A. Data analysis was similar to that of Experiment 4A, except we also computed a mixed-model ANOVA using the data from Experiments 4A and 4B to assess whether attention can mitigate the presence of a face inversion effect.

Results

Participants again displayed a clear inversion effect, t(13) = −5.35, p < .001. Sensitivity to differences in eye-to-mouth distance was greater in upright faces (M = 30.4, SD = 4.12) than inverted faces (M = 36.7, SD = 5.45). There was again no effect of inversion on response times, p = .33; mean upright response times equaled 3.34 s (SD = 1.36), and mean inverted response times equaled 3.21 s (SD = 1.49).

Notably, there was an interaction between the thresholds in Experiments 4A and 4B. In other words, attention mitigated the strength of the inversion effect in the discrimination thresholds, F(1, 26) = 5.01, p = .034. Participants who were instructed to make judgments about the eye-to-mouth distances were less susceptible to the face inversion effect than individuals who were not informed about the nature of the variation. However, participants in Experiment 4B were also significantly less sensitive to variation in eye-to-mouth distance (i.e. displayed higher thresholds) than participants in Experiment 4A in both the upright and inverted faces, F(1, 26) = 5.40, p = .028 (Fig. 8). This suggests that, although attention to a specific face part can decrease the face inversion effect, it also inhibits the ability to make accurate same–different judgments.

Fig. 8

Discrimination thresholds for Experiments 4A and 4B, n = 14 in each experiment. Both experiments involved a same–different task, but in Experiment 4B participants were explicitly asked to judge eye-to-mouth distance. Attention to eye-to-mouth distance decreased both sensitivity to differences in the faces and the size of the face inversion effect

Discussion

Despite directing participants’ attention to the eye-to-mouth distances, participants still exhibited a typical face inversion effect. Although the size of the inversion effect was less than that demonstrated in Experiment 4A, overall performance was much worse. Therefore, it seems that any amelioration of the face inversion effect associated with attention is secondary to an overall decrement in discrimination. This suggests that same–different tasks invoke holistic processing, and the application of attention to a specific property of the face (e.g., eye-to-mouth distance) decreases the efficacy of this holistic encoding but is not enough to invoke a shift to analytical processing. This result is consistent with other unpublished results we have found using faces that varied in eye-to-mouth distance or were Thatcherized, and also consistent with the results of Anaki, Nica, and Moscovitch (2010) in which the same–different compatibility of an irrelevant face dimension influenced upright faces but not inverted faces.

The degree to which these results relate to the performances observed in Experiments 1–3 is unclear. While we assume participants in Experiments 1, 2A, and 3 directed attention to the feature distances, there was no face inversion effect. The inverted inversion effect in response times found in Experiment 3 and the trend for an inverted inversion effect in Experiment 2A suggest that, if anything, there is a benefit for inverted faces. Yet, in Experiment 4B, there was a clear, though small, face inversion effect. Therefore, it seems that a geometrical discrimination task requires a different processing strategy than a same–different task, even when that same–different task encourages attention to feature distances. Thus, any shift from holistic to analytical processing is unlikely to be due to attention, and may derive instead from some other aspect of the task. This is consistent with findings from Boutet, Gentes-Hawn, and Chaudhuri (2002) in which attention did not attenuate the composite-face effect, an effect that relies strongly on holistic processing.

One possible alternative is that the use of analytical processing with the geometrical discrimination task results from the unnatural nature of the discrimination task. Since faces are not generally compared using less or more judgments of feature distance, it is possible that the processes used when making these judgments are not facilitated by face expertise. But in Experiment 4B, participants engaged in the highly natural and familiar task of making same–different judgments in a face; the acquired skill involved does not extend fully to inverted faces, for reasons unrelated to attention. In Experiment 2A, however, participants must assume there is a difference rather than relying on the holistic same–different process to come to that conclusion. While most face recognition processes stop after a same or different judgment (e.g., comparison to a memory trace) or categorization judgment (e.g., male/female, Asian/Caucasian), here participants must go beyond this stage and make a less or more decision. Since this decision is not a part of the standard face perception process, it likely uses a non-face specific mechanism, i.e. analytical processing. In this way, participants may experience a strong inversion effect in Experiment 4B despite attending to the relevant information and display no inversion effect in Experiment 2A.

General discussion

In the present study, we explored the nature of holistic and analytical encoding with regard to faces by asking participants to make judgments on geometrical differences in upright, inverted and contrast-negated faces. Both contrast-negation and inversion are known to disrupt holistic processing; therefore, we compared sensitivity to geometrical differences in these conditions to sensitivity with upright faces. In four experiments, we showed that in certain cases faces can be processed like objects. When the task involved precise “less or more” discriminations of differences in facial feature distances, participants displayed the hallmarks of part-based, analytical processing. This included not only a resistance to the standard face inversion effect but a facilitation of discrimination between inverted natural faces. In contrast, a same–different task and a categorization task involving geometric differences between faces did produce a typical face inversion effect, even when participants were asked to pay attention to the geometrical differences. These results suggest that the act of making precise less or more judgments, at least in the context of geometrical differences in the face, can cause a shift from holistic to analytical processing. These results also suggest that this switch does not result from a simple shift in directed attention.

The current study provides two important steps forward in our understanding of face perception. While previous experiments on face perception explicitly demonstrated the use of holistic encoding, we investigated the use of analytical processing. By using a task that encourages analytical processing, we could observe the flexibility surrounding face encoding. Research on acquired prosopagnosia suggests that once the neural pathways for face perception are developed, they are always engaged for faces, even when they are dysfunctional (Farah, Wilson, Drain, & Tanaka, 1995). The perception of spatial relations between facial features is a fundamental component of face perception. Therefore, it would stand to reason that the discrimination of differences in these spatial relations, such as when the eye-to-mouth distance changes, would involve holistic comparisons and presumably engage these face-specific neuronal pathways. Yet here we discovered that, in terms of holistic processing, this is not always the case.

First, we found that although the perception of facial feature distances is usually impaired by inversion, this is not always the case. Indeed, if anything, we found a face inferiority effect (Suzuki & Cavanagh, 1995) with performance facilitated by inversion. In individuals with congenital prosopagnosia, performance in face recognition and discrimination is often better when the faces are inverted (Behrmann, Avidan, Marotta, & Kimchi, 2005), giving rise to the ‘inversion superiority effect’ (Farah et al., 1995). Research suggests that the face inversion effect in typical individuals results from a strong reliance on configural encoding (Bartlett & Searcy, 1993; Freire et al., 2000; Leder & Bruce, 2000; Le Grand et al., 2001; for a review, see Rossion & Gauthier, 2002; Searcy & Bartlett, 1996). The fact that their performance is unaffected by inversion, or perhaps even better for inverted than for upright faces, suggests that the deficit in facial recognition experienced by congenital prosopagnosics is due to an inability to process a face configurally. Similarly, the inverted inversion effect observed here suggests that participants used analytical processing to make geometrical discriminations. Since a basic tenet of the face inversion effect is the impairment of the perception of metric distances in the face, this seems quite an important finding.

It is also interesting to note that thresholds in the upright condition of Experiment 2A (in which participants experienced no inversion effect) and Experiment 4A (in which participants exhibited a strong inversion effect) did not vary in accuracy (Exp 2A: M = 16, SD = 5.2; Exp 4A: M = 18.7, SD = 2.92). Therefore, if participants used an analytical processing strategy in Experiment 2A, it did not decrease their sensitivity to metric distances in the face. This is surprising, since our profound ability to recognize individual faces compared to, for example, houses (e.g. Diamond & Carey, 1986; Yin, 1969) is often attributed to greater sensitivity to differences in faces and touted as evidence of our face-specific holistic processing skills. Yet these findings are consistent with a recent study by Konar, Bennett, and Sekuler (2010) that found no correlation between recognition accuracy and the size of the composite-face effect. This suggests that there is essentially no correlation between the ability to holistically process a face and correctly recognize a face, or at the very least, the relationship is less than straightforward. While we did not measure identification accuracy, we can expand upon their results and suggest that, like identification, holistic processing does not always influence sensitivity to metric differences in the face.

These results may also be relevant to the persistence of aftereffects from face adaptation across orientations. Using similar distortions to those in Experiment 1, Webster and MacLin (1999) demonstrated that adapting to a compressed face produces an aftereffect in which a previously normal face appears expanded (and vice versa). Moreover, this aftereffect is able to transfer across orientations, although the effect is notably smaller when tested in the opposite orientation (Watson & Clifford, 2003; Webster & MacLin, 1999). If holistic processing is specific to upright faces, then the ability of this aftereffect to transfer across orientations suggests that some high-level adaptation is occurring through object-processing mechanisms. Given the resilience of geometrical discrimination to inversion, this task may tap into the same properties that allow for aftereffects across orientations.

Our second important finding is that the use of holistic processing appears dependent upon at least three factors: (1) whether the task is natural, (2) the degree to which the face resembles the norm or prototypical face template, and (3) the orientation of the face.

Support for factor (1) derives from the differences in results observed in the geometrical discrimination task and the same–different/categorization tasks. Same or different judgments are a part of the recognition process and used when deciding if a person’s face matches a specific memory trace. Similarly, it is natural to categorize a face (e.g., male/female, Asian/Caucasian). Therefore, holistic processing has likely developed to support such actions. Rarely are absolute estimates of facial feature distances made in the natural world. It may be interesting to investigate whether an artist’s ability to discriminate these distances, specifically one who paints portraits, is more efficient than a novice’s judgments and uses the same method of encoding as a novice viewer.

Support for factors (2) and (3) derives from our inverted inversion effect. Face perception involves encoding the properties of the face relative to a norm or prototypical face template (Blanz, O’Toole, Vetter, & Wild, 2000; Lee, Byatt, & Rhodes, 2000; Leopold, O’Toole, Vetter, & Blanz, 2001; Loffler, Yourganov, Wilkinson, & Wilson, 2005; Rhodes & Jeffery, 2006). Here, our results suggest that the resemblance between a stimulus and the prototypical face is predictive of the depth of holistic encoding that a face will incur. The inverted inversion effect in Experiment 3 showed the fastest responses to the highly distorted inverted faces. This suggests that the task-relevant information was more accessible in these faces. Similarly, this information became less accessible when the faces were undistorted but still inverted, and even more so when undistorted and upright. This is to be expected if resemblance to a facial prototype is a factor in the depth of holistic encoding, since upright natural faces provide better matches to the face template than highly distorted faces. Therefore, we suggest that holistic processing is not an all-or-none phenomenon. Rather the depth of holistic encoding is a function of similarity to the face template, face orientation, and task demands.

Conclusions

Sensitivity to geometrical differences in faces was unaffected by inversion and showed no benefit for natural faces. However, response times in the geometrical discrimination task showed a benefit with inversion, especially when the inverted faces contained natural configurations. Based on this, we suggest that the geometrical discrimination task employed here required analytical processing. We further suggest that the depth of holistic processing depends on the nature of the task, orientation, and similarity between a face and the facial template.