Ever since Shepard’s (1987) classic theoretical studies on how objects might be represented, the idea of a perceptual space has been a cornerstone of theories of object perception. In a series of articles, culminating in the seminal universal law of generalization, Shepard developed a mathematical description of a representational space governed by similarity. In order to supply the theories with experimental data, Shepard was also instrumental in developing multidimensional scaling analyses, which have become the core methods for measuring and analyzing such representational spaces (Borg & Groenen, 2005).

Faces are a special class of objects for us humans: A picture of a face contains a wealth of information about a person’s mood, age, sex, and identity. These features are processed quickly and effortlessly in daily life, leading to a very high level of expertise for face processing that allows for rich social interaction. The idea of a representational space for faces has gained considerable momentum in face research since the influential publication of the face space framework of Valentine (1991). The face space framework is in essence a special case of the representational space proposed by Shepard, and posits that faces are represented as vectors in a lower-dimensional vector space. The average face—which is the mathematical average of all previously encountered face exemplars—forms the origin of this vector space. Typical faces lie closer to the average, and atypical or distinctive faces lie farther away. This can, for example, explain the advantage of remembering distinctive faces: Since these faces have fewer neighbors—which are, in addition, spread farther apart in face space—they are less prone to be confused with similar-looking faces, and hence can be remembered more easily. The face space framework has also been used to explain several other hallmarks of face perception, such as the caricature and inversion effects (Valentine, 1991), the own-race bias (Byatt & Rhodes, 2004), and processing of attractiveness (Potter, Corneille, Ruys, & Rhodes, 2007).
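In computational terms, the framework reduces to a few vector operations. The following minimal sketch (in Python; the number of faces and the dimensionality are purely illustrative and not taken from any of the cited studies) represents faces as vectors, the average face as the origin, and distinctiveness as distance from that origin:

```python
import numpy as np

rng = np.random.default_rng(0)
faces = rng.normal(size=(100, 5))      # 100 hypothetical faces in a 5-D face space
average_face = faces.mean(axis=0)      # the "norm": origin of face space

# Distinctiveness as Euclidean distance from the average face:
# typical faces lie close to the origin, distinctive faces far away.
distinctiveness = np.linalg.norm(faces - average_face, axis=1)
```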

A popular method for investigating perceptual spaces in general, and face space in particular, consists of similarity ratings followed by multidimensional scaling (MDS) analysis. MDS allows the experimenter to reconstruct the topology of the stimuli (e.g., faces) in a low-dimensional space. The dimensions of this space can be interpreted and made explicit by the inspection of extremes, as well as by additional rating experiments. Using a similarity rating experiment with faces, Johnston, Milne, Williams, and Hosie (1997) found that more-distinctive faces were located at the periphery of the reconstructed space, whereas typical faces were clustered around the origin; these results were highly consistent with the face space proposed by Valentine (1991). Together with Busey (1998), the study by Johnston et al. suggests that four to five dimensions are sufficient to explain rating variability. Post-hoc ratings showed these dimensions to be consistent with face width, age, amount of facial hair, and forehead size, respectively. Additional experiments showed that the face space of children is broadly similar to that of adults (Nishimura, Maurer, & Gao, 2009). Finally, a recent study (Papesh & Goldinger, 2010) demonstrated that other-race faces indeed seem to be clustered closer together than own-race faces in the reconstructed perceptual space, just as the face space framework would predict.

Adaptation experiments have provided further evidence for the usefulness of the face space framework: In Leopold, O’Toole, Vetter, and Blanz (2001), a morphable face model (Blanz & Vetter, 1999) was used to probe the perceptual reality of face space by using aftereffects. Participants first had to learn the identities of several faces. They were then adapted to “antifaces”—that is, faces that were generated by mirroring the training faces across the origin of face space (the average face). After participants had adapted, a subsequent presentation of the average face caused them to respond as if they were seeing the mirrored, original face. This experiment (as well as the other high-level face adaptation effects probed by Webster, Kaping, Mizokami, & Duhamel, 2004) provided strong evidence for a face representation in the form of a vector space. Similarly, recent neurophysiological and neuroimaging studies have shown that neural representations can also be explained well using the face space framework (Gao & Wilson, 2013; Leopold, Bondar, & Giese, 2006; Loffler, Yourganov, Wilkinson, & Wilson, 2005).
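The antifaces used in this paradigm have a simple vector-space description: a face's feature vector is reflected through the origin (the average face). A minimal sketch of this operation (the function name and array representation are illustrative, not part of the original morphable-model code):

```python
import numpy as np

def antiface(face: np.ndarray, average: np.ndarray) -> np.ndarray:
    """Mirror a face vector across the average face (the origin of face space)."""
    return average - (face - average)   # equivalently: 2 * average - face
```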

Whereas face processing is extremely well-studied in the visual domain, comparatively little is known about how the haptic modality might process face stimuli. Kilgour and Lederman (2002) first demonstrated that participants can identify unfamiliar live human faces and face masks using only their sense of touch, showing that face information may be shared across the senses. Since then, additional studies have confirmed this result using 3-D face masks. This previous research on haptic face recognition has shown that:

  • humans can haptically discriminate and identify faces at levels well above chance (Casey & Newell, 2007; Dopjans, Wallraven, & Bülthoff, 2009; Kilgour & Lederman, 2002);

  • haptic and visual processing of facial identity may be similarly orientation-sensitive (Kilgour, de Gelder, & Lederman, 2004; Kilgour & Lederman, 2006; but see also Dopjans, Bülthoff, & Wallraven, 2012; Dopjans et al., 2009);

  • haptic face processing is able to identify basic facial expressions (Lederman et al., 2007) and can lead to aftereffects similar to those observed in the visual domain (Matsumiya, 2013);

  • face information can be shared across the haptic and visual modalities bidirectionally to a certain extent (Casey & Newell, 2007; Kilgour & Lederman, 2002), with transfer being better from vision to haptics than from haptics to vision (Dopjans et al., 2009);

  • haptic face recognition in naive observers seems to be mostly feature-based initially, due to its reliance on serial encoding (Dopjans et al., 2012; Dopjans et al., 2009), but face recognition through serial encoding may be trained relatively quickly, with performance approaching that of standard, unimpeded visual recognition (Wallraven, Whittingstall, & Bülthoff, 2013);

  • visual experience with faces is necessary for full haptic face recognition performance, as was shown in a comparison of congenitally blind and late-blind groups (Wallraven & Dopjans, 2013); and

  • visual and haptic face recognition may be subserved by discrete neural mechanisms (Kilgour, Kitada, Servos, James, & Lederman, 2005; Kitada, Johnsrude, Kochiyama, & Lederman, 2009; Pietrini et al., 2004).

Although this body of evidence suggests that haptic face processing may share several core components of visual face processing, the extent to which processing in the two modalities is similar remains unclear. The goal of the present study was therefore to investigate the perceptual reality of a face space in haptic processing, potentially providing further evidence for the congruence of visual and haptic face processing. Crucially, the following experiment made a cross-modal comparison, examining both visual and haptic processing of the same stimulus set of faces. In order to compare the face space characteristics of visual and haptic face representations, a similarity-rating experiment was conducted, followed by an MDS analysis. The perceptual spaces of both modalities, as reconstructed by MDS, were then compared in terms of their dimensionality and their topology.

Experiment

The experiment was designed to compare the perceptual reconstruction of the similarity relations in a set of face stimuli that could both be seen and touched. Two groups of participants were tested, with modality (vision or touch) as a between-subjects factor.

Method

Participants

A total of 22 participants (12 female, 10 male; mean age = 23.4 years, SD = 2.2 years) were randomly split into a visual and a haptic group, with 11 participants in each. All participants had normal or corrected-to-normal vision, were right-handed, and had no history of deficits in either face processing or tactile processing. All participants gave informed consent and received a standard compensation of €8/h. The experiment was conducted according to the ethics guidelines of the Max Planck Institute for Biological Cybernetics, and participants and their data were treated in accordance with the Declaration of Helsinki.

Stimuli

The morphable MPI Face Database (Troje & Bülthoff, 1996), which contains 200 (100 female and 100 male) laser-scanned (Cyberware 3030PS) three-dimensional (3-D) heads without hair, was used for the experiment. All of the 3-D heads were put into correspondence, so as to allow morphing between individual heads (Blanz & Vetter, 1999). In a pilot rating experiment, 50 male and 50 female faces were first rated for their distinctiveness. From these faces, six exemplars (three male and three female) were picked that did not contain any artifacts from the scanning or correspondence procedure and that were rated as distinctive. These six faces were then averaged vertex by vertex to obtain an average face. Each of the six original faces was then morphed 50 % toward the average face, yielding six additional, less distinctive face exemplars. Finally, six more 50 % morphs were created between pairs of the original faces. The full face space, containing a total of 19 faces, is shown in Fig. 1.
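Because all heads are in vertex-wise correspondence, such morphs reduce to linear blends of corresponding vertex coordinates. A minimal sketch of this operation (array shapes and names are illustrative; the actual morphable-model machinery of Blanz & Vetter, 1999, is considerably richer):

```python
import numpy as np

def morph(mesh_a: np.ndarray, mesh_b: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """Linearly blend two (n_vertices, 3) meshes that are in correspondence."""
    return (1.0 - alpha) * mesh_a + alpha * mesh_b

# average = np.mean(np.stack(originals), axis=0)   # vertex-wise average face
# toward_avg = morph(originals[0], average)        # 50 % morph toward the average
```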

Fig. 1

Stimulus face space. The faces at the six corners of the outer hexagon are the six original faces, and the face in the middle, marked with an “A,” is the average of these six faces. Faces in-between the original faces on the outer hexagon are the 50 %-morphs, and faces on the dashed lines are the 50 %-morphs of an original face with the average face. Note that the relative distances of the faces in the figure were equalized for display purposes and do not correspond to actual physical distances

Since both the visual and haptic experiments were run on the same, tangible stimuli, in the next step all 19 faces were printed using a 3-D printer. All 19 faces were first prepared using a 3-D modeling tool (3DSMax, Autodesk, Canada) by giving them a shell structure and mounting them on a pedestal for later presentation in the experimental setup. Printing was performed by an Eden 250 printer (Objet, Israel), which jetted white, acrylic-based photopolymer materials in thin layers (16 μm) onto a build tray, building up the face masks layer by layer. The final face masks (see Fig. 2a for two examples) weighed about 138 ± 5 g each, and measured 89 ± 5.5 mm wide, 120 ± 7.5 mm high, and 103.5 ± 5.5 mm deep.

Fig. 2

(a) Two example stimuli from the set of 19 faces used in the experiment. (b) Snapshot of the exploration procedure in the haptic condition, showing the face mask fixed on the platform

Apparatus

The faces were positioned on a platform that was placed horizontally on top of a fixed table. All of the faces could be rigidly fixed to this platform and were always presented from a frontal view. Participants used a chinrest that was placed 30 cm away from the stand on which the objects were presented. A computer-controlled opaque screen separated the participants from the stand. During haptic exploration of the faces, an armrest was provided in order to prevent exhaustion (see Fig. 2b).

Procedure

Participants in the visual group first placed their chins on the chinrest. The left hand rested on a keyboard, so as to comfortably enter the similarity ratings using the number keys 1 (totally dissimilar) to 7 (totally similar). Next, a few test trials were run to acquaint participants with the number entry procedure. The experimenter then placed the first face on the platform. The screen opened, and the participant viewed the face for 6 s. The screen closed again, and the experimenter placed the second face on the platform. The screen opened again, and the participant viewed the second face for 6 s and responded on the keyboard. A total of 19 × (19 − 1)/2 + 19 = 190 fully randomized trials were presented, so that every face was compared with every other face (note that this design only compared face A followed by face B, and not the other way around; the assignment of faces A and B was randomized across participants). Breaks were given every 50 trials, splitting the experiment into four blocks (with the last block containing only 40 trials). The total duration of the experiment was around 50 min. Participants viewed the faces binocularly.
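The trial structure (all unordered pairs of the 19 faces plus the 19 identical pairs) can be sketched as follows; variable names are illustrative:

```python
import itertools
import random

faces = list(range(19))
pairs = list(itertools.combinations(faces, 2))        # 19 * 18 / 2 = 171 pairs
trials = pairs + [(f, f) for f in faces]              # plus 19 "same" trials = 190
random.shuffle(trials)                                # fully randomized order
# randomize which member of each pair is presented first, per participant:
trials = [(b, a) if random.random() < 0.5 else (a, b) for a, b in trials]
```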

For the haptic group, the overall procedure was similar, except that the screen stayed closed throughout the experiment. In order to touch the face stimuli, participants rested their arm in a sling and put their hand through a space beside the screen; this setup allowed for easy and comfortable exploration of the face. In addition, participants were instructed to keep their eyes closed, in order to fully concentrate on the haptic exploration. The exploration time for each face was set to 12 s, to compensate for the slower processing in haptics (e.g., Dopjans et al., 2009). Since haptic exploration of faces may be an unusual task at first, participants were given 20 test trials to get used to the experimental setup and to comparing two faces haptically. The total duration of the experiment was around 2 h 15 min, with six short breaks and one longer break in order to avoid fatigue.

Results

In the following section, the results of the similarity rating experiments are analyzed for both the visual and haptic conditions. The focus is first on the similarity ratings themselves, followed by the MDS analysis. In each case, a cross-modal comparison of the two conditions is also conducted.

Similarity ratings

Since participants' debriefing questionnaires indicated that the haptic task was perceived as more difficult overall, the first analyses assessed the stability and consistency of the ratings. First, the similarity ratings were considered for the 19 "same" trials, in which the two presented faces were identical, so that a similarity rating of 7 would be expected. The average similarity ratings for the "same" trials were 6.424 in the visual and 5.962 in the haptic condition; this difference was significant, as assessed by a Mann–Whitney test, U = 69.000, p < .001. These results show that participants in both conditions deviated somewhat from the ideal value when rating identical faces, with a larger deviation in the haptic condition.
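A sketch of this comparison using scipy (the rating arrays below are placeholders; the actual inputs would be the "same"-trial ratings pooled over the 11 participants in each condition):

```python
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(0)
same_visual = rng.integers(5, 8, size=19 * 11)   # placeholder: 19 trials x 11 participants
same_haptic = rng.integers(4, 8, size=19 * 11)   # placeholder
u_stat, p_value = mannwhitneyu(same_visual, same_haptic)   # two-sided by default
```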

Next, all similarity ratings were compiled into an upper-triangular 19 × 19 matrix for each participant. The individual matrices were then correlated across all participants, separately for the visual and haptic conditions, to assess rating consistency, yielding an upper-triangular 11 × 11 matrix of correlation values per condition. The averaged correlations for the two matrices were r_vis = .622 and r_hap = .505. Again, the difference between these two correlation values was significant in a Mann–Whitney test, U = 4,447, p < .001, showing that interparticipant correlations were higher in the visual condition than in the haptic condition.
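In code, this consistency analysis might look as follows (a sketch assuming the per-participant ratings are stored as 19 × 19 matrices whose upper triangle, including the diagonal "same" trials, holds the 190 ratings):

```python
import numpy as np

def rating_consistency(ratings: np.ndarray) -> np.ndarray:
    """ratings: (n_participants, 19, 19); returns r for each participant pair."""
    iu = np.triu_indices(19)                         # 190 entries incl. diagonal
    vectors = np.array([m[iu] for m in ratings])     # one rating vector per participant
    corr = np.corrcoef(vectors)                      # participant x participant r's
    return corr[np.triu_indices(len(vectors), k=1)]  # unique pairs: 11 * 10 / 2 = 55
```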

Overall, these results confirm the participants' subjective reports that the task was more difficult in the haptic condition: Both the ratings for "same" trials and the rating consistency across participants were lower in the haptic than in the visual condition. Nevertheless, despite the difficulty of the task, the absolute values of the two indicators show that participants were able to perform the task consistently.

Finally, in order to assess the consistency of the ratings across the visual and haptic conditions, the individual similarity ratings in both conditions were averaged to obtain two 19 × 19 similarity matrices. Correlating the two matrices yielded a high value of r = .797 between visual and haptic similarity ratings, showing that the ratings—when viewed as group averages—were consistent.

MDS analysis

Using the averaged similarity matrices for the visual and haptic conditions, a nonmetric MDS analysis was conducted using the mdscale command from MATLAB (MathWorks Inc., Natick, MA). Since nonmetric MDS is an iterative procedure, random initial configurations and ten replications of the MDS optimization were used to obtain the final solution. In order to determine the optimal dimensionality of the MDS solution, Kruskal's Stress-1 value was evaluated for one to five dimensions. Stress values below .2 indicate a sufficient reconstruction of the similarity ratings, and stress values of .1 indicate good reconstruction (Borg & Groenen, 2005; Cooke, Jäkel, Wallraven, & Bülthoff, 2007; Gaissert, Wallraven, & Bülthoff, 2010). The stress values for the visual condition were s_v(1) = .273, s_v(2) = .185, s_v(3) = .096, s_v(4) = .065, and s_v(5) = .048, and the values for the haptic condition were s_h(1) = .401, s_h(2) = .207, s_h(3) = .126, s_h(4) = .092, and s_h(5) = .067. Although the haptic stress values fall off more slowly than the visual ones, the analyses show that two to three dimensions are sufficient for both conditions. In the following discussion, three dimensions are used as a compromise between reconstruction quality and potential overfitting.
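An analogous analysis could be run in Python (a sketch using scikit-learn's nonmetric MDS in place of MATLAB's mdscale; note that the stress value below is computed directly from the raw dissimilarities, a simplification of Kruskal's Stress-1, which uses monotonically regressed disparities):

```python
import numpy as np
from scipy.spatial.distance import pdist
from sklearn.manifold import MDS

def approx_stress1(dissim: np.ndarray, n_dims: int) -> float:
    """dissim: symmetric (19, 19) dissimilarity matrix (e.g., 7 - similarity)."""
    mds = MDS(n_components=n_dims, metric=False, dissimilarity="precomputed",
              n_init=10, random_state=0)            # ten replications, as above
    coords = mds.fit_transform(dissim)
    d = pdist(coords)                               # fitted inter-point distances
    delta = dissim[np.triu_indices_from(dissim, k=1)]
    return float(np.sqrt(((d - delta) ** 2).sum() / (d ** 2).sum()))

# stress curve over one to five dimensions:
# stresses = [approx_stress1(dissim, k) for k in range(1, 6)]
```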

Figures 3a and b show the reconstructed perceptual spaces obtained from the MDS solutions. In both cases, the average face ("A") lies at the origin of the perceptual space, with the six original faces located mostly at extreme positions. Overall, the morphed faces (between individual faces and with the average face) also seem to be located closer to the average face. The latter observation is an important prediction of the face space framework (Busey, 1998; Johnston et al., 1997; Valentine, 1991): Morphing faces should make them less distinctive, and hence reduce their overall distance in the perceptual space as compared to that of the corresponding original faces. To quantify this effect, Fig. 4 shows the results of calculating normalized distances for the two types of morphed faces that were tested. First, the Euclidean distances in the 3-D spaces shown in Figs. 3a and b were determined between the original faces and the average face (d_orig). Then the distances for all morphed faces (d_morph) were determined and divided by the distance of the corresponding original face to the average face, in order to normalize the data. If d_morph/d_orig < 1, then the morphed face lies closer to the average face than does the corresponding original face. As Fig. 4 shows, this is true for all faces in the two conditions. Overall, both the visual and haptic perceptual spaces were highly consistent with the distance relations predicted by the face space framework.
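The normalized distances reduce to simple ratios of Euclidean distances in the MDS solution. A sketch (the index variables are illustrative bookkeeping, not part of the original analysis code):

```python
import numpy as np

def normalized_distance(coords: np.ndarray, morph: int, orig: int, avg: int) -> float:
    """coords: (19, 3) MDS coordinates; returns d_morph / d_orig."""
    d_morph = np.linalg.norm(coords[morph] - coords[avg])
    d_orig = np.linalg.norm(coords[orig] - coords[avg])
    return d_morph / d_orig   # < 1: the morph lies closer to the average face
```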

Fig. 3

Two-dimensional projection of the three-dimensional perceptual spaces, using multidimensional scaling, of the (a) visual and (b) haptic similarity ratings. Units on the x- and y-axes are arbitrary, and the numbers in the plot correspond to the numbers shown in Fig. 1. Colors are used to help identify different faces more easily, with the six original faces shown in fully saturated colors, and all other morphed faces shown as half-saturated colors. The projection is made onto the first two dimensions of the MDS solution

Fig. 4

Normalized distances for the visual and haptic conditions for the two types of morphed faces. “x/A” indicates morphs of individual faces with the average face, and “x/y” indicates morphs between two individual faces

Although the overall topology of the visual and haptic spaces conformed to what is expected, the two perceptual spaces were not identical. In particular, the locations of the original faces around the perimeter differed in the two cases. In order to quantify this difference, a Procrustes analysis was used to determine a rotation and scaling between the two spaces, such that the distances between corresponding points in the transformed spaces were minimized. This analysis yielded a final goodness-of-fit measure, for which a value of 0 would indicate a perfect fit, and higher values a worse fit. In this case, the goodness-of-fit measure was gof = .445, which is notably higher than the typical values of around gof = .2 obtained using the same experimental design and settings with novel objects instead of faces (Cooke et al., 2007; Gaissert et al., 2010). Hence, although the two perceptual spaces do share some characteristics, the difference between the visual and haptic face spaces seems to be larger than the difference between the visual and haptic shape spaces.
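scipy offers a directly comparable routine (a sketch with placeholder coordinates; scipy's disparity is the sum of squared pointwise differences after optimal translation, rotation, and scaling of the standardized configurations, with 0 indicating a perfect fit):

```python
import numpy as np
from scipy.spatial import procrustes

rng = np.random.default_rng(0)
visual_coords = rng.normal(size=(19, 3))   # placeholder for the visual MDS solution
haptic_coords = rng.normal(size=(19, 3))   # placeholder for the haptic MDS solution
_, _, disparity = procrustes(visual_coords, haptic_coords)   # 0 = perfect fit
```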

Next, an attempt was made to determine the characteristics of the face space dimensions for both conditions. For this, the 3-D coordinates of each face were projected onto each of the three main dimensions and sorted from low to high values. The resulting continua were then rated by two independent groups of eight raters each, who were asked to describe the most likely attribute along which the faces varied. Raters in the visual group were only allowed to see, and those in the haptic group only allowed to touch, the faces, which were lined up on a table. Raters in the haptic group were additionally blindfolded and acquainted with the stimuli prior to rating, in the same way as described above. The most often-named attributes for the three visual dimensions were age (7/8 raters), size/aspect ratio (8/8 raters), and mouth shape/expressivity (5/8 raters). The most often-named attributes for the three haptic dimensions were nose shape (5/8 raters), size/aspect ratio (5/8 raters), and texture (3/8 raters). For the visual domain, the first two dimensions conform well to those found to be important by Busey (1998) and Johnston et al. (1997). Although the faces were scanned with a neutral expression, the mouth shapes on two scans were slightly asymmetric, which may explain why expressivity was mentioned as a candidate for the third dimension. Overall, raters were less consistent for the haptic dimensions; nevertheless, important local shape features (shape of the nose) as well as global shape features (size/aspect ratio) also emerged here. The third dimension may also have been related to age, since the older faces in the database have more texture. Finally, nose shape is an interesting feature, since all participants in the haptic condition did "scan" the nose during exploration, whereas the frontal presentation in the visual condition made nose shape less easily accessible (although shadows cast by the overhead lighting in the room did provide cues to nose shape). Overall, some agreement was apparent between the face space dimensions in vision and haptics: Most notably, the second dimensions were similar in terms of stimulus characteristics, emphasizing global shape. The first and third dimensions, however, differed between the two conditions, with haptic exploration seeming to emphasize more tactile stimulus properties.

General discussion

This study was designed to investigate the perceptual spaces of face shapes explored visually and haptically. Although the task seemed somewhat more difficult and less consistent in the haptic modality, the similarity ratings overall correlated well on a group level. Multidimensional scaling analyses showed that two to three dimensions were enough to capture most of the variance in the similarity data. Importantly for this discussion, the topology of both perceptual spaces preserved distance relationships, as predicted by the face space framework; most notably, the average face was located at the origin of the face space, and morphed faces were closer to the average face than were their corresponding original sources.

Although the overall topology for both modalities conformed to the requirements of a face space, differences emerged in the precise locations of the tested faces between vision and haptics. The underlying dimensions of the two spaces shared one stimulus property (size/aspect ratio), which was related to global shape, but they differed in the interpretation of the other two dimensions. Interestingly, the dimensions recovered from the visual similarity ratings did conform to those found in other studies using textured faces (Busey, 1998; Johnston et al., 1997). This was true despite the fact that the stimulus materials used here differed, containing no variation in hair or skin color. Nevertheless, it seems that in this case age (which was apparent mainly through wrinkles on the face) could also be extracted reliably from the coarser shape data alone.

In the haptic domain, one local shape feature (nose shape) was the most prominent feature for the face set. This may have been due to a difference in the exploration “views” that participants had in the visual versus the haptic condition. Indeed, previous studies have shown that one of the benefits of haptics in object exploration is that the hand can explore the back of an object, which is otherwise occluded from the visual frontal viewpoint (Newell, Ernst, Tjan, & Bülthoff, 2001). In future studies, it would be interesting to allow more freedom in the visual condition to offer different viewpoints, in order to see how much the dimensions of face space would be affected (see, e.g., the studies using active exploration by Gaissert et al., 2010, and Lee & Wallraven, 2013).

The differences between the two perceptual spaces may also be due to modality-specific biases (such as exploration mode, or the possibility that haptic processing had reached its limit in terms of stimulus complexity) or to a lack of training in haptic face recognition. Concerning the first point, haptic processing is serial and comparatively slow, whereas visual processing is parallel and fast. Indeed, several studies have demonstrated that, though haptic recognition of face shapes is readily possible, untrained performance is lower than that of visual face recognition (e.g., data using the same face shapes as in the present study showed a d' difference of around 1 between visual and haptic face recognition; Dopjans et al., 2009). Interestingly, when visual processing is made similar to haptic processing through the use of a small aperture that is moved across the visual environment, such visually restricted recognition performance drops significantly, and even ceases to show a face inversion effect (Dopjans et al., 2012). Hence, exploration mode may place an important restriction on efficient access to face-specific information in the untrained case. In a follow-up study, however, Wallraven and Dopjans (2013) showed that, with only a few hours of training in this unusual exploration mode, performance recovered rapidly and even started to show signs of an inversion effect again. Transferring these results to the haptic modality, similar training in haptic face recognition might improve haptic processing strategies in a similar fashion. In particular, given enough experience at individuating faces haptically, important dimensions such as age and gender might start to appear more prominently in the haptic domain as well. Indeed, inasmuch as expertise is an important part of Valentine's (1991) original face space framework, additional training may be a crucial ingredient in forming a more stable face space. Further experiments using haptic face training will be necessary to elucidate these factors and their impact on the present results.

Another issue concerns the degree to which the haptic results are driven by purely haptic processing—that is, whether there is evidence for a unique haptic face space. An alternative explanation may be that participants in the haptic condition simply mapped their visual face space onto the haptic input. As reported earlier, information transfer in face processing is less efficient from haptics to vision than in the opposite direction (Dopjans et al., 2009), which would explain the noisier "haptic" face space. Aside from the current debate on whether haptic and visual processing share one representation or combine multiple, separate representations (Gaissert et al., 2010; Lacey, Tal, Amedi, & Sathian, 2009; Norman, Clayton, Norman, & Crabtree, 2008), a critical test of the influence of visual information would be to repeat the present experiment with congenitally blind and late-blind individuals. Performance differences between the two groups of blind individuals and the sighted individuals tested here would serve to elucidate the importance of prior visual experience with faces for forming a face space (see, e.g., Norman & Bartholomew, 2011, and Wallraven & Dopjans, 2013, for studies that highlight both similarities and differences between vision and haptics using blind and sighted participant groups).

Finally, one may ask how much the results reported here are actually specific to faces. As stated in the introduction, the idea of a face space is in theory a specific instantiation of a representational space in the sense of Shepard (1987), and hence nothing is "special" about a face space per se. High-level adaptation effects such as those observed for faces in the visual domain (Webster et al., 2004), however, have yet to surface for other objects, making the visual face space at least different from other object spaces. In addition, as stated in the introduction, haptic face recognition shares several critical behavioral characteristics, and also neurophysiological substrates, with visual face recognition, providing further support for the notion of special processing of faces even in haptics.

In summary, these results add one more potential piece of evidence for a core characteristic of face perception in the haptic domain—that of a well-structured perceptual space compatible with the predictions of the face space framework. Further experiments will be needed, however, to probe additional characteristics of haptic face space (e.g., caricatures [Valentine, 1991], adaptation to antifaces [Leopold et al., 2001], or exemplar-based vs. norm-based encoding [Ross, Deroche, & Palmeri, 2013]), the influence of visual face processing on the haptic results, and the capabilities and limits of face and complex-shape processing in haptics in general (Gaissert et al., 2010; Lacey et al., 2009; Norman et al., 2008).