Active, embodied distance judgments are constrained by nonoptical factors (Loomis & Beall, 2004; Loomis, da Silva, Fujita, & Fukusima, 1992; Mittelstaedt & Mittelstaedt, 2001; Proffitt, Stefanucci, Banton, & Epstein, 2003; Rieser, Pick, Ashmead, & Garing, 1995; Ziemer, 2012). Rieser et al. decoupled optical information about forward self-motion from actual walking rate (and thus from proprioceptive information about forward motion) by having participants walk on a treadmill towed by a tractor. When the treadmill speed was slower than the tractor speed (optic flow suggested faster walking), participants underestimated a target distance when later asked to blind-walk it. Likewise, participants overestimated when the treadmill speed was faster than the tractor speed. This result was explained in terms of visuomotor calibration between optical and proprioceptive information about self-motion.

Other nonoptical influences on distance reports resist visuomotor calibration explanations. Energy expenditure influences prospective (verbal) distance reports even when environmental (i.e., optical) cues remain constant (Proffitt et al., 2003; Witt, Proffitt, & Epstein, 2004). Although these claims have stirred controversy (Durgin, Baird, Greenburg, Russell, Shaughnessy, & Waymouth, 2009; Durgin, Klein, Spiegel, Strawser, & Williams, 2012; Proffitt, Stefanucci, Banton, & Epstein, 2006; Proffitt & Zadra, 2011; Zadra, Schnall, Weltman, & Proffitt, 2010), not all of the findings can be attributed to response bias or artifacts (Witt & Proffitt, 2008; Witt, Schuck, & Taylor, 2011). The metabolic cost of a future action appears to influence how perceivers apprehend the environment’s spatial layout.

We propose a common framework to account for the effects of visuomotor calibration and energy expenditure on action-based judgments of environmental layout. We hypothesized that perceivers are sensitive to a cross-modal informational variable (e.g., Mantel, Bardy, & Stoffregen, 2010; Stoffregen & Bardy, 2001; Streit, Shockley, & Riley, 2007) that captures the relation between information (proprioceptive and interoceptive) about the metabolic cost of locomotion and the coincident optical information about distance traversed (i.e., optically specified distance). We tested the prediction that action-based distance judgments (i.e., when an action—walking—is entailed by the perceptual reporting method) are a function of the energy required to traverse an optically specified distance. This quantity is captured by the multimodally specified energy expenditure (MSEE),

$$ \mathrm{MSEE} = \frac{\mathrm{Energy\ Expenditure}}{\mathrm{Optically\ Specified\ Distance}}. \tag{1} $$

Optically specified distance is the distance depicted as being traversed in a virtual environment (VE) used for testing, which is determined by the optic flow rate and the duration of walking.
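
Under this definition, and assuming for illustration that the optic flow rate is constant over the walking interval, Eq. 1 can be written out in terms of its determinants. The symbols $E$, $D_{\mathrm{opt}}$, $v_{\mathrm{flow}}$, and $t$ are introduced here only as shorthand and are not part of the original notation:

$$ \mathrm{MSEE} = \frac{E}{D_{\mathrm{opt}}}, \qquad D_{\mathrm{opt}} = v_{\mathrm{flow}}\, t, $$

where $E$ is the energy expended (e.g., liters of O2 consumed), $v_{\mathrm{flow}}$ is the optic flow rate depicted in the VE, and $t$ is the duration of walking. Written this way, MSEE can be changed either through $E$ (e.g., by changing walking rate or grade) or through $v_{\mathrm{flow}}$.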

Energy expenditure can be quantified by the amount of O2 consumed to complete a task (McArdle, Katch, & Katch, 2008), in this case walking an optically specified distance. We independently manipulated both determinants of MSEE. We presented participants with a referent target distance in a scene depicted in a head-mounted display (HMD). In a reporting phase, participants attempted to reproduce that distance by walking. For example, in the referent phase it may require 3 L of O2 to walk 12 m at 3 mph (12 m actually walked corresponds to 12 optically specified meters when the optic flow rate, which is controlled in the VE independently of actual walking speed, is also 3 mph). In this case, MSEE = 0.25 L/m. If, during the reporting phase, the walking rate is decreased to 2 mph while holding the optic flow rate at 3 mph, it may then take only 2 L of O2 to walk 12 optically specified meters (because the optic flow rate is faster than the walking rate). In this case, the MSEE decreases to 0.17 L/m, meaning that it appears easier to walk a given distance relative to the referent; that is, “easier” in the sense that more optical distance is traversed per unit of O2 consumed. Thus, if the target distance (12 m) is perceptually coded as MSEE = 0.25 L/m, then in order to reproduce that same relation between walked distance and MSEE, the distance reported by walking would have to be shorter than the referent (i.e., it would have to be 8 m). MSEE could be manipulated similarly using parameters other than walking rate that influence the energetic cost of walking (e.g., body mass or grade of inclination; Givoni & Goldman, 1971). Alternatively, MSEE could be manipulated independently of energy expenditure by changing the optic flow rate while holding walking rate constant. If optic flow rate were increased such that it required 2 L of O2 to walk 12 optically specified meters at an optic flow rate of 4 mph (MSEE = 0.17 L/m), then if a participant reproduced the relation between walked distance and an MSEE of 0.25 L/m (the MSEE presented in the referent), the participant would walk only 8 m.
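
The arithmetic of this example can be sketched in a few lines of Python. This is a minimal illustration, not the analysis code used in the study: the function names are hypothetical, and the proportional rule in `predicted_report_m` (reported distance scales with the ratio of reporting-phase MSEE to referent MSEE) is an assumption chosen because it reproduces the 8-m figure given above.

```python
def msee(o2_liters: float, optical_distance_m: float) -> float:
    """Eq. 1: O2 consumed divided by optically specified distance (L/m)."""
    if optical_distance_m <= 0:
        raise ValueError("optically specified distance must be positive")
    return o2_liters / optical_distance_m


def predicted_report_m(referent_distance_m: float,
                       referent_msee: float,
                       report_msee: float) -> float:
    """Assumed rule: reported distance scales with the MSEE ratio."""
    return referent_distance_m * report_msee / referent_msee


# Referent phase: 3 L of O2 over 12 optically specified meters.
referent = msee(3.0, 12.0)
# Reporting phase: 2 L of O2 over 12 optically specified meters.
low = msee(2.0, 12.0)
predicted = predicted_report_m(12.0, referent, low)

print(f"referent MSEE  = {referent:.2f} L/m")
print(f"reporting MSEE = {low:.2f} L/m")
print(f"predicted report = {predicted:.1f} m")
```

With the example values from the text, the sketch yields a referent MSEE of 0.25 L/m, a reporting-phase MSEE of about 0.17 L/m, and a predicted report of 8 m.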

Perceptual sensitivity to MSEE may account for the previously observed influences of visuomotor mapping manipulations on action-based distance judgments (Mohler et al., 2007; Rieser et al., 1995), and also for findings implicating energetic influences on distance perception (Proffitt et al., 2003). MSEE makes the same predictions about the influences of walking rate and optic flow rate manipulations as do internal model accounts of visuomotor mapping manipulations (Loomis & Beall, 2004; Rieser et al., 1995). Likewise, MSEE accounts for the changes in action-based distance judgments that accompany changes in effort.

In this experiment, we tested the basic prediction that action-based distance judgments depend on MSEE. We additionally manipulated different parameters of MSEE to produce either an identical increase or decrease in MSEE magnitude relative to the referent. If embodied distance perception is constrained by MSEE, identical changes in MSEE via different lower-order parameters should yield identical changes in action-based distance judgments; that is, those judgments should be insensitive to how the MSEE is manipulated.

Method

Participants

Seventeen healthy University of Cincinnati undergraduates (23.5 ± 5 years; 10 males, seven females) participated for course credit.

Materials and apparatus

The VE was presented using a Cyber Mind hi-ResVGA + HMD in SVGA format with a 42º field of view. Each display screen’s resolution was 800 (horizontal) × 600 pixels (vertical) × 3 color elements. The VE was dynamically refreshed at 100 Hz. Left and right screens displayed identical images. Head position and orientation were measured at 24 Hz using a magnetic tracker (FasTrak II; Polhemus, Inc., Colchester, VT) attached to the top of the HMD. A Pentium IV PC (Microsoft Windows XP, 768 MB RAM, ATI Radeon X1300 PRO video card) running customized C++ software used head position and orientation to render the VE through the HMD. The VE transformed veridically with participants’ vertical movement and head rotations (i.e., the optic flow rate manipulation was applied only to linear translation through the VE). The optic flow rate was not a function of participants’ actual head or body movements; it was independent of walking rate and of forward/backward head displacements. The VE (Fig. 1) was a corridor resembling a tunnel with gray brick walls and a ground surface resembling a road with yellow center lines. The initial viewpoint was from the center of the virtual corridor at a height proportional to the participant’s height.

Fig. 1 Methods of presenting the referent (target) distance (left) and of judging the target distance via magnitude production (right)

Participants walked on a motorized treadmill (Fitnex Fitness Equipment Inc., Model #4821). The speed range reflected normal walking rates (1.7–3.3 mph; cf. Cavagna, Saibene, & Margaria, 1963). Treadmill speed and grade were controlled by the computer that generated the VE.

A Biopac (Goleta, CA) real-time gas analysis system consisting of a face mask with tubes connecting to a gas chamber provided oxygen consumption (VO2) measurements at 20 Hz.

Procedure

Experimental task

During each trial for conditions involving action-based distance judgments, the participants walked on a treadmill while viewing the VE through the HMD. Participants walked a target (referent) distance indicated by a pair of “starting position” cones and a pair of “ending position” cones, and then attempted to reproduce the target distance by walking during the reporting phase. The target distance was always 12 m during testing. Participants were discouraged from using strategies such as counting footsteps or using visual landmarks and were told instead to simply indicate when they felt that they had traversed the target distance. At the beginning of each trial, each participant stood on the inactive treadmill (grade at 6º) wearing the HMD. The purpose of the 6º grade was to allow for a decrease in MSEE in the reporting phase relative to the referent. Once participants were comfortable, the treadmill was started with the walking and optic flow rates set at 3.0 mph. After participants had experienced the VE for approximately 20 s, a pair of start-line cones and a pair of finish-line cones representing the target distance appeared on the virtual road. After participants had walked the target distance, a pair of start-line cones appeared and a stopwatch was started. Participants were asked to reproduce the target distance by saying “stop” when they felt that they had traversed it (relative to the start-line cones), at which time the stopwatch was stopped.

Overview of experiment

The experiment included three periods (Fig. 2): (1) training (calibration to the VE and familiarization with the experimental task); (2) baseline testing and VO2 measurement (assessment of participants’ baseline distance perception ability and energy expenditure); and (3) testing (MSEE manipulations were introduced and action-based distance judgments were obtained).

Fig. 2 (a) Training. During the referent phase, the participant experienced two randomized trials using distances of 10 and 14 m, with a grade of 6º and walking at 3 mph. During the reporting phase, the trial ended when the participant said “stop,” and feedback was given. (b) Baseline testing. The participant walked a distance of 12 m with a grade of 6º while walking at 3 mph. The amount of time during the reporting phase determined the amount of VO2 measured. (c) VO2 measurement. Participants walked at four different rates (1.5, 2, 4, and 4.5 mph) with a constant grade of 6º. Additionally, they walked at four different grades (0º, 2º, 10º, and 12º) with a constant walking rate of 3 mph. A control condition was recorded with a 3-mph walking rate and a grade of 6º. (d) Testing. The manipulations occurred during the reporting phase of all trials

Training

Training familiarized participants with the task and calibrated them to the VE and treadmill by providing feedback on action-based distance judgments without MSEE manipulations. To ensure that participants were calibrated to the VE, but not trained on the specific distances used for testing, two different target distances (10 and 14 m) were used for the two training trials. After each judgment was obtained, the experimenter advised participants whether they had overestimated or underestimated (beyond a 2-m tolerance) the target distance.

Baseline testing and VO2 measurement

Participants walked the 12-m target distance in the referent phase and were asked to reproduce it in the reporting phase (without MSEE manipulation) three times. The average reported distance during baseline testing was chosen as the optically specified distance to be used during the testing phase (i.e., this was the optically specified distance that corresponded to what the participant reported as being 12 m without manipulating MSEE).

Because participants wore the HMD during baseline testing, VO2 could not be measured during those trials. After baseline testing, participants removed the HMD and donned the Biopac face mask in order to permit VO2 measurement. VO2 was measured during a time interval equal to the average reporting period for each participant. Once the participant had been walking at the proper speed for 20 s, VO2 was measured for 100 s (Fig. 2c). The time interval corresponding to traversing the target distance was sampled from the last section of each 100-s VO2 trial. This VO2 value was used to determine target MSEE values for the manipulations. Target MSEE values corresponding to a 25 % increase and a 25 % decrease were determined from these baseline MSEE measurements. VO2 measurement included the walking rate for the referent phase (3 mph) along with walking at four speeds (1.5, 2, 4, and 4.5 mph) with a constant grade of 6º, and four grades (0º, 2º, 10º, and 12º) at a constant speed of 3 mph, for a total of nine trials. Participants rested for 2 min between trials (see Footnote 1).

For each participant, two regression equations were generated on the basis of the measured VO2 for the walking rate and grade manipulations, respectively. These equations were used to determine the walking rate and grade changes required to achieve ±25 % MSEE changes relative to the referent. The VO2 for the baseline distance was used to determine the target MSEE (i.e., a ±25 % change relative to the referent) for the optic flow rate manipulation of MSEE (Table 1).
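
The per-participant regression step can be sketched as follows. This is an illustration under stated assumptions rather than the procedure actually used: the form of the regression is not specified here, so a simple linear fit of VO2 on walking rate is assumed; the VO2 values are invented (arbitrary units) solely to make the example run; and the function names are hypothetical. The same logic would apply to the grade manipulation.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def rate_for_target(slope, intercept, target_vo2):
    """Invert the fitted line to find the rate that yields target_vo2."""
    return (target_vo2 - intercept) / slope


# Walking rates from the VO2 measurement period (mph, grade fixed at 6 degrees).
speeds_mph = [1.5, 2.0, 3.0, 4.0, 4.5]
# Hypothetical VO2 values in arbitrary units; NOT data from the study.
vo2 = [1.6, 2.0, 2.8, 3.6, 4.0]

slope, intercept = fit_line(speeds_mph, vo2)
referent_vo2 = slope * 3.0 + intercept        # referent walking rate: 3 mph
low_rate = rate_for_target(slope, intercept, 0.75 * referent_vo2)
high_rate = rate_for_target(slope, intercept, 1.25 * referent_vo2)

print(f"-25 % VO2 at about {low_rate:.3f} mph")
print(f"+25 % VO2 at about {high_rate:.3f} mph")
```

Because the optic flow rate stays at the referent value during the walking rate manipulation, a ±25 % change in VO2 over the same optically specified distance corresponds to a ±25 % change in MSEE.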

Table 1 Illustration of a strategy to manipulate different parameters of multimodally specified energy expenditure (MSEE) to achieve a −25 % (low-MSEE) and a +25 % (high-MSEE) change

Testing

After the VO2 measurement was completed, distance judgment trials were implemented in random order. Each trial consisted of presentation of the referent (12 m between the cones, with walking and optic flow rates of 3 mph and a 6º grade), followed by the reporting phase. Immediately prior to the reporting phase, the experimental manipulation (walking rate, grade of inclination, or optic flow rate) was implemented. Action-based distance judgments were determined as the product of walking rate and the time elapsed between the start-line cone and the participant’s “stop” command.
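
The distance computation described above (walking rate multiplied by elapsed time) can be sketched as follows. The function name and the 10-s elapsed time are hypothetical and used only for illustration; the conversion factor from miles per hour to meters per second is exact.

```python
MPH_TO_M_PER_S = 0.44704  # 1 mph = 1609.344 m / 3600 s (exact)


def reported_distance_m(walking_rate_mph: float, elapsed_s: float) -> float:
    """Reported distance: walking rate times time from start cone to 'stop'."""
    if elapsed_s < 0:
        raise ValueError("elapsed time must be non-negative")
    return walking_rate_mph * MPH_TO_M_PER_S * elapsed_s


# Hypothetical trial: walking at 3 mph and saying "stop" after 10 s.
example = reported_distance_m(3.0, 10.0)
print(f"{example:.2f} m")
```

At the referent walking rate of 3 mph (about 1.34 m/s), the 12-m target distance corresponds to roughly 9 s of walking.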

Participants performed two grade manipulation trials (±25 % change in VO2 relative to the VO2 for the referent), two walking rate manipulation trials (±25 % change in VO2 relative to the referent), and two optic flow manipulation trials (±25 % change in optically specified distance relative to the referent). The strategy behind producing an identical increase or decrease in MSEE magnitude relative to the referent by independently manipulating each of the three parameters (Table 2) was to demonstrate that perceivers were sensitive to MSEE itself, rather than simply showing the same general pattern across the parameters (i.e., a main effect of MSEE was predicted). If perceivers were not sensitive to MSEE, per se, but rather to the lower-order parameters that defined MSEE, we should find either a main effect of mode of manipulation or an interaction.

Results

During training (matching walking and optic flow rates, with no manipulation), on 18 % of the trials participants overestimated, and on 3 % of the trials they underestimated the target distance. For the 10-m target distance, participants reported a mean distance of 10.44 m (SD = 2.45 m), and for the 14-m target distance, participants reported a mean distance of 14.96 m (SD = 3.63 m).

The mean values needed to produce the −25 % changes in MSEE relative to the referent were a walking rate of 1.96 mph (SD = 0.32 mph), a grade of inclination of −0.11º (SD = 2.43º), and an optic flow rate of 4.68 mph (SD = 0.62 mph). The mean values needed to produce the +25 % changes in MSEE relative to the referent were a walking rate of 3.65 mph (SD = 0.43 mph), a grade of 12.61º (SD = 4.97º), and an optic flow rate of 2.21 mph (SD = 0.23 mph). To evaluate whether manipulations of the MSEE parameters modulated MSEE as expected, the computed MSEE values for each participant were submitted to a 2 (MSEE change: +25 % and −25 %) × 3 (mode of manipulation: walking rate, grade, or optic flow rate) repeated measures analysis of variance (ANOVA). As is shown in Fig. 3a, we observed a significant main effect of MSEE change, F(1, 16) = 104.78, p = .001, $\eta_p^2$ = .87: MSEE was significantly greater in the +25 % than in the −25 % condition. Mode of manipulation was not significant, F < 1, $\eta_p^2$ = .03: The magnitudes of change did not differ significantly across the modes of manipulating MSEE. No significant interaction was evident, F(2, 32) = 1.42, p = .26, $\eta_p^2$ = .08.

Fig. 3 (a) Results for multimodally specified energy expenditure (MSEE); the dashed line represents MSEE in the referent condition (MSEE = 0.28). (b) Reported distances for a +25 % (high) and a −25 % (low) change in MSEE via three modes of manipulation; the dashed line here represents the reported distance in the referent condition (13.08 m). Error bars represent one standard error

The mean distance reported in the referent condition during testing was 13.08 m (SD = 1.27 m). The action-based distance judgments were submitted to a 2 (MSEE change) × 3 (mode of manipulation) repeated measures ANOVA. The results (Fig. 3b) confirmed our predictions: A significant main effect emerged for MSEE change, F(1, 16) = 25.06, p = .01, $\eta_p^2$ = .61, with shorter distances reported for −25 % than for +25 % MSEE. We found no mode-of-manipulation main effect, F < 1, $\eta_p^2$ = .06, and no significant interaction, F(2, 32) = 1.05, p = .36, $\eta_p^2$ = .06.
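
As a consistency check on the reported effect sizes, partial eta squared can be recovered from an F ratio and its degrees of freedom using the standard identity: partial eta squared equals F times the effect df, divided by the sum of F times the effect df and the error df. The sketch below applies it to the two fully reported tests for the distance judgments; the function name is hypothetical.

```python
def partial_eta_squared(f_ratio: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared from F and its degrees of freedom."""
    return (f_ratio * df_effect) / (f_ratio * df_effect + df_error)


# Main effect of MSEE change on distance judgments: F(1, 16) = 25.06
main_effect = partial_eta_squared(25.06, 1, 16)
# MSEE change x mode-of-manipulation interaction: F(2, 32) = 1.05
interaction = partial_eta_squared(1.05, 2, 32)

print(f"main effect: {main_effect:.2f}")
print(f"interaction: {interaction:.2f}")
```

Both values round to the effect sizes reported above (.61 and .06).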

Discussion

Manipulating walking rate and grade changed MSEE by changing the VO2 required to traverse a constant optically specified distance. Manipulating optic flow rate held energy expenditure constant but changed the visual consequences of expending energy to walk, thus also changing MSEE. Each manipulation increased or decreased MSEE by an equivalent amount. The changes in MSEE resulted in the predicted changes in action-based distance judgments, which were obtained by walking in the treadmill VE with the eyes open until participants felt that they had reproduced the target distance. Increasing MSEE was associated with an increase in reported distance, and decreasing MSEE with a decrease in reported distance. The modes of manipulating MSEE had equivalent effects. Action-based distance judgments were thus influenced by the macroscopic, cross-modal variable MSEE, but were insensitive to which of the lower-order variables determined MSEE.

We do not propose that MSEE accounts for all types of distance perception. In particular, our experimental task situated distance perception in the context of the embodied experience of traversing and then reproducing a distance (while vision was available; cf. Mittelstaedt & Mittelstaedt, 2001), so the judgments that we obtained involved a number of factors, including perceived self-motion and memory of the target distance. This initial test of the MSEE model shows, however, that perceiver sensitivity to a single cross-modal variable may provide a common framework for the previously observed action-related influences on action-based distance judgments. Perceivers may be directly sensitive to higher-order informational variables, defined as patterns extending across sensory–energetic media—that is, to patterns in the global array (Stoffregen & Bardy, 2001). Consistent with the present study, Mantel et al. (2010) suggested that egocentric distance perception may be specified not only by optic flow but also by the nonoptical consequences of observer motion. In the present study, direct sensitivity to structure in the global array was suggested by the fact that distance reports were sensitive to changes in MSEE but insensitive to the mode of manipulation of MSEE.

The MSEE model offers novel predictions regarding how other energetic manipulations should influence action-based distance judgments. For example, the finding that the grade of inclination influenced distance reports is, to our knowledge, a novel finding that is consistent with energetic-cost accounts of distance perception (Proffitt et al., 2003; Stefanucci, Proffitt, Banton, & Epstein, 2005) and is specifically predicted by the MSEE model. No existing models of distance perception predict that the particular optic flow rate manipulation should be perceptually equivalent to walking rate and grade manipulations, but this finding is also predicted by the MSEE model. The present influences on distance reports have been replicated in subsequent experiments (White, 2012). In addition, compliance of the locomotor substrate (Givoni & Goldman, 1971) and gait symmetry (White, 2012) influence the energetic cost of walking, and action-based distance judgments have been shown to conform to the predictions of the MSEE model under these manipulations (White, 2012). Similar manipulations that have energetic consequences should allow us to distinguish sensitivity to MSEE from factors such as the perceived speed of linear self-motion (cf. Mittelstaedt & Mittelstaedt, 2001).

It is unlikely that our results can be accounted for by a cognitive strategy (e.g., counting steps or visual landmarks) or by a response bias based on guessing the nature of the manipulations. Unimodal strategies such as counting steps or time would be unlikely to show an influence of optic flow rate. Likewise, counting landmarks would not be expected to show an influence of walking rate or grade. Thus, participants would have had to change their strategy for every type of manipulation and also have been able to cognitively modulate their reports in such a way as to conform precisely to our predictions. Indeed, we instructed participants to report on the basis of how far it felt that they had walked specifically in order to minimize the possibility that they could adopt simple unimodal strategies. Harrison and Turvey (2009) framed their instructions similarly in a blind-walking task: They told participants not to overthink the task, but to stop when it felt as if they had traveled to the target location. Durgin et al. (2009; Durgin et al., 2012) and Woods, Philbeck, and Danoff (2009) have criticized the methodology of several of Proffitt’s studies about the influence of energetic variables on verbal reports of perceived distance, concluding that the results may reflect response bias rather than a change in perception. Proffitt (2009, 2013) has acknowledged that it is still uncertain whether or when response bias may play a role in such results, but research continuing this line should use manipulations that are not intuitive to participants, such as the gait symmetry manipulation in White (2012). For the present experiment, had participants guessed the nature of the manipulations, they would have had to guess the multimodal nature of our hypothesized variable (i.e., that it was a function of both energy expenditure and optic flow rate), perceive the energy expenditure precisely in each condition, and modulate their reports to match how their perception should have changed, given the change in energy expenditure or optic flow rate.