The size–weight illusion (SWI) refers to the phenomenon whereby, when two objects of the same physical weight but different sizes are lifted, the smaller object usually feels heavier than the larger object (Charpentier, 1891; see also Nicolas, Ross, & Murray, 2012). The material–weight illusion (MWI) refers instead to the phenomenon whereby, when two objects of the same physical weight but different surface materials are lifted, the object with the less dense surface material (e.g., polystyrene) usually feels heavier than the object with the denser surface material (e.g., iron; Harshfield & DeHardt, 1970; Seashore, 1899).

Perhaps the most widely accepted explanation for the SWI and MWI is provided by the so-called expectation model (Flanagan, Bittner, & Johansson, 2008; Ross, 1969), according to which the SWI and the MWI reflect top-down perceptual processes. Over their lifetime, people learn that larger objects are typically heavier than smaller objects, and that objects made of denser materials are typically heavier than objects made of less dense materials. Therefore, people expect larger objects to weigh more than smaller objects, and they expect objects with denser surface materials to weigh more than objects with less dense surface materials. Perceived weight thus results from the contrast between the sensory information about the actual weight of an object and the cognitive information about its expected weight (see Footnote 1). Because of that contrast, if two objects of the same physical weight are expected to have different weights, then the object that is expected to be heavier will be perceived to be lighter. Support for the expectation model has been provided by studies showing that weight expectations generated prior to the lifting action can directly cause the SWI (Buckingham & Goodale, 2010; cf. Masin & Crestoni, 1988) and the MWI (Buckingham, Ranger, & Goodale, 2011), as well as by studies showing that the magnitude of the SWI is inversely related to the amount of participants’ experience with objects that reverse the natural size–weight correlation (Flanagan et al., 2008; Nakatani, 1985).

The expectation model provides a unified explanation for various weight illusions (see Buckingham, 2014, for a review), such as the SWI, the MWI, and the so-called golf-ball illusion. Ellis and Lederman (1998) asked golfers and nongolfers to judge the expected weight of regular and practice golf balls. Golfers correctly expected the regular balls to be heavier than the practice balls, whereas nongolfers, who lacked experience with the two kinds of balls, believed that they had approximately the same weight. The participants were then asked to judge the perceived weight of regular and practice golf balls that were artificially equated in physical weight. The results showed that golfers, who expected the regular balls to be heavier, perceived them to be lighter than the practice balls (i.e., the golf-ball illusion), whereas nongolfers, who expected the two kinds of balls to weigh the same, perceived them to weigh the same. The golf-ball illusion constitutes an outstanding example of the top-down effects of prior weight expectations on the perceived weights of objects.

Early researchers attempted to ground the contrast between actual and expected weight on relatively simple physiological mechanisms. Specifically, some researchers (Davis & Roberts, 1976; Gordon, Forssberg, Johansson, & Westling, 1991) hypothesized that objects that are lifted more easily would be perceived as lighter than objects that are lifted with more difficulty. Because objects that are expected to be heavier are lifted with greater force than objects of the same physical weight that are expected to be lighter, the former would be lifted more easily, and thus, they would be perceived to be lighter than the latter. Unfortunately, this relatively straightforward explanation of the effects of prior weight expectations on weight perception is probably incorrect. It has indeed been found that the lifting force rapidly adapts to the actual weight of objects; that is, it ceases to be affected by prior weight expectations after a few lifting trials. Despite this, weight illusions persist even after several lifts of the same set of objects (Brayanov & Smith, 2010; Flanagan & Beltzner, 2000; Flanagan et al., 2008; Grandy & Westwood, 2006). This finding suggests that the contrast between actual and expected weight is probably due to central perceptual–cognitive processes, rather than to peripheral physiological mechanisms.

From manipulation of one to manipulation of two nonweight stimulus properties: implications for the expectation model

Studies on illusory weight perception have typically been conducted by varying the physical weight of stimuli and one of their nonweight properties (e.g., size or surface material). However, researchers have recently started exploring the relationship between expected and perceived weight in studies in which two or more nonweight properties of the stimuli are jointly manipulated (Buckingham & Goodale, 2013; Buckingham & MacDonald, 2016; Dijker, 2008). Let us denote by $w$ the actual weight of objects, by $\varepsilon$ their expected weight, and by $\pi$ their perceived weight. Furthermore, let us suppose that $x$ and $y$ are two nonweight properties of the stimuli (i.e., properties different from $w$) that may be conveniently manipulated in experiments. Let $\xi_{\varepsilon}(x)$ and $\upsilon_{\varepsilon}(y)$ be real-valued functions (to be estimated) quantifying how the properties $x$ and $y$ influence the expected weight $\varepsilon$ of the stimuli. Similarly, let $\xi_{\pi}(x)$ and $\upsilon_{\pi}(y)$ be two real-valued functions quantifying how the same properties influence the perceived weight $\pi$ of the stimuli. We presume the following additive combination functions: $\varepsilon(x, y) = \xi_{\varepsilon}(x) + \upsilon_{\varepsilon}(y)$ and $\pi(w, x, y) = \omega(w) + \xi_{\pi}(x) + \upsilon_{\pi}(y)$, where $\omega(w)$ is a real-valued function quantifying the contribution of $w$ to perceived weight. Consistent with the basic features of the expectation model, we also presume $\pi(w, x, y) = \omega(w) - \varepsilon(x, y) = \omega(w) - \xi_{\varepsilon}(x) - \upsilon_{\varepsilon}(y)$; that is, perceived weight depends on the contrast between the sensory information about actual weight and expected weight. The two equations above for $\pi(w, x, y)$ imply the following equation:

$$ \xi_{\pi}(x) + \upsilon_{\pi}(y) = -\xi_{\varepsilon}(x) - \upsilon_{\varepsilon}(y). \tag{1} $$

In real experimental contexts, testing whether Eq. 1 holds true may constitute a valuable test of the validity of the expectation model. In this regard, we highlight two important considerations. The first is that the pair of functions $\xi_{\pi}(x)$ and $\upsilon_{\pi}(y)$ (left-hand side of Eq. 1) and the pair of functions $\xi_{\varepsilon}(x)$ and $\upsilon_{\varepsilon}(y)$ (right-hand side of Eq. 1) refer to different psychological variables (i.e., perceived and expected weight, respectively), and thus they should be estimated from different data sets. The second consideration is that the estimates of these functions may reasonably be defined, at best, as scale values on interval measurement scales, that is, scale values well determined “up to linear transformations.” Taken together, the two considerations imply that the measurement unit and the origin of the estimated scale values $\xi_{\pi}(x)$ and $\upsilon_{\pi}(y)$ may be somewhat arbitrary, and may be different from those of the estimated scale values $\xi_{\varepsilon}(x)$ and $\upsilon_{\varepsilon}(y)$. For the sake of argument, we note that testing whether $\xi_{\pi}(x) = -\xi_{\varepsilon}(x)$ and $\upsilon_{\pi}(y) = -\upsilon_{\varepsilon}(y)$ would not constitute a reliable test of Eq. 1, because the outcome of this test would also depend on arbitrary features of the scale values involved. In the following paragraphs we will discuss three possible tests of Eq. 1 that are independent of such arbitrary features. We will discuss these tests in terms of distinct predictions that should be met if Eq. 1, and thus the expectation model, is correct.

Prediction 1

For any fixed $w$, $\xi_{\pi}(x)$ and $\upsilon_{\pi}(y)$ should be monotonically decreasing functions if and only if $\xi_{\varepsilon}(x)$ and $\upsilon_{\varepsilon}(y)$ are monotonically increasing functions. In other words, the perceived weight of stimuli should decrease with $x$ and $y$ if and only if the expected weight of the stimuli increases with $x$ and $y$.

Prediction 2

For any fixed value of $w$, any pair $(x_1, x_2)$ of values of $x$, and any pair $(y_1, y_2)$ of values of $y$, the relative contribution of $x$ and $y$ to perceived weight should be the same (within the two pairs of values) as their relative contribution to expected weight, that is, $[\xi_{\pi}(x_1) - \xi_{\pi}(x_2)]/[\upsilon_{\pi}(y_1) - \upsilon_{\pi}(y_2)] = [\xi_{\varepsilon}(x_1) - \xi_{\varepsilon}(x_2)]/[\upsilon_{\varepsilon}(y_1) - \upsilon_{\varepsilon}(y_2)]$.
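The point of Prediction 2 can be illustrated with a minimal numerical sketch. All scale values below are hypothetical and are not data from the study. The helper names `contribution_ratio` and `rescale` are likewise introduced only for illustration. The sketch assumes that, within one condition, the two factors share a common measurement unit but may have different origins. Under that assumption, the ratio of differences is unchanged by the rescaling, whereas a pointwise comparison of scale values is not.

```python
def contribution_ratio(xi, upsilon, i1, i2, j1, j2):
    """Ratio [xi(x1) - xi(x2)] / [upsilon(y1) - upsilon(y2)] used in Prediction 2."""
    return (xi[i1] - xi[i2]) / (upsilon[j1] - upsilon[j2])


def rescale(values, unit, origin):
    """Interval-scale transformation v -> unit * v + origin (unit > 0)."""
    return [unit * v + origin for v in values]


# Hypothetical scale values (illustration only, not taken from the study).
xi_eps = [0.0, 1.0, 2.5]         # contribution of x to expected weight
ups_eps = [0.0, 0.5, 1.0, 2.0]   # contribution of y to expected weight

# Perceived-weight scales that satisfy Eq. 1 exactly ...
xi_pi = [-v for v in xi_eps]
ups_pi = [-v for v in ups_eps]
# ... but are estimated with another unit and with arbitrary origins.
xi_pi_obs = rescale(xi_pi, unit=3.0, origin=7.0)
ups_pi_obs = rescale(ups_pi, unit=3.0, origin=-2.0)

ratio_eps = contribution_ratio(xi_eps, ups_eps, 2, 0, 3, 0)
ratio_pi = contribution_ratio(xi_pi_obs, ups_pi_obs, 2, 0, 3, 0)

# The pointwise test xi_pi(x) == -xi_eps(x) fails although Eq. 1 holds.
naive_test_passes = all(p == -e for p, e in zip(xi_pi_obs, xi_eps))
# The ratio test is insensitive to the arbitrary unit and origins.
ratio_test_passes = abs(ratio_eps - ratio_pi) < 1e-12
```

In this sketch both ratios equal 1.25, so the ratio test passes while the pointwise test fails, which is why the predictions are stated in terms of quantities that do not depend on unit or origin.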

Prediction 3

In a set of stimuli that vary in x and y, if two stimuli have the same physical weight, then the stimulus that is expected to be heavier should be perceived to be lighter than the other stimulus (i.e., for any two stimuli A and B, if w(A) = w(B) and ε(A) > ε(B), then π(A) < π(B)).

To our knowledge, Predictions 1–3 have never been formalized and tested in a single study, though they have been separately tested in different studies. Consistently with Prediction 1, Dijker (2008) showed that whereas the perceived weight of toy dolls decreased with their apparent size and with their apparent physical strength, their expected weight increased along with both factors. Similarly, Buckingham and Goodale (2013) found that whereas the perceived weight of cubic boxes decreased with their size and with the density of their surface material, their expected weight increased along with both factors. To our knowledge, a clear violation of Prediction 1 has never been reported in the literature on illusory weight perception. Prediction 2 has been previously tested and disconfirmed in Dijker’s (2008) study, which showed that whereas the expected weight of toy dolls was affected more by their apparent size than by their apparent physical strength, their perceived weight was affected more by their apparent physical strength than by their apparent size. Prediction 3 has typically been confirmed in studies in which only one nonweight property of the stimuli was manipulated, such as the density of the surface material (Harshfield & DeHardt, 1970) or the brightness (Walker, Francis, & Walker, 2010). However, Buckingham and MacDonald (2016) presented the participants in their study with three objects that had approximately the same physical weight, but that differed in various nonweight properties (i.e., a golf ball, a foam soccer ball, and an inflated beach ball). The results showed that, inconsistently with Prediction 3, the golf ball was both expected and perceived to be heavier than the foam soccer ball, and the latter was both expected and perceived to be heavier than the inflated beach ball.

Apart from Predictions 1–3, the expectation model has also been tested through the comparison of the algebraic form of functions describing how size and surface material combine in affecting expected and perceived weight of objects. Buckingham and Goodale (2013) presented the participants in their study with four cubic boxes of the same physical weight, varying in surface material and size according to a 2 × 2 factorial design (cf. Ross, 1969). They found that the expected weight of the stimuli increased with the product of their size and the density of their surface material, whereas their perceived weight decreased with the sum of the two experimental factors. This finding was inconsistent with the predictions from the expectation model, according to which, if expected weight increases with the product of two experimental factors, then perceived weight should decrease with the product, rather than with the sum, of the two experimental factors.

In sum, the results of empirical studies appear to suggest that, when two or more nonweight properties of stimuli are jointly manipulated in the same experiment, some predictions from the expectation model may fail to be met. This finding has cast doubts on the idea that the perceived weight of objects may simply be explained in terms of the contrast between the actual and the expected weight. Various alternative hypotheses have been proposed (see Buckingham, 2014; Buckingham & Goodale, 2013; Buckingham & MacDonald, 2016; Dijker, 2008, 2014), to which we will return below. We wish to emphasize here that, when two nonweight properties of the stimuli are jointly manipulated in a single experiment, testing Predictions 1–3 appears to constitute a promising strategy for defining the extent and the limits of the expectation model.

In the present study, we varied the surface material and the size of the stimuli (i.e., cubic boxes) according to a 3 (surface material) × 5 (size) factorial design, while keeping their physical weight constant. We used a variant of the random conjoint measurement paradigm (RCM; Falmagne, 1976; Ho, Landy, & Maloney, 2008; Knoblauch & Maloney, 2012, chap. 8) in order to obtain subjective interval scales of the contribution of surface material and size to the expected and the perceived weight of the stimuli, which allowed us to test Predictions 1–3 from the expectation model. Our contribution bears some resemblance to Buckingham and Goodale’s (2013) study, though here we tested crucial predictions of the expectation model that are independent of the correct assessment of the additive versus multiplicative forms of the combination functions (see Footnote 2).

Experiment 1

Method

Participants

Ten participants (six females, four males) took part in the experiment. They were undergraduate or graduate students at the Faculty of Psychology, University of Padua, ranging in age from 20 to 30 years (M = 24.4, SD = 4.17). All had normal or corrected-to-normal vision and normal function of their upper limbs. They were naive as to the purposes of the experiment and received €20 for their participation.

Stimuli

The stimuli were 15 cubic boxes, each weighing 1.2 kg, built according to a 3 (Surface Material) × 5 (Size) factorial design. The surface material of the stimuli could be polystyrene (density = 0.017 g/cm³), wood (0.44 g/cm³), or black-painted solid clay (1.44 g/cm³). The sides of the stimuli varied from 14 to 26 cm across five levels spaced in uniform 3-cm steps (see Fig. 1). The boxes were made of an internal thick cardboard structure, which was uniformly filled with a variable amount of cotton wool, plasticine, and lead shot so that each box weighed 1.2 kg. Six square pieces of the corresponding surface material were glued on the faces of the cardboard structure. Because tiny slits appeared between the six pieces of the surface material, we attached strips of white Scotch tape to the sides of the boxes to cover the slits. In the case of the solid clay boxes, the tape strips were invisible because they were painted black (see Fig. 1).

Fig. 1. (a) A picture of the 15 experimental stimuli. From front to back, the surface materials are polystyrene, wood, and clay. (b) A detailed view of the three 20-cm stimuli. From left to right, the surface materials are polystyrene, wood, and clay.

Design

There were two experimental conditions, which we called the expected-weight condition and the perceived-weight condition. Each condition was divided into two separate sessions lasting 30–40 min each. Overall, each participant took part in four experimental sessions, which took place at least two days apart. All the participants first completed the two sessions of the expected-weight condition and then the two sessions of the perceived-weight condition.

In both conditions, the participants were presented with pairs of boxes (stimuli), and they were asked to indicate which of the two stimuli in each pair appeared to be heavier or whether they appeared to weigh the same. In the expected-weight condition, the comparative judgment had to be based only on the visual appearance of the stimuli; that is, touching the stimuli was not permitted. This was meant to quantify the contributions of surface material and size to the expected weight of the stimuli (see also Buckingham & Goodale, 2013; Buckingham & MacDonald, 2016; Dijker, 2008; Ellis & Lederman, 1998; Harshfield & DeHardt, 1970; Walker et al., 2010). In the perceived-weight condition, the participants used both hands together to lift each weight in turn. We could thus quantify the contributions of surface material and size to the perceived weight of the stimuli.

In both conditions, the participants were repeatedly presented with 30 pairs of stimuli. These pairs, which are represented by the black cells in Fig. 2, were conveniently selected among the set of all the possible 120 nonordered pairs of the 15 experimental stimuli (15 × 14/2 + 15). Pilot trials showed that, when two boxes with the same surface material but a different size were compared in the expected- or the perceived-weight judgment condition, the larger box was always expected to be heavier, and it was always perceived to be lighter than the smaller box. Similarly, for two boxes with the same size but a different surface material, the box with the denser surface material was always expected to be heavier, and it was always perceived to be lighter than the box with the less dense surface material. These preliminary results were fully consistent with the basic features of the SWI and the MWI, and they indicated that the comparisons between two boxes differing in one single variable (i.e., surface material or size) would produce obvious outcomes. We thus excluded these kinds of comparisons, corresponding to the dark gray cells in Fig. 2, from the final experimental design. For similar reasons, we also excluded the comparisons in which one stimulus had both a larger size and a denser surface material than the other stimulus (the light gray cells in Fig. 2). Finally, we excluded self-comparisons (the striped cells in Fig. 2). In sum, we included in the final experimental design only the 30 pairs of stimuli characterized by a “cue conflict,” meaning that the stimulus in the pair that had the larger size had the less dense surface material (black cells in Fig. 2).
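The selection of pairs described above can be checked by enumeration. The following sketch is only an illustration: the rank coding of the levels and the category labels (`self`, `single_factor`, `congruent`, `cue_conflict`) are our own assumptions, chosen to mirror the striped, dark gray, light gray, and black cells of Fig. 2.

```python
from collections import Counter
from itertools import combinations_with_replacement

MATERIALS = (1, 2, 3)     # density rank: polystyrene < wood < clay
SIZES = (1, 2, 3, 4, 5)   # size rank: 14 < 17 < 20 < 23 < 26 cm


def classify(a, b):
    """Classify a nonordered pair of stimuli, each coded as (material, size)."""
    (m_a, s_a), (m_b, s_b) = a, b
    if a == b:
        return "self"            # self-comparison
    if m_a == m_b or s_a == s_b:
        return "single_factor"   # stimuli differ in one property only
    # Both properties differ: same sign means the larger box is also denser.
    if (m_a - m_b) * (s_a - s_b) > 0:
        return "congruent"
    return "cue_conflict"        # the larger box has the less dense material


stimuli = [(m, s) for m in MATERIALS for s in SIZES]
pairs = list(combinations_with_replacement(stimuli, 2))
counts = Counter(classify(a, b) for a, b in pairs)
```

Under this coding the enumeration gives 120 pairs in total: 15 self-comparisons, 45 single-factor pairs, 30 congruent pairs, and 30 cue-conflict pairs, in agreement with the 30 pairs retained in the design.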

Fig. 2. A representation of all of the 120 possible nonordered pairs of the 15 experimental stimuli. The black cells correspond to the 30 pairs of stimuli that were included in the experimental design. The striped, light gray, and dark gray cells correspond to the pairs of stimuli that were excluded from the experimental design (see the text for explanations).

In each of the two conditions, the participants were randomly presented with four repetitions of the 30 nonordered pairs of stimuli thus selected. For each of the two conditions, the resulting 120 trials were split into two experimental sessions of 60 trials each, which lasted 30–40 min and took place at least two days apart. The left–right order of the stimuli in each pair was counterbalanced across repetitions. All of the participants first completed the two sessions relative to the expected-weight condition and then the two sessions relative to the perceived-weight condition. We note that our methods differ from the original version of the RCM paradigm (Ho et al., 2008) in two respects. First, we presented the participants with a subset of all the possible pairs of stimuli, whereas in the original version of the paradigm, the participants were presented with all possible nonordered pairs of the (visual) stimuli. Second, rather than using a classical two-alternative forced choice task, we used a “two-alternative nonforced choice” task, which means that the indifference response was always allowed. Factors that make the latter kind of task preferable to the former are indicated by, for example, García-Pérez and Alcalá-Quintana (2011, p. 2349).

Procedure

In each experimental session, the participants were seated at a square table (height 85 cm) with their elbows leaning on it. Before starting the first experimental session, they read and signed an informed consent form approved by the local ethics committee (Department of General Psychology, University of Padua). The experimenter sat in front of the participant. A screen was located approximately in the middle of the table, so that the participants could not see the positioning and removal of the stimuli, which were manually performed by the experimenter. The stimuli were placed on a thin layer of soft rubber in order to muffle the noise of the contact of the boxes with the hard surface of the table. Once the stimuli were set into place by the experimenter, the occluding screen was removed and the participants were allowed to evaluate the weights of the stimuli either visually or both visually and haptically, as we discuss below. After the participants had responded, the experimenter reset the occluding screen in its position, removed the previous pair of stimuli from the table, and placed a new pair of stimuli on the table.

At the beginning of the two sessions of the expected-weight condition, written instructions informed the participants that they would be presented with pairs of cubic boxes differing in size and material, and that their task was to indicate which of the two boxes could be expected to be heavier or if they presumably had the same weight. The instructions further specified that the boxes could not be touched at any moment during the experiment, and that the participants had to base their judgment only on the visual appearance of the boxes. Finally, the instructions specified that the material of the stimuli could be polystyrene, wood, or black-painted solid clay.

At the beginning of the two sessions of the perceived-weight condition, written instructions informed the participants that they would be presented with pairs of cubic boxes differing in size and material, and that their task was to lift the two boxes in turn, using both hands together for each lift, and to indicate which of the two boxes appeared to be heavier, or if they appeared to have the same weight. The instructions specified that the lifting action had to be performed by grasping each box on opposite sides and that the box on the left always had to be lifted first. The participants were further instructed to keep their elbows on the table during the lifting action, not to shake the boxes, and to respond only after the second lifting action had been completed.

Additive conjoint measurement model

We obtained estimates of the contributions of surface material and size to the expected and the perceived weights of the stimuli by fitting an additive conjoint measurement model to the individual data (see Ho et al., 2008). To define the model, let us denote by $m_1$, $m_2$, $m_3$ the three levels of the surface material variable (polystyrene, wood, and clay), and by $s_1, \ldots, s_5$ the five levels of the size variable (cubic boxes with sides of 14, 17, 20, 23, and 26 cm). Then, each stimulus object may be described as a pair $(m_i, s_j)$ of a level $m_i$ in surface material and a level $s_j$ in size, and simply denoted by $t_{ij}$. For example, $t_{1,4}$ denotes the 23-cm polystyrene box.

The relevant psychological property was expected weight in the expected-weight condition (only visual information available), and perceived weight in the perceived-weight condition (visual and haptic information available). In both cases, consistently with the basics of the additive conjoint measurement paradigm, we presumed that the surface material and size of a stimulus would contribute to the psychological property in the additive form; that is, for any stimulus $t_{ij}$, the combined contribution would be the sum $\mu(m_i) + \sigma(s_j)$, where $\mu(m_i)$ and $\sigma(s_j)$ are (as yet unknown) scale values associated with level $m_i$ of the surface material and level $s_j$ of size. Our choice of the additive response model was somewhat arbitrary, because we could equivalently use, for instance, a multiplicative response model. However, this arbitrary choice does not have any substantial implications for the outcomes of the study.

In each trial of the experiment, two stimuli, $t_{ij}$ and $t_{kl}$, were presented in a two-alternative nonforced choice task. Our model presumes that the response in the trial was guided by the following rule:

$$ t_{ij} \text{ is judged }
\begin{cases}
\text{lighter than } t_{kl} & \text{if } \mu(m_i) + \sigma(s_j) - \mu(m_k) - \sigma(s_l) + Z < -\theta, \\
\text{heavier than } t_{kl} & \text{if } \mu(m_i) + \sigma(s_j) - \mu(m_k) - \sigma(s_l) + Z > \theta, \\
\text{indifferent in weight from } t_{kl} & \text{if } -\theta \le \mu(m_i) + \sigma(s_j) - \mu(m_k) - \sigma(s_l) + Z \le \theta,
\end{cases} $$

where $Z$ is a noise random variable with standard normal distribution, and $\theta > 0$ is an (as yet unknown) decision criterion. In other words, we presume that in comparing stimulus $t_{ij}$ with stimulus $t_{kl}$, the corresponding additive measures $\mu(m_i) + \sigma(s_j)$ and $\mu(m_k) + \sigma(s_l)$ are taken into account, and $t_{ij}$ is judged to be lighter than $t_{kl}$ if the difference $[\mu(m_i) + \sigma(s_j)] - [\mu(m_k) + \sigma(s_l)]$, perturbed by noise variable $Z$, falls below a negative cut-point $-\theta$, heavier if it falls above the symmetric positive cut-point $\theta$, and indifferent if the difference lies between the two cut-points. This response rule is itself consistent with the random version of additive conjoint measurement (Falmagne, 1976; Ho et al., 2008). Note that in the rule, no mention is made of stimulus property $w$, that is, the physical weight of the boxes. This is because, in the expected-weight condition, this property was not made available to the participants, and it was constant over the whole set of stimuli in the perceived-weight condition.
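The response rule translates directly into three response probabilities, because the noise variable is standard normal. The sketch below is a minimal illustration of that translation. The function names and the example values of the difference and of the criterion are hypothetical and are not taken from the study.

```python
import math


def std_normal_cdf(z):
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def response_probabilities(delta, theta):
    """Probabilities of the three responses for the first stimulus of a pair.

    delta: [mu(m_i) + sigma(s_j)] - [mu(m_k) + sigma(s_l)]
    theta: positive decision criterion
    """
    p_lighter = std_normal_cdf(-theta - delta)        # delta + Z < -theta
    p_heavier = 1.0 - std_normal_cdf(theta - delta)   # delta + Z > theta
    p_indifferent = 1.0 - p_lighter - p_heavier       # -theta <= delta + Z <= theta
    return {"lighter": p_lighter,
            "heavier": p_heavier,
            "indifferent": p_indifferent}


# Hypothetical values: equal scale sums, then a positive difference.
probs_equal = response_probabilities(delta=0.0, theta=0.5)
probs_positive = response_probabilities(delta=1.0, theta=0.5)
```

When the difference is zero, the "lighter" and "heavier" responses are equally likely, and a larger criterion widens the band in which the indifference response is given.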

The additive model thus implies a set $[\mu(m_1), \mu(m_2), \mu(m_3), \sigma(s_1), \ldots, \sigma(s_5), \theta]$ of 3 + 5 + 1 = 9 unknown parameters, which may take different values in different experimental conditions (expected vs. perceived weight) and for different participants, and which we estimated by the method of maximum likelihood. Besides the estimates, the method also renders the statistic −log(max likelihood), which is a measure of the lack of fit: the greater the statistic, the worse the fit of the model to the data. This statistic is useful for judging the empirical suitability of the model by comparing it with analogous statistics implied for the same data by alternative models.
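As a rough sketch of the quantity being minimized, the negative log-likelihood of the additive model can be written as follows. The trial encoding, the function names, and all numerical values are hypothetical. No optimizer is included, since the text does not name the numerical routine used. The small floor on probabilities is an implementation convenience, not part of the model.

```python
import math


def std_normal_cdf(z):
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def neg_log_likelihood(mu, sigma, theta, trials):
    """-log likelihood of the additive model for one participant and condition.

    trials: iterable of (i, j, k, l, response), with 0-based level indices and
    response in {"lighter", "heavier", "indifferent"} for t_ij relative to t_kl.
    """
    total = 0.0
    for i, j, k, l, response in trials:
        delta = (mu[i] + sigma[j]) - (mu[k] + sigma[l])
        if response == "lighter":
            p = std_normal_cdf(-theta - delta)
        elif response == "heavier":
            p = 1.0 - std_normal_cdf(theta - delta)
        elif response == "indifferent":
            p = std_normal_cdf(theta - delta) - std_normal_cdf(-theta - delta)
        else:
            raise ValueError(f"unknown response: {response!r}")
        total -= math.log(max(p, 1e-12))  # floor avoids log(0)
    return total


# Hypothetical parameter values and responses (illustration only).
mu = [0.0, 0.8, 1.5]
sigma = [0.0, 0.3, 0.6, 0.9, 1.2]
theta = 0.5
trials = [(0, 3, 1, 0, "heavier"),
          (0, 4, 2, 1, "lighter"),
          (1, 4, 2, 0, "indifferent")]

nll = neg_log_likelihood(mu, sigma, theta, trials)
# Adding a constant to every material value leaves the likelihood unchanged.
nll_shifted = neg_log_likelihood([v + 10.0 for v in mu], sigma, theta, trials)
```

Because only differences between scale values enter the rule, shifting all material values (or all size values) by a constant does not change the likelihood, so the origin of each scale can be fixed by convention, for example by setting the value of the first level to zero.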

Saturated and single-factor models

For the sake of comparison, besides the additive model just described, we considered three other models to be tested on the same experimental data. One is the saturated model, that is, a model having as many parameters as the combinations of levels in our experimental design (plus the decision criterion parameter). When referred to the comparison between a stimulus $t_{ij}$ and a stimulus $t_{kl}$, the model presumes the following response rule:

$$ t_{ij} \text{ is judged }
\begin{cases}
\text{lighter than } t_{kl} & \text{if } \tau_{ij} - \tau_{kl} + Z < -\theta, \\
\text{heavier than } t_{kl} & \text{if } \tau_{ij} - \tau_{kl} + Z > \theta, \\
\text{indifferent in weight from } t_{kl} & \text{if } -\theta \le \tau_{ij} - \tau_{kl} + Z \le \theta,
\end{cases} $$

where the parameter $\tau_{ij}$ (for $i = 1, 2, 3$ and $j = 1, \ldots, 5$) represents the joint contributions of surface material $m_i$ and physical size $s_j$ to the relevant psychological property of the stimulus $t_{ij}$, and $Z$ and $\theta$ have the meanings described above. Because of its large number of parameters (3 × 5 + 1 = 16), this model was expected to attain the best fit to the data, that is, a minimum −log(max likelihood).

The other two models separately express the hypotheses that the relevant psychological property (i.e., expected or perceived weight) only depends on one of the stimulus factors in the experiment. Specifically, the material-only model involves 3 + 1 = 4 free parameters and presumes the following response rule:

$$ t_{ij} \text{ is judged }
\begin{cases}
\text{lighter than } t_{kl} & \text{if } \mu(m_i) - \mu(m_k) + Z < -\theta, \\
\text{heavier than } t_{kl} & \text{if } \mu(m_i) - \mu(m_k) + Z > \theta, \\
\text{indifferent in weight from } t_{kl} & \text{if } -\theta \le \mu(m_i) - \mu(m_k) + Z \le \theta,
\end{cases} $$

where $\mu(m_i)$ and $\mu(m_k)$ are the contributions of surface materials $m_i$ and $m_k$ to the relevant psychological property of stimuli $t_{ij}$ and $t_{kl}$. Similarly, the size-only model involves 5 + 1 = 6 free parameters and presumes the following response rule:

$$ t_{ij} \text{ is judged }
\begin{cases}
\text{lighter than } t_{kl} & \text{if } \sigma(s_j) - \sigma(s_l) + Z < -\theta, \\
\text{heavier than } t_{kl} & \text{if } \sigma(s_j) - \sigma(s_l) + Z > \theta, \\
\text{indifferent in weight from } t_{kl} & \text{if } -\theta \le \sigma(s_j) - \sigma(s_l) + Z \le \theta,
\end{cases} $$

where $\sigma(s_j)$ and $\sigma(s_l)$ are the contributions of sizes $s_j$ and $s_l$ to the relevant psychological property of stimuli $t_{ij}$ and $t_{kl}$. These two models are quite simple (small number of parameters), and thus they were expected to attain a worse fit to the data than would the additive model.

The four models form a three-level hierarchy, since the material-only and size-only models are nested within the additive model, which in turn is nested within the saturated model. The results we obtained in comparing these models will be commented on at the end of the next section.

Results and discussion

Table 1 shows the individual parameter estimates for the additive model in the expected-weight condition (part A) and in the perceived-weight condition (part B). Actually, the values in the tables are the differences $\mu(m_i) - \mu(m_1)$ for $i = 1, 2, 3$ and the differences $\sigma(s_j) - \sigma(s_1)$ for $j = 1, \ldots, 5$, because such transformations allowed us to simplify the comparison between the scale values. These transformations are permitted because, within the frame of the RCM paradigm, the parameter estimates are determined up to linear transformations. The solid lines in Fig. 3 show the parameter estimates reported in Table 1, as they were averaged across the ten participants. Specifically, the upper panels show the averaged contributions of the three levels of variable surface material (upper left panel), and of the five levels of variable size (upper right panel), to the expected weights of the stimuli. The lower panels show the analogous averaged contributions to the perceived weights of the stimuli. Note that in Fig. 3 and hereafter in the text, the subscript $\varepsilon$ or $\pi$ indicates that a parameter estimate refers to the expected-weight or the perceived-weight condition, respectively. The comparison between the upper and lower panels in Fig. 3 shows that, consistent with Prediction 1 from the expectation model, the expected weight of the stimuli increased with the density of their surface material and with their size, whereas their perceived weight decreased with both factors.

Table 1 Individual parameter estimates for the additive model in the expected-weight condition of Experiment 1 (A), the perceived-weight condition of Experiment 1 (B), and the perceived-weight condition of Experiment 2 (C)
Fig. 3

Solid lines indicate the mean estimates of the material parameters (left panels) and of the size parameters (right panels) in the expected-weight condition (top panels) and the perceived-weight condition (bottom panels) of Experiment 1. Dashed lines (bottom panels) indicate the mean estimates of the material parameters (left panels) and of the size parameters (right panels) in the perceived-weight condition of Experiment 2. The material parameters and size parameters are measures of the contributions of the two stimulus factors (surface material and size) to expected weight and perceived weight (see the text). The symbols m 1, m 2, and m 3 represent the three levels of the surface material variable (polystyrene, wood, or clay), whereas the symbols s 1, . . . , s 5 represent the five levels of the size variable (14, 17, 20, 23, or 26 cm to a side of the cubic boxes). The vertical bars represent the standard errors of the means

Figure 3 also shows that, contrary to Prediction 2, the relative contributions of surface material and size to the expected weight of the stimuli were very different from their relative contributions to the perceived weight of the stimuli. In support of this observation, we note that the difference between the contributions of the densest and the least dense surface material to the expected weight of the stimuli [μ ε (m 3) − μ ε (m 1)] was about twice as large as the difference between the contributions of the largest and the smallest sizes to the expected weight [σ ε (s 5) − σ ε (s 1)]. A paired-sample t test showed that the former difference was significantly larger than the latter, t(9) = 7.157, p < .001, Cohen’s d z = 2.26. In contrast, the difference between the contributions of the densest and the least dense surface material to the perceived weight of the stimuli [μ π (m 3) – μ π (m 1)] was, in absolute value, about 2.5 times smaller than the difference between the contributions of the largest and the smallest size to the perceived weight [σ π (s 5) – σ π (s 1)]. A paired-sample t test showed that the latter absolute difference was significantly larger than the former, t(9) = 5.821, p < .001, Cohen’s d z = 1.84.
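The paired comparison above can be sketched in a few lines. The helper names are hypothetical, and the sketch assumes the standard definitions: the paired t statistic is the mean of the within-participant differences divided by its standard error, and Cohen's d z equals t divided by the square root of the number of participants.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(first, second):
    """Paired-sample t statistic, degrees of freedom, and Cohen's d_z."""
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    sd = stdev(diffs)               # sample SD of the differences
    t = mean(diffs) / (sd / sqrt(n))
    d_z = mean(diffs) / sd
    return t, n - 1, d_z


def dz_from_t(t, n):
    """Recover Cohen's d_z from a reported paired t value and sample size."""
    return t / sqrt(n)


# Consistency check on the reported statistics (n = 10 participants).
dz_expected = dz_from_t(7.157, 10)
dz_perceived = dz_from_t(5.821, 10)
```

Applying `dz_from_t` to the two reported t values with n = 10 reproduces the reported effect sizes to two decimals, which is a quick way to check that the statistics are internally consistent.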

A possible interpretation of these results, which are clearly at odds with Prediction 2, is that in the expected-weight condition the participants overestimated the contribution of surface material relative to that of size. In order to test this hypothesis, we computed the contributions of the two experimental factors to the natural weight of the stimuli, which we determined by emptying the stimulus boxes and weighing their shells. In other words, the natural weight is the weight that a box would have had if it were not filled so as to have the same weight as the other boxes. Let n ij be the natural weight of the stimulus characterized by the ith level of factor Surface Material and the jth level of factor Size. Also let μ n (m i ) and σ n (s j ) be the contributions of the ith level of factor Surface Material, and of the jth level of factor Size, to the natural weight. For each i = 1, 2, 3, μ n (m i ) was obtained by averaging the natural weights of the five stimuli with the ith level of factor Surface Material. For each j = 1, . . . , 5, σ n (s j ) was obtained by averaging the natural weights of the three stimuli with the jth level of factor Size [i.e., for each i = 1, 2, 3 and j = 1, . . . , 5, μ n (m i ) = (Σ j n ij )/5 and σ n (s j ) = (Σ i n ij )/3]. For the sake of comparison with Fig. 3, Fig. 4 represents the differences μ n (m i ) − μ n (m 1) (left panel) and the differences σ n (s j ) − σ n (s 1) (right panel). The results clearly show that surface material contributed more than size to the natural weights of the stimuli. The difference between the contributions of the densest and the least dense surface materials to the natural weight [μ n (m 3) – μ n (m 1)] was about twice as large as the difference between the contributions of the largest and the smallest sizes to the natural weight [σ n (s 5) – σ n (s 1)]. These results show that the relative contributions of surface material and size to the expected weight were remarkably similar to the relative contributions of the two factors to the natural weight (compare the upper panels in Fig. 3 with Fig. 4). Therefore, the violation of Prediction 2 cannot be due to unrealistic weight expectations. We will discuss possible alternative interpretations of the violation of this prediction below.
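The averaging that defines μ n (m i ) and σ n (s j ) amounts to taking row and column means of the 3 × 5 matrix of natural weights. The sketch below uses a hypothetical function name and invented weights in grams; the real shell weights are not reported in this section, and the numbers were chosen only so that the material range comes out at twice the size range.

```python
def marginal_contributions(natural):
    """Row means (material contributions) and column means (size contributions).

    natural[i][j] is the natural weight of the stimulus with material level i
    and size level j.
    """
    n_materials = len(natural)
    n_sizes = len(natural[0])
    mu = [sum(row) / n_sizes for row in natural]
    sigma = [
        sum(natural[i][j] for i in range(n_materials)) / n_materials
        for j in range(n_sizes)
    ]
    return mu, sigma


# Hypothetical natural weights in grams (rows: polystyrene, wood, clay;
# columns: the five box sizes). Invented values, NOT measurements from the study.
natural_weights = [
    [100, 150, 200, 250, 300],
    [300, 350, 400, 450, 500],
    [500, 550, 600, 650, 700],
]

mu_n, sigma_n = marginal_contributions(natural_weights)
material_range = mu_n[-1] - mu_n[0]      # mu_n(m3) - mu_n(m1)
size_range = sigma_n[-1] - sigma_n[0]    # sigma_n(s5) - sigma_n(s1)
```

With these illustrative numbers, `material_range` is 400 and `size_range` is 200, mirroring the roughly 2:1 relation described in the text.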

Fig. 4

Estimates of the contributions of the three levels of the Surface Material factor (left panel) and of the five levels of the Size factor (right panel) to the natural weights of the stimuli. The symbols m 1, m 2, and m 3 represent polystyrene, wood, and clay, respectively, whereas the symbols s 1, . . . , s 5 represent 14, 17, 20, 23, and 26 cm to a side of the cubic boxes

For each pair of stimuli that we presented to the participants, Fig. 5a shows the probabilities that the stimulus represented in the row was expected to be heavier than the stimulus represented in the column. Likewise, Fig. 5b shows the probabilities that the stimulus represented in the row was perceived to be heavier than the stimulus represented in the column. On each trial, the participants’ responses were assigned the values 1, 0, and .5, depending on whether the stimulus in the row was judged to be heavier than, lighter than, or as heavy as the stimulus in the column, respectively. The probability values reported in Fig. 5a and b were obtained by averaging the participants’ responses across the four repetitions of each pair of stimuli. Therefore, in Fig. 5a (or Fig. 5b), a probability larger than .5 indicates that the stimulus in the row tended to be expected (or perceived) to be heavier than the stimulus in the column. The comparison between Fig. 5a and b reveals a surprising pattern of results, which is sharply at odds with Prediction 3 from the expectation model. With a few exceptions, the stimulus in the pair that tended to be expected to be heavier also tended to be perceived to be heavier. For instance, the 14-cm wooden box was both expected and perceived to be heavier than the 23-cm polystyrene box, with probabilities .75 and .94, respectively. We marked in boldface type the probability values corresponding to the few pairs of stimuli (eight) in which the stimulus that was expected to be heavier was perceived to be lighter.
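The response coding and the identification of "reversed" pairs can be illustrated as follows. The names and the response labels are hypothetical; the sketch only assumes the coding stated above (1, 0, and .5) and treats a pair as reversed when the two probabilities fall on opposite sides of .5.

```python
# Scores assigned to a single judgment about the row stimulus.
SCORES = {"heavier": 1.0, "lighter": 0.0, "equal": 0.5}


def preference_probability(responses):
    """Average the coded responses for one pair of stimuli."""
    return sum(SCORES[r] for r in responses) / len(responses)


def is_reversal(p_expected, p_perceived):
    """True when the stimulus expected to be heavier tended to be perceived as lighter."""
    return (p_expected - 0.5) * (p_perceived - 0.5) < 0


# Hypothetical responses over four repetitions of one pair (invented example).
p = preference_probability(["heavier", "heavier", "equal", "lighter"])

# The pair mentioned in the text (.75 expected, .94 perceived) is not a reversal.
same_direction = is_reversal(0.75, 0.94)
```

A probability of exactly .5 in either condition is not counted as a reversal here; that tie rule is an assumption of this sketch, since the text does not say how such cases were handled.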

Fig. 5

(a) In each cell, the probability (estimated from the data of Exp. 1) shows the likelihood that the stimulus in the row was expected to be heavier than the stimulus in the column. (b) Probabilities (estimated from the data of Exp. 1) that the stimulus in the row was perceived to be heavier than the stimulus in the column. (c) Probabilities (estimated from the data of Exp. 2) that the stimulus in the row was perceived to be heavier than the stimulus in the column. For panels a and b, the boldface numbers are those for the pairs of stimuli in Experiment 1 for which the stimulus expected to be heavier was also that perceived to be lighter. For panels a and c, the underlined numbers are those for the pairs of stimuli for which the stimulus expected to be heavier in Experiment 1 was perceived to be lighter in Experiment 2

A meaningful relationship does appear between the results in Fig. 3 (estimated scale values) and those in Fig. 5a and b (estimated probabilities). The upper panels in Fig. 3 show that the expected weights of the stimuli were mainly determined by the densities of their surface materials in a positive way. Indeed, as is shown in Fig. 5a, the stimulus in the pair with the denser surface material tended to be expected to be heavier than the stimulus with the less dense surface material, even if the former stimulus was smaller than the latter (e.g., the 14-cm clay box was expected to be heavier than the 26-cm polystyrene box). Instead, the lower panels in Fig. 3 show that the perceived weights of the stimuli were determined mainly by their sizes in a negative way. Indeed, as is shown in Fig. 5b, the stimulus in the pair with the smaller size tended to be perceived to be heavier than the stimulus with the larger size, even if the surface material of the former was denser than the surface material of the latter (e.g., the 14-cm clay box was perceived to be heavier than the 26-cm polystyrene box). We may say that, for the set of stimuli that we used, the SWI tended to prevail over the MWI, because with few exceptions the participants’ responses in the perceived-weight condition proved to depend on differences between the sizes of the stimuli, rather than on differences between the densities of their surface materials.

We conclude this section with some comments on the results we obtained in comparing the additive model with the saturated and single-factor models. The hypothesis that a model with fewer parameters performs worse than a model with more parameters can be tested by the likelihood-ratio criterion (Dobson & Barnett, 2008, p. 80), which is based on the computation of the log-likelihood-ratio statistic (LLR). Because we were interested in a general comparison between the four models (i.e., saturated, additive, size-only, and material-only), we fitted them to two pooled datasets: the set of data from the ten participants in the expected-weight judgment condition and the analogous set in the perceived-weight judgment condition (see Footnote 3). The essential statistics that we obtained are shown in Table 2, in which, besides the numbers of free parameters in the models, we give the −log(max likelihood) statistic resulting from each fit of a model to the data. The LLR statistics for the comparison between the additive model (i.e., the nested model) and the saturated model (the reference model) in the expected-weight and perceived-weight conditions were 1.68 and 9.7, respectively, both below the critical point χ 2(.05, 16−9) = 14.067. This means that the additive model should be preferred to the saturated model, because the former does not perform significantly worse than the latter in terms of goodness of fit and is simpler in terms of the number of parameters. The LLR statistics for comparing the size-only versus the additive model were 686.76 and 134.26, which fall well above χ 2(.05, 9−6) = 7.815; the statistics for comparing the material-only versus the additive model were 149.82 and 412.8, which are far larger than χ 2(.05, 9−4) = 11.070. We may then conclude that neither the size-only nor the material-only model is suitable to replace the additive model, because of their much worse fits to the data.
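The likelihood-ratio comparison can be sketched with the standard library alone. Two points are assumptions rather than statements from the paper: the LLR is taken to be twice the difference between the −log(max likelihood) values of the nested and the reference model (the usual definition), and the negative log-likelihoods in the example are placeholders, since the Table 2 values are not reproduced here. The function names are hypothetical. The chi-square tail probability is computed exactly for integer degrees of freedom through the recurrence of the regularized upper incomplete gamma function.

```python
from math import erfc, exp, gamma, sqrt


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, exp(-y)          # Q(1, y)
    else:
        a, q = 0.5, erfc(sqrt(y))    # Q(1/2, y)
    # Recurrence: Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1)
    while a < df / 2.0:
        q += y ** a * exp(-y) / gamma(a + 1.0)
        a += 1.0
    return q


def likelihood_ratio_test(nll_nested, k_nested, nll_reference, k_reference):
    """Return (LLR, df, p) for a nested model against its reference model."""
    llr = 2.0 * (nll_nested - nll_reference)   # assumed definition of the LLR
    df = k_reference - k_nested
    return llr, df, chi2_sf(llr, df)


# Placeholder negative log-likelihoods (hypothetical, NOT the Table 2 values);
# the parameter counts 9 and 16 are those of the additive and saturated models.
llr, df, p = likelihood_ratio_test(1000.84, 9, 1000.00, 16)

# Tail probabilities for two of the LLR statistics reported in the text.
p_additive_vs_saturated = chi2_sf(9.7, 7)     # perceived-weight condition
p_size_only_vs_additive = chi2_sf(134.26, 3)  # perceived-weight condition
```

Evaluating `chi2_sf` at the critical points quoted in the text with the matching degrees of freedom gives probabilities very close to .05, which is a convenient sanity check on both the function and the critical values.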

Table 2 For each model that we fitted to the pooled data of Experiment 1, the table shows the number of parameters (N Par.) and the values of the –log(max likelihood) statistic for the two experimental conditions

Experiment 2

So far, we have focused only on the contrast between actual and expected weight as a possible cause of weight illusions. However, Amazeen and Turvey (1996) showed that the negative relationship between size and perceived weight (i.e., the SWI) may also depend on people’s sensitivity to the rotational inertia of objects. Rotational inertia is a higher-order physical variable that refers to the resistance of objects to rotation in space, and it depends on a complex interaction between the masses, the sizes, and the shapes of objects (i.e., it depends on the mass distribution of the objects). The effects of rotational inertia on perceived weight appear to depend both on the mode of lifting and on the physical properties of the lifted objects. For instance, the effects of rotational inertia were found to be strong in the case of elongated objects that were rotated in space (Amazeen, 1997, Exp. 1; Amazeen & Turvey, 1996), but to be negligible in the case of spherical objects that were grasped and held in the hand (Zhu, Shockley, Riley, Tolston, & Bingham, 2013).

A tentative interpretation of the results of Experiment 1 is that the perceived weights of the stimuli depended not only on the contrast between the actual and expected weights (i.e., a top-down perceptual process), but also on participants’ sensitivities to rotational inertia (i.e., a bottom-up perceptual process). When an object is grasped and lifted, as in Experiment 1, weight expectations and rotational inertia are both available to the participants, because the former are prompted by the visual appearance of the stimuli (Buckingham & Goodale, 2010), whereas the latter can be “picked up” by the participants through the lifting action (Amazeen & Turvey, 1996). If weight expectations and rotational inertia jointly contributed to the perceived weights of the stimuli, then the negative relationship between the size of the stimuli and their perceived weight would depend upon two distinct and independent perceptual processes: first, the contrast between actual and expected weights, and second, the participants’ sensitivities to rotational inertia. By contrast, because rotational inertia is independent of surface material, the negative relationship between the density of the surface material of the stimuli and their perceived weight would depend only upon the contrast between the actual and expected weights. According to this hypothesis, the negative relationship between size and perceived weight should be stronger than that between the density of the surface material and perceived weight, because the former relationship would reflect two distinct perceptual processes rather than one. The results of Experiment 1 are consistent with this hypothesis, which might provide a suitable explanation for the violations of Predictions 2 and 3 from the expectation model. We designed Experiment 2 to provide an empirical test of this hypothesis.

Unlike in Experiment 1, the participants in Experiment 2 were asked to lift the stimuli by using a string that was attached to the top surface, because this procedure should eliminate or greatly reduce the perceptual information available regarding rotational inertia (Amazeen & Turvey, 1996). If in Experiment 1 the violation of the predictions from the expectation model was due to the bottom-up influence of rotational inertia on perceived weight, then those predictions should be met in Experiment 2, in which rotational inertia could not influence the perceived weights of the stimuli. Moreover, if rotational inertia had affected the perceived weights of the stimuli in Experiment 1, then the relative contribution of size to perceived weight should be smaller in Experiment 2 than in Experiment 1.

Method

Participants

Ten participants (five females, five males) took part in the experiment. They were undergraduate or graduate students at the Faculty of Psychology, University of Padua, and ranged in age from 20 to 26 years (M = 22.4, SD = 1.96). All had normal or corrected-to-normal vision and normal function of their upper limbs. They were naive as to the purposes of the experiment and received €10 for their participation. None of them had participated in Experiment 1.

Stimuli

The stimuli were the same as in Experiment 1, except that a strong 20-cm string was attached to the top surface of each box.

Design

The design of Experiment 2 was the same as that of Experiment 1, except that in the former we tested only the perceived-weight condition. Because the mode of lifting should not affect the expected weight of the stimuli, we compared the perceived weights of the stimuli in Experiment 2 with their expected weights in Experiment 1.

Procedure

The procedure was the same as in the perceived-weight condition of Experiment 1, except for the following details. The participants stood in front of the table and were asked to use their favored hand. They were instructed to lift the stimuli by grasping the string at the top end, and to move their forearm so that the stimuli were raised and lowered vertically, without any lateral motion or rotation.

Results and discussion

The dashed lines in the lower panels of Fig. 3 show the parameter estimates for the additive model that are reported in Table 1 (part C), as they were averaged across the ten participants. The results show that the relative contribution of size to the perceived weight of the stimuli was decidedly larger in absolute value than that of surface material, which means that Prediction 2 was also violated in Experiment 2. Moreover, Fig. 5c shows the probabilities that the stimulus in the row was perceived to be heavier than the stimulus in the column. The comparison between Fig. 5a and c reveals several violations of Prediction 3, in that, with only two exceptions, the stimulus in the pair that tended to be expected to be heavier (in Exp. 1) also tended to be perceived to be heavier (in Exp. 2). We underlined the probability values corresponding to the two pairs of stimuli in which, consistent with Prediction 3, the stimulus that was expected to be heavier was perceived to be lighter. In sum, clear violations of Predictions 2 and 3 also appeared in Experiment 2, which disconfirms the hypothesis that the violation of such predictions could be imputed to the bottom-up effects of rotational inertia on perceived weight.

Although the bottom-up influence of rotational inertia on perceived weight cannot explain the violation of the predictions from the expectation model, it still might have exerted some influence on the perceived weight of the stimuli in Experiment 1. In order to test this hypothesis, we compared the relative contributions of surface material and size in the two experiments by means of the measure [σ π (s 5) – σ π (s 1)]/[μ π (m 3) – μ π (m 1)]. We computed this measure on the individual data of both experiments: The larger the measure, the larger the contribution of size to the perceived weight, as compared with the contribution of surface material. An independent-sample t test showed that the measure in Experiment 2 was not significantly different from the measure in Experiment 1, t(9.04) = 0.6, p = .56, Cohen’s d s = 0.27. A Bayesian independent-sample t test provided support for the null hypothesis of equivalence between the two experiments in that measure, JZS BF01 = 2.21. We thus conclude that rotational inertia exerted a null or negligible influence on the perceived weights of the stimuli in our study. Zhu et al. (2013) obtained a similar result for a set of spherical objects differing in size, mass, and mass distribution. As regards the statistical comparison of the fits of the models to the data, the results were in line with those of Experiment 1: The additive model performed significantly better than the single-factor models, and it did not perform significantly worse than the saturated model.
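The between-experiment comparison can be sketched as follows. The function names and the parameter values are hypothetical. The fractional degrees of freedom in the reported test suggest an unequal-variance (Welch-type) correction; that reading is an inference of this sketch, not something the text states. Cohen's d s is recovered from t with the standard relation d s = t × sqrt(1/n1 + 1/n2).

```python
from math import sqrt
from statistics import mean, variance


def size_to_material_ratio(sigma, mu):
    """[sigma(s5) - sigma(s1)] / [mu(m3) - mu(m1)] for one participant."""
    return (sigma[-1] - sigma[0]) / (mu[-1] - mu[0])


def welch_t(x, y):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    n1, n2 = len(x), len(y)
    v1, v2 = variance(x) / n1, variance(y) / n2
    t = (mean(x) - mean(y)) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


def ds_from_t(t, n1, n2):
    """Recover Cohen's d_s from an independent-sample t value."""
    return t * sqrt(1.0 / n1 + 1.0 / n2)


# Hypothetical anchored estimates for one participant (invented values).
ratio = size_to_material_ratio([0.0, -0.5, -1.0, -1.5, -2.0], [0.0, -0.4, -0.8])

# Consistency check on the reported statistics (ten participants per experiment).
d_s = ds_from_t(0.6, 10, 10)
```

With the reported t value and ten participants per group, `ds_from_t` reproduces the reported effect size to two decimals.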

General discussion

The results of the present study showed some systematic violations of the predictions from the expectation model. These violations proved to be consistent across different modes of lifting (i.e., grasping the stimuli on opposite sides, or lifting them by means of a string attached to their top surface). Inconsistent with Prediction 2 from the model, the expected weight depended more on the surface materials than on the sizes of the stimuli, whereas the perceived weight depended more on their sizes than on their surface materials. The inconstancy in the relative contributions of the two factors between the two experimental conditions was also revealed by the fact that, for 22 out of the 30 pairs of stimuli in Experiment 1, and for 28 out of the 30 pairs of stimuli in Experiment 2, the stimulus expected to be heavier in a pair was also perceived to be heavier. This peculiar result is inconsistent with Prediction 3, and is especially at odds with the idea that the perceived weights of objects can simply be explained in terms of the contrast between their physical and expected weights. Buckingham and MacDonald (2016) recently reported a similar finding, and the results of our study extend and generalize this finding to a large set of objects that jointly differed in two properties—that is, surface material and size.

For technical reasons, the stimuli in our experiments were quite heavy (i.e., 1.2 kg), as compared with those typically employed in most weight illusion studies. Could the outcomes of our study have been somehow dependent on this feature of the stimuli? Weight illusions usually depend on small differences in perceived weight; thus, their magnitude is affected by weight discrimination (i.e., people’s sensitivity to weight differences). It has been shown (by, e.g., Jones & Burgess, 1998; Ross & Gregory, 1970; see Nicolas et al., 2012, p. 124, for a discussion) that weight discrimination is lower when the physical weights of the stimuli exceed the range of weights within which the perceptual system is optimally adapted. In the light of this finding, we cannot exclude that the absolute magnitudes of the SWI and the MWI in our study could have been lower than has been shown in studies in which lighter stimuli were used. Nonetheless, our test of the expectation model was based on hypotheses concerning the relative magnitudes of the two illusions, rather than on hypotheses concerning their absolute magnitudes. Insofar as weight discrimination affects the magnitudes of both illusions in similar ways, as it appears reasonable to presume, the main outcomes of our study can be generalized to a wide range of physical weights.

As we argued above, the violations of Predictions 2 and 3 from the expectation model could not be explained in terms of unrealistic weight expectations, or in terms of the bottom-up influence of rotational inertia on perceived weight. Dijker (2008, 2014) suggested that multiple perceptual processes may contribute to the perceived weights of objects. In Dijker’s (2008) study, the stimuli were toy dolls that varied in apparent size and apparent physical strength; similar to what we found in the present study, the author found that the relative contributions of the two experimental factors to the expected weight were clearly different from their relative contributions to perceived weight. Because the contrast between actual and expected weights, taken alone, could not explain the experimental results, the author hypothesized that the perceived weights of the toy dolls were also affected by the contrast between the force with which the dolls were lifted and their actual weights. However, the results of Buckingham and Goodale’s (2013) study showed that, for objects varying in surface material and size—as was the case with the stimuli in our experiments—the lifting forces do not appear to exert a significant influence on perceived weight. In considering this evidence, we suggest that Dijker’s (2008) explanation cannot be generalized to the results of our study. However, we add a note of caution, because the stimuli in Buckingham and Goodale’s (2013) study were lighter than those in our study, and little is known about the relationship between lifting forces and perceived weights in the case of heavy objects.

Buckingham (2014; see also Buckingham & Goodale, 2013; Buckingham & MacDonald, 2016) recently proposed a revision of the expectation model, according to which weight perception is affected by implicit, rather than explicit, weight expectations. Explicit weight expectations can be directly measured by asking participants to predict the weight of an object on the basis of its visual appearance, as we did in the present study. Implicit weight expectations would instead be impervious to consciousness and would depend on ontogenetic and phylogenetic development. Buckingham (2014) highlighted various differences between explicit and implicit weight expectations; for the purposes of our discussion, here we focus on the hypothesis that explicit expectations are mostly affected by the densities of the surface materials of objects, and only to a lesser extent by their sizes, whereas implicit expectations would mostly be affected by the sizes of objects rather than by the densities of their surface materials. Therefore, whereas the participants may explicitly expect that a small box with a relatively dense surface material (e.g., a 14-cm clay box) would be heavier than a large box with a low-density material (e.g., a 23-cm polystyrene box), they may implicitly expect that the latter would be heavier than the former. If the alleged difference between implicit and explicit expectations is correct, and if it is true that perceived weight is influenced by implicit rather than explicit expectations, then the results of our study would be consistent with Buckingham’s (2014) revised version of the expectation model.