Context dependency in risky decision making: Is there a description-experience gap?

Inkyung Park; Paul D. Windschitl; Andrew R. Smith; Shanon Rule; Aaron M. Scherer; Jillian O. Stuart

doi:10.1371/journal.pone.0245969

Abstract

When making decisions involving risk, people may learn about the risk from descriptions or from experience. The description-experience gap refers to the difference in decision patterns driven by this discrepancy in learning format. Across two experiments, we investigated whether learning from description versus experience differentially affects the direction and the magnitude of a context effect in risky decision making. In Studies 1 and 2, a computerized game called the Decisions about Risk Task (DART) was used to measure people’s risk-taking tendencies toward hazard stimuli that exploded probabilistically. The rate at which a context hazard caused harm was manipulated, while the rate at which a focal hazard caused harm was held constant. The format by which this information was learned was also manipulated; it was learned primarily by experience or by description. The results revealed that participants’ behavior toward the focal hazard varied depending on what they had learned about the context hazard. Specifically, there were contrast effects in which participants were more likely to choose a risky behavior toward the focal hazard when the harm rate posed by the context hazard was high rather than low. Critically, these contrast effects were of similar strength irrespective of whether the risk information was learned from experience or description. Participants’ verbal assessments of risk likelihood also showed contrast effects, irrespective of learning format. Although risk information about a context hazard in DART does nothing to affect the objective expected value of risky versus safe behaviors toward focal hazards, it did affect participants’ perceptions and behaviors, regardless of whether the information was learned from description or experience. Our findings suggest that context has a broad-based role in how people assess and make decisions about hazards.

Citation: Park I, Windschitl PD, Smith AR, Rule S, Scherer AM, Stuart JO (2021) Context dependency in risky decision making: Is there a description-experience gap? PLoS ONE 16(2): e0245969. https://doi.org/10.1371/journal.pone.0245969

Editor: Darrell A. Worthy, Texas A&M University, UNITED STATES

Received: July 23, 2020; Accepted: January 11, 2021; Published: February 11, 2021

Copyright: © 2021 Park et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: All data files are available from the OSF database (https://osf.io/k93xv/).

Funding: This work was supported by the National Science Foundation (NSF SES 09-61252 awarded to Paul Windschitl; NSF SES-1851738 awarded to Paul Windschitl and Andrew Smith) and the National Institutes of Health T32 pre-doctoral training grant (T32GM108540 awarded to Inkyung Park). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

Introduction

When making decisions involving risk, one may learn about the potential risk in two different ways—from description or from experience. Decision from description (DFD) is a term that refers to when people make decisions after receiving unequivocal information about the values and likelihoods of possible outcomes, usually expressed numerically. Decision from experience (DFE) refers to when people make decisions without this unequivocal information. Instead, through repeated encounters, they must observe or experience how decision options play out in order to gain knowledge of possible outcome values and likelihoods. The term description-experience gap indicates systematic discrepancies between decisions made under these two formats or ways of learning about risk information [1–3].

Research on understanding description-experience gaps has proliferated over the past two decades. Most of the work has focused on gaps in people’s decisional tendencies relevant to rare events, but other types of gaps have also been addressed. For instance, recent studies have found that gaps may emerge in loss aversion [4,5], preference reversal [6], and ambiguity aversion [7–9]. Despite this growing literature on description-experience gaps, the question of whether a gap would also emerge in how context information influences risky decisions remains relatively unexplored. Here, we explore a possible description-experience gap in such context effects.

By context effects, we refer to situations where reactions to risk information about one hazard are affected by salient risk information about another hazard, even though the latter is not objectively relevant to the task at hand. These context effects can emerge in either one of two directions, assimilation or contrast. Consider a case in which people learn about a hazard, which we will call the focal hazard, that has a 20% likelihood of causing harm. In addition, they learn about another hazard, which we will call a context hazard, that has a 30% likelihood of causing harm. How will people respond when later faced with the focal hazard? The term assimilation refers to a shift in evaluated riskiness of the focal hazard towards the riskiness of the context hazard, whereas the term contrast refers to a shift in evaluated riskiness of the focal hazard away from the riskiness of the context hazard. In this example where the objective likelihoods are 20% for the focal hazard and 30% for the context hazard, an assimilation effect would indicate that the risk information about the context hazard makes the focal hazard seem more dangerous than it otherwise would. A contrast effect would indicate that the risk information about the context hazard would make the focal hazard seem less dangerous than it otherwise would.

A variety of studies using focal and contextual risk information have revealed contrast effects in people’s risky decision making [10–12]. However, previous studies relied on experimental paradigms that mainly involved DFD, and therefore it is unclear if a similar contrast effect would be observed in DFE. For this reason, a systematic comparison of DFD vs. DFE in context-dependent risky decision making is required. To determine if there is a description-experience gap in the degree to which context influences risk perception and decisions, we implemented a computerized game called the Decisions about Risk Task (DART).

Below, we review the current literature on description-experience gaps. We then provide a brief overview of the current DART paradigm and our rationale for determining if context effects differ across DFD and DFE with regard to the theoretical perspectives on context effects. Finally, we report the findings from two experiments that tested if there is a description-experience gap in how context affects risk perception and decision making.

Description-experience gaps

Traditionally, studies on risky decisions were mostly restricted to DFD, using an approach often referred to as a gambling paradigm. In the typical gambling paradigm, individuals make a series of decisions among options that explicitly identify possible monetary outcomes and their probabilities in a numerical format. However, the paradigm does not fully capture the range of behaviors that are evident in our everyday decisions; we often make choices based on past experiences without having a clear outline of the outcomes and probabilities related to the risk. Consequently, the inclusion of DFE in this research has been growing over the last few years [1,2,13,14]. Many DFE paradigms build on classical gambling paradigms, whereby participants are prompted to choose a gamble of their preference. However, in DFE paradigms, participants are told neither the probabilities nor the outcomes associated with the gambles. Instead, they obtain information regarding the distribution of a given gamble by either actively making or passively watching iterative choices and the resultant outcomes [3,15].

By implementing DFE paradigms, researchers have looked for description-experience gaps in diverse decision phenomena. The most well-known description-experience gap involves decisions regarding probabilistically rare events, conventionally defined as events with less than a 20% chance of occurring [2,3,15]. In DFD paradigms, individuals show decision patterns in which they are overly sensitive to the possibility of a rare event. However, in DFE paradigms, they show decision patterns in which they are less or even under-sensitive to the possibility of a rare event [3]. This particular form of a description-experience gap has seen a great deal of attention recently. In fact, the term “description-experience gap” is often interpreted as necessarily referring to this particular type of gap. However, it is important to note that other forms of description-experience gaps have been investigated. As noted earlier, such gaps have been found in the context of loss aversion [4,5], preference reversal [6], and ambiguity aversion [7–9].

In particular, the type of description-experience gaps most relevant to the current study are those of the decoy effect [16–18]. The decoy effect—also known as the attraction effect or the asymmetric dominance effect—refers to a change in preference among equally compelling options when a less-attractive ‘decoy’ alternative is added to the decision context [19]. Specifically, the choice share of the option that is similar to, but dominates, the decoy, increases.

Recent findings suggest the presence of a description-experience gap in the decoy effect [16–18]. Ert & Lejarraga [16] studied the decoy effect in choices among gambles. In their studies, the decoy effect was observed in DFD, but not in DFE. It is important to note that the decoy effects in DFE were absent mainly because it was harder for participants to identify the dominated status of a decoy option when the option sets were learned from experience rather than from description [16–18]. Similarly, Hadar et al. [18] also demonstrated the description-experience gap in the decoy effect, but when the differences between the options were made prominent and easier to identify, participants who were able to recognize the dominance relationship among the options exhibited a decoy effect in DFE. Participants who failed to recognize the dominance relationship did not exhibit a decoy effect. To summarize, the current literature suggests that the presence of a decoy may differently affect choices among gambles depending on whether the choice options were learned from experience vs. description, but only to the extent that people fail to identify and distinguish a key difference between the options learned from experience.

Although research on the decoy effect provides some evidence about how description-experience gaps might or might not be relevant to context effects, the decoy effect is one specific example within a broad range of phenomena influenced by context. We aimed to investigate the influence of the DFD vs. DFE formats on the directionality and the magnitude of a basic form of a context effect—namely, how the probabilistic risk level of one hazard would influence decisions about another risky hazard. Our approach to studying context effects shares some broad features with the studies on decoy effects in choices among gambles, but our operationalizations and paradigm are distinct. Our project was not designed as a specific extension of findings on the decoy effect, but it does speak to whether those findings about description-experience gaps generalize across different forms of context effects.

Overview of the decisions about risk task

We used a specific task—the Decisions about Risk Task (DART)—to study this context effect. The DART was originally developed to instantiate a virtual experimental environment where people learn about risks/hazards and then make decisions in light of those risks/hazards [20]. In the current version of the DART, participants played a virtual character with the goal of accumulating as many points as possible throughout the task. Participants first learned about how often two hazards they would encounter might explode. Then they had to make a series of risky decisions related to those hazards.

Critically, two between-participant manipulations were used to test for a description-experience gap in context effects. First, we manipulated the learning format in which participants acquired likelihood information about two hazards. Some participants were given numerical descriptions about the explosion rates of the hazards (i.e., DFD format), whereas other participants could only learn about the likelihood information from a series of demonstration trials from which they could estimate the explosion rates of the hazards (i.e., DFE format). Second, the context risk rate was manipulated. The explosion rate of one of the hazards (hereafter called the focal hazard) was always fixed at the same level (i.e., at 33%), whereas the explosion rate of the second hazard (hereafter called the context hazard) was varied across groups (i.e., at 20% or 46.7%).

After they learned about the risk information, participants encountered a series of test trials. In each trial, they needed to choose between a safe path and a risky path (i.e., a path on which a hazard’s explosion would cause harm). Taking the risky pathway could result in either points gained from retrieving a coin or points lost if a hazard exploded. Meanwhile, the safe pathway always offered neither a gain nor loss of points.

On each trial, we recorded the participants’ decisions, allowing us to calculate the proportion of risky choices per hazard in each condition. Our key interests were 1) how the proportion of risky choices for the focal hazard varied as a function of the context risk rate, and 2) whether the direction or magnitude of this effect varied as a function of learning format. Our study also included measures of the perceived likelihood of the hazards exploding.

Predictions and theoretical approaches for contrast vs. assimilation in context effects

We began this study with a clear prediction about how the context-risk-rate manipulation would influence behavior toward the focal hazard in a DFD format. Specifically, we expected to observe contrast effects. The grounds for this prediction come from previous evidence that when people interpret the meaning of numeric risk information, they often compare that information to other salient risk information, thereby causing contrast effects [10–12,21]. For example, Windschitl et al. [11] reported that even when participants in two groups were both informed that women had a 12% risk for a target disease, they held different intuitive beliefs about how vulnerable women were, as a consequence of one group being told that men’s risk of the same disease was 4%, and the other being told that it was 20%.

There are several theoretical perspectives that explain contrast effects in DFD formats. One group of accounts posits that a context stimulus or value can be used as a comparison standard and can shift the evaluation or categorization of the target. When the focal stimulus or value is distinct and sufficiently different from a contextually salient stimulus or value, the context serves as a comparison standard (causing contrast) rather than as informational context that could have an assimilative influence [22,23]. A closely related notion is that the contextual stimulus changes the categorization of a focal stimulus. In the present study, the focal hazard might be categorized as the “more” or “less” dangerous one as a function of the explosion rate of the contextual hazard. This categorization itself, which might have gist-like properties, may impact the response to the focal hazard, even when one is aware of the more precise and verbatim rate at which the focal hazard causes harm [24–27].

A selective accessibility account proposed by Mussweiler [28] posits that the directional influence of context information is shaped by the initial holistic assessment of focal-context similarity or dissimilarity. This holistic assessment influences individuals to generate different hypotheses that guide further information search and interpretation. Assuming that an explicitly stated hazard-explosion rate of 33% is immediately viewed as different from a rate of 20% or 46.7%, this immediate assessment might shape expectations and what is noticed about the focal hazard that was said to have the 33% rate.

Range-frequency theory [29] assumes that evaluations of virtually any stimuli, even explicit cardinal quantities, are shaped by contextual stimuli. This theory highlights the influence of both the range and the frequency distribution of contextual stimuli. Applied to the present study, one could argue that the subjective riskiness of a focal hazard that explodes at 33% is evaluated differently according to how the context rates shape where 33% falls in the range of possible rates, and where 33% falls within the rank order of possible rates.
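To make the range-frequency logic concrete, the following is a minimal sketch rather than the model as fitted in any cited work. It assumes the standard formulation in which a judgment is a weighted average of a range value and a rank-based frequency value; the function name and the equal weighting (w = 0.5) are illustrative assumptions, and only the three explosion rates come from the present design.

```python
def range_frequency_value(focal, context_values, w=0.5):
    """Judged position of `focal` (0 = lowest, 1 = highest) among the rates in play.

    w is the weight on the range component; 0.5 is an arbitrary illustrative choice.
    """
    values = sorted(set(context_values) | {focal})
    lo, hi = values[0], values[-1]
    # Range component: where the focal rate falls between the extremes.
    range_value = (focal - lo) / (hi - lo) if hi > lo else 0.5
    # Frequency component: rank of the focal rate among the distinct rates.
    rank = values.index(focal)
    frequency_value = rank / (len(values) - 1) if len(values) > 1 else 0.5
    return w * range_value + (1 - w) * frequency_value


# The focal hazard explodes at 33% in both groups; only the context rate differs.
low_context = range_frequency_value(0.33, [0.20])    # focal is the riskier hazard
high_context = range_frequency_value(0.33, [0.467])  # focal is the safer hazard
print(low_context, high_context)
```

With only two hazards, the focal rate sits at the top of the range in the 20% context and at the bottom in the 46.7% context, which is the contrast pattern described above.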

Lastly, a theory called decision by sampling [30] shares some similarities with range-frequency theory in how it accounts for contrast effects. According to decision-by-sampling, evaluating a particular attribute of a focal stimulus involves pairwise comparisons within the sample distribution of that attribute. That is, the focal attribute is compared in terms of its relative rank among other attributes that are mainly sampled from the memories of past encounters or external contexts that are mentally available to the decision-maker. Following this logic, the model predicts a contrast effect because the presence of a salient context rate would influence the result tallies of the pairwise comparisons used to drive decisions about the focal hazard.
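The pairwise-comparison idea can be sketched in a few lines. This is a simplified illustration, not the published model: the function name is hypothetical, and the two "background" rates standing in for values retrieved from memory (10% and 50%) are invented purely for the example.

```python
def relative_rank(focal, sample):
    """Share of pairwise comparisons in which `focal` is the larger (riskier) value."""
    wins = sum(focal > other for other in sample)
    return wins / len(sample)


# Hypothetical background rates assumed to be available from memory.
background = [0.10, 0.50]

# The salient context rate joins the comparison sample in each condition.
rank_low_context = relative_rank(0.33, background + [0.20])
rank_high_context = relative_rank(0.33, background + [0.467])
print(rank_low_context, rank_high_context)
```

Under these assumptions, the 33% hazard wins more comparisons, and so ranks as riskier, when the context rate is 20% than when it is 46.7%.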

In summary, there are a number of theoretical perspectives that attempt to explain and predict contrast effects in people’s evaluations and responses to a focal hazard when risks are communicated explicitly in a numerical format (i.e., in DFD). With this in mind, a key goal of the current study was to compare how context would influence decisions in DFE vs. DFD. To evaluate whether the specific theoretical perspectives mentioned above predict differences in context effects across DFE vs. DFD, we must first appreciate the following fundamental difference between DFE vs. DFD: unlike DFD, learning about the risk rates in DFE is an incremental process. It requires additional cognitive operations, such as encoding outcomes of experienced events and updating previously held beliefs about a given hazard’s riskiness. As such, when a person is early in the process of learning risk rates in DFE, there would be no means by which the person could confidently judge the relative risk rates from two hazards. Only after numerous trials (or iterative learning opportunities) could they make a confident judgment about how often each hazard causes harm, whether the threat levels are approximately the same or different, and, if different, which hazard holds the greater threat.

Given the incremental learning involved in the DFE format, some theoretical perspectives would predict reduced or absent contrast effects in DFE. Specifically, some perspectives would suggest that if differences between focal and context stimuli cannot be initially detected, robust contrast effects will not be observed. For instance, if we apply a selective accessibility account [28] in DFD, the two hazards are known to be different in their explosion rates—triggering further “difference” expectations and biased processing among most or all participants. However, in DFE formats, there is no immediate sense of the overall difference or similarity. Given noise in the early experienced trials, many participants might initially think of the two hazards as exploding at similar rates, or they might even have an inverted impression of the direction of difference in the explosion rates. Furthermore, some perspectives would lead to the assumption that if there is ambiguity in the evaluation of focal and context rates—as there would be in DFE formats—this might lower the chances that a context rate would be used as an evaluation standard, or that it might not trigger the two rates to be categorized differently [31–33]. This would reduce any contrast effect, if not promote assimilation.

There is another potential source of bias that we have not yet mentioned that is relevant only to DFE. It offers another reason why context effects might have different strength or direction in DFE relative to DFD. Namely, in DFE, error-prone memory may lead to failure in distinguishing or inhibiting the irrelevant and independent context events when assessing the riskiness of focal events. This may ultimately bias the perceived rate of the focal risk [34,35]. One may incorrectly recall the explosion of the context hazard as an event related to the focal hazard or vice versa, blurring the distinction between explosion rates related to two different types of hazards. This blurring would essentially yield a pattern of results that fits the assimilation direction. For example, when a context hazard has a higher explosion rate compared to the focal hazard, the focal hazard may also be perceived to have a similarly high explosion rate, higher than it otherwise would be. Again, this source confusion only happens in DFE, and it therefore provides a potential reason why context effects might have a different strength or direction in DFE relative to DFD.

Not all of the accounts described assume different results for DFE and DFD. Specifically, both decision-by-sampling and range-frequency theory suggest that information acquired through experience will generate an internal distribution of the attribute (i.e., the explosion rate), facilitating ordinal comparisons between the focal and context hazard, and thus a contrast effect. Specifically, decision-by-sampling suggests that the judgments about the focal attribute are made by comparing the relative rank of the value of the focal attribute to an attribute value selected from the context distribution. Similarly, range-frequency theory emphasizes that the focal and the context attributes are compared in terms of frequency and range of the possible values. Both accounts acknowledge that the distribution of context attributes can emerge not only from descriptive information about the relevant event, but also from a series of encounters with the event. Following this logic, these models predict contrast effects regardless of the learning format.

Recent findings using neurophysiological approaches are also consistent with an expectation of contrast effects in DFE [36–38]. That work suggests that the value of an option is encoded relatively rather than absolutely. For instance, in Palminteri et al. [36], participants were shown choice pairs that either held overall positive or negative expected values. When participants succeeded in avoiding a loss in a choice set with an overall negative expected value, such behavior was reinforced in a manner similar to receiving a reward in a choice set with overall positive expected values. This relativity in valuation is reflected at the neural level, whereby encoding of a negative outcome could engage brain areas involved in either reward or punishment processing, depending on the context.

In summary, we predicted a robust contrast effect in DFD. However, the existing literature on DFE is equivocal. Categorization accounts and the selective accessibility account predict that there will be an attenuated contrast effect, or even an assimilation effect in DFE. Furthermore, the source confusion account specifies mechanisms that might be in play in DFE that would bias responses in an assimilative, rather than contrastive, direction. Meanwhile, decision-by-sampling and range-frequency theory, as well as recent findings from literature using the neurophysiological approach, point to contrast effects in DFE as in DFD.

Overview of the studies

To investigate the effect of learning format on the direction and magnitude of the context effect in risky decision making, we conducted two studies. The studies used the DART paradigm, which instantiates a virtual environment that allows people to learn about probability information (i.e., hazard risk rates) and outcome information (i.e., coin related points) and make decisions based on the acquired information. In both studies, we examined how decisions about the focal hazards were influenced by manipulations of context-risk rates and learning formats. For the learning-format manipulation in Study 1, some participants had to learn risk rates only incrementally from iterative experiences (i.e., experience condition), whereas other participants were given a summary description of the risk rates after also having iterative experiences (i.e., description condition). In Study 2, which was largely a replication of Study 1, this learning-format manipulation was made a bit purer—with iterative experiences being completely removed from the learning phase within the description condition. Studies 1 and 2 also differed somewhat in how likelihood information was solicited. Finally, in both studies, the values of coins that could be gained by making a risky decision were varied per trial. While the impact of coins is not a main focus of this paper (but is briefly reported), the variation in coin values kept the task interesting and challenging to participants, because it varied the expected values for risky vs. safe options in the task.

Study 1

Methods

Participants and design.

The participants (N = 303) were students from introductory psychology courses at the University of Iowa. Participants provided verbal informed consent after reading the consent form approved by the University of Iowa Institutional Review Board, and their consent was recorded via data entry. The requirement for written consent was waived by the board. The design was a 2 (Learning Format: Experience vs. Description) x 2 (Context Risk Rate: 20% vs. 46.7%) x 3 (Coin Amount: 70, 85, or 100) mixed factorial. The first two factors were between-participant manipulations.

The target sample size was set at 300. Although the target size was set a priori, it was not based on a formal power analysis. Instead, it was a rough estimate of what would provide reasonable power, shaped by our experience with other studies using the DART paradigm. The final sample size of 303 allowed for 80% power to detect a medium-sized contrast effect (d = 0.45) within a given learning format condition or to detect a small-to-medium-sized interaction (f = 0.16) between learning format and context risk rate [39].
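As a rough plausibility check on these sensitivity figures, the sketch below uses a two-sided normal approximation rather than the exact noncentral t or F distributions, so it is not the procedure behind the reported values. It assumes equal allocation of the 303 participants across the four between-participant cells and alpha = .05; both are assumptions, as are the function and variable names.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(noncentrality, alpha=0.05):
    """Two-sided power under a normal approximation to the test statistic."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    upper = 1 - z.cdf(z_crit - noncentrality)
    lower = z.cdf(-z_crit - noncentrality)
    return upper + lower


N = 303
n_per_cell = N / 4  # assumed equal cell sizes (about 76 per cell)

# Contrast effect within one learning format: two cells compared, d = 0.45.
contrast_power = approx_power(0.45 * sqrt(n_per_cell / 2))

# One-degree-of-freedom interaction with Cohen's f = 0.16 across all participants.
interaction_power = approx_power(0.16 * sqrt(N))

print(round(contrast_power, 2), round(interaction_power, 2))
```

Under these assumptions, both approximations land close to the stated 80%.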

Procedure and task.

Participants were tested at individual computers. After a consent process, participants were introduced to DART. The introduction included the following information.

Participants would be playing a virtual game in which their goal was to earn as many points as possible before reaching the end of a journey. If they earned more points than the average player, they would win a candy bar.
Participants would control an avatar representing themselves that traveled along a path, and there were opportunities to collect coins that were worth varying amounts of points.
In a given trial, the avatar would encounter a crucial point where the path split in two, and in the middle of the two pathways there would be an “abandoned device” called a gurg. A gurg would either crumble or explode when the avatar traveled past it.
Participants had to decide which side of the split the avatar should travel on. They were told that one side of the split would be lined with a wall that protected an avatar from getting caught in a possible gurg explosion (i.e., safe pathway) while the other side would have no protective wall, but it would have a coin worth a specified amount of points (i.e., risky pathway). More specifically, participants were instructed that if they chose to travel along an unprotected pathway, they would have the opportunity to collect a coin worth either 70, 85, or 100 points. However, they would leave themselves open to damage from a potential explosion. They were also told that if they were damaged by an exploding gurg, they would lose 250 points.
Participants would have to make a decision by the time their avatar reached the split (5 seconds), otherwise they would lose 20 points repeatedly until the choice was made.
There were two types of gurgs that could be encountered, which had different tendencies regarding how often they would crumble or explode. The overall layout of a given trial is provided in Fig 1A.

Fig 1. Decisions about Risk Task (DART).

(A) Illustration of a trial in the DART used in Studies 1 and 2. (B) Illustration of the between-participant context risk rate manipulation. Focal hazards had an identical risk rate across groups, while context hazards differed across groups.

https://doi.org/10.1371/journal.pone.0245969.g001

At this point, the instructions provided more information on how the participants would learn about the tendencies of the two gurgs. In the experience group, participants read: “In preparation for the journey, you will now watch several trials in which these devices crumble or explode. By watching what a particular type of device tends to do (i.e., crumble or explode), you learn useful information about what that type of device will do on the journey.” They then passively viewed 50 automated demonstration trials played back-to-back. In a given demonstration trial, participants would see the avatar passing by a gurg that either exploded or harmlessly crumbled. Overall, the rates of exploding vs. crumbling in the demonstration trials were proportional to the actual explosion rates of the hazards in the main trials. Both types of gurgs were presented an equal number of times across 50 trials.

Participants in the description group were also introduced to the same 50 demonstration trials. However, upon the completion of the demonstration trials, they were given additional information—namely explicit, numeric information about the explosion rate of each hazard (e.g., “The explosion rate for the Orange gurg is 20%. In other words, when the Orange gurg is shown, it will explode 20% of the time.”).

Crucially, the explosion rate for one of the two gurgs was manipulated between participants. For half the participants (Group 1 in Fig 1B), the rates for the two gurgs were 33% and 20%. For the other participants (Group 2 in Fig 1B), the rates were 33% and 46.7%. We call the gurg that exploded at a 33% rate the focal gurg. We call the other gurg the context gurg. Note that from the perspective of participants, the two gurgs would appear to differ only in explosion rates, and that the distinction between the focal and context gurgs would not be known to them.

After learning about the explosion rates of the gurgs, and before the start of the main test phase of the DART, participants were informed about mini-games that would be included between test trials (one mini-game per interlude). The mini-games were very simple, brief, and conceptually unrelated to the purpose of the DART and to the test trials described below (e.g., using the computer mouse to guide one’s avatar in chasing a floating coin). They were inserted simply to break up what otherwise may have been a monotonous series of trials.

The test phase contained 60 trials (Fig 1A). Each trial proceeded in a manner consistent with the introductory instructions seen by the participants. The avatar approached the split at a fixed speed. Participants could see where the coin and protective wall were located, and what coin amount was offered in that trial. To select their path at the split, participants would click on one of the arrows that pointed toward the left and right pathways. If the response time exceeded 5 seconds, the avatar would bounce against the split and a “-20” would drift out from that area to signify the loss of 20 points. Once participants indicated their decision, feedback was provided in the form of an animation displaying the gurg either exploding or crumbling, with the points lost or earned shown on the side, respectively. In case of a gurg explosion, participants who chose the risky pathway kept the points from the coin but also suffered a loss of 250 points. The total points that participants had accrued throughout the task were shown at the corner of the screen.

The 60 trials were divided into two blocks of 30 trials each. On a given trial in the test phase, participants encountered one of the two gurgs and a coin worth one of three amounts (70, 85, or 100 points). The coin values were independent of explosion rate. Consequently, a given participant experienced repetitions of six unique trial types. Table 1 summarizes the six trial types and their expected values. Each of the six trial types was repeated ten times. These repetitions were block randomized such that, within each block of 30 trials, each of the six trial types occurred five times. On all trials, the coin appeared opposite the protective wall, and the relative positions of the safe and risky paths were counterbalanced across trials.
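The block-randomized trial order described above can be illustrated with a minimal sketch. It assumes that each of the two blocks contains every trial type exactly five times and that order is shuffled only within a block; the names `build_schedule`, `GURGS`, and `REPS_PER_BLOCK` are hypothetical, and the counterbalancing of path positions is not modeled.

```python
import random
from collections import Counter
from itertools import product

GURGS = ("focal", "context")   # our labels; participants never saw this distinction
COINS = (70, 85, 100)          # coin values reported in the text
REPS_PER_BLOCK = 5             # assumed: each trial type occurs five times per block
N_BLOCKS = 2


def build_schedule(seed=None):
    """Return a list of blocks; each block is a shuffled list of (gurg, coin) trials."""
    rng = random.Random(seed)
    trial_types = list(product(GURGS, COINS))   # six unique trial types
    schedule = []
    for _ in range(N_BLOCKS):
        block = trial_types * REPS_PER_BLOCK    # 6 types x 5 repetitions = 30 trials
        rng.shuffle(block)                      # randomize order within this block only
        schedule.append(block)
    return schedule


schedule = build_schedule(seed=1)
block_counts = [Counter(block) for block in schedule]  # trial-type counts per block
```

Shuffling within blocks, rather than across all 60 trials, keeps the number of each trial type equal in the two halves of the test phase, which is what allows block to be treated as a factor in the analyses.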

Table 1. Summary of expected values (EV) offered in risky and safe choice in Study 1 and 2.

https://doi.org/10.1371/journal.pone.0245969.t001
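As a rough illustration of how expected values of this kind can be derived from the rules stated in the Method, the sketch below computes the expected points for the risky path in each trial type. It rests on assumptions that are not confirmed by the text: that the safe path earns and loses nothing, and that the rounded rates reported above (20%, 33%, 46.7%) are the exact probabilities. Values may therefore differ slightly from those in Table 1, which remains the authoritative source. The names `risky_ev`, `safe_ev`, and `RATES` are hypothetical.

```python
EXPLOSION_PENALTY = 250   # points lost when the gurg explodes on the risky path
COINS = (70, 85, 100)     # coin values reported in the text
RATES = {                 # explosion rates as reported (rounded)
    "context_low": 0.20,
    "focal": 0.33,
    "context_high": 0.467,
}


def risky_ev(coin, p_explode, penalty=EXPLOSION_PENALTY):
    """Expected points on the risky path: the coin is always kept,
    and the penalty is incurred with probability p_explode."""
    return coin - p_explode * penalty


def safe_ev():
    """ASSUMPTION: the safe path neither earns nor loses points."""
    return 0.0


ev_table = {
    (label, coin): risky_ev(coin, p)
    for label, p in RATES.items()
    for coin in COINS
}

for (label, coin), ev in ev_table.items():
    print(f"{label:12s} coin={coin:3d}  risky EV={ev:7.2f}  safe EV={safe_ev():.2f}")
```

Under these assumptions, the risky path is advantageous on some trial types and disadvantageous on others, which is why coin amount and explosion rate can both be expected to influence choices.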

Additionally, on twelve randomly selected trials within the second block, participants were asked to give a likelihood judgment when the trial was presented. On those trials, participants indicated how likely it was that the gurg would explode. They responded on a slider scale, marked at every 10%, that ranged from ‘0%, Definitely will not explode’ to ‘100%, Definitely will explode’. After they submitted the response, the trial resumed as normal, and participants chose between the risky and safe pathways.

After completing the DART, participants answered demographic questions and completed exploratory measures before being debriefed. Those who earned an above-average number of points received a candy bar of their choice.

Results

Preliminary notes.

We analyzed the data from the trials involving the context gurg separately from the trials involving the focal gurg. Again, the explosion rate of the focal gurg was the same across all participants (33%), whereas the rate for the context gurg was 20% for half of the participants but 46.7% for the other half. Our primary interest was in decisions and likelihood judgments about the focal gurg. However, for each outcome measure, we start by describing the results for the context gurg. Descriptive statistics are provided in the upper panel of Table 2.

Table 2. Summary of descriptive statistics for Studies 1 and 2.

https://doi.org/10.1371/journal.pone.0245969.t002

Preliminary analyses revealed that the counterbalancing of gurg color across focal vs. context assignments and the position of the protective wall did not substantially change the interpretation of any key results, so we collapsed across the counterbalancing factors for all subsequent analyses. In the ANOVAs reported below, coin amount always had main effects in a sensible direction: people were more likely to choose the risky path when the trial involved a high-value vs. low-value coin. Aside from a few minor exceptions, coin amount did not interact with other key factors. Given that our main hypotheses did not differ as a function of the coin amount, we report only briefly on the effects of coin below.

For each set of ANOVAs described in the sections below, full statistical reporting can be found in the supporting information (S1-S4 Tables in S1 Appendix). Given that our main dependent variables were proportions (per cell), we also checked whether arcsine square root transformations of the proportion data would substantially influence any of our main results from the ANOVAs [40]. They did not. Therefore, we report the ANOVAs on the untransformed data. A logistic regression approach to analyzing the data yields the same conclusions (see S5 Table in S1 Appendix for the logistic regressions).
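For readers unfamiliar with the transformation, the following is a minimal sketch of a common form of the arcsine square root transform, asin(√p), applied to the proportions that a five-trial cell can take. Whether the analysis reported here used this exact form (some authors use 2·asin(√p)) is an assumption; the function name `arcsine_sqrt` is hypothetical.

```python
import math


def arcsine_sqrt(p):
    """Arcsine square root transform of a proportion in [0, 1] (result in radians)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("proportion must lie between 0 and 1")
    return math.asin(math.sqrt(p))


# Possible per-cell proportions when a cell holds five trials (0/5, 1/5, ..., 5/5).
cell_proportions = [k / 5 for k in range(6)]
transformed = [arcsine_sqrt(p) for p in cell_proportions]
```

The transform stretches values near 0 and 1 relative to mid-range values, which is intended to stabilize the variance of proportion data before an ANOVA.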

Decisions about context gurgs.

For each combination of block and coin, we calculated the percentage of times (out of five) a participant chose the risky path for trials involving a context gurg. We submitted these values to a 2 (Block) x 3 (Coin) x 2 (Context Risk Rate) x 2 (Learning Format) ANOVA, with the first two factors as repeated measures (Fig 2A). This analysis may be considered a manipulation check to make sure participants were sensitive to the relevant outcome and probability information on a given trial. The results revealed that participants were indeed sensitive to the information. A main effect of context risk rate showed that participants encountering a context gurg were more likely to take a risk when the explosion rate for the context gurg was 20% as opposed to 46.7%, F(1, 299) = 42.59, p < .001, η_p² = .13 [41]. Also, the main effect of coin revealed that participants were more likely to take risks as the coin amount increased, F(1.81, 298) = 83.37, p < .001, η_p² = .22 (Fig 3A). These effects of explosion rate and coin were not significantly qualified by learning format (experience or description) or by block. No other notable effects were significant (see S1 Table in S1 Appendix for full statistics).
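The per-cell dependent variable described above can be sketched as follows. The trial records are invented purely for illustration, and the record layout and the function name `risky_percentages` are hypothetical; the sketch only shows how a percentage of risky choices is obtained for each block-by-coin cell of one participant.

```python
from collections import defaultdict

# Hypothetical records for one participant: (block, coin, chose_risky).
trials = [
    (1, 70, False), (1, 70, False), (1, 70, True), (1, 70, False), (1, 70, False),
    (1, 100, True), (1, 100, True), (1, 100, False), (1, 100, True), (1, 100, True),
]


def risky_percentages(records):
    """Percentage of risky choices for each (block, coin) cell."""
    counts = defaultdict(lambda: [0, 0])   # cell -> [risky choices, total trials]
    for block, coin, chose_risky in records:
        cell = counts[(block, coin)]
        cell[0] += int(chose_risky)
        cell[1] += 1
    return {key: 100.0 * risky / total for key, (risky, total) in counts.items()}


cell_scores = risky_percentages(trials)
```

Because each cell holds only five trials, a cell score can take just six values (0%, 20%, ..., 100%), which is one reason for checking that the conclusions hold under a transformation or a logistic regression.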

Fig 2. Summary of Study 1 results.

C and F indicate the risk rates of the context and focal hazards, respectively. Note that focal risk rates (i.e., F = 33%) are identical across all context risk rate conditions. (A) Mean proportion of risky choices in context-hazard trials, (B) Mean proportion of risky choices in focal-hazard trials, (C) Mean likelihood judgment for context hazards, (D) Mean likelihood judgment for focal hazards. Black bars indicate the experience condition and white bars indicate the description condition. The numbers above the bars indicate the mean for each condition. Error bars indicate ± 1 S.E.

https://doi.org/10.1371/journal.pone.0245969.g002

Fig 3. Summary of Study 1 results regarding coin manipulation.

C and F indicate the risk rates of the context and focal hazards, respectively. Note that focal risk rates (i.e., F = 33%) are identical across all context risk rate conditions. (A) Mean proportion of risky choices in context-hazard trials across coin amounts, (B) Mean proportion of risky choices in focal-hazard trials across coin amounts. Error bars indicate ± 1 S.E.

https://doi.org/10.1371/journal.pone.0245969.g003

Decisions about the focal gurgs.

Our primary interest was in how the context risk rate would impact decisions made about the focal gurg, which always had the same explosion rate (33%), and whether this impact would vary as a function of learning format. For each combination of block and coin, we calculated the percentage of times (out of five) a participant chose the risky path. We submitted these values to a 2 (Block) x 3 (Coin) x 2 (Context Risk Rate) x 2 (Learning Format) ANOVA (Fig 2B). As expected, coin amount again significantly affected risky choices; people made more risky choices when the larger coin amounts were offered, F(1.89, 298) = 94.02, p < .001, adj. η_p² = .24 (Fig 3B). More importantly, the main effect of context risk rate was also significant, F(1, 299) = 19.25, p < .001, adj. η_p² = .06. Participants were more likely to take risks in focal-gurg trials when the context gurg exploded at a 46.7% rate than at a 20% rate. This pattern confirms a contrast effect; participants’ risk-taking tendency in focal-gurg trials (which maintained an identical explosion rate across the groups) was influenced by the explosion rate of the context hazard. Risk taking on these focal trials was higher when the context risk rate was high rather than low. However, this pattern did not significantly differ as a function of learning format, F(1, 299) = .28, p = .598. The contrast effect was about the same among participants in the description group as it was among participants who learned about the likelihood information only incrementally from experience (i.e., the experience group). Given the potential importance of this null effect, we computed the Bayes factor for it (BF₀₁ = 3.61). A BF₀₁ of 1 is considered no evidence, whereas BF₀₁ values of 1 to 3 are considered anecdotal evidence for the null hypothesis, 3 to 10 substantial, 10 to 30 strong, 30 to 100 very strong, and values greater than 100 decisive evidence against the alternative [42]. Therefore, the BF₀₁ of 3.61 implies substantial evidence for the null.
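The verbal categories for BF₀₁ listed above can be expressed as a small lookup, sketched below. The handling of values that fall exactly on a category boundary and of values below 1 is an assumption, since the text does not specify it; the function name `describe_bf01` is hypothetical.

```python
def describe_bf01(bf01):
    """Map a BF01 value to the verbal evidence label for the null hypothesis.

    ASSUMPTION: a value exactly at 3, 10, or 30 is placed in the higher
    category. A value of exactly 100 stays in "very strong" because the
    text reserves "decisive" for values greater than 100.
    """
    if bf01 <= 0:
        raise ValueError("a Bayes factor must be positive")
    if bf01 < 1:
        return "favors the alternative"   # outside the range described in the text
    if bf01 == 1:
        return "no evidence"
    if bf01 < 3:
        return "anecdotal"
    if bf01 < 10:
        return "substantial"
    if bf01 < 30:
        return "strong"
    if bf01 <= 100:
        return "very strong"
    return "decisive"


label = describe_bf01(3.61)   # the value reported for the learning-format interaction
```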

The contrast effect did not significantly interact with the coin amount, but there was a small, significant interaction with block, F(1, 299) = 5.45, p = .020, adj. η_p² = .02. Namely, we found significant contrast effects in both the first and the last block, and the magnitude of the contrast effect was slightly larger in the first block than in the last.

Likelihood judgments about the context gurgs.

For each level of coin amount, we calculated the average likelihood judgment from the trials involving a context gurg. We submitted these judgments to a 3 (Coin) x 2 (Context Risk Rate) x 2 (Learning Format) ANOVA, with the coin factor as a repeated measure (Fig 2C). The main effect of coin was not significant, F(2, 298) = .84, p = .434. This result was not surprising considering that the likelihood judgment trials solicited an assessment of the gurg’s explosion rate independently of outcome information such as the coin amount. The significant main effect of explosion rate indicated a sensible pattern in which participants assessed the context gurg with the 46.7% explosion rate to be more likely to explode than the one with the 20% explosion rate, F(1, 299) = 64.88, p < .001, adj. η_p² = .18. Lastly, we also found a significant main effect of learning format, F(1, 299) = 39.68, p < .001, adj. η_p² = .12. Specifically, overall likelihood judgments from the description group were lower than those from the experience group.

Likelihood judgments about the focal gurgs.

As with likelihood judgments for the context gurg, we submitted likelihood judgments for the focal gurg to a 3 (Coin) x 2 (Context Risk Rate) x 2 (Learning Format) ANOVA, with the coin factor as a repeated measure (Fig 2D). The main effects of coin, F(2, 298) = .99, p = .371, and context explosion rate, F(1, 299) = 1.91, p = .167, were not significant. We found a significant main effect of learning format, F(1, 299) = 32.31, p < .001, adj. η_p² = .10. Again, participants in the description group gave lower likelihood judgments than those in the experience group.

Discussion

The main goal of Study 1 was to investigate the direction and the magnitude of influence that a context hazard exerts on the decisions about a focal hazard, and whether this influence is affected by the format by which people learn risk information. We found a significant effect of the context risk rate. Specifically, participants’ decisions regarding the focal gurg differed as a function of the context gurg. The pattern of responses indicated a contrast effect; participants made choices as if the riskiness of the focal gurg was in contrast to that of the context gurg. Participants in the condition where the context gurg’s explosion rate was at 20% made fewer risky decisions on focal-gurg trials than did participants in the condition where the context gurg’s explosion rate was at 46.7%. This contrast effect was not moderated as a function of the learning format—i.e., whether people learned about gurgs’ explosion rates only by observing the demonstration trials vs. by also receiving explicit, numeric rate information. Said differently, there was no description-experience gap. These results broadly favor the perspectives that predict contrast effects in both DFD and DFE (i.e., decision-by-sampling model and range-frequency theory) over the accounts that predict attenuated/null contrast effects in DFE (i.e., categorization accounts or selective accessibility account).

Alternatively, the lack of a description-experience gap could be explained by a methodological issue in the current design. Namely, in the “description” condition, participants were given descriptive information about the hazards only after they had iterative experiences in the demonstration trials. A more complete name for this condition might be the “experience and description condition.” This methodological feature perhaps weakened the operationalized difference between our DFD and DFE manipulations. It is still instructive that the numerical information presented in the current description condition did not seem to alter the observed contrast effect. However, these results leave open the question of whether the description-experience gap would be observed if the descriptive information in the description condition were presented immediately, rather than after experiential learning had started. This issue was addressed in Study 2 by refining the manipulation of learning format.

An unsurprising feature of the results from Study 1 was that coin amounts influenced choices. Participants made more risky choices when the coin amount on the risky path was larger rather than smaller. This indicates that the participants were, in general, capable of discerning advantageous vs. disadvantageous options based on the expected values. At the same time, these distinctions made between the advantageous vs. disadvantageous options were far from perfect; we did not observe fully-consistent risk-seeking behavior for advantageous options nor fully-consistent risk-averse behavior for disadvantageous options. This is understandable given the dynamic nature of the DART paradigm which does not easily allow participants to explicitly calculate the expected values offered in a given trial.

An additional issue is whether participants’ likelihood judgments showed patterns that matched those on the behavioral choice measures. They did not. First, there was a significant main effect of the learning format for both the context- and focal-gurg trials. That is, participants in the experience condition judged the gurgs more likely to explode overall, compared to those in the description condition. We speculate that this merely reflects the consequences of numeric anchoring. Participants in the description condition were given an explicit numeric risk rate, on which their later probability estimates tended to be anchored. There was no such anchor for participants in the experience condition. Without that anchor, and with uncertainty about the actual risk levels, they were more likely to use mid-level probabilities, which were overestimates.

Second, we did not find context effects in likelihood judgments as we did in risky choices. That is, participants’ likelihood judgments about the focal gurg did not differ as a function of the context gurg’s explosion rate. The possible reasons for such inconsistency could be related to when and how we solicited the likelihood judgments. The likelihood judgment trials were presented only in the second half of the test phase. As reported, the context effect on behavior measures was larger in the first block of trials than in the last block. Therefore, it is possible that participants’ impressions of likelihood were initially affected by context information, but this effect faded by the time we solicited them. Furthermore, we solicited likelihood judgments on numeric scales, which may have encouraged a response strategy that precludes sensitivity to the contextual effects [11,43–45]. For instance, when providing a numeric likelihood judgment for the focal hazard, participants may have deliberatively dwelled on proportion information they had seen or experienced. This is a reasonable strategy for giving the most accurate numeric estimates. But because of this strategy, the numeric scale might have missed the possibility that the context rate shifted participants’ more intuitive, vaguely formed expectations about what the focal gurg would do on the current trial [43,44]. One of the changes introduced in Study 2 is relevant to this issue.

Study 2

In Study 2, we aimed to replicate the findings from Study 1, with the additional goal of addressing possible concerns about the study design, which we describe below. One of the main issues from Study 1 was that the operationalization of the description-vs.-experience factor may have been insufficient to create the strong experimental difference required for a description-experience gap. Participants in the description condition of that study learned about the hazards not solely based on the description of their explosion rate, but also through the 30 demonstration trials. In Study 2, we strengthened the manipulation by removing the demonstration trials from the learning phase for participants in the description condition. Their learning phase consisted of reading about the summarized risk rates.

A second change was in the time and manner in which we measured likelihood judgments. Recall that we suspected that using the numerical likelihood judgment scale in the latter half of the trials may have led to the lack of contrast effects on likelihood judgments in Study 1. In Study 2, we counterbalanced the order in which likelihood judgments were made—early or late. We also used a verbal likelihood scale instead of a numerical one, since verbal scales may be better at reflecting participants’ intuitive expectations about how hazards would behave [43,44].

Additionally, minor changes were made to the instructions, hazard stimuli, and point system. In Study 2, the hazards took the form of tower buildings instead of gurgs. This change was intended to minimize a potential misinterpretation in which participants perceive the gurgs as animate objects that blow up intentionally to hurt them. By using towers instead of gurgs, we could convey to participants that the hazards are inanimate and that the harms inflicted by the hazards are unintentional. There were also changes to the coin amounts and the penalty calculations (see Table 1, Study 2). The instructions were modified to incorporate these changes in the task. However, we did not expect these modifications to be consequential to the overall setup of the DART or to the main results.