From making scientific discoveries, developing technologies, and producing art to resolving everyday problems, some of the most impactful human capabilities require problem solving and creative thinking. It is generally believed that people can solve problems in at least two broadly different ways: either through straightforward, methodical analytic processing or by sudden insight, in which the solution is reached in a sudden breakthrough, occurring by means of a reorganization of the mental representation of the problem (Sternberg & Davidson, 1995). When people solve problems analytically (or sometimes by trial and error), solvers are aware of the steps and processes and of their closeness to solution. In contrast, when people solve by sudden insight, they were previously unaware they were approaching solution (Metcalfe & Wiebe, 1987), are surprised when they achieve it, and are also typically unaware of how they reorganized the problem structure to do so; yet they are immediately confident that the newfound solution fits the whole problem. Most problem-solving researchers maintain that solving by insight involves at least some differences in cognitive processing, as compared to analytical problem solving (e.g., Schooler & Melcher, 1995; Schooler, Ohlsson, & Brooks, 1993; Sternberg & Davidson, 1995). However, others have noted that at least some facets of solving classic insight problems (i.e., problems more likely to require cognitive restructuring or sudden insight) can be attributed to slight modifications of the same processes involved in analytic solving (e.g., Perkins, 1998; Seifert, Meyer, Davidson, Patalano, & Yaniv, 1995; Weisberg, 1986; Weisberg & Alba, 1981). The two perspectives are sometimes called the “special-process” (Knoblich, Ohlsson, Haider, & Rhenius, 1999; Knoblich, Ohlsson, & Raney, 2001; Sternberg & Davidson, 1995) and “business-as-usual” (Chronicle, MacGregor, & Ormerod, 2001; MacGregor, Ormerod, & Chronicle, 2001) views.

In general, solving problems by sudden insight is thought to be an important form of creative cognition. Gestaltists were among the first to emphasize the study of insight problem solving. Until the last decade, research into insight had often selectively used classic insight problems, many originally introduced by researchers from the Gestalt school (e.g., the nine-dot problem; the two-string problem, Maier, 1930, 1931; and the radiation problem, Duncker, 1945). On the one hand, classic problems are appealing because they are thought by some to tap purely into insight processing. Thus, these problems often are used as if solving them absolutely requires insight. On the other hand, such research has been criticized on this ground, since the strategy used to solve such problems has been defined a priori (Bowden, Jung-Beeman, Fleck, & Kounios, 2005). As Metcalfe (1986) has shown, the classic insight problems have also been reported as being solved via analysis (or an increasing feeling of “warmth,” or being closer to the solution). Again, any teacher who has presented these problems to a class or audience has likely encountered people who solved these classic insight problems but who claimed to do so in a straightforward analytic manner. Although the use of classic insight problems has contributed greatly to the understanding of problem solving (e.g., Duncker, 1945; Maier, 1930, 1931; Wertheimer, 1959), they have important limits: The problems are elaborate; require time and space to present; yield few and/or slow solutions, such that few can be presented in each experiment; participants often need hints to reach the solution; and the problems do not have a homogeneous corpus. Thus, these problems have low solving reliability (Bowden et al., 2005) and are difficult to use with techniques that require a large number of trials, such as priming and neuroimaging.

An alternative to the use of classic insight problems is to use shorter, somewhat simpler problems. For instance, in the 1960s the remote associates test (RAT; Mednick, 1962) was used to study creative thinking generally. Problems included in the RAT contain three words (e.g., tennis head same), and solvers must generate a solution word which “could serve as a specific kind of associative connective link between these disparate words” (Mednick, 1962, p. 227; e.g., MATCH, because tennis match and match head are compounds, and same and match can be synonyms). Since the solving rates on the RAT correlated somewhat with vocabulary and general intelligence, the RAT fell out of favor as a pure test of creative ability. However, since the 1980s, RAT problems have been used to study insight (Bowers, Regehr, Balthazard, & Parker, 1990), incubation effects (Cai, Mednick, Harrison, Kanady, & Mednick, 2009), and the persistence of fixation or blocking in problem solving (Smith & Blankenship, 1989, 1991). RAT and similar problems have been used as a creativity measure to examine associations with attention (e.g., Ansburg & Hill, 2003; Ash & Wiley, 2006) and mood, with similar effects occurring for RAT and classic insight problems (e.g., Isen, Daubman, & Nowicki, 1987).

Another problem type used in a similar fashion is rebus puzzles, which present a word or words in an informative pictorial fashion, from which people are supposed to derive a common expression (e.g., M CE / M CE / M CE is solved as “three blind mice,” because the three MICE have no Is; or SDRAW, solved as “backwards”). These problems are relatively easy to present and have well-constrained solutions but ill-constrained strategies for solving. That is, there are numerous analytic strategies, so many that the chances are slim that one of the first few used will lead to an easy solution for any given problem.

Other types of problems have also been used. Anagrams can be useful in numerous paradigms, including neuroimaging (Aziz-Zadeh, Kaplan, & Iacoboni, 2009), although individuals may develop analytic strategies that can be successful for a relatively high percentage of attempts. Others have used short riddles (Luo & Niki, 2003). One nice stimulus set contains matchstick algebra problems, in which the problem solver has to correct an incorrect arithmetic equation expressed in Roman numerals constructed out of matchsticks, by moving a single matchstick from one position in the equation to another (e.g., IV = III + III is solved by moving the vertical stick on the left of the V to the right, creating the correct statement VI = III + III). The level of difficulty and amount of restructuring (or overcoming impasse) is fairly well defined for each problem (Knoblich et al., 1999), but a somewhat limited number can be used with each participant, because people can quickly learn each “trick” needed to solve the different types of problems.

Relative to the classic problems, the new ones have both strengths and weaknesses. The newer, shorter problems allow researchers to apply a variety of research techniques that require many trials or many instances of solving, such as visual-hemifield presentation, priming, neuroimaging, and electroencephalography (EEG; e.g., Bowden & Beeman, 1998; Jung-Beeman et al., 2004; Kounios et al., 2006; Subramaniam, Kounios, Parrish, & Jung-Beeman, 2009), which is useful for investigating the neural network of insight. Even though they may not require insight as “purely” as some of the classic problems, as a trade-off, the new class of problems adopted by this approach has the following advantages (Bowden et al., 2005):

  1. They are easier than classic insight problems, so that many of them may be correctly solved in the same session, garnering more data.

  2. A short amount of time can be enough to solve them. For most sets, participants solve about half within a time limit of 30 s per problem, and a third within 15 s per problem.

  3. They can be easily presented in a small visual space or on a computer screen.

  4. Solutions can be reported with a single word (or short phrase), which can be used in precise response time paradigms (such as solution priming).

  5. Different sets can be classified according to the skills required to solve them (e.g., linguistic, visuospatial, or both), to better isolate the cognitive functions involved.

Unfortunately, only a few of this more recent class of insight problems can be used in languages other than English (e.g., the matchstick arithmetic problems of Knoblich et al., 1999). Since many of these (RAT, Mednick, 1962; compound remote associate problems, Bowden & Jung-Beeman, 2003b; rebus puzzles, MacGregor & Cunningham, 2008) involve linguistic processes, they are contextualized by the knowledge of language and semantics inherent to, for instance, compound words (e.g., “pineapple,” pine + apple) or conventionalized cultural idioms (e.g., “it is raining cats and dogs”). Consequently, the development of insight tests in languages other than American English cannot be performed solely through literal translation. Instead, it is necessary to develop an entirely new set of problems that are grounded in the specific characteristics of the chosen language’s vocabulary and its colloquial conventions.

In order to increase the corpus of problems usable in languages other than English, here we developed Italian versions of the compound remote associate (CRA) problems of Bowden and Jung-Beeman (2003b) and the rebus puzzles of MacGregor and Cunningham (2008). Both types of problems present the benefits described above. In addition, these two specific tasks offer a substantial number of problems that are sortable by multiple levels of difficulty, are easy to explain, and do not require domain-specific knowledge to be solved. Moreover, both of these tests have been demonstrated to be solvable with both insight and noninsight strategies (Bowden & Beeman, 1998; MacGregor & Cunningham, 2008), which we will discuss after describing the problems in more detail.

CRA problems

The CRA problems (Bowden & Jung-Beeman, 2003b) were inspired by the RAT, developed by Mednick (1962). The RAT was created to study creativity without requiring domain-specific knowledge. It consists of two sets of 30 items (Mednick, 1968; Mednick & Mednick, 1967). Each item is composed of three words that can be associated with a fourth word—specifically, by creating a compound word, by semantic association, or because the words are synonyms. Considering the example we reported before: The three words same, tennis, and head can be associated with the word match by creating the compound word match head, by semantic association (i.e., tennis match), and because of synonymy (i.e., same = match). Success in solving RAT problems correlates with success in solving classic insight problems (Dallob & Dominowski, 1993; Schooler & Melcher, 1995).

In the CRA problems, Bowden and Jung-Beeman (2003b) created a larger set of problems using a more rule-consistent task: The solution word always forms a compound word or a common two-word phrase with each problem word (e.g., for the problem CRAB PINE SAUCE, the solution apple forms the compounds crabapple, pineapple, and apple sauce). Because they are more rule-consistent, the CRA problems may be more amenable to analytic solving than are the RAT problems, yet participants generally report achieving just over half of their solutions with insight, and less than half with analytic processing. Both tests have been consistently used in the study of problem solving, cognitive flexibility, and creative thinking (e.g., Ansburg & Hill, 2003; Beeman & Bowden, 2000; Bowden & Beeman, 1998; Bowden & Jung-Beeman, 2003a; Dallob & Dominowski, 1993; Jung-Beeman et al., 2004; Schooler & Melcher, 1995). They have also been used in a variety of studies, including studies of attention (Wegbreit, Suzuki, Grabowecky, Kounios, & Beeman, 2012), psychopathology (e.g., Fodor, 1999), and affect (e.g., Mikulincer & Sheffi, 2000). Furthermore, the CRA problems have been used to identify the neural circuit specific to insight solutions (Bowden & Jung-Beeman, 2003a; Bowden et al., 2005; Jung-Beeman et al., 2004).

The RAT has already been successfully translated into Japanese, Jamaican, and Hebrew (Baba, 1982; Hamilton, 1982; Nevo & Levin, 1978) in order to allow the study of creativity and insight problem solving in other languages. Our study is the first attempt to create a large set of CRA problems for Italian speakers.

Rebus puzzle

Insight requires reorganizing how the problem elements are initially cognitively or perceptually interpreted (Sternberg & Davidson, 1995). The initial interpretation is created and reinforced by past experience and frequency (e.g., after learning how to write and read, we often interpret words or sentences according to a language grammar), and it might overly constrain solving efforts. What makes the rebus puzzles an interesting pool of problems for the study of insight is that solving them often requires people to overcome the learned grammar rules of word composition to reinterpret the meanings of words. In other words, the solver has to “restructure” the formal interpretation of reading by relaxing their ingrained constraints, in order to shift how the problem elements are cognitively or perceptually represented (MacGregor & Cunningham, 2008). For instance, one common way to solve rebus puzzles is to verbally interpret the visual–spatial relationships of the problem components (e.g., location, the spacing between the letters or words, color, font size, or style) and incorporate these into the solution. For example, in BROTHER (solution: “big brother”), the visual attribute of the font has to be interpreted verbally, which is not done in usual reading. The problem “/R/E/A/D/I/N/G/” is solved (“reading between the lines”) by decoding the relative positions of components spatially, rather than grammatically as in normal reading.

Indeed, as MacGregor and Cunningham (2008) demonstrated, the difficulty of each rebus is related to the number of principles used to encrypt a phrase or saying; thus, it depends on the number of implicit assumptions that have to be relaxed to solve a rebus. Therefore, solving rebus puzzles may require relaxing one or more of the constraints necessary to process text in a standard fashion, and relaxing constraints is considered an important component of solving by insight (Ohlsson, 1992).

Self-reports of insight

Neither insight nor analysis exists within a problem; they are each a set of processes, or different ways of engaging processes, that lead to solution. Or, if one prefers: Insight is sometimes used to refer to the subjective experience that occurs as the hallmark of solving by insight—which begs the question of why that experience is felt when solving certain types of problems (Metcalfe & Wiebe, 1987) or is felt after particular brain processes are more involved in solving (Jung-Beeman et al., 2004). Either way, insight has to do with the process; it is just that certain types of problems (classic insight problems) are more likely to require those processes than are other types of problems (classic analytic problems).

Like other recently developed problem sets, CRA problems were initially used as insight-like problems: problems that often, but not always, were solved with insight (Bowden & Beeman, 1998). It was observed informally that people solving CRA problems could sometimes report a strongly analytic solving process (taking known steps, being aware of approaching the solution, and not feeling any surprise at achieving it), whereas the same people reported solving other problems by sudden insight.

Thus, soon after the initial use of CRA problems, a new approach was used (Bowden & Jung-Beeman, 2003b): Problem solvers were asked to self-report how they solved each problem—relatively more by analysis, or relatively more by insight (see Bowden & Jung-Beeman, 2007; Bowden et al., 2005). The criteria for that self-report particularly focus on participants’ subjective experience of each solution (Was it surprising? Did it come as a whole? Were you immediately confident that it was correct, despite being surprised? Did it seem like the solution was quite different from what you were thinking just prior to the solution?). Instead of categorizing the solutions (or solving processes) as being insight or analytic according to the problems, participants’ self-reports about solving were used as the discriminative criteria to define the solving processes. Several studies have shown that the insight and analytic strategies, as reported by the solvers, were associated with specific patterns of behavioral responses, different patterns across the visual hemifields (Bowden & Jung-Beeman, 2003b), different eye movements and attention allocation (Salvi, Bricolo, Franconeri, Kounios, & Beeman, 2015), and distinct neural activity—both at the moment of solving (Aziz-Zadeh et al., 2009; Jung-Beeman et al., 2004; Subramaniam et al., 2009) and in the brain states that preceded problems solved by one versus the other set of processes (Kounios et al., 2006). Indeed, even individual differences in resting-state EEG are associated with a proclivity to solve in one way versus the other (Kounios et al., 2008).

It could be argued that individuals are unable to report the processes that lead to solutions. However, experimenters generally do not ask people to state which specific cognitive processes were used. Rather, participants focus on their subjective experience (see above) of the solution when it emerges into consciousness, and their awareness of the ideas that preceded it. Ultimately, we believe that people are able to make use of various cues to effectively label how they solve problems. The overwhelming data from numerous experiments using multiple techniques are consistent with that belief (for a review, see Kounios & Beeman, 2009, 2014).

Therefore, besides presenting the new Italian CRA and rebus puzzles and their solving rates and latencies, we also present participants’ self-reports of how they solved each problem, via insight or analysis. The problems themselves should be very helpful for multiple paradigms and investigations of problem solving, regardless of any understanding of insight. For those who are interested in insight, the rates of insight and analytic solutions for each problem may also prove useful.

Overview of the studies

In order to develop the two pools of problems described above, three studies were performed. In Study 1, we selected a pool of common phrases that constituted the solutions of candidate rebus puzzle items. In Study 2, we developed a pool of problems based on the English CRA problems (Bowden & Jung-Beeman, 2003b) and a pool of rebus puzzles based on the original rebus puzzle (MacGregor & Cunningham, 2008) principles. We ran the first test for both the CRA problems and the rebus puzzles in order to define the final pools of problems (initial test of solvability and solutions). In Study 3, the final set of problems was administered in order to identify the solving characteristics, solving rates, and norms for each item.

Study 1

Rebus puzzles use a set of principles to encrypt a phrase or saying that is well known to the participants (MacGregor & Cunningham, 2008). A pool of 109 candidate rebus puzzles was patterned after the rebuses in MacGregor and Cunningham’s problems, by combining verbal and visual clues to produce a common phrase, such as T + U + T + T + O (tutto sommato “all summed up,” a common Italian phrase that could be translated “all things considered”). Different principles may be necessary to encrypt the meaning of a rebus; therefore, the difficulty of rebuses may depend on the number of dimensions of restructuring and of the overlearned constraints that must be relaxed to decrypt the meaning (MacGregor & Cunningham, 2008). For this reason, two judges analyzed the encrypted meanings concealed in the relationship of the problem components of the rebus puzzles that we created. Disagreements were resolved through discussion between the judges and, if necessary, by recruiting a third judge. Some rebus puzzles are similar in the ways that they combine verbal and visual clues. On the basis of those similarities, we sorted them into 20 categories based on, for instance, trend (i.e., growing, decreasing, etc., as in LUNA, luna calante “waning moon”), counting (e.g., CICLO CICLO CICLO, for triciclo; cycle cycle cycle, “tricycle”), and interpreting colors as words (e.g., ESSERE [presented in green], essere al verde “to be at green,” a common Italian phrase that means to have no money); for details, see Appendix A. This subdivision was used to keep similar items in different experimental blocks. The aim of Study 1 was to ascertain that the phrases and sayings that were encrypted (i.e., the solutions of the problems) were widely known, to prevent performance from being dependent on (the lack of) familiarity with the solutions.

Method

Participants

Twenty-nine undergraduate students at the University of Milan-Bicocca (mean age = 23.6; SD = 2.9; 11 females) volunteered as independent raters. All of the participants were native, fluent Italian speakers.

Stimuli and procedure

The participants were asked to assess the commonness of the 109 common phrases that constituted the solutions of the candidate rebus puzzles. They had to specify how often they had heard each phrase, on a scale from 1 (never) to 5 (often).

Results

The interrater reliability (i.e., the agreement among the raters’ judgments of the popularity of the stimuli) was computed as an intraclass correlation (McGraw & Wong, 1996; Shrout & Fleiss, 1979) using the R package irr (Gamer, Lemon, Fellows, & Sing, 2012), and the result was satisfactory, ICC(2, 29) = .851, 95 % CI = [.801, .891]. The commonness of the sentences was averaged across raters in order to obtain a popularity score; only the sentences with a score above 3 were selected, and the candidate rebus puzzles whose solutions were not common enough were discarded (a total of ten). Three additional puzzles were discarded for various reasons (Footnote 1); therefore, the final set of problems included 96 items.
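The reliability and selection steps above can be illustrated with a minimal sketch. It assumes a complete ratings matrix (one row per phrase, one column per rater) and uses the Shrout and Fleiss (1979) two-way random-effects, average-measures formula, ICC(2, k). The function names `icc2k` and `select_common` are hypothetical, and this is not the irr routine the authors ran.

```python
from statistics import mean


def icc2k(ratings):
    """ICC(2, k) for a complete matrix: ratings[i][j] = rating of phrase i by rater j."""
    n = len(ratings)        # number of rated phrases (targets)
    k = len(ratings[0])     # number of raters
    grand = mean(x for row in ratings for x in row)
    row_means = [mean(row) for row in ratings]
    col_means = [mean(ratings[i][j] for i in range(n)) for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    bms = ss_rows / (n - 1)                 # between-targets mean square
    jms = ss_cols / (k - 1)                 # between-raters mean square
    ems = ss_error / ((n - 1) * (k - 1))    # residual mean square
    return (bms - ems) / (bms + (jms - ems) / n)


def select_common(ratings, threshold=3.0):
    """Indices of phrases whose mean rating is strictly above the threshold."""
    return [i for i, row in enumerate(ratings) if mean(row) > threshold]
```

With real data, `ratings` would hold 109 rows and 29 columns, and `select_common` would return the phrases retained as rebus solutions before any further exclusions.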

Study 2

We developed 150 Italian CRA problems that included three words each. The solution of each problem was a fourth word that could be associated with all three words of the triad through the formation of a compound word or phrase (e.g., scuola, tutto, and domani form the compounds doposcuola, dopotutto, and dopodomani, with the solution word dopo). Words were sometimes repeated across problems (e.g., libero was repeated four times), but the solution words were never repeated. Study 2 constituted an initial test of the tasks, to identify problems (both CRA and rebus puzzles) that could trigger more than one valid solution and problems that might never or always be solved.

Method

Participants

A total of 110 undergraduate Italian students from the University of Milan-Bicocca (mean age = 21.2, SD = 4.8; 81 females, 29 males) participated in the experiment in exchange for course credits.

Procedure

We administered the 96 rebus puzzles from the phrases selected in Study 1, as well as the 150 Italian CRA problems. Both the rebus puzzles and the CRA problems were divided into three blocks. Each rebus puzzle block included 32 problems balanced by levels of familiarity (measured in Study 1), and each CRA problem block included 50 randomly selected problems. Each participant was administered one block of rebus puzzles and one of CRA problems. The order of block presentation (CRA–rebus puzzles or rebus puzzles–CRA) and the order of presentation of the problems within each block were randomized. The problems were administered using the Inquisit software (2012; Millisecond Software, Seattle, WA). Three and four practice trials preceded the rebus puzzle and the CRA problem sessions, respectively (Footnote 2). Each trial began with a response prompt screen. Once participants were ready, they pressed the spacebar, and a single CRA problem or rebus puzzle was presented on the screen (Footnote 3). Participants were given a time limit of 15 s to respond; when they ran out of time, a message invited them to proceed to the following problem. No feedback was given regarding whether the solution was accurate or inaccurate (see Fig. 1 for details). Participants were instructed to press the spacebar as soon as they found a potential solution. Following the production of a solution, the item was erased, and participants had to type the solution and declare how they had solved the problem: via insight or via analysis. Instructions regarding how to distinguish insight from analysis problem solving were given prior to the experiment (Footnote 4). We explained to participants that insight and analysis are two ends of a continuum, and that we were asking them whether their solution was more insight-like or more analytic-like. Only two participants, both in Study 2, asked for further clarification; in these instances, the instructions were elaborated until the participants understood. Participants were instructed that neither solving style was any better or worse than the other, and that there were no right or wrong answers in reporting insight or analysis. The experiment took approximately 25 min to be administered.

Fig. 1

Procedure used for Studies 2 and 3. Note that participants first had to press the spacebar when they were ready for each problem to appear on the screen. They were given 15 s to solve each problem. If they found a solution, participants had to press the spacebar, type the solution word, and report how they had solved the problem, either via insight or via analysis

Results

For both the rebus puzzles and the CRA problems, we discarded problems for which more than one possible valid solution was found (nine CRA problems and seven rebus puzzles). Moreover, we discarded problems that were too easy (i.e., ten CRA problems were solved by all of the participants) or too difficult (i.e., nobody solved nine CRA problems and one rebus puzzle). The final pool of problems therefore included 88 valid rebus puzzles and 122 valid CRA problems that were extracted from the initial pool.
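The exclusion rule described above amounts to a simple filter, sketched here under stated assumptions. The field names (`n_attempts`, `n_correct`, `n_valid_alternatives`) and the helper functions are hypothetical and do not come from the authors' materials; the two constants at the end only re-check the counts reported in the text.

```python
def keep_item(stats):
    """Return True if an item survives the Study 2 exclusion criteria."""
    if stats["n_valid_alternatives"] > 0:            # more than one valid solution
        return False
    if stats["n_correct"] == 0:                      # too difficult: nobody solved it
        return False
    if stats["n_correct"] == stats["n_attempts"]:    # too easy: everybody solved it
        return False
    return True


def filter_pool(pool):
    """pool maps an item label to its stats dict; returns the labels that are kept."""
    return [label for label, stats in pool.items() if keep_item(stats)]


# Consistency check on the counts given in the text.
CRA_REMAINING = 150 - 9 - 10 - 9    # ambiguous, too easy, too difficult
REBUS_REMAINING = 96 - 7 - 1        # ambiguous, too difficult
```

Both constants evaluate to the pool sizes reported above (122 CRA problems and 88 rebus puzzles).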

Study 3

We administered the final pool of problems to identify for each item: the average solution time, the percentage of participants who would solve the items, errors and timeout rates, and the percentages of solvers reporting a solution via insight and via analysis. We performed morphosyntactic analyses in order to rule out linguistic confounds (e.g., the order of prefixes and suffixes) and biases specifically related to Italian grammar rules (e.g., gender and number concordances). We investigated the relationship between the percentage of participants solving an item and the solution strategy preferred for that item: insight versus analysis.

Method

Participants

We collected valid data from 467 participants (mean age = 24.9, SD = 7.6, min = 15, max = 65; 348 females, 119 males). All of the participants completed a set of eight to 11 rebus puzzle problems, and 317 of the participants also completed a set of 40 or 41 CRA problems (mean age = 25.3, SD = 8.3, min = 16, max = 65; 269 females, 48 males). An additional 39 participants took part in the study, but their data were excluded because they either stopped the experiment early or declared that they did not complete the experiment seriously.

Procedure

The 122 CRA problems were split into three blocks (of 41, 41, and 40 items each). The 88 rebus puzzles were split into nine blocks (with eight to 11 items each), balanced for categories (Footnote 5). Participants attempted to solve only one block of each kind of problem. Thus, the three blocks of CRA problems were attempted by groups of 98, 103, and 116 participants. The nine blocks of rebus puzzles were administered to groups of 42, 43, 50, 52, 53, 54, 56, 56, and 61 participants. The order of presentation and pairings of the blocks were randomized. The experiment was run online using the Inquisit (2012) software. The instructions given and the procedure adopted were the same as for Study 2. Moreover, participants were asked to perform the test alone and to isolate themselves from any source of distraction or noise, and they were encouraged to take the test seriously by doing their best on each problem. At the end of the test, a set of questions investigated whether the participants had solved the problems alone or not, whether they had any previous familiarity with these problems, and whether they gave random answers (Footnote 6). In total, the experiment took approximately 25 min.

Results and discussion

Exclusion of outliers

To guarantee that the 15-s time limit would be respected, and to prevent participants from continuing to think about the answer after they had pressed the spacebar, we applied the Hampel identifier (Footnote 7), a robust method for outlier detection, as a criterion to identify the answers that took too long to be typed. The times required to type the answers varied substantially among problems, since some required typing only a short word as the answer and others required longer sentences; therefore, we applied the Hampel criterion independently to each problem. Figure 2 represents the times required for typing the answers. Latencies longer than 60 s are not represented, so as to allow a better visualization: There were 60 such latencies for the CRA problems and eight for the rebus puzzles, and all were excluded. The longest latencies were 347 s for the CRA problems and 183 s for the rebus puzzles. The generally longer latencies for the rebus puzzles than for CRA problems in Fig. 2 are explained by the fact that whereas the CRA solutions were single words in most cases, the rebus puzzle solutions were often short sentences. The dark parts of the histograms represent latencies that were considered unusually long and that led to the exclusion of a trial from the analyses: 7.9 % of the CRA trials and 7 % of the rebus puzzle trials were excluded.
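A per-problem Hampel criterion of the kind described above might be sketched as follows. The exact settings are not given in this section, so the sketch assumes the conventional form: a trial is flagged when its typing latency exceeds the median by more than t = 3 scaled median absolute deviations (MAD × 1.4826). Flagging only the upper side, along with the names `hampel_cutoff` and `flag_too_long`, is also an assumption made for illustration.

```python
from statistics import median

MAD_SCALE = 1.4826  # conventional factor that makes the MAD comparable to an SD


def hampel_cutoff(latencies, t=3.0):
    """Upper cutoff: median + t * scaled MAD (assumed threshold t = 3)."""
    values = list(latencies)
    med = median(values)
    mad = median([abs(x - med) for x in values])
    return med + t * MAD_SCALE * mad


def flag_too_long(trials, t=3.0):
    """trials: list of (problem_id, typing_latency_in_s) pairs.

    Returns a parallel list of booleans; the cutoff is computed separately
    for each problem, because answer lengths differ across problems.
    """
    by_problem = {}
    for problem_id, latency in trials:
        by_problem.setdefault(problem_id, []).append(latency)
    cutoffs = {pid: hampel_cutoff(vals, t) for pid, vals in by_problem.items()}
    return [latency > cutoffs[pid] for pid, latency in trials]
```

Because the cutoff is relative to each problem's own median, a latency that is unremarkable for a long rebus solution can still be flagged for a single-word CRA answer.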

Fig. 2

Latencies in typing the answers to the compound remote associates (CRA) problems and the rebus puzzles (RP)

Descriptive results

For each CRA problem (Appendix B) and rebus puzzle (Appendix A), we report (a) the grouping category, (b) the number of participants who attempted to solve each problem, (c) the number of participants who solved the problem correctly, (d) the number of participants who solved the problem incorrectly, (e) the mean, standard deviation, median, and 10th and 90th percentiles of the solution times, and (f) the percentages of participants who solved the problem correctly and incorrectly, timed out, and provided correct responses using the insight and analytic strategies. One rebus problem and one CRA problem did not receive any correct responses. Notice that some values were computed only on correct responses: For some problems that were very hard to solve, these values either could not be computed or were estimated on very small sample sizes. Thus, in addition to indicating in Appendixes A and B the numbers of correct and error responses, we note specifically whenever a value was computed on a sample of fewer than ten participants. Considering that there are research settings in which one needs very difficult or very easy problems, or in which one is not interested in precise estimates for some of the values that we reported, we decided to retain these problems in our set, as well.

Correctness and solution strategy

Participants did not receive feedback on the correctness of their responses and were always asked to indicate their solution strategy, independently of correctness. We inspected whether the self-reported solution process, insight or analytic, was associated with the correctness of the solution. The association was significant for both the CRA problems, χ²(1) = 183.7, p < .001, and the rebus puzzles, χ²(1) = 49.1, p < .001. The distribution of the responses (Table 1) is consistent with previous studies that have shown that insight responses were more often associated with correct solutions (Metcalfe, 1986): 82 % of the insight responses to the CRAs and 80 % of those to the rebus puzzles were correct, versus 66 % of the analytic responses to the CRAs and 69 % to the rebus puzzles.

Table 1 Numbers of correct and incorrect responses in the CRA and rebus puzzle problems, classified by solution strategy

In the case of errors, the interpretation of the responses regarding the solution process was not unequivocal. On the one hand, it is possible that the participants reported the process that led to the wrong answer; on the other hand, one could also argue that these responses reflect a bias toward one of the strategies, independent of the process. We used a binomial test to inspect whether the answer “insight” was reported significantly more or less often than the answer “analytic” in error trials. The test revealed no significant difference for the CRAs (p = .53) and a preference for “insight” responses to the rebus puzzles (p < .001). Even if this preference is interpreted as a response bias, no such bias was present for the CRA problems, and it was not large in magnitude for the rebus puzzles (see Table 1). Such a bias can be controlled in experimental contexts, for instance by planning control conditions.
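A binomial test of this kind can be sketched as follows. The sketch assumes an exact two-sided test against an equal split (p0 = .5) between “insight” and “analytic” reports, which the text does not state explicitly; the counts and the function name `binomial_two_sided_p` are hypothetical.

```python
import math


def binomial_two_sided_p(k, n, p0=0.5):
    """Exact two-sided binomial test.

    Sums the probabilities of all outcomes that are no more likely than
    the observed count k under the null proportion p0.
    Suitable for moderate n (roughly up to 1,000 trials with floats).
    """
    pmf = [math.comb(n, i) * p0**i * (1 - p0) ** (n - i) for i in range(n + 1)]
    # Small relative tolerance so that ties with pmf[k] are included.
    threshold = pmf[k] * (1 + 1e-9)
    return min(1.0, sum(prob for prob in pmf if prob <= threshold))


# Hypothetical error-trial counts, NOT the data of Table 1.
insight_errors, analytic_errors = 130, 90
n_errors = insight_errors + analytic_errors
p_value = binomial_two_sided_p(insight_errors, n_errors)
print(f"{insight_errors} of {n_errors} error trials reported as insight, p = {p_value:.4f}")
```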

Order effects

To test order effects, we performed a series of logistic regressions, one for each single problem, in which the dependent variable was the solution (correct vs. incorrect) and the independent variable was the order of presentation of the problem. The systematic presence of significant effects in this analysis could mean, for instance, that the problems were facilitated by being presented later, that is, after other problems had been presented. For the CRA problems, a significant effect (p < .05) of the order of presentation emerged for only six out of 122 problems (i.e., 4.9 %); four of these were facilitated by being presented later, and two by being presented sooner. For the rebus puzzles, a significant effect emerged for only three out of 88 problems (i.e., 3.4 %); two of them were facilitated by being presented later, and one by being presented sooner. These significant effects were not more frequent than would be expected by chance alone under the null hypothesis of no order effects (i.e., 5 %). As an additional test of the absence of order effects, we inspected the distributions of the 122 p values obtained from the logistic regressions for the CRAs and of the 88 p values obtained from the logistic regressions for the rebus puzzles: In the case of no order effect, the distribution of the p values would be expected to be uniform (e.g., Simonsohn, Nelson, & Simmons, 2014). The Kolmogorov–Smirnov test revealed that the distributions of the p values did not deviate from the uniform distribution for either the CRA (p = .619) or the rebus puzzle (p = .721) problems, which supports the absence of an order effect and suggests that the few significant values obtained in the logistic regressions can be ascribed to sampling error.
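The two checks described above can be illustrated with a short sketch: the number of significant tests expected by chance, and a one-sample Kolmogorov–Smirnov test of a set of p values against the uniform distribution. The p values below are simulated stand-ins, not the regression results; the KS p value uses the asymptotic Kolmogorov series with a common small-sample correction, which is an assumption of this sketch and not necessarily the authors' implementation.

```python
import math
import random

ALPHA = 0.05


def expected_false_positives(n_tests, alpha=ALPHA):
    """Number of tests expected to reach p < alpha if every null hypothesis is true."""
    return n_tests * alpha


def ks_uniform(p_values):
    """One-sample Kolmogorov-Smirnov test against Uniform(0, 1).

    Returns (D, approximate p value) based on the asymptotic Kolmogorov
    distribution with a small-sample correction of the test statistic.
    """
    xs = sorted(p_values)
    n = len(xs)
    if n == 0:
        raise ValueError("at least one p value is required")
    # D: largest distance between the empirical CDF and the uniform CDF F(x) = x.
    d = max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))
    lam = (math.sqrt(n) + 0.12 + 0.11 / math.sqrt(n)) * d
    if lam < 0.2:
        # The series converges poorly here; the tail probability is essentially 1.
        return d, 1.0
    tail = 2 * sum(
        (-1) ** (k - 1) * math.exp(-2 * k * k * lam * lam) for k in range(1, 101)
    )
    return d, min(1.0, max(0.0, tail))


# Simulated p values standing in for 122 per-problem regressions (hypothetical data).
rng = random.Random(0)
simulated_p = [rng.random() for _ in range(122)]
d_stat, ks_p = ks_uniform(simulated_p)
print(f"expected significant by chance: {expected_false_positives(122):.1f} of 122")
print(f"KS D = {d_stat:.3f}, approximate p = {ks_p:.3f}")
```

With 122 and 88 tests at the .05 level, about 6.1 and 4.4 significant results are expected by chance, which is the benchmark against which the observed six and three significant effects are compared.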

Morphosyntactic validity

Following the procedure of Bowden and Jung-Beeman (2003b), we divided the CRA problems into two types: homogeneous and heterogeneous. For the homogeneous problems, the solution word was a prefix (or a suffix) to all three words of the triplet: There were 56 such problems, and the average solution percentage was 39.5 % (SD = 24). For the heterogeneous problems, the solution word was a prefix (vs. a suffix) to at least one of the words and a suffix (vs. a prefix) to the other word(s) of the triplet: There were 66 such problems, and the average solution percentage was 38.6 % (SD = 22.3). A t test revealed no significant difference in the difficulty of these two kinds of problems [t(120) = 1.16, p = .81].
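For readers who wish to run a comparable comparison on their own item sets, the following sketch computes an independent-samples t statistic from group summary statistics. It assumes a pooled-variance (Student) test; this is consistent with the reported degrees of freedom (56 + 66 − 2 = 120) but is not stated in the text. The means and standard deviations in the example are hypothetical, and the function name `pooled_t` is illustrative.

```python
import math


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Independent-samples t statistic with pooled variance.

    Returns (t, degrees of freedom). Converting t to a p value requires
    the t distribution, which is left to a statistics package.
    """
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error, df


# Hypothetical solution percentages for two groups of 56 and 66 problems.
t_stat, df = pooled_t(mean1=45.0, sd1=20.0, n1=56, mean2=40.0, sd2=21.0, n2=66)
print(f"t({df}) = {t_stat:.2f}")
```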

Unlike English, Italian morphosyntax requires agreement in number and gender: If a noun is singular (or plural), an adjective referring to that noun must also be singular (or plural), and if a noun is masculine (or feminine), an adjective referring to that noun must also be masculine (or feminine). These grammatical features might provide hints to the solution of the CRA problems, in which the inclusion of an adjective as a stimulus word is very common. The issue of number agreement could easily be prevented by using only singular nouns as the solutions of CRA problems. However, we could not avoid gender agreement, since most CRA problems could only be created by using adjectives, and a gender mismatch (e.g., using a masculine adjective as a stimulus word to refer to a feminine solution word) would have constituted a hint against the correct solution for an Italian speaker. Therefore, we evaluated post hoc the impact of gender agreement by comparing the solution percentages for triads that contained at least one adjective–noun pairing to those for triads that did not include adjectives. For the 53 problems with a gender match and the 69 problems without a gender match, the mean percentages solved were, respectively, 39.4 % (SD = 25.7) and 38.7 % (SD = 20.9) [t(120) = 1.50, p = .85], indicating that gender agreement offered no particular advantage to the solvers. In conclusion, the analyses presented in this section allowed us to exclude potential language-related confounds in the problem solving.

General conclusions

In the last decade, research on insight problem solving has availed itself of a new class of problems that lend themselves to cutting-edge research techniques requiring numerous observations per condition. As compared to the classic insight problems, this new generation of problems has several advantages: They are shorter, more compact to present, and easier, and thus generate more data; and they can be classified by the cognitive functions involved. Unfortunately, only problems that are not linguistically contextualized (e.g., math problems) have been used in languages other than English. As a consequence, most of the studies in this field cannot be replicated in many languages. To expand the study of insight problem solving to the Italian language and culture, we created Italian versions of the CRA problems and rebus puzzles, and tested their validity and solving rates. These two sets of problems could work better than the classic insight problems (a) as stimuli for EEG or fMRI studies (for full reviews, see Kounios & Beeman, 2009, 2014) and (b) for developing short tests that assess individual tendencies toward insight versus analytic strategies. It was important to select a pool of problems that were neither too easy nor too difficult to solve for the population of interest: The higher the number of solvers, the greater the amount of information that would be available about the preferred strategies of the participants.

For each problem we provide normative data in Appendixes A and B, in descending order of participants’ solving rates within the 15-s time limit. Appendix C presents the rebus puzzle stimuli themselves. Some of the rebus puzzles are solved with similar strategies, and it is important to consider this aspect when the puzzles are administered. If several such problems are administered sequentially, order effects could arise, in that the solution of the problems administered later could be affected by the solution of those administered earlier. These effects can be reduced by keeping similar problems in different blocks, as we did in Study 3. Indeed, in addition to randomization, we divided similar problems into mutually exclusive blocks, so that two similar problems (of the same category; see Appendix A) were never in the same block. In many contexts, the ideal strategy would be to compose blocks of stimuli that contain no two problems requiring similar strategies. For those interested in composing new blocks of stimuli, we provide a list of similar items (under the same category number; see Appendix C, which shows the similarities in the ways that rebuses combine verbal and visual clues).
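The blocking constraint described above (no two problems of the same category in the same block) can be sketched as follows. This is one possible assignment scheme under stated assumptions, not the procedure used in Study 3: the item identifiers, category numbers, and the function name `assign_blocks` are hypothetical, and the category numbers of the real items are those listed in the appendixes.

```python
import random
from collections import defaultdict


def assign_blocks(item_categories, n_blocks, seed=None):
    """Distribute items over blocks so that no block holds two items of one category.

    item_categories maps an item identifier to its category number.
    Raises ValueError if a category has more items than there are blocks.
    """
    rng = random.Random(seed)
    by_category = defaultdict(list)
    for item, category in item_categories.items():
        by_category[category].append(item)

    largest = max(len(items) for items in by_category.values())
    if largest > n_blocks:
        raise ValueError("need at least as many blocks as the largest category")

    blocks = [[] for _ in range(n_blocks)]
    for items in by_category.values():
        rng.shuffle(items)
        # Prefer the currently shortest blocks to keep block sizes balanced;
        # ties are broken at random.
        order = sorted(range(n_blocks), key=lambda b: (len(blocks[b]), rng.random()))
        for item, block_index in zip(items, order):
            blocks[block_index].append(item)

    # Randomize the presentation order within each block.
    for block in blocks:
        rng.shuffle(block)
    return blocks


# Hypothetical rebus identifiers and category numbers.
rebus_categories = {"r01": 1, "r02": 1, "r03": 1, "r04": 2, "r05": 2, "r06": 3}
for index, block in enumerate(assign_blocks(rebus_categories, n_blocks=3, seed=1), start=1):
    print(f"Block {index}: {block}")
```

Because each category places at most one item in any block, the constraint holds by construction, and the within-block shuffle preserves randomized presentation order.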

These studies and materials fill a gap between, on the one hand, a plethora of studies of creativity and problem solving in English (Baer & Kaufman, 2006) and, on the other hand, a lack of studies of problem solving in Italian and in languages other than English more generally. Indeed, we know of only three other sets of problems, in Hebrew, Japanese, and Jamaican (Baba, 1982; Hamilton, 1982; Nevo & Levin, 1978). With the Italian versions of the CRA problems and rebus puzzles, we aimed to provide a useful apparatus for extending the findings concerning the mental processes underlying creativity to languages other than English.