Pictures are often used as visual stimuli to access or even improve psychological processes (e.g., Brady et al., 2008; Caramazza & Konkle, 2013). However, pictures are complex stimuli, and their characteristics may influence several cognitive and affective processes (Boukadi et al., 2016; Reppa & McDougall, 2015). Therefore, their careful production and validation are essential to guarantee the quality of experimental and interventional designs and to provide comparable results across studies (see Snodgrass & Vanderwart, 1980). Specifically, the assessment of pictures and their characteristics permits the control of their impact on psychological processes, enabling the systematic manipulation of their relevant properties while reducing bias introduced by similar/correlated dimensions (Brodeur et al., 2010; Snodgrass & Vanderwart, 1980).

Critically, validation endeavors require time and precise procedures. In order to overcome this time-consuming task, several databases have been produced and made available to the scientific community. The seminal work by Snodgrass and Vanderwart (1980) constitutes one of the first/most well-known databases. Subsequently, several studies replicated and extended this work to different cultures and languages (e.g., Rossion & Pourtois, 2004; Sanfeliu & Fernandez, 1996), to increased numbers and types of pictures (e.g., Cycowicz et al., 1997; Rossion & Pourtois, 2004) and to different age groups (e.g., Pompéia et al., 2001; Yoon et al., 2004). Recently, the MultiPic dataset presented an extensive open-access sample of normalized colored line drawings of common items from the same source, evaluated in name agreement and visual complexity, in six different languages (Duñabeitia et al., 2018).

The importance of using pictures somewhat closer to the real world in experimental studies has also been acknowledged (e.g., Felsen & Dan, 2005). This concern has motivated the production of more realistic databases (e.g., Foroni et al., 2013; Garrido et al., 2016), which include real-world pictures with vivid and realistic details (e.g., photos) that are suitable for research and intervention.

Common items are items with common name concepts that are easily found in daily life. Pictures of common items are therefore particularly useful for research, such as in semantic memory studies with a focus on semantic properties/structure or dissociation of categories, as well as in the evaluation of amnesic conditions (e.g., Caramazza & Shelton, 1998; Farah et al., 1989; Rogers et al., 2015). Despite the existing norms for pictures of common items, normative studies that produced and validated real-world pictures of common items are still scarce (e.g., Brodeur et al., 2014; Moreno-Martinez & Montoro, 2012; Shao & Stiegert, 2016). One of the best-known databases of real-world pictures of common items is the Bank of Standardized Stimuli (BOSS) developed by Brodeur et al. (2010, 2012, 2014). This database includes a wide range of pictures (930 validated images) of different categories, rated on several attributes (e.g., familiarity, manipulability, visual complexity) and is freely available online. Another validated ecological database was offered by Moreno-Martinez et al. (2011) and Moreno-Martinez and Montoro (2012), and includes real-world pictures of common items, evaluated, among others, for typicality and manipulability.

Critically, the systematic and simultaneous examination of measures from affective, semantic/linguistic and perceptive dimensions of the same set of pictures is not yet available. For example, the BOSS database (Brodeur et al., 2010, 2012, 2014) extensively explored semantic and perceptive dimensions, but the affective ones were not investigated. The databases of Moreno-Martinez and colleagues (2011, 2012) present picture norms by categories but do not address category agreement or any affective dimensions.

In addition, databases with improved ecological validity require careful consideration of important image properties related to their ecological richness (e.g., size, view, color parameters). An example of this concern is provided in FRIDa [Foodcast Research Image Database] (Foroni et al., 2013), which controlled surface parameters (e.g., brightness and color) while producing norms for real-world pictures of foods and common objects in several important and little-explored dimensions, such as aesthetic appeal, valence, arousal, typicality and ambiguity. Rossion and Pourtois (2004) have already shown an advantage in accuracy and reaction times for naming colored line drawings (vs. black-and-white and grayscale ones) on a timed vocal naming task. Overall, ignoring such properties implies overlooking additional variables that might affect picture processing.

Another important feature to consider in the validation of real-world pictures is the linguistic and/or cultural context in which the data are produced. Cross-cultural comparisons have shown that some picture attributes, particularly those related to semantic dimensions (such as familiarity, category agreement, conceptual agreement and name agreement), are culturally based (Duñabeitia et al., 2018; Kremin et al., 2003; Székely et al., 2004; Yoon et al., 2004). For example, Duñabeitia et al. (2018) provided subjective ratings of name agreement and visual complexity for colored line drawings in six different European languages across seven European countries. Their findings demonstrated that linguistic similarities are not enough to guarantee the absence of variations in naming (Duñabeitia et al., 2018), since differences were observed for the same language in different cultural contexts (e.g., Dutch speakers from different countries did not provide the same name for all pictures). Thus, inspecting cultural-based differences is crucial for a better understanding of the way some features of picture processing depend on the cultural background.

To the best of our knowledge, the BOSS is the only real-world pictures database of common items that has been extensively examined in different cultures and languages (Brodeur et al., 2012, 2014; Clarke & Ludington, 2017). These studies provided interesting inputs regarding culturally based (i.e., English, French, Chinese and Thai) and also linguistic-based differences (i.e., French vs. English speakers living in Canada). In the Portuguese context, there are some recently validated picture databases, although they mainly report affective dimensions, and none of them focused on real-world pictures of common items (e.g., Garrido et al., 2016; Prada et al., 2016, 2017; Rodrigues et al., 2018). Importantly, the referred studies did not explore cross-cultural differences, nor relevant dimensions, such as typicality, name agreement or category agreement as well as their interaction.

The current work presents a comprehensive, culturally based, normative study of real-world pictures of common items and includes a systematic validation of several dimensions of picture processing conducted with a Portuguese sample. Specifically, RealPic establishes subjective norms for real-world pictures of 596 common items, selected from existent normalized databases, in nine measures from affective, semantic and perceptive dimensions. These dimensions were selected based on the need to extend existing norms to traditionally less studied dimensions (i.e., arousal, valence, picture-name agreement, and aesthetic appeal) in addition to the most commonly explored ones (e.g., name agreement, familiarity, visual complexity; for a review see Souza et al., 2020).

Dimensions of interest

Category agreement provides information about how category membership is processed (see Clarke & Ludington, 2017). The influence of category has been observed across several variables, such as familiarity, lexical frequency and typicality (Brodeur et al., 2012; Foroni et al., 2013; Moreno-Martinez et al., 2011; Rossion & Pourtois, 2004). Categorization may also depend on domain specificities, with living things processed differently from non-living ones (Caramazza & Shelton, 1998; Warrington & McCarthy, 1987). Domain effects reflect evolutionary aspects (Caramazza & Shelton, 1998) that are expected to influence several variables, such as typicality (Moreno-Martinez et al., 2011) and arousal (Foroni et al., 2013) or even present cultural variance (see Na et al., 2017). Therefore, it seems critical to normalize the stimuli regarding category agreement and to explore the relation that such semantic content presents with other dimensions in a culturally based manner.

Name agreement refers to the consensus of an individual semantic representation in capturing the most appropriate name as a label for each picture (Pompéia et al., 2001; Snodgrass & Vanderwart, 1980). Name agreement appears to be a consistent measure that is relatively independent of pure language variations, as suggested in studies conducted in different languages within the same cultural environment (Brodeur et al., 2012). However, other measures of naming abilities were shown to be affected by linguistic (Kremin et al., 2003; Yoon et al., 2004) and cultural variations (Boukadi et al., 2016; Cycowicz et al., 1997; Duñabeitia et al., 2018). Given its importance to several aspects of pictures and related concept processing (e.g., naming time: Dell’Acqua et al., 2000; reading aloud: Boukadi et al., 2016), the identification of the most common name of the pictures and its variability in a given language assumes particular relevance in picture normalization studies.

Familiarity reflects the degree to which someone interacts or thinks about a specific concept or item-concept in everyday life (concept frequency; Snodgrass & Vanderwart, 1980) and seems to be influenced by characteristics of the respondents such as age, native language and social context (Pompéia et al., 2001). Previous studies suggest that familiarity influences several psycholinguistic measures of picture processing, being positively related with lexical frequency, percentage of name agreement, and typicality, although inversely correlated with visual complexity (see Brodeur et al., 2014; Moreno-Martinez et al., 2011; Snodgrass & Vanderwart, 1980). Familiarity is also a good predictor of affective ratings, showing positive correlations with valence and arousal (Garrido & Prada, 2017; Prada et al., 2016). This dimension has been largely addressed across line-drawing normative studies and may be particularly relevant for real-world pictures of common items.

Typicality refers to how well a given exemplar represents a category (Medin et al., 2007; Murphy et al., 2012). It depends on the number of features shared between the item and its own category (e.g., “having feathers” and “having beaks” for the category “Birds”). Previous studies have shown that less typical items (i.e., items that share fewer features with their categories) are perceived as less familiar (Moreno-Martinez & Montoro, 2012; Moreno-Martinez et al., 2011; but see Dell’Acqua et al., 2000 for other results), more ambiguous (Foroni et al., 2013), more complex (Moreno-Martinez & Montoro, 2012) and are named more slowly (Dell’Acqua et al., 2000). Although not well explored, typicality is a valuable dimension, and examining its interaction with other dimensions may be beneficial to avoid confounding effects.

Arousal represents the emotional activation elicited by an item usually reported in a scale varying from calm to excitatory levels (Foroni et al., 2013; Russell, 1980). In previous studies evaluating symbols, arousal ratings presented a positive correlation with familiarity, aesthetic appeal, visual complexity, concreteness and valence (Prada et al., 2016). Furthermore, previous studies using pictures of food, objects and natural items showed that, overall, arousal presented a positive correlation with valence and also with typicality for natural items but a negative one with familiarity for objects (Foroni et al., 2013). However, normative studies with real-world pictures of common items from different categories have often neglected this dimension.

Aesthetic appeal refers to the ability of an item to attract interest through the experience of visual liking (Prada et al., 2016; Reber et al., 2004). It is a multidimensional variable that plays an important role in visual tasks since it entails several features of the aesthetic experience (Reppa & McDougall, 2015), such as surface details of the picture, meaningfulness of the concept or even self-preferences. However, aesthetic appeal is one of the least explored dimensions in picture norms studies.

Valence indicates to what extent an image elicits different degrees of pleasant-unpleasant emotionality (Prada et al., 2014; Russell, 1980). Valence is positively correlated with familiarity, typicality and arousal (Foroni et al., 2013; Prada et al., 2010, 2018)—independently of the item category—and also with aesthetic appeal and visual complexity (Prada et al., 2016), emphasizing the relevance of its inspection in real-world pictures.

Visual complexity is an image-based measure focused on surface features of image quality parameters (i.e., color, shape, brightness, luminosity, contrast, size, complex/simple lines). Snodgrass and Vanderwart (1980) have shown that visual complexity varies as a function of category specificity. It is also recurrently negatively correlated with familiarity (Brodeur et al., 2012; Brodeur et al., 2014; Pompéia et al., 2001; Prada et al., 2016; Snodgrass & Vanderwart, 1980). Highly complex items modulate category agreement and naming abilities (Brodeur et al., 2014), and are perceived as more appealing, positive and arousing (Prada et al., 2016). It is, therefore, a mandatory dimension in the validation of pictures, particularly real-world pictures due to their realistic surface parameters.

Picture-name agreement refers to the agreement between a concept and its related pictures, often indicated as a viable alternative to measure picture effectiveness in representing the intended concepts (Snodgrass & Vanderwart, 1980). Picture-name agreement is particularly relevant because it allows a direct (based on the concept) way of capturing the agreement between an image and its mental representation (Johnston et al., 2010; Sanfeliu & Fernandez, 1996; Snodgrass & Vanderwart, 1980). Picture-name agreement is positively correlated with categorization (see Sanfeliu & Fernandez, 1996), name agreement (Morrison et al., 1997) and image agreement (Snodgrass & Vanderwart, 1980), although negatively correlated with familiarity (Sanfeliu & Fernandez, 1996). Its standardization is crucial in real-world pictures as these pictures may not be equally good at visually representing the concepts (e.g., due to different angles and details).

The inspection of such dimensions across languages and cultures may provide important cues about the consistency and generalizability of the norms produced (see Moreno-Martinez & Montoro, 2012; Prada et al., 2017). Therefore, the adaptation of the stimulus sets to different countries enables a more appropriate selection of stimuli regarding linguistic and culturally dependent aspects, assuring an effective manipulation of stimuli for further empirical or interventional purposes.

The main goals of this research were therefore to (1) establish culturally based norms of pictures of common items for the Portuguese context; (2) expand and increase the diversity of parameters standardized in previous studies, namely simultaneously examining affective, semantic and perceptive dimensions using systematic procedures; and (3) inspect the consistency of such norms through cross-cultural comparisons.

Methods

Participants

Participants were recruited online through social networks (e.g., Facebook). Participants had to meet all the following criteria: (1) being a native speaker of European Portuguese, (2) being older than 18 years, (3) having a minimum of four years of formal education, and (4) having preserved or corrected vision. A sample of 759 participants volunteered to participate in the study. Fifty-nine participants were excluded for not completing at least 50% of the survey and 16 for not meeting the inclusion criteria. Overall, the final sample included 684 participants (472 female), with 72.1% completing the entire survey. Participants’ age ranged from 18 to 65 years old, the majority (72.95%) being young adults (age range: 18–34), 20.18% mid-aged adults (age range: 35–54) and 6.9% older adults (above 55 years old). The sample reported high education levels (25.4% post-graduation; 42.1% undergraduates; 32.5% other).

Stimuli

The stimulus set consisted of 718 pictures: 357 were selected from the BOSS database (version 1, Brodeur et al., 2010; and version 2, Brodeur et al., 2014), 183 from Moreno-Martinez and colleagues’ (2011, 2012) databases, 127 from the Konklab database (Brady et al., 2008) and 51 from other free databases licensed for noncommercial usage (e.g., Flickr, Pixabay, Wikipedia). The stimuli were divided into 12 previously defined categories from living (mammal, fruit, vegetable, birds, insects) and non-living (clothing, vehicles, kitchen utensils, musical instruments, furniture, desk materials, tools) domains based on their occurrence in everyday life, their diversity and their application potential (see Moreno-Martinez & Montoro, 2012, for a similar procedure).

Pictures were resized to 500 × 500 pixels and depicted against a white background. The pictures were previously inspected for their quality during two independent phases using subjective and objective procedures. First, in a pre-selection phase, the most culturally suitable Portuguese name for the item original name was established. Subsequently, four independent raters, native speakers of European Portuguese and completely naïve to the goals of the study, were asked to provide the most appropriate name for the pictures (i.e., two raters named half of the items, and the other two the remaining half). Inter-rater agreement was high for both pairs of raters (84% and 79%, respectively; see Footnote 1). Disagreements between raters were resolved by the first two authors. Overall, these evaluations established the appropriateness of the previously defined name for each item. These two judges also confirmed the suitability of the items for the target categories (see the final distribution of pictures per categories in Table 1). Additionally, the first sample of naïve judges was also asked to rate all items regarding their visual quality on a 10-point scale ranging from 1 (very poor quality) to 10 (very good quality). These procedures led to the exclusion of 98 pictures (13.64%) that were overall unrecognized/unnamed due to cultural inadequacy (e.g., the fruit “pecan” or the animal “nyala” are rare or unknown in the Portuguese context), poor suitability of the picture in representing the concept (e.g., an image of a “crib” that was not named by any judge) or redundancy (e.g., an image of a daddy longlegs spider and an image of a widow spider both always being named “spider”). Additionally, 24 pictures (3.35%) from the overall sample evaluated as having low quality (i.e., rated below 6 on the quality scale) were excluded.
Based on these evaluations, 596 (83.01%) out of 718 photographs (119 from BOSS v.1; 175 from BOSS v.2; 158 from Moreno-Martinez & Montoro, 2012; and 144 from other sources) were selected. Each category included about 50 pictures. In a second phase, the color parameters (i.e., RGB and luminance) were also examined to ensure that the visuo-perceptual characteristics were consistent across pictures and to minimize their effect on the ratings of other dimensions. Therefore, a random sample (about 60% of the items) of 356 photographs (from 596) was examined regarding the uniform distribution of RGB and perceived luminance parameters (see Footnote 2) in order to confirm the quality of the selected pictures across domains.

Table 1 Distribution of items by category and domain

Procedure

The study was conducted using Qualtrics software. After giving informed consent (including general information, inclusion criteria and ethical information), participants provided sociodemographic information (i.e., age, education, gender and native language). The task instructions were presented, followed by a brief description of each of the dimensions in which pictures should be evaluated. Participants were asked to rate, in seven dimensions, a subset of 40 pictures (see Footnote 3) from different categories, randomly selected from a pool of 596 (see Alario & Ferrand, 1999; Brodeur et al., 2014; Cycowicz et al., 1997; Tsaparina et al., 2011 for similar procedures). Additionally, participants were asked to provide a name (name agreement task) and a category (category agreement task) for each picture.

A minimum of 30 evaluations per picture was established, in line with several normative studies using visual stimuli (Brodeur et al., 2010: N = [33, 39]; Brodeur et al., 2014: N = [32, 42]; Johnston et al., 2010: N = [25, 31]; Garrido et al., 2016: N = 30). After treating the data, the number of ratings per picture in each of the seven dimensions ranged from 27 to 34 (M = 30.61, SD = 1.783 to M = 31.20, SD = 1.890). For name agreement and picture-name agreement, responses per picture ranged from 29 to 57 (M = 32.35, SD = 1.890).

The task was divided into three blocks. Block A included the object-based measures: familiarity, arousal and valence ratings. Block B contained the image-based measures: visual complexity and aesthetic appeal ratings. Block C consisted of conceptually based measures including name agreement, category agreement, picture-name agreement and typicality. Blocks A and B were randomly presented between participants as well as the order of the dimensions in each block. Block C was always presented at the end, with a fixed order of dimensions (see Footnote 4). The dimensions were rated on a seven-point scale (see Table 2), except the naming and the categorization tasks that required a written response (Snodgrass & Vanderwart, 1980). The definition, the scales and the main references for each dimension are presented in Table 2.

Table 2 Instructions and their references for each dimension

Results

In this section, we present (1) data preprocessing, (2) item norms, (3) descriptive results by evaluative dimension and correlations between dimensions, (4) linguistic attributes analysis, and (5) cross-cultural/linguistic data.

Data preprocessing

Data preprocessing of all rated dimensions included the examination of biased inputs and transformations from absolute frequencies to proportional scores. Outlier analysis followed a criterion of 2.5 standard deviations above or below the mean rating per picture in each dimension (Garrido et al., 2016). Since the occurrence of outliers in all dimensions was very low (range: 1% to 3%), and there was no overall indication of systematic or extremely biased responses, no data were excluded. Missing values were below 5% of the entire database across all rated dimensions. After data treatment, the analysis was run by item (instead of by participant). The mean ratings (i.e., sum of ratings/N of evaluations per image) and standard deviations were obtained for each image in each dimension. Additionally, a normality test based on the peaks and tails of the distributions indicated that all rated dimensions followed a normal distribution with acceptable values of kurtosis and skewness (between ±2; Gravetter & Wallnau, 2014).
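
The outlier criterion and the item-level norms described above can be illustrated with a minimal Python sketch. It is not the authors' analysis script: the function names and the example ratings are hypothetical, and the use of the sample standard deviation (n − 1 denominator) is an assumption, since the text does not specify which estimator was used.

```python
from statistics import mean, stdev


def flag_outliers(ratings, criterion=2.5):
    """Return the ratings lying more than `criterion` SDs from the picture mean.

    Assumption: the sample SD (n - 1 denominator) is used.
    """
    if len(ratings) < 2:
        return []
    m, sd = mean(ratings), stdev(ratings)
    if sd == 0:
        return []
    return [r for r in ratings if abs(r - m) > criterion * sd]


def item_norms(ratings):
    """Mean (sum of ratings / N), SD and N for one picture in one dimension."""
    return {"mean": mean(ratings), "sd": stdev(ratings), "n": len(ratings)}


# Hypothetical ratings of one picture on a seven-point scale
example_ratings = [6] * 29 + [1]
outliers = flag_outliers(example_ratings)
norms = item_norms(example_ratings)
```

In this hypothetical example the single rating of 1 is flagged, whereas the 29 ratings of 6 are not. Flagged values are only reported here, which mirrors the decision above not to exclude any data.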

Data preprocessing was also conducted for the two linguistic dimensions (i.e., name agreement and category agreement). These dimensions were obtained through free responses, which provided several linguistic attributes (i.e., modal name agreement, modal category agreement, alternative valid names/categories, percentage of correct responses and modal responses, and h-value of agreements). Each response was analyzed regarding qualitative (written response) and quantitative (number of references to a given response) parameters. The number of different acceptable responses was quantified for each picture. This procedure included a first inspection for basic variants of the same name (e.g., plural, gender, hyphen, composite names with different order, presence of determinants/adjectives/verbs) and spelling mistakes/errors (see Brodeur et al., 2014 for a similar procedure). Basic-level concepts (e.g., “bird” in reference to “cardinal”) and regional variants (“robe”, in English robe, or “roupão”, in English gown) were considered correct. Complete descriptions (e.g., “red orange”) were considered different descriptions from summarized ones (e.g., “orange”). Incorrect, don’t know and tip-of-the-tongue responses were not considered for further analysis (see Footnote 5).

Item norms

The entire RealPic dataset of norms is provided (Supplemental Materials, Table 1). Detailed information for each item is presented, including: item original database, item original name (i.e., from the original database), item Portuguese target name and item target category. For the seven rating scales, the means and standard deviations, frequencies (number of ratings for each item) and confidence intervals (CI) at 95% are also presented. Additionally, the CIs were used to classify the stimuli as low, moderate or high in each measure (Prada et al., 2016; Rodrigues et al., 2018). Whenever the CI included the scale midpoint (i.e., 4), the items were considered “moderate”; when the upper bound was lower than 4, the items were considered “low”; and when the lower bound of the CI was higher than 4, the items were considered “high” (see Supplemental Materials, Table 1). Overall, the obtained normative data is composed of items with considerable variability in arousal (175 high, 271 moderate, 150 low), aesthetic appeal (219 high, 271 moderate, 106 low) and visual complexity (108 high, 277 moderate, 211 low). The variability of the ratings for typicality (493 highly typical items), familiarity (406 highly familiar items) and picture-name agreement (526 high agreement) was lower. Valence ratings (77 low) were moderate to high.
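
The CI-based classification rule can be sketched in Python as follows. This is an illustrative sketch rather than the procedure actually run for RealPic: the function name and example ratings are hypothetical, and the 95% CI is computed with the normal approximation (z = 1.96), which is an assumption because the text does not state how the intervals were derived.

```python
from math import sqrt
from statistics import mean, stdev

Z_95 = 1.96  # assumption: normal approximation for the 95% CI


def classify_item(ratings, midpoint=4.0):
    """Classify one picture as 'low', 'moderate' or 'high' on a rated dimension.

    'low' if the CI upper bound is below the scale midpoint,
    'high' if the CI lower bound is above it,
    'moderate' if the CI includes the midpoint.
    """
    m = mean(ratings)
    half_width = Z_95 * stdev(ratings) / sqrt(len(ratings))
    lower, upper = m - half_width, m + half_width
    if upper < midpoint:
        return "low"
    if lower > midpoint:
        return "high"
    return "moderate"


# Hypothetical ratings for three pictures (30 ratings each, seven-point scale)
label_high = classify_item([6, 7, 6, 7, 6, 7] * 5)
label_low = classify_item([1, 2, 1, 2, 1, 2] * 5)
label_moderate = classify_item([3, 4, 5] * 10)
```

With about 30 ratings per picture, a t-based interval would be slightly wider than the z-based one, so items close to the midpoint could be classified differently under that alternative.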

Descriptive results and correlations by evaluative dimension

Descriptive statistics for each of the seven rated dimensions are provided in Table 3. Overall, the means varied in all the dimensions and presented significant differences from the scale midpoint (p < .05; see Prada et al., 2018, for further methodological details), with the dimensions of picture-name agreement presenting the highest mean ratings, and visual complexity presenting the lowest mean ratings.

Table 3 Descriptive statistics for all items in each dimension

The mean ratings of the seven dimensions presented significant correlations (p < .05). Comments on moderate to very strong correlations (Evans, 1996) are provided (see Table 4 for all Pearson’s r results). Significant correlations involving less explored dimensions (i.e., typicality, arousal, valence and aesthetic appeal) in previous normative studies are also reported even if weak.

Table 4 Pearson’s r correlation values for all rated dimensions

The results showed a positive strong correlation (r > .60) between familiarity and picture-name agreement. In line with previous findings for photos and line drawings (Saryazdi et al., 2018), items rated as more familiar also presented increased picture-name agreement. Moreover, moderate correlations (r > .40) between familiarity and visual complexity as well as familiarity and valence were also observed. Specifically, items rated as less visually complex were considered more familiar (Brodeur et al., 2014; Moreno-Martinez & Montoro, 2012; Sanfeliu & Fernandez, 1996; Shao & Stiegert, 2016; Snodgrass & Vanderwart, 1980; but see Brodeur et al., 2010 for different results) and more positive (see Foroni et al., 2013, for a similar result). Although weak (r < .40), some significant correlations presented relevant indicators about the typicality dimension. For instance, typicality was positively correlated with familiarity, confirming previous findings (Moreno-Martinez et al., 2011; Moreno-Martinez & Montoro, 2012), as well as with all the other dimensions (p < .05), except visual complexity (r < .20).

Visual complexity showed a moderate and positive significant correlation with arousal (r = .519). Items rated as complex were also significantly rated as more exciting/arousing. Significant (but weak) correlations between picture-name agreement and valence, typicality, aesthetic appeal (all positive) and visual complexity (negative) were also observed.

The very strong correlation (r > .80) observed between valence and aesthetic appeal indicates that the items rated as more positive were also considered more visually appealing. Even though presenting weak correlations (r < .40), the significant negative correlations between arousal and aesthetic appeal, valence and familiarity contrast with the results from previous studies using other types of stimuli in which these correlations were also weak but positive (see Garrido et al., 2016; Prada et al., 2016; Rodrigues et al., 2018). However, the negative correlation between arousal and familiarity is consistent with previous findings using real-world pictures of natural items (see Foroni et al., 2013). The observed correlation between aesthetic appeal and familiarity has also been reported in previous studies using different types of stimuli (e.g., McDougall & Reppa, 2008; Prada et al., 2016; Rodrigues et al., 2018).

Partial correlations were also obtained to control the influence of categories in the correlations between dimensions (see Table 5). Overall, the significant strong correlations reported remained when controlling for categorical effects. Importantly, the positive correlation between typicality and familiarity increased from small to medium. The weak positive correlation between arousal and typicality previously reported without category control was the only one that was not observed with this new analysis.
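
One way to see what controlling for category does is the sketch below. It assumes that category enters as a categorical covariate, in which case the partial correlation equals the Pearson correlation between ratings centered on their category means; the text does not describe how the control was implemented, so this is an illustration rather than the authors' method. Function names and data are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def center_within_groups(values, groups):
    """Subtract from each value the mean of the group it belongs to."""
    totals = defaultdict(lambda: [0.0, 0])
    for v, g in zip(values, groups):
        totals[g][0] += v
        totals[g][1] += 1
    return [v - totals[g][0] / totals[g][1] for v, g in zip(values, groups)]


def partial_r_by_category(x, y, categories):
    """Correlation between x and y after removing category means from both."""
    return pearson_r(
        center_within_groups(x, categories),
        center_within_groups(y, categories),
    )


# Hypothetical item means for two dimensions in two categories
categories = ["fruit"] * 3 + ["tools"] * 3
dim_a = [1, 2, 3, 5, 6, 7]
dim_b = [3, 2, 1, 7, 6, 5]
raw_r = pearson_r(dim_a, dim_b)
partial_r = partial_r_by_category(dim_a, dim_b, categories)
```

In this constructed example the raw correlation is positive only because one category scores higher on both dimensions; once category means are removed, the relation within categories is negative. This is the kind of categorical influence that the partial correlations are meant to rule out.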

Table 5 Partial correlation for all rated dimensions controlled by category

Interestingly, the strongest correlations were observed among dimensions that were less reported in previous norms of real-world pictures (i.e., aesthetic appeal, valence, arousal and picture-name agreement). Nevertheless, such correlations were reported in normative studies using other types of stimuli (e.g., Prada et al., 2016; Rodrigues et al., 2018), which, together with our findings, emphasizes the relevance of exploring these dimensions in real-world pictures.

Additionally, a correlational analysis contrasting the arousal and valence scores obtained in our normative study with those obtained by Soares et al. (2012) for the corresponding visually presented words (n = 56) revealed that both arousal (r > .60) and valence (r > .80) were significantly and positively correlated between the two sources (written words and images).

Linguistic attributes analysis

Name and category agreement included three quantitative measures each: (1) the percentage of correct responses, (2) the percentage of the most common (modal) name/category for the item (e.g., cat/mammal), and (3) the h-value statistic (see Footnote 6). Overall results are presented in Table 6.
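
The modal percentage and the h-value can be computed from a list of valid responses as in the sketch below. It assumes that the h-value follows the information statistic commonly used in picture-naming norms since Snodgrass and Vanderwart (1980), H = Σ p_i log2(1/p_i), where p_i is the proportion of responses giving the i-th distinct name. The function name and the example responses are hypothetical, and responses are assumed to have already been cleaned as described under Data preprocessing.

```python
from collections import Counter
from math import log2


def agreement_stats(responses):
    """Modal response, modal percentage and h-value for one picture.

    `responses` holds the valid names (or categories) given to the picture,
    after spelling variants have been merged and invalid answers removed.
    """
    counts = Counter(responses)
    total = sum(counts.values())
    modal_response, modal_count = counts.most_common(1)[0]
    # H = sum over distinct responses of p * log2(1 / p), with p = count / total
    h_value = sum((c / total) * log2(total / c) for c in counts.values())
    return {
        "modal": modal_response,
        "modal_pct": 100 * modal_count / total,
        "h": h_value,
    }


# Hypothetical responses to one picture
stats = agreement_stats(["colher", "colher", "colher", "concha"])
perfect = agreement_stats(["gato"] * 30)
```

An h-value of 0 indicates that all participants gave the same response, and the value grows with the number of different responses and with how evenly they are spread. It therefore captures variability that the modal percentage alone does not.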

Table 6 Descriptive statistics for all items in each linguistic attribute

Regarding name agreement, the percentage of correct responses (92%) was above chance. Participants presented high modal name agreement (modal NA: M = 77.94%, SE = 0.92), although considerable variability was observed in valid appropriate names (h-value of NA: M = 0.78, SE = 0.04). The correspondence between the target name and the modal name was observed in 71% of the 596 pictures. From the responses referring to a modal name that was different from the established target name, generally, 75.88% reflected culturally accepted general names (e.g., naming different types of spoons with the general concept “spoon,” in European Portuguese “colher”) or similar names (i.e., naming “tweezers,” in European Portuguese “pinça,” as an alternative for “tongs” that is “tenaz” in European Portuguese).

The category agreement results indicated an above-chance percentage of correct categorization (94%). The modal category agreement was moderate (modal CA: M = 65%, SE = 0.008) and presented high variability in the valid appropriate categories attributed by the participants (h-value of CA: M = 1.40, SE = .03), which was expected for this task procedure (i.e., free response). Additionally, the correspondence between the established target category and modal category agreement was observed for 79% of the pictures, with about 7% presenting different but culturally accepted categories. These included, for example, categorizing “child scooter” as a “toy” instead of “vehicles,” using appropriate non-target categories (e.g., naming “legume” for “vegetables”), more specific categories (e.g., “dry fruits” for “fruits”) or more general categories (e.g., “animals” for “mammals” items).

Detailed information about name and category agreement for the entire database and for each image can be found in Table 2 of the Supplemental Materials.
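
The three agreement measures reported in this section can be sketched for a single picture as follows. This is a minimal illustration under stated assumptions: the H statistic uses the formula commonly attributed to Snodgrass and Vanderwart (1980), H = Σ p_i log2(1/p_i), where p_i is the proportion of participants giving each distinct response; the exact definition used for RealPic is given in Footnote 6 and is not reproduced here. The function name `agreement_measures`, the `accepted` set of names counted as correct, and the example responses are hypothetical.

```python
import math
from collections import Counter


def agreement_measures(responses, accepted):
    """Return (% correct, modal response, % modal, H) for one picture.

    responses: list of names (or categories) given by participants.
    accepted:  set of responses counted as correct (an assumption of this sketch).
    """
    n = len(responses)
    if n == 0:
        raise ValueError("no responses")
    counts = Counter(responses)
    # (1) percentage of responses counted as correct
    pct_correct = 100.0 * sum(c for name, c in counts.items() if name in accepted) / n
    # (2) percentage of participants giving the most common (modal) response
    modal_name, modal_count = counts.most_common(1)[0]
    pct_modal = 100.0 * modal_count / n
    # (3) H statistic: 0 when everyone agrees, larger when responses are dispersed
    h = sum((c / n) * math.log2(n / c) for c in counts.values())
    return pct_correct, modal_name, pct_modal, h


# Hypothetical responses from eight participants for one picture
responses = ["colher"] * 6 + ["concha"] * 2
print(agreement_measures(responses, accepted={"colher"}))
```

Unlike the modal percentage, H also depends on how the non-modal responses are distributed, which is why the two measures are reported side by side.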

Cross-cultural/linguistic analysis

The current RealPic norms were divided into subsets according to their source (original dataset). The mean ratings (see Footnote 7) per item in each subset were contrasted with the norms reported in the original datasets: the BOSS dataset (v.1: Brodeur et al., 2010; v.2: Brodeur et al., 2014) and the ecological database of Moreno-Martinez and Montoro (2012), obtained with English-Canadian and Spanish samples, respectively (see Tables 3, 4 and 5 of the Supplemental Materials). This analysis was conducted using univariate ANOVAs with 2 Sample (original subsample vs. RealPic) × 2 Domain (living vs. non-living) as factors for each dimension common to both datasets. The variable semantic domain was included in this analysis to provide a more robust inspection of culture-based effects. Semantic processing involves general knowledge acquired during our life experiences, which is related to the environmental context. The processing of non-living items (e.g., tools, furniture, vehicles) and living ones (e.g., mammals, fruits, birds) can therefore be influenced by socio-cultural factors, such as cultural values, social needs and evolutionary pressures (see Barbarotto et al., 2002; Na et al., 2017). Domain specificities have been extensively reported in the literature (see Caramazza & Konkle, 2013; Caramazza & Shelton, 1998; Warrington & McCarthy, 1987; Warrington & Shallice, 1984). Bonferroni-adjusted contrasts were used to inspect main effects, and t tests were used to explore interaction effects post hoc.
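
The post-hoc contrasts can be illustrated with a short sketch. Some of the t tests reported in this section have fractional degrees of freedom, which is consistent with an unequal-variance (Welch) t test, although the source does not name the variant; the code below assumes it. The helper names `welch_t` and `bonferroni_alpha` and the example ratings are hypothetical, and the sketch does not reproduce the 2 × 2 ANOVA itself.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    n1, n2 = len(a), len(b)
    # Squared standard errors of the two group means (sample variance / n)
    se1, se2 = variance(a) / n1, variance(b) / n2
    t = (mean(a) - mean(b)) / math.sqrt(se1 + se2)
    df = (se1 + se2) ** 2 / (se1 ** 2 / (n1 - 1) + se2 ** 2 / (n2 - 1))
    return t, df


def bonferroni_alpha(alpha, n_comparisons):
    """Per-comparison significance threshold under a Bonferroni adjustment."""
    return alpha / n_comparisons


# Hypothetical per-item name agreement (%) for the same living items in two samples
original_sample = [52.0, 61.5, 70.0, 48.5, 66.0, 59.0]
realpic_sample = [71.0, 78.5, 69.0, 80.0, 74.5, 77.0]
t, df = welch_t(original_sample, realpic_sample)
print(round(t, 2), round(df, 2), bonferroni_alpha(0.05, 2))
```

When the two groups have equal sizes and equal variances, the Welch degrees of freedom reduce to n1 + n2 − 2, which matches the integer values reported for some contrasts, such as t(142) for two sets of 72 items.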

Regarding the comparison of RealPic (Portuguese) versus BOSS v.1 (Brodeur et al., 2010; English-Canadian; item distribution – living items: 31, non-living items: 88), the inspected dimensions were name agreement measures, familiarity and visual complexity. The ANOVA results showed a significant main effect of Sample across dimensions (all ps < .05), except for visual complexity. Specifically, the Portuguese sample presented higher name agreement (BOSS v.1: M = 56.58, SE = 3.01; RealPic: M = 70.14, SE = 3.01) and more consistency in naming (h-value: BOSS v.1: M = 32.01, SE = 2.29; RealPic: M = 21.17, SE = 2.29). The Portuguese sample also rated the items as more familiar (BOSS v.1: M = 60.55, SE = 2.31; RealPic: M = 76.40, SE = 2.31). The main effect of Domain and its interaction with Sample were not significant for any of the dimensions, indicating consistency across samples by Domain. See Table 7 for detailed results.

Table 7 Main effects and interaction effects between Sample and Domain across rated dimensions

The ANOVA results for RealPic (Portuguese) versus BOSS v.2 (Brodeur et al., 2014; English-Canadian; item distribution - living items: 72, non-living items: 103) revealed a significant main effect of Sample across all naming dimensions (all ps < .05, see Table 7 for details). Specifically, the Portuguese sample obtained a higher percentage of name agreement (BOSS v.2: M = 59.01, SE = 1.80, RealPic: M = 73.65, SE = 1.80) and was more consistent in the valid names provided (h-value – BOSS v.2: M = 38.41, SE = 1.54, RealPic: M = 17.11, SE = 1.54). In contrast with the abovementioned comparison with BOSS v.1, the main effect of Domain was observed in all dimensions (all ps ≤ .03). Living things were rated as more visually complex (living: M = 59.33, SE = 1.70; non-living: M = 47.33, SE = 1.42) and less familiar (living: M = 63.76, SE = 1.67; non-living: M = 68.45, SE = 1.39), and presented higher name agreement (% of name agreement – living: M = 71.21, SE = 1.95; non-living: M = 61.45, SE = 1.63) and less variability in naming (h-value – living: M = 23.22, SE = 1.67; non-living: M = 32.30, SE = 1.39) than non-living things. The interaction effect between Sample and Domain was significant for most of the dimensions (all ps ≤ .03; except for familiarity, p = .44), with the Portuguese sample presenting higher name agreement (% of name agreement – BOSS v.2: M = 66.81, SE = 2.77; RealPic: M = 75.61, SE = 2.77, t(142) = −2.44, p = .016) and less naming variability (h-value – BOSS v.2: M = 30.29, SE = 2.36; RealPic: M = 16.15, SE = 2.36, t(136.499) = 5.09, p < .001) for living things. Living items were also evaluated as less complex by the Portuguese sample (BOSS v.2: M = 65.31, SE = 2.41; RealPic: M = 53.34, SE = 2.41, t(142) = 4.89, p < .001). 
Regarding the non-living domain, the Portuguese sample showed more agreement in naming (% of name agreement – BOSS v.2: M = 51.21, SE = 2.31; RealPic: M = 71.69, SE = 2.32, t(204) = −5.94, p < .001) and less naming variability in comparison with the English sample (h-value – BOSS v.2: M = 46.54, SE = 1.98; RealPic: M = 18.07, SE = 1.98, t(197.806) = 9.22, p < .001), with no significant differences by sample for the remaining dimensions (all ps > .20).

The ANOVAs for RealPic (Portuguese) versus the ecological database (Moreno-Martinez & Montoro, 2012; Spanish) inspected the dimensions of familiarity, name agreement, typicality and visual complexity. The results showed a significant main effect of Sample for familiarity and typicality (all ps < .005). Portuguese participants rated the items as more typical (ecological: M = 63.98, SE = 1.87; RealPic: M = 76.48, SE = 1.87) and familiar (ecological: M = 62.82, SE = 1.86; RealPic: M = 70.45, SE = 1.86). Significant main effects of Domain (living: 73 items; non-living: 84 items) for visual complexity and familiarity (ps < .02) were also observed, with living things rated as significantly less familiar (living: M = 63.40, SE = 1.92; non-living: M = 69.87, SE = 1.79) and as visually more complex (living: M = 45.03, SE = 1.79; non-living: M = 38.04, SE = 1.67) than non-living things. Moreover, significant interaction effects between Sample and Domain were found for name agreement measures (h-value and percentage of NA with ps ≤ .02). The Portuguese sample presented less variability in naming living things (h-value – ecological: M = 28.45, SE = 2.82; RealPic: M = 17.02, SE = 2.82, t(132.536) = 2.93, p = .004), but no significant differences between samples were observed for non-living things (all ps > .1). No differences across cultures were found in the remaining dimensions for living things and non-living things (all ps > .1). Statistical details are provided in Table 7.

Discussion

The present study systematically compiled stimuli and extended norms for real-world pictures in nine dimensions comprising the affective, semantic and perceptive domains. The RealPic dataset includes a considerable range of pictures distributed across several categories (see Santi et al., 2015). To the best of our knowledge, few normative datasets have normed this type of stimuli in the Portuguese context (e.g., Prada et al., 2010; Prada et al., 2014), and none of them includes standards for such a variety of dimensions.

Overall, the results indicated that the RealPic dataset comprises items that are highly familiar, typical, positive, somewhat arousing and visually appealing, medium to low in complexity and presenting high agreement between picture and name. These results are in line with previous studies using real-world pictures of common items, in which those stimuli were rated as relatively complex and presented optimal object agreement (Brodeur et al., 2010; Brodeur et al., 2014). The results also indicate that this type of picture is less subject to negative feelings (see also Prada et al., 2010), likely because they depict well-known and easily recognizable items. Previous research has shown that the most recognizable and meaningful symbols (high valid responses) were also rated as highly arousing, positive and visually appealing (Prada et al., 2016). Furthermore, the overall high ratings obtained for typicality and familiarity do not constitute a critical issue since real-world pictures of common items are actually expected to be typical and familiar (e.g., Adlington et al., 2009; Brodeur et al., 2014; Moreno-Martinez & Montoro, 2012; Shao & Stiegert, 2016). Congruently, it seems that increasing the quality of the pictures and their proximity to the real world is likely to improve their familiarity, and consequently their typicality ratings.

The above-chance scores for linguistic attributes (name agreement and category agreement), together with a moderate to high variation of attributed (target and non-target) names and categories, are in line with previous norms using pictures of common items (Brodeur et al., 2010; Brodeur et al., 2014; Snodgrass & Vanderwart, 1980) and also favor the applicability of those stimuli. Moreover, the high variability in category agreement, contrasted with the low variation observed in typicality ratings, suggests that the two dimensions, although both part of categorization processing, may not be identical, as considered by Clarke and Ludington (2017). For instance, a picture may be typical even if it is not consistently considered a member of the target category (e.g., “panini grill,” considered a highly typical item, although presenting high variability in the categories attributed and a CA percentage lower than 40%). Given such findings, the RealPic dataset is likely to be a useful tool in exploring naming abilities, semantic organization and memory skills (see Footnote 8).

The correlation results can provide important insights into dimensions that were traditionally less explored in previous validation studies, namely arousal, aesthetic appeal, picture-name agreement and valence. The contrast between our correlational results (i.e., arousal and aesthetic appeal, valence and familiarity) and those reported in other normative studies might be related to the specific type of stimuli used across studies. For example, the interaction between arousal and other dimensions might depend on the type of stimuli, particularly when they present novelty (see Foroni et al., 2013). In comparison to the distinctiveness of faces (Garrido et al., 2016), symbols (Prada et al., 2016) and emojis (Rodrigues et al., 2018), common items are well-known stimuli related to general knowledge. The high scores for familiarity, typicality and picture-name agreement observed in RealPic are in line with this perspective.

Original results from our study regarding aesthetic appeal and picture-name agreement showed that such dimensions are positively correlated with all the rated dimensions, except for visual complexity and arousal, respectively. Specifically, while aesthetic appeal presented a very strong positive correlation with valence, it was negatively correlated with arousal, indicating the qualitative differences between these two affective measures. Indeed, aesthetic appeal seems to capture not only affective features but also the influence of perceptual features (see Reppa & McDougall, 2015). Regarding picture-name agreement, the strong positive correlation with familiarity (Brodeur et al., 2014; but see Sanfeliu & Fernandez, 1996 for different results) and the negative correlation with visual complexity (but see Saryazdi et al., 2018) reflect its multiple influence in both visual and conceptual-based processing (Johnston et al., 2010; Sanfeliu & Fernandez, 1996; Snodgrass & Vanderwart, 1980). Taken together, these findings indicate the relevance of exploring other visual-related attributes of pictures aside from visual complexity to further understand their impact on affective and cognitive processes. The weak/absent correlations between typicality and visual complexity, as well as between arousal and both typicality and valence, still require further examination.

Cross-cultural comparisons indicated that the RealPic items were rated as considerably more familiar than the very same items rated by a Spanish subsample (Moreno-Martinez & Montoro, 2012). Nevertheless, familiarity seems to be the dimension least influenced by Portuguese vs. Canadian cultural differences. Accordingly, strong correlations have been observed across different cultures and languages for familiarity (Boukadi et al., 2016; Brodeur et al., 2012). Such conflicting findings may result from other variables that are known to influence familiarity and that were examined simultaneously in our study, such as valence and category agreement (see Foroni et al., 2013; Prada et al., 2018). Moreover, such differences in familiarity ratings could be explained by the fact that the compared items are a subsample of the original datasets, selected for RealPic based on their cultural occurrence in the Portuguese environment.

Cultural differences between the Portuguese and Spanish contexts were also found for typicality ratings. Typicality and familiarity have shown significant positive correlations in studies of common items (Brodeur et al., 2014; Moreno-Martinez et al., 2011; Snodgrass & Vanderwart, 1980), also covarying with the frequency with which an item or its concept occurs. Another possibility is that those findings were driven by differences in the original item subsamples relative to living and non-living domains as well as categories, since familiarity and typicality are known to be influenced by category and domain effects (Brodeur et al., 2012; Foroni et al., 2013; Moreno-Martinez et al., 2011; Moreno-Martinez & Montoro, 2012).

The cross-cultural comparison also indicated that name agreement measures (i.e., percentage and h-value) presented significant differences in the Portuguese vs. Canadian samples. However, these measures showed equivalent results for the comparison between Spanish and Portuguese samples, suggesting that similarities in cultural environments associated with the consistent use of pictures may reduce the influence of linguistic differences in naming (see Brodeur et al., 2012). Likewise, linguistic consistency is expected across near-to-Mediterranean cultures and among languages sharing the same Latin linguistic background (Azevedo, 2005). In fact, a previous study reported high correlations of naming measures across languages and/or countries as well as across clustered languages from the same linguistic family (e.g., Germanic or Romance), confirming a reasonable degree of commonality across languages and cultural contexts (Duñabeitia et al., 2018).

Finally, the main effect of semantic domain (i.e., living and non-living) observed across samples may also be interpreted within a feature distinctiveness approach in which non-living items share fewer features and present higher correlations with distinctive features than living items (see Moss & Tyler, 1997; Randall et al., 2004). However, the cross-cultural differences (English-Canadian vs. Portuguese and Spanish vs. Portuguese) observed in name agreement, familiarity and visual complexity suggest that cultural background may influence semantic organization. It has been argued that the animacy of the items implies a complex neural network influencing the various stages (i.e., perceptive and semantic) of processing based on their evolutionary weight (see Caramazza & Shelton, 1998; Nairne et al., 2013). Moreover, survival issues vary with regions and habits. For instance, it is plausible that cultural characteristics (e.g., climate, accessibility of food, availability and necessity of specific tools or even traditions) may influence the evolutionary-based value of items across the semantic domain in several dimensions, which requires further cultural examination. However, the current cross-cultural findings should be interpreted with caution, as the current study does not constitute a replication, and any methodological differences (e.g., number of assessments, context of data collection, order of presentation of dimensions) might have influenced the results.

Despite the relevance of such a normed dataset, the current study presents a few limitations, namely regarding the number of evaluations per picture, the sample characteristics and the data collection environment. First, a limited number of respondents in psychological studies has driven the production of conflicting findings across studies (Brysbaert, 2019). However, the number of evaluations per item established for the current study was based on previous normative studies that have produced reliable results (Alario & Ferrand, 1999; Brodeur et al., 2014; Cycowicz et al., 1997; Tsaparina et al., 2011). Second, the sample in our study was fairly homogeneous regarding participants’ high level of education and was unequally distributed across age groups, making certain types of comparisons across these variables unfeasible. It is well established in the literature that some of the dimensions (e.g., name agreement) assessed in the current study may be influenced by age and education level (Laiacona et al., 2016; Spezzano et al., 2013). For instance, Laiacona et al. (2016) have already shown that age and educational level are relevant predictors of naming ability. Pompéia et al. (2001) also showed differences in normative ratings between children and adults and across different education levels. On the other hand, the demographic characteristics of our sample allowed comparisons with many other normative studies that used highly educated young adults. Future studies might adopt a developmental approach, contrasting young and older adults with different educational backgrounds in an attempt to grasp potential differences in the explored dimensions. Finally, the use of online resources for collecting data may constitute a challenge in maintaining participant engagement in the study and in establishing some control of the data collection environment.
Nevertheless, online data collection procedures allow researchers to overcome a set of constraints regarding the recruitment of participants and have been shown to be as reliable as data collected in lab settings (Saryazdi et al., 2018).

The current norms constitute a useful tool for researchers searching for pictures that are well characterized in several dimensions, allowing the manipulation of specific dimensions while controlling others. This enables a better selection of stimuli while avoiding possible confounding effects, ultimately enhancing the quality of experimental designs. Additionally, the application potential of RealPic becomes particularly high if we consider all Portuguese-speaking communities (scattered or territorially distributed) around the world (Godinho & Garrido, 2016) and the rank of the Portuguese language as one of the most spoken languages around the world (see Reto et al., 2016). Future studies should consider the cultural and linguistic diversity of Portuguese-speaking communities in non-European Portuguese contexts (i.e., Africa, Asia or South America) as well as expand these norms to additional dimensions (e.g., age of acquisition, Johnston et al., 2010; manipulability, Brodeur et al., 2014; image agreement and/or imageability, Snodgrass & Vanderwart, 1980).

In conclusion, the RealPic dataset comprises images of meaningful stimuli commonly encountered in our daily lives. As a particular general class, common items were examined from a more integrative perspective of validating stimuli across a wide range of dimensions, emphasizing their independent and combined contributions to picture processing. The ecological concern that guided this work and its systematic procedures are likely to make RealPic a promising resource for memory, language and emotion research as well as for interventional settings (e.g., cognitive, linguistic and marketing) requiring more realistic stimuli.