Issues regarding how semantic knowledge is structured and stored in our brains are crucial to our understanding of the processes involved in language comprehension and production, as well as our understanding of semantic deficits in patients with brain damage. Seminal models of semantic organization include the hierarchical network theory (Collins & Quillian, 1969) and spreading-activation theory (Collins & Loftus, 1975), and following these, a number of theories have been advanced, including the feature-based model, to explore the specific influence of semantic features in conceptual representations (see the model of attractor-based connectionist networks; e.g., Cree, McNorgan, & McRae, 2006; McRae, 2004; McRae, de Sa, & Seidenberg, 1997). For example, several studies have investigated the role of featural representation in behavioral phenomena such as semantic priming (e.g., Cree, McRae, & McNorgan, 1999; Vigliocco, Vinson, Lewis, & Garrett, 2004), feature verification (e.g., McRae, Cree, Westmacott, & de Sa, 1999; Solomon & Barsalou, 2001), and categorization (Hampton, 1979; Smith, Shoben, & Rips, 1974).

Interestingly, this theoretical framework introduced a large number of dimensions as descriptors of semantic features, including featural distinctiveness and dominance, which are particularly important for understanding the organization of semantic structure. Specifically, the distinctiveness value of a feature depends on the number of concepts in which it appears, and this value represents how important a feature is for identifying and distinguishing a certain concept among similar ones. This index plays critical roles in typicality (Rosch & Mervis, 1975) and similarity (Tversky, 1977) judgments, in both feature (McRae, Cree, Cho, & McNorgan, 2003) and category (Smith et al., 1974) verification, and in naming-from-definition tasks (Marques, 2005). Another measure is feature production frequency, also referred to as dominance (when calculated as the proportion of the total sample of participants), which indicates the number of participants who list a given feature for a concept on a feature-listing task and modulates participants’ performance in feature verification tasks (e.g., Ashcraft, 1978; McRae et al., 1997). This index is linked to the importance of a feature for a given concept (Smith & Medin, 1981) and traditionally reflects the accessibility of that feature when the meaning of a concept is computed from the concept name.
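For concreteness, these two dimensions can be written in the notation adopted later in the Method section (a minimal sketch, where $n_{ij}$ is the number of participants listing feature $j$ for concept $i$, $N$ is the total number of respondents, and $I_j$ is the number of concepts in which feature $j$ appears):

$$ x_{ij} = \frac{n_{ij}}{N}, \qquad d_{j} = \frac{1}{I_{j}}, $$

where $x_{ij}$ is the dominance of feature $j$ for concept $i$ and $d_j$ is its distinctiveness.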

Sartori and Lombardi (2004) proposed a new feature-based model of semantic memory, in which they provided a novel index of feature importance, termed semantic relevance. This measure may be construed as a nonlinear combination of the dominance and distinctiveness of a feature (Sartori, Gnoato, Mariani, Prioni, & Lombardi, 2007; Sartori, Lombardi, & Mattiuzzi, 2005). In other words, these authors computed semantic relevance as the result of two components: a local one, comparable to dominance, which reflects the importance of a property for a concept; and a global one, comparable to distinctiveness, which reflects the salience of the same feature for the whole set of concepts. According to the authors, semantic relevance captures the “core” meaning of a concept (Mechelli, Cappa, & Sartori, 2011; Sartori et al., 2007; Sartori & Lombardi, 2004) better than other featural measures, and their model “accounts for many of the known facts in semantic memory disorders and, in particular, in category-specific deficits” (Sartori & Lombardi, 2004, p. 440), suggesting that the relevance of semantic features may be an organizing principle in semantic memory. In order to test their model, Sartori and colleagues used it to investigate category-specific semantic deficits. In particular, they showed that participants’ performance in a naming-to-verbal-description task was better when semantic features with higher relevance values were presented (Mechelli et al., 2011; Sartori et al., 2007; Sartori & Lombardi, 2004), and that performance was predicted better by the semantic relevance of the features (Sartori, Lombardi, & Mattiuzzi, 2005). More importantly, they also showed that when stimuli were balanced for semantic relevance, presumed category effects were annulled at both behavioral (Sartori et al., 2007; Sartori & Lombardi, 2004) and neurophysiological (Sartori, Mameli, Polezzi, & Lombardi, 2006; Sartori, Polezzi, Mameli, & Lombardi, 2005) levels. Therefore, investigating the influence of an overarching featural dimension, such as semantic relevance, in the computation of concept meaning has been fruitful.

In line with this, and to introduce a measure that more comprehensively reflects the importance of a given feature for the conceptual representation, we proposed a new index called semantic significance, which shares the same underlying principle of semantic relevance (Montefinese, Ambrosini, Fairfield, & Mammarella, 2013). Like relevance, this measure is also composed of a local and a global component reflecting, respectively, how much a given attribute contributes to the meaning of (1) the corresponding concept and (2) all of the other concepts in the lexicon. Importantly, the local component of semantic significance is computed as the accessibility of the feature, instead of its dominance, in order to account for the order of production. This last measure accounts for both the dominance and the normalized (centile) order of production, a property of semantic features that represents the priority with which a feature emerges in the participants’ conceptual representation. Interestingly, this measure has been largely neglected in the past literature, yet it may be important in furthering the understanding of semantic memory’s organization.

Indeed, we proposed that the first features that emerge in response to a concept name are the most representative features of that concept. Accordingly, we found a significant correlation between the order and the frequency of production of a concept’s attribute: Features with high dominance were listed sooner, whereas those with low dominance were generated later (Pearson’s r = .27; see Montefinese et al., 2013). We proposed that the readiness with which participants list a feature may reflect additional information about the importance of that feature for the representation of the corresponding concept, as compared to that provided by dominance alone; in other words, features with equal values of dominance, but with different orders of production, have different weights in the representation of the concept. For example, both the features “lungo/long” and “usato per riscaldare/used to warm up” for the concept cappotto/coat have the same dominance, but the latter feature was listed sooner than the former, indicating that it was more readily accessible in semantic memory when the corresponding concept cue was presented. In addition, we suggest that the normalized order of production allows us to grasp the gradedness of the feature-based representation of a concept, whereas this is not possible if we consider the dominance index alone. In fact, since dominance is a binary measure at the individual level, all of the features listed by a given participant in response to a concept assume the same dominance value, whereas the centile order accounts for both the richness of that individual’s representation (i.e., the number of features that he or she listed) and the relative priority with which the different features emerged in response to that concept. This, in turn, yields a clearer picture of the gradedness of that concept’s representation.

Moreover, another advantage of considering both dominance and the normalized order of production when assessing the importance of a feature for the representation of a concept derives from what dominance actually measures: the proportion of participants shown to possess a given feature in their semantic representations of a given concept. At the group level, this index therefore represents the probabilistic existence of a given feature in the representation of a concept, or the likelihood that the feature will be activated by the concept name. On the contrary, dominance does not give us any information about when this presumed activation would occur, which is not a trivial issue for most experimental questions in semantic research. Indeed, in a (theoretically) ideal experiment, if we asked a group of participants to list all of the possible features of a given concept, without posing any temporal or quantitative limits, we would end up with a very large number of features with a dominance value close to or equal to 1. In fact, even the least important feature for that concept would eventually be listed by most of the participants, making dominance a poor measure of a feature’s importance/salience for a concept representation. Therefore, we hold that the dominance index included in available feature-listing norms, in which strict limits are imposed by researchers on the number of features listed by participants, is actually also influenced by the readiness with which a feature emerges in response to a concept, even though this influence is not explicitly accounted for. Consequently, we introduced the normalized order of production to account for this particular aspect, and we proposed the accessibility index, a composite index accounting for both the likelihood and the priority of activation of a feature, as a more effective measure of the importance of a given feature for the representation of a concept.

We sought to better describe the importance of a feature for a given concept by using semantic accessibility (in the place of dominance) to represent the local component of the semantic significance, while leaving its global component, comparable to the distinctiveness, unmodified (Montefinese et al., 2013). We posited that semantic significance adds information about the salience of a feature to distinctiveness, defined as the prominence of a feature, or how easily it “pops out” when thinking about a concept (Smith & Medin, 1981). For example, “has a very slow heart rate,” “is the only mammal that cannot jump,” and “has teeth as long as legs” are distinctive features for the concept elephant, but they are not highly evident and are not frequently listed in feature-listing tasks. Consequently, one may suppose that these features do not have a prominent role in the semantic representation of the concept elephant. On the contrary, the feature “has a trunk,” an equally distinctive feature, has a very important role in the semantic representation of the concept elephant, since almost all participants list it among the first features in a feature-listing task.

Here, we intend to validate semantic significance and to investigate the relative roles of parameters of the concepts and semantic features in the representation of concepts. Specifically, we addressed the role of conceptual and featural dimensions in predicting verification latencies in two speeded semantic tasks, a feature verification task (Exp. 1) and an analogous verification task in which the featural description was presented before the concept name (Exp. 2).

Method

Participants

A group of 64 participants (38 females, 26 males; mean age = 26.7 years, SD = 5.8 years) took part in Experiment 1, whereas another group of 64 participants (38 females, 26 males; mean age = 24.6 years, SD = 4.2 years) took part in Experiment 2. All participants had normal or corrected-to-normal visual acuity and were right-handed according to self-report. All participants were naïve to the purpose of the study and gave written informed consent prior to participation.

Materials

We selected the stimulus set and semantic measures from the database of Montefinese et al. (2013). The stimuli were identical for both experiments and consisted of 120 base-level concepts belonging to ten categories (i.e., animals, body parts, clothes, furnishings/fittings, furniture, housing buildings, kitchenware, plants, stationery, and vehicles), with 12 exemplars for each category. Each concept was paired with two short featural descriptions, yielding a total of 240 concept–feature pairs. From this total set, we created four lists, each containing 60 experimental concept–feature pairs in which the description was truly representative of the concept. Each concept occurred with only one feature in each list. Moreover, in order to avoid cueing responses for participants, in each list we also included 60 filler concept–feature pairs in which the feature did not reasonably describe the corresponding concept (i.e., items in which the feature was not true of a given concept). To create these trials, we combined the features of the “true” concept–feature pairs from List 1 with the concepts from List 2 that belonged to the filler concept–feature pairs (i.e., for which the features were “false”), and vice versa. The same method was used for Lists 3 and 4. A further set of 12 concept–feature pairs was selected for the training session, half of which represented “true” items and half “false” items. Concept and feature names were presented only once across the practice, filler, and experimental trials.

Since each list comprised all of the 120 concepts included in Montefinese et al.’s (2013) norms, the conceptual dimensions were automatically matched between the four lists. Specifically, the measures related to the concepts, collected by an exemplar generation task (see Exp. 2 in Montefinese et al.’s corpus, for a detailed description), were

  • Length: the number of letters composing the Italian word for a given concept;

  • Word frequency: the natural logarithm taken from the COLFIS (Corpus e Lessico di Frequenza dell’Italiano Scritto, the “Corpus and Frequency Lexicon of Written Italian”; Bertinetto et al., 2005) frequency (sum of singular and plural);

  • Familiarity (e.g., McCloskey, 1980): mean concept familiarity, as rated by 30 additional participants using a Likert-type 10-point rating scale, with 1 corresponding to not familiar at all and 10 to extremely familiar (see Exp. 3a of Montefinese et al.’s, 2013, norms);

  • Typicality (e.g., Hampton, 2007): mean concept typicality, as rated by 30 additional participants using a Likert-type 10-point rating scale, with 1 corresponding to extremely atypical exemplar and 10 to extremely typical exemplar (see Exp. 3a of Montefinese et al.’s, 2013, norms);

  • Number of features (e.g., Pexman, Hargreaves, Siakaluk, Bodner, & Pope, 2008): the number of features listed for a concept, indicating the semantic richness of its representation;

  • Summed frequency of production (Montefinese et al., 2013): the summed production frequency of all the features listed for a concept (including taxonomic features), another measure of the richness of the conceptual representation;

  • Dominance: the proportion of participants who produced a given concept in response to a given category label, representing the importance of a concept for the categorical representation;

  • First occurrence (e.g., Van Overschelde, Rawson, & Dunlosky, 2004): the proportion of participants who produced a concept as their first response to a given category label, another measure of the accessibility of a concept given the category (dominance and first occurrence are illustrated in the R sketch following this list);

  • Lexical availability (López-Chávez & Strassburger-Frías, 1987): the readiness with which a concept is produced as a member of a given category; it is calculated by taking into account the concept’s position in a given category (i.e., its order of production) for each participant, its production frequency within a category (i.e., its dominance), the lowest position in which it was produced across participants, and the total number of participants.
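As an illustration of how the two category-level production measures mentioned above (dominance and first occurrence) could be derived from raw exemplar-generation responses, the following R sketch operates on a hypothetical long-format data frame exemplars with columns participant, category, concept, and order; these names are placeholders for illustration, not the actual structure of the Montefinese et al. (2013) corpus.

    # Hypothetical exemplar-generation data: one row per exemplar listed by a participant
    n_participants <- length(unique(exemplars$participant))

    # Dominance: proportion of participants producing each concept for its category label
    dominance <- aggregate(participant ~ category + concept, data = exemplars,
                           FUN = function(p) length(unique(p)) / n_participants)

    # First occurrence: proportion of participants producing the concept as their first response
    first_resp <- subset(exemplars, order == 1)
    first_occurrence <- aggregate(participant ~ category + concept, data = first_resp,
                                  FUN = function(p) length(unique(p)) / n_participants)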

In addition, we matched ten featural semantic/lexical variables, shown in the literature to influence (or that could logically influence) featural verification latencies, across lists [all ts(118) ≤ 1.09, all ps ≥ .27, two-tailed]. These feature-related measures, collected by a feature-listing task (see Exp. 3b in Montefinese et al.’s, 2013, semantic norms), were

  • Length: the number of letters composing the Italian word(s) for a featural description;

  • Intercorrelational strength (McRae et al., 1997): the mean proportion of shared variance (R²) between that feature and each of the other features occurring in the same concept with which it significantly correlated, excluding taxonomic features; this indicates the degree to which a feature is correlated with the other features that occur in the same concept;

  • Correlation percentage (McRae et al., 1997): the percentage of features occurring in the same concept with which a given feature is correlated, excluding taxonomic features; two features are considered correlated if they tend to appear together in the representation of a given concept;

  • Cue validity (Bourne & Restle, 1959): the production frequency of a given feature for a given concept divided by the sum of the production frequencies of that feature for all concepts in which it appears in the norms; this measure represents the conditional probability of a concept, given a feature;

  • Distinctiveness (McRae et al., 2005): calculated as the inverse of the number of concepts in which the feature appears across the entire set of norms; this measure represents the amount of distinguishing information provided by a given feature for a concept;

  • Dominance: the proportion of participants who produced a given feature in response to a given concept, representing the importance or salience of a feature for the conceptual representation;

  • Centile order of production (Montefinese et al., 2013): the normalized order of production of a feature; we computed this measure as $C_{i,m} = 1 - (O_{i,m} - 1)/N_m$, where $O_{i,m}$ indicates the order of production of feature $i$ for participant $m$, and $N_m$ indicates the total number of features produced by that participant;

  • Accessibility (Montefinese et al., 2013): the ease with which a feature becomes accessible in semantic memory when the corresponding concept cue is presented; we calculated this measure as follows:

    $$ a_{ij} = 1 - \frac{\sqrt{\left(1 - x_{ij}\right)^2 + \left(1 - y_{ij}\right)^2}}{\sqrt{2}}, $$

    where $x_{ij}$ and $y_{ij}$ indicate, respectively, the dominance and the normalized order of production of a given feature $j$ in a given concept $i$;

  • Relevance (Sartori et al., 2007): we computed the relevance value $k_{ij}$ of a feature $j$ in a concept $i$ as $k_{ij} = 10\,x_{ij} \times \ln(I/I_j)$, where $x_{ij}$ indicates the dominance of feature $j$ in concept $i$ and represents the local component of $k_{ij}$, $I$ indicates the total number of concepts in our norms, $I_j$ indicates the number of concepts in which feature $j$ appears, and $\ln(I/I_j)$ represents the global component of $k_{ij}$;

  • Significance (Montefinese et al., 2013): calculated in the same manner as relevance, but with the feature’s dominance $x_{ij}$ replaced by its accessibility $a_{ij}$, described above (these computations are illustrated in the R sketch following this list).
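To make these composite indexes concrete, the following R sketch computes the centile order of production, accessibility, relevance, and significance for a single hypothetical feature, directly following the formulas above; all input values are invented for illustration, and the aggregation of centile order across participants is simplified to a single respondent.

    # Hypothetical inputs for one feature j listed for one concept i
    N_listed   <- 12    # features produced by participant m for concept i (N_m)
    order_prod <- 3     # position of feature j in that participant's list (O_i,m)
    dominance  <- 0.65  # proportion of participants listing feature j for concept i (x_ij)
    I_total    <- 120   # total number of concepts in the norms (I)
    I_j        <- 4     # number of concepts in which feature j appears (I_j)

    # Centile order of production: C_i,m = 1 - (O_i,m - 1) / N_m
    centile_order <- 1 - (order_prod - 1) / N_listed

    # Normalized order of production (y_ij); in the norms this is aggregated across
    # participants, here it is simply the single centile order above
    y_ij <- centile_order

    # Accessibility: a_ij = 1 - sqrt((1 - x_ij)^2 + (1 - y_ij)^2) / sqrt(2)
    accessibility <- 1 - sqrt((1 - dominance)^2 + (1 - y_ij)^2) / sqrt(2)

    # Relevance: k_ij = 10 * x_ij * ln(I / I_j)
    relevance <- 10 * dominance * log(I_total / I_j)

    # Significance: same form as relevance, with accessibility in place of dominance
    significance <- 10 * accessibility * log(I_total / I_j)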

The selection of stimuli was motivated by the desire to cover, as much as possible, the range of values of all of the above-mentioned featural variables while avoiding the choice of similar features, both within and across concepts. The file included in the supplementary materials shows the entire set of 240 experimental concept–feature pairs and the lexical–semantic measures.

Procedure

In Experiment 1, participants performed a speeded feature verification task in which a concept name, such as automobile/car, was presented, followed by a target featural description, such as “ha quattro ruote/has four wheels.” Participants pressed the yes key as quickly and accurately as possible if the feature was reasonably true of the concept, or the no key if it was not. For example, the correct response would be “yes” to automobile/car–“ha quattro ruote/has four wheels,” but “no” to abete/fir–“ha gambe/has legs.” A “reasonably true” rather than an “always true” criterion was necessary because, for instance, in an “always true” decision task, gonna/skirt–“è corta/is short” and coltello/knife–“è tagliente/is sharp” would require “no” responses (McRae et al., 1997). Response keys were counterbalanced across participants.

Participants were seated 60 cm from a 17-in. LCD computer screen (1,024 × 768 pixels), with the center of the monitor aligned with their eyes. Each trial began with a prime concept presented for 500 ms in the center of the screen, followed by a blank screen with a fixation cross (the interstimulus interval) for 500 ms. After this delay, the target feature appeared in the center of the screen and remained until the participant responded or, in the absence of a response, for 4,000 ms. Thus, each trial lasted at most 5,000 ms, and the intertrial interval was 1,500 ms (see Fig. 1). Concepts were presented in blue and features in black (24-point Verdana), both in lowercase letters on a white background. The procedure for Experiment 2 was identical to that for Experiment 1, except that the featural description was presented prior to the concept name for 2,000 ms, and participants were required to judge whether the target concept was reasonably true of the featural description.

Fig. 1

Experimental timeline for Experiment 1, including both a “true” and a “false” trial

Lists were randomly assigned to participants, and the order of presentation of the stimulus pairs within a particular list was randomized. The verification latency was recorded as the time from the onset of the feature description until the participant’s response. Participants completed 12 practice trials and 120 experimental trials (i.e., a stimulus list, as described above). Experiments 1 and 2 lasted approximately 10 and 12 min, respectively.

Data analysis

Latency and response accuracy were recorded. All test trials with reaction times (RTs) slower than 3,000 ms (<0.5 % of the trials in both experiments) or that were related to incorrect responses (about 9.5 % of the trials in both experiments) were discarded from the analyses. We logarithmically transformed the remaining verification latencies (Ln_RTs) to satisfy assumptions of normality for the analyses.
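A minimal illustration of this trimming and transformation step in R follows; the data frame dat and its columns RT (in milliseconds) and accuracy are placeholder names, not the actual variable names used.

    # Keep only correct responses with latencies up to 3,000 ms
    dat <- subset(dat, accuracy == 1 & RT <= 3000)

    # Log-transform the remaining verification latencies
    dat$Ln_RT <- log(dat$RT)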

We aimed to obtain a detailed picture of the relative influences of a number of lexical and semantic predictor variables on both of our semantic verification tasks. To this end, the effects of the conceptual and featural indexes described above were assessed by conducting linear mixed-effects modeling, as implemented by the lmer function from the lme4 library (version 0.999999-0; Bates, Maechler, & Bolker, 2012) in R (version 2.15.2; R Development Core Team, 2012), an open-source language and environment for statistical computing, freely available at http://cran.r-project.org. Linear mixed-effects modeling is a relatively recent development in computational statistics, based on restricted maximum likelihood estimation, that allows for the analysis of trial-level data by including subjects and items as crossed, independent, random effects (see Baayen, 2007; Baayen, Davidson, & Bates, 2008). This approach has several advantages over traditional general linear model analyses (such as repeated measures analysis of variance and multiple regression) that make it particularly appropriate for the present data. First, unlike general linear models, mixed-effects models are very robust with respect to missing data (which, in our case, accounted for approximately 10 % of the total data). Moreover, because mixed-effects models do not require prior averaging across participants, as by-items multiple regression models do, they preserve and take into account variability across individuals, thus increasing the accuracy and generalizability of the parameter estimates. In other words, they allow for a richer modeling of random effects (i.e., the variance, or standard deviation, associated with variables such as participants or items, whose levels represent a nonreproducible sample of a population), permitting a better evaluation of the effects of the predictors (i.e., fixed effects) and better protection against capitalization on chance, or Type I error, especially for data sets with significant by-item random-effects structures (Baayen et al., 2008; Quené & van den Bergh, 2008).

We started by fitting the simplest model of our data, an empty model that included, in the random part, an estimate of the variance component attributed to participants (i.e., the random effect of subjects, read as a random adjustment to the intercept grouped by participants) and, in the fixed part, no explanatory variables except the grand mean, or intercept. We then evaluated whether the inclusion of additional random- or fixed-effect parameters was justified, using a log-likelihood ratio test to compare progressively more complex models with simpler ones (Baayen, 2007). For fixed effects, we report the estimated coefficient (b), standard error (SE), and t value for each variable included in the final model. In addition, we report the p values (pMCMC) and the upper and lower highest-posterior-density (HPD95) intervals estimated on the basis of the posterior distribution of the corresponding parameters, obtained through Markov chain Monte Carlo (MCMC) sampling (10,000 samples), as implemented by the pvals.fnc function of the languageR package (version 1.4; Baayen, 2011; see also Baayen et al., 2008). Note that HPD95 intervals can be interpreted in much the same way as traditional 95 % confidence intervals.
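A sketch of this model-building logic in the pre-1.0 lme4/languageR interface described above is given below; the data frame dat and the column names subject and feature are placeholders.

    library(lme4)       # version 0.999999-0, as used here
    library(languageR)  # provides pvals.fnc()

    # Empty model: grand mean plus a by-subjects random intercept
    m0 <- lmer(Ln_RT ~ 1 + (1 | subject), data = dat)

    # Candidate model adding a by-features random intercept
    m1 <- lmer(Ln_RT ~ 1 + (1 | subject) + (1 | feature), data = dat)

    # Log-likelihood ratio test between the nested models
    anova(m0, m1)

    # For a final model, MCMC-based p-values and HPD95 intervals, e.g.:
    # pvals.fnc(final_model, nsim = 10000)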

Regarding the selection of variables, we chose to examine the influence of as many lexical and semantic variables as possible on the speeded semantic tasks, in order to avoid any possible bias in predictor selection due to our hypotheses. Indeed, we included as potential predictors many lexical/semantic indexes collected in our recent norms (Montefinese et al., 2013), as well as most of the indexes that have been shown in the literature to influence (or that could logically influence) featural and conceptual verification latencies (e.g., Cree et al., 2006; McRae et al., 1999). This choice, however, came at the price of high multicollinearity among our predictors, as evidenced by a condition number κ of 257, calculated with the collin.fnc function of the languageR package (Baayen, 2007). This unsafely high multicollinearity arose for several reasons. First, some of the variables may measure the same semantic dimension (as, for example, feature distinctiveness and cue validity, which both index feature informativeness with regard to specific concepts; McRae et al., 2005) or closely related dimensions (as in the case of feature intercorrelational strength, which can be considered the opposite end of the spectrum from feature informativeness; Cree et al., 2006). Moreover, some of our variables were higher-order transformations of others, as in the case of accessibility, relevance, and significance, which were computed as nonlinear combinations of dominance and other indexes. As a result, many of the 19 potential predictors were significantly intercorrelated, with high correlations between the mathematically related variables, as is shown by the intercorrelation matrix (Table 1).
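The collinearity diagnostic mentioned above can be reproduced roughly as follows; the data frame preds, assumed to hold the 19 candidate predictors as columns, is a placeholder name.

    library(languageR)

    # Condition number kappa for the full set of candidate predictors
    collin.fnc(preds, 1:ncol(preds))$cnumber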

Table 1 Mean values, dispersion values (SD) and correlation matrix (Spearman’s ρ coefficients) for the 19 lexical and semantic predictors

However, we chose not to restrict the set of predictors a priori by making a (somewhat arbitrary) decision as to which measures to include among the different correlated variables measuring the same dimension. Instead, we employed a different approach in order to mitigate the collinearity. First we examined the correlational structure of our predictors in search of meaningful groups, or clusters, of correlated variables, and then we iteratively tested each of the variables composing a given cluster in search of the predictor that, once inserted in the model, best predicted the participants’ verification latencies. We thus performed a hierarchical cluster analysis using the varclus function provided in the Hmisc package (version 3.10-1; Harrell & Dupont, 2012) in R. We used the squared Spearman rank correlation as a distance metric because it works better with nonsymmetrical vectors, as in our case, and permits a more robust insight into the correlational structure of (possibly nonlinearly related) predictors (Baayen, 2007).
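A sketch of this clustering step, again assuming the predictors are stored as columns of the placeholder data frame preds:

    library(Hmisc)

    # Hierarchical clustering of the predictors, using squared Spearman rho as similarity
    cl <- varclus(as.matrix(preds), similarity = "spearman")
    plot(cl)  # dendrogram analogous to Fig. 2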

The hierarchical cluster analysis revealed five clusters of strongly correlated predictors. As is shown in Fig. 2, the three leftmost clusters, which are further grouped into a supercluster that brings all of the nine conceptual indexes together, respectively group the two measures of conceptual representation richness, the five measures of the importance of a concept for the categorical representation (including its familiarity), and the two lexical measures. The two rightmost clusters respectively isolate the four measures of feature informativeness and the five measures of feature importance for the conceptual representation, with the one featural lexical measure remaining alone. This correlational structure of the predictors guided us in the creation of a model that better explained the verification latencies of our participants.

Fig. 2

Hierarchical clustering of the 19 predictors, using the square of Spearman’s rank correlation as a distance metric

Results

Experiment 1

We started the linear mixed-effects analysis by modeling the random part of the model. First, we tested whether the inclusion of either the by-features or the by-concepts random intercept was justified. The likelihood ratio test revealed a significant random effect of features [χ²(1) = 301.28, p < .0001], whereas the inclusion of the parameter for the random effect of concepts was not justified [χ²(1) = 0.49, p > .48]. Next, we examined a control variable for the potential longitudinal effects of fatigue or familiarization over the course of the experiment. We first centered the rank of each trial in the experimental list (i.e., the trial vector) by subtracting the mean rank from each value, in order to remove any (possibly spurious) correlation between the by-subjects random intercepts and slopes. We then tested three further models: one including a parameter for the fixed effect of trial (cTrial), and two adding the corresponding by-subjects random slope, with or without the parameter for the intercept–slope correlation. The fixed effect of trial was significant [χ²(1) = 26.41, p < .0001], as was the by-subjects random effect of trial [χ²(1) = 72.76, p < .0001], but the same was not true for the correlation parameter, which was thus superfluous [χ²(1) = 1.31, p = .25].
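In the same pre-1.0 lme4 syntax as above, this build-up of the random-effects structure could be sketched as follows (all data frame and column names remain placeholders):

    # Center the trial rank to decorrelate by-subjects intercepts and slopes
    dat$cTrial <- dat$trial - mean(dat$trial)

    # Random intercepts for subjects and features
    m2 <- lmer(Ln_RT ~ 1 + (1 | subject) + (1 | feature), data = dat)

    # Add the fixed effect of centered trial
    m3 <- lmer(Ln_RT ~ cTrial + (1 | subject) + (1 | feature), data = dat)

    # Add a by-subjects random slope for trial, without the intercept-slope correlation
    m4 <- lmer(Ln_RT ~ cTrial + (1 | subject) + (0 + cTrial | subject) + (1 | feature),
               data = dat)

    anova(m2, m3)  # fixed effect of trial
    anova(m3, m4)  # by-subjects random slope for trial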

After we were confident that we had a well-fitting model for the random part, we turned to modeling the fixed part. We began by exploring the effect of the lexical indexes on participants’ performance. To this end, we first included in the model the only lexical variable related to features, that is, the number of letters composing the word(s) of the featural descriptions (Fleng), in order to estimate its expected effect on basic linguistic processing during the feature verification task. As expected, the effect of Fleng was significant [χ²(1) = 28.88, p < .0001], showing that longer featural descriptions were verified more slowly than shorter ones. We also assessed the effects of the two lexical measures related to the concept, which grouped together in the hierarchical clustering. The likelihood ratio test revealed that the inclusion of the COLFIS frequency of the word denoting the concept (Cfr) significantly improved the model fit [χ²(1) = 5.61, p = .018], suggesting that the higher the frequency of a concept word, the faster the verification latencies.

Next, we focused our attention on the remaining variables related to the concept, in order to exclude any possible effect of conceptual processing on feature verification latencies. We tested the two variables composing the conceptual richness cluster, whose effects were not significant [both χ²s(1) < 0.52, ps > .47], and the five indexes of conceptual importance for the categorical representation that clustered together. In this case as well, the inclusion of none of these variables had a significant impact on the model fit [all χ²s(1) < 1.49, ps > .22]. Therefore, no further conceptual predictors were included in the model.

Finally, we studied the variables of primary theoretical interest, by investigating whether any of the featural semantic indexes had a significant effect in predicting participants’ feature verification latencies. With regard to the variables constituting the feature informativeness cluster, the likelihood ratio test revealed that the inclusion of either distinctiveness or cue validity led to a significant improvement of the model fit [both χ²s(1) ≥ 9.17, ps ≤ .002]. In addition, adding either of these two distinctiveness measures eliminated the significant effect of the COLFIS frequency of the concept word (pMCMC ≥ .065). Therefore, we eliminated the Cfr predictor from the model and added the term for the main fixed effect of cue validity in its place, because its log-likelihood was higher than that for distinctiveness (−254.43 vs. −257.44), indicating a better fit.

The final step of the model-building process pertained to the evaluation of the effects of the five semantic indexes related to the importance of a feature for the representation of the corresponding concept, in order to test our hypothesis about the influence of feature significance on verification latencies. The inclusion of each of these five predictors significantly improved the model fit [all χ²s(1) > 30.46, ps < .0001], with significance yielding the highest log-likelihood (−212.88), and thus the best fit, followed by accessibility, relevance, dominance, and centile order of production (−215.56, −216.31, −217.75, and −239.19, respectively). We chose to add semantic relevance to the model and then to test whether incorporating significance as an added predictor significantly improved the model fit, and vice versa (in other words, we assessed the partial effect of significance against that of relevance). We did this because our aim was to validate our new measure of featural importance, semantic significance, and to test whether it might predict verification latencies better than other indexes of feature “importance,” given that it takes into account the order of production of a feature, in addition to its dominance, in indexing how much a given attribute contributes to the meaning of the corresponding concept. The likelihood ratio test revealed that the inclusion of significance led to a significant improvement of the model fit [χ²(1) = 7.36, p = .007], whereas the inclusion of relevance did not [χ²(1) = 0.52, p = .47], thus showing that significance added a significant contribution over and above that provided by relevance. Therefore, we included the term for the fixed main effect of significance and left out that for relevance. We also removed the cue validity term from the model specification, because the addition of significance caused its effect to be no longer significant (note that the portion of variance shared by these two variables was now accounted for by significance alone). In addition, we explored whether it was possible to add any of the remaining measures to the present model, finding that the inclusion of either the correlation percentage (Pcorr) or the intercorrelational strength had a significant impact on the model fit [both χ²s(1) > 3.91, ps < .048], showing that strongly correlated features were verified faster than features with low correlation values. Therefore, we added the former predictor, which provided the better fit, to the model.
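The reciprocal partial-effect test just described amounts to two pairs of nested-model comparisons, which could be sketched as follows (base, Rel, and Sig are placeholder names for the model containing the terms retained so far and for the relevance and significance predictors):

    # Does significance improve the fit beyond relevance?
    m_rel     <- update(base, . ~ . + Rel)
    m_rel_sig <- update(m_rel, . ~ . + Sig)
    anova(m_rel, m_rel_sig)

    # Does relevance improve the fit beyond significance?
    m_sig     <- update(base, . ~ . + Sig)
    m_sig_rel <- update(m_sig, . ~ . + Rel)
    anova(m_sig, m_sig_rel)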

To sum up, the final model included, in the random part, four parameters—for the random effects of subjects and features, the by-subjects random slopes for trial, and the residual error—and, in the fixed part, five parameters—for the fixed effects of intercept, cTrial, Fleng, significance, and Pcorr. At this point, we performed a residual analysis to identify possible outliers that could distort the results by “pulling” the regression line toward them. Moreover, outlier observations are problematic because they may not reflect the ability under examination, since very long verification latencies are likely to be co-determined by later processes of no interest. We detected 172 outliers (corresponding to 4.95 % of the total cases) as observations for which the standardized residual exceeded the value of ±2. We then refit the final model after excluding these outliers, and found that the same predictors remained significant. The resulting trimmed model was as follows: Ln_RTs = 6.848 − .0007 × cTrial − .0094 × Sig + .0136 × Fleng − .0465 × Pcorr.
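The residual-based trimming described above could be sketched as follows, where final stands for the model just described and dat for the trial-level data entering that model (placeholder names):

    # Approximate standardized residuals of the final model
    std_res <- resid(final) / sd(resid(final))

    # Exclude observations whose standardized residual exceeds +/- 2, then refit
    keep    <- abs(std_res) <= 2
    trimmed <- update(final, data = dat[keep, ])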

Table 2 shows the summary of the trimmed model and lists the parameters for both the fixed and random effects (i.e., the estimated coefficients and the standard deviations, respectively); note that the fixed main effect of significance had a higher t value than did the other lexical-semantic predictors. The proportion of variance (R²) accounted for by the final trimmed model was 61.38 %, and the collinearity of the predictors entered into the final model was not harmful at all, as is shown by a condition number κ of 8.44 (Baayen, 2007).

Table 2 Estimated parameters and statistics of mixed-effects modeling of data from Experiment 1

Experiment 2

We performed a second experiment to examine whether the predictive power of the semantic variables on participants’ verification latencies might depend on the order of presentation of the concept and the feature. In fact, several studies using semantic verification tasks have shown that the effects of some semantic variables are limited to a specific presentation order (e.g., the category dominance effect was limited to the instance–category condition; Casey, 1992; Larochelle & Pineau, 1994). To test this possibility, we asked a different sample of 64 participants to perform a feature verification task in which the feature was presented prior to the concept target. We expected significance to be confirmed as a reliable predictor of the verification latencies in this case as well, since, by virtue of its global component, it represents how informative a given feature is in recognizing, and how effective it is in activating, the corresponding concept.

We assessed the effects of the conceptual and featural indexes on verification latencies by conducting linear mixed-effects modeling, as in Experiment 1. Here we briefly report the model specification obtained at each step. Regarding the random part of the model, the same parameter structure as in Experiment 1 was obtained, accounting both for the by-subjects and by-features adjustments to the intercept and for the by-subjects random slopes for cTrial. In this case, however, the estimate of the fixed effect of cTrial was not significant; we nonetheless chose to leave it in the model, given the inclusion of the corresponding random slopes in the random part (Baayen, 2007). The resulting model provided a much better fit than did the empty model [χ²(3) > 542.05, p < .0001].

As in Experiment 1, we then explored the expected effect of basic linguistic processing on participants’ performance, but now we focused on the two lexical measures related to the concept, since the task required participants to verify the concept as soon as possible after reading the word denoting it. The effect of the COLFIS frequency of the concept word (Cfr) was stronger than that of word length (log-likelihoods = −842.07 vs. −856.67, respectively), so we added a term for the former fixed effect to the model, accounting for lower RTs in verifying concepts with higher word frequencies. Next, we investigated the effects of the remaining conceptual variables. The inclusion of neither of the two variables composing the conceptual richness cluster was justified [both χ²s(1) ≤ 1.39, ps > .23], and the same was true for the five indexes of conceptual importance for the categorical representation that clustered together, although the effect of concept familiarity was marginally significant [χ²(1) = 3.678, p = .055]. Therefore, no further conceptual predictors were included in the model.

Regarding the featural semantic variables, and in particular the indexes composing the feature informativeness cluster, the inclusion of distinctiveness, cue validity, or correlation percentage led to a significant improvement of the model fit [all χ²s(1) ≥ 5.77, ps ≤ .016]. We chose to add the term for the main fixed effect of cue validity because of its higher log-likelihood (−819.1 vs. −824.05 and −839.19 for distinctiveness and correlation percentage, respectively). However, when we investigated the effects of the semantic indexes of primary theoretical interest, those related to the importance of a feature for the representation of the corresponding concept, the cue validity effect was no longer significant (pMCMC = .1) once we added the significance term, which had a larger effect on the model fit [χ²(1) = 67.58, p < .0001]. Therefore, we removed the cue validity term from the model, and then, as in Experiment 1, we assessed the partial effect of significance against that of relevance. Again, the likelihood ratio test revealed that significance added a significant contribution to the model fit over and above that provided by relevance [χ²(1) = 11.62, p < .001], whereas the opposite was not true [χ²(1) = 0.35, p > .55]. Consequently, we added to the model only the term for the significance fixed effect. Finally, we explored the possibility of adding any of the remaining semantic measures to the present model. Indeed, the addition of the term for the fixed effect of concept familiarity (Cfam), which in a previous step had been nonsignificant, was now justified [χ²(1) > 3.91, p < .048], suggesting that the more familiar a concept was, the faster its verification.

To sum up, the final model included, in the random part, four parameters (for the random effects of subjects and features, the by-subjects random slopes for trial, and the residual error), replicating the results of Experiment 1. In the fixed part, five parameters were included, for the fixed effects of intercept, cTrial, Cfr, Cfam, and significance. At this point, we performed a residual analysis to identify possible outliers. We detected 171 outliers (4.92 % of the total cases) for which the standardized residual exceeded the value of ±2. We then refit the final model after excluding these outliers. The results showed that the same predictors remained significant. The resulting trimmed model was as follows: Ln_RTs = 7.1805 − .0002 × cTrial − .0125 × Sig − .0251 × Cfr − .0130 × Cfam. Table 3 shows the summary of the trimmed model, listing the parameters for both the fixed and the random effects (i.e., the estimated coefficients and the standard deviations, respectively); note that the fixed main effect of significance had a higher t value than did the other lexical-semantic predictors. The proportion of variance (R²) accounted for by the final trimmed model was 64.25 %, and the collinearity of the predictors entered into the final model was moderate (κ = 14.46; see Baayen, 2007), probably due to the correlation between concept familiarity and word frequency (see Table 1).

Table 3 Estimated parameters and statistics of mixed-effects modeling of data from Experiment 2

Comparing the final model of Experiment 2, which explained verification latencies when the featural description preceded the concept name, with that obtained in Experiment 1, in which the featural description followed the concept name, the concept-level lexical index Cfr replaced the feature-level lexical index Fleng, and concept familiarity replaced the feature correlation percentage. Significance, however, was a significant predictor of both concept and feature verification latencies.

Discussion

Concepts are described by a number of features that have different degrees of importance in conceptual representations, and that thus contribute with different weights to the representation of their meaning. On a general level, our purpose was to investigate the predictive value of different featural and conceptual dimensions for participants’ performance in two speeded feature verification tasks, as well as the influence of term order. Both experiments yielded sensible patterns of results, highlighting the significant effect of semantic significance, as well as of other variables that have been shown in the previous literature to reliably influence participants’ performance in semantic processing.

Regarding feature length, we expected this measure to be a reliable predictor of participants’ response latencies in Experiment 1, because the speeded feature verification task that we used required, first of all, that the featural description be read and understood before being verified as a true characteristic of the corresponding concept. The time required by these basic cognitive processes, such as reading, increases with the number of letters composing the target description: It takes more time to read and process a long word or sentence. Therefore, the influence of feature length on verification performance would not concern the verification processes per se, but rather the simple, and necessary, reading processes. Moreover, feature length was the only significant nonsemantic predictor of participants’ performance in this experiment, and the hierarchical cluster analysis showed that it was also independent of all the other parameters, reflecting the relative independence of reading processes. On the contrary, as expected, this lexical measure did not reliably predict verification latencies in Experiment 2. Indeed, in this case the verification processes, which were prompted by the presentation of the concept name, occurred after the basic cognitive processes that are influenced by feature length had terminated, because the featural description was presented for an interval that was long enough (2,000 ms) for these processes to be completed.

A further variable that emerged from the analysis in Experiment 1 was the correlation percentage. This finding confirms the important role played by correlated features in the organization of semantic memory (McRae et al., 1999; McRae et al., 1997; Taylor, Moss, Randall, & Tyler, 2004). In particular, McRae et al. (1997) found that feature correlation influences the early computation of word meaning, since features that were correlated with a number of others were activated more quickly, in line with attractor network dynamics, but only in online tasks (e.g., semantic priming and speeded feature verification tasks) similar to those we used in the present study, because these tasks do not require high-level reasoning processes. In addition, neuropsychological studies have suggested that feature correlations are important for explaining the semantic impairments due to progressive diseases such as Alzheimer’s dementia (e.g., Devlin, Gonnerman, Andersen, & Seidenberg, 1998; Gonnerman, Andersen, Devlin, Kempler, & Seidenberg, 1997; Tyler & Moss, 2001; Zannino, Perri, Pasqualetti, Caltagirone, & Carlesimo, 2006). Correlation is considered a crucial variable because correlated features are more resistant to diffuse, but mild, neurological damage, due to the collateral activation that intercorrelated features within a cluster receive from one another.

With regard to Experiment 2, the analysis revealed that word frequency significantly predicted participants’ verification latencies. Word frequency reflects how often words are used (Forster & Chambers, 1973; Whaley, 1978), and past research has shown that it correlates with both concept familiarity and word length: Lower-frequency words tend to be less familiar and longer (Graves, Grabowski, Mehta, & Gordon, 2007), as was also confirmed in our norms (Montefinese et al., 2013) and in the present set of stimuli (Spearman’s ρs = .53 and −.39, respectively). The influence of word frequency in explaining participants’ performance could be due to the fact that it modulates the ease with which the meaning of a word can be computed (Cree & McRae, 2003), as well as how quickly and easily that word can be recognized and produced (Duncan, 1977; Forster & Chambers, 1973; McRae, Jared, & Seidenberg, 1990).

Moreover, it is not surprising that the familiarity measure also predicted verification latencies, because familiarity effects are ubiquitous in word recognition. This result was expected, since familiarity facilitates semantic access to the concept. In fact, familiarity represents the degree to which people have heard, read, or otherwise been exposed to a given concept during their lives (e.g., Funnell, 1995; Marques, 2007). Consequently, we can postulate that the more familiar a word is, the stronger and more direct is the access to it. Although familiarity is most often treated as a nuisance variable, several studies have highlighted its influence as a predictor in tasks requiring word processing (e.g., Gernsbacher, 1984), as, for example, in feature verification tasks (McRae et al., 1999; McRae et al., 1997). Similar results have also been found in past studies of the effect of this variable on category verification latencies (e.g., McCloskey, 1980; Smith et al., 1974), and more recently, other studies have revealed a familiarity effect on typicality ratings (e.g., Barsalou, 1985; Hampton & Gardiner, 1983) and lexical decision (e.g., Balota & Chumbley, 1984; Gernsbacher, 1984).

Beyond these results, however, the main purpose of our study was to validate a new measure of featural importance, which we call semantic significance, and to assess its effect on feature verification processes in healthy young participants relative to other indexes of feature “salience,” in particular semantic relevance, with which significance shares the same underlying principle. In Experiment 1, in which the concept name was presented prior to the featural description, verification latencies were faster for features with high significance than for features with low significance, suggesting that the former are activated more strongly and more quickly in the time course of the computation of word meaning. In Experiment 2, features with high significance activated a concept more strongly than did features with low significance, suggesting that they are better cues for a concept. Moreover, in both experiments, significance fit the experimental data better than did the other semantic variables, and it added a significant contribution to the model fit over and above that provided by relevance. Together, these findings suggest that this new semantic dimension may capture more information about the weight of a given feature in the representation of a concept than do other indexes of property and concept salience. Moreover, our results provide the first evidence that semantic features with high significance values hold a privileged status in the organization of semantic memory and in the computation of word meaning. We propose that semantic features have an associated significance weight that represents the level of salience of a certain semantic feature for a given concept.

It is important to note that semantic significance derives from semantic relevance, as defined by Sartori and Lombardi (2004). That is, it is a measure of the contribution of semantic features to the “core” meaning of a concept, including those features that enable us to identify a concept and to discriminate it from other, similar concepts. Relevance is the result of two components, dominance and distinctiveness. Both of these indexes have proven to be very important to semantic representation, particularly in neuropsychological studies. Indeed, it has been proposed that features with high dominance are resilient to neurological damage, because they can be accessed even when the semantic system’s resources decline (Zannino, Perri, Pasqualetti, Di Paola, et al., 2006). On the other hand, several studies have shown that distinctive features are more vulnerable in the initial stages of Alzheimer’s disease because they lack the “collateral support” provided by correlated features (e.g., Laisney et al., 2009; Perri, Zannino, Caltagirone, & Carlesimo, 2011). Relevance is more informative than either dominance or distinctiveness taken singly, because it combines both of these measures. Moreover, distinctiveness is a dimension that is not concept-dependent, whereas the relevance of a given semantic feature varies across different concepts and may thus, in a way, be considered concept-dependent (Sartori et al., 2007; Sartori & Lombardi, 2004).

Here, we propose that semantic significance captures featural salience better than relevance does, because it is computed by taking into account the accessibility of a feature instead of its dominance. The novelty of accessibility is that it reflects the weight of both dominance and production order, the latter of which is important for understanding the structure of semantic memory because it reflects ease of access to semantic memory: A feature listed earlier in a feature-listing task is accessed more easily than a feature listed later (Montefinese et al., 2013). The results of Experiment 1 are in line with our hypothesis, showing that features with high significance are activated more strongly and more quickly during the time course of the computation of word meaning. Importantly, the fact that significance added a significant contribution to the model fit, over and above that provided by relevance, provides the first evidence of the importance of production order in assessing the salience of a feature for conceptual representations. Previous studies have not taken production order into account, because order information has typically been discarded during the data entry phase, rather than retained, in previously published corpora; consequently, the importance of this variable has been relatively disregarded in the past literature. Production order has been considered in studies on category representation in which participants were asked to produce as many instances as possible of a given category in a specific interval (Izura, Hernández-Muñoz, & Ellis, 2005; Van Overschelde et al., 2004), as well as in association studies such as that of De Deyne and Storms (2008), who investigated differences between early and late responses in word association tasks and proposed that information about production order might give additional insight into conceptual structure and the processes driving the generation of word associates. To the best of our knowledge, however, no study has investigated the effect of production order on conceptual representation using a feature-listing task. Most interestingly, De Deyne and Storms’s results showed that the first associate of a given word is automatically activated and comprises linguistically grounded information (see the LASS model of Barsalou, Santos, Simmons, & Wilson, 2008), whereas the second- and third-associate responses reflect different information. According to the authors, the second- and third-associate responses are less available, require different processing than first associates, and presumably are generated later because of their semantic role (De Deyne & Storms, 2008). Consequently, if we consider feature listing as a type of controlled word association task in which the responses are restricted to features (Cree et al., 2006), we can presume that the order of production would also capture the lexical association link between concept and feature.

It is important here to note that our findings revealed that significance was a good predictor regardless of the order of terms, and thus that it also represents how informative a feature is for a given concept. Indeed, semantic features may trigger concept retrieval by virtue of their degree of informativeness for the target concept, rather than because of their content: A concept may have countless semantic features, but those that are useful in distinguishing it from other, closely related concepts may not be as numerous. The results of Experiment 2, in which featural descriptions were presented prior to the concept name, showed that a feature with high significance activates, or evokes, a concept’s representation more strongly than does a feature with low significance. It is therefore a better cue for the corresponding concept, because the global component of significance (comparable to the distinctiveness of that feature) measures how informative a feature is in identifying the corresponding concept. These results are consistent with those of Cree et al. (2006), in which the distinctiveness effect on verification latencies emerged regardless of the order of concept and feature. Accordingly, by adding information about feature production order to its dominance and distinctiveness, we aimed to create a featural measure that is more comprehensive than distinctiveness and cue validity, which are related but not identical to feature salience, since they directly measure how informative a feature is about a concept, but not how prominent it is (Cree et al., 2006). These variables alone are therefore not able to capture the total salience of a given feature for the conceptual representation, whereas semantic significance may represent a more informative measure of the importance of a feature in the conceptual representation.

Conclusion

The present study extends our knowledge about the parameters related to concepts and features in the computation of a concept’s meaning. In particular, we introduced a new featural measure aimed at capturing the meaning of a given concept and at advancing understanding of semantic memory models in which concepts are represented by semantic features with associated significance values. It would be interesting to extend these findings to different populations, in order to investigate the predictive power of semantic significance on the performance of patients with Alzheimer’s disease or herpes simplex encephalitis. In this way, it would be possible to further understand the differences between the living and nonliving domains and to verify whether selective impairments may be ameliorated when semantic relevance is matched across categories and feature types. Moreover, semantic significance may help us organize and understand several findings and may guide additional research. For instance, it may represent a factor that influences the likelihood that memory will be accurate or distorted, and studies showing that people falsely recall or falsely recognize semantic associates of presented words may, indeed, represent an area of memory research in which significance could be implemented.

In conclusion, the present results suggest that semantic significance may have straightforward implications for feature-based models of semantic memory, because it introduces an important additional factor for understanding conceptual representation. In fact, future research could take into consideration the role of semantic significance in verification latency performance, in order to better clarify the contribution of a feature to a given concept.