1 Introduction

In this paper, I argue that both language acquisition and cultural and social factors contribute to the formation of schemata that facilitate false belief reasoning. To begin with, I show that existing accounts of this issue have some shortcomings. Analyzing the available proposals, I first draw a distinction between “structure-oriented” views (e.g. de Villiers and de Villiers (2009), de Villiers (2005, 2007)), which stress the role of language as a set of rules providing a representational format, and “cultural/social-oriented” views, which stress the role of social interaction (e.g. Nelson 2005; Hutto 2008b). As I show, “structure-oriented” proposals underestimate the role of cultural factors and run into a number of theoretical difficulties. “Cultural/social-oriented” views, in turn, fail to make specific predictions or to indicate concretely how they could be integrated into a broader account of mentalizing, and they do not account for some relevant data. As an alternative to both groups of views, I present my own account of the role of language acquisition in aiding false belief reasoning. I argue that the existing empirical literature and data from cross-cultural studies support a pivotal role for language acquisition in the formation of schemata that pre-schoolers use to make sense of folk-psychology explanations. My proposal includes a hypothesis about the learning mechanism involved and the cognitive abilities it taps into, suggesting a specific role for linguistic input embedded in cultural contexts in the development of false belief skills.

The debate about folk-psychology and propositional attitude attribution has been fueled by many different issues since it was opened by the classic false belief task (Wimmer and Perner 1983; Baron-Cohen 1985). False belief understanding has been the center of attention: children up to the age of 4 famously have difficulty attributing false beliefs to another agent, as shown by the standard versions of the false belief task. In a popular variation, Sally moves a marble from one box to the other while Anne is absent. When Anne returns, the child is asked where she will look for her marble. Typically, 4-year-olds answer correctly, while 3-year-olds predict that Anne will look for the marble in its new location, suggesting that they do not attribute ignorance or a false belief to her. These results have been replicated across several variants of the task (among many others, Perner et al. 1987; Gopnik and Astington 1988; Flavell 1993; Byom and Mutlu 2013; Clements and Perner 1994). However, implicit tasks, in which the child does not have to verbally predict behavior, have yielded different results (Onishi and Baillargeon 2005; Scott et al. 2011; Scott and Baillargeon 2009; Surian et al. 2007). These tasks typically rely on preferential-looking or eye-tracking paradigms. Children’s looking behavior is measured, and their fixation times suggest that, despite their incorrect verbal predictions, they expect the agent to search where she falsely believes the object to be (although see Mukanata 2000; Poulin-Dubois and Yott 2017).

While the classic debate has focused extensively on whether the development of mentalizing skills should be conceived of as theory-based or simulation-based (Gopnik 2001; Gopnik and Astington 1988; Gopnik 1993; Gopnik et al. 1999; Heal 1986; Gordon 1986, 1992, 2005; Goldman 2006; Nichols and Stich 2000, 2003), the focus has slowly shifted to several central issues that put the nature of folk-psychology itself under the spotlight. On the one hand, researchers have addressed the question of which kinds of representations are involved in false belief reasoning (Zaitchik 1990; Leekam and Sewell 2008; Cohen 2015; Perner et al. 1998; Leekam and Perner 1991; Aichhorn 2009; Perner and Leekam 2008). Here, the debate revolves around whether the representations involved in mentalizing are domain-specific. On the other hand, much of the literature has focused on the relation between what is called implicit mindreading and explicit mindreading. It asks whether the above results invalidate the classical conclusion that 3-year-olds do not have a “theory of mind”, or whether they suggest that mentalizing abilities are multi-faceted and take time to develop. It also asks what the relationship between implicit and explicit skills is. A recent strand of the debate has concerned how to reconcile the results of implicit and explicit tasks, whether they should be reconciled at all, and whether the results of the implicit tasks are valid (Hutto 2008a; Apperly 2010; Scott and Baillargeon 2009; Poulin-Dubois et al. 2018; Baillargeon et al. 2018; Onishi and Baillargeon 2005).

What has been relatively overlooked in this domain is the role of language. Some classical theories, on both the theory–theory side (Gopnik et al. 1999) and the simulationist side (Gordon 2007), have attributed some role to linguistic information, but this has not been the center of attention. On the theory–theory side, language is mostly considered just part of the “evidence” that the child can use to build their own theory (Gopnik and Astington 1988; Gopnik 2001). On the simulationist side, Gordon’s view is bound to the assumption that private speech is what allows the attribution of mental states to oneself (Gordon 2007). By contrast, I am concerned with how language acquisition influences third-person false belief attribution and practice, and with how the acquisition of specific linguistic abilities influences false belief task performance. My proposal is concerned with how language provides tools to deal with the information involved in false belief-like mentalizing tasks.

In the first part of the paper, I highlight some relevant empirical findings that point to a specific role of linguistic abilities in aiding false belief reasoning and understanding. I classify the available accounts as either structure-oriented or socio-culturally oriented views and synthesize them. In particular, I highlight some problems with one of the main hypotheses about the role of language in mentalizing, the “syntactic bootstrapping hypothesis” (de Villiers 2005, 2018; de Villiers and de Villiers 2012, 2009). I then shift the focus to the need to consider language and folk-psychology as related social practices. In Section 4, I describe the views that stress the role of socio-cultural factors. Using these accounts as a starting point, I propose a mechanism according to which syntactically structured input helps the child form abstract schemata that are embedded in cultural practices and that help the learner engage in false belief reasoning effectively. In contrast with the existing socio-cultural views, my account explicitly addresses the data on sentential complement structures, proposes a specific learning mechanism, and provides a developmental picture that connects pre-linguistic cognitive skills with the role of language.

2 The role of language in folk-psychology

2.1 General linguistic abilities

To start, let us briefly review the available evidence for a role of linguistic abilities in false belief reasoning. The meta-analysis conducted by Milligan (2007) covers more than one hundred studies on the relationship between language and false belief understanding, treating language as a complex, intertwined set of skills rather than a monolithic ability. The analysis takes three fundamental factors into consideration: the type of language ability involved, the kind of false belief task, and the direction of the effect. The language abilities tested were general language ability, semantics, receptive vocabulary, syntax, and memory for complements. The results suggest a complex relation between language and false belief task (FBT) performance. Overall, language abilities account for 18% of the variability in FBT performance across all task types (10% when controlling for age). Moreover, receptive vocabulary was found to account for 12% of the variance, semantics for 23%, general language for 27%, syntax for 29%, and memory for complements for 44%. Importantly, the effect was found to be bidirectional but stronger in the direction from language to FBT performance, as language abilities predicted FBT performance better than vice versa. The results of Milligan (2007) confirmed those of other studies, such as the earlier meta-analysis by Wellman et al. (2001) and the findings of Astington and Jenkins (1999), where a longitudinal analysis revealed an effect of language abilities on false belief performance but not the reverse.

Further studies suggest that general language ability has an impact on FBT performance. In a series of studies (Ruffman et al. 2002, 2003), four age groups were tested on a variety of language skills and false belief tasks. Language ability was related to belief understanding over a 2.5-year period. Semantic abilities were a good predictor of variance in belief-task performance, whereas syntax alone did not explain unique variance in any of the tasks. Similar results were reported in a later study by Slade and Ruffman (2010), which used different tests of language comprehension and working memory and confirmed a relation between language abilities and mentalizing performance.

Evidence also comes from clinical data. Children with autism spectrum disorder (ASD) are naturally at the center of attention here, given their impairment in false belief reasoning (Baron-Cohen 1991; Happé 1995; Leekam and Perner 1991; Peterson and Siegal 1998; Leslie 1992). Interestingly, communication impairment is one of the core diagnostic criteria for autism (Vicker 2009; Mody and Bellievau 2013), and a relation between vocabulary measures and false belief task performance was found in both typically developing and autistic children (Happé 1995; Tager-Flusberg and Sullivan 1994; Dahlgren and Trillingsgaard 1996; Sparrewohn and Howie 1995). Even more strikingly, children with Specific Language Impairment (SLI) were found to be impaired in level-2 visual perspective taking tasks (VPT2). In these tasks, the child has to judge how an object appears to another agent viewing it from a different perspective (Farrant et al. 2006).

Perhaps the most impressive data in favor of an active role of language acquisition in the development of false belief reasoning come from the literature on deafness. This literature concerns, specifically, deaf children of hearing parents, who learn sign language late and are therefore exposed to rich linguistic input considerably later than typically developing children. Late-signing children are known to have delayed false belief reasoning skills (Peterson and Siegal 1999, 1998, 2002; Russell et al. 1998), and this seems to hold across a variety of populations (Remmel et al. 1998; Steeds et al. 1997; Deleau 1996). In a study by Schick et al. (2007), late-signing children were delayed in both verbal and non-verbal false belief task performance.

2.2 Syntax

A variety of studies have investigated the specific role of syntax. In de Villiers and Pyers (2002), the authors report a longitudinal study exploring the relation between syntactic abilities and false belief reasoning. The study focuses on sentential complements. Mental state verbs, like other verbs (e.g. communication verbs such as say), can take full sentences as complements, as in (1):

(1) [example sentence not reproduced]

where the complement is an embedded full sentence.

In de Villiers and Pyers (2002), children performed a memory for complements task, a location-change task and an unexpected-content task. In the memory for complements task, children had to report what characters had said after listening to stories containing embedded complements. In this task, communication verbs were used. Crucially, performance in the memory for complements task correlated with performance in the standard FBT. Once again, the direction of the effect was crucial. Performance in the memory for complements task predicted false belief task performance, but the opposite did not hold. Moreover, controlling for FBT performance three months later yielded similar results, showing the reliability of the effect.

Interestingly, memory for sentential complements with communication verbs was the most predictive variable for FBT performance, suggesting that the results were not due to a semantic component specific to the verbs know and think.

Similarly, Hale and Tager-Flusberg (2003) trained children in the use of sentential complements and compared their FBT performance before and after training. Training on sentential complements and on the FBT, but not on relative clauses, had a positive effect on FBT performance. In Lohmann and Tomasello (2003), children were familiarized with deceptive objects, which appeared to be one thing but were in fact something else (e.g. a pen that looked like a flower). There were four conditions:
- full training: children were trained with sentential complement constructions involving mental state verbs and communication verbs, and their attention was drawn to the deceptive characteristics of the objects;
- discourse only: no sentential complements were used, but the deceptive characteristics of the objects were highlighted;
- sentential complements only: sentential complements were used, but no deceptive characteristic was highlighted;
- no language: no verbal descriptions were used, and the experimenter only made comments like “Look!”, “Oh!” and so on.

Before and after training, children were tested on sentential complement comprehension and the FBT. All groups except the no-language group improved their FBT performance. The full training group outperformed every other group, and the sentential-complement-only group performed better than the discourse-only group. Perspective-shifting discourse and sentential complement training were each found to facilitate FBT performance independently, suggesting that syntax training alone was sufficient to aid false belief reasoning. Language was a necessary condition for an increase in FBT performance, whereas experiencing the deceptive objects was not sufficient. However, the strongest facilitator was the combination of sentential complements and perspective-shifting discourse highlighting the deceptive characteristics of the objects. Thus, while the contribution of syntax seems to be sufficient for an improvement, it was not the only factor at play.

Importantly, Tager-Flusberg and Sullivan (1994) and Tager-Flusberg (2000) also found a relationship between syntactic abilities and false belief performance: comprehension of complementation with mental state verbs predicted the FBT performance of children with ASD. This was the case in a variety of tasks, including one in which children had to report what a character said after hearing a story about it. This suggests a relationship, unique to autism, between difficulties in reporting statements and FBT performance. Miller (2010) and de Villiers et al. (2003) found a relation between performance on complement structure tasks and performance on a mentalizing task in children with SLI, even though these children could pass the FBT. Finally, several studies found that sentential complement task performance predicted false belief reasoning performance in deaf children (de Villiers and Pyers 2002; de Villiers and de Villiers 2011; Schick et al. 2007).

3 Structure-oriented views

In combination with the results on semantics and general language abilities, the above studies reinforce the idea that the development of linguistic skills and that of mentalizing skills may be interrelated at a deeper level.

On this basis, proposals have been made about the role of language in mindreading. The most prominent of these is that of de Villiers (2005), developed in de Villiers (2004, 2005, 2018), de Villiers and de Villiers (2009, 2012) and de Villiers et al. (2014). Its core idea is that syntactic structures involving sentential complements are fundamental for the development of mentalizing abilities. The argument connects what de Villiers calls the semantic property of Point of View (PoV), marked in natural language through syntactic structure, to the attribution of a PoV on a specific situation to a specific agent. For instance, in a sentence like (2) below, Martin’s PoV on the information “that mum and Christopher are in the kitchen” is expressed by the fact that the main clause takes an entire sentence as a complement. Importantly for de Villiers’ theory, this complement is a sentence with a finite verb.

(2) [example sentence not reproduced]

Other mental state verbs, such as think, are non-factive. They can take a false sentence as their complement without affecting the truth value of the main clause, as in (3):

(3) [example sentence not reproduced]

The choice of verb in (2) implies that (2) is false if mum and Christopher are not in the kitchen. By contrast, (3) is not false merely because there is no tiger under the bed.

According to the syntactic bootstrapping hypothesis (SBH), then, the acquisition of complement structures like those in (2) and (3) plays a pivotal role in false belief reasoning. Once children master these structures, they acquire a new representational format for attributing mental states to an external actor whose behavior they interpret. Data showing a correlation between sentential complement comprehension and false belief reasoning support this hypothesis. The syntactic structure makes it possible to represent the fact that different points of view can be held on the same situation, including a representation that is not the speaker’s own. Note that this applies only to cases of “explicit mindreading”, i.e. cases in which an explicit prediction has to be made about a person’s behavior, requiring a choice based on an evaluation of the situation (de Villiers et al. 2014, p. 226).

In de Villiers (2005), a possible developmental sequence for the acquisition of the meaning of mental state verbs is spelled out. First, syntactic evidence allows the child to classify verbs like think together with verbs like say. In other words, children first become familiar with communication verbs, which can take false sentences as their complements. This happens relatively soon after the child has started mastering the underlying structures, because the child can come across instances in which the uttered sentence contrasts with reality. This is then transferred to verbs like believe and think, which share the syntactic structure but whose relationship to observable reality is much less transparent. Such an account clearly implies that experiencing this contrast between what is said and what is the case plays an important causal role in development.

A strength of this account is that it provides specific means of tracking the role of syntactic information, and in this sense it fits much of the data mentioned above. However, as formulated, the SBH relies on the idea of internal means of representation with grammatical features. It also lacks any specification of how the representational capacity that language provides should be understood. Elsewhere, de Villiers suggests that I-language as conceived by Hinzen (2006) might be the representational means that enhances cognition in some relevant way (de Villiers 2014). Yet the claim that mentalizing must occur in an internal linguistic format is a strong position to hold for false belief reasoning. In de Villiers (2004), the idea seems to be that something like Logical Form, as suggested by Carruthers (2002), might be the right representational format for false belief reasoning and mentalizing, but this suggestion is not developed further.

3.1 Cross-linguistic challenges and the “language as a monolith” problem

The strong prediction made by de Villiers’ account leaves many questions open. The claim that language is necessary for certain forms of mentalizing is not new (Bermúdez 2009). However, the syntactic bootstrapping account is one of the few frameworks that take into account the data on the interaction between language development and false belief reasoning, positing that linguistic input provides specific information that allows an emerging skill to develop. Accounts like that of Bermúdez (2009) tend to settle the question of the role of language in mentalizing by positing that sophisticated forms of mentalizing also require a linguistic form of cognition. These are the forms that require higher-order mental state attribution and explicit manipulation of propositional information. While this is likely, it does not address whether the acquisition of certain abilities, such as syntactic ones, has a more general impact on other forms of mentalizing.

Several issues remain open with an approach like de Villiers’, of which cross-linguistic specificity is the most important. As discussed at length in Perner et al. (2005) and reiterated by Van Cleave and Gauker (2010), the account appears rather language-specific. There have been attempts to extend the findings on complement structures and their role in mentalizing to other languages (Mo et al. 2014). Nevertheless, problems arise with languages like German, where desire verbs in the main clause can take sentential complements with finite verbs, as in the examples below:

(4) [German example sentence not reproduced]
(5) [example sentence not reproduced]

While (4) is grammatical in German but not in English, (5) is acceptable in both languages. The problem for the SBH is that sentential complements with finite verbs, under think and under communication verbs, are supposed to do the heavy lifting in providing a representational format for false belief reasoning. However, German also allows finite verbs in sentential complements of desire verbs, as in (4). Moreover, German-speaking children seem to master sentential complements with desire verbs before they pass the standard FBT (Perner et al. 2005). The argument therefore seems to be shaken at its foundation.

De Villiers has claimed that the relevant distinction is that between realis and irrealis. On this view, the “direction of fit” (Searle and Vanderveken 1985) differs between verbs like think and desire verbs. Verbs like desire have a world-to-mind direction of fit: the desire is satisfied when the world comes to match a wanted state of affairs. Verbs like think have a mind-to-world direction of fit: the mental state is meant to match the current state of the world. While want takes irrealis objects, believe and think can also take realis objects, which makes them a good vehicle for expressing different perspectives on current states of affairs. As Van Cleave and Gauker (2010) note, this is still a problem for the SBH. It shifts the focus to a semantic property that is no longer strictly related to complementation, casting doubt on the purely syntactic mechanism that is supposed to be central to the account. The main point seems to be this: if semantic information of this kind is involved, then theories that posit a fundamental conceptual change at the heart of the development of mentalizing skills (Gopnik 2001; Gopnik and Astington 1988) have an advantage. They can do the same explanatory work without appealing to obscure syntactic markers, which sidelines the SBH.

Overall, one central issue emerges from such an approach: the syntactic bootstrapping mechanism seems to be rooted in a framework that does not accommodate cross-linguistic differences. As will be seen, another problem is that language is considered as a set of rules in its semantic and syntactic dimensions, with no real attention given to its cultural and social dimensions. Before getting into this, I will briefly discuss the connection with pre-linguistic abilities.

3.2 The developmental picture: what connects with other abilities?

A second problem with de Villiers’ picture, common to other pictures stressing the role of language in mentalizing (Gordon 2007), is how to integrate such an account into a developmental picture. Such a picture must consider pre-linguistic skills, on the one hand, and the literature on implicit forms of mentalizing, on the other. Arguably, studies such as Onishi and Baillargeon (2005), Scott and Baillargeon (2009), Scott et al. (2011) and similar ones suggest that some ability paving the way to false belief reasoning might be present at an early age. In these studies, infants seem sensitive enough to other people’s intentions to implicitly predict where an actor will look for a given object, given what the actor knows. Some of these results are often discussed in terms of whether they actually reveal mentalizing skills or just a more basic form of intention reading (Ruffman and Perner 2005; Perner and Ruffman 2005). They are also discussed in terms of whether they survive the recent “replication crisis” (Poulin-Dubois et al. 2018; Baillargeon et al. 2018). Nonetheless, the evidence makes it hard to justify a picture of mentalizing as a skill that emerges abruptly with language, or to deny that it has relevant developmental precursors. Good candidates among these precursors are basic mentalizing abilities dedicated to understanding intentions and intentional action directed at objects. These abilities are broadly documented (Tomasello 1995; Tomasello et al. 2005, 2007), and they are compatible with the fact that some comprehension of behavior is possible at a very early age, as the implicit mindreading studies suggest. Some understanding of intentional action is arguably present in apes as well (Suddendorf and Whiten 2003; Call and Tomasello 2008; Townsend et al. 2017).

Considerable effort has been devoted in the literature to the requirement that an account of explicit mentalizing should provide at least an idea of how these abilities are related. Relevant recent attempts are the accounts in De Bruin and Newen (2014) and Apperly (2010). Both assume that at least two different systems operate in the realm of mentalizing. In Apperly’s case, a low-level system and a high-level system process different kinds of input and are relatively independent. In the associationist framework proposed by De Bruin and Newen (2014), an association system and an operative system interact to deal with associations that are perceptual, motor, or “cognitive” (i.e. representing different agents’ cognitive perspectives). The account in Apperly and Butterfill (2009) tends to treat the two systems as completely independent of each other, claiming that they respond to different developmental constraints. De Bruin and Newen (2012, 2014), by contrast, posit two integrated systems.

As stressed, the SBH seems to leave open the question of how implicit and explicit forms of mentalizing are related, if at all. Moreover, another problem emerges. As Montgomery (2005) notes, there seems to be no need to assume that mastering mental state verbs comes with possessing concepts of mental states. As both Van Cleave and Gauker (2010) and Montgomery (2005) point out, a child can utter “I want ice-cream” and communicate successfully without our attributing to her the concept of desire. A less demanding assumption is that the child understands that she has a good chance of getting ice-cream if she utters the sentence. She is taking part in a linguistic game of expressing her needs, to which adults respond in particular ways. Assuming that concepts of mental states become available with the onset of language thus poses problems for relating pre-linguistic and post-linguistic abilities. It also attributes to syntactic development a role that might not be necessary.

4 Towards folk-psychology as a practice

More recently, proposals have focused on the importance of cultural narratives in folk-psychology in connection with language. Prominent examples are Hutto (2008b, 2011, 2017), building on Garfield et al. (2001), and Nelson (2005).

What makes these accounts important for the purpose of this paper is the fact that, while stressing the role of linguistic practice in the development of folk-psychology, they highlight how the structural representational component is either insufficient or unnecessary for the skills to develop, once language is considered in a cultural dimension.

Nelson’s (2005) basic idea is to reject the label “Theory of Mind”, because of the inadequacy of the label for more than simulationist inclinations. According to Nelson, the onset of folk-psychological abilities should be understood as the child’s entrance into a “community of minds”, i.e. the practice of interacting with other people and providing explanations for their behavior. Folk-psychology, then, is to be seen as a cultural practice embedded in the social environment, and in no way similar to a theory in the theory–theory sense. In a slightly different direction, Garfield et al. (2001) stress that language is necessary but not sufficient for the development of mentalizing abilities. Social development and social variables provide the other individually necessary condition. Social development and language acquisition are, Garfield et al. (2001) argue, individually necessary and jointly sufficient for false belief reasoning. They do not provide details on how these abilities originate and develop. Even so, their framework partially revisits the standard approach to mentalizing and language: it treats language not as a means of internal representation but as a way to engage in social communication.

One could understand Hutto’s proposal similarly. Hutto’s take on folk psychology should be considered in light of his commitment to an enactivist approach to cognition. This is the idea that the mind is best understood not through computational assumptions but by conceiving of it as embodied. In this picture, social practices expand the borders of cognition. This is why Hutto (2008b) proposes the Narrative Practice Hypothesis (NPH). Much as Nelson does, the NPH embeds social cognition in the narratives used in everyday practice, and it focuses on second-person mindreading. Of central concern in this account is the fact that most of our mentalizing happens online, while we interact with another agent, rather than from a “spectatorial stance”. The standard approaches to mentalizing, Hutto argues, focus too much on artificial situations, such as third-person attribution from a spectator’s perspective, or on the self-attribution of mental states. However, mentalizing skills are in play especially when we interact with people in specific situations and during communication, and not just when we are silent viewers of situations or alone by ourselves.

Folk psychology is, in this view, a specific kind of narrative practice. In these practices, agents’ behavior is explained in light of their characteristics and external circumstances, and it is through engaging in them that mentalizing abilities develop. In other words, any narrative, fictional or based on reality, heard or read, provides substantial ways to “make sense” of the behavior of agents. A standard simulation theory or theory–theory approach would assume that the skills necessary to engage with these narratives are already mentalizing skills. Hutto’s claim, by contrast, is that the socially embedded nature of these narratives is what provides the skills necessary for the practices of mentalizing. First come the narratives, then the practice of second-person mentalizing in dialogue, and only then the spectator-like mentalizing tested by standard accounts. The central idea of the approach is that, by engaging with narratives, children become sensitive to the variables included in folk-psychology explanations. They also become familiar with narratives and stories as representational and linguistic artifacts. Representations are involved in linguistic practice, Hutto claims, but they are not necessarily internalized.

Note, however, that Hutto assumes that concepts of mental states precede narratives, since folk-psychological narratives serve as a “training ground”:

[...] it is important to stress that the NPH supposes that FP narratives do crucially important but nonetheless limited work. They are not responsible for introducing an understanding of mental concepts, rather [...] they put on show how these attitudes can integrate with one another [...]. The NPH assumes that kids already have a practical grasp on what it is to have a desire or belief before learning how to integrate their discrete understanding of these concepts in making sense of actions in terms of reasons. FP narratives enable this by showing how these core attitudes and other mental states behave in situ. (Hutto 2008c, p.178).

This part of Hutto’s story is unclear: it is not obvious why, especially within an enactivist framework, concepts of mental states must already be in place for the child to engage in folk-psychology narratives. Hutto cashes out low-level (pre-verbal, implicit) forms of mentalizing not as content-handling but as practical engagement with goal-directed action. In this sense, attributing concepts of mental states to an infant engaging with folk-psychology practices for the first time seems premature (see Footnote 2). If anything, concepts are likely to develop as a consequence of children’s engagement with narrative practices. But how does this occur, and what is the role of language in the development of false belief reasoning?

5 A proposal: culturally embedded schemata for false belief reasoning

5.1 Towards embedding language in cultural practices

The view I propose is in line with approaches to cognition that assume that language boosts specific cognitive processes, following Vygotsky (1962) (see also Clark 2013; Tillas 2015; Camp 2009; Dove 2017). Regarding folk-psychology specifically, Hutto (2009) claims that humans construct and shape their niche and that, at the same time, the environmental constraints resulting from the niche influence further cognitive development. This dynamic interaction explains how folk-psychology has evolved into a significant part of our interactions, since the construction of narratives involving it has been shaping the cognitive development of children for generations. I push the proposal further by showing that language structures can play a specific role in such an account, once two fundamental assumptions are made:

  1. As supra-communicative views of language assume (Dove 2017; Clark 1996, 1998, 2006), language can be a powerful means of creating cognitive niches and enhancing cognition.

  2. The acquisition of ToM-relevant semantic and syntactic features within specific cultural and social practices, as stressed by Garfield et al. (2001), Hutto (2008b) and Nelson (2005), makes folk-psychology a product of both language and culture.

The claim is both phylogenetic and ontogenetic: I propose to consider both language and mentalizing as communal social practices whose processes develop over time.

5.2 Recognizing cultural specificity: mindreading as a non-universal feature

My account is an attempt to recognize that some aspects of folk-psychology rely on socio-cultural factors. A fundamental assumption of my account is therefore that some aspects of how we solve FBTs, for example, are highly culture-dependent. This assumption is supported by the literature summarized below.

While evidence regarding the role of language in mentalizing mostly concerns English, there is some evidence concerning other languages, including languages whose syntax does not align with that of English. In some cases, the results resemble those obtained by de Villiers in supporting a role for language in false belief reasoning (Hale and Tager-Flusberg 2003; Mo et al. 2011). However, the most interesting data come from other cross-cultural studies. A particularly interesting case is Quechua, spoken in Peru, which lacks mental state terms (Vinden 1996). While children in these communities show no particular difficulties in appearance–reality tasks, Vinden (1996) reports serious difficulties with FBTs in children up to age 8.

The absence of vocabulary is not the only variable in ToM. Consider the communities studied by Vinden (1999): children from the Tainae and Tolai communities of Papua New Guinea and from the Mofu community of Cameroon all had difficulties with false belief tasks, in which direct questions about the characters’ beliefs were not easily understood. Although the children eventually passed the task (considerably later than their English-speaking peers), the author suggests that the practice of using beliefs to explain behavior might not be as universal as is commonly assumed.

Wassmann et al. (2013) tested children from Micronesian cultures in five different studies, using adaptations of theory of mind tasks. The findings suggest that false belief reasoning performance often does not follow the developmental patterns recorded in Euro-American cultures. In the cultures studied, interpersonal relations seem to play a more important role than “internal states” in explaining behavior. Naito and Koyama (2006) report that Japanese children pass the standard FBT later than English-speaking children (around their sixth birthday), and attribute this finding to the fact that mentalistic explanations are less common in Japanese culture, which relies instead on explanations in terms of promises, social rules and conventions rather than intrinsic motivation. A similar pattern emerges in Mayer and Träuble (2012), confirming a strong cultural relativity of the classic ToM acquisition timeline: children from Samoan culture pass the standard FBT (change of location) much later than Western children (in some cases, not until age 13). This is connected, it is argued, to what has been referred to as the “doctrine of the opacity of other minds” (Robbins and Rumsey 2008), i.e. the fact that in some cultures, including many in the Pacific, explanations involving mental state attributions are not considered central, because of the inherent difficulty of accessing what is actually going on in other people’s minds. According to the doctrine, determining what is in other people’s heads is close to impossible.

Other ethnographic studies offer a variety of interesting data on the extent of variation in folk-psychological practice. For example, Danziger (2006) reports that Mopan Mayans do not give mental states the same weight as Euro-American cultures do, and tend to dismiss them in explanations. Danziger also reports the absence of fiction in Mopan culture, where stories are expected either to be believed as true or to be exposed as lies. Along similar lines, Luhrmann (2011) suggests not only that the way humans conceptualize the mind differs across cultures, but also that this is reflected in the practices of explaining behavior, dealing with social hierarchies, and even conceptualizing and experiencing mental illness (Barrett 2004; Luhrmann 2011). The same holds for the ethnographic studies reported in Lillard (1998), which document the near-absence of discussion of mental states in Papua New Guinea (Fajans 1997; Ochs and Schieffelin 1984) and in Himalayan cultures (Paul 1995). A variety of evidence thus suggests that not only false belief reasoning, but folk-psychology in general, may vary considerably across cultures.

Interestingly, Lillard reports data showing that, much like Indian children and adults, American children tend to explain individuals’ behavior in terms of the specific situations they are in, rather than attributing it to character traits. This tendency changes in American adults, who use traits as explanations more frequently, but not in Indian adults (Miller 1984; Beauvois and DuBois 1988). An emphasis on traits and personal character as explanations of people’s actions thus seems to be a partially culture-specific behavior. More generally, the variation across cultures in what is supposed to have causal effects on behavior is striking: behavior can be motivated more by relationships than by individual will (Cheyenne culture; Straus 1977), or by others’ desires rather than the subject’s own (Utku culture; Briggs 1970). Both contrast sharply with Euro-American theory of mind, in which, as Lillard (1998) points out, individuals’ mental states and will are thought to be the main causes of action.

The conclusion drawn by Lillard and Luhrmann is that folk-psychology narratives as we know them are not as universal as is often assumed. Among other things, Lillard suggests that fundamental differences between Euro-American folk-psychology and other, much less studied cultures might lie in the emphasis that some dominant Western cultures place on individuals as disconnected from their community and as independently minded and oriented. Whatever the causes of this diversity, cultural and social factors influence how minds are described and how behavior is explained across cultures.

5.3 A mechanism facilitating false belief reasoning

5.3.1 Main claim

In what follows, I argue that language structures provide a tool that facilitates familiarity with false belief reasoning. As a tool, language builds on abilities children already possess while helping them acquire new ones. In particular, I argue that linguistic structures provide a way for the child to form schemata relating agents to descriptions of states of affairs, and that these schemata are used in false belief reasoning.

In this sense, language is a tool to navigate the world. Far from being an innate module, mentalizing is a set of skills that develops over time. I argue both that syntactic information plays a role (Diessel and Tomasello 2001; Lohmann and Tomasello 2003; de Villiers 2005) and, following Montgomery (2005) and Van Cleave and Gauker (2010), that the pragmatic roles and function of mental state verbs are central to the development of false belief reasoning. In a nutshell, I argue that language provides schemata that constitute a structural link between an agent and a description of a state of affairs. Schemata aid false belief reasoning and have a linguistic and cultural origin. Below, I outline the fundamental components of my account and discuss its specific predictions, the kind of empirical evidence supporting them, and how the account relates to other views.

5.3.2 Schemata

By “schemata” I mean the result of a process of abstraction over regularities picked up in language. Schemata are also invoked, in a somewhat different way, by Apperly (2010), who argues that one way adults deal with particularly rich input in a concrete situation is by using scripts and schemata that are general enough to avoid processing excessive quantities of data. Understood in this way, the notion comes from social psychology (Schank 1982; Schank and Abelson 1977; Gilbert 1998). Situational schemata are formed to deal with specific situations that require operating on a large variety of variables or that include large quantities of data. The central notion, then, is that information can be stored in a rather abstract representational format that captures regularities.

The idea of schemata as instruments for organizing knowledge can be traced back even further, to Piaget (1936) and Piaget and Cook (1952). Piaget described schemata as repeatable action sequences, or a series of linked mental representations that the learning child uses to respond to the environment. Most schemata, in the Piagetian sense, develop with learning, increasing in complexity and number as the child goes through developmental stages.

The core idea I develop here is that handling folk-psychology narratives is a matter of manipulating associations between agents and states of affairs, which are present in storytelling and narratives. Relations between an agent (e.g. Anne) and a verbally described state of affairs (e.g. the fact that her teddy bear is lost in the woods) are pervasive in the stories and fairytales of Western culture, and are part of our common practice of explaining behavior (e.g. Anne goes into the woods because she believes the teddy bear needs her help). This pattern is well reflected in the syntax often used in folk-psychology, which connects agents and descriptions in just this way: in “Anne believes that the teddy bear needs her help”, the agent is the subject of a mental state verb whose complement clause describes a state of affairs. Developing syntactic and semantic abilities allow the child to individuate this kind of association; at the same time, the narrative context of the associations between agents and descriptions is what provides the child with the kind of information that we deem relevant in folk-psychology.

The child not only forms a schema that allows her to readily connect agents and descriptions, but also develops the implicit understanding that these associations are routinely used in the explanatory practices characteristic of folk-psychology. The idea that these abilities rely largely on the pre-existing ability to deal with associations is supported by studies of implicit mentalizing (Onishi and Baillargeon 2005; Scott et al. 2011; Scott and Baillargeon 2009), but crucially also by two current dual-mechanism approaches to mentalizing, those of Apperly (2010) and De Bruin and Newen (2014). In both cases, the central idea is that infants and pre-schoolers gradually become able to deal with associations that are motor and perceptual in nature, and are therefore able to understand goal-directed action in conjunction with agent–object associations, whereas it takes longer to develop the ability to deal with associations at a more abstract level (De Bruin and Newen 2012, 2014) or to use this information at a higher level in flexible but demanding mentalizing (Apperly 2010).

My view accounts for the difference between 3-year-olds’ and 4-year-olds’ performance in false belief reasoning with a similar argument: language helps children form culturally embedded schemata that relate agents to descriptions in order to explain behavior. Learning syntactic structure alone is not sufficient; familiarity with the practice of explaining behavior in terms of relations between agents and descriptions of states of affairs is also essential for the child to learn to deal with false belief.

A 3-year-old child comes to the scene able to keep track of associations between agents and locations, agents and preferred movements, agents and particular perceptual perspectives, and so on. These abilities probably do not require any abstraction. Things change, however, in the case of the explicit FBT, which requires an elicited response. In the classic Sally and Anne scenario, the child has to predict how Anne will behave. A child who has an agent–situation schema at hand is helped in at least two ways. First, she possesses a format in which the association is stored in a sufficiently abstract way, which might make the association more efficient to store in the first place, i.e. easier to keep in mind. Second, she has a readily available schema for which information is relevant to predicting behavior: she knows that, when dealing with agents’ behavior, the association between the agent and the description is what is usually called for, something she has learned through exposure to narratives and folk-psychological practice. The schema makes this information easy to recall, since it is stored in a fairly general format.

5.3.3 Structural alignment

How exactly does language aid the formation of these schemata? A suitable candidate mechanism is structural alignment, as explored by Gentner and Medina (1998), Gentner et al. (2011) and Gentner and Gunn (2001): the idea is that language provides the child with a means of generalizing over situations.

In Gentner and Medina (1998), children had to find a sticker hidden underneath one of the objects in their own triad. An experimenter had a similar triad of objects and would place a sticker underneath one of them, as a clue for the child to find her own sticker. The correct solution was based on relational rather than perceptual similarity: for example, if the experimenter put the sticker underneath the biggest pot in his triad, the child had to look underneath the biggest pot in her own triad. Three-year-olds improved their performance significantly when they were given “relational language”, i.e. language stressing relational properties, in this case the labels “daddy”, “mummy” and “baby”. The authors claim that relational language facilitated performance because it allowed the children to focus on the relative size of the pots and to see the analogy between the two sets. Language supports structural alignment because it invites comparisons that focus attention on common relational structure.

As schematized in Fig. 1, I propose that very different situations, described to the child through stories and interaction, present the same pattern of association. This allows the child to form a more abstract schema that can be used to store associations between specific agents and specific descriptions in a more general format, and that is readily available for explanations. Language, then, provides clues for individuating relations. Once this pattern is individuated, the child forms a relatively abstract representation of a schema relating an agent to a description of a state of affairs. Possessing this schema makes it easier to store new associations, to find new associations in new stimuli, and to retrieve them for new tasks, and possibly helps in comparing different associations. In other words, language not only provides a recurring pattern and structure, but also signals what kind of information is relevant when engaging in folk-psychological practices such as, for example, the FBT.

Fig. 1

The generalization across several instances, leading to schemata relating an agent to a state of affairs

Even if syntactic abilities help the child individuate the pattern, this does not mean that false belief reasoning necessarily employs syntax-like organization. The child can make sense of what is asked of her in the standard FBT partly because she possesses the culturally embedded schema that allows her to employ the relation between agents and descriptions in the context of explaining behavior. However, the labeling of different relations between agents and descriptions (believe, think, etc.) may be what paves the way to forming and disentangling the various concepts. This fits with the finding that children only really start to differentiate between verbs like know and think, but also guess and hope, around age 4, and keep refining their understanding until at least age 6 (Shatz and Silber 1983; Limber 1973; Diessel and Tomasello 2001; Bloom and German 2000; Abbeduto and Rosenberg 1985; Johnson and Wellman 1980).

I hypothesize that children who can store and retrieve information in abstract schemata use associations between agents and states of affairs to solve tasks like the FBT. In the classic Anne and Sally scenario, for example, retrieving the information that Anne can be associated with the marble being in box A, and using it to explain or predict her behavior, is made easier by the presence of an abstract schema and by the child’s recognizing this kind of association as suitable for explaining and predicting behavior.

5.3.4 Cultural-dependency and narratives

Such an account relies on two ideas: on the one hand, being exposed to linguistic input and being able to develop syntactic skills helps in developing abstract schemata relating agents and states of affairs. On the other hand, the fact that these associations are used in explaining and predicting behavior is something that the child learns from being exposed to folk-psychology narratives in the first place.

Similarly to what is suggested by Hutto, I see narratives as the training ground for the use of associations between agents and descriptions in the context of behavior explanation. In this sense, false belief reasoning is facilitated by cultural and social factors, while still relying on cognitive development.

The main assumptions of the argument can be spelled out this way:

  • When dealing with other agents’ actions, we often explain them in terms of associations between agents and descriptions of states of affairs.

  • These associations between agents and descriptions are rendered in a recurring syntactic form.

  • As a consequence, in many communicative situations this recurring structure has a specific role, namely that of providing the explanation required in folk-psychology narratives.

  • This specific role is culturally dependent.

One fundamental role of syntax is to structure the input the child receives: thanks to syntactic skills, it is easier and faster to individuate the relevant information and to detect the pattern over which abstraction takes place. This is in line with the fact that delays in syntactic acquisition also lead to delays in false belief reasoning (see Sect. 2.2). The assumption is that children are sensitive both to the grammatical form used in speech and to the practice of explaining behavior, which implies certain “rules”, including that sometimes the best explanation for somebody’s behavior is the fact that they can be associated with a specific state of affairs. I put “rules” in scare quotes because I do not claim that children learn a propositional rule that explicitly tells them to explain people’s behavior in a particular way. On the contrary, the claim is that children are sensitive to the format and recurring formulas of folk-psychological narratives, in which there is a structural pattern (the syntactic form that is used) on the one hand, and a specific context and pragmatic function (the fact that the syntactic form is used to explain behavior) on the other. “Narrative” is thus intended here in a broad sense, as the “stories” we tell each other about other people’s behavior and the reasons behind it. These stories, whose constants and patterns vary across cultures, provide a specific pragmatic context for the use of the relevant syntactic structures. In other words, narratives are the ways in which behavior is described and talked about, and this includes using associations between agents and states of affairs.

5.3.5 Selective attention

One of the mechanisms supporting this story is selective attention, which others (Barsalou 2005; Tillas 2015) have argued is also central to the cognitive mechanisms involved in conceptualization. Expectations and knowledge about how certain stimuli are usually processed can guide the retrieval and apprehension of new stimuli, making the new information fit previous expectations. In a predictive coding system, information is recruited in a flexible, context-dependent way, depending on the predictions generated by contextual clues and on the information stored in the system as a result of previous experience. In such a system, language can act as an anchor. Since newly acquired information is automatically clustered with pre-existing information, there is a sense, according to Clark, in which advanced thinking requires inference and reliable trajectories in a “representational space” (Clark 2006).

I argue that previous input works in exactly this way for linguistic representations, forming schemata that relate specific information (an agent and a description of a state of affairs) to the context in which behavior is explained. The child thus learns to pay attention to these associations and becomes more inclined to notice them in narratives, which in turn makes it possible to “find” the relevant information in these narratives and to use it appropriately. In other words, the child will know which kind of information matters when providing explanations.

As mentioned, Apperly (2010) argues persuasively that, when faced with an FBT, a child has to take into account an extremely large amount of information, in a complex and perceptually rich scene that has to be processed efficiently. According to his argument, much of our folk-psychological experience in basic everyday tasks involves dealing with information that is already categorized into slots and frames. This fits my claim: the 4-year-old child is already able to do a lot, including using well-developed skills to track attention, goal-directed action, and so on. Infants also seem able to register associations between locations and objects and to use them to form expectations about people’s behavior, at least on some interpretations of the implicit FBT literature (Onishi and Baillargeon 2005; Scott and Baillargeon 2009; Song et al. 2008). However, an abstract schema is useful insofar as it guides attention during interaction, since the child is implicitly instructed to attend to specific patterns of explanation. Like scripts, acquired schemata make stored information available so that it can be reused appropriately. The schemata are, like scripts, culturally determined and bound to linguistic information, and they are acquired in specific ways because of how the child’s community of minds interacts.

Language provides means of abstraction that facilitate the creation of schemata, which in turn make false belief reasoning easier. This does not imply, however, that schemata can only be formed with the help of language, or that language alone enables false belief reasoning. In Wellman and Peterson (2013), a group of deaf children with hearing parents was trained with thought-bubble scenarios, in which thought bubbles represented the thoughts of the characters in a cartoon storyboard narrative: things could be placed in the bubbles to signify a character’s thoughts. Training with thought bubbles improved the deaf children’s false belief reasoning, and similar results were obtained with autistic children (Paynter and Peterson 2013). In thought-bubble experiments, visual aids link an agent, in a clear-cut perceivable way, with a potential mental state; this kind of association is arguably similar to the one, used in schemata, between an agent and a state of affairs given by a description. In this case, the description is replaced by a visual stimulus. There may therefore be other, non-verbal ways in which schemata develop. What is fundamental is that these schemata are formed in development and that they make tasks like the FBT easier for the child: they point to the right information to use, provide a way to store it, and make it available for verbal use.

5.4 Predictions and supporting data

In this section I describe the kind of empirical evidence that does, or could in the future, support the account I am proposing. To a large extent, my account builds on existing empirical evidence. However, it also makes a specific set of predictions.

Ontogenetically speaking, language learning has an impact on false belief reasoning: delayed or impaired language learning can delay the formation of the schemata that make false belief reasoning easier. Exposure to specific linguistic structures and to specific patterns of behavior explanation leads to the formation of schemata that make it easier to manipulate the relevant associations in the standard FBT. This implies two fundamental predictions: (1) that good linguistic skills have a positive impact on false belief reasoning, and (2) that delays in language acquisition might make false belief reasoning harder. (1) is supported by the studies cited above showing a connection between linguistic abilities and false belief reasoning (Milligan 2007; Wellman et al. 2001; Astington and Jenkins 1999; Ruffman et al. 2003, 2002; Slade and Ruffman 2010; de Villiers and Pyers 2002; Hale and Tager-Flusberg 2003; Lohmann and Tomasello 2003; Tager-Flusberg 2000). (2) is supported by studies of late-signing deaf children (Russel 1987; Peterson and Siegal 1999, 2002, 1998; Remmel et al. 1998; Steeds et al. 1997). Note, however, that my claim is merely that a mechanism like the one described helps children deal with the information involved in false belief reasoning, and thus pass the FBT. This does not exclude the possibility that children achieve the same level of abstraction in some other way, possibly on a different schedule.

I predict that exposure to linguistic structures that include complementation, and hence mental state verbs, has the double effect of providing a pattern of explanation expressed by recurring syntactic structures and of introducing the child to the culture-specific practice of false belief reasoning. However, since schemata can be non-verbal, functionally similar non-verbal schemata should likewise improve explicit FBT performance.

Another prediction is that linguistic impairment acquired in adulthood, after development, does not necessarily affect false belief reasoning skills, since false belief reasoning does not necessarily involve manipulating linguistic representations. This is compatible with data showing that aphasic patients can perform normally on FBTs (Varley and Siegal 2000; Siegal 2001; Apperly et al. 2006). However, this does not exclude the possibility that some other forms of mindreading require language-like thought, as suggested by Bermudez (2005).

My account is also compatible with the idea that very different kinds of folk-psychology will emerge, depending on the specific characteristics of narratives in a given culture. It says little about the content of folk-psychology narratives in general, since it is a hypothesis about the format of reasoning rather than about the content of folk-psychological narratives. If narratives in specific cultures make use of associations between agents and states of affairs, schemata very similar to those described here might develop and be used in false belief reasoning as well. Note, however, my tentative prediction that, with time, different kinds of relations between agents and states of affairs will be encoded with different labels, e.g. “believe”, “think”, “know”, etc. In cultures where mental states are not the main explanatory tool for behavior and different labels are used, these categories might differ. Extensive studies of the mechanisms at play in false belief reasoning in cultures that do not produce and use a large number of mental state terms, for example, are essential, both for reconsidering the importance and universality of false belief reasoning as a hallmark of folk-psychology and for gaining a clearer picture of which kinds of linguistic input can play a role.

I also make the specific claim that abstract schemata are formed through structural alignment. Data supporting structural alignment and abstraction in children come from Gentner (1978), Gentner et al. (2011) and Gentner and Medina (1998), and more generally from evidence regarding how language fosters generalization, abstraction, and category formation (Lupyan 2009; Lupyan et al. 2007; Lupyan and Mirman 2013). This is encouraging evidence for the idea that abstract schemata are formed and used. However, while this evidence supports the idea that schemata are the relevant format, it does not straightforwardly demonstrate it. More direct evidence needs to be collected, in particular on the contribution of language to understanding abstract relations and to forming generalizations.

In conclusion, further evidence for the account could come from two different directions. In one direction, more evidence could be collected for a role of language in aiding the formation of abstract structures, strengthening the account on the more “general” side and relating it to a broader view of language in cognition. In the opposite direction, more specific data could be collected on how linguistic structures that are present cross-linguistically and practices that are present cross-culturally interact in promoting false belief reasoning.

5.5 Other accounts

In this section, I highlight similarities and differences with other views. This will allow me to refine my own claims and the predictive scope of the account.

The account by Baldwin and Saylor (2005) also relies on structural alignment. The idea is that using labels and relational language across different situations allows children to notice patterns that are otherwise not available. In Baldwin and Saylor’s (2005) framework, this is what allows children to form concepts of mental states. That is, hearing adults use words for objects that do not exist aids the formation of an “internal focus of attention”, or an intended referent (in other words, a mental representation). Thus, while I share Baldwin and Saylor’s (2005) assumption that structural alignment is at play in false belief learning, the two hypotheses differ substantially, since I do not necessarily connect structural alignment to the onset of mental state concepts.

Hutto’s (2008c) proposal also attributes a relevant role to children’s sensitivity to the “forms and norms” of folk-psychology. A fundamental difference lies in the fact that, according to Hutto, children’s sensitivity to patterns (which are considered abstract and general) does not imply that they become able to form abstract generalizations about them with any representational content. I favor the opposite view, namely that children pick up on regularities across a concrete number of instances, and this allows them to form abstract schemata, i.e. generalizations that allow the child to operate on the available information more quickly. This implies that, instead of having a practical understanding of belief and thought as concepts used in narratives, as the Narrative Hypothesis suggests, the child comes equipped with the ability to detect patterns and regularities, and narratives and language help the child abstract over these regularities to form more general scripts.

Crucially, while giving some credit to the Syntactic Bootstrapping Hypothesis, my view differs fundamentally from it in several respects. According to de Villiers, language is likely to be the representational format in which the FBT is dealt with. On my account, in contrast, the schemata are not assumed to be linguistic: while their formation is aided by linguistic stimuli, their format need not be linguistic per se, as it need not include, for example, syntactic markers of the kind de Villiers assumes. Schemata are loose, abstract representations that need not be sensitive to syntactic features and cues such as tense; rather, they are formed through abstraction from linguistic input.

This, however, raises the question (see footnote 3) of whether children need to represent the stimuli syntactically in the first place, i.e. whether sentences have to be represented as having certain syntactic properties, or whether they could simply be represented as instances of recurring linguistic constructions. While in de Villiers’ account the answer seems to be the former (see Sect. 3), I do not need to take a specific position in this respect. Following Tomasello (2009b), it may be that sentences are represented not syntactically but as instances of recurring constructions. Tomasello (2009b) argues that acquiring language involves skills such as pattern finding, schematization and analogy, which allow the child to abstract syntactic constructions from the concrete utterances to which the child is exposed. Syntactic constructions, then, are learned because the child is able to generalize over concrete occurrences and thereby form schemata. The process starts with learning holophrases and proceeds to increasingly complex schemata (Tomasello 2009a), organized around construction “islands”, i.e. item-based constructions. On this view, syntax is not initially represented as such but learned through processes of abstraction. Nonetheless, contra Tomasello, I do not claim that syntax must be wholly learned (though it might be).

Finally, my view has a strong affinity with the double-system account proposed by Apperly (2010), albeit with some differences. For a start, Apperly attributes a role to scripts and schemata, but focuses on a coarser-grained level, that of complex actors and actions (e.g. being in a restaurant), rather than on the specific level of agents and states of affairs. However, one level does not exclude the other, and it is compatible with my view that the same schemata used to associate particular agents with particular descriptions can be used as part of a larger schema that takes a broader situation into account.

The biggest difference from Apperly’s account is that, according to Apperly (2010), language is relevant because of its social and communicative dimension, whereas syntactic information is not considered central. In my account, specific linguistic structures do play a role in the formation of schemata, since syntactic structures work effectively in combination with the social-communicative dimension at play. Moreover, Apperly’s account stresses that storage of the relevant associations might be a fundamental problem in false belief reasoning, in particular in the tasks that most 3-year-olds fail. Here my account provides additional detail: if my hypothesis is right, and schemata help individuation, comparison and storage of different associations, this might be a relevant piece of Apperly’s puzzle.

6 Conclusions

In this paper, I have made a case for understanding false belief reasoning as at least partially relying on acquired, culturally embedded schemata whose formation is facilitated by language acquisition.

In the first part, I reviewed empirical literature that makes the case for a role of language in facilitating the development of false belief reasoning skills. Correlational, longitudinal and training studies suggest that language acquisition and false belief reasoning are related, and that several factors, including syntax acquisition, aid the development of false belief reasoning. I also presented the syntactic bootstrapping hypothesis, according to which mastery of sentential complements is necessary and possibly sufficient for explicit false belief reasoning. I pointed out that this hypothesis does not survive some cross-linguistic challenges and that the socio-cultural dimension of language should be considered. While I do not claim that all aspects of mentalizing are culturally determined, I suggest that folk-psychology narratives are highly language- and culture-specific. In the second part of the paper, I presented my hypothesis on the use of language to individuate patterns in folk-psychology, to form abstract schemata relating agents to descriptions of states of affairs, and to engage in false belief reasoning. I argued that structural alignment mechanisms operate by exploiting the systematization provided by syntactically structured input, and that schemata are used to easily recall and individuate the kind of information necessary to engage in tasks like the standard explicit FBT. This suggests that false belief reasoning, as studied by the classic false belief paradigms, is at least partially linguistically and culturally influenced. While developing their linguistic skills, children also train their own engagement in folk-psychology narratives through interaction with caregivers, being gradually introduced to the cultural game of explaining people’s behavior.
Language has an important role in this, not only because it is used to express the narratives at play, but also because it provides the means to generalize over a variety of stimuli and arrive at abstract schemata that can be used in false belief reasoning.

While I recognize that other factors are essential (association mechanisms, intention reading, and so on), I have presented an argument according to which language acquisition and social engagement with culturally dependent narratives help in developing false belief reasoning skills in the form that is most commonly studied. More evidence is clearly needed to support this hypothesis: above all, further investigation is needed into the extent to which folk-psychology narratives and false belief reasoning are indeed generalizable and “universal”. The available evidence suggests that more attention should be given to cultures that do not seem to use belief-based systems to explain behavior, to avoid the Western-centric bias that has partially clouded our research on mentalizing.