Social play is widespread throughout the animal kingdom. One aspect that has recurrently fascinated scholars is its complex, cooperative nature that requires substantial on-the-fly coordination and improvisation (Bekoff, 2001; Bekoff & Allen, 1998; Palagi, 2006), in comparison to other social activities like grooming or sex that involve more stereotyped activity-specific patterns. To play together, partners need to recognize each other’s playful intentions, anticipate each other’s movements, adjust the timing and nature of individual acts (Bekoff, 2001, 2004; Palagi, 2006, 2008), and adapt their moves to the strength and age of their partner (Fröhlich, Wittig, & Pika, 2016; Fry, 1987).

Playing together thus seems to require a particular kind of focused attunement to one’s partner. This has led some researchers to use social play as a means to study complex cognitive abilities like shared intentionality, the skill and motivation to share goals and intentions with others during collaborative interactions (Tomasello, Carpenter, Call, Behne, & Moll, 2005). To test the existence of shared intentionality in human children and chimpanzees, Warneken, Chen, and Tomasello (2006) used interruptions to study how participants valued their joint commitments during social play. They found that human children, but not chimpanzees, attempted to reengage reluctant human play partners after an interruption. They concluded that chimpanzees lacked an awareness of the joint commitment with a social partner toward a common goal—one of the critical requisites taken as evidence for shared intentionality in humans.

While fully fledged shared intentionality is probably unique to humans, some related component abilities may be present in other species. Indeed, one way of assessing the differences in such abilities between species is to investigate how coordination is achieved when individuals interact together and the processes involved in the regulation of joint activities in various species (Call, 2009). Accordingly, in this article, we suggest that social play is a form of joint action that constitutes a unique test bed for studying the evolution of shared intentionality.

We take a different tack from previous research, however. We start from an analysis of joint action in humans, describing the step-by-step processes involved in how humans get into, conduct, and get out of focused joint actions together in an orderly way (H. Clark, 1996), thereby jointly constructing the state of “togetherness” characteristic of shared intentionality (Tomasello & Moll, 2010). In human joint actions, participants typically exchange communicative signals to build up a participation framework (Goffman, 1981), which defines the participants of the interaction, the terms on which they are to interact, and the particular content of the action, among other things (H. Clark, 2006). This constitutes the opening phase of the joint action. When engaging in the action proper, or main body, participants continue to exchange signals to progress through the action in a coordinated manner, or to signal their ongoing commitment to the action. Participants also signal to each other their readiness to terminate the action, proceeding through parting rituals like well-wishing, before disbanding in what is known as the closing phase of the joint action. Taken together, the opening, main body, and closing constitute three macro-level phases that are common to all joint actions and constitute the behavioral embodiment of the process by which participants achieve and maintain shared intentionality.

We propose to use this process in humans as a systematic framework and yardstick for examining how various animal species get into, maintain, and get out of play bouts. Such a framework could, in turn, shed light on the evolution of shared intentionality, the unique human motivation for sharing psychological states with others (Tomasello, Carpenter, Call, et al., 2005), and the specific human “cognition for interaction” (Levinson, 2006a, b). That is, the extent to which the signal exchanges used to coordinate the interactional achievement of shared intentionality in humans are observable in other animal species may constitute a measure of those species’ possession of shared intentionality.

In the next section, we describe what a joint action is, how it is coordinated in humans, and how the coordination of joint action reveals abilities necessary for shared intentionality. We then outline our framework to comparatively study social play as joint action and how animals and young children solve the coordination problems arising in joint actions (entering into the action, maintaining it, and exiting from it), thereby potentially achieving states of shared intentionality. Using this framework, we review the literature in search of evidence on how human children and different social species of mammals and birds solve these coordination problems. Finally, we discuss the implications of studying play as joint action for understanding the evolution of shared intentionality.

Joint action: Shared intentionality as an interactional achievement

What is joint action?

Joint action involves two or more individuals collaborating to achieve a shared goal, often corresponding to an outcome that no individual could attain alone. The term potentially includes situations ranging from small-scale, brief, ad hoc collaborations (e.g., Ed enlisting John to help him move a bench) to large-scale, long-term actions involving organized social groups (e.g., Hannibal and his army crossing the Alps to invade Rome). In this article, we will be concerned with small-scale, brief joint actions that involve a handful of participants who share a joint focus of attention and who attempt to coordinate their individual behaviors toward a goal that they all share. Such joint actions feature a collective state of being that has variously been termed intersubjectivity (Merleau-Ponty, 1962), togetherness, or shared intentionality (Reddish, Fischer, & Bulbulia, 2013; Tomasello & Carpenter, 2007; Tomasello, Carpenter, Call, et al., 2005; Zlatev, Racine, Sinha, & Itkonen, 2008). Our analytical focus will be on describing how these states are established, maintained, and dissolved.

Several disciplines have investigated joint action in human interaction. In the words of Levinson (2006b), “human interaction belongs in an interdisciplinary no-man’s land: it belongs equally to anthropology, sociology, biology, psychology, ethology, but is owned by none of them” (p. 39). Philosophy and pragmatics (Bratman, 1992; Grice, 1975; Sperber & Wilson, 1986) typically analyze the intentional structure of joint action. Psychology focuses on experimental explorations of the neural and cognitive processes involved (H. Clark, 1996; Sebanz, Bekkering, & Knoblich, 2006). Approaches from the social sciences like ethnomethodology and conversation analysis have described the linguistic and bodily coordination of joint action in natural settings (Sacks, Schegloff, & Jefferson, 1974; Sidnell & Stivers, 2012). Economics and biology have explored its game-theoretical structure (Henrich et al., 2004). Converging evidence from all these fields has led to the establishment of a sophisticated understanding of the phenomena that human interaction produces as well as the cognitive underpinnings entailed by it (Vesper et al., 2016). Increasingly, proposals are emerging for integrative approaches to bridge methodological and epistemological divides (De Ruiter & Albert, 2017; Galantucci & Sebanz, 2009; H. Clark, 1996; H. Clark & Bangerter, 2004; Levinson, 2006a).

In spontaneous joint actions, participants need to accomplish two things together. First, they need to establish a sense of joint commitment by ensuring that they are all able, ready, and willing to commit to the action (H. Clark, 2006), and that they share their goals and intend to do their part (i.e., that they will not free-ride by not contributing their share to the joint effort). Specifically, establishing joint commitment entails coordinating on a number of generic elements: who is to participate, in what roles, what actions will be performed, and when and where they will be performed (H. Clark, 2006). Second, they need to coordinate their individual actions so that these fit together in time and space to bring about the desired outcomes. This is achieved by exchanging signals in real time that help partners to adapt to each other. For example, in the case of joint actions accomplished mainly via talk (e.g., everyday conversations), speakers design their utterances to display their intentions (Grice, 1975) and facilitate their interpretation, whereas addressees display evidence of how they have interpreted those intentions (H. Clark & Schaefer, 1989). In joint actions involving physical actions (e.g., assembling a Lego model), participants design those actions to be visible and informative to their partners (H. Clark & Krych, 2004) while observing their partners’ actions to extract information from them (Vesper et al., 2016). This process is called grounding, and it is achieved by the exchange of signals that often are produced incidentally or implicitly, in parallel to the main track of conversation (H. Clark & Schaefer, 1989). Grounding is thus the process by which intersubjectivity (Merleau-Ponty, 1962) or shared intentionality (Bratman, 1992; Tomasello & Carpenter, 2007) is attained. Indeed, attributes of shared intentionality include joint commitment to a goal, mutual responsiveness in the pursuit of the commitment, role-reversal, and mutual support (Bratman, 1992; Tomasello & Moll, 2010). Thus, shared intentionality can be construed as a transient state of collective being that participants in joint action strive to attain and maintain, or in the terminology of conversation analysis, as an interactional achievement (Schegloff, 1982, 1995). As a result, shared intentionality is an ongoing process in joint action. At the same time, in most joint actions, certain phases can be distinguished that are particularly important to its achievement.

Entering into, maintaining, and dissolving shared intentionality: Phases in joint action

Joint actions are typically initiated with participants working toward establishing joint commitments. These initial steps of recruiting and ratifying participants (consider Ed approaching John on the street and saying, “Excuse me, do you have a minute?”), coconstructing the content and nature of the interaction and deciding on action location and timing (Goffman, 1981; H. Clark, 2006; Kendon, 1976, 2004) lead to the emergence of a section of the interaction variously termed the initiation, entry, or (hereafter) opening phase. This can be divided into two subphases. In the preentry (attracting attention, checking availability, ratifying participation), participants establish a joint commitment to interact. In the entry, they establish a joint commitment to engage in a specific joint action and to the details of its timing and implementation. In spontaneous joint actions, like picking up a bench, these commitments typically emerge incrementally. If Ed asks John to help him pick up a bench, John might first commit to the action overall, and then they might each commit to picking up one side of the bench and finally, lifting up their sides at the exact same time.

Once participants engage in joint action (e.g., carrying the bench from A to B), they must coordinate progress between and within its different steps (Bangerter & H. Clark, 2003) in what is called the main body (H. Clark, 1996). Participants coordinate to put together their efforts in an optimal way, which involves correctly anticipating what one’s partners will do and timing one’s own actions (i.e., doing the right thing at the right time; Sebanz & Knoblich, 2009). Progress within the main body typically is accomplished via ad hoc turn-taking (Levinson, 2016). For example, two children engaging in pretend play might coordinate switching roles within the play session. Or teams might switch sides at halftime in a soccer match. Sometimes, participants may agree to suspend turn-taking rules to allow some of them to take the initiative for a while, as when one participant tells a story to others (Mandelbaum, 2012). Signals are also exchanged during the main body to reaffirm joint commitments (i.e., to reassure partners that the intent of the action is the same). In rough-and-tumble play, for example, ongoing signal exchange is instrumental in reducing the risk of escalation into real aggression (Bekoff, 2001). Joint actions are often interrupted by some external event; when this happens, participants collaborate to suspend the action in an orderly way, and to reestablish a sense of joint commitment when reinstating it after having dealt with the interruption. For example, they may ask permission to suspend the interaction, apologize for keeping their partners waiting, check availability when attempting to reengage, or justify the necessity to suspend before reconstructing the topic (Bangerter, Chevalley, & Derouwaux, 2010; Chevalley & Bangerter, 2010).

Finally, to complete a joint action, participants first need to arrive at the mutual conviction that they are both indeed ready to terminate it. In human interaction, participants communicate this readiness through the exchange of signals, such as okay, ensuring that potentially unraised topics can be addressed (Bolden, 2008; Schegloff & Sacks, 1973). Then, they progress through several steps including well-wishing or suggesting continuity of the relationship, reminiscing about the encounter, exchanging leave-taking signals such as good-bye, and finally, taking leave of each other by hanging up a telephone or walking away (Albert & Kessler, 1976; Bangerter, Clark, & Katz, 2004; Broth & Mondada, 2013; H. Clark & French, 1981; Schegloff & Sacks, 1973). These steps collectively comprise the termination, exit, or (hereafter) closing phase of a joint action. We distinguish two steps of preexit (establishing mutual awareness of the readiness of participants to end the encounter) and exit (terminating the encounter) (Schegloff & Sacks, 1973). This step-wise closing process allows participants to maintain interpersonal relationships beyond the encounter. Violations of conventions, as in unanswered good-byes, can pose threats to relationships and thus participants cooperate to avoid such failures (Schegloff & Sacks, 1973).

Sometimes, opening and closing phases are reduced or even absent. This does not mean that shared intentionality is not achieved or that it is attained automatically; rather, such cases reflect the operation of interpersonal conventions or institutionalized procedures. For example, in rule-based games (as opposed to free play), preexisting common ground provides players with shared behavioral routines and rules that preempt many of the coordination problems. Games thus feature reduced entry and exit phases because participants share understanding about the features of the joint actions they are engaged in (H. Clark, 1996). This can happen also in everyday situations in which individuals share the same physical environment for an extended period of time (e.g., copassengers in a car, workers sharing an open plan office, toddlers spending their day together in a kindergarten). These situations create a state of incipient talk (Schegloff & Sacks, 1973), in which it is easier for participants to initiate and terminate focused interactions without necessarily engaging in fully fledged opening and closing procedures. For example, an activity may lapse and be picked up again later, or an extended pause in a conversation may occur without being interpreted as inappropriate (Berger, Viney, & Rae, 2016). While states of incipient talk may reduce the need for overt communication, they do not completely obviate it, and participants often still mark encounters within such environments, however fleetingly (González-Martínez, Bangerter, & Lê Van, 2017).

Is the interactional achievement of shared intentionality unique to humans?

Many species of animals engage in joint actions on a daily basis with members of their group. For example, primates groom each other (Fedurek & Dunbar, 2009), chimpanzees hunt collectively (Boesch, 2002), and many species of animals engage in rough-and-tumble play (Palagi et al., 2015). These activities qualify as cooperative since they require that participants coordinate their behaviors both in time and space. However, human joint action seems unique in the animal kingdom. According to Tomasello, Carpenter, Call, et al. (2005), it is the ability to engage in shared intentionality, or “togetherness,” that constitutes the crucial difference between humans and other species. Indeed, participation in interactions involving shared intentionality has transformed human cognition in fundamental ways and underlies other unique human abilities such as language, cultural learning, and pretense (Tomasello & Moll, 2010).

In a similar vein, Levinson (2006b) suggests that the properties of human joint action are expressions of a uniquely human set of capabilities and motivations for social interaction, the human “interaction engine.” These include special communicative abilities, such as multimodal signal use (Levinson & Holler, 2014) and alternations of speaker–recipient turns in conversation (Levinson, 2016), special cognitive abilities, such as shared intentionality, and other ethological outputs, such as leave-taking rituals (Levinson, 2006a). Taken together, the elements of this engine enable human “cognition-for-interaction” (Levinson, 2006a) in a way that is independent of language, even though language has evolved as one of the primary means by which humans coordinate joint action. By enabling sophisticated forms of joint action, the human interaction engine paves the way for the emergence of cumulative culture and the development of social institutions. Levinson and Holler (2014) further suggested that the human interaction engine emerged as a step in an increasingly stratified system of communicative competencies that mark the evolution of modern human communication, and that it potentially evolved around 2 mya with the early forms of Homo.

Thus, it seems unlikely that the ability to engage in shared intentionality appeared suddenly with the genus Homo, and it seems likely that some components of this ability may be found in other species, at least in our closest ape relatives. Indeed, apes seem to possess some abilities necessary for understanding shared intentionality, like reading others’ attention (Tomasello, Call, & Hare, 1998) and intentions (Call, Hare, Carpenter, & Tomasello, 2004; Call & Tomasello, 1998). They are also capable of communicating multimodally to convey meaning (Genty, Clay, Hobaiter, & Zuberbühler, 2014; Hobaiter, Byrne, & Zuberbühler, 2017) and engaging in gestural turn-taking (Fröhlich, Kuchenbuch, et al., 2016; Rossano, 2013). But they have difficulties participating in activities involving shared attention (Melis & Tomasello, 2013; Tomasello & Carpenter, 2005; Tomasello, Carpenter, Call, et al., 2005). Other abilities related to shared intentionality have been claimed to be missing as well, like the ability to communicate declaratively to share attention and interest about external objects or events (Call & Tomasello, 2007; Plooij, 1984; but see Hobaiter, Leavens, & Byrne, 2014), to engage in triadic joint attention (Tomasello, Carpenter, & Hobson, 2005; Tomonaga et al., 2004; Warneken et al., 2006; but see Pika & Zuberbühler, 2008), to engage in active teaching (but see Boesch, 1991), and to offer unprompted help (Warneken, Hare, Melis, Hanus, & Tomasello, 2007). All of these cases taken together suggest that the major difference between apes and humans seems to be the ability and motivation to share psychological states with others (Call, 2009; Tomasello, Carpenter, Call, et al., 2005), which emerges early in human ontogeny at around 1 year of age, as infants start understanding and then participating in joint actions with others (Carpenter, 2009).

Much of the empirical evidence concerning the lack of shared intentionality in nonhuman animals is based on how they understand shared goals. This is tested, for instance, by establishing whether, when playing games with human partners, apes attempt to reengage reluctant partners after interruptions (Warneken et al., 2006). But reinstating an interrupted joint action is only one aspect of the more global interactive process described above by which individuals enter into, maintain, and dissolve a sense of togetherness. We thus call for a more holistic approach to study whether various animal species exhibit phases of opening, main body and closing when engaged in naturally occurring joint activities with peers. The extent to which these phases are observable and the complexity of their coordination could constitute a yardstick for a systematic comparison of the various components of shared intentionality among different species.

We suggest that social play among peers is an ideal test bed for this approach since it is a complex cooperative process that requires substantial on-the-fly coordination (Bekoff, 2001; Bekoff & Allen, 1998; Palagi, 2006) and improvisation, in comparison to other forms of social activities, and is widely shared (and thus comparable) across social species. Moreover, play is only possible given a shared understanding that the behavior implemented is playful and nonserious. In other words, while in many social activities one participant might force a partner into doing something (e.g., food sharing or sex), engaging in social play requires both participants’ understanding of the irrealis nature of the joint action (E. Clark, 2009) and therefore that they both have a shared intention to play.

So far, however, communicative signals in play have typically been studied in isolation as means to signal particular intentions (e.g., the intention to initiate or to terminate play), but the role of those intentions within the overall cooperative activity of play (i.e., as means to articulate the different phases of the activity to gradually reach a state of shared intentionality) has not been investigated. We thus propose studying how species enter into, maintain, and exit from play as a means of gauging shared intentionality. We outline our framework in the next section.

A comparative framework for the study of shared intentionality in social play

To enable systematic interspecies comparisons, play needs to be analyzed by means of a consistent framework based on human interaction. We now propose such a framework specifically for the study of naturally occurring play bouts. This approach is designed to encompass results from unimodal research (e.g., gestures only: Fröhlich, Wittig, et al., 2016) and research concentrating on specific moments of the bout (e.g., opening: Fröhlich, Wittig, et al., 2016; closing: Luef & Liebal, 2013; main body: Palagi, 2008). The framework takes a multimodal perspective (Levinson & Holler, 2014), including all communicative means, such as gaze, body orientation, behaviors, body postures, facial expressions, vocalizations, and gestures, and analyzes play bouts in their entirety. This framework can produce a more holistic picture of play coordination and a more complete understanding of how shared intentionality is achieved.

As described earlier, human participants’ efforts to coordinate joint action emerge as a sequence of moves that can be divided into macro-level phases of opening, main body and closing. These phases allow constructing a framework for comparing data in social play of animals and human children concerning the presence of phases, the complexity of communicative or behavioral means deployed to articulate those phases, and the presence of markers of shared intentionality: joint attention and joint commitment, mutual responsiveness and coordination of role reversal, communicative turn-taking, reengagement after interruption, mutual support, and leave-taking rituals (Bangerter et al., 2010; Bratman, 1992; Gräfenhain, Behne, Carpenter, & Tomasello, 2009; Levinson, 2016; Schegloff & Sacks, 1973; Tomasello & Moll, 2010).

In Table 1, we provide a description of each phase, as well as subphases, the coordination problems that lead to the emergence of each subphase, and the behaviors and communicative signals typically or potentially deployed to solve those coordination problems. We distinguish two subphases for the opening (preentry and entry), five for the main body (continuation, type change, role reversal, suspension, and reengagement after interruption), and two for the closing (preexit and exit). In the opening and closing, subphases occur in the sequence depicted in Table 1; in the main body, their exact sequence may vary (i.e., they may occur in a different order, get repeated, or not occur at all, depending on the circumstances). The framework embodied in the table is derived from the phase structure of adult human joint action as described above, and provides a yardstick for analyzing the achievement of shared intentionality with which to compare animal and human play.
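For researchers who code play bouts from video, the phase structure just described could be expressed as a small data structure with an ordering check. The following Python sketch is purely illustrative and rests on assumptions that are not part of the framework itself: the label names, `SUBPHASES`, and `is_well_ordered` are hypothetical, a subphase label may be repeated, and opening or closing subphases may be absent (as in attenuated openings and closings).

```python
from enum import Enum


class Phase(Enum):
    OPENING = 1
    MAIN_BODY = 2
    CLOSING = 3


# Subphases per macro-level phase, as named in the text (labels are hypothetical).
SUBPHASES = {
    Phase.OPENING: ["preentry", "entry"],
    Phase.MAIN_BODY: ["continuation", "type_change", "role_reversal",
                      "suspension", "reengagement"],
    Phase.CLOSING: ["preexit", "exit"],
}

# Reverse lookup: subphase label -> macro-level phase.
PHASE_OF = {sub: phase for phase, subs in SUBPHASES.items() for sub in subs}


def is_well_ordered(coded_bout):
    """Check a coded sequence of subphase labels against the phase ordering.

    Assumptions of this sketch: macro-level phases never run backwards;
    opening and closing subphases keep their listed order; main body
    subphases may occur in any order, repeat, or be absent.
    """
    last_phase = Phase.OPENING
    last_rank = -1  # rank of the last fixed-order subphase seen in the current phase
    for label in coded_bout:
        phase = PHASE_OF[label]  # raises KeyError for an unknown label
        if phase.value < last_phase.value:
            return False  # e.g., an opening signal coded after the main body
        if phase is not last_phase:
            last_phase, last_rank = phase, -1
        if phase is not Phase.MAIN_BODY:
            rank = SUBPHASES[phase].index(label)
            if rank < last_rank:
                return False  # e.g., entry coded before preentry
            last_rank = rank
    return True
```

A coded bout such as `["preentry", "entry", "continuation", "role_reversal", "continuation", "preexit", "exit"]` passes this check, whereas `["entry", "preentry"]` does not. Whether a sequence that fails the check reflects a coding error or a genuine species difference is an empirical question that the sketch does not address.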

Table 1 Descriptions of macro-level phases and subphases of play, coordination problems, and observable behavioral outputs deployed to solve these coordination problems as potential markers of shared intentionality

The opening is a phase in which participants establish joint commitment to engage in play (see Table 1). We distinguish two subphases of preentry and entry. Preentry involves selecting participants, orienting toward and approaching them with the aim of attaining a state of joint attention, and making sure they are ready and willing to interact by establishing joint commitment to an as yet unspecified action. Next, in the entry subphase, participants jointly commit to the nature (through species-typical play initiation signals) and location (through potentially deictic signals and behaviors) of the play bout, and time its actual beginning.

The main body involves the actual play bout. It begins when play starts, with the first body contact for contact play (e.g., rough-and-tumble play; Palagi et al., 2015) or the first chase movement for chase play (e.g., Pozis-Francois, Zahavi, & Zahavi, 2004). The main body is itself composed of various subphases, depending on how the play bout unfolds. Play continuation signals serve to coordinate on the willingness to continue the bout and may occur when participants encourage less active partners. Laughter and play faces in primates (Davila Ross, Owren, & Zimmermann, 2010; Demuru, Ferrari, & Palagi, 2015) are examples. Participants may coordinate a play type change (e.g., from contact to chase play). They may reverse roles (e.g., from being the chaser to being chased; see Table 1). If an interruption occurs, for example through a loud noise, participants need to coordinate on the suspension and the possible reengagement of the play bout.

Finally, closings allow participants to exit from their joint commitments, thereby preventing possible hostile escalations and maintaining social bonds beyond the play action. Participants typically articulate intentions to end play before actually doing so (preexit), for instance, through behavioral or communicative efforts that reduce play intensity or tempo. Preexits may be followed by terminations of the interaction (exit). Participants might use leave-taking signals to signal the termination of the interaction. Closing processes may allow participants to maintain interpersonal relationships and to prolong the feeling of togetherness beyond the encounter (Albert & Kessler, 1976). An exception is the abovementioned state of incipient talk, where individuals share the same environment for prolonged time periods (such as animals living in captive conditions or infants spending their day together in a kindergarten), and where opening and closing phases might be attenuated or simply disappear. Compared to other phases, the structural features of closings in animal play may be attenuated (or absent). If this is the case, it would point to the lack of a sense of togetherness as a feature of animal play or children’s play.

In the next section, we review the human infant and animal play literature in search of evidence that social play is organized into macro-level phases of opening, main body, and closing, and for behaviors and communicative signals used to coordinate them. Guided by the idea that the sequential coordination of joint action in macro-level phases is how people incrementally get into and out of the state of shared intentionality, we look for behavioral markers of shared intentionality in the coordination of social play in human children and animals.

Applying the framework: Social play in children and animals

Social play in children

Children express social interest in one another from a very young age, but before the age of 18 months, peer social interactions are rare and poorly coordinated (Brownell & Brown, 1992; Eckerman & Peterman, 2001). Cooperative forms of play among peers, such as imitative games, increase between 20 and 24 months of age alongside the emergence of more sophisticated interactional skills to initiate, maintain, and coordinate activities (Eckerman, Davis, & Didow, 1989). Between 24 and 30 months of age, children reliably cooperate with each other in problem-solving tasks, but younger children do not (Brownell & Carriger, 1990). It is only during the third year of life, as children’s social understanding and language about self and other develop and they begin to care about social norms and rules of games (Rakoczy & Schmidt, 2013; Rakoczy, Warneken, & Tomasello, 2008), that social games become more coordinated and cooperative (Brownell, Ramani, & Zerwas, 2006; Eckerman & Didow, 1996; Verba, 1994). Indeed, by taking into account their partners’ intentions and by monitoring, timing, and sequencing their own and their partner’s actions, children can adjust their behavior appropriately to attain a shared goal (Barresi & Moore, 1996; Brownell et al., 2006; Smiley, 2001). In terms of the cognitive abilities necessary for understanding shared intentionality, from 12 months on children possess the motivation to inform others and to share attention and interest via declarative pointing (Liszkowski, Carpenter, Striano, & Tomasello, 2006), and they also understand and engage in role reversal (Carpenter, Tomasello, & Striano, 2005). Furthermore, unlike chimpanzees, children from 18 months on attempt to reengage reluctant partners after interruption of a shared game (Warneken et al., 2006) and show mutual support by helping others achieve their goal (Warneken & Tomasello, 2006). Taken together, then, children already possess a “we” intentionality (shared intentionality) at least from 14 months of age and act cooperatively, but it is only by 3 years that they become sensitive to joint commitments and begin to understand the obligations and conventions involved in joint action (Gräfenhain et al., 2009; Gräfenhain, Carpenter, & Tomasello, 2013; Kachel, Svetlova, & Tomasello, 2017).

It is unclear how children interactionally achieve shared intentionality via the coordination of phases in naturally occurring joint action with peers. Thus, we now review evidence for how children communicate, verbally or nonverbally, to coordinate social play into macro-level phases of opening, main body, and closing, focusing on two types of play: rough-and-tumble play (hereafter R&T) and pretend play. We focus on these types because they pose particularly intricate coordination problems. R&T requires coordinating physical actions on the fly and carries the risk of escalating into aggression, whereas pretend play requires coordinating the pretense. Indeed, pretend play has been argued to constitute one of the earliest ontogenetic cases of true shared intentionality (Rakoczy, 2008). We highlight potential behavioral markers of shared intentionality in the coordination of play phases, such as establishment of joint attention and joint commitment in the opening phase, coordination of role reversal, communicative turn-taking, mutual support, and reengagement after interruption in the main body phase, and leave-taking rituals in the closing phase.

The general characteristic of R&T is that the behaviors performed seem agonistic but are performed in a nonserious context (Smith, 1997) with friends (Blurton-Jones, 1972; Smith, Smees, & Pellegrini, 2004). Human R&T consists of chasing, fleeing, wrestling, grappling, pinning down, and delivering restrained blows (Blurton-Jones, 1972; Fry, 2005). To coordinate R&T and avoid escalation into real aggression, children need to metacommunicate to ensure mutual awareness of playful intentions (Fry, 2005; Smith & Boulton, 1990). The patterns of R&T and the play signals involved appear to be widely comparable across cultures (Fry, 2005). In the opening phase, children often initiate R&T and ratify participants by hitting, pulling hair, or using verbal insults while at the same time exhibiting playful signals, including smiles, laughter, or giggles (Fry, 2005; Smith & Boulton, 1990). In the main body, ongoing signal exchange is instrumental in reducing the risk of escalation into real aggression (Bekoff, 2001). For example, smiles and giggles increase when physical contact becomes rougher (Fry, 1987). The cooperative aspect of alternation of turns (role reversal) and reciprocity (mutual responsiveness) is also essential to attenuate competition (Pellis & Pellis, 2017). Role reversals may involve the stronger partner giving the advantage to the weaker participant (Fry, 1987). The termination of R&T requires active cooperation by players to deescalate fights, either nonverbally (e.g., by turning their body away from their partner; Fry, 2005) or by using linguistic markers like “mercy” (Sluckin, 1981). Moreover, after R&T and in contrast to real aggression, partners tend to remain in each other’s company (Aldis, 1975; Fry, 1990; Humphreys & Smith, 1984; Smith & Lewis, 1985), suggesting continuation of the relationship beyond the interaction.

Social pretend play, or make-believe play, refers to an activity in which children transform “the Here and Now, You and Me, or the action potential in these features of the situation” (Garvey & Berndt, 1975, p. 4) into some shared imaginative framework (Rakoczy, 2006). Behaviors in pretense are nonliteral or simulative (Fein, 1981; Lillard, 1993). For example, children may engage in object substitution (e.g., pretending a banana is a telephone) or object imagination (e.g., pretending there is a pillow although there is none; Lillard, 1993). The ability to interpret behaviors as “not real” appears in toddlers as early as 18 months of age (Lillard & Witherington, 2004). Mothers play an especially active role in assisting the interpretation of pretend behaviors (Haight & Miller, 1992; Haight, Wang, Fung, Williams, & Mintz, 1999; Lillard, 2007; Miller & Garvey, 1984). The acquisition of symbolic understanding provides an essential foundation for engaging in shared pretense with peers at a later stage in development (Bretherton, 1984; Lillard, 1993). Due to its symbolic nature, pretend play coordination is challenging and requires ample metacommunication between players to regulate shared symbolic frameworks (Bretherton, 1984; Stambak & Sinclair, 1990).

In the opening phase, children often initiate pretend play through imitation of the peer’s actions, by performing an action complementary to the peer’s, by joining the peer’s manipulation of material, or by offering appropriate objects to assist a partner setting up a shared pretense scene (Garvey, 1977; Giffin, 1984; Nelson & Seidman, 1984; Ramani & Brownell, 2013; Schwartzman, 1978; Stockinger Forys & McCune-Nicolich, 1984). Before the development of full speech, imitation of a peer’s nonverbal actions represents an important behavioral strategy to achieve coordination (Eckerman et al., 1989). Later, at the preschool stage, coordinating pretend play involves more complex forms of cooperation, as linguistic and sociocognitive skills advance (Garvey, 1977; Miller & Garvey, 1984). Prior to the start of the game, children can establish joint commitment by determining the type and location of the game and the assignment of roles with the use of explicit verbal social bids, for example, “Let’s play house”; “this is the kitchen”; “I’ll be the patient and you’ll be the doctor, ok?” (Bretherton, 1984; Garvey, 1974; Giffin, 1984). Common ground plays a major role in setting up pretend scenarios, especially when activities reflect an event structure borrowed from real-life cultural activities (e.g., baking a cake). To ensure smooth coordination during the main body of pretend play, children need to cooperate by communicating their own ideas while also accepting those of others (Ramani & Brownell, 2013). Preschoolers engage in a continuous process of negotiating, discussing, improvising, and proposing new features within the game (Ramani & Brownell, 2013). In order to introduce new ideas or to change the rules of the current scene, children produce metacommunication or stage directions (E. Clark, 2009; H. Clark, 2016). For example, children may use the past tense to suggest future actions (e.g., “You said you were going to the ball”; E. Clark, 2009) or negotiate roles (e.g., “I’m the mommy now”; Bretherton, 1984; Giffin, 1984). Cooperation by players to maintain joint pretense is especially manifested in their efforts to avoid interruptions, suggesting that children have a mutual understanding that their actions contribute to the joint action (Schwartzman, 1978). In the closing phase, children terminate play by displaying leave-taking signals such as meaningful looks; gestures; verbal markers (Gräfenhain et al., 2009), such as “Let’s not play this anymore” (Garvey, 1974; Schwartzman, 1978); or statements helping a player to abandon the play identity, such as “I’m not the dragon anymore” (Garvey & Berndt, 1975; Schwartzman, 1978).

According to Tomasello and Moll (2010), participation in joint action involving shared intentionality underlies several uniquely human abilities, including pretense. Pretend play is indeed a complex form of play that is specific to humans (Gómez & Martin-Andrade, 2005; Vygotsky, 1967), although it may not be common to all cultures (Gaskins, 2013). Although children seem to be able to engage in joint action from 12 months on (Carpenter, 2009) and to understand shared intentionality from 14 months (Tomasello & Moll, 2010), it is only around 18 months that they start understanding pretense (e.g., Lillard & Witherington, 2004). And it is even later, around the third year of life, as they start understanding social conventions and developing more sophisticated linguistic competences, that they engage in more coordinated social pretend games (Brownell et al., 2006; Eckerman & Didow, 1996; Verba, 1994). Because the coordination of pretend play relies mostly on verbalization, it is easier to identify markers of shared intentionality in this form of play, namely joint commitment (e.g., “Let’s play doctor together”), role reversal (e.g., “Now it’s your turn to be doctor”), reengagement after interruption (e.g., “Come back, we’re not finished!”), mutual support (e.g., “This is the right box”), and leave-taking signals (e.g., “I don’t want to play anymore”).

R&T play, on the other hand, is widely comparable across human cultures (Fry, 2005) and also common to many social animal species (Palagi et al., 2015; Pellis & Pellis, 2017). Although R&T relies less on verbalization, its coordination still requires metacommunication to avoid escalation into real aggression. We thus suggest that the comparative study of how different species solve the various coordination problems inherent in R&T could provide a promising tool to shed light on the evolution of the uniquely human motivation to share psychological states with others and its special “cognition-for-interaction.” In the next section, we review evidence on how different species of mammals and birds achieve coordination in R&T, and what behaviors and communicative signals are used to articulate the structure of play into opening, main body, and closing phases, looking for markers of shared intentionality.

Social play in nonhuman animals

In animals, R&T involves behaviors that resemble fighting (e.g., wrestling, tumbling, chasing) but lack key characteristics of agonistic behaviors: threats are rare or absent, muscles are relaxed, biting is inhibited, and nonserious intent can be communicated via play faces and play vocalizations (Palagi et al., 2015; Palagi, Antonacci, & Cordoni, 2007; Pellis, 1984; Smith, 1997). For R&T to be distinguishable from competition and to remain enjoyable, it requires a certain degree of reciprocity (Pellis & Pellis, 2017). Reciprocity is achieved by partners through cooperation, for example, by giving the advantage to a currently overpowered partner (e.g., Pellis & Pellis, 2017). We review the evidence starting with species for which communicative signals and behaviors have been documented in the coordination of each phase of play, including openings, main bodies, and closings. These species include gorillas, black bears, red-necked wallabies, red kangaroos, and Arabian babblers.

In the opening phase, gorillas (Gorilla gorilla) select play partners with ample, silent gestures or audible gestures to attract their attention (e.g., arm shake; Tanner, 2004). Once shared attention is established, the one-handed grab is commonly used to initiate contact play, while drum object is used to initiate chase-play (Genty, Breuer, Hobaiter, & Byrne, 2009). During the main body phase of play, play faces and laughter serve to maintain play (Palagi et al., 2007). Play faces are more frequent and more intense in rough play than in gentle play, and bouts that include full play faces (upper teeth exposed) are longer than those with play faces (upper teeth covered) (Waller & Cherry, 2012). To reengage partners following interruptions, individuals animate objects and show them to partners to reestablish mutual attention toward the game (Tanner & Byrne, 2010). In the closing phase, to exit from play, gorillas use hand-on and pirouette gestures (Genty et al., 2009; Luef & Liebal, 2013).

Black bears (Ursus americanus) select play partners via approaches in which they communicate intent through subtle positioning of their ears, that is, crescent ears (ears visible but turned laterally from the head; Henry & Herrero, 1974). Once joint attention is established, play is initiated with signals such as pawing, biting, rearing (i.e., holding the forepaws off the ground in a sitting or standing position), and head butting. In the main body, play is maintained via signals such as the relaxed open-mouth face and breathing/panting sounds. When physical contact becomes intense, one of the participants faces its partner and moans. If moans are followed by further play, moaners often flatten their ears; if this is ignored, players risk being attacked by partners (Henry & Herrero, 1974; Pruitt, 1976), suggesting that flattening of ears signals an intention to terminate the interaction. In the closing phase, before leaving their partner by walking or running away, bears often look away, lick their partner, extend their neck and head, or shrug away (Henry & Herrero, 1974).

Red-necked wallabies (Macropus rufogriseus banksianus) select prospective play partners by approaching them, high-stance posturing, and orienting toward them (Watson & Croft, 1993). Once joint attention is established, they initiate play by sniffing, skipping, and grabbing the partner (Watson & Croft, 1993). Wallabies coordinate role reversal by self-handicapping their defense and giving the subordinate partner a chance to take advantage through standing flat-footed (Watson & Croft, 1996). In the closing phase, the termination of play is coordinated by one of the participants removing itself from the bout (e.g., by orienting away or moving away from its partner). After termination of R&T, partners often remain close or even face each other, suggesting potential awareness of mutual participation (Watson & Croft, 1996).

Red kangaroos (Macropus rufus) exhibit behaviors similar to those of wallabies to select participants: approaching, high-stance posturing, and upright body position (Croft & Snaith, 1990). They initiate play with pawing, head arching, and kicking (Croft & Snaith, 1990; Watson, 1998). To maintain reciprocity and coordinate role reversal during the main body phase, both partners self-handicap their movements (e.g., lowering their kicking rates) to increase the opponent’s chances (Croft & Snaith, 1990). Kangaroos terminate play by pushing away or pushing down their partners, a signal that reliably leads to terminations of fights between participants (Croft & Snaith, 1990).

Arabian babblers (Turdoides squamiceps) select partners via gaze alternation and initiate play with pendulums (one participant pushing the other from a branch and grab-holding its foot), crouching, holding up a twig on the ground or bowing signals (Pozis-Francois et al., 2004). Both crouching and bowing are also used to reengage partners after interruptions. If play becomes too rough, abrupt terminations are often signaled via vocal signals (Pozis-Francois et al., 2004). Play is usually terminated when participants stop moving, and exit is often followed by affiliative behaviors (allopreening) between players (Pozis-Francois et al., 2004).

Evidence for coordination of only some of the phases of play is also available for chimpanzees, bonobos, dogs, coyotes, wolves, dolphins, lemurs, rats, Visayan warty pigs, kakas, and keas.

In the opening phase, chimpanzees (Pan troglodytes) select prospective participants with audible gestures to attract attention, such as drum object (Hobaiter & Byrne, 2014). Once joint attention is established, they initiate play with gestures such as arm shake, dangle, gallop, head nod, head stand, object in mouth approach, poke, roll over, and stomp other with two feet (Hobaiter & Byrne, 2011, 2014). When soliciting play with same-age or younger individuals, participants cooperate by using self-handicapping gestures (Fröhlich, Wittig, et al., 2016). During the main body, signals such as play faces, laughter (Davila Ross et al., 2010; Preuschoft, 1992), and feet shaking (Hobaiter & Byrne, 2014) serve to maintain play and distinguish it from potentially serious actions. Galloping is occasionally used to decrease the intensity from chase-play to contact-play and, conversely, hand shaking to increase intensity from contact-play to chase-play (Hobaiter & Byrne, 2014). If interruptions occur during the bout, participants reengage their partners by gesturing with feet shake, object in mouth approach, head stand, or roll over (Hobaiter & Byrne, 2014).

Like chimpanzees, bonobos (Pan paniscus) maintain play bouts with play faces and laughter (Enomoto, 1990; Palagi, 2008). Play faces are more frequent when participants match in age and size and if play includes more physical contact (Palagi, 2008; Palagi & Paoli, 2007). Experimental evidence shows that bonobos reengage reluctant partners in a social triadic game if conditions are ecologically relevant (e.g., include species-specific gestures and naturalistic play objects; Pika & Zuberbühler, 2008). Reengagement signals used to reinstate play in bonobos are begging or grabbing gestures, often combined with facial expressions, such as protruded lip displays (Pika & Zuberbühler, 2008).

To establish joint attention with prospective play partners, dogs (Canis familiaris) use attention getters such as barking (Bekoff, 1974; Horowitz, 2009). To initiate play they use signals such as face-pawing, bowing (Bekoff, 1977; also used by coyotes, Canis latrans and wolves, Canis lupus; Bekoff, 1995), alternating approach-withdrawals (Bekoff, 1974), leap on, bow head, and play slap (Horowitz, 2009). In the main body, participants coordinate on changes of the play type with the use of body biting, side-to-side head shaking, and roll over to switch from chase to wrestle-play (Bekoff, 1974, p. 332). To regulate role reversals and promote reciprocity during the bout, dogs self-handicap via inhibitory bites (Bauer & Smuts, 2007). To reengage a reluctant partner dogs use biting, pawing, barking, nosing, bumping, exaggerated retreating, and presenting (Horowitz, 2009).

Kakas and keas (Nestor meridionalis and Nestor notabilis) select prospective play partners with bouncy hopping toward them and initiate play with signals such as head cock and roll over displays (Diamond & Bond, 2003). Keas additionally initiate play with signals such as stiff-leg walk, directed gaze, or vertical object tossing (sometimes in the direction of the partner). During play, both keas and kakas coordinate role reversals and maintain reciprocity through self-handicapping signals, such as rolling over and foot pushing (Diamond & Bond, 2004).

In ring-tailed lemurs (Lemur catta), the relaxed open mouth display serves to regulate and maintain social play (Palagi, Norscia, & Spada, 2014).

In bottlenose dolphins (Tursiops truncatus), signature whistles are characteristic sounds given during R&T with the possible function of giving feedback, promoting a playful mood and distinguishing play from aggression (Blomqvist, Mello, & Amundin, 2005).

In rats (Rattus norvegicus), role reversal is coordinated in the main body through self-handicapping behaviors, such as standing on the partner with four paws once the attacker has overpowered the subordinate player (Foroud & Pellis, 2003; Pellis & Pellis, 2017).

In Visayan warty pigs (Sus cebifrons), role reversal and reciprocity in the main body phase are regulated through the production of submissive signals, such as crouching (Pellis & Pellis, 2016, 2017). Crouching and swerving laterally by 90° or more also often lead to the termination of play (Pellis & Pellis, 2016).

Taken together, across species, this review shows that R&T seems to be organized in macro-level phases of opening, main body, and closing. Coordinating these phases relies on species-specific behaviors and communication. However, systematic comparisons are not yet possible, for several reasons. First, much research on animal play signals has focused on communicative signals per se (Palagi, 2006; Palagi et al., 2015; Pellis & Pellis, 1996) and not on their use as a means of coordinating joint action. Second, the literature often analyzes only single phases of the bout and not the entire sequence as a potential achievement of shared intentionality. Many studies have analyzed the signals used to communicate the intention to initiate (e.g., Fröhlich, Wittig, et al., 2016) or maintain play (e.g., Palagi, 2006), but there is less evidence for the existence of a closing phase. This might indicate the absence of a “we” intentionality or togetherness feeling that would motivate individuals to maintain relationships beyond the encounter. However, there is suggestive evidence for markers of shared intentionality, such as establishment of joint attention, reengagement after interruption, role reversal, and potential leave-taking signals. This evidence is summarized in Table 2, which also includes evidence from children’s R&T. We find this promising and call for more comparative research on play as joint action according to our framework to shed light on the building blocks that gradually led to the evolution of the uniquely human motivation to interact cooperatively and share psychological states with others.

Table 2. Summary of the evidence on communicative signals and behavioral means used to coordinate phases in R&T play in human children and animals. Only subphases and species for which evidence is available are shown.

Conclusion: Implications for studying play as joint action across species

The study of joint action in humans has led to a rich understanding of the interplay between cognition and communication in the coordination of interdependencies between individuals cooperating to achieve a shared goal. Here, we reviewed interdisciplinary research on joint action, which has revealed the importance of shared intentionality as a key feature of joint action in humans. We also described the interactive process by which shared intentionality is achieved, distinguishing between opening, main body, and closing phases. Social play, especially R&T play, represents an ideal test bed for a systematic comparative analysis of the interactional achievement of shared intentionality because it requires more on-the-fly coordination and improvisation than other social activities and because it is widely shared across species. Applying a joint action framework to the comparative study of social play could offer some insight into the evolutionary significance of social play and shed light on the evolution of the uniquely human motivation to interact (cognition-for-interaction; Levinson, 2006a) and share psychological states with others (shared intentionality; Tomasello, Carpenter, Call, et al., 2005).

Our framework allows testing the relationship between species’ abilities to solve the different coordination problems in play (see Table 1) and their overall cooperativeness. It suggests a principled approach to explore the existence of potential components of shared intentionality and how it is achieved in the interactions of nonhuman animals. This could expand the range of situations where evidence of shared intentionality has been looked for. For example, an influential test is based on experimental evidence obtained from chimpanzees and children playing games with experimenters. When cooperative games were interrupted, children tried to reengage experimenters, but chimpanzees did not (Warneken et al., 2006). This was taken as evidence that chimpanzees do not have a sense of being jointly committed to the same activity and sharing the goals (Warneken et al., 2006). While this study uses an interruption in a joint action to draw conclusions about shared intentionality, our framework theorizes shared intentionality as an interactional achievement, suggesting a range of occasions potentially related to the establishment, maintenance, change/negotiation, interruption, reengagement, and dissolution of joint actions that may constitute situations for exploring shared intentionality. Moreover, while the Warneken et al. (2006) study is experimental and features interactions with a human caretaker in the context of an artificial game, our framework suggests the potential for investigating the achievement of shared intentionality in naturally occurring joint actions between conspecifics. Play is a prime example of such an activity because it is intrinsically cooperative. Finally, by enabling the systematic study of different species, our framework opens up the possibility to discover more nuanced aspects of shared intentionality. For example, using more naturalistic triadic play interactions in bonobos, Pika and Zuberbühler (2008) found that subjects were very active in their attempts to reengage a human partner, something that is regularly reported from dogs interacting with humans in such ways.

Of course, the fact that species perform coordinative behaviors superficially similar to those humans perform in opening and closing phases does not necessarily constitute evidence of shared intentionality. For example, animal species that produce leave-taking signals may not have the same understanding or interpretation of what they are doing that humans would have. In other words, similarities in behavior do not necessarily reflect similar shared understandings of the situation (in this regard, experimental studies that test the flexibility of the behaviors remain crucial).

Taken together, our analysis of play as joint action reveals insights into species’ capacities to coconstruct a state of shared intentionality through the orderly process of play coordination. Such insight makes it possible to reconstruct the building blocks that may have led to the fully fledged cognition-for-interaction (including shared intentionality) underpinning human joint action. Since many of the key attributes taken as evidence for shared intentionality in humans (e.g., joint commitment, mutual responsiveness, and role reversal; Bratman, 1992; Tomasello & Moll, 2010) also characterize R&T play in many other species, we conclude that the extensive practice of social play may have contributed to the evolution of cognition-for-interaction in humans (Levinson, 2006a).