Word and deed: A computational model of instruction following

doi:10.1016/j.brainres.2011.12.025

Brain Research

Volume 1439, 23 February 2012, Pages 54-65

https://doi.org/10.1016/j.brainres.2011.12.025

Abstract

Instructions are an inextricable, yet poorly understood aspect of modern human life. In this paper we propose that instruction implementation and following can be understood as fast Hebbian learning in prefrontal cortex, which trains slower pathways (e.g., cortical–basal ganglia pathways). We present a computational model of instruction following that is used to simulate key behavioral and neuroimaging data on instruction following. We discuss the relationship between our model and other models of instruction following, the predictions derived from it, and directions for future investigation.

Highlights

► We propose a model of how people rapidly encode and implement instructions.
► The model is based on a computational tradeoffs framework and has Hebbian learning at its core.
► It is shown to be consistent with both behavioral and neuroimaging data on instruction following.

Introduction

There are many types of learning. Human beings can learn through trial-and-error by interacting with the environment (Thorndike, 1911). This, however, is costly, time-consuming, and dangerous. Consider for example learning the consequences of touching fire or eating a strange berry, by trying it out. Learning by observation can circumvent the costs associated with trial-and-error (Bandura, 1977). With the advent of language, yet a third option became available. Verbal information sharing promotes group cohesion while facilitating learning at reduced temporal cost. Understandably, learning from instructions (often verbal) became an integral part of the learning repertoire of the human brain.

From to-do lists to software manuals, instructions influence the lives of modern humans on multiple levels. Yet, how they are understood and implemented by the brain remains a mystery (Monsell, 1996). On the empirical side, a few recent studies have explored the effects of instructions on performance. A few themes emerge when the literature is surveyed. First, instructions can be rapidly and accurately implemented without explicit training (Ruge and Wolfensteller, 2010). Second, even verbally instructed mappings that have never been applied can interfere with well-applied mappings if the two sets of mappings share stimulus dimensions (Cohen-Kdoshay and Meiran, 2009, De Houwer et al., 2005, Waszak et al., 2008). A few recent studies have also explored the neural correlates of instruction following (Brass et al., 2009, Cole et al., 2010, Hartstra et al., 2011, Ruge and Wolfensteller, 2010). Third, dissociations may arise between instructions and their implementation: in particular, patients exhibiting goal neglect can verbally report the required instructions without being able to implement them (Duncan et al., 1996, Luria, 1966).

On the theoretical front, instructions are crucial yet unexplained components in theories of cognition. For example, models of cognitive control typically incorporate task representations which implement the demands imposed by the task at hand (Botvinick et al., 2001, Cohen et al., 1990, Verguts and Notebaert, 2008). However, investigations of instruction following itself are less common. One exception is the computational modeling work by Noelle and Cottrell (1995). More recently, Helie and Ashby (2009) and Helie et al. (2010) proposed a computational account of learning that begins with explicit rules and ends with procedural knowledge. However, they do not explore the acquisition of the rule itself. Doll et al. (2009) recently proposed a model of instructional control of reinforcement learning. In their model, prefrontal cortex (PFC) projects directly to motor cortex and the basal ganglia (BG) to select responses consistent with the instructions.

Here, we follow up on the Doll et al. (2009) approach and consider instruction learning and implementation as instantiations of Hebbian learning. For this purpose, we combine two complementary models of learning and automatisation, namely SPEED and COVIS. In SPEED (Ashby et al., 2007), learning occurs initially in the basal ganglia and is eventually transferred to cortex with an attendant increase in automaticity. In the COVIS framework (Ashby et al., 1998, Ashby et al., 2011), performance is governed by two systems — a rule-based one (dependent on prefrontal cortex) and a procedural one (dependent on BG). We propose that a more general model that combines the main features of both SPEED and COVIS is suited to explain various forms of learning, including instruction following. In this framework, when instructions are provided, the prefrontal cortex learns them quickly, but executes them slowly (Boureau and Dayan, 2010, Daw et al., 2005). Indeed, novel learning typically activates prefrontal cortex (e.g., Miller and Cohen, 2001, Toni et al., 2001). Upon repeated application, the BG (which learn more slowly but execute more quickly) pick up the appropriate stimulus–response mapping by Hebbian learning, where the appropriate response is provided by the prefrontal route. Finally, after extensive application another cortical pathway would take over (hyperdirect pathway; Ashby et al., 2007).

With this general framework, we describe and test a model that focuses on the acquisition and transfer of instructed mappings. In the next section we describe the model and discuss its biological plausibility. This is followed by the simulation studies. Theoretical considerations and empirical predictions are elaborated in the General discussion.

The first route in the model is indirect (left part of Fig. 1); here, new instructions (such as “if you see a hexagon, press the left arrow key” and “if you see a square, press the right arrow key”) can be rapidly learnt. The second one is the direct route (right part of Fig. 1); it gradually picks up the regularities implemented by the indirect route.

In the indirect route, the instruction is represented in terms of its components. One component contains stimulus representations (e.g., “hexagon”) and the other, response representations (e.g., “press right key”). These two components are typically (but not necessarily) verbal, encoded by two distinct layers (see Fig. 1) and related to sensory and motor areas via long-term memory (temporal lobe (TL) and premotor cortex, respectively; see Fig. 1). By premotor cortex we refer to the human analog of the dorsal premotor cortex (PMd) in monkeys, which is a region associated with abstract motor planning (Nakayama et al., 2008).

In the case of verbal instructions, stimuli and responses are connected to their verbal equivalents (e.g., “hexagon”, “left key”). It is reasonable to assume that a tight association between an object or attribute and its verbal analog comes to be encoded during development (Fischer and Zwaan, 2008). Also, action verbs evoke activation of motor representations (Fischer and Zwaan, 2008, Hauk et al., 2008). The two components of the indirect route are linked together by PFC; in particular, the PFC subregion Inferior Frontal Junction (IFJ) appears to be a candidate for this role given that it is active in circumstances that require task-set switching or the loading of novel task sets (Derrfuss et al., 2005, Derrfuss et al., 2009), both of which require flexible verbal mapping. In the model, associating stimulus and response representations is achieved by fast Hebbian learning during the instruction phase. Neurally, the associative striatum is probably also part of the indirect path (Ashby et al., 2010), but we do not include it here for simplicity.

The direct route includes, in addition to the stimulus and response areas, the basal ganglia. The circuitry of the cortico-striato-pallido-thalamo-cortical pathway (Mink, 1996) is approximated by a one-layer excitatory path. In particular, fronto-striatal loops are not included, and the direct path can be considered a simplified version of subcortical control of action selection (e.g., Ashby et al., 2010, Dominey, 2005, Frank, 2005, Frank, 2006). The direct route gradually acquires stimulus–response associations by Hebbian learning, where the correct stimulus–response pairs are provided by the indirect route. The hyperdirect cortical pathway mentioned in the introduction (e.g., Ashby et al., 2007) is not currently implemented.

It may be argued that characterizing the indirect route as having more intermediate steps than the direct one is contrary to the actual neural organization, where the BG route has more intermediate synapses (e.g., Mink, 1996). However, the layout is consistent with the generally acknowledged finding that the PFC route is a slow processing route (e.g., Miller and Cohen, 2001). More generally, the number of synapses between two processing layers may be an imperfect measure of processing speed. Nevertheless, we report simulation studies that explore the influence of varying the respective path lengths in the two routes.

Information flows along the directions indicated by arrows in Fig. 1. In each trial, activation in the stimulus layer is clamped, i.e., set at a specific value as opposed to allowing the layer to reach the activation level over time. Stimuli are represented by localist coding in a vector with the element corresponding to the stimulus set to 1 and all other elements set to zero. Activation of other model units is described by standard difference equations of the form (activation of input and output units denoted $x$ and $y$, respectively):

$$y_{j}(t) = \tau \, y_{j}(t - 1) + (1 - \tau) \sum_{i} x_{i} w_{ij}$$

where $y_{j}(t)$ is the activation of the $j$th output unit at time $t$ in the trial, $x_{i} w_{ij}$ is the net input from unit $i$ to unit $j$, and $\tau$ is a cascade rate parameter (set at 0.9). A response is chosen if one of the response units reaches a threshold of 1. Reaction times (RTs) are calculated by counting the number of activation cycles (within a trial) needed to reach this threshold.
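The cascade equation and the RT read-out can be illustrated with a minimal sketch. It collapses the network to a single stimulus-to-response projection, starts all response activations at zero, and uses made-up weights; these three choices are assumptions for illustration and are not taken from the model description. The function names `cascade_step` and `run_trial` are likewise hypothetical.

```python
def cascade_step(y_prev, x, w, tau=0.9):
    """One update of y_j(t) = tau * y_j(t-1) + (1 - tau) * sum_i x_i * w_ij."""
    return [
        tau * y_prev[j] + (1.0 - tau) * sum(x[i] * w[i][j] for i in range(len(x)))
        for j in range(len(y_prev))
    ]


def run_trial(x, w, tau=0.9, threshold=1.0, max_cycles=1000):
    """Return (response index, RT in cycles), or (None, max_cycles) on timeout."""
    y = [0.0] * len(w[0])  # assumption: activations start at zero in each trial
    for cycle in range(1, max_cycles + 1):
        y = cascade_step(y, x, w, tau)
        best = max(range(len(y)), key=lambda j: y[j])
        if y[best] >= threshold:
            return best, cycle
    return None, max_cycles


# Localist stimulus vector: stimulus 0 of 4 is presented (clamped).
stimulus = [1.0, 0.0, 0.0, 0.0]
# Hypothetical weights: stimuli 0 and 2 favor response 0, stimuli 1 and 3 favor response 1.
weights = [[2.0, 0.5], [0.5, 2.0], [2.0, 0.5], [0.5, 2.0]]
response, rt = run_trial(stimulus, weights)
print(response, rt)
```

Under these assumptions the activation of a unit with constant net input approaches that net input geometrically, so larger weights cross the threshold in fewer cycles, which is how learning translates into shorter RTs in this kind of read-out.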

We now describe how learning occurs in the model. Initially, all weights are random values sampled from a uniform distribution between 0 and 0.01. After response, a competitive process selects the most active unit in the PFC and the most active unit in BG. Typically, the winner's activation in a competitive model is a function of its original activation before the competition (e.g., Grossberg, 1973). As a simplified implementation of this process, we scaled the winner's activation (stimulus $j$) after competition by $\frac{1}{2} (1 - e^{- N_{rep}(j)})$, with $N_{rep}(j)$ the repetition number of stimulus $j$ in the trial at hand. In the instruction phase, learning occurs only in the connections between stimulus representations, PFC and response representations. In the test phase, learning occurs only in the BG. In both layers, learning follows the rule given below from trial $n - 1$ to $n$:

$$w_{ij}(n) = w_{ij}(n - 1) + \lambda \, (x_{i} - d \, w_{ij}(n - 1)) \, y_{j}$$

where $d$ is a weight decay parameter, set to 0.1. The term $d \, w_{ij}(n - 1)$ is subtracted from the input $x_{i}$ to constrain the learning process in the direction of the input (as is typical in a Hebbian/competitive learning algorithm, e.g., Fritzke, 1997).
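A single application of this learning rule can be sketched as follows. The sketch assumes that units losing the competition have zero activation (so their weights are untouched) and uses a learning rate of 9, the value reported for the simulations; the layer sizes, the starting weights, and the function names `winner_scale` and `hebbian_update` are hypothetical.

```python
import math


def winner_scale(n_rep):
    """Scaling of the winner's activation: 0.5 * (1 - exp(-N_rep))."""
    return 0.5 * (1.0 - math.exp(-n_rep))


def hebbian_update(w, x, y, lam=9.0, d=0.1):
    """Return new weights: w_ij + lam * (x_i - d * w_ij) * y_j."""
    return [
        [w[i][j] + lam * (x[i] - d * w[i][j]) * y[j] for j in range(len(y))]
        for i in range(len(x))
    ]


# Two input units and three output units; small starting weights.
w = [[0.005, 0.005, 0.005], [0.005, 0.005, 0.005]]
x = [1.0, 0.0]                    # localist input: unit 0 active
y = [0.0, winner_scale(1), 0.0]   # assumption: only the winner (unit 1) is active
w_new = hebbian_update(w, x, y)
print(w_new)
```

In this example only the column of the winning unit changes: the weight from the active input grows substantially, while the weight from the inactive input decays slightly toward zero.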

The fact that only the PFC learned in the instruction phase, and the BG only in the test phase, implemented our assumption of fast learning in PFC. In their respective phases (instruction and test, for PFC and BG respectively), the learning rate was λ = 9 for each layer. In the simulations, we also explore the case when PFC learns in both instruction and test phases.
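As a rough check on these parameter values (a sketch, assuming that the post-synaptic activation $y_{j}$ entering the learning rule stays fixed from one update to the next, which is a simplification), the learning rule $w_{ij}(n) = w_{ij}(n - 1) + \lambda (x_{i} - d \, w_{ij}(n - 1)) y_{j}$ can be rewritten as a relaxation toward a fixed point:

$$w_{ij}(n) - \frac{x_{i}}{d} = (1 - \lambda \, d \, y_{j}) \left( w_{ij}(n - 1) - \frac{x_{i}}{d} \right)$$

With $d = 0.1$, weights from an active input ($x_{i} = 1$) move toward $1/d = 10$, and weights from an inactive input ($x_{i} = 0$) decay toward 0. With $\lambda = 9$, the product $\lambda d$ equals 0.9; because the winner's scaled activation $\frac{1}{2}(1 - e^{-N_{rep}(j)})$ is below 0.5, the factor $1 - 0.9 \, y_{j}$ lies between 0.55 and 1, so each update shrinks the distance to the fixed point without overshooting.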

The model provides a unified mechanism for rapidly implementing instructions as is required in a variety of contexts (e.g., cognitive control; Botvinick et al., 2001, Cohen et al., 1990). To provide points of contact with empirical data, the results of two recent studies dealing directly with instructions (Ruge and Wolfensteller, 2010, Waszak et al., 2008) were simulated. The first study (Ruge and Wolfensteller, 2010) describes the transition from instructed to implemented mappings at both behavioral and neural levels. To test different hypotheses regarding the architectural and parametric features of the model, different versions of the model were built and tested using the Ruge and Wolfensteller paradigm as detailed in the methods section (4.1.1, 4.1.2 and 4.1.3). The second study (Waszak et al., 2008) describes a paradigm for investigating the influence of merely instructed mappings on performance. This study describes the basic interference and congruence effects reported in other studies (Cohen-Kdoshay and Meiran, 2009, De Houwer et al., 2005). Finally, a modified version of the Ruge and Wolfensteller instruction paradigm was used as an initial computational exploration of goal neglect (Duncan et al., 1996).

Instructions can be implemented with a high degree of accuracy on the very first trial (e.g., Cohen-Kdoshay and Meiran, 2009, Cole et al., 2010). With increasing practice, the mapping loses novelty and becomes automatic as reflected, for example, in RT. Ruge and Wolfensteller (2010) studied the transition from instructed to implemented stimulus–response mappings using fMRI. They used a simple stimulus–response mapping task to identify the neural correlates of mappings that had been instructed and subsequently applied. Each stimulus was mapped onto one of two possible responses (press left or right key). Four stimuli (two for each key) and their mappings were first instructed (instruction phase). This was followed by 32 practice trials in which each stimulus appeared 8 times in a randomized sequence (test phase). This procedure was repeated over 20 blocks with new stimuli in each block to obtain accurate fMRI data. Within each block, responses became faster with repetition. Error rates also decreased with repetition. At the neural level, activation levels across repetitions decreased in the left IFJ and increased in the BG (in particular, the caudate nucleus). The reported changes in other areas (e.g., decrease in left posterior intraparietal sulcus) are beyond the scope of the current study and hence not discussed.
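The test-phase structure described above (four stimuli, eight presentations each, randomized order) can be sketched as a trial-list generator. The particular stimulus-to-key assignment, the seed, and the function name `make_block` are hypothetical; the original study's randomization constraints, if any, are not known here.

```python
import random


def make_block(n_stimuli=4, n_repetitions=8, seed=None):
    """Return a shuffled list of (stimulus index, repetition number) pairs."""
    rng = random.Random(seed)
    order = [s for s in range(n_stimuli) for _ in range(n_repetitions)]
    rng.shuffle(order)
    seen = [0] * n_stimuli
    trials = []
    for s in order:
        seen[s] += 1  # repetition number of this stimulus within the block
        trials.append((s, seen[s]))
    return trials


# Hypothetical mapping: two stimuli per response key.
mapping = {0: "left", 1: "left", 2: "right", 3: "right"}
block = make_block(seed=0)
print(len(block), block[:4])
```

Tracking the repetition number per stimulus is what allows RTs, error rates, and layer activations to be analyzed as a function of stimulus repetition.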

Waszak et al. (2008) investigated the effect of merely instructed and applied visuomotor mappings. They used stimuli varying on two dimensions (color and shape) and subjects were presented with color-task or shape-task trials intermixed. The tasks involved applying arbitrary mappings from colors and shapes to left and right responses (for instance, “if circle, then press left arrow key” or “if brown object, then press right arrow key”; see Fig. 6). A third of the stimulus–response associations in each task were merely instructed and the other two-thirds were applied (i.e., trained). The irrelevant stimulus dimension allowed the stimulus in any given trial to be categorized as univalent, bivalent, or instructed. In the case of univalent stimuli, only the relevant stimulus dimension had a valid response mapping; for example, the stimulus would be a shape in a particular color with only the shape being associated with a response. For bivalent stimuli, both relevant and irrelevant dimensions had valid response mappings. The instructed stimuli were similar to the bivalent ones, except that the irrelevant stimulus dimension and its mapping were merely instructed and had never been applied.

The experiment consisted of an instruction phase, followed by five practice blocks of 96 trials each. The fourth and fifth practice blocks were preceded by two test blocks (each lasting for 36 trials) in which the instructed stimuli were presented as valid targets. This rendered these stimuli effectively bivalent for the final two practice blocks.

Waszak et al. (2008) hypothesized that the presentation of a stimulus with two dimensions having valid response mappings (with one of the two mappings being valid for a second task) would lead to an interference effect. The interference effect refers to the delay in responding to bivalent or instructed stimuli (congruent or incongruent) relative to univalent stimuli. In addition, the RT difference between incongruent and congruent stimuli was computed for the bivalent and instructed types (congruency effect).
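The two effects can be written as simple RT contrasts. In the sketch below, averaging congruent and incongruent RTs before subtracting the univalent RT is an assumption about how the interference effect is aggregated, and the RT values are invented for illustration only; they are not data from the study.

```python
def interference_effect(rt_congruent, rt_incongruent, rt_univalent):
    """Mean RT over congruent and incongruent stimuli minus univalent RT."""
    return (rt_congruent + rt_incongruent) / 2.0 - rt_univalent


def congruency_effect(rt_congruent, rt_incongruent):
    """Incongruent RT minus congruent RT."""
    return rt_incongruent - rt_congruent


# Hypothetical mean RTs in ms (illustrative values, not empirical results).
rt = {"univalent": 600.0, "congruent": 610.0, "incongruent": 650.0}
interference = interference_effect(rt["congruent"], rt["incongruent"], rt["univalent"])
congruency = congruency_effect(rt["congruent"], rt["incongruent"])
print(interference, congruency)
```

Both contrasts would be computed separately for bivalent and instructed stimuli, and separately for each practice block.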

Waszak et al. (2008) found an interference effect for both bivalent and instructed stimuli, but it was larger for bivalent stimuli. Moreover, the interference effect was larger for the practice blocks after the test blocks than before the test blocks (see Fig. 7a). They also found a congruency effect for bivalent stimuli across all practice blocks. In contrast, the instructed stimuli did not show a congruency effect in the first three practice blocks but an effect was observed in the final two practice blocks (see Fig. 7c).

Goal neglect refers to being able to describe an instruction while not being able to implement it. It was first reported in frontal lobe patients (Luria, 1966); Duncan et al. (1996) later demonstrated that goal neglect can also be observed in normal subjects if instructions are sufficiently complex. In a recent study, Duncan et al. (2008) demonstrated that it is specifically the total number of elements to be remembered in the task which determines the extent of goal neglect and its correlation with general intelligence. We hypothesize that the modeling framework presented here may yield insights into goal neglect. Currently, we apply an adapted version of the Ruge and Wolfensteller (2010) design as a first step toward modeling goal neglect.

Section snippets

Simulation study 1.1

The major behavioral findings of Ruge and Wolfensteller (2010) were replicated. First, mean error percentage was zero (in the empirical study, it was very low and decreased across repetitions). Next, RTs decreased as a function of stimulus repetition (empirical and simulated data in Figs. 2a and b, respectively).

In the empirical data, activation decreased in PFC across stimulus repetition (Fig. 2c). The same is observed in the model (Fig. 2d). Also the activation increase across repetitions in

General discussion

We presented a unified framework for instruction implementation, with Hebbian learning at its core. We derived a dual-route model from this framework and applied it to instruction following. The model was tested by simulating two recent experiments on instructions. The first simulation showed that sufficient practice can cause a switch from one route to another without need for a homunculus. However, a complete switch only occurred if the faster-learning path also acted more slowly. The second

Simulation study 1.1: Ruge and Wolfensteller (2010)

There were four units in the stimulus layer, corresponding to the four stimuli. Left and right responses were encoded by two units in the response layer. The same coding scheme was applied to the (verbal) stimulus representations and (verbal) response representations (indirect route), with the same number of units in the respective layers. The PFC and BG layers had 200 units each.
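The layer sizes given above can be collected into a small configuration sketch. The set of connections is an assumption based on the verbal description of the two routes (the figure showing the architecture is not reproduced here), the weight range of 0 to 0.01 follows the initialization described for the model, and all identifiers are hypothetical.

```python
import random

# Layer sizes as described for Simulation study 1.1.
LAYER_SIZES = {
    "stimulus": 4,
    "verbal_stimulus": 4,
    "pfc": 200,
    "verbal_response": 2,
    "response": 2,
    "bg": 200,
}

# Assumed connectivity: an indirect route via PFC and a direct route via BG.
CONNECTIONS = [
    ("stimulus", "verbal_stimulus"),
    ("verbal_stimulus", "pfc"),
    ("pfc", "verbal_response"),
    ("verbal_response", "response"),
    ("stimulus", "bg"),
    ("bg", "response"),
]


def init_weights(n_in, n_out, rng, high=0.01):
    """Weight matrix with entries drawn uniformly between 0 and `high`."""
    return [[rng.uniform(0.0, high) for _ in range(n_out)] for _ in range(n_in)]


rng = random.Random(0)
weights = {
    (src, dst): init_weights(LAYER_SIZES[src], LAYER_SIZES[dst], rng)
    for src, dst in CONNECTIONS
}
print({pair: (len(m), len(m[0])) for pair, m in weights.items()})
```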

The design of Ruge and Wolfensteller (2010) was replicated exactly, with 32 trials in the test phase. During the

Acknowledgments

We thank Kevin Diependaele, Kevin Gurney and Sebastien Helie for their valuable comments on a previous version of this manuscript. AR and TV were supported by Research Project 3G005909 awarded by FWO-Flanders (Belgium). Address correspondence to [email protected].

References

J.G. Craggs et al. The dynamic mechanisms of placebo induced analgesia: evidence of sustained and transient regional involvement. Pain (2008)
S. Dehaene et al. Cultural recycling of cortical maps. Neuron (2007)
S.W. Derbyshire et al. Fibromyalgia pain and its modulation by hypnotic and non-hypnotic suggestion: an fMRI analysis. Eur. J. Pain (2009)
B.B. Doll et al. Instructional control of reinforcement learning: a behavioural and neurocomputational investigation. Brain Res. (2009)
J. Duncan et al. Intelligence and the frontal lobe: the organization of goal-directed behaviour. Cogn. Psychol. (1996)
J.L. Elman. Finding structure in time. Cogn. Sci. (1990)
M.J. Frank. Hold your horses: a dynamic computational role for the subthalamic nucleus in decision making. Neural Netw. (2006)
O. Hauk et al. The time course of action and action-word comprehension in the human brain as revealed by neurophysiology. J. Physiol. Paris (2008)
J. Mink. The basal ganglia: focused selection and inhibition of competing motor programs. Prog. Neurobiol. (1996)
P. Petrovic et al. A prefrontal non-opioid mechanism in placebo analgesia. Pain (2010)
A. Ploghaus et al. Neural circuitry underlying pain modulation: expectation, hypnosis, placebo. Trends Cogn. Sci. (2003)
A. Raz et al. Can suggestion obviate reading? Supplementing primary Stroop evidence with exploratory negative priming analyses. Conscious. Cogn. (2011)
M.F. St. John et al. Learning and applying contextual constraints in sentence comprehension. Artif. Intell. (1990)
F. Waszak et al. Cross-talk of instructed and applied arbitrary visuomotor mappings. Acta Psychol. (2008)
F.G. Ashby et al. A neuropsychological theory of multiple systems in category learning. Psychol. Rev. (1998)
F.G. Ashby et al. A neurobiological theory of automaticity in perceptual categorization. Psychol. Rev. (2007)
F.G. Ashby et al. Cortical and basal ganglia contributions to habit learning and automaticity. Trends Cogn. Sci. (2010)
F.G. Ashby et al. COVIS
A. Bandura. Social Learning Theory (1977)
F. Benedetti et al. Neurobiological mechanisms of the placebo effect. J. Neurosci. (2005)
M.M. Botvinick et al. Short-term memory for serial order: a recurrent neural network model. Psychol. Rev. (2006)
M.M. Botvinick et al. Conflict monitoring and cognitive control. Psychol. Rev. (2001)
Y.-L. Boureau et al. Opponency revisited: competition and cooperation between dopamine and serotonin. Neuropsychopharmacolog. Rev. (2010)
M. Brass et al. Neural correlates of overcoming interference from instructed and implemented stimulus–response associations. J. Neurosci. (2009)
C.S. Carver et al. Serotonergic function, two-mode models of self regulation, and vulnerability to depression: what depression has in common with impulsive aggression. Psychol. Bull. (2008)
J.D. Cohen et al. On the control of automatic processes: a parallel distributed processing account of the Stroop effect. Psychol. Rev. (1990)
O. Cohen-Kdoshay et al. The representation of instructions operates like a prepared reflex. Exp. Psychol. (2009)
M.W. Cole et al. Prefrontal dynamics underlying rapid instructed task learning reverse with practice. J. Neurosci. (2010)
