ABSTRACT

Background: Expertise in clinical reasoning is essential for high-quality patient care. The Clinical Integrative Puzzle (CIP) is a novel assessment method for clinical reasoning. The purpose of our study was to further describe the CIP, providing feasibility, reliability, and validity evidence to support this tool for teaching and evaluating clinical reasoning. Methods: We conducted a prospective, randomized crossover trial assessing the CIP in second-year medical students from a single institution. Feasibility was estimated through the time taken to complete a CIP during a CIP session and through comments from faculty developers. Reliability was addressed by calculating odd–even item reliability (split-half procedure) for grid questions within each CIP. Evidence for content, concurrent, and predictive validity was also examined. Results: A total of 36 students participated in the study. Baseline data suggested that participants and nonparticipants were comparable. The CIP demonstrated high feasibility and acceptable reliability (0.43–0.73, mean 0.60), with a short completion time. Spearman–Brown correction estimated a reliability of 0.75 for completing two grids (estimated time of 50 minutes) and 0.82 for three grids (estimated time of 75 minutes). Validity evidence was modest; the CIP is consistent with the clinical reasoning literature and correlated modestly with small group performance (r = 0.3, p < 0.05). Conclusions: Assessing clinical reasoning in medical students is challenging. Our data provide good feasibility and reliability evidence for the use of CIPs; validity evidence was less robust.

INTRODUCTION

Expertise in clinical reasoning is essential for high-quality patient care. Clinical reasoning involves establishing a diagnosis and treatment approach that is specific for a patient's circumstances and preferences; it pertains to nearly everything a physician does in practice.

Much remains unknown regarding how expert performance in clinical reasoning is developed or maintained. There is also debate regarding whether clinical reasoning can be taught.1 Given these and other challenges, including the fact that clinical reasoning is not directly observable with our standard assessment methods, it has been difficult to develop tools to evaluate clinical reasoning. In this report, we present preliminary psychometric data for a novel tool for assessing clinical reasoning, the Clinical Integrative Puzzle (CIP), compare these findings with commonly used measures of clinical reasoning, and discuss relevant clinical reasoning theories. Though there remains controversy regarding the assessment and teaching of clinical reasoning, we believe educational theories can help shed light on this debate.

THEORETICAL FRAMEWORKS

Studies in deliberate practice argue that to achieve expert performance in a domain (such as clinical reasoning), one has to spend many hours (approximately 10,000 hours) in effortful practice with the material, which at least initially should be under the guidance of a coach or mentor.2 Many scholars believe that for clinical reasoning this practice should entail the deliberate construction of illness scripts. Script theory3,5 is germane to contemporary thinking on teaching and assessing clinical reasoning. A script can be thought of as a mental representation of the symptoms and findings that are seen with a diagnosis or illness. Indeed, this application leads to the name “illness script” in medicine.6 Illness scripts are also believed to include the causal factors, pathogenesis, prognosis, and consequences of a disease. Further, illness scripts include a range of features that can be consistent with the illness. Script theory suggests that physicians use two systems for clinical reasoning. One system (System 1) is rapid, automatic, largely unconscious, and involves low cognitive effort. This rapid system is believed to entail script activation and is often referred to as nonanalytic reasoning, or pattern recognition. A physician can see a typical presentation of a disease and the diagnosis immediately comes to mind, or is activated, without much, if any, effort. Often, the physician arrives at the diagnosis through this nonanalytic reasoning. The second system (“System 2” or analytic reasoning) is slow and rule based, requires high cognitive effort, and is consciously controlled. This system is believed at times to be involved in “script confirmation,” when both System 1 and System 2 are used to arrive at a patient's diagnosis. In this circumstance, the physician actively compares and contrasts activated scripts from System 1 processes for a given patient presentation to arrive at the correct diagnosis. Limited studies suggest that physicians use both of these systems in caring for patients.7

There have been insights from fields outside of medical education regarding how scripts may develop. This literature from multiple fields2 proposes that expertise is an “adaptation”, and that experts expand and rearrange (i.e., adapt) their long-term memory through “chunking” or grouping multiple pieces of information together into large integrated cognitive units8 to enable processing of more information simultaneously than less advanced practitioners can do. This “chunking” of information is needed to cope with the limited number of independent pieces of information that the human brain can process in working memory at a time. Cognitive load theory addresses this limitation in human cognitive architecture and proposes that the number of interacting units (or “element interactivity”) is a key determinant of cognitive load.9

Additionally, a number of studies suggest that clinical reasoning is not a generalized skill but rather is highly dependent on a relevant knowledge base and on the context of the encounter.10 Deliberate practice and script theories support this notion in that directed, explicit, focused effort in a domain, which is high-effort (System 2) work, is required for expertise in clinical reasoning to develop. Thus, these theories suggest that structured learning should improve clinical reasoning performance for a content domain.

The CIP was first described by Ber as a novel tool for the assessment of reasoning.11 The CIP uses a “grid” approach as outlined in Table I. Clinical diagnoses or syndromes are depicted in the horizontal rows, and findings from generic domains (e.g., history, physical examination, and laboratory findings) that will populate the illness scripts in the students' memories are depicted in the vertical columns. Next, a series of options for each cell in the grid are given. Trainees must match appropriate options with the rows and columns. The process allows for cross-referencing and reinforcement of concepts among a cluster of similar diagnoses with critical differences within and across domains. The rows generally involve similar syndromes or diagnostic entities that trainees often have difficulty differentiating; this lets trainees practice comparing and contrasting similar disease entities, allowing for, and perhaps requiring, “script confirmation” to take place. Groothoff et al12 have provided support for the construct validity of a slightly adapted version of the CIP, called MATCH (Measuring Analytic Thinking in Clinical Healthcare). At this stage of training, we would expect that students would mainly use the slower, “System 2” processing for these clinical reasoning activities.

TABLE I.

Sample CIP Grid on Thyrotoxicosis

Disease                    History   Physical   Labs   Pathology   RAIU   Treatment
Graves' Disease               1          7       13       19        25       31
Subacute Thyroiditis          2          8       14       20        26       32
Toxic MNG                     3          9       15       21        27       33
Central Hyperthyroidism       4         10       16       22        28       34
T4 Ingestion                  5         11       17       23        29       35
Struma Ovarii                 6         12       18       24        30       36

The purpose of our study was to further describe the CIP, providing psychometric data on this tool for teaching and evaluating clinical reasoning. Because the CIP allows for deliberate comparing and contrasting of the key features that comprise a diagnosis, it offers unique teaching and assessment moments. For example, it can potentially reduce cognitive load by limiting the interaction of multiple pieces of the patient's presentation (questions are answered within generic domains), and it can potentially facilitate adaptation and chunking by having learners work through the component parts of the diagnoses (completing cells across rows and columns rather than simply answering which diagnosis is most likely). More specifically, we sought to establish the feasibility and reliability of, and gather preliminary validity evidence for, this emerging tool in the medical education literature. In terms of validity, although multiple frameworks exist in the literature, we explored evidence for a number of commonly used arguments: content, construct, and predictive validity. Our hypotheses were that the CIP would demonstrate evidence for feasibility, reliability, and validity.

METHODS

Study Context

The Uniformed Services University (USU) is the United States' only federal medical school. Students graduating from USU enter the same spectrum of specialties offered in the civilian community. During the study period, the curriculum was taught in a traditional approach, with 2 years of preclinical courses followed by 2 years of clinical clerkships. The Introduction to Clinical Reasoning (ICR) Course occurred during the second year and was under the direction of one of the authors (SJD) during the study period.

At the time of the study, the ICR course was a year-long course wherein students were introduced to a variety of reasoning processes by exploring a series of symptoms, physical examination findings, laboratory abnormalities, and syndromes. Generally speaking, each of the 30 topics had both a lecture and a case-based reasoning small group session. In the small group sessions, students worked through 2 to 5 “paper cases,” which illustrate common diagnostic entities and key findings for the topic being discussed. During the study period, the sessions were 90 minutes in length and small group size ranged from 6 to 9 students per group. During these sessions, preceptors facilitated discussions by reviewing student answers to the cases and led students through the clinical reasoning process (e.g., pointing out key terminology, pathophysiology, and decision points). There were assigned readings before the small groups, and students were expected to arrive prepared to discuss the cases for the topic of the session. Following is an example excerpt from the introduction of a paper case.

“A 60-year-old patient with hypertension, diabetes, and high cholesterol who presents with a three month history of progressive substernal chest pain. His physical examination is normal. Please list your differential diagnosis”

The CIP sessions were conducted in addition to usual small group teaching sessions on topic areas—there were no other differences in terms of formal small group teaching instruction between those who completed a CIP and those who did not for topics with CIPs. Student evaluation measures were also the same regardless of CIP assignment.

CIP Description

The CIP resembles the extended-matching assessment, described initially by Case13 but extends upon this through its crossword puzzle–like appearance (Table I). This tool enables trainees to compare and contrast a group of related diagnoses in a variety of domains such as history, physical, histology, pathology, and/or laboratory patterns. The CIP can be either paper- or computer-based in format. It can also allow inclusion of images (such as radiographs and histology slides) and multiple other media formats, such as streaming audio files, which can be included in electronic CIPs. The tool also allows for integration of curriculum content through simultaneous inclusion of a variety of domains. A CIP allows for the division of a number of prototypical presentations into specific domains, which enables the learner to focus more attention on key discriminating findings.

The study investigators created eight CIPs (ranging from 5 × 5 to a maximum of 6 × 6 cells) for topics addressed in the course. CIP topics were anemia, abdominal pain, dyspnea, chest pain, thyrotoxicosis, headache, and pediatric growth and development disorders [available upon request to the principal investigator (SJD)]. Each CIP was written by a primary author and then vetted with content and educational experts within that author's department. The primary authors then met with each other and further vetted each CIP, resulting in the CIPs used in this study. All participants underwent a 15-minute orientation session wherein the CIP was explained and a practice grid was completed with a nonmedical topic.

CIP Sessions

Each of the four CIP sessions lasted one hour. Each CIP session was a small group teaching and assessment event for clinical reasoning, shaped by exercises with CIPs. More specifically, in each session, the task for students was to complete as many CIPs as possible. Participants worked independently, and no instruction or feedback was given before the CIPs were completed within each session. Participants listed start and stop times on the CIP grid, and time was also monitored by the research assistant. For each of the four sessions, there was a different “primary” CIP, i.e., the first CIP that all participants for the session received. Each of these sessions was monitored by a research assistant, and student participation was voluntary and involved informed consent.

All students who agreed to participate were randomized to one of six groups on entry through a random-number generator. This was done because it was expected that not all students would sign up for this study and also to prevent students from completing CIPs only in topics of interest, which could confound our correlation analyses. Each of these six groups attended two of the four CIP sessions, as illustrated below. The session numbers corresponded to time in the academic year (i.e., 1 = first quarter, 4 = last quarter). The six groupings were as follows:

  • Group 1: CIP in session 1 and 2

  • Group 2: CIP in session 1 and 3

  • Group 3: CIP in session 1 and 4

  • Group 4: CIP in session 2 and 3

  • Group 5: CIP in session 2 and 4

  • Group 6: CIP in session 3 and 4

Students were assigned to participate in two of four total CIP sessions (above). The faculty involved in the CIP sessions did not teach any of the small group sessions.
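
For illustration, the group-to-session mapping and a simple random assignment could be expressed as in the sketch below. This is a hedged reconstruction only: the study states that a random-number generator was used, but the actual assignment procedure (including any balancing across groups) is not described, and the function and variable names here are hypothetical.

    # Illustrative sketch only; the study's actual randomization code is not described.
    import random

    SESSION_PAIRS = {            # group number -> the two CIP sessions attended
        1: (1, 2), 2: (1, 3), 3: (1, 4),
        4: (2, 3), 5: (2, 4), 6: (3, 4),
    }

    def assign_groups(student_ids, seed=None):
        """Randomly assign each consenting student to one of the six groups."""
        rng = random.Random(seed)
        return {sid: rng.randint(1, 6) for sid in student_ids}

    # Usage: groups = assign_groups(["S001", "S002"]); SESSION_PAIRS[groups["S001"]]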

Measurements

Measurements collected during the study period included baseline measures (collected before the CIP sessions), process measures (collected during the academic year), and outcome (“after”) measures (collected at the end of the academic year).

Baselines

End-of-first-year Grade Point Average (GPA), grades in first-year courses, and Medical College Admission Test (MCAT) scores.

Process

Performance on the three in-house multiple-choice examinations in the Introduction to Clinical Reasoning (ICR) Course.

Outcomes

United States Medical Licensing Examination (USMLE) Step 1 (basic science knowledge) and internal medicine clerkship grade. We expected only small to moderate correlations with USMLE Step 1 performance as the construct of clinical reasoning is not fully captured in a multiple-choice test. Participants in our study also underwent unstructured exit interviews. Outcome measurements were collected to help measure the durability of effect of exposure to CIPs. This was felt to be particularly important given findings from transfer literature suggesting differences in durability between instructional modalities for complex tasks such as clinical reasoning.14

Participants

All USU second-year medical students had the opportunity to participate in this study. There were no exclusionary criteria. All 160 students (both participants and nonparticipants in this study) received the same teaching and evaluation materials offered in the course; the only difference between participants and nonparticipants in our study was exposure to CIPs on selected topics.

Students were notified of the opportunity for study participation by e-mail at the beginning of the academic year. Students who did not respond to the e-mail invitation received up to 3 electronic reminders from the research assistant. Students who expressed interest through the e-mail invitations then received an overview presentation on the CIP by the research assistant. Following the overview presentation, students who agreed to participate signed an informed consent, were randomized to one of six groups, and were given a unique identifier by the research assistant for data identification and analysis. The study was approved by the USU IRB.

Statistical Analysis

Feasibility was estimated through the time taken to complete a CIP during a CIP session and through comments from faculty developers. The time was listed by each participant on each CIP answer grid (start and stop time). The research assistant also timed participants during CIP sessions. Qualitative comments from students, gathered by the research assistant during exit interviews, also targeted the feasibility of this tool. All responses were recorded (audio and/or written) and were analyzed by two members of the study team. Students and faculty were encouraged to make any suggestions or comments regarding the tool; the research assistant also asked specifically about the feasibility and value of this potential tool. Responses were analyzed by two of the study investigators for emergent themes and example comments. Reliability of the CIP was addressed by calculating odd–even item reliability (split-half procedure) for grid questions within each CIP. Odd–even reliability compares performance on odd-numbered items with performance on even-numbered items in the CIP cells as a measure of the internal consistency of the tool and ensures a reasonable mix of items. CIP scores were calculated by adding the number of correct answers over all cells in a grid (i.e., maximum score 25–36 per CIP depending on grid size, minimum score 0).
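
As an illustration of the odd–even (split-half) procedure described above, a minimal sketch is shown below. It assumes per-student 0/1 scores for the cells of a single CIP grid and applies a Spearman–Brown correction of the half-test correlation to full length; whether the coefficients reported in this study were corrected in this way is not stated, and the data layout and function names are assumptions for illustration.

    from statistics import mean

    def pearson(x, y):
        # Pearson correlation between two lists of half-test totals
        mx, my = mean(x), mean(y)
        cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
        sx = sum((a - mx) ** 2 for a in x) ** 0.5
        sy = sum((b - my) ** 2 for b in y) ** 0.5
        return cov / (sx * sy)

    def odd_even_reliability(score_matrix):
        # score_matrix: one row per student; each row lists the 0/1 scores for
        # the cells of one CIP grid, in item order.
        odd_totals = [sum(row[0::2]) for row in score_matrix]   # items 1, 3, 5, ...
        even_totals = [sum(row[1::2]) for row in score_matrix]  # items 2, 4, 6, ...
        r_half = pearson(odd_totals, even_totals)
        return (2 * r_half) / (1 + r_half)  # Spearman-Brown correction to full length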

Validity was assessed through a number of means. First, we compared CIP content with end-of-year examinations and discussed CIP content with both faculty (unstructured interviews) and participants (unstructured exit interviews). Second, we assessed validity by comparing the CIP method with current reasoning theory (content and construct validity). Third, we gathered validity evidence by comparing CIP performance with end-of-course performance in small groups and on the multiple-choice in-house examinations. Fourth, we compared CIP performance with USMLE examination scores and internal medicine clerkship grade as a means of assessing predictive validity. Finally, we gathered validity evidence through analysis of comments from trainees in unstructured exit interviews.

We also recorded comments from exit interviews. These qualitative comments were used to help explain underpinnings of quantitative findings and to generate hypotheses.

RESULTS

A total of 36 participants completed CIPs in this study, 26 men and 10 women. There were no significant (all p > 0.05) baseline differences in end-of-first-year GPA, grades in first-year courses, or MCAT scores between participants and nonparticipants at USU during the academic year of the study.

Feasibility

Faculty reported that it took approximately 5 minutes for an instructor to create a cell in the grid, which compares favorably with the amount of time that it takes an instructor to write a multiple-choice question (MCQ). Thus a CIP grid with 25 cells (questions) would be estimated to take 2 hours. The mean time that it took a participant to complete the CIP was 25 minutes (range, 8–34 minutes). It took a participant approximately 1 minute to complete a question cell in the grid.

Reliability

Odd–even reliability per CIP ranged between 0.43 (growth and development disorders) and 0.73 (dyspnea, chest pain). The mean odd–even reliability across all CIPs was 0.60. If one uses the mean odd–even reliability (0.60) and Spearman–Brown correction, completing two 25-cell CIP grids would result in an estimated reliability of 0.75 and completing three 25-cell grids would result in a reliability of 0.82. Two 25-cell grids would require approximately 50 minutes (1 cell per minute) and three grids would require 75 minutes (1 question or cell in CIP per minute).
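
These projections follow the standard Spearman–Brown prophecy formula, where r is the observed reliability of a single grid and k is the factor by which the number of grids (and thus testing time) is increased:

\[ r_k = \frac{k\,r}{1 + (k - 1)\,r} \]

With r = 0.60, this gives r_2 = (2)(0.60)/(1 + 0.60) = 0.75 and r_3 = (3)(0.60)/(1 + 1.20) ≈ 0.82, matching the estimates above.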

Validity

All the selected CIP topics (content) corresponded to the topics taught in the ICR Course. Validity evidence was gathered through interviews of faculty and exit interviews of students. These unstructured interviews provide evidence of this tool's content and construct validity. The majority of participants (n = 34) as well as all faculty involved in writing the CIPs (n = 6) endorsed high satisfaction with this tool. They endorsed that the content of the examination mirrored the content of the course as well as the clinical reasoning process. Sample quotes from faculty (representing the three identified themes of making the clinical reasoning process explicit, integrating basic and clinical science assessment, and mirroring clinical practice, respectively) included: “Constructing a CIP compels faculty to focus on the key distinguishing features between a group of diagnoses; this is a different exercise than telling students all the common features seen in a diagnosis.”

“CIPs enable the assessment of basic science and clinical science together which is very unique. I enjoyed choosing the domains to include and working with my basic science and clinical faculty on the topic.”

“Taking care of patients is more than which of the following is the most likely diagnosis. Patients don't say I have a. CHF, b. COPD, c. etc. By reviewing the CIP with the student, I can see how they put the pieces of the puzzle together to arrive at their answer. I am going to use CIPs in other venues with trainees to include residents and faculty.”

Examples of student comments from the unstructured exit interviews (n = 31 participants) are listed below and also provide evidence of content and construct validity. Student themes mirrored faculty themes (making the clinical reasoning process explicit, integrating basic and clinical sciences, and mirroring clinical practice). In terms of making the process more explicit, students noted that the CIP was a “unique format for teaching and evaluating reasoning,” “helps me connect the dots and arrive at a diagnosis,” “displays the intermediate steps to the diagnosis,” and “helps compare important elements of similar diagnoses.” In terms of the theme of integrating basic and clinical sciences, students stated that the CIP “gives me a tool for reading about and contrasting textbook diagnoses” and “helps me put basic and clinical science information together.” Finally, in terms of mirroring clinical practice, example quotes included: “I prefer these to multiple-choice questions. Patients do not come in saying ‘I have abdominal pain and please choose a diagnosis from A-E’” and “please put these in the course, they are too valuable for study use only.” Further, all participants (n = 34) reported that the CIP reflected important clinical reasoning concepts and helped performance on the National Board of Medical Examiners examinations. Several students asked for additional CIPs to help prepare for examinations.

No significant correlations emerged between students' scores on each individual CIP and multiple-choice examinations of academic performance. Correlations with first-year GPA, final grade in the clinical reasoning course, and internal medicine clerkship grade were also all r < 0.2 (p > 0.05). Significant small to moderate correlations were seen with small group performance (for all non-CIP-session topics) in the clinical reasoning course (r = 0.30, p < 0.05).

DISCUSSION

This study evaluated the feasibility, reliability, and validity of the CIP in a pilot program at a single institution in a single academic year. Although the study did demonstrate evidence of feasibility, reliability, and validity, the evidence for the latter was the least robust in this small study, with content and construct validity supported through consistency with clinical reasoning theory as well as student and faculty qualitative comments. The correlation of CIP scores with small group grades from preceptors, whereby students display their clinical reasoning by working through paper cases presenting a variety of topics in clinical medicine, is an expected outcome and provides some evidence of construct validity. Further, the concordance of qualitative findings from unstructured interviews of faculty and students provides additional validity evidence. Given that the topics vary in content and scope, we believe this may explain why only small to moderate correlations were found. The lack of association with MCQs is consistent with the notion that MCQs may reflect a different construct than clinical reasoning.

The CIP is a tool for assessing clinical reasoning. The puzzle allows novice and more advanced learners to engage content in a format different from the multiple-choice examinations and small-group case discussions that they usually encounter, where the typical focus is on the one most likely diagnosis for a single case at a time. The crossword puzzle format of the CIP allows students to put the component “parts” of a diagnosis together, explicitly compare and contrast symptoms and findings between diagnoses, and build basic illness scripts. Thus, the CIP approach is consistent with script theory, providing additional construct validity evidence: the brief descriptions in each block enable the learner to focus on key information for each domain, comparing and contrasting findings vertically while composing illness scripts horizontally. According to studies by Bordage and Lemieux,15,16 medical learners do not organize medical knowledge in linear frameworks such as simple lists of signs, symptoms, and rules but rather develop a network of knowledge of abstract relationships. The CIP appears to represent a logical way of solidifying and organizing abstract clinical relationships, and the grid format allows comparing and contrasting between multiple diagnoses. Bordage and Lemieux15,16 suggest that a major determinant of diagnostic competence is the ability to compare and contrast the signs and symptoms presented and relate these abstract qualities to stored memory structures and scripts. In addition to serving as an assessment format, the CIP appears to be a useful tool for helping learners acquire compare-and-contrast strategies that support clinical reasoning. Further, the CIP can plausibly allow for unique testing and teaching of cognitive skills such as knowledge organization, discrimination of key features,16 prioritization and explicit testing of the discriminating key findings for a group of related diagnoses,17 semantic competence (or use of proper medical terminology),16 and encapsulation (establishing the intermediate steps to the diagnosis, such as the syndrome),6 all of which have been proposed to improve clinical reasoning. Also, the CIP grid format reduces element interactivity (by limiting the amount of information contrasted for each grid question, i.e., a section of the history as opposed to the entire history) and thus assists the learner with managing cognitive load.

As medical students develop into physicians, their reliance on analytic reasoning decreases as they begin to “chunk” information and rely more on pattern recognition and their ability to draw on their experience during their years of training. This could represent one reason why we did not see an association with internal medicine clerkship grade, though we suspect the lack of association more reflects our small sample size and the fact that the CIP represents only a small portion of the content covered in the internal medicine clerkship.

We believe the CIP, as an analytic thinking tool, can be used for both teaching and assessment of clinical reasoning. Further, it can assist with developing this chunking and automatization through providing brief descriptions for each block or question as well as enabling horizontal and vertical comparisons on the CIP grid.

The data from this study provide evidence for the feasibility and reliability of the CIP in assessing clinical reasoning in medical students. It takes students approximately 1 minute per item (cell) in the grid, and the results of the test appear reliable. Additional research is needed with more participants to assess the validity of this test. Based on our data, a participant can complete two to three grids in less than 1.5 hours (with reliability estimates suitable for higher-stakes use), and a CIP appears to take less time for faculty to construct than multiple-choice questions with similar reliability (30 minutes to 1 hour per MCQ versus 5 minutes per CIP question or cell; an estimated 12.5–25 hours to create a 25-question MCQ examination versus approximately 2 hours for a 25-cell CIP). Our results are consistent with recent studies from others, adding to the validity evidence for our findings.12 The tool's consistency with reasoning theory also speaks to the validity of the CIP; indeed, the CIP was developed with these theories as a basis. Vertically, participants compare and contrast domains, which may help develop encapsulations. Horizontally, participants build a “script” for a particular illness.

There were several limitations to this study. The CIP was only studied in a single institution with a small number of participants. We suspect that this had a major impact on our ability to assess the predictive validity of this assessment tool; the study was not powered to detect small effect sizes. A larger, multicenter trial would be better powered to assess a small effect. Future research may include multicenter trials of this tool at various time points during medical school and residency, comparable to what was done by Groothoff et al.12 We would predict that more advanced learners may be able to better accomplish the CIP tasks, and do so using, in part, the more rapid “System 1” processing of pattern recognition as well as a more robust experience base (System 1 and/or 2) to draw from. Specifically, their CIP examination scores would not only be higher but would also be achieved in a much shorter time per CIP. This could be studied in the future. Additional data and correlation studies with USMLE Step II and III board scores and in-service examinations should reveal whether the CIP measures similar or different constructs than the factual and applied types of clinical knowledge usually tested. Reliability would likely be improved by having more potential answers than cells for each topic, reducing guessing. This has been suggested in prior studies.12

Assessing and developing clinical reasoning skills, and finding a valid predictor of clinical excellence, remain elusive goals of medical education. More research is needed in this field to better establish and quantify the efficacy of the assessment and developmental tools used for improving clinical reasoning in medical education today. As each student is unique, it is unlikely that a single tool exists that would help every learner equally well to develop expertise in clinical reasoning. The CIP represents a novel approach to clinical reasoning instruction and offers a unique view of assessment (within and across disciplines) that could generate helpful learner feedback. Our learners endorsed the value of this feedback, and our psychometric data provide multiple arguments supporting its use in medical education.

ACKNOWLEDGMENTS

The authors thank Professor Olle T.J. ten Cate, PhD, for his helpful review and suggested revisions to the manuscript.

REFERENCES

1. Ramaekers SPJ: On the development of competence in solving clinical problems; Can it be taught? Or can it only be learned? Available at http://igitur-archive.library.uu.nl/dissertations/2011-0825-203547/UUindex.html; accessed October 2, 2014.

2. Ericsson KA: The Cambridge Handbook of Expertise and Expert Performance. Cambridge, New York, Cambridge University Press, 2006.

3. Charlin B, Tardif J, Boshuizen HP: Scripts and medical diagnostic knowledge: theory and applications for clinical reasoning instruction and research. Acad Med 2000; 75: 182–90.

4. Charlin B, Gagnon R, Pelletier J, et al: Assessment of clinical reasoning in the context of uncertainty: the effect of variability within the reference panel. Med Educ 2006; 40: 848–54.

5. Charlin B, Boshuizen HP, Custers EJ, Feltovich PJ: Scripts and clinical reasoning. Med Educ 2007; 41: 1178–84.

6. Schmidt HG, Rikers RM: How expertise develops in medicine: knowledge encapsulation and illness script formation. Med Educ 2007; 41: 1133–9.

7. Mamede S, Schmidt HG, Rikers RM, Penaforte JC, Coelho-Filho JM: Breaking down automaticity: case ambiguity and the shift to reflective approaches in clinical reasoning. Med Educ 2007; 41: 1185–92.

8. Gobet F: Expert memory: a comparison of four theories. Cognition 1998; 66: 115–52.

9. van Merrienboer JJ, Sweller J: Cognitive load theory in health professional education: design principles and strategies. Med Educ 2010; 44: 85–93.

10. Durning SJ, Artino AR, Boulet JR, Dorrance K, van der Vleuten C, Schuwirth L: The impact of selected contextual factors on experts' clinical reasoning performance (does context impact clinical reasoning performance in experts?). Adv Health Sci Educ Theory Pract 2012; 17: 65–79.

11. Ber R: The CIP (comprehensive integrative puzzle) assessment method. Med Teach 2003; 25: 171–6.

12. Groothoff JW, Frenkel J, Tytgat GA, Vreede WB, Bosman DK, ten Cate OT: Growth of analytical thinking skills over time as measured with the MATCH test. Med Educ 2008; 42: 1037–43.

13. Case S: Extended-matching items: a practical alternative to free-response questions. TLM 1993; 5: 107–15.

14. Bransford J, Brown A, Cocking R (editors): How People Learn. National Research Council Committee on Developments in the Science of Learning. Committee on Learning Research and Educational Practice. Washington, DC, National Academy Press, 2000.

15. Bordage G, Lemieux M: Some cognitive characteristics of medical students with and without diagnostic reasoning difficulties. Res Med Educ 1986; 25: 185–90.

16. Bordage G, Lemieux M: Semantic structures and diagnostic thinking of experts and novices. Acad Med 1991; 66: S70–2.

17. Yudkowsky R, Lowenstein T, Riddle J, Otaki J, Nishigori RH, Bordage G: A hypothesis-driven physical exam for medical students: initial validity evidence. Med Educ 2009; 43: 729–40.

Author notes

1

The views expressed in this article are those of the authors and do not necessarily reflect the official policy or position of the Department of Defense or the U.S. Government.