The need for an effective digital test to assess perceptual organization

While deficits to visual perception do not often manifest as the most disruptive symptoms experienced in neurological disorders (e.g., stroke, Huntington’s disease, and Alzheimer’s disease) or in psychiatric patients (e.g., schizophrenia and autism), they are nevertheless relatively common (Tatemichi et al., 1994). The Leuven Perceptual Organization Screening Test (L-POST) provides clinicians and researchers with a broad range of subtests for different aspects of perceptual organization. The test can serve both as a clinical screening tool and as a research tool for identifying the level of perceptual organization deficit associated with different conditions.

Visual perception is a hierarchical process (Wagemans, Wichmann, & Op de Beeck, 2005) in which our percept is built up from simple low-level characteristics of the input image, like orientation, color, and contrast, to complex high-level stages of vision at which we can recognize people and objects. Mid-level vision refers to a level of visual processing, situated between the basic analysis of the image (low-level vision) and the recognition of specific objects (high-level vision). This level of visual perception was the focus of Gestalt psychology (Wertheimer, 1938; for recent reviews, see Wagemans, Elder, et al., 2012; Wagemans, Feldman, et al., 2012). Gestalt psychologists identified many of the key challenges in the domain of perceptual organization—namely, the grouping of different elements into one object, the segregation of a figure from a background, the integration of textures and contours, and the completion of partly occluded figures. Perceptual organization occurs so effortlessly that we have no intuition regarding its importance. Indeed, the critical role it plays in enabling higher level functions, like object recognition and scene recognition, becomes apparent only in patients for whom these processes are damaged.

Several important case studies of patients who suffer from specific deficits to mid-level vision, like visual-form or apperceptive agnosia, have been reported in the literature (Milner et al., 1991; Riddoch & Humphreys, 1987). Deficits to mid-level vision are also central to broader neuropsychological syndromes, like Balint’s syndrome or simultanagnosia (Coslett & Saffran, 1991; Robertson, Treisman, Friedman-Hill, & Grabowecky, 1997) and co-occur with other problems, such as neglect (Driver & Mattingley, 1998; Friedrich, Egly, Rafal, & Beck, 1998) and prosopagnosia (Sergent & Signoret, 1992). Besides brain-damaged patients, unusual mid-level visual processing has been reported in a range of other patient groups, including developmental disorders such as autism spectrum disorders (Dakin & Frith, 2005), psychiatric problems like schizophrenia (Silverstein & Keane, 2011), and neurodegenerative disorders like Huntington’s disease (Lawrence, Watkins, Sahakian, Hodges, & Robbins, 2000) and Alzheimer’s disease (Binetti et al., 1998). From a scientific point of view, studying these patients is a highly valuable means of learning more about the mechanisms and the neural correlates of cognitive processes. For instance, the aforementioned case studies on brain-damaged patients (Milner et al., 1991; Riddoch & Humphreys, 1987) have contributed to our knowledge about the functional organization of the visual system. However, identifying interesting patients for research is a difficult and time-consuming process that would profit from a short screening test to facilitate the selection of patients that are of potential relevance for case study research. Besides the scientific applications, a screening test could also be relevant in clinical practice to aid diagnosis of visual problems and to guide rehabilitation and treatment.

The neuropsychological tests of visual perception that are currently used in clinical practice have several limitations that make them less suited to screening for deficits in perceptual organization. The existing measurements of mid-level vision are often confounded with high-level vision because the stimulus material consists of recognizable objects, shapes, or letters, as in the Birmingham Object Recognition Battery (BORB; Riddoch & Humphreys, 1993) and the Visual Object and Space Perception Battery (VOSP; Warrington & James, 1991), which mainly focus on object recognition and spatial relations. In addition, the few subtests that do tap mid-level processes (e.g., the figure–ground subtest in the BORB and the screening test in the VOSP) do not build on the extensive literature on mid-level processing and sometimes use rather idiosyncratic stimuli. However, some neuropsychological tests depend less on semantic knowledge and are more specifically designed to measure certain aspects of perceptual organization, like perceptual grouping (Bender Gestalt test, Bender, 1938; Mooney Closure test, Mooney & Ferguson, 1951; or Hooper Visual Organization Test, Hooper, 1983), figure–ground segregation (Poppelreuter-Ghent Test, in, e.g., Della Sala, Laiacona, Trivelli, & Spinnler, 1995), local and global processing (Embedded Figures Test, in, e.g., Barrett, Cabe, & Thornton, 1968; or Rey Complex Figure Test, Meyers & Meyers, 1996), or several of these processes (Motor-Free Visual Perception Test; Colarusso & Hammill, 2012). Unfortunately, what exactly these tests measure is seldom clearly described; norms are not always provided, and when they are, they are based on relatively small samples. In addition, it is often hard to gain access to or find copies of these tests, making comparisons between studies or patient groups difficult. In summary, the existing neuropsychological tests of visual perception mostly involve high-level processes or do not systematically measure different aspects of perceptual organization. This highlights the need for an instrument in which a wide range of processes of perceptual organization is measured as independently as possible from high-level vision, with a solid basis in the literature, a clear description of the processes measured by the subtests, and freely available data from a large norm sample.

While clinical testing and rehabilitation have seen many developments in digital technology, it is striking that a great deal of clinical testing still relies on paper-and-pencil methods. The challenges of testing patients with differing motor abilities and computer literacy often make paper-and-pencil tests a reliable and pragmatic option. Moreover, more sophisticated digital implementations of clinical tests often require specific hardware or software, which also compromises the ability to flexibly test patients when needed. Web-based testing might be a valuable alternative that is already available in social psychology for survey research and in cognitive psychology for testing cognitive abilities (e.g., WebNeuro; Silverstein et al., 2007). There are also tools within the domain of vision research for assessing low-level visual function (acuity and contrast sensitivity; Bach, 2007) and for diagnosing hemianopia (Koiava et al., 2012). Advantages of Web-based testing include higher external validity because of larger and more diverse participant samples, automatic scoring and data collection, fewer organizational constraints like room bookings or testing times, reduced costs and personnel burden, more highly motivated participants, and the possibility of studying cross-cultural effects (Gosling, Vazire, Srivastava, & John, 2000; Kraut et al., 2004; Reips, 2000). In addition, results appear to be consistent with corresponding paper-and-pencil tests (Gosling et al., 2000).

In the domain of visual perception, computerized tests remain scarce. However, there are numerous phenomena of interest that simply cannot be studied using paper-and-pencil tests—for instance, when motion is an intrinsic aspect of the phenomenon. In order to provide an effective test of numerous aspects of mid-level visual perception, it is therefore essential to develop a digital test. However, the development of this test should also be carefully guided by a number of methodological considerations regarding how easily the test can be used in clinical contexts. The complications involved in a test that requires specific software packages on specific computers mean that an online implementation that can be accessed via any browser is probably the optimal platform, especially given that the test will then also work on tablet computers. Given the rapid development of software and platforms for online implementation, it is clearly important to select an implementation that is likely to last and will remain easy to manage. For this reason, the test primarily uses PHP and HTML5, although some parts of the online interface also make use of JavaScript. Rather than drawing stimuli online, all images are loaded as PNGs or animated GIFs. One drawback of our online implementation, however, is that one cannot guarantee the exact timing of events, due to unpredictable server delays. In principle, however, these timing delays are often of a similar order of magnitude to those faced when using a standard keyboard or USB connection. Indeed, Crump, McDonnell, and Gureckis (2013) could qualitatively replicate a number of cognitive science paradigms online that rely on differences in the range of hundreds of milliseconds. Nevertheless, while a timestamp (in seconds) is saved for every response the participant makes, the test is designed to rely on accuracy data alone, which is, in any case, a pragmatic choice in neuropsychological testing.
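As a minimal sketch of this design choice (our own illustration, not the actual L-POST source; all names are invented), a response can be logged with a coarse, second-level timestamp while scoring relies on accuracy alone:

```javascript
// Minimal sketch of response logging (illustrative names, not the actual
// L-POST source). Scoring relies on accuracy; the timestamp is stored in
// whole seconds because finer timing cannot be guaranteed online.
function recordResponse(subtest, trial, chosenIndex, correctIndex) {
  return {
    subtest: subtest,
    trial: trial,
    correct: chosenIndex === correctIndex ? 1 : 0,
    timestamp: Math.floor(Date.now() / 1000) // seconds since the Unix epoch
  };
}
```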

The online interface also enables us to communicate data to an online database using MySQL, which offers a number of methodological advantages that are useful in clinical practice. First, by providing an automatic scoring system and a one-page printable report, clinicians immediately get feedback about the patient’s visual processing abilities without the need for time-consuming manual scoring. Second, these results can be compared with an age-matched norm sample, for which the clinician can flexibly set the age range with which they would like to compare their patient. Third, it offers clinicians and researchers an online database of all the data for all of the patients they have tested. At the time of writing, this database is also automatically backed up every hour by the University of Leuven central information services.
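As a hypothetical sketch of this client–server communication (the endpoint name and payload format are our assumptions for illustration; the actual L-POST code is not reproduced here), a response record could be posted to a server-side PHP script that writes it to the MySQL database:

```javascript
// Hypothetical sketch: send a response record to a server-side PHP script
// for central storage. "save_response.php" and the JSON payload are
// invented for this example.
function saveResponse(record) {
  return fetch('save_response.php', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(record)
  }).then(function (response) {
    if (!response.ok) {
      throw new Error('Saving failed with status ' + response.status);
    }
  });
}
```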

This introduction has highlighted the importance of mid-level visual assessment in research and in clinical practice, as well as the limitations of the currently available neuropsychological tests. These arguments indicate the need for the development of an easy-to-use screening test of perceptual organization.

Developing the Leuven Perceptual Organization Screening Test (L-POST)

The primary goal of the L-POST is to screen patients for possible deficits in mid-level vision. The test is freely available at www.gestaltrevision.be/tests. In order to assess a broad range of mid-level functions, 15 subtests are used, each with only five items. In some instances, these subtests are designed to isolate a rather specific process (e.g., global motion or contour integration). In other subtests, the aim is not so much to target a specific process as to detect a clinically relevant behavioral deficit that might be missed by other tests (e.g., the Recognition of missing parts subtest).

The overall range of subtests was selected to cover most of the key processes in mid-level vision, from the grouping of individual elements, to the creation of contours or segmentation boundaries, to the assignment of figure–ground relationships (Wagemans, Elder, et al., 2012). Again, however, there is a bias in the test toward subtests that are historically or practically relevant in a clinical context. In particular, the Shape ratio discrimination (Efron) and the Embedded figure detection subtests were developed partly because of their long history in clinical testing. There are also two subtests that, in principle, require access to semantic information (or at least lexical labels), to select the name associated with an object presented in isolation (subtest 14) or in a scene (subtest 15). These subtests are included, however, not to test high-level object recognition per se (there is already a wide variety of well-developed tests for that purpose) but, rather, to compare performance on the two. This comparison could be informative about mid-level challenges faced in the real world. Most tests of object recognition present objects only in isolation, whereas in real life, objects are often encountered in cluttered, complex scenes where segmentation of figure and ground is required. Hence, the comparison between these subtests depends on access to high-level vision but offers an important window onto a potentially important daily life challenge requiring mid-level vision. However, while the L-POST aims to cover a wide range of processes involved in mid-level vision (with a bias toward clinically relevant tests), it is by no means an exhaustive set of all the possible aspects of mid-level vision one could include. Some salient omissions are the representation of structure from motion and of shape from shading, the influence of crowding, the detection of symmetry, the use of Glass patterns or Navon figures, and so forth. This limitation reflects a pragmatic constraint in developing a screening test that can be quickly administered to identify patients with potential mid-level deficits for further testing.

While designing the format, procedure, and stimuli for the L-POST, we were guided by a number of principles and constraints. The first constraint was that all of the subtests had to be directly derived from theoretical work in the cognitive neuroscience of visual perception. This is critical not only for designing meaningful measures, but also for ensuring a dual role for the test, as a clinical screening instrument and as a method that can contribute to theoretical research. The theoretical basis for each of the subtests is detailed in the L-POST subtests section, where the stimuli and logic for each subtest are described.

The second principle was that the L-POST should be as user-friendly as possible, both for our norm participants and, especially, for the patients. To reduce cognitive load, we use a matching-to-sample task in all subtests (Fig. 1). Participants are asked to choose which of three alternatives is perceived as most similar to the target stimulus. For each of the 15 subtests, there are only five trials. This small number of trials enables one to rapidly screen for a wide range of mid-level phenomena while keeping the test duration within a pragmatic limit that is suitable for clinical use. The use of only five trials also ensures that patients who have difficulty with particular subtests do not lose motivation, since they will swiftly move on to a subtest they may find easier. The L-POST can be administered in about 20–45 min. The test is designed to be suitable for patients with physical disabilities or cognitive problems (e.g., dementia), provided that they have sufficient comprehension to participate. We use words in the subtests only when unavoidable, and because our primary goal is not to target language processing, these words can be read aloud by the testing clinician or researcher. A neglect-compatible version, in which stimuli are centrally aligned in a column, is also available; this should make the test easier for patients who also suffer from neglect. In order to maintain consistency in the use of this neglect version, we have developed a short neglect test that can aid the clinician’s decision regarding whether or not the neglect-compatible version of the test should be used (see the Neglect test section). In addition, we include five practice trials before the start of the actual test that are based on the same format, in which the participants receive feedback and can only progress once they have selected the correct alternative.

Fig. 1

Examples of the test design. a The target stimulus is shown on top of the screen, with the three alternatives horizontally arranged below the target. b Example trial of the neglect-friendly version, where all stimuli are presented vertically in one column and the target stimulus is placed on top

The third principle in designing the L-POST was that the test should be as user-friendly as possible for the administrators and researchers. The only material needed is a computer (tablet, desktop, or laptop) and an Internet connection. All data are automatically and centrally saved. This facilitates the collection of normative data in large samples across different age groups. After completing all subtests, an immediate visualization of the patient’s results is available, together with a flexible comparison with specific norm groups.

Administering the L-POST

The questionnaire

Before beginning the actual test, participants are presented with a short questionnaire asking for some biographical and medical information. Biographical questions concern year of birth, gender (male or female), handedness (left, right, or ambidextrous), sight (good vision; good vision with glasses or contact lenses; impaired vision), native language, country of residence, country of origin, age at which they started primary school, age at which they finished education, and educational level (high school, higher vocational qualification, bachelor’s or master’s degree, PhD, or other). Besides biographical information, we ask participants to report problems with motivation or concentration, general intellectual impairment, depression, or other medical problems. For participants who indicate having a brain injury, we ask for the cause of the brain damage, the date of the injury, and the side of the lesion.

Neglect test

Besides the questionnaire, we also provide an additional test that is specifically designed to assess whether the neglect version of the L-POST will be required. Note that this should not be considered a test of neglect per se; rather, it is intended to guide the clinician’s decision as to whether a patient with diagnosed (or suspected) neglect would benefit from using the neglect version of the test. The link for this neglect test becomes available when entering the “brain injury” details of the patient, although the task can also be accessed directly by going to http://gestaltrevision.be/tests/neglect. The task in this neglect test is very simple: The patient has to click on all the squares on the screen. These squares are presented at the same positions as the stimuli in the L-POST subtests. A small “x” appears in each square when it has been successfully clicked. When as many squares as possible have been clicked, pressing “submit” confirms the selection, and the next trial starts. There are five trials with different shades of gray. In order to enhance the consistency with which the test is used, we advise that, by default, the neglect version of the actual L-POST be used when one or more mistakes are made in this purposefully tailored neglect test. However, some degree of clinical discretion will be required when making this decision.
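The interaction logic of this check is simple enough to sketch in a few lines of JavaScript (a minimal illustration with invented positions and styling, not the actual implementation):

```javascript
// Minimal sketch of the neglect check: each square is marked with an "x"
// once clicked, and the number of squares found is tracked until "submit".
// Positions and styling are invented for this example.
const positions = [{ x: 300, y: 100 }, { x: 150, y: 350 }, { x: 450, y: 350 }];
let found = 0;

positions.forEach(function (pos) {
  const square = document.createElement('div');
  square.style.cssText = 'position:absolute; width:180px; height:180px;' +
    'background:#888; text-align:center; line-height:180px;' +
    'left:' + pos.x + 'px; top:' + pos.y + 'px;';
  square.addEventListener('click', function () {
    if (square.textContent !== 'x') {
      square.textContent = 'x'; // visible confirmation of the click
      found += 1;
    }
  });
  document.body.appendChild(square);
});

// On "submit", (positions.length - found) is the number of missed squares;
// one or more misses would suggest using the neglect-compatible version.
```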

Measuring perceptual organization

After the questionnaire and the neglect test, the actual perceptual organization test begins. Participants are given instructions and an example item before they can start the test. In addition, they are asked (and given instructions on how) to set their browser window to full screen. This avoids having to scroll down the page to see all the stimuli, which would add unwanted demands on short-term memory. In the matching-to-sample task that follows, all items are presented in the same configuration: a target item centrally on top and three alternatives aligned in a row below the target (Fig. 1a). All stimuli are contained in a box of 180 × 180 pixels. In the neglect-friendly version, all items are presented in a centrally aligned column (Fig. 1b), with larger spacing between the target and the alternatives than between the alternatives. Participants can select an alternative by moving the computer mouse over the alternatives. The alternative under the cursor is highlighted by a blue square to facilitate monitoring the selection. A left click confirms the answer and begins the next trial. To reduce cognitive load, the matching-to-sample task is the same in all subtests, with exactly the same instruction: “Choose the alternative that is most similar to the target stimulus.” In the instructions, we emphasize that no exact matching is necessary. We have tried to avoid any spurious low- or high-level cues that could be used to perform this task. All subtests are presented in a block design with a random order, and the correct alternative is located in one of the three possible locations, chosen randomly from trial to trial. Before starting the test, we provide a practice session consisting of a separate set of five trials with stimuli similar to items in the different subtests. During the practice session, feedback is given, and participants can proceed to the next trial only once the correct answer has been selected. The test itself is administered without feedback.
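The randomization described here is straightforward; as a sketch (our own illustration, using a standard Fisher–Yates shuffle), the block order can be shuffled once per session and the position of the correct alternative drawn independently on each trial:

```javascript
// Sketch of the randomization: shuffle the block order of the 15 subtests
// once, and draw the position of the correct alternative on every trial.
function shuffle(array) {
  for (let i = array.length - 1; i > 0; i--) { // Fisher-Yates shuffle
    const j = Math.floor(Math.random() * (i + 1));
    [array[i], array[j]] = [array[j], array[i]];
  }
  return array;
}

const subtestOrder = shuffle([...Array(15).keys()]);   // random block order
const correctPosition = Math.floor(Math.random() * 3); // 0, 1, or 2, per trial
```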

Testing conditions

As was highlighted before, the L-POST was designed to be as user-friendly as possible for both participants and administrators. Total test duration is estimated at 20–45 min, but if necessary, administration can be spread over multiple sessions. The test can be stopped at any point, and the clinician has the option to restart the test later with the same participant; the test will then automatically resume from the first subtest that has not yet been completed. Ideally, the administration of the test should take place in a quiet room. At the end of the test, we evaluate the testing conditions by asking participants to report technical problems, problems in having to scroll up or down to view all images, whether they were distracted while taking the test, and whether they filled in the test seriously.

Scoring

The test is entirely computerized, and the scoring is done automatically. At the end of the test, the examiner can view individual test results and compare them with a control sample. The norm group with which to compare the patient’s scores can be chosen by the clinician, on the basis of a selected age range. There is also the possibility of comparing one’s patient with a group of other brain-damaged patients, potentially offering further insight into the severity of the patient’s deficits relative to those of other patients. First, an overall score is given on the basis of the number of failed subtests. Second, a summary graph with the score on each individual subtest and the matching percentile (from the healthy norm sample) is presented. Additionally, we also show a summary graph with a norm sample of patients with brain damage and individual graphs of the score on each subtest. An example of the results screen is shown in Fig. 2. For each patient, this summary report can be printed as a one-page document that can be included in the patient’s file. Data are collected and analyzed centrally at the Laboratory of Experimental Psychology (University of Leuven), although the results from patients entered by a given clinician will never be published as part of a larger study without that clinician’s involvement and approval.
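The norm comparison can be sketched as follows (our own formulation: the percentile-rank convention and field names are assumptions for illustration, not the documented server code):

```javascript
// Sketch of the norm comparison: convert a patient's subtest score to a
// percentile rank within an age-filtered norm sample and flag scores
// below the 10th percentile. Field names and the exact percentile
// convention are assumptions.
function percentileRank(score, normScores) {
  const below = normScores.filter(s => s < score).length;
  return 100 * below / normScores.length; // assumes a non-empty norm sample
}

function flagSubtest(patientScore, normSample, minAge, maxAge) {
  const scores = normSample
    .filter(p => p.age >= minAge && p.age <= maxAge)
    .map(p => p.score);
  const pct = percentileRank(patientScore, scores);
  return { percentile: pct, deficit: pct < 10 }; // 10th-percentile cutoff
}
```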

Fig. 2

Example of the clinical interface for displaying the results for a given patient or control participant. This (printable) results screen is available in the menu after the test has been administered. Green bars reflect subtests with a score above the 10th percentile, while red bars indicate subtests with lower scores

It should be highlighted that, while the comparison with the norm sample and an indication of a deficit for each subtest are provided automatically, some degree of clinical discretion is required in interpreting these results. As with most clinical tests, multiple cognitive resources could influence performance, and apparent deficits in perceptual organization could potentially result from executive problems or from problems in shifting or controlling attention (Silverstein, 2008). More specifically in this context, potential problems in low-level vision should also be taken into account, for which the Freiburg Visual Acuity Test (FrACT) could provide a useful complementary test (Bach, 1996, 2007).

Inclusion and exclusion criteria

The only reasons to exclude participants from taking the test are a refusal to participate and a very short concentration span (< 25 min, as judged clinically). For inclusion of the data in our norm sample, the criteria were stricter. First, data from retakes of the test and from participants who did not complete all subtests were excluded. In addition, data from participants with a visual disorder that could not be corrected by glasses or contact lenses, problems with motivation or communication, general intellectual impairment, depression, or other related problems were not included in the norm sample. Data collected under suboptimal test conditions, as in the case of technical problems (a slow Internet connection, problems in loading the images) or a screen too small to fit all stimuli at once, were also excluded. At the end of the test, participants were asked whether they had filled in the test seriously and could indicate whether they were interrupted during the test on a 7-point scale ranging from not at all to continuously. Data from participants who reported an interruption level of 3 or higher or who did not take the test seriously were likewise excluded from the norm sample.
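These criteria amount to a simple filter over the collected records; as a sketch (with invented field names mirroring the criteria above):

```javascript
// Sketch of the norm-sample inclusion filter described above. All field
// names are invented for this example.
function includeInNormSample(p) {
  return p.completedAllSubtests &&
         !p.isRetake &&
         !p.uncorrectedVisualDisorder &&
         !p.motivationOrRelatedProblems &&
         !p.technicalProblems &&
         !p.screenTooSmall &&
         p.interruptionLevel < 3 && // 7-point scale, 1 = not at all
         p.tookTestSeriously;
}
```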

Data

Normative data

The L-POST has been administered to a large group of control participants without a history of brain lesions. Because of the online nature of the test, norm data can be collected rapidly. Our norm sample is growing every day, with an average of 75 new participants every week. Here, we report the data for 200 participants. We tested participants in two settings: 100 participants completed the L-POST at home under the supervision of a student research assistant, and 100 participants took the test from home without supervision. A histogram of the ages of the collective sample is shown in Fig. 3. All age groups from 18 to 88 are represented.

Fig. 3

Histogram of ages in norm sample of 200 participants

Of our norm sample, 30.5% finished high school, 26% finished higher vocational education, 33.5% obtained a bachelor’s or master’s degree, 4.5% had a PhD, and 5.5% reported “other” as their highest educational level. In our sample, 39% of the participants were male and 61% female; 85.5% were right-handed, 11.5% left-handed, and 3% ambidextrous. In an ANCOVA, we found no evidence for an effect of the mode of testing, F(1, 196) = 1.66, p = .20, when controlling for age differences, F(1, 196) = 17.15, p < .001. From these assembled data, we calculated cutoff scores based on percentiles for each subtest score and for the overall score. The calculations are dynamic, since they are updated whenever a new participant is added to the norm group. The cutoff is set at the 10th percentile. Descriptive statistics for each subtest based on this norm sample of 200 participants are presented in Table 1. Given that the maximum score is 5, it is clear from the values of the means and standard deviations that the distributions of scores are skewed; it is also for this reason that we rely on the percentile to define the cutoff scores.

Table 1 Mean, standard deviation, and the 10th percentile cutoff score (out of 5) for each subtest

Illustrative case studies

Here, we would like to illustrate the use of the L-POST with two patients who have already been described in the neuropsychological literature. The first patient, PF, was reported by Braet and Humphreys (2007) as suffering a stroke at the age of 47, for which an MRI scan in 2006 revealed bilateral lesions to the posterior parietal cortices (including the superior parietal lobe and the intraparietal sulcus) extending more inferiorly in the left hemisphere. As a consequence, she experienced left extinction and dysgraphia. Her results on the L-POST are presented in Fig. 4.

Fig. 4

Performance on the L-POST of patients a PF and b MP

The second patient, MP, was previously reported by Humphreys and Riddoch (2001) as suffering an aneurysm of the right middle cerebral artery in 1992. MRI and SPECT scans revealed damage to fronto-temporo-parietal regions in the patient’s right hemisphere, including the inferior frontal gyrus, the superior temporal sulcus, the supramarginal and angular gyri, and the postcentral sulcus. MP showed signs of unilateral left neglect in scanning tasks and reading, and he experienced short-term memory problems. Previous assessment of visual functions with the VOSP and BORB indicated mild perceptual impairment (Humphreys & Riddoch, 2001). He performed below the 5th percentile on the dot-counting, position discrimination, and number location subtests of the VOSP. He had no problem in naming isolated objects, and figure–ground segmentation was at a normal level as measured by the overlapping figures of the BORB. His performance on the L-POST (Fig. 4) is remarkably consistent with what is indicated by the VOSP and BORB: dot counting was also impaired (Dot counting), while object recognition (Recognition of objects in isolation) and figure–ground segmentation (Recognition of objects in a scene and Figure–ground segmentation) were preserved. However, the L-POST gives a more detailed overview of several mid-level functions, additionally revealing impaired motion perception (Biological motion and Global motion detection) and difficulties in grouping based on proximity (Dot lattices) and collinearity (RFP fragmented outline).

On the basis of our norm data, both of these patients would be regarded as having a score indicative of a deficit in visual perception. Rather than basing this judgment on the patients’ overall scores (44 correct trials out of 75 for PF and 64 for MP), we recommend that clinicians take into account the number of subtests for which the patient falls below the 10th percentile, with failure on four or more subtests indicating a deficit. Scores below the 10th percentile are illustrated here with a red bar. Thus, patient PF scores below the 10th percentile on 11 subtests, and patient MP on 6 subtests. It should be noted that the 10th percentile is recommended in clinical contexts because this test is intended as a screening tool to highlight potential problems for further testing. This criterion is also somewhat arbitrary for now, and we recommend that patients who fail an intermediate number of subtests (4–8) receive further testing (which could include repeating the L-POST), to be certain that a deficit has been correctly identified. In the future, we hope to supplement this criterion with a Bayesian latent group analysis, such that we can assign, for each patient tested, the probability that this patient belongs to the healthy control group or to a patient group with brain damage affecting visual perception (Ortega, Wagenmakers, Lee, Markowitsch, & Piefke, 2012). This latent group analysis will, however, depend upon having a much larger patient sample than we currently have available.
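Expressed as code, the recommended screening rule is simply (a sketch of the criterion stated above, not part of the test software):

```javascript
// Sketch of the screening criterion: count the subtests on which the
// patient falls below the 10th percentile; four or more indicates a
// possible deficit warranting further testing.
function screeningDecision(subtestPercentiles) {
  const failed = subtestPercentiles.filter(p => p < 10).length;
  return { failedSubtests: failed, possibleDeficit: failed >= 4 };
}
```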

The other salient feature of the test that should be clear from these case studies is that different patients can manifest with different patterns of results across the subtests. This not only should enhance the sensitivity of the test to pick up on a broader range of problems, but also should aid in the interpretation of the strengths and weaknesses of a given patient. It should be clear that the indication of a deficit on a given subtest is contingent on the norms for that subtest. Thus, the same score on different subtests (e.g., a score of 3 on Global Motion vs. Biological Motion Detection) does not always indicate the same level of performance, because of the variability within our norm sample. Given that most neuropsychological tests have very limited norm data, this comparison should provide a more informative indication of whether a patient’s performance really does fall outside the distribution of performance in the healthy population.

The L-POST subtests

Fine shape discrimination

The Fine Shape Discrimination subtest taps into the participant’s ability to discriminate fine, local shape differences within a globally similar class of objects. We used three classes of parameterized, novel objects (i.e., unfamiliar objects without an immediate association with everyday objects), called spikies, cubies, and smoothies (the smoothies are illustrated in Fig. 5), which have been used in human fMRI and comparative monkey research to identify how different areas of the ventral visual stream, particularly in the occipital and temporal lobes, process shapes (Op de Beeck, Baker, DiCarlo, & Kanwisher, 2006; Op de Beeck, Torfs, & Wagemans, 2008). Within this stimulus set, objects can differ in their global shape envelope and in their local features (protrusions). Within each trial, we manipulated only local features. The correct alternative is the same exemplar as the target object, while the incorrect alternatives show different exemplars from the same class. This subtest therefore examines whether the participant can still extract changes in a number of small details within a globally similar shape. Performance on this task is potentially related to performance on the Embedded figure detection and Recognition of missing parts subtests, where participants have to extract information about parts in a larger whole. In addition, the small shape differences that participants have to represent in this subtest rely on mid-level shape processing in the ventral stream (Op de Beeck et al., 2006; Op de Beeck et al., 2008) and cannot be resolved using a coarse, low-spatial-frequency representation of the input. On different trials, we used exemplars from the different classes (two spikies, two cubies, and one smoothie) developed by Op de Beeck and colleagues, which have a different shape, as compared with the example shown here (Fig. 5). In order to reduce the degree to which participants can solve this subtest on the basis of a very simple pixel-by-pixel comparison strategy, all of the alternative versions have been enlarged (relative to the target).

Fig. 5

Target and alternatives for the Fine shape discrimination subtest

Shape ratio discrimination (Efron)

The Shape Ratio Discrimination (Efron) subtest is an adaptation of a classic neuropsychological test designed by Efron (1969) and later used by Warrington (1985) to assess visual form agnosia (see also Goodale, Milner, Jakobson, & Carey, 1991). In this task, the three alternatives all have exactly the same surface area as the target, but only one of these shapes has exactly the same height and width, thus also maintaining the same aspect ratio (Fig. 6).

Fig. 6

Example of the Shape ratio discrimination (Efron) subtest

Dot lattices

The Dot Lattices subtest assesses the patient’s sensitivity to grouping by proximity using dot lattices. Dot lattices are simple arrays of dots that can be perceived as being grouped along different orientations (Kubovy, Holcombe, & Wagemans, 1998; Kubovy & Wagemans, 1995). Grouping strength along different orientations depends on the distance between adjacent dots and, in particular, on the ratio between the distances associated with one orientation and another. The orientation with the shorter distance will tend to be perceived as grouped, but the strength of this grouping depends on the exact ratio. This way of testing grouping by proximity has been used frequently in the recent literature, including combinations with grouping by similarity (e.g., Kubovy & Van den Berg, 2008) and grouping by alignment or good continuation (e.g., Claessens & Wagemans, 2005, 2008). Because proximity is such a basic grouping principle, we included it in this form in the L-POST. In this subtest, participants have to identify the grouped orientation, based on a spacing ratio of 0.8. In order to avoid an entirely low-level matching strategy, however, the correct alternative has a slightly different ratio of 0.88 but results in the same perceived dominant orientation. One of the incorrect alternatives has the opposite ratio to the target, while the other incorrect alternative has the same ratio but is rotated by 24°. As a result, both incorrect alternatives yield a different dominant orientation, as compared with the target stimulus (see Fig. 7 for an example). By using different densities on the five trials of this subtest, we induce variation in the stimuli. Stimuli were constructed with the GERT toolbox (Demeyer & Machilsen, 2012).
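As an illustration of the geometry involved (a simplified JavaScript reconstruction; the actual stimuli were generated with the GERT toolbox in MATLAB), a rectangular dot lattice can be built from two orthogonal basis vectors whose length ratio determines the grouping strength:

```javascript
// Simplified dot-lattice sketch: dots lie at i*a + j*b, where |a|/|b| is
// the proximity ratio. The shorter spacing (along a) tends to be perceived
// as the grouped orientation.
function dotLattice(spacing, ratio, thetaDeg, n) {
  const t = thetaDeg * Math.PI / 180;
  const a = { x: spacing * Math.cos(t), y: spacing * Math.sin(t) }; // short axis
  const b = { x: -(spacing / ratio) * Math.sin(t),                  // long axis,
              y: (spacing / ratio) * Math.cos(t) };                 // orthogonal
  const dots = [];
  for (let i = -n; i <= n; i++) {
    for (let j = -n; j <= n; j++) {
      dots.push({ x: i * a.x + j * b.x, y: i * a.y + j * b.y });
    }
  }
  return dots;
}

const target  = dotLattice(20, 0.80, 0, 5);  // ratio 0.8
const correct = dotLattice(20, 0.88, 0, 5);  // same dominant orientation
const rotated = dotLattice(20, 0.80, 24, 5); // rotated 24°, different orientation
```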

Fig. 7

Example of the Dot lattices subtest, in which we target grouping by proximity

Radial frequency pattern fragmented outline

The Radial Frequency Pattern (RFP) Fragmented Outline subtest requires the participant to select the correct outline of a shape on the basis of a fragmented version of the same shape contour. Fragmentation has been used in many clinical tests before (e.g., the Gollin Incomplete Figures Test and the Gestalt Completion Test), but the way we manipulate it here allows for a more controlled focus on mid-level grouping and shape formation. The subtest taps into the participant’s ability to group different elements into a closed figure and, more specifically, to use the principle of good continuation in grouping the different line fragments (Koffka, 1922). The effective use of good continuation may, in turn, depend upon an “association field” (Field, Hayes, & Hess, 1993), in which proximal edges of the same orientation lead to a mutual facilitation effect. The stimuli for this subtest (and the next two) are based on RFPs (Wilkinson, Wilson, & Habak, 1998). RFPs were used here because they offer a well-parameterized stimulus space for constructing novel shapes (i.e., unfamiliar without an immediate association with everyday objects) on the basis of a number of simple features. The frequency of the sine wave components of these RFPs determines the number of “bumps” along the shape, the phase determines their position, and the amplitude determines the size of each bump. This enables one to easily control the overall complexity of the resulting shapes. All RFP shapes in the L-POST were generated by summing three radial frequency sine wave components with a constant frequency and amplitude but a random phase angle, and plotting the result in polar coordinates. The fragmented figure is slightly smaller than the target, to avoid an entirely low-level or pixel-based matching strategy. The three shape alternatives are line drawings of closed shapes based on RFPs of similar complexity (Fig. 8). Stimuli were constructed with the GERT toolbox (Demeyer & Machilsen, 2012).
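Following this description, an RFP outline can be sketched in a few lines (our reconstruction; the frequencies, amplitude, and number of sample points are illustrative, and the published stimuli were generated with GERT):

```javascript
// Sketch of RFP generation: the radius is modulated by the sum of three
// sine components with fixed frequencies and amplitude but random phases,
// then converted from polar to Cartesian coordinates. Parameter values
// are illustrative.
function rfpOutline(baseRadius, freqs, amplitude, nPoints) {
  const phases = freqs.map(() => Math.random() * 2 * Math.PI);
  const points = [];
  for (let k = 0; k < nPoints; k++) {
    const theta = (2 * Math.PI * k) / nPoints;
    let r = baseRadius;
    freqs.forEach((f, i) => {
      r += amplitude * Math.sin(f * theta + phases[i]); // radial modulation
    });
    points.push({ x: r * Math.cos(theta), y: r * Math.sin(theta) });
  }
  return points;
}

const shape = rfpOutline(80, [3, 5, 8], 6, 360);
```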

Fig. 8

Example of the RFP fragmented outline subtest

RFP contour integration

The RFP Contour Integration subtest is complementary to the RFP Fragmented Outline subtest. In this subtest, not only are participants presented with a fragmented shape, but the shape is also constructed from, and embedded in, a field of Gabor elements. This test requires the participant to group the elements along the contour (based on collinearity) and to segment this shape as a figure from the Gabor noise field in the background. In other words, this subtest focuses on the interplay between grouping target elements and segregating target from background elements, a typical mid-level process that has been the focus of much recent research (Machilsen, Novitskiy, Vancleef, & Wagemans, 2011; Machilsen, Pauwels, & Wagemans, 2009; Machilsen & Wagemans, 2011; Vancleef et al., 2013). The field of Gabor elements was constructed using the GERT toolbox (Demeyer & Machilsen, 2012). The outline of the target shape is defined by the co-alignment of the Gabor elements along the contour; all other Gabor elements in the background have a random orientation. The contour in the target display has the same shape as the correct alternative, but slightly larger; the correct alternative is a black outline of the same shape on a white background. The incorrect alternatives also consist of complete contours on a white background, but with different shapes of equal complexity, as compared with the target shape (Fig. 9).
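A sketch of this display logic, building on the rfpOutline sketch above (a simplified reconstruction, since GERT additionally controls element spacing and density):

```javascript
// Sketch: Gabors sampled along the RFP outline are oriented along the
// local contour tangent (collinear), while background Gabors get random
// positions and orientations. Tangents are approximated from neighboring
// outline points; "points" comes from the rfpOutline sketch above.
function contourGabors(points) {
  return points.map((p, i) => {
    const prev = points[(i - 1 + points.length) % points.length];
    const next = points[(i + 1) % points.length];
    return { x: p.x, y: p.y,
             orientation: Math.atan2(next.y - prev.y, next.x - prev.x) };
  });
}

function backgroundGabor(width, height) {
  return { x: Math.random() * width,
           y: Math.random() * height,
           orientation: Math.random() * Math.PI }; // random orientation
}
```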

Fig. 9

Example of the RFP contour integration subtest

RFP texture segmentation

The RFP Texture Segmentation subtest also requires the participant to match stimuli on the basis of their shape, but in this case, the shape of the target is constructed using a difference in the texture of the figure and background elements. Previous research has highlighted that an element array composed of distinct shapes (Beck, 1966) or orientations (Julesz, 1981) can create a compelling percept of distinct regions. More recent work has also helped to characterize the neural dynamics of the segmentation of a figure from a background on the basis of such texture differences (Lamme, Rodriguez-Rodriguez, & Spekreijse, 1999; Lamme, Van Dijk, & Spekreijse, 1992). We constructed texture stimuli with the GERT toolbox (Demeyer & Machilsen, 2012). The density of the texture displays is explicitly controlled such that the spacing between the elements inside and outside of the target shape is kept constant. By using RFPs, we also keep the complexity of the shapes roughly equivalent within and between trials. The correct alternative for the texture target in this subtest is a black silhouette of the same shape (Fig. 10). The correct shape is, however, slightly larger than the target, again to avoid a very low-level matching strategy. The incorrect alternatives have different (RFP-defined) shapes of equal complexity.

Fig. 10

Example of the RFP texture segmentation subtest

Global motion detection

The Global Motion Detection subtest measures the ability of the participant to detect coherent motion in moving dots on the basis of the principle of common fate. This Gestalt principle refers to the grouping of elements that move in the same direction. It is arguably manifest in the detection of a coherent set of dots moving in one direction among randomly moving distractor dots, using random dot kinematograms (Williams & Sekuler, 1984). The recognition of coherent motion in these stimuli is likely to rely on area MT (Britten, Shadlen, Newsome, & Movshon, 1993) and the MST complex in humans (Grossman et al., 2000), located at the temporo-parieto-occipital junction. In this subtest, the target motion direction is illustrated with arrows both pointing and moving in one direction (Fig. 11). In the alternatives, random dot kinematograms are presented with 75% of the dots moving coherently in one direction. The incorrect alternatives simply have directions different from the target motion: one opposite to it, and the other at right angles. We used only upward and downward motion as target stimuli.
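A per-frame update for such a kinematogram can be sketched as follows (one common variant, in which signal or noise status is redrawn each frame; parameter names are ours):

```javascript
// Sketch of one frame of a random dot kinematogram at a given coherence:
// on average, 75% of the dots (coherence = 0.75) step in the signal
// direction, while the rest step in random directions. This is one common
// variant; others assign fixed signal/noise roles per dot.
function updateDots(dots, directionRad, coherence, step) {
  dots.forEach(dot => {
    const dir = Math.random() < coherence
      ? directionRad                 // signal dot: common direction
      : Math.random() * 2 * Math.PI; // noise dot: random direction
    dot.x += step * Math.cos(dir);
    dot.y += step * Math.sin(dir);
  });
}
```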

Fig. 11

Example of the Global motion detection subtest. The red arrows are used to illustrate the direction of motion of 75% of the dots in each kinematogram; they are not shown in the actual test displays

Fig. 12

Example of the Kinetic object segmentation subtest, in which we target grouping based on the law of common fate, as well as the extraction and maintenance of a simple shape description. Again, the red arrows and shading are used here for illustration; they are not present in the actual test displays

Kinetic object segmentation

The Kinetic Object Segmentation subtest also exploits the brain’s sensitivity to coherent motion signals, but in this case, rather than simply pooling these signals to determine an overall motion direction, distinct motion signals are presented in different parts of the image, generating a (kinetic) contour at the border where the direction of the motion changes (Fig. 12). These distinct motion signals therefore structure the image and create a compelling percept of a shape (or figure) surrounded by a distinct background. Neuroimaging research has revealed a number of areas that respond to these stimuli and, more specifically, one area, labeled KO (kinetic occipital), that seems to be very selectively sensitive to shapes defined by motion boundaries (Orban et al., 1995; Van Oostende, Sunaert, Hecke, Marchal, & Orban, 1997). Kinetic boundaries have also been used in recent psychophysical research (e.g., Segaert, Nygård, & Wagemans, 2009). Our stimulus was constructed using the GERT toolbox (Demeyer & Machilsen, 2012). The stimulus consists of an array of Gabor elements. Each Gabor element remains stationary but has lighter and darker regions that shift in position: the darker region can be seen to drift over the lighter part of each Gabor. The relative positions of the darker and lighter regions of the Gabors are randomized over the image. The kinetic contours in the image are constructed by having the Gabor elements in different parts of the image drift at a different phase, such that (in the example below) when the Gabors inside the circle appear to be drifting down, those in the background appear to be drifting up. Participants have to match the structure formed by the drifting Gabor elements to the same static structure, shown as a white figure on a black background.
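In code, the essential manipulation can be sketched like this (our simplification of the drift logic described above; the real displays were prerendered with GERT and presented as animated GIFs):

```javascript
// Sketch of the kinetic-boundary manipulation: every Gabor stays in place,
// but its carrier phase drifts over time, in opposite directions inside
// versus outside the shape, so the contour is visible only from the motion.
function makeGabor(x, y, insideShape, speed) {
  const startPhase = Math.random() * 2 * Math.PI; // randomized per element
  const direction = insideShape ? 1 : -1;         // opposite drift across the contour
  return {
    x: x,
    y: y,
    phaseAt: function (t) { return startPhase + direction * speed * t; }
  };
}
```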

Biological motion

In the Biological Motion subtest, we assess the ability to use motion and structure-from-motion cues to match patterns of dots on the basis of their conformity to biological motion. The term biological motion was introduced by Johansson (1973) to refer to the ambulatory movement patterns of terrestrial bipeds and quadrupeds. Biological motion displays depict a moving human figure using a few isolated points of light attached to the major joints of the body. Naive observers readily interpret the moving pattern of dots as representing a human figure. There is a wealth of psychophysical research with so-called point-light walkers (for a review, see, e.g., Blake & Shiffrar, 2007). Subsequent fMRI (Grossman et al., 2000) and patient neuropsychology (Saygin, 2007) studies have revealed a number of areas, particularly the superior temporal sulcus (STS), that appear to be critical for the representation and perception of this stimulus. In order to test for sensitivity to biological motion, a coherent walker is presented as a target (Fig. 13). A coherent walker, but with a different facing direction, is the correct alternative (in the example, the target is walking to the right; the correct option shows a walker heading directly toward the observer). The change in facing direction was used because it was clear from pilot testing that some patients would solve the task just on the basis of the local motion profile of one dot when walkers with the same facing direction were used. The other two options are constructed from spatially scrambled dot patterns with the same motion parameters (Troje, 2002). The five trials within this subtest differ in terms of the viewpoint, identity, and speed of the walkers. This makes this subtest one of the most demanding of the L-POST.

Fig. 13

Example of the Biological motion subtest, in which we target dynamic grouping, integration into a human body-and-action representation, and viewpoint invariance

Dot counting

The Dot Counting subtest requires participants to rapidly assess the number of dots flashed on a display (Fig. 14). Humans have a remarkable ability to almost instantaneously recognize the number of dots presented, when this number is kept relatively low. This ability is regarded as distinct from general counting and is referred to as subitizing (Kaufman, Lord, Reese, & Volkmann, 1949). There is some consensus that participants can subitize up to around four or five objects. Above this number, additional factors come into play in numerosity judgments, which are supported by distinct neural substrates (Demeyere, Lestou, & Humphreys, 2010). In our subtest, observers need to count between four and seven dots. This range was selected to push the limits of what a patient can do: by including numbers just beyond the subitizing range, the subtest can detect more than just deficits in subitizing per se. A dot-counting test is already included in another (paper-and-pencil) neuropsychological battery (the VOSP). One of the limitations of such paper-and-pencil versions, however, is that rather than having to rely on visual subitizing mechanisms, patients sometimes serially count the dots on the page, even using their finger to guide their attention. We rendered this strategy impossible for this subtest by having the dots flash on and off every 200 ms and reappear at random locations. The incorrect alternatives are distributed to ensure that the correct number is not always the middle number (as is the case in Fig. 14).
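The flashing display can be sketched as a simple timer loop (our illustration; the render function is a placeholder for the actual drawing code):

```javascript
// Sketch of the flashing-dot display: every 200 ms the display toggles
// between blank and visible, and on each reappearance the dots are redrawn
// at new random positions, which defeats serial counting strategies.
function randomDots(n, width, height) {
  const dots = [];
  for (let i = 0; i < n; i++) {
    dots.push({ x: Math.random() * width, y: Math.random() * height });
  }
  return dots;
}

function render(dots) {
  // Placeholder: a real implementation would draw the dots to a canvas.
  console.log(dots.length ? dots.length + ' dots shown' : 'blank');
}

let visible = false;
setInterval(function () {
  visible = !visible;
  render(visible ? randomDots(5, 180, 180) : []); // 5 dots is illustrative
}, 200);
```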

Fig. 14

Example of the Dot counting subtest, in which we target grouping at short presentation times, as well as visual short-term memory, to be able to track numbers over several consecutive frames

Figure–ground segmentation

The Figure–Ground Segmentation subtest assesses the ability to assign figure–ground relationships in basic shape perception. On every trial, a square is presented with four curved regions cut out of it. Two of these curved regions are simply cut out of the square (and thus reveal the background behind the figure), while the other two are the result of placing a circular disk on top of the square. These circular disks allow for the perceptual interpretation that these curves are not inherent to the square but, rather, that the square continues as a background, with the circles standing as figures on top (Nakayama, Shimojo, & Silverman, 1989). Resolving this figure–ground assignment thus enables one to interpret the square as a more complete shape, which actually continues under these two disks, in an example of amodal completion (Michotte, Thinès, & Crabbé, 1964). The correct alternative is therefore correct only in the sense that it is the most plausible match to the shape shown above, given the potential figure–ground relationships. The two incorrect alternatives always include one curved hole that is unambiguously cut out from the main square and one that is ambiguous with respect to figure–ground assignment. Note that we also change the surface properties (texture and color) of the figure, as well as the background, from target to alternatives, to further encourage invariant shape processing (see Fig. 15 for an example).

Fig. 15

Example of the Figure–ground segmentation subtest, in which we target figure–ground segmentation based on amodal completion

Embedded figure detection

The Embedded Figure Detection subtest is derived from a long-established test of an observer’s ability to detect a basic figure (or part) when it is embedded in a more complex context (or whole). This kind of part–whole encoding is a critical feature of mid-level vision. An inability to detect and extract a local part from an embedded context could suggest a deficit in hierarchical part–whole encoding. Performance on this test is known to vary among the normal population and is often used as a motivation for the idea that some people are more “local” in their information processing (e.g., Witkin, 1962; but see Milne & Szczerbinski, 2009). This local bias in visual information processing is also evident in autism—in fact, manifesting as an advantage for people with autism (Frith, 1989; although see White & Saldaña, 2011). Here, we have constructed a simplified version of the test, to assess whether the participant is able to extract a simple stimulus from a more complex context (Fig. 16). Only the correct alternative contains exactly the same simple configuration. The distractors share a good deal of structurally similar parts and features, which makes this subtest one of the most difficult of the L-POST. Although the complexity of the figures varies between trials, it is kept constant within each trial.

Fig. 16

Example of the Embedded figure detection subtest, in which we target the processing of parts and wholes. Specifically, participants should be able to break a spontaneous whole percept down into its fragments and recombine these into the simpler shape representation of the target, consisting of all the correct features, parts, and spatial relations

Recognition of missing parts

With the right situational or task demands, healthy observers can often miss important changes in the details of their environment (Simons & Levin, 1998). Informal observations of patient populations can sometimes indicate more extreme versions of such a phenomenon. For example, a patient might recognize a given object but fail to recognize that a part of this object is missing or incorrect. In the Recognition of Missing Parts subtest, we have developed a formalization of this inability to detect changes to important details in the context of a meaningful object. In each target, we present an object in which a specific part is missing. The same part is missing in the correct alternative. In one of the incorrect alternatives, a different part is missing, and the other incorrect alternative is an intact image of the object with no parts missing (see Fig. 17). This kind of part–whole encoding is a central feature of mid-level vision, which may also relate to a local or global focus (on the whole object vs. the informative part) and is likely to be related to performance on the previous Embedded figure detection subtest. However, in this case, the completion is enhanced by top-down object knowledge, rather than by perceptual organization factors as such. In other words, patients with intact gist processing but a disturbed focus on structural details (the presence or absence of parts, the location of gaps) would have difficulties with this subtest.

Fig. 17

Example of the Recognition of missing parts subtest, in which we target the processing of parts and wholes. Specifically, participants should not automatically complete the image into the most familiar object representation but should spot the gap (missing part) and maintain its relative location

Recognition of objects in isolation

The Recognition of Objects in Isolation subtest requires visual object recognition and access to semantic/conceptual knowledge about the objects. This subtest is, however, not really intended as a test of object recognition per se but, rather, forms a pair with the next subtest, which assesses whether the same object can be recognized against a cluttered background. While the L-POST is not intended to target object recognition, it was apparent from the range of existing tests in the clinical literature that the contribution of mid-level vision to object recognition (in terms of scene segmentation) was an issue that was not tested. Indeed, this neglect of the importance of scene segmentation in object recognition perhaps goes hand in hand with an implicit neglect of this problem in much of the literature on visual object recognition in theoretical research with normal observers (Wichmann, Drewes, Rosas, & Gegenfurtner, 2010). Similarly, research in which grouping and segmentation are tested with representations of real-life objects is still relatively rare (for exceptions, see Nygård, Sassi, & Wagemans, 2011; Nygård, Van Looy, & Wagemans, 2009; Sassi, Machilsen, & Wagemans, 2012; Sassi, Vancleef, Machilsen, Panis, & Wagemans, 2010). Testing for a patient’s differential ability to recognize objects in cluttered scenes (as opposed to white backgrounds) also seems to be much more clinically relevant in diagnosing the kinds of visual problems that might really affect a patient’s daily life. In this subtest, participants are presented with a colored object on a clear white background. The response alternatives are the written names of three objects (Fig. 18). If a participant is not able to read the object names, the clinician administering the test will need to read them aloud. The objects are exact mirror-reversed versions of the objects used in the Recognition of Objects in a Scene subtest. In the randomization of the subtests of the L-POST, this subtest always occurs after the 15th subtest (Recognition of objects in a scene). It simply serves as a baseline to establish whether the patient can recognize these objects when they are presented in isolation, thus allowing one to pin down a role (and thus a deficit) specifically in scene segmentation if participants pass this subtest but fail the Recognition of objects in a scene subtest.

Fig. 18

Example of the Recognition of objects in isolation subtest

Recognition of objects in a scene

The Recognition of Objects in a Scene subtest forms a pair with the previous subtest and requires the observer not only to recognize the object in the scene, but also to be able to search for and segment this object from its cluttered background (Fig. 19). To achieve this, the object is placed in a semantically neutral background. The background can be said to be neutral because the target object is not incongruous with its background (it is not out of place), but neither would the background necessarily lead one to predict the target object per se (the five trials include a cup on a desk, a pair of scissors on a desk, a small statue on a bookshelf, a watch on a bathroom shelf, and a photo frame on a bookshelf). The objects are not at the center of these images, nor do they particularly stand (or pop) out. The alternatives are written names of three objects. The two incorrect alternatives are names of objects that could potentially appear in the same scene but are not present in the photograph. If a participant is not able to read the object names, the clinician administering the test is allowed to read the alternatives aloud.

Fig. 19

Example of the Recognition of objects in a scene subtest, in which we target object recognition in a cluttered background