Elsevier

Advances in Computers

Volume 83, 2011, Pages 109-181

Chapter 3 - CAPTCHAs: An Artificial Intelligence Application to Web Security

https://doi.org/10.1016/B978-0-12-385510-7.00003-5

Abstract

Nowadays, it is hard to find a popular Web site with a registration form that is not protected by an automated human proof test which displays a sequence of characters in an image and asks the user to enter the sequence into an input field. This security mechanism is based on the Turing Test, one of the oldest concepts in Artificial Intelligence, and it is most often called a Completely Automated Public Turing test to tell Computers and Humans Apart (CAPTCHA). This kind of test has been conceived to prevent automated access to important Web resources, for example, a Web mail service or a Social Network. There are currently hundreds of these tests, which are served millions of times a day, thus involving a huge amount of human work. On the other hand, a number of these tests have been broken, that is, automated programs designed by researchers, hackers, and spammers have been able to produce the correct answer automatically. In this chapter, we present the history and the concept of CAPTCHAs, along with their applications and a wide review of their instantiations. We also discuss their evaluation, both from the user and the security perspectives, including usability, attacks, and countermeasures. We hope this chapter provides the reader with a good overview of this interesting field.

Introduction

For more than 60 years, the purpose of Artificial Intelligence (AI) has been to design and build a machine able to think like a human person. The term AI was coined by John McCarthy in the summer of 1956, during a historic meeting held at Dartmouth College by him and other researchers who would lead this field over the following decades. While the first works in AI were striking, and some of these and other researchers believed the problem would be solved within 20 years, it was not. Evaluating how far we are today from meeting this ambitious goal is out of the scope of this chapter, but no one can deny that this field has provided hundreds of ideas that have been at the heart of Computer Science, and often driven it.

For instance, Semantic Networks emerged as a mechanism to formalize the semantics of Natural Languages in order to enable Machine Translation and, ultimately, to make computers understand and express themselves in human languages. While machines still do not have this ability, Semantic Networks are a representation language that has evolved, for instance, into the languages used to describe software models, like class diagrams in Object Oriented Programming. Semantic Networks are also the mechanism underlying the Semantic Web and several other technologies. Computer Science is full of byproducts of AI, and Information Technologies (IT) Security is not an exception: modern spam filters include Machine Learning and Natural Language Processing techniques; Machine Learning is also used in antivirus research, intrusion detection, etc.; and there are many other examples.

In 1950, Alan Turing, one of the fathers of AI, faced the question: “Can machines think?” He conceived a test to answer it: the Turing Test. Theoretically, using this test it is possible to decide whether a machine has reached human levels of communication and reasoning. Current computers are still not able to pass that test, but as with Semantic Networks, other applications of this test have emerged.

Perhaps the most prominent application of the Turing Test to the field of Information Security is the concept of CAPTCHA. This word, which stands for Completely Automated Public Turing test to tell Computers and Humans Apart,1 refers to an automated test aimed at confirming that the user of an application is a human being instead of a robot, that is, a program which would probably misuse a service or resource. The concept, also often called a reverse Turing test or a Human Interactive Proof, has emerged in the context of Web security. This can be explained by the huge and ever-increasing popularity of Web-based services, which range from low-level ones like online data and information storage, to more complex and extremely popular ones like Web-based e-mail, word processors, image processors, etc., and, finally, Social Networks as the ultimate communication platform for hundreds of millions of users. However, the concept can be applied to any other kind of IT service or tool.

The Turing Test was conceived by Alan Turing and presented in a rather philosophical paper [124] discussing the properties of a machine able to think like a human being. The test is based on the “imitation game,” described as follows: given a man and a woman, an interrogator (the player or judge) has to guess who is who by addressing questions to them. The goal of one of them is to make the interrogator fail, while the other has to help the player make a correct guess. The experiment or game is played using typewritten text, so that their voices cannot help the player. Questions asked by the interrogator may include, for example, “Will you please tell me the length of your hair?” but the answers may be completely true, partly false, etc. To answer the question “Can machines think?” Turing proposed to substitute a computer for the man or the woman. If, after the game, the player makes a wrong guess, then we can conclude that the machine has reached human levels of performance in communication and intelligence.

This test has been considered a real “thinking proof” for many years, though it is not exempt from criticism, as human intelligence is still hard to define (e.g., [99], [101]). Moreover, no machine has yet passed the test, although a public contest, The Loebner Prize in AI [75], has been held yearly since 1991, and computers and programs have greatly evolved since Turing posed his question. However, a number of systems have been inspired by this test, Eliza [135] being the most popular one. Weizenbaum's Eliza is a relatively simple pattern-matching program able to simulate the conversation of a therapist or psychoanalyst. For instance, given a sentence by a person, “I believe I do not love my mother,” the program might answer, “Please tell me more about your mother,” based on a rule that fires when the person writes the word “mother.” Conversational programs have evolved into commercial products that are used, for example, for Customer Relationship Management [128].
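
As a rough illustration of this kind of keyword-driven pattern matching, the following minimal Python sketch shows how such a rule might fire. It is not Weizenbaum's actual script; the rules and replies are invented for illustration only.

    import re

    # Minimal Eliza-style keyword rules (illustrative only, not Weizenbaum's script):
    # each rule fires when its pattern matches the user's sentence.
    RULES = [
        (re.compile(r"\bmother\b", re.IGNORECASE),
         "Please tell me more about your mother."),
        (re.compile(r"\bI believe (.+)", re.IGNORECASE),
         "Why do you believe {0}?"),
    ]

    def reply(sentence: str) -> str:
        for pattern, template in RULES:
            match = pattern.search(sentence)
            if match:
                return template.format(*match.groups())
        return "Please go on."  # default prompt when no rule fires

    print(reply("I believe I do not love my mother"))
    # -> Please tell me more about your mother.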

The first mention of ideas related to Automated Turing Tests seems to appear in an unpublished manuscript by Moni Naor [85], who in 1996 proposed a theoretical framework that would serve as the first approach to testing humanity by automated means. In Naor's humanity test, the human interrogator from the original Turing Test was replaced by a computer program. The original goal of his proposal was to present a scheme that would discourage software robots from misusing services originally intended to be used by humans only, much in the same sense of stopping an automation-based attack through human identification. Basically, he proposed adapting the way identification is handled in cryptographic settings to deal with this situation. In cryptography, when one party A wants to prove its identity to another party B, the process is a proof that A can effectively compute a (keyed) function that a different user (not having the key) cannot compute. The identification process consists of a challenge selected by B and the response computed by A. What would replace the keyed cryptographic function in the proposed setting is a task where humans excel, but at which machines have a hard time competing with the performance of a 3-year-old. By successfully performing such a task, the user proves that he or she is human.

In Naor's work, the collection of problems should possess the following properties:

  • 1. It is easy to generate many instances of the problem, together with their unambiguous solution, without requiring human intervention at all.

  • 2. Humans can solve a given instance effortlessly with very few errors. Providing the answer should also be easy.

  • 3. The best known programs for solving such problems fail on a nonnegligible fraction of the problems, even if the method of generating instances is known. The number of instances in a challenge will depend on this fraction.2 A short numeric illustration is given after this list.

  • 4. An instance specification is succinct both in the amount of communication needed to describe it and in the area it takes to present it to the user.
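
To illustrate property 3 with assumed numbers: if the best known solver answers a single instance correctly with probability p, then a challenge built from k independent instances is passed automatically with probability p raised to the power k, which is why the number of instances depends on that fraction.

    # Illustration of property 3 with assumed numbers (p and k are hypothetical).
    p = 0.2   # assumed per-instance success rate of the best automated solver
    k = 3     # assumed number of instances per challenge
    print(p ** k)   # ~0.008, i.e., roughly 1 automated pass in 125 attempts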

Naor suggested a number of areas from vision and natural language processing as possible candidates for such problems: gender recognition, facial expression understanding, finding body parts, deciding nudity, naive drawing understanding, handwriting understanding, speech recognition, filling in words, and disambiguation. As will be seen later in this chapter, many of his suggestions have been applied over the years to develop automated Turing Tests. Luis von Ahn et al. would later formalize and substantiate Naor's conceptual model in what would become known as CAPTCHA [129].

Major differences between the Turing test and Naor's proposal include:

  • In a Naor test, the judge is also a computer, and the goal is not to verify that the other party in the communication is a computer as proficient as a person, but to confirm that it is actually a person. That is the reason why these tests are often called reverse Turing tests.

  • In the Turing test, the communication is conversational. However, Naor's proposal involves a variety of sensory inputs.

  • In the Turing test, the conversation lasts until the player is able to take a (possibly wrong) decision. However, in these reverse Turing tests, the player has only one chance (posing a challenge to the user and screening the answer) to take the decision. The challenge must be difficult enough that only a person can solve it, but not so difficult that an average human user is unable to solve it.

The first known application of reverse Turing tests (named CAPTCHAs from now on) was developed by a technical team at the search engine AltaVista. In 1997, AltaVista sought ways to block or discourage the automatic submission of URLs to their search engine. This free “add-URL” service was important to AltaVista since it broadened its search coverage. Yet some users were abusing the service by automating the submission of large numbers of URLs, in an effort to skew AltaVista's importance ranking algorithms. Andrei Broder, Chief Scientist of AltaVista, and his colleagues developed a filter. Their method was to randomly generate an image of printed text so that machine vision (optical character recognition, OCR) systems cannot read it but humans still can. In January 2002, Broder stated that the system had been in use for “over a year” and had reduced the amount of “spam add-URL” by “over 95%.” A U.S. patent was issued in April 2001.

In September 2000, Udi Manber of Yahoo! described this “chat room problem” to researchers at Carnegie Mellon University (CMU): “bots” were joining online chat rooms and irritating the people there by pointing them to advertising sites. How could all bots be refused entry to chat rooms? CMU's Prof. Manuel Blum, Luis A. von Ahn, and John Langford developed a (hard) GIMPY CAPTCHA, which picked English words at random and rendered them as images of printed text under a wide variety of shape deformations and image occlusions, with the word images often overlapping. The user was asked to transcribe some number of the words correctly. A simplified version of GIMPY (EZ-GIMPY), using only one word-image at a time, was installed by Yahoo! and used in their chat rooms to restrict access to human users only. A sample of GIMPY CAPTCHA images is shown in Fig. 1.

Other researchers were stimulated by this particular problem, and the Principal Scientist of the Palo Alto Research Center, Henry Baird, led the development of PessimalPrint, a CAPTCHA that uses a model of document image degradations approximating ten aspects of the physics of machine-printing and imaging of text. This model included spatial sampling rate and error, affine spatial deformations, jitter, speckle, blurring, thresholding, and symbol size. Their paper [32] was the first refereed technical publication on CAPTCHAs.
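
As a rough sketch of how a few such degradations can be applied, the following Python fragment renders a word and applies rotation, blurring, speckle-like noise, and thresholding using Pillow and NumPy. It is an illustrative approximation, not the actual PessimalPrint generator, and all parameters are arbitrary.

    import random
    import numpy as np
    from PIL import Image, ImageDraw, ImageFilter, ImageFont

    def degraded_text_image(text: str) -> Image.Image:
        """Render text and apply a few PessimalPrint-like degradations
        (rotation, blurring, speckle, thresholding). Illustrative only."""
        img = Image.new("L", (60 * len(text), 100), color=255)  # white canvas
        draw = ImageDraw.Draw(img)
        draw.text((10, 30), text, fill=0, font=ImageFont.load_default())
        img = img.rotate(random.uniform(-5, 5), fillcolor=255)  # small spatial deformation
        img = img.filter(ImageFilter.GaussianBlur(radius=1.0))  # blurring
        pixels = np.array(img, dtype=np.float32)
        pixels += np.random.normal(0, 25, pixels.shape)         # speckle-like noise
        return Image.fromarray((pixels > 128).astype(np.uint8) * 255)  # thresholding

    degraded_text_image("captcha").save("challenge.png")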

These works and many others discussed in this chapter have been widely used by the World Wide Web-related industry, from search engines like Yahoo! or Google, to Web e-mail providers (Gmail, Hotmail, etc.), and ultimately, nearly all major Social Networking sites like Facebook or MySpace, to prevent automated subscription, posting, and, in general, automated misuse of their services, restricting them to human users. CAPTCHAs have evolved to become a commodity, as they are integrated as plugins in Content Management Systems (CMSs), for example, Drupal, or even offered as Web Services. A major instance of a reverse Turing test provided as a Web Service is reCAPTCHA [47] (shown in Fig. 2), started by the previously mentioned researcher Luis von Ahn. This service can be integrated into any Web page, typically to protect a Web form, and it has become very popular, serving more than 30 million instances of the challenge every day. This service is now provided by Google.

Moreover, the popularity of CAPTCHAs is so great that companies have emerged to take advantage of it by using them as a platform for serving ads to consumers [117], as shown in Fig. 3. A very interesting feature of CAPTCHA ads is that it is the user who types the brand message into the text field provided, so they effectively read and even learn the message.

On the opposite side, the popularization of CAPTCHA technologies has prompted a number of (quite often successful) attempts to circumvent these kinds of challenges. Attacks include automated bots able to solve the CAPTCHA itself, attacks on the programming structure of the form to avoid solving the CAPTCHA, and crowdsourcing. These attacks are further discussed in Section 5.1.

Section snippets

General Description of CAPTCHAs

The general goal of a CAPTCHA is to defend an application against the automated actions of undesirable and malicious bot programs on the Internet. To achieve this goal, CAPTCHAs are programs designed to generate and grade tests that most humans can pass, but current computer programs cannot. A CAPTCHA works as a simple two-round authentication protocol as follows [141]:

  • S(ervice) → C(lient): a CAPTCHA challenge

  • C → S: response
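
A minimal sketch of this two-round exchange is given below, assuming a text-based challenge and an in-memory store of pending answers; the function and variable names are illustrative and do not come from any particular CAPTCHA library.

    import secrets
    import string

    PENDING = {}  # challenge_id -> expected answer (in practice, a session or database)

    def issue_challenge() -> tuple[str, str]:
        """S -> C: generate a challenge and remember its solution server-side."""
        answer = "".join(secrets.choice(string.ascii_lowercase) for _ in range(6))
        challenge_id = secrets.token_hex(8)
        PENDING[challenge_id] = answer
        # In a real deployment the answer would be rendered as a distorted image
        # or audio clip; here it is returned as plain text to keep the sketch short.
        return challenge_id, answer

    def grade_response(challenge_id: str, response: str) -> bool:
        """C -> S: grade the response; each challenge can be used only once."""
        expected = PENDING.pop(challenge_id, None)
        return expected is not None and response.strip().lower() == expected

    cid, ans = issue_challenge()
    print(grade_response(cid, ans))  # True for a correct (human) answer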

Such challenges are based on hard, open AI problems [129]. It is important

Types of CAPTCHAs

There are many possible classifications of CAPTCHAs. For the sake of clarity, we divide them into two broad categories, represented in Fig. 8: visual and nonvisual. Within the first group, visual, there are in turn two main categories: OCR-based and image-based. Within the second group, nonvisual, there are again two main categories: audio-based and cognitive-based. In the following sections, the main proposals in each category are reviewed.

An important issue regarding this classification is

Evaluation of CAPTCHAs

CAPTCHAs have been proposed as a security mechanism to prevent automated systems from abusing services intended only for human users. So far, we have presented a number of CAPTCHA proposals that make use of a wide range of techniques for obscuring an underlying message (most often, a sequence of characters) to make its deciphering hard for bots, while trying to keep it solvable by humans. As a consequence, there is a clear trade-off between making things hard for bots while keeping them easy for

Security and Attacks on CAPTCHAs

CAPTCHAs are a mechanism to protect Web resources against massive automated access. Therefore, the security of a CAPTCHA solution relies not only on the hardness of the underlying AI problem, but also on the implementation and other details. As discussed in Section 2.2, relevant properties of CAPTCHA security include the ability to make robots fail, being based on an open algorithm, implementing a challenge able to resist attacks for many years, relying on truly random generators, and employing large

Alternatives to CAPTCHAs

In previous sections, we have presented a number of CAPTCHA technologies and their major pitfalls in terms of human efficiency and bot protection. Many CAPTCHAs nowadays pose usability problems that constitute a limiting barrier for persons with disabilities. Moreover, CAPTCHA technologies are being routinely broken, and there are even relatively cheap commercial services in the hacker underground that provide CAPTCHA solving [18].

In consequence, protection against abuse of Web-based services

Conclusions and Future Trends

In this chapter, we have presented a survey of CAPTCHAs, a widespread security mechanism for protecting Web forms and other services from automated abuse by hackers, spammers, etc. We have described their main characteristics, types, evaluation, and security aspects, along with some alternatives to them.

From this chapter, two main conclusions arise:

  • Some of the available CAPTCHAs, either in the research or in the commercial domains, are still hard to break. However,


References (147)

  • A. Basso et al.

    Preventing massive automated access to web resources

    Computers & Security

    (2009)
  • C.J. Hernandez-Castro et al.

    Pitfalls in CAPTCHA design and implementation: the Math CAPTCHA, a case study

    Computers & Security

    (2010)
  • Ads Captcha—Captcha Advertising, when ads meets captcha

  • OpenCMS, the Open Source Content Management System/CMS

  • Amazon Mechanical Turk

  • Jcap (Captcha Validation Javascript)

  • A. Elias et al.

    Enhanced captchas: using animation to tell humans and computers apart

  • Akismet—Stop Comment Spam and Trackback Spam

  • B. Paul et al.

    CAPTCHAs: The Good, the Bad, and the Ugly

  • B.-Y. Ricardo et al.

    Adversarial Information Retrieval in the Web

    UPGRADE (European Journal for the Informatics Professional)

    (2007)
  • H.S. Baird

    Document Image Defect Models and their Uses

  • H.S. Baird et al.

    Implicit captchas

  • H.S. Baird et al.

    Scattertype: a legible but hard-to-segment captcha

  • H.S. Baird et al.

    A highly legible captcha that resists segmentation attacks

  • H.S. Baird et al.

    Scattertype: a reading captcha resistant to segmentation attack

  • B. Richard et al.

    Towards human interactive proofs in the text-domain

  • J.P. Bigham et al.

    Evaluating existing audio captchas and an interface optimized for non-visual use

  • D. Bradbury

    Humans + porn = solved captcha

    Network Security

    (2007)
  • B. Elie et al.

    How good are humans at solving captchas? a large scale evaluation

  • C. John

    The Evolution of Malicious IRC Bots

  • captchas.net: Free CAPTCHA-Service

  • CMU Sphinx—Speech Recognition Toolkit

  • The Bongo CAPTCHA

  • The ESP-PIX CAPTCHA

  • A.A. Chandavale et al.

    Algorithm to break visual captcha

  • C. Kumar et al.

    Designing human friendly human interaction proofs (hips)

  • C. Kumar et al.

    Using machine learning to break visual human interaction proofs (hips)

  • K.-T. Chen et al.

    Identifying mmorpg bots: a traffic analysis approach

    EURASIP J. Adv. Signal Process

    (2009)
  • M. Chew et al.

    Baffletext: a human interactive proof

  • M. Chew et al.

    Image recognition CAPTCHAs

  • M. Chew et al.

    Collaborative filtering captchas

  • C. Richard et al.

    Making captchas clickable

  • A.L. Coates et al.

    Pessimal print: a reverse turing test

  • C. Jing-Song et al.

    A captcha implementation based on 3d animation

  • C. Jing-Song et al.

    A 3-layer dynamic captcha implementation

  • D. Dancho

    Inside India's CAPTCHA solving economy

  • D. Greg Kochanski et al.

    A reverse turing test using speech

  • R. Datta et al.

    Exploiting the human machine gap in image recognition for designing captchas

    Information Forensics and Security, IEEE Transactions on

    (2009)
  • A. Desai et al.

    Drag and drop: a better approach to captcha

  • Drupal—Open Source CMS

  • C. Dwork et al.

    On Memory-Bound Functions for Fighting Spam

  • J. Elson et al.

    Asirra: a captcha that exploits interest-aligned manual image categorization

  • F. Joseph et al.

    Turing Trade: a hybrid of a Turing test and a prediction market

  • R. Ferzli et al.

    A captcha based on the human visual systems masking characteristics

  • P.B. Godfrey

    Text-based captcha algorithms

  • P. Golle

    Machine learning attacks against the asirra captcha

  • P. Golle et al.

    Preventing bots from playing online games

    Comput. Entertain.

    (2005)
  • reCAPTCHA: Stop Spam, Read Books

  • G. Rich et al.

    What's up captcha?: a captcha based on image orientation

  • J. Grossman et al.

    Hacking Intranet Websites from the Outside—Javascript Malware Just Got a Lot More Dangerous


    José María Gómez Hidalgo holds a Ph.D. in Mathematics, and has been a lecturer and researcher at the Universidad Europea de Madrid for 10 years, where he has been the Head of the Department of Computer Science. Currently he is R&D and Training Director at the security firm Optenet. His main research interests include several Artificial Intelligence topics like Natural Language Processing and Machine Learning, with applications to Information Access, Adversarial Information Retrieval and Information Security including spam filtering and children protection in the Internet. He has taken part in around 15 research projects, leading some of them. José María has coauthored a number of research papers in the topics above. He is Program Committee member for several conferences and reviewer for a number of research journals. He has also reviewed research project proposals for the European Commission. Currently he serves as Chairman of the European Normalization Committee CEN/TC 365 “Internet Filtering.”

    Gonzalo Alvarez received his M.S. degree in Telecommunications Engineering from the University of the Basque Country (Spain) in 1995 and his Ph.D. degree in Computer Science from the Polytechnic University of Madrid (Spain) in 2000. He is a tenured scientist at the Spanish National Research Council (CSIC), at the Information Processing and Coding Department at the Applied Physics Institute. He is interested in cryptology and Internet security, fields in which he has authored over 400 articles in journals and conferences and many books. His scientific and dissemination experience is complemented with a deep practical knowledge about real world Internet security acquired through several projects both publicly and privately funded as security architecture designer, secure application developer, and security auditor.
