The first papers to use the World-Wide Web as a research tool were presented at the 1996 Society for Computers in Psychology conference in Chicago. Most of these papers were published the following May in Behavior Research Methods, Instruments & Computers, as the present journal was named until 2005. The presentations included papers examining how to do research on the web (e.g., Schmidt, 1997) and two papers that reported the results of experiments conducted on the web in 1995 (Krantz, Ballard, & Scher, 1997; Reips, 1996, 1997). Web-based research has thus been conducted for more than 20 years now (Reips, 2015), and it seems a good time to examine the growth and health of the use of the web as a research tool. (See also Wolfe (2017) in this issue for another perspective.)

A brief history

Before examining the current state of web research, it is worth briefly reviewing how the field got to this point. When the first researchers were using the web, they had no guide other than curiosity about this method. As such, these first researchers were careful and exploratory. The primary focus of a number of the early studies was to determine whether the web yielded reliable and valid results (e.g., Krantz, 1997; Krantz & Dalal, 2000) and whether it could help in solving some of the issues that limited laboratory research, such as low power, limited external/ecological validity, and low generalizability (Reips, 1996, 1997, 2000). Given the large number of online studies conducted today, it is something of an understatement to say that many of the early findings were encouraging. Still, these researchers faced issues that were thought to potentially affect the quality of the data. Among the issues studied were the high rates of drop-out among web participants, the possibility of data fraud, multiple submissions, differences in measurement, and the possibility that web samples differed from laboratory samples and, for some research purposes, from the population at large. These concerns led some of the early researchers to examine them systematically and, where possible, to find ways to ameliorate them. Many researchers collected both laboratory and web samples to compare the results (e.g., Krantz, 1997). In this way, differences between web and laboratory samples could be observed. Sometimes the laboratory and the web led to different results (see Krantz & Dalal, 2000, for a summary of some of these early studies), leading to what later became known as the (non-)equivalence debate (e.g., Buchanan, 2002, 2007). Ulf-Dietrich Reips (e.g., 1996, 1999) examined and developed techniques such as the high-hurdle and warm-up techniques to reduce dropout during the study (Frick, Bächtiger, & Reips, 1999). He also pioneered the multiple-site entry technique as a way to determine whether different ways of accessing the study, or different sampling, lead to different responses (Reips, 2000, 2002). Researchers also examined different methods to test for data fraud (Schmidt, 2007).

The growth of the use of the web was rapid (Reips, 2001). Data in Fig. 1 from three prominent sites that list online psychological studies show the rapid growth of psychological research in the early years of web research (Krantz, 1996; Reips, 1995; Reips & Lengler, 2005). Other evidence of the growth of web research as a legitimate method can be found in the writing of textbooks on the topic. The first textbook was by Birnbaum (2001), who developed an approach he termed the lowest common denominator (or “bare bones”): using the simplest techniques possible to minimize the barrier between participant and experiment. Particularly in the early days of web research, when bandwidth was more limited, using sophisticated methods could restrict the sample collected. The method used HTML, simple JavaScript, and CGI for data collection. The NSF and APA sponsored several advanced training institutes with Birnbaum, Göritz, Krantz, McClelland, McGraw, Reips, Schmidt, and Williams (materials at http://ati.fullerton.edu/ or, in a more recent version, at http://iscience.uni-konstanz.de/archive/reips/upto2005site/) to help scholars learn these techniques. After attending one of these training institutes, Fraley (2004) developed a text with a more advanced technical approach to developing online studies. However, this text still relied on CGI for data collection, a method many modern servers no longer support. Early edited books include Internet für Psychologen (Batinic, 1997), Online Research (Batinic, Werner, Gräf, & Bandilla, 1999), Psychological Experiments on the Internet (Birnbaum, 2000), Dimensions of Internet Science (Reips & Bosnjak, 2001), Online Social Sciences (Batinic, Reips, & Bosnjak, 2002), and The Oxford Handbook of Internet Psychology (Joinson, McKenna, Postmes, & Reips, 2007), which contains a section on Internet-based research. Moreover, in 2010, the APA published an advanced text on web research (edited by Gosling & Johnson, 2010). Notable are two early special journal issues, in Experimental Psychology, edited by Reips and Musch (2002), and in Social Science Computer Review, edited by Taylor (2002).
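To give a sense of what this lowest-common-denominator style looked like in practice, a study page could be little more than a static HTML form, a few lines of plain JavaScript for input checking, and a CGI script on the server that records the submitted values. The sketch below illustrates the general pattern only; the item wording, field names, and script path are hypothetical and are not taken from Birnbaum's materials.

```html
<!-- Minimal "bare bones" study page: a plain HTML form posted to a server-side
     CGI script. The script path (/cgi-bin/save_data.cgi) and field names are
     hypothetical placeholders for illustration. -->
<html>
<head>
  <title>Choice study</title>
  <script>
    // Simple client-side check in plain JavaScript: block submission
    // until the participant has answered the question.
    function checkForm(form) {
      if (form.elements["choice"].value === "") {
        alert("Please answer the question before submitting.");
        return false;
      }
      return true;
    }
  </script>
</head>
<body>
  <p>Which option do you prefer?</p>
  <form method="post" action="/cgi-bin/save_data.cgi"
        onsubmit="return checkForm(this);">
    <input type="radio" name="choice" value="A"> Option A<br>
    <input type="radio" name="choice" value="B"> Option B<br>
    <input type="submit" value="Submit">
  </form>
</body>
</html>
```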

Fig. 1 The rapid increase in the number of studies posted on two of the major sites for posting studies in the early years of web research. Data from the Web Experimental Psychology Lab (Reips, 1995) include data from his later site, the web experiment list (Reips & Lengler, 2005)

More recently, several articles have examined the use of crowd-sourcing as a method of collecting data. The most common approach has been to use Amazon’s Mechanical Turk (e.g., Buhrmester, Kwang, & Gosling, 2011; Chandler, Mueller, & Paolacci, 2014). Participants are called workers, as they are paid and might do any number of tasks, not just psychological studies. The idea is that these workers are better motivated to complete the studies since they are paid, usually small sums, for completion. Many studies find that data quality is comparable to studies posted on the open web (Buhrmester et al., 2011), but there are issues of non-naïveté among participants, as they often repeat similar types of studies (Chandler et al., 2014) and organize collectively, using forums to communicate about the tasks. Our own work finds that Turkers produce lower quality data than participants from other online sources: in a personality test development task, for example, their response times were faster and became increasingly so over the course of the task, and of 64 items with different means, Turkers scored closer to the middle of the scale on 50 (Reips, Buffardi, & Kuhlmann, 2011).

A picture of the current state

At present, the web is in vigorous use for research in psychology and related fields. Experienced researchers and students frequently use the web to conduct research. For example, Krantz (1996) posted links to over 500 studies last year. Many of the studies are being conducted by student researchers, both undergraduate and graduate.

However, there does not seem to be a coherent approach to educating these new researchers. Both Birnbaum (2001) and Fraley (2004) rely on methods of data communication that many modern servers no longer support, and neither book covers more recent methods for data communication such as AJAX and JSON. In addition, a quick review of textbooks for undergraduate research methods classes finds that none of them cover online methodologies (e.g., Lewandowski Jr., Ciarocco, & Strohmetz, 2016; Nestor & Schutt, 2015). Several cover specialized methods such as qualitative methodologies, case studies, and single-subject designs, but no mention is made of doing research online, let alone the specialized techniques appropriate for doing so. This lack of coverage stands in stark contrast to the number of undergraduates engaging in online research as part of their education. Both authors can attest to posting a large number of links to undergraduate research studies on their sites (Krantz, 1996; Reips & Lengler, 2005). This use of online research methodology by undergraduates has also been mentioned by colleagues (e.g., Mangan, personal communication, 17 November, 2016) and is evident from many departments licensing commercial online software and the many invitations to teach workshops and summer schools that the authors and other pioneers of Internet-based research have received and keep receiving. Given the frequency with which undergraduates post online studies, it seems that students are more likely to encounter online research methods than the important but less common methods that undergraduate textbooks do cover.
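For readers unfamiliar with these terms, the sketch below shows, under purely illustrative assumptions (the /api/responses endpoint and the payload fields are invented, not drawn from any of the cited texts), how a modern study page might transmit responses as JSON via an asynchronous request instead of a CGI form post.

```javascript
// Hypothetical sketch: submitting a participant's responses as JSON with the
// Fetch API, the modern successor to classic AJAX requests. The endpoint URL
// and payload fields are invented for illustration only.
const payload = {
  participantId: "p-0042",                 // assigned at the start of the session
  condition: "B",
  responses: [
    { item: "q1", answer: 5, rtMs: 1830 }, // answer plus response time in ms
    { item: "q2", answer: 2, rtMs: 2210 }
  ],
  submittedAt: new Date().toISOString()
};

fetch("/api/responses", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify(payload)            // serialize the data as JSON
})
  .then(res => {
    if (!res.ok) throw new Error("Server rejected the submission");
    window.location.href = "debriefing.html";  // continue to the debriefing page
  })
  .catch(err => {
    // Keep a local copy so the data are not lost if the network request fails.
    localStorage.setItem("pendingResponses", JSON.stringify(payload));
    console.error(err);
  });
```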

Anecdotal evidence from the authors points to some consequences of this lack of education in online research methods. In posting links to online studies, we have repeatedly encountered inadequately constructed studies. For example, it is not uncommon to see studies with titles that are full of demand characteristics. One author (JK) recently had a research supervisor e-mail him to correct a title for just this reason, a first. Titles are often needlessly long, written obviously for an academic audience, and do not communicate clearly to a general audience. Reips (2002) lists five common methodological and security issues he frequently observed in Internet-based experimenting: unprotected directories, public access to confidential data, revealing the experiment’s design and/or structure, ignoring the Internet’s technical variance, and, very frequently, improper use of form elements. For example, he states that about one-third of studies submitted at the time for inclusion in the Web Experimental Psychology Lab (Reips, 1995, 2001) or the web experiment list (Reips & Lengler, 2005) contained dysfunctional or biasing form elements, such as selection menus with a pre-selected content option that records the pre-selected value whenever a participant skips the item. Figure 2 from Reips (2010) shows several of these widespread mistakes as they appear in real examples from the web. Even more problematic, because ethically questionable, are studies that carelessly use materials originally intended for limited offline use (e.g., face picture databases), use deception, or address sensitive topics, all of which raise special issues when the researcher is not present with the participant. There are even studies submitted for posting that lack basics such as browser or smartphone compatibility, contact information, or informed consent for the participant.

Fig. 2 Example portion of an error-ridden web questionnaire showing several errors in the design and use of form elements that will inevitably lead to biased results. From Reips (2010), reprinted with kind permission from APA
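To make the form-element problem concrete, the fragment below contrasts a drop-down menu with a pre-selected answer option, which silently records that answer whenever the item is skipped, with a version whose default is a neutral "please choose" option that can be detected as a non-response. The item and option labels are invented for illustration and are not taken from Fig. 2.

```html
<!-- Biased version: "18-29" is the default, so skipping the item records "18-29". -->
<select name="age_group_biased">
  <option value="18-29">18-29</option>
  <option value="30-49">30-49</option>
  <option value="50+">50 or older</option>
</select>

<!-- Better version: a neutral, pre-selected prompt whose empty value can be
     flagged as a missing response during analysis. -->
<select name="age_group">
  <option value="" selected>-- please choose --</option>
  <option value="18-29">18-29</option>
  <option value="30-49">30-49</option>
  <option value="50+">50 or older</option>
</select>
```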

These problems could well be the result of the lack of education in online research methods mentioned above. However, as scholars, we are all aware of the problems of anecdotal evidence. Thus, a survey of current researchers using the web was conducted. The survey serves as an update of the survey by Musch and Reips (2000) of early experimental researchers on the web. Many of the questions were carried over, with some new questions added and a few modifications to account for changes since the original survey was conducted. While this survey asks some of the same questions as Gureckis et al. (2015), there are a few differences. The questions, while overlapping, are not the same. The present survey is more comparable to Musch and Reips and asks a broader array of questions. Moreover, the present survey examines those who have done online research, whereas Gureckis et al. also included participants who had not conducted an online study. Where there is overlap, the similarities or differences will be noted.

Method

Participants

In the Musch and Reips (2000) study, there were 21 researchers, mostly faculty, comprising most of the early experimenters who ran true experiments on the web. The participants were recruited electronically via online groups and pages and personal e-mails. Participants in the current survey were 71 researchers recruited via online posting of the study on Krantz (1996) and personal e-mails to researchers. These participants had varying levels of education. Almost half (46.5%) of the participants had a doctorate, 25.4% had some post-baccalaureate education, and 15.5% had a bachelor’s degree, while the rest were either in college or had some other form of qualification.

Materials

Most of the questions for the current survey were taken from Musch and Reips (2000), but the survey was limited to a single page, whereas the original survey spanned three pages. The questions were of varied format, including Likert, multiple selection, and open ended. Whereas the original study focused on the use of web experiments, the current study opened the questions to consider any form of Internet-based research on psychological topics. Most of the original researchers focused on experiments, but survey methodologies have become the dominant form of online psychological study. The questions addressed why responding researchers used online methodologies, their concerns about online methods, the technology they used, and their particular study. Some questions from the original study were not repeated here, as they focused on the early history of web experimenting and attitudes toward it. Additional questions were added to the current survey asking participants about the testing of their study and their knowledge of experimental methodology, online methodology, and the literature on online research. The current survey is still available at: http://psych2.hanover.edu/research/SeniorProjects/2016/Survey/

Procedure

The survey was administered online. The participant was first given an informed consent page. Clicking to continue to the survey was construed as consent to participate. No identifying information was collected about the participants at the time, nor was the survey hosted on any site that might independently collect such identifying information. The participant then proceeded to the survey and after submitting the data was directed to a debriefing page. As there was no deception, the debriefing merely reiterated the informed consent and gave contact information for the first author.

Results

One of the first sets of questions examined by Musch and Reips (2000) asked, “How important were the following factors for your decision to conduct your research on the web?” The questions were asked on a 7-point Likert scale scored from 0 to 6, with higher ratings indicating a more important reason. The anchors ranged from “Not important at all” to “Very important.” The same questions were asked in the current study, with an additional question to rate the ease of doing the study. Figure 3 shows the averages and 95% confidence intervals for the ratings on these questions. The dark bars are from the Musch and Reips (2000) study and the lighter bars are from the current participants. Overall, it seems that similar factors were important to the early and current researchers. Number of participants and statistical power are very important to both the original and current participants. Large sample size also turned up as important to almost all of the respondents in Gureckis et al. (2015). It is interesting to note that this desire for larger sample sizes does not seem to translate to overall greater statistical power (Wolfe, 2017). One apparent difference between the present results and those of Gureckis et al. is that almost all participants in that study reported fast data collection as important (the most often reported benefit there), while speed is important but not the most important item in either the present survey or Musch and Reips (2000). The difference may lie in the way participants responded: in Gureckis et al., the number of participants who selected each option was recorded, whereas in the present study and that by Musch and Reips, the participants rated the importance of speed of data collection. Together, these studies suggest that speed is important to almost all researchers but is perhaps rarely the most important criterion in doing a web study. The ability to replicate lab studies and to reach special populations remains less important. The lack of interest in reaching special populations is somewhat perplexing, as this is one of the unique abilities of online research to greatly extend the boundaries of psychological knowledge (Birnbaum, 1999; Mangan & Reips, 2007). The cost of the study is more important in the current sample. This factor was mentioned by about 75% of the respondents in Gureckis et al. (2015). This change may reflect the larger number of undergraduate students, changes in the ease of funding research, or a change in the population that does online research; a combination of reasons is also possible. The newer question about the ease of doing the study is also very highly rated. The current data suggest that cost and ease of study are the two most important factors in doing online research currently, though number of participants is nearly as highly rated.

The next set of questions from Musch and Reips (2000) asked researchers to rate a series of issues in response to the question, “How problematic do you think were the following potential problems in your study?” The same scale and anchors were used. The results from both Musch and Reips (2000) and the current sample are shown in Fig. 4. The pattern of concerns is very similar for both groups. Most of the concerns are rated at the midpoint of importance or below. Two issues that trend toward being more important in the current sample are manipulation/fraud and ethical problems, but these issues are still not seen as very important. It is possible that the original sample was more concerned with hardware issues, but that study focused on experiments, which might be more affected by hardware, and there was great variation over that concern in that small sample.

Fig. 3 Ratings by researchers of how important different reasons were for engaging in online research. The dark bars are from Musch and Reips (2000) and the light bars are from the current study. The overall pattern is similar, but it seems that cost has become much more important to current researchers. Ease of study is also important to these researchers

The next two sets of questions were asked only of the current sample. The first set deals with how researchers “determine the quality of your study design and instruments.” Rigorous testing of a study is particularly important on the web, since a participant might use a wide range of devices and be in a wide range of environments (Krantz & Dalal, 2000). The importance has only grown given the increased use of mobile devices, particularly phones, to run studies. For example, Reips (e.g., 2002, 2010) regularly emphasizes to undergraduates in his courses various stages of pre-testing of online study materials with different types of pre-testers (experimenter, experts, friends, a sample from the population to be sampled), because lack of pre-testing is one of the largest predictors of failure in Internet-based research. Figure 5 shows the percentage of respondents from the current sample who indicated that they used each of these testing methods; they could select any number of options. As can be seen, the most common way to test a study is to use “pre-existing and tested materials.” No other testing method reaches 50% of the participants. Two researchers indicated an “other” method of testing. In the comments section for this question, one indicated having a non-researcher run through the survey and provide comments, and the other indicated using the method of survey testing outlined by de Vaus (1996).

Fig. 4 Ratings by researchers of how problematic they believed different issues were. The plotting is the same as in Fig. 3. The pattern of concerns is similar for both samples. It seems that the current sample is more concerned about fraud and ethical issues than was the original sample of researchers

The final questions to be examined here concern the participants’ familiarity with research in general and with online research in particular. The participants rated their experience as a researcher, their experience as a web researcher, and their familiarity with the literature on web research on a visual analog scale (Reips & Funke, 2008). The anchors were “Novice,” scored 0, and “Highly Skilled,” scored 200. The responses to these questions are shown in Fig. 6. The bars indicate the mean responses, the error bars are standard deviations, and the dots are the individual responses. There are only 64 responses to these questions. As can be seen clearly from the graph, there is a wide range of responses to these questions, with the mean capturing little of the information. The lowest average rating is the self-reported knowledge of the literature on web research, but participants used nearly the full range of the scale on all three questions. While the range is still large for self-reported knowledge of the web research literature, the result does suggest that many researchers are not reading extensively before performing an online study. Bolstering this conclusion, there is a troublingly strong positive correlation between web research experience and familiarity with the web literature, r(62) = .75, p < .01. The least experienced web researchers do not seem to be spending the time reading the literature and learning its content before conducting their study. Taken together with the lack of coverage of web research methods in textbooks and the reliance on previously used materials, this correlation suggests a lack of preparation for doing online research among the least experienced researchers (Fig. 6).

Fig. 5 Percentage of participants indicating that they use any of the listed methods of testing their study

Fig. 6 Ratings of the participants’ experience as researchers, as web researchers, and with the literature on web research. The scale goes from 0 to 200, with higher ratings indicating greater experience

Discussion

The findings of the current survey indicate that, compared to Musch and Reips (2000), current researchers place greater weight on the low cost of doing an online study than before. The ease of doing a study is also highly rated. In addition, current participants trend toward being more concerned about fraud and ethical issues than the original participants, though this increase is tempered by the fact that they still do not indicate great concern with either issue. Researchers seem to rely on previously validated materials when testing their studies and indicate a wide range of familiarity with the literature on web research.

On the positive side, many researchers do take the time to gain experience with web research in particular and with the literature on doing web research. It is clear that some researchers are aware of the need to test and validate their particular studies.

However, there are several signs of concern. Beginning with the lack of coverage of web research in undergraduate textbooks, there is a string of issues suggesting that many web researchers do not approach an online study thinking about the unique issues raised by these methods. First, researchers seem primarily motivated by number of participants, cost, and ease when choosing the web as a research platform (Fig. 3). It is particularly noteworthy that the ability to access special populations is not a highly ranked reason for doing web-based research. These choices suggest that the principal motivation for doing online research is convenience rather than consideration of whether this method is the best for getting the answers sought. When these observations are combined with the modest level of concern about issues of online research, the limited testing of studies, and the fact that some researchers engage in online research with both little experience and little knowledge of web research methods, there is cause for concern that too much of the web research being conducted is done inadequately.

It would be interesting to see what happens as more researchers use the emerging technologies to help them develop online studies (e.g., de Leeuw, 2015; Gureckis et al., 2016; Lange, Kühn, & Filevich, 2015; Litman, Robinson, & Abberbock, 2016). Some of these tools are linked to crowd-sourcing, particularly Amazon Mechanical Turk (Gureckis et al., 2016; Litman et al., 2016), but others are designed to help researchers build studies for the web at large (de Leeuw, 2015; Lange et al., 2015). On the positive side, many of the best practices of web research can be incorporated into such tools, which reduces the need for researchers to know all of these best practices themselves. For example, Reips designed WEXTOR (http://wextor.eu) from the beginning to automatically guide and nudge study authors into using best practices (e.g., non-obvious file naming) when creating web experiments with the tool. However, it seems unwise to rely completely on the study development platform to take care of all the pertinent design principles. A useful feature of these technologies would be tutorials and queries that help researchers know what practices they ought to follow. For example, testing a study on multiple platforms is quite important; in experiments, this step can be vital (Krantz, 2001). A built-in query could ask whether the study has been tested on different platforms when it is about to be published, much like the confirmation prompt one receives when deleting a file. With so much of the literature available online these days, links to pertinent papers in the tool would be helpful as well. It should still be noted that these development tools do not help with one of the most persistent issues in all psychological research performed on computers: the reliance on consumer-grade equipment (Wolfe, 2017).
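As a minimal sketch of such a built-in query, and assuming a hypothetical study-builder in which publication is triggered from a web page (none of the names or checklist items below come from an existing platform), the check could be as simple as a short confirmation routine:

```javascript
// Hypothetical pre-publication checklist for a study-building tool. The
// function and checklist items are illustrative assumptions, not features
// of any existing platform.
const checklist = [
  "Has the study been tested on desktop, tablet, and phone browsers?",
  "Has a pilot participant from the target population completed the study?",
  "Are all form elements free of pre-selected answer options?",
  "Does the study include contact information and informed consent?"
];

function confirmBeforePublishing() {
  // Ask the study author to confirm each item, much like the prompt
  // shown before deleting a file.
  for (const item of checklist) {
    if (!window.confirm(item)) {
      alert("Please address this point before publishing:\n" + item);
      return false;               // block publication until resolved
    }
  }
  return true;                    // all checks confirmed; allow publication
}

// Example wiring to a hypothetical "Publish" button:
// document.getElementById("publish").addEventListener("click", (e) => {
//   if (!confirmBeforePublishing()) e.preventDefault();
// });
```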

Our survey has its limitations. For example, as a self-report instrument it cannot distinguish between what is claimed and what is true about behavior and knowledge. While many researchers reported that their materials had been validated, we do not know whether they were validated for use on the Internet; in fact, many researchers may not be aware of the literature showing that an instrument needs to be tested in the mode in which it will later be used, so online study materials need to be validated for online use (Buchanan, 2007; Buchanan, Johnson, & Goldberg, 2005).

Of course, the rapid development of Internet-based research methods, including the use of mobile devices for tracking over extended periods of time (e.g., Stieger, Lewetz, & Reips, manuscript submitted for publication) and novel ways of using these devices for non-reactive measurement, such as via the accelerometer (Kuhlmann, Reips, & Stieger, 2017), could make textbook authors leery of adding web-based methods while they are in constant flux. However, most students will not be using the most advanced methods, and the development of apps for such studies is beyond the ability of most students. Most students will be conducting either experiments or surveys over the web and as such would benefit from basic instruction in web-based research methods. Perhaps the best response to the data presented here is for faculty whose students conduct research on the web to contact the publishers of their research methods textbooks and ask them to add content on web-based research methods.