Abstract
Machine-learning algorithms hold promise for revolutionizing how educators and clinicians make decisions. However, researchers in behavior analysis have been slow to adopt this methodology to further develop their understanding of human behavior and improve the application of the science to problems of applied significance. One potential explanation for the scarcity of research is that machine learning is not typically taught as part of training programs in behavior analysis. This tutorial aims to address this barrier by promoting increased research using machine learning in behavior analysis. We present how to apply the random forest, support vector machine, stochastic gradient descent, and k-nearest neighbors algorithms on a small dataset to better identify parents of children with autism who would benefit from a behavior analytic interactive web training. These step-by-step applications should allow researchers to implement machine-learning algorithms with novel research questions and datasets.
Notes
These data are available at https://osf.io/yhk2p/.
There was no significant linear association between the features.
The last line of your Anaconda Prompt or Terminal screen should begin with (myenv). If it begins with (base), you have not activated your environment correctly.
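As an additional check, you can ask Python itself which environment it is running from; a minimal sketch (the environment name shown will depend on your own setup):

```python
import sys

# sys.prefix points to the root folder of the active Python environment.
# When a conda environment is active, its name appears at the end of this path.
print(sys.prefix)

# A rough way to extract the environment name from that path:
env_name = sys.prefix.replace("\\", "/").rstrip("/").split("/")[-1]
print(env_name)
```

If the printed name is not the environment you created, activate it before launching Python.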
Do not copy the line numbers (on the left). These numbers are meant to guide the reader through each code block. A line with no number indicates that the line is a continuation of the line above. It should also be noted that Python code is case sensitive.
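To illustrate the case sensitivity mentioned above, a minimal example (the variable names are arbitrary):

```python
# Python treats names that differ only in casing as distinct objects.
score = 10
Score = 20

print(score)   # the lowercase variable
print(Score)   # a separate, capitalized variable

# Referring to a casing that was never defined raises a NameError.
try:
    print(SCORE)
except NameError as error:
    print(error)
```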
For example: C:/Users/Bob/Documents/. If you copy the file location from the property menu of Windows Explorer, you need to replace the backslashes with forward slashes.
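If preferred, this conversion can also be done in Python itself; a minimal sketch using the example path from above:

```python
# A Windows path as copied from the Explorer property menu (backslashes).
# A double backslash is how a single backslash is written inside a
# normal Python string.
windows_path = "C:\\Users\\Bob\\Documents\\"

# Replace the backslashes with forward slashes before using the path.
python_path = windows_path.replace("\\", "/")
print(python_path)  # C:/Users/Bob/Documents/
```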
For those unfamiliar with matrices, we can call and manipulate specific locations in the matrix using a bracket [i, j], where i is the row number and j the column number. Python begins indexing (numbering of rows and columns) at 0 and the last value is excluded from ranges. Therefore, data_matrix[0, 1] refers to the first row (index = 0) and second column (i.e., index = 1). In the current example, data_matrix[:, 2:4] refers to all rows for the third and fourth columns of the .csv file (indices = 2 and 3).
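These indexing conventions can be verified with a small NumPy matrix; a minimal sketch (the values are arbitrary and chosen so that each one encodes its own row and column):

```python
import numpy as np

# A 3 x 4 matrix; the value at row i, column j is (i + 1) * 10 + j.
data_matrix = np.array([[10, 11, 12, 13],
                        [20, 21, 22, 23],
                        [30, 31, 32, 33]])

# First row (index 0), second column (index 1).
print(data_matrix[0, 1])      # 11

# All rows, third and fourth columns (indices 2 and 3);
# the end of the range (4) is excluded.
print(data_matrix[:, 2:4])
```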
Lines that are part of a loop (i.e., indented lines of code) must be indented, typically with a tab or four spaces. In our code block, the spaces at the beginning of the lines (i.e., following the numbers) represent this indentation. If you struggle with indentation or with running the code, we recommend consulting and using our ML_step-by-step.py file, freely available in the online repository.
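As a brief illustration of how indentation delimits a loop body (the variable names are arbitrary):

```python
# Every indented line below the for statement belongs to the loop body.
total = 0
for value in [2, 4, 6]:
    total = total + value  # indented: runs once per value
    print(total)           # indented: also inside the loop

print("Final total:", total)  # not indented: runs once, after the loop
```

Running this prints the running totals 2, 6, and 12, then the final total once the loop has finished.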
We did not include artificial neural networks because they require larger datasets than our current sample size.
Ethics declarations
Funding
This study was funded in part by a Graduate Scholarship from the Social Sciences and Humanities Research Council of Canada (SSHRC) to the first author and a salary award from the Fonds de recherche du Québec - Santé (#269462) to the second author.
Ethical Approval
All procedures performed in this study were in accordance with the ethical standards of the Canadian Tri-Council Policy Statement: Ethical Conduct for Research Involving Humans and with the 1964 Helsinki declaration and its later amendments.
Informed Consent
Parents provided informed consent for them and their child.
Conflict of Interest
The authors declare that they have no conflict of interest.
Availability of Code and Data
The code and data are freely available at https://osf.io/yhk2p/.
Additional information
This article was written in partial fulfillment of the requirements for the PhD degree in Psychoeducation at the Université de Montréal by Stéphanie Turgeon.
Appendix
Free Online Resources
Learn More About Python
Learn Python—https://www.learnpython.org/
Google's Python Class—https://developers.google.com/edu/python
Python for Beginners—https://www.python.org/about/gettingstarted/
Learn More About Machine Learning
An Introduction to Machine Learning—https://www.digitalocean.com/community/tutorials/an-introduction-to-machine-learning
Google’s Introduction to Machine Learning—https://developers.google.com/machine-learning/crash-course/ml-intro
Introduction to Machine Learning for Beginners—https://towardsdatascience.com/introduction-to-machine-learning-for-beginners-eed6024fdb08
Learn More About Machine Learning in Python
Cross Validation in Python: Everything You Need to Know About—https://www.upgrad.com/blog/cross-validation-in-python/
An Implementation and Explanation of the Random Forest in Python—https://towardsdatascience.com/an-implementation-and-explanation-of-the-random-forest-in-python-77bf308a9b76
Implementing SVM and Kernel SVM with Python's Scikit-Learn—https://stackabuse.com/implementing-svm-and-kernel-svm-with-pythons-scikit-learn/
How To Implement Logistic Regression From Scratch in Python—https://machinelearningmastery.com/implement-logistic-regression-stochastic-gradient-descent-scratch-python/
Develop k-Nearest Neighbors in Python From Scratch—https://machinelearningmastery.com/tutorial-to-implement-k-nearest-neighbors-in-python-from-scratch/
Hyperparameter Tuning—https://towardsdatascience.com/hyperparameter-tuning-c5619e7e6624
scikit-learn: 3.2. Tuning the Hyper-Parameters of an Estimator—https://scikit-learn.org/stable/modules/grid_search.html
About this article
Cite this article
Turgeon, S., Lanovaz, M.J. Tutorial: Applying Machine Learning in Behavioral Research. Perspect Behav Sci 43, 697–723 (2020). https://doi.org/10.1007/s40614-020-00270-y