“Interactive Machine Learning in High-Expertise Domains”
National Institutes of Health
Machine learning and data mining methods have emerged as cornerstone technologies for transforming the deluge of data generated by modern society into actionable intelligence. For applications ranging from business intelligence to public policy to clinical guidelines, the overarching goal of “big data” analytics is to identify, analyze, and summarize the available evidence to support decision makers. While ubiquitous computing has greatly simplified data collection, successful deployment of machine learning techniques is also generally predicated on obtaining sufficient quantities of human-supplied annotations. Accordingly, judicious use of human effort in these settings is crucial to building high-performance systems in a cost-effective manner.
In this talk, I will describe machine learning methods for reducing annotation costs and improving system performance via interactive protocols. Specifically, I will present models capable of exploiting domain-expert knowledge through the use of labeled features -- both within the active learning framework, to explicitly reduce the need for labeled data during training, and in the more general setting of improving classifier performance in high-expertise domains. Furthermore, I will contextualize this work within the scientific systematic review process, highlighting the importance of interactive learning protocols in a scenario where information must be reliably extracted from multiple sources, synthesized into a cohesive report, and updated as new evidence becomes available in the scientific literature. I will demonstrate that we can partially automate many aspects of this important task, thus reducing the costs incurred when interacting with highly trained experts.
Kevin Small received his Ph.D. degree in computer science from the University of Illinois at Urbana-Champaign (Cognitive Computation Group) in 2009. From 2009 to 2012, he held positions as a postdoctoral researcher at Tufts University (Machine Learning Group) and as a research scientist at Tufts Medical Center (Center for Evidence-based Medicine). He is presently developing methods for using data-driven techniques to inform scientific policy within the Division of Program Coordination, Planning, and Strategic Initiatives at the National Institutes of Health. Kevin’s primary research interests are in the areas of machine learning, data mining, natural language processing, and artificial intelligence. Specifically, his research results concern using interactive learning protocols to improve the performance of machine learning algorithms while reducing sample complexity.