Silent Speech Recognition

  • A participant using the silent speech interface
  • Design of the silent speech interface
  • Researchers working on a movement-driven synthesizer to help patients who have difficulty speaking

Project Overview

People who have undergone laryngectomy (surgical removal of the larynx, typically due to cancer) or who have motor speech impairment (e.g., due to neurological disease) often struggle to communicate orally. Few treatment options are available to improve the quality of their speech.

The purpose of the Silent Speech Recognition project is to develop an articulatory movement-driven speech synthesizer that enables these patients to speak using their tongue and lips, rather than by typing on an AAC device, which is limited by slow manual input. The current focus of the project is to develop algorithms that convert articulatory movement time-series data to text with high speed and accuracy. In the future, these algorithms will be integrated with a portable data collection device and a text-to-speech synthesizer.
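To make the movement-to-text step concrete, the toy sketch below recognizes a phrase from a 1-D articulatory movement trace using a dynamic-time-warping (DTW) nearest-template classifier. The template phrases, sensor values, and the DTW approach itself are illustrative assumptions for this example only, not the project's published algorithms.

```python
# Hypothetical sketch of the movement-to-text step: classify an
# articulatory movement time series (e.g., a tongue-sensor trajectory)
# by finding the closest stored template phrase under DTW distance.

def dtw_distance(a, b):
    """Dynamic-time-warping distance between two 1-D time series."""
    n, m = len(a), len(b)
    inf = float("inf")
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible alignments.
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]

def recognize(sample, templates):
    """Return the text label whose template trace is closest to the sample."""
    return min(templates, key=lambda label: dtw_distance(sample, templates[label]))

# Toy templates: one movement trace per phrase (values are made up).
templates = {
    "hello": [0.0, 0.5, 1.0, 0.5, 0.0],
    "thank you": [0.0, 1.0, 0.0, 1.0, 0.0],
}
print(recognize([0.1, 0.6, 0.9, 0.4, 0.1], templates))  # prints "hello"
```

In practice the project's recognizers operate on multidimensional sensor streams and much larger phrase sets, but the structure is the same: compare an incoming movement trace against learned models of known utterances and emit the best-matching text.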

Stage of Development

This project is currently in the pilot stage of development. Nine core technical papers and three related papers have been published, based on data collected from healthy speakers. Data collection from speakers with motor speech impairment, for the next stage of development, will begin in the near future at the UT Dallas Communication Technology Center. This project is partly supported by an NIH R01 grant. An R03 grant application has recently been submitted to the National Institute on Deafness and Other Communication Disorders to fully support this project.

Development Team

The Silent Speech Recognition project is a collaboration among the Callier Center for Communication Disorders and the Department of Computer Science at UT Dallas; the Department of Computer Science and Engineering at the University of Nebraska-Lincoln; and Massachusetts General Hospital in Boston. The team of researchers includes:

Jun Wang, PhD

Wang is an Assistant Professor of Biomedical Engineering and Communication Sciences and Disorders at UT Dallas. He earned his PhD in computer science, with a specialty in speech production, from the University of Nebraska-Lincoln in December 2011. He was a post-doctoral research associate at the Neurogenic Communication Disorders Consortium, University of Nebraska-Lincoln and University of Nebraska Medical Center, before joining UT Dallas as a Research Scientist in Fall 2012. His research focuses on silent speech recognition and interfaces, normal and disordered speech production and recognition, and articulation-to-speech synthesis.

Jordan R. Green, PhD

Green is a professor in the Department of Communication Sciences and Disorders, Institute of Health Professions, Massachusetts General Hospital (MGH), Boston, MA. His research focuses on normal and disordered speech production, including motor speech impairment (e.g., due to amyotrophic lateral sclerosis), speech motor learning, speech development, and early chewing development.

Ashok Samal, PhD

Samal is a professor in the Department of Computer Science and Engineering, University of Nebraska-Lincoln. His research focuses on data mining, spatial data analysis, and image analysis. He has applied machine learning and data-mining techniques to a range of fields, including speech communication, geographic information systems, crime analysis, and the biosciences.

Balakrishnan Prabhakaran, PhD

Prabhakaran is a professor of computer science at UT Dallas, specializing in multimedia systems. His current research focuses on health-care data and video analytics; streaming of 3D video, animations, and deformable 3D models; content protection and authentication of multimedia objects; and collaborative virtual environments. He has previously worked on multimedia databases, authoring and presentation, resource management, and scalable web-based multimedia presentation servers.