Research Areas


Research subjects

Research in the Speech Disorders & Technology Lab (SDTL) is highly interdisciplinary, focusing on speech motor control and disorders and on relevant assistive technology. These topics span several disciplines, including speech science, speech disorders, neuroscience, computer science, biomedical engineering, and electrical and computer engineering. The lab's research topics include, but are not limited to, the following:

  • Assistive speech technologies, including silent speech recognition/interfaces, dysarthric speech recognition and analysis, and speech synthesis
  • Motor speech disorders (e.g., due to amyotrophic lateral sclerosis or ALS)
  • Computational neuroscience/neurotechnology for speech

Please contact Dr. Wang if you are interested in participating as a subject in a tongue motion study or any other study in the lab. Compensation is provided in either cash or credit.


 

Media

UT Dallas News Center

UT Dallas News Center article features the lab's research on silent speech recognition/interfaces, May 2018.


Nature

Nature Outlook (Comments and Opinion) article features the ALS diagnosis research in the lab, October 2017.


UT Dallas News Center

UT Dallas News Center article features the lab PI (Dr. Jun Wang) and colleagues for receiving a UT System grant to investigate the brain, September 2015.


 

Research Demos

Integrating Articulatory Information in Deep-learning Text-to-Speech Synthesis

Articulatory information has been shown to improve the performance of hidden Markov model (HMM)-based text-to-speech (TTS) synthesis. Recently, deep learning-based TTS has been demonstrated to outperform HMM-based approaches. This work investigated integrating articulatory information into deep learning-based TTS. The integration was achieved in two ways: (1) direct integration, and (2) direct integration plus a forward-mapping network, where the output articulatory features were mapped to acoustic features by an additional DNN. Experimental results show that adding articulatory information significantly improved performance (picture adapted from Cao et al., 2017).

Articulatory Information in Deep-learning Text-to-Speech Synthesis
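For readers unfamiliar with the two integration strategies, the sketch below gives one possible reading of them in PyTorch. It is not the model from Cao et al. (2017); the layer sizes, feature dimensions, and the way the forward-mapping output is combined with the direct path are my assumptions, for illustration only.

    # Illustrative sketch only: dimensions and the combination of the two
    # pathways are assumptions, not the published configuration.
    import torch
    import torch.nn as nn

    class DirectIntegrationTTS(nn.Module):
        """(1) Direct integration: linguistic and articulatory features are
        concatenated and mapped to acoustic features by a single DNN."""
        def __init__(self, n_linguistic=300, n_articulatory=18, n_acoustic=60):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(n_linguistic + n_articulatory, 512), nn.ReLU(),
                nn.Linear(512, 512), nn.ReLU(),
                nn.Linear(512, n_acoustic),
            )

        def forward(self, linguistic, articulatory):
            return self.net(torch.cat([linguistic, articulatory], dim=-1))

    class ForwardMappingTTS(nn.Module):
        """(2) Direct integration plus a forward-mapping network: an additional
        DNN maps articulatory features to acoustic features, and its output is
        combined (here, simply added) with the direct-integration prediction."""
        def __init__(self, n_linguistic=300, n_articulatory=18, n_acoustic=60):
            super().__init__()
            self.direct = DirectIntegrationTTS(n_linguistic, n_articulatory, n_acoustic)
            self.forward_map = nn.Sequential(   # articulatory -> acoustic DNN
                nn.Linear(n_articulatory, 256), nn.ReLU(),
                nn.Linear(256, n_acoustic),
            )

        def forward(self, linguistic, articulatory):
            return self.direct(linguistic, articulatory) + self.forward_map(articulatory)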

 


Speaker-Independent Silent Speech Recognition with Across-Speaker Articulatory Normalization

The physiological variation in tongue size across speakers has been a barrier to developing speaker-independent silent speech recognition from articulatory movements. In this approach, we use Procrustes matching (or bi-dimensional scaling) to translate and rotate each individual speaker's articulatory data, for example, rotating all speakers' data so that the upper lip (UL) and lower lip (LL) form a vertical line (TT denotes the tongue tip and TB the tongue body back). Experimental results show the effectiveness of this approach, alone or combined with other data-driven normalization approaches (Hahm et al., SLPAT 2015; Wang et al., Interspeech 2014, 2015).

Speaker-independent silent speech recognition with across-speaker articulatory normalization
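The sketch below illustrates the translation and rotation steps described above, assuming 2D sensor positions in the midsagittal plane; the sensor names and frame layout are illustrative, not the lab's data format.

    # Illustrative sketch of the translation + rotation steps of Procrustes matching.
    import numpy as np

    def normalize_frame(frame):
        """frame: dict of sensor name -> (x, y) position for one time sample,
        with sensors 'UL' (upper lip), 'LL' (lower lip), 'TT' (tongue tip),
        and 'TB' (tongue body back)."""
        pts = {k: np.asarray(v, dtype=float) for k, v in frame.items()}

        # 1. Translate so the midpoint of the upper and lower lips is the origin.
        origin = (pts['UL'] + pts['LL']) / 2.0
        pts = {k: p - origin for k, p in pts.items()}

        # 2. Rotate so the UL-LL segment is vertical (aligned with the y-axis).
        d = pts['UL'] - pts['LL']
        theta = np.arctan2(d[0], d[1])          # angle of UL-LL relative to vertical
        c, s = np.cos(theta), np.sin(theta)
        R = np.array([[c, -s], [s, c]])         # counterclockwise rotation by theta
        return {k: R @ p for k, p in pts.items()}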

 


 

DJ and his Friend: A Demo of Conversation Using the Real-Time Silent Speech Interface

This demo shows how the silent speech interface is used in a daily conversation. DJ (the user) is using the silent speech interface to communicate with his friend (not shown on the screen). DJ is mouthing (i.e., speaking without producing any voice), and the silent speech interface displays the recognized text on the screen and produces synthesized speech (a female voice) (Wang et al., SLPAT 2014). (See Demo 2 with Leslie.)

 


 

Demo of Algorithm for Word Recognition from Continuous Articulatory Movements

In the demo below, the top panel plots the input (x, y, and z coordinates of sensors attached to the tongue and lips); the bottom panel shows the predicted sounds (timing in red) and the actual sounds (timing in blue). This algorithm conducts segmentation (detection of the onsets and offsets of the words) and recognition simultaneously from the continuous tongue and lip movements (Wang et al., Interspeech 2012; SLPAT 2013).
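The sketch below is not the published algorithm; it is a toy illustration of what simultaneous segmentation and recognition means here: each frame of the sensor stream receives a label (including a rest class), and runs of identical labels directly yield word hypotheses with onset and offset times. The function name and frame rate are assumptions.

    # Toy illustration only, not the algorithm from Wang et al.
    from itertools import groupby

    def decode(frame_labels, frame_rate_hz=100.0, rest_label='rest'):
        """frame_labels: per-frame word labels predicted from the (x, y, z) sensor
        stream by any frame classifier. Returns (word, onset_s, offset_s) tuples."""
        hypotheses = []
        t = 0
        for label, run in groupby(frame_labels):
            n = len(list(run))
            if label != rest_label:
                hypotheses.append((label, t / frame_rate_hz, (t + n) / frame_rate_hz))
            t += n
        return hypotheses

    # Example: 'hello' detected from frame 2 to frame 5 (0.02 s to 0.05 s at 100 Hz).
    print(decode(['rest', 'rest', 'hello', 'hello', 'hello', 'rest']))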

 


 

Demo of Algorithm for Sentence Recognition from Continuous Articulatory Movements

In the demo below, the top panel plots the input (x, y, and z coordinates of sensors attached to the tongue and lips); the bottom panel shows the predicted sounds (timing in red) and the actual sounds (timing in blue). This algorithm conducts segmentation (detection of the onsets and offsets of the sentences) and recognition simultaneously from the continuous tongue and lip movements (Wang et al., ICASSP 2012).

 


 

Articulation-to-Speech Synthesis

The participant is mouthing three corner vowels /a/, /i/, and /u/ (without producing any voice); a computer behind him is actually producing the synthesized sounds.

 


 

Quantitative Articulatory Vowel Space

The left part of the graphic shows the quantitative articulatory vowel space I derived from more than 1,500 vowel samples of tongue and lip movements collected from ten speakers; it resembles the long-standing descriptive articulatory vowel space (right part). I am now investigating the scientific and clinical applications of the quantitative articulatory vowel space (Wang et al., Interspeech 2011; JSLHR 2013).

Quantitative articulatory vowel space
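As a rough illustration of how such a space can be derived, the sketch below projects high-dimensional articulatory samples onto two dimensions with PCA. The published space (Wang et al., JSLHR 2013) may use a different projection; PCA and the data layout here are my assumptions.

    # Hedged sketch: one way to derive a 2D articulatory vowel space from sensor data.
    import numpy as np

    def vowel_space(samples, labels):
        """samples: (n_samples, n_features) array, each row the concatenated tongue/lip
        sensor coordinates for one vowel token; labels: vowel symbol per row.
        Returns {vowel: mean 2D position} in the first two principal components."""
        X = samples - samples.mean(axis=0)
        # Project onto the first two principal components (SVD-based PCA).
        _, _, Vt = np.linalg.svd(X, full_matrices=False)
        coords = X @ Vt[:2].T
        return {v: coords[np.asarray(labels) == v].mean(axis=0) for v in set(labels)}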

 


 

Articulatory Consonant Space

Using the same approach, articulatory consonant spaces were derived from about 2,100 consonant samples of tongue and lip movements collected from ten speakers; see the figure below (2D on the left and 3D on the right). Both consonant spaces are consistent with the descriptive articulatory features that distinguish consonants (particularly place of articulation). Another interesting finding is that a third dimension is not necessary for the articulatory vowel space but is very useful for the consonant space. I am now investigating the scientific and clinical applications of the articulatory consonant space as well (Wang et al., JSLHR 2013).

Articulatory consonant space

 


 

Amyotrophic Lateral Sclerosis (ALS) Project

Amyotrophic lateral sclerosis (ALS) project

I am part of the comprehensive assessment of bulbar dysfunction in ALS study. My own work focuses on the articulatory subsystem of the bulbar system in ALS (Green et al., ALSFD 2013).

 


 

Opti-speech: A Real-Time, 3D Tongue Motion Feedback System for Speech Training

I was involved in the early stage of the Opti-speech project, which is currently in a Phase II clinical trial. The goal was to develop a system that provides real-time feedback of tongue motion during speech for training and therapy purposes. In the demo below, the user is pronouncing some basic English sounds while seeing her tongue motion in real time (Katz et al., Interspeech 2014).