Pitches of concurrent vowels


When two vowels with different fundamental frequencies (F0's) are presented concurrently, listeners can report the phonemic identities of both vowels with an accuracy significantly better than chance, and they can often determine which vowel has the higher pitch. By contrast, when the difference in F0 is small or absent, these 'double vowels' evoke a single pitch and are identified less accurately. Similarly, when the duration of the vowels is brief, listeners are more likely to misidentify the vowels and/or rank their pitches incorrectly. The present study used a matching paradigm to obtain judgments of the pitches perceived in double vowels. Four experienced listeners adjusted the F0 of a tone complex with 25 harmonics of equal amplitude to assign two pitch matches to double vowels whose durations were either 200 or 50 ms. The reference stimuli were pairwise combinations of the vowels /i/, /a/, /u/, /æ/, and /ɜ/. One F0 was 100 Hz; the other was 0, 0.25, 0.5, 1, 2, or 4 semitones higher.
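The F0 values and the matching tone described above can be sketched in a few lines of Python. This is only an illustration: the function names, the 20-kHz sample rate, and the sine-phase harmonics are assumptions, not details taken from the study.

```python
import math

BASE_F0_HZ = 100.0                      # lower F0 stated in the abstract
SEMITONE_STEPS = [0, 0.25, 0.5, 1, 2, 4]  # F0 separations stated in the abstract


def shift_semitones(f0_hz, semitones):
    """Return f0_hz raised by the given number of equal-tempered semitones."""
    return f0_hz * 2.0 ** (semitones / 12.0)


def harmonic_complex(f0_hz, duration_s, sample_rate_hz=20000, n_harmonics=25):
    """Sum n_harmonics equal-amplitude sine harmonics of f0_hz.

    The sample rate and the sine starting phase are assumptions made
    for this sketch only.
    """
    n_samples = int(round(duration_s * sample_rate_hz))
    return [
        sum(
            math.sin(2.0 * math.pi * k * f0_hz * n / sample_rate_hz)
            for k in range(1, n_harmonics + 1)
        )
        for n in range(n_samples)
    ]


# F0 of the higher vowel for each separation, in Hz.
higher_f0s = [shift_semitones(BASE_F0_HZ, s) for s in SEMITONE_STEPS]

# A 50-ms matching tone at the lower F0.
tone = harmonic_complex(BASE_F0_HZ, duration_s=0.05)
```

With these assumptions, the largest separation of 4 semitones places the higher F0 near 126 Hz, so the 25th harmonic stays well below the Nyquist frequency of the assumed sample rate.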

Aggregate histograms of matched F0 were created for each double vowel by pooling data across subjects and trials. When the F0 difference was small (one-half semitone or less), the histograms usually contained a single mode in the frequency region spanned by the two F0's, the majority of matches clustered around the mean of the two F0's, and listeners generally reported only one pitch. As the F0 difference increased from 1 to 4 semitones, the histograms more often displayed two distinct modes close to the F0's of the constituent vowels. With larger F0 differences, 50-ms stimuli generated more variable matching histograms than 200-ms stimuli. With 200-ms stimuli, listeners were more likely to report two pitches and their matches were more often clustered around the two F0's. However, consistent evidence of bimodality emerged only for 200-ms stimuli with the largest F0 separation of 4 semitones. Listeners generally matched the F0 of one vowel more precisely and consistently than the other, and they generally matched the F0's of 200-ms stimuli more accurately than those of 50-ms stimuli. These results indicate that the pitches of concurrent vowels emerge less clearly when the stimuli are brief.
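The pooling step can be illustrated with a small Python sketch. The match values below are made up for illustration and are not data from the study; the 1-Hz bin width, the mode criterion, and the function names are likewise assumptions.

```python
from collections import Counter


def pooled_histogram(matches_by_listener, bin_width_hz=1.0):
    """Pool matched F0s across listeners and trials into fixed-width bins.

    Returns a Counter keyed by the lower edge of each bin in Hz.
    """
    counts = Counter()
    for matches in matches_by_listener.values():
        for f0_hz in matches:
            bin_edge = int(f0_hz // bin_width_hz) * bin_width_hz
            counts[bin_edge] += 1
    return counts


def find_modes(counts, bin_width_hz=1.0, min_count=2):
    """Return bin edges that are local maxima with at least min_count matches.

    A bin counts as a mode if it exceeds its left neighbour and is not
    exceeded by its right neighbour (a simple, assumed criterion).
    """
    modes = []
    for edge, count in sorted(counts.items()):
        left = counts.get(edge - bin_width_hz, 0)
        right = counts.get(edge + bin_width_hz, 0)
        if count >= min_count and count > left and count >= right:
            modes.append(edge)
    return modes


# Hypothetical matches (Hz) for one double vowel with F0s near 100 and 126 Hz.
example_matches = {
    "listener_1": [100.2, 100.7, 125.5, 126.1],
    "listener_2": [100.4, 126.3, 126.8, 99.6],
}

histogram = pooled_histogram(example_matches)
modes = find_modes(histogram)
```

For this made-up data set the histogram has two modes, one in the bin starting at 100 Hz and one in the bin starting at 126 Hz, which is the bimodal pattern described for large F0 separations.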

The stimuli were analyzed by a model of pitch perception (Meddis & Hewitt, J. Acoust. Soc. Am. 89, 2866-2882, 1991) that applies a 64-channel auditory filterbank, compresses the waveform in each channel using a model of hair-cell transduction, applies a form of autocorrelation analysis, and performs cross-channel summation to generate a 'pooled autocorrelogram' that can be compared directly with the matching histograms. The model accounts for several aspects of the perceptual data, including the location and number of prominent modes in many of the matching histograms.
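The last two stages, per-channel autocorrelation followed by cross-channel summation, can be sketched as follows. This is a heavily simplified illustration and not the Meddis and Hewitt implementation: the auditory filterbank is replaced by two hand-built channels, hair-cell transduction is replaced by half-wave rectification, and all names and parameter values are assumptions.

```python
import math

SAMPLE_RATE_HZ = 10000  # assumed for this sketch


def rectified_channel(f0_hz, harmonics, n_samples):
    """Stand-in for one filterbank channel: a few harmonics, half-wave rectified."""
    signal = []
    for n in range(n_samples):
        value = sum(
            math.sin(2.0 * math.pi * k * f0_hz * n / SAMPLE_RATE_HZ)
            for k in harmonics
        )
        signal.append(max(value, 0.0))  # crude substitute for a hair-cell model
    return signal


def autocorrelation(signal, lag, window):
    """Autocorrelation at one lag over a fixed window of samples."""
    return sum(signal[n] * signal[n + lag] for n in range(window))


def pooled_autocorrelation(channels, lags, window):
    """Sum the per-channel autocorrelation functions across channels."""
    return {
        lag: sum(autocorrelation(ch, lag, window) for ch in channels)
        for lag in lags
    }


# Two assumed channels carrying low and higher harmonics of a 100-Hz vowel-like tone.
n_samples = 500  # 50 ms at the assumed sample rate
channels = [
    rectified_channel(100.0, harmonics=[1, 2], n_samples=n_samples),
    rectified_channel(100.0, harmonics=[3, 4], n_samples=n_samples),
]

# Search lags of 25..150 samples (400 Hz down to about 67 Hz), 30-ms window.
lags = range(25, 151)
pooled = pooled_autocorrelation(channels, lags, window=300)

best_lag = max(lags, key=lambda lag: pooled[lag])
estimated_f0_hz = SAMPLE_RATE_HZ / best_lag
```

In this sketch the largest peak of the pooled function falls at a lag of one period of the 100-Hz input. The positions of such peaks along the lag axis are what would be compared with the modes of a matching histogram.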

Assmann, P.F. and Paschall, D.D. (1993). Perception of concurrent vowels: Pitch judgments. Abstracts of the Sixteenth Midwinter Research Meeting of the Association for Research in Otolaryngology, p. 258 (poster presentation).

Assmann, P.F. and Paschall, D.D. (1994). Pitches of concurrent vowels. Journal of the Acoustical Society of America 95, 4pPP16 (A). (poster presentation).

Assmann, P.F. and Paschall, D.D. (1998). Pitches of concurrent vowels. J. Acoust. Soc. Am. 103, 1150-1160.

Paschall, D.D. and Assmann, P.F. (1998). Ranking the pitches of concurrent vowels. Proc. 16th Int. Cong. Acoust. 3, 2009-2010.
 
