Frequency-shifted voice demo

Figure: F0 x FF scatterplot


The graph is a scatterplot of the geometric mean of the first three formant frequencies (F1, F2, F3) plotted against voice fundamental frequency (F0) for a sample of more than 3000 vowels recorded from 10 adult males, 10 adult females, and three groups of children aged 3, 5, and 7 years (Assmann & Katz, 2000, 2005). The graph image is divided into a 25 x 25 grid, yielding 625 combinations of F0 and formant frequency (FF). Clicking on a point in the grid plays a synthesized version of a sentence originally spoken by an adult male, frequency-shifted to have the selected F0 and FF values.

Each point sounds like a slightly different voice. Voices selected from the left side of the graph are lower in pitch than those on the right. Voices selected from the bottom of the graph sound as if they come from larger individuals (i.e., people with larger vocal tracts) than those selected near the top. Frequency-shifted versions whose F0 and FF combinations fall on or near the measured values sound more natural than combinations that do not occur in natural voices.

Analysis-resynthesis was performed using the STRAIGHT vocoder (Kawahara, 1997; Kawahara et al., 1999). For further details, see Assmann et al. (2002) and Assmann and Nearey (2003).
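The two quantities behind the graph can be sketched in Python: the geometric mean of F1 to F3 that defines the FF axis, and the mapping from a cell of the 25 x 25 grid to target F0 and FF values and to the multiplicative shifts that would move the source voice there. This is a minimal sketch under stated assumptions, not the demo's implementation: the axis limits (`F0_RANGE_HZ`, `FF_RANGE_HZ`), the log-spaced axes, the helper names, and the source-voice measurements are all hypothetical, and the frequency shifting itself (done with STRAIGHT in the demo) is not modeled.

```python
import math

# Hypothetical axis limits (Hz): the demo does not state the plotted ranges.
F0_RANGE_HZ = (80.0, 400.0)
FF_RANGE_HZ = (800.0, 2000.0)
GRID_SIZE = 25  # 25 x 25 grid, as described in the text


def geometric_mean_ff(f1: float, f2: float, f3: float) -> float:
    """Geometric mean of the first three formant frequencies (Hz)."""
    return (f1 * f2 * f3) ** (1.0 / 3.0)


def log_spaced(lo: float, hi: float, n: int) -> list[float]:
    """n values evenly spaced on a log axis from lo to hi (assumed axis scaling)."""
    step = math.log(hi / lo) / (n - 1)
    return [lo * math.exp(i * step) for i in range(n)]


def grid_cells(n: int = GRID_SIZE) -> list[tuple[int, int]]:
    """All (column, row) cells; column 0 is the left edge, row 0 the bottom edge."""
    return [(col, row) for col in range(n) for row in range(n)]


def cell_targets(col: int, row: int, n: int = GRID_SIZE) -> tuple[float, float]:
    """Target (F0, FF) in Hz for one cell: F0 rises left to right, FF bottom to top."""
    f0_values = log_spaced(*F0_RANGE_HZ, n)
    ff_values = log_spaced(*FF_RANGE_HZ, n)
    return f0_values[col], ff_values[row]


def scale_factors(target_f0: float, target_ff: float,
                  source_f0: float, source_ff: float) -> tuple[float, float]:
    """Multiplicative F0 and FF shifts that move the source voice to the target."""
    return target_f0 / source_f0, target_ff / source_ff


if __name__ == "__main__":
    # Hypothetical source-voice measurements for an adult male talker.
    src_f0 = 120.0
    src_ff = geometric_mean_ff(500.0, 1500.0, 2500.0)
    print(len(grid_cells()))  # 625
    t_f0, t_ff = cell_targets(24, 0)  # right edge, bottom row
    print(scale_factors(t_f0, t_ff, src_f0, src_ff))
```

Log spacing is assumed here because equal steps along each axis then correspond to equal frequency ratios, which fits treating the F0 and FF shifts as scale factors; a linearly spaced axis would only require replacing `log_spaced`.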

References

Assmann, P.F. and Katz, W.F. (2000). Time-varying spectral change in the vowels of children and adults. Journal of the Acoustical Society of America 108(4), 1856-1866.

Assmann, P.F. and Katz, W.F. (2005). Synthesis fidelity and vowel identification. Journal of the Acoustical Society of America 117(2), 886-895.

Assmann, P.F., Nearey, T.M., and Scott, J.M. (2002). Modeling the perception of frequency-shifted vowels. Proceedings of the 7th International Conference on Spoken Language Processing, pp. 425-428.

Assmann, P.F. and Nearey, T.M. (2003). Frequency shifts and vowel identification. Proceedings of the 15th International Congress of Phonetic Sciences, Barcelona, Spain, pp. 1397-1400.

Kawahara, H. (1997). Speech representation and transformation using adaptive interpolation of weighted spectrum: Vocoder revisited. Proceedings of ICASSP, pp. 1303-1306.

Kawahara, H., Masuda-Katsuse, I., and de Cheveigné, A. (1999). Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction. Speech Communication 27, 187-207.