Frequency-shifted voice demo

F0 scale factor / FF scale factor for each button, row by row (the unshifted original is in the middle):

0.35 / 0.69   0.39 / 0.72   0.42 / 0.74   0.46 / 0.76   0.50 / 0.78
0.55 / 0.81   0.59 / 0.83   0.65 / 0.86   0.71 / 0.89   0.77 / 0.91
0.84 / 0.94   0.92 / 0.97   1.00 / 1.00   1.09 / 1.03   1.19 / 1.06
1.30 / 1.10   1.41 / 1.13   1.54 / 1.16   1.68 / 1.20   1.83 / 1.24
2.00 / 1.27   2.18 / 1.31   2.38 / 1.35   2.59 / 1.40   2.83 / 1.44
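The factor pairs listed above follow a regular pattern. Judging only from the numbers (the demo does not state a formula), the F0 factors step by 2^(1/8), i.e. 1.5 semitones, from 2^-1.5 to 2^1.5. Each FF factor matches the corresponding F0 factor raised to the power 0.35, so the pairs fall on a straight line in log-log coordinates. The Python sketch below reproduces all 25 rounded pairs under that assumption. The exponent 0.35 is a value inferred here from the listed numbers, not the published regression coefficient, and `factor_grid`, `SLOPE` and `STEPS_PER_OCTAVE` are hypothetical names.

```python
# Sketch: regenerate the demo's (F0, FF) factor grid.
# ASSUMPTIONS inferred from the listed numbers, not stated by the authors:
#   F0 factors step by 2**(1/8) (1.5 semitones), from 2**-1.5 to 2**1.5;
#   FF factor = F0 factor ** 0.35 (a straight line in log-log coordinates).
STEPS_PER_OCTAVE = 8  # hypothetical name: F0 steps per doubling of F0
SLOPE = 0.35          # hypothetical fitted exponent, not the published coefficient


def factor_grid(rows=5, cols=5):
    """Return (F0, FF) pairs row by row, rounded to two decimals as in the demo."""
    middle = (rows * cols) // 2  # index of the unshifted (1.00, 1.00) button
    grid = []
    for r in range(rows):
        row = []
        for c in range(cols):
            k = r * cols + c - middle         # step index, -12 .. 12 for a 5 x 5 grid
            f0 = 2 ** (k / STEPS_PER_OCTAVE)  # F0 shift factor
            ff = f0 ** SLOPE                  # FF shift factor on the log-log line
            row.append((round(f0, 2), round(ff, 2)))
        grid.append(row)
    return grid


if __name__ == "__main__":
    for row in factor_grid():
        print("   ".join(f"{f0:.2f} / {ff:.2f}" for f0, ff in row))
```

Running it prints the same five rows of pairs as the grid. Changing `SLOPE` shows how a different regression slope would alter the balance between vocal-tract scaling and F0 scaling.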



Click the button in the middle to hear the unshifted original voice. The other buttons play frequency-shifted versions of this voice, ordered by increasing shift factor from left to right and from the top row to the bottom row. The voice in the upper left corner of the grid simulates a large vocal tract and a low fundamental; the voice in the lower right corner simulates a small vocal tract and a high fundamental. Position the mouse over each button to see its fundamental frequency (F0) shift factor and its spectrum envelope (or formant frequency, FF) shift factor. The FF factor scales the formant frequencies, which to a first approximation scales the simulated talker's vocal-tract length by the reciprocal factor. The combinations of F0 and FF scale factors were selected based on a linear regression of these two properties measured in natural vowels. Analysis and resynthesis were performed using the STRAIGHT vocoder (Kawahara, 1997; Kawahara et al., 1999).
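The link between the FF factor and vocal-tract length can be made concrete with the textbook uniform-tube approximation (a tube closed at the glottis and open at the lips), whose resonances are F_n = (2n - 1)c / (4L). Multiplying every formant by a factor alpha is therefore equivalent to dividing L by alpha, and on a sampled spectral envelope the same shift amounts to E'(f) = E(f / alpha). The Python sketch below illustrates both ideas. The helper names `tube_formants` and `scale_envelope`, the 350 m/s speed of sound, the 17.5 cm reference length and the linear interpolation are illustrative assumptions, not STRAIGHT's implementation.

```python
from bisect import bisect_right

# Hypothetical helpers for illustration only; they are not part of STRAIGHT.
SPEED_OF_SOUND = 350.0  # m/s, assumed value for warm, humid vocal-tract air


def tube_formants(length_m, n_formants=3, c=SPEED_OF_SOUND):
    """Resonances of a uniform tube closed at one end: F_n = (2n - 1) c / (4 L)."""
    return [(2 * n - 1) * c / (4 * length_m) for n in range(1, n_formants + 1)]


def scale_envelope(freqs, envelope, alpha):
    """Shift envelope features from f to alpha * f, i.e. E'(f) = E(f / alpha).

    `freqs` must be strictly increasing. Points that map outside the sampled
    range take the nearest edge value (an assumed boundary rule).
    """
    out = []
    for f in freqs:
        x = f / alpha  # original frequency that lands on f after scaling
        if x <= freqs[0]:
            out.append(envelope[0])
        elif x >= freqs[-1]:
            out.append(envelope[-1])
        else:
            i = bisect_right(freqs, x)  # freqs[i - 1] <= x < freqs[i]
            t = (x - freqs[i - 1]) / (freqs[i] - freqs[i - 1])
            out.append(envelope[i - 1] + t * (envelope[i] - envelope[i - 1]))
    return out


if __name__ == "__main__":
    alpha = 1.44  # FF factor of the lower-right button
    print(tube_formants(0.175))          # reference tube: about 500, 1500, 2500 Hz
    print(tube_formants(0.175 / alpha))  # shorter tube: formants scaled by alpha
    freqs = [0, 500, 1000, 1500, 2000, 2500, 3000]
    envelope = [0, 0, 1, 0, 0, 0, 0]     # single peak at 1000 Hz
    print(scale_envelope(freqs, envelope, 2.0))  # peak moves to 2000 Hz
```

With these assumed values, a 17.5 cm tube resonates near 500, 1500 and 2500 Hz; shortening it by the lower-right button's FF factor of 1.44 raises those resonances to about 720, 2160 and 3600 Hz. Linear interpolation on a linear frequency axis is simply the most compact choice for a sketch.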


Kawahara, H. (1997). Speech representation and transformation using adaptive interpolation of weighted spectrum: Vocoder revisited. Proceedings of the ICASSP, pp. 1303-1306.

Kawahara, H., Masuda-Katsuse, I., & de Cheveigné, A. (1999). Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction. Speech Communication, 27, 187-207.