Improving TTS Synthesis for Emotional Expressivity by a Prosodic Parameterization of Affect based on Linguistic Analysis


Mostafa Al Masum Shaikh, Antonio Rui Ferreira Rebordao and Keikichi Hirose, University of Tokyo

Affective speech synthesis is important for applications such as storytelling, speech-based user interfaces and computer games. However, several studies have shown that Text-To-Speech (TTS) systems tend not to convey suitable emotional expressivity in their output. Owing to the recent convergence of analytical studies on affect and human speech, this problem can now be tackled from a new angle, at whose core is a prosodic parameterization driven by intelligent detection of the affective cues in the input text. Combined with recent findings in affective speech analysis, this allows pitch accents, other prosodic parameters and F0-related signal properties to be assigned so that they match the optimal parameterization for the emotion detected in the input text. Such an approach enriches the input text with meta-information that efficiently guides the TTS system. Furthermore, the output of the TTS system is post-processed to enhance its affective content. Several preliminary tests confirm the validity of our approach and encourage us to explore it further.
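To make the idea of enriching input text with prosodic meta-information concrete, the following is a minimal sketch. It assumes the meta-information is expressed as W3C SSML `<prosody>` markup, which the abstract does not specify. The keyword lexicon (`LEXICON`) stands in for the linguistic affect analysis, and the per-emotion values in `PROSODY` are hypothetical placeholders. Neither is the authors' method or their derived parameterization.

```python
from xml.sax.saxutils import escape

# Hypothetical per-emotion prosody settings (illustrative placeholders,
# NOT the parameterization derived in this work). Values use SSML 1.1
# syntax: relative pitch change, rate as a percentage, volume keyword.
PROSODY = {
    "joy":     {"pitch": "+15%", "rate": "110%", "volume": "loud"},
    "sadness": {"pitch": "-10%", "rate": "85%",  "volume": "soft"},
    "anger":   {"pitch": "+5%",  "rate": "115%", "volume": "x-loud"},
    "neutral": {"pitch": "+0%",  "rate": "100%", "volume": "medium"},
}

# Toy keyword lexicon standing in for a real linguistic affect analysis.
LEXICON = {
    "happy": "joy", "wonderful": "joy",
    "sad": "sadness", "lost": "sadness",
    "furious": "anger", "hate": "anger",
}


def detect_emotion(sentence: str) -> str:
    """Return the emotion of the first cue word found, else 'neutral'."""
    for word in sentence.lower().split():
        label = LEXICON.get(word.strip(".,!?;:\"'"))
        if label:
            return label
    return "neutral"


def to_ssml(sentence: str) -> str:
    """Wrap one sentence in an SSML <prosody> element for its emotion."""
    p = PROSODY[detect_emotion(sentence)]
    return (f'<prosody pitch="{p["pitch"]}" rate="{p["rate"]}" '
            f'volume="{p["volume"]}">{escape(sentence)}</prosody>')


def annotate(sentences: list[str]) -> str:
    """Build a complete SSML document from a list of sentences."""
    body = "".join(to_ssml(s) for s in sentences)
    return ('<speak version="1.1" xmlns="http://www.w3.org/2001/10/synthesis" '
            f'xml:lang="en-US">{body}</speak>')


if __name__ == "__main__":
    print(to_ssml("I am so happy today!"))
    # prints: <prosody pitch="+15%" rate="110%" volume="loud">I am so happy today!</prosody>
```

The enriched string can then be passed to any SSML-capable TTS engine. The post-processing step described above would operate on the synthesized audio and is not covered by this sketch.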