Analysis of emotion in speech using perceived and automatically extracted prosodic features


Suk-Myung Lee, Jeung-Yoon Choi, Yonsei University

This study investigates the relationship between emotional states and prosody. A prosody detection algorithm was applied to emotional speech to extract accents and intonational boundaries automatically, and these were compared with hand-labeled prosodic units. The measurements used in the detection algorithm are derived from duration, pitch, harmonic structure, spectral tilt, and amplitude. The utterances are part of a Korean emotional database subset in which 10 sentences were spoken by 6 speakers in 4 emotions (neutral, joy, sadness, and anger). By comparing the probabilities of occurrence and temporal patterns of the detected prosodic events between neutral and emotional speech, our experiments find different distributions for each emotion. Overall, joy and anger tended to have more events classified as accents than the other emotions, while sadness had more events corresponding to boundaries. In addition, joy had more events classified as accents at the beginnings of utterances, whereas anger had more accents at the ends of utterances. These results indicate that prosodic characteristics can be useful for classifying emotion and for synthesizing emotional speech.
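To illustrate the kinds of acoustic measurements the abstract mentions, the sketch below computes frame-level pitch, amplitude, and spectral-tilt tracks from a single utterance. This is a minimal example using librosa under assumed parameters, not the detection algorithm used in the study; the file path and frame settings are hypothetical, and duration and harmonic-structure cues are omitted.

```python
# Minimal sketch (not the authors' implementation) of frame-level prosodic
# measurements: pitch, amplitude, and spectral tilt.
import numpy as np
import librosa

def prosodic_features(path, frame_length=2048, hop_length=512):
    y, sr = librosa.load(path, sr=None)

    # Pitch track via probabilistic YIN; unvoiced frames come back as NaN.
    f0, voiced_flag, _ = librosa.pyin(
        y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C6"),
        sr=sr, frame_length=frame_length, hop_length=hop_length)

    # Amplitude: root-mean-square energy per frame.
    rms = librosa.feature.rms(
        y=y, frame_length=frame_length, hop_length=hop_length)[0]

    # Spectral tilt: slope of a line fit to each frame's log-magnitude
    # spectrum over frequency (a rough proxy for the tilt measures
    # typically used in prosodic-event detection).
    S = np.abs(librosa.stft(y, n_fft=frame_length, hop_length=hop_length))
    freqs = librosa.fft_frequencies(sr=sr, n_fft=frame_length)
    log_mag = np.log(S + 1e-10)            # shape: (n_freqs, n_frames)
    tilt = np.polyfit(freqs, log_mag, 1)[0]  # per-frame slope

    return f0, rms, tilt

# Example usage with a hypothetical file:
# f0, rms, tilt = prosodic_features("utterance.wav")
```

In a detection setting, tracks like these would be compared against hand-labeled accents and boundaries to train or evaluate the prosodic-event classifier.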