Speaker-sensitive emotion recognition via ranking: Studies on acted and spontaneous speech

Title: Speaker-sensitive emotion recognition via ranking: Studies on acted and spontaneous speech
Publication Type: Journal Article
Year of Publication: 2015
Authors: Cao, H, Verma, R, Nenkova, A
Journal: Comput Speech Lang
Volume: 29
Issue: 1
Pagination: 186-202
Date Published: 2015 Jan
ISSN: 0885-2308
Abstract

We introduce a ranking approach for emotion recognition that naturally incorporates information about the general expressivity of speakers. We demonstrate that our approach leads to substantial gains in accuracy compared to conventional approaches. We train ranking SVMs for individual emotions, treating the data from each speaker as a separate query, and combine the predictions from all rankers to perform multi-class prediction. The ranking method provides two natural benefits. First, it captures speaker-specific information even in speaker-independent training/testing conditions. Second, it incorporates the intuition that each utterance can express a mix of possible emotions, and that the degree to which each emotion is expressed can be productively exploited to identify the dominant emotion. We compare the performance of the rankers and their combination to standard SVM classification approaches on two publicly available datasets of acted emotional speech, Berlin and LDC, as well as on spontaneous emotional data from the FAU Aibo dataset. On acted data, ranking approaches perform significantly better than SVM classification, both in distinguishing a specific emotion from all others and in multi-class prediction. On the spontaneous data, which contains mostly neutral utterances with a relatively small proportion of less intense emotional utterances, ranking-based classifiers again achieve much higher precision in identifying emotional utterances than conventional SVM classifiers. In addition, we discuss the complementarity of conventional SVM and ranking-based classifiers. On all three datasets, accuracy on the test items for which the two methods agree is dramatically higher than the accuracy of either method alone. Furthermore, on the spontaneous data the ranking and standard classifiers are complementary, and we obtain a marked improvement when we combine the two by late-stage fusion.
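The abstract describes the method only at a high level. The following is a minimal sketch, in plain Python with only the standard library, of its two core ideas: one linear ranking SVM per emotion, trained with each speaker treated as a separate query (so only utterances of the same speaker are compared), and a multi-class decision that combines the rankers. Everything concrete here is a hypothetical choice for illustration and not the authors' implementation: the stochastic subgradient training loop, the hyperparameters (`epochs`, `lr`, `lam`), the rule of standardizing each ranker's scores within a speaker before taking the argmax, the helper names, and the two-feature toy data with per-speaker offsets that stand in for "general expressivity".

```python
import random
import statistics
from collections import defaultdict

# Hypothetical toy prototypes: two acoustic features per emotion.
PROTOTYPES = {"angry": (2.0, 0.0), "sad": (-1.0, 1.7), "neutral": (-1.0, -1.7)}


def group_by_speaker(speakers):
    """Map each speaker ID to the indices of that speaker's utterances."""
    groups = defaultdict(list)
    for i, s in enumerate(speakers):
        groups[s].append(i)
    return groups


def dot(w, x):
    return sum(a * b for a, b in zip(w, x))


def train_ranker(X, labels, speakers, target, epochs=100, lr=0.05, lam=0.01, seed=0):
    """Linear ranking SVM for one emotion, fit by stochastic subgradient descent
    on the pairwise hinge loss. Each speaker is a separate query: a pair (i, j)
    says utterance i (target emotion) should outrank utterance j (any other
    emotion) from the SAME speaker."""
    pairs = []
    for idx in group_by_speaker(speakers).values():
        pos = [i for i in idx if labels[i] == target]
        neg = [i for i in idx if labels[i] != target]
        pairs.extend((i, j) for i in pos for j in neg)
    w = [0.0] * len(X[0])
    rng = random.Random(seed)
    for _ in range(epochs):
        rng.shuffle(pairs)
        for i, j in pairs:
            diff = [a - b for a, b in zip(X[i], X[j])]
            violated = dot(w, diff) < 1.0  # hinge loss is active for this pair
            w = [wk * (1.0 - lr * lam) for wk in w]  # L2 regularization step
            if violated:
                w = [wk + lr * dk for wk, dk in zip(w, diff)]
    return w


def predict(rankers, X, speakers):
    """Multi-class prediction (assumed combination rule): z-score each ranker's
    scores within each speaker, then pick the emotion with the highest z-score."""
    z = {e: [0.0] * len(X) for e in rankers}
    for idx in group_by_speaker(speakers).values():
        for e, w in rankers.items():
            raw = [dot(w, X[i]) for i in idx]
            mu, sd = statistics.mean(raw), statistics.pstdev(raw)
            for i, r in zip(idx, raw):
                z[e][i] = (r - mu) / sd if sd > 0 else 0.0
    return [max(rankers, key=lambda e: z[e][i]) for i in range(len(X))]


def make_speaker(name, offset):
    """Hypothetical toy data: one utterance per emotion, shifted by a
    speaker-specific offset that mimics that speaker's overall expressivity."""
    X, y, spk = [], [], []
    for emo, (f1, f2) in PROTOTYPES.items():
        X.append([f1 + offset[0], f2 + offset[1]])
        y.append(emo)
        spk.append(name)
    return X, y, spk


# Train on speakers s1 and s2; test on the unseen speaker s3 (speaker-independent).
X_train, y_train, s_train = [], [], []
for name, off in [("s1", (3.0, 0.0)), ("s2", (-3.0, 1.0))]:
    X, y, s = make_speaker(name, off)
    X_train += X
    y_train += y
    s_train += s

rankers = {e: train_ranker(X_train, y_train, s_train, e) for e in PROTOTYPES}
X_test, y_test, s_test = make_speaker("s3", (5.0, 5.0))
print(predict(rankers, X_test, s_test))  # -> ['angry', 'sad', 'neutral']
```

Because training pairs are formed only within a speaker, each speaker's offset cancels in every difference `X[i] - X[j]`, so in this toy example the held-out speaker `s3`, whose offset never appears in training, is still ranked correctly. The per-speaker z-scoring then makes the rankers' outputs comparable before the argmax; a real system would use many acoustic features, several utterances per emotion, and a tuned SVM solver.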

DOI: 10.1016/j.csl.2014.01.003
Alternate Journal: Comput Speech Lang
PubMed ID: 25422534
PubMed Central ID: PMC4240517
Grant List: R01 MH073174 / MH / NIMH NIH HHS / United States