| Downloads | Citations | Reads |
|---|---|---|
| 513 | 16 | 81 |
In bimodal dimensional emotion recognition, incomplete information can lead to poor prediction performance. Decision-level fusion methods have mostly relied on support vector regression, which does not handle large sample sets effectively. To address these problems, motion capture (Mocap) data was added to the acoustic and text features, and a decision-level fusion method for dimensional emotion recognition based on stochastic gradient descent (SGD) was proposed for the multi-modal data. Combined with a multi-task learning mechanism, different deep learning models were trained on the acoustic, text and Mocap features, and multi-modal dimensional emotion recognition was then achieved through decision-level fusion. Experimental results on the IEMOCAP data set showed that Mocap data was most helpful in improving results on the valence dimension, and that combining additional emotion data improved the prediction performance of dimensional emotion recognition. The mean concordance correlation coefficient obtained by SGD-based decision-level fusion was higher than that obtained with other regression algorithms.
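The two technical ingredients named in the abstract, decision-level fusion trained by SGD and evaluation by the concordance correlation coefficient (CCC), can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the fusion layer is a plain linear regressor fitted by per-sample SGD on squared error, the three "modality predictions" are synthetic noisy copies of the label rather than outputs of real acoustic, text and Mocap networks, and the names `ccc`, `sgd_fuse`, `fuse` and the noise levels are hypothetical. It is not the authors' implementation.

```python
import random


def ccc(y_true, y_pred):
    """Concordance correlation coefficient of two equal-length sequences."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    var_t = sum((t - mean_t) ** 2 for t in y_true) / n
    var_p = sum((p - mean_p) ** 2 for p in y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred)) / n
    return 2 * cov / (var_t + var_p + (mean_t - mean_p) ** 2)


def sgd_fuse(rows, targets, lr=0.01, epochs=100, seed=0):
    """Fit linear fusion weights over per-modality predictions with SGD.

    Each row holds one prediction per modality for a single sample.
    Squared error is an assumed loss; the abstract does not specify one.
    """
    rng = random.Random(seed)
    weights = [0.0] * len(rows[0])
    bias = 0.0
    order = list(range(len(rows)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit samples in a fresh random order each epoch
        for i in order:
            pred = bias + sum(w * x for w, x in zip(weights, rows[i]))
            err = pred - targets[i]
            # Gradient step on 0.5 * err**2 for this single sample.
            weights = [w - lr * err * x for w, x in zip(weights, rows[i])]
            bias -= lr * err
    return weights, bias


def fuse(rows, weights, bias):
    """Apply the learned fusion weights to per-modality predictions."""
    return [bias + sum(w * x for w, x in zip(weights, row)) for row in rows]


# Synthetic stand-in for one emotion dimension (for example valence).
rng = random.Random(42)
labels = [rng.uniform(-1, 1) for _ in range(200)]
noise_levels = (0.3, 0.5, 0.4)  # hypothetical per-modality error levels
rows = [[y + rng.gauss(0, s) for s in noise_levels] for y in labels]

train_rows, test_rows = rows[:150], rows[150:]
train_y, test_y = labels[:150], labels[150:]

weights, bias = sgd_fuse(train_rows, train_y)
fused_ccc = ccc(test_y, fuse(test_rows, weights, bias))
unimodal_ccc = [ccc(test_y, [row[k] for row in test_rows]) for k in range(3)]

print("unimodal CCC:", [round(v, 3) for v in unimodal_ccc])
print("fused CCC:", round(fused_ccc, 3))
```

Because each update touches only one sample, the cost per step does not grow with the size of the training set, which is the usual reason to prefer SGD-style regressors over kernel support vector regression on large samples.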
Basic Information:
DOI:10.13705/j.issn.1671-6841.2021299
China Classification Code:TP391.41;TN912.34
Citation Information:
[1] HU Xinrong, CHEN Zhiheng, LIU Junping, et al. Decision-level Fusion Dimension Emotion Recognition Method Based on SGD[J]. Journal of Zhengzhou University (Natural Science Edition), 2022, 54(04): 49-54. DOI: 10.13705/j.issn.1671-6841.2021299.
Fund Information:
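National Natural Science Foundation of China (61103085); Hubei Provincial Excellent Young and Middle-aged Science and Technology Innovation Team Program for Higher Education Institutions (T201807); Hubei Provincial University Intellectual Property Promotion Project (GXYS2018009); Key Project of the Scientific Research Program of the Hubei Provincial Department of Education (D20191708)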