Human Emotion Recognition in Images Based on Text-Image Contrastive Fusion

Keywords: emotion recognition; vision-language model; context awareness; multimodal fusion
CLC number: TP391.41   Document code: A   Article ID: 1001-3695(2025)07-007-1972-06
doi:10.19734/j.issn.1001-3695.2024.12.0497
Abstract: Context-based recognition of human emotions in images has become an increasingly popular task in recent years, with application value in many fields. Most existing methods encode the human subject and the background separately, extracting isolated features that interact only in simple ways, and lack an effective feature-fusion mechanism between the subject and the contextual background. To address the interaction between complex backgrounds and the human subject, this paper proposes a new network for human emotion recognition in images based on text-image contrastive fusion. First, it designs prompts that fully exploit the extensive social-context knowledge and reasoning capabilities of large vision-language models to extract textual descriptions of the emotional state linking the contextual background and the target human subject. Second, it proposes a text-image contrastive fusion module, which fuses the features of the cropped target human image with the features of the text descriptions obtained from the prompts. Finally, the fusion algorithm introduces a contrastive loss function to unify the representations of the image encoding and the text encoding, so that effective emotional expressions are captured more accurately during fusion. Experimental results show that the network learns more effective emotional feature representations and achieves superior results on the EMOTIC dataset, with an mAP of 37.30%. The proposed method better integrates the features of the human subject and the background in the image, thereby improving the accuracy of human emotion recognition in images.
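The abstract states that a contrastive loss unifies the image and text encodings, but this excerpt does not give the loss's exact form. As a hedged illustration only, the sketch below assumes a CLIP-style symmetric InfoNCE loss: within a batch, each cropped-person image feature and its own prompt-derived text feature form a positive pair, and every other image-text pairing acts as a negative. The function names, the temperature of 0.07 and the toy 2-D features are assumptions for illustration, not the paper's implementation.

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cross_entropy_diag(logits: List[List[float]]) -> float:
    """Mean cross-entropy where the correct class for row i is column i (its matched pair)."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the row max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def symmetric_contrastive_loss(image_feats: List[Vector],
                               text_feats: List[Vector],
                               temperature: float = 0.07) -> float:
    """Assumed CLIP-style symmetric InfoNCE over paired image/text features.

    image_feats[i] and text_feats[i] are assumed to describe the same person;
    all other pairings in the batch are treated as negatives.
    """
    if len(image_feats) != len(text_feats):
        raise ValueError("image and text batches must have the same size")
    imgs = [l2_normalize(v) for v in image_feats]
    txts = [l2_normalize(v) for v in text_feats]
    # logits[i][j] = cosine(image_i, text_j) / temperature
    logits = [[sum(a * b for a, b in zip(im, tx)) / temperature for tx in txts]
              for im in imgs]
    transposed = [list(col) for col in zip(*logits)]
    image_to_text = cross_entropy_diag(logits)      # each image picks its text
    text_to_image = cross_entropy_diag(transposed)  # each text picks its image
    return (image_to_text + text_to_image) / 2


if __name__ == "__main__":
    img = [[1.0, 0.0], [0.0, 1.0]]           # toy "cropped person" features
    txt_matched = [[0.9, 0.1], [0.1, 0.9]]   # descriptions aligned with img
    txt_swapped = [[0.1, 0.9], [0.9, 0.1]]   # descriptions paired with the wrong person
    print(symmetric_contrastive_loss(img, txt_matched))
    print(symmetric_contrastive_loss(img, txt_swapped))
```

With aligned pairs the loss is close to zero, while swapping the descriptions makes it large; minimizing such a loss pushes the two encoders toward a shared embedding space, which is the kind of representation unification the abstract describes. The symmetric form is a design choice that penalizes mismatches in both the image-to-text and text-to-image directions.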
0 Introduction
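Human emotion recognition systems have been applied in fields such as healthcare, smart education and human-computer interaction, subtly shaping people's daily lives. In real-world scenes, however, emotion recognition must cope with complex and highly variable conditions, so recognizing a person's emotions from contextual cues is of great significance.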