Research on a Prompt-Learning-Based Multimodal Joint Representation Method under Missing Modalities


CLC number: TN911.7-34; TP391.4  Document code: A  Article ID: 1004-373X(2026)07-0145-09

Abstract: In multimodal learning, modality missing due to privacy constraints, device failures, and data collection limitations significantly degrades model performance and practicability. In view of this, the study proposes a dual-mechanism unified framework to enhance robustness and generalization in such scenarios. The framework integrates a query-prompt hierarchical collaboration prompting (HCP) mechanism and a conditional attention routing (CAR) mechanism: the former introduces cross-modal query vectors at each feature hierarchy level to guide information fusion, enabling hierarchical aggregation for joint modeling of local and global semantics, thereby alleviating the alignment challenges caused by modality heterogeneity; the latter employs a conditional routing strategy based on the presence or absence of modalities to selectively activate attention computation paths, which allows the model to skip redundant computations associated with missing modalities and effectively prevents feature contamination. Meanwhile, the fine-tuning parameter count is constrained to approximately 3% of the total parameters of the pre-trained model, which significantly reduces training cost. Extensive experiments on public datasets including UPMC Food-101, Hateful Memes, and MM-IMDb demonstrate that, under various modality missing scenarios, the proposed method outperforms existing mainstream approaches by up to approximately 8% in AUROC on Hateful Memes, 3.6% in accuracy on UPMC Food-101, and 3% in Macro-F1 on MM-IMDb. These results confirm the effectiveness and synergy gain of the proposed mechanisms in dealing with modality missing. The framework offers a scalable and resource-efficient solution for multimodal applications in constrained or incomplete environments.

Keywords: multimodal learning; modality missing; cross-modal alignment; prompt learning; pre-training; parameter-efficient fine-tuning
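The conditional routing idea described in the abstract can be illustrated with a minimal sketch: attention is computed only over the tokens of modalities that were actually observed, so absent modalities neither cost compute nor contaminate the fused features. This is a hypothetical illustration of the general technique, not the paper's implementation; the names (`fuse_step`, `present`) and the single-query dot-product attention are assumptions introduced here.

```python
# Hypothetical sketch of conditional attention routing (CAR): the attention
# path over a modality's tokens is skipped entirely when that modality is
# missing, rather than masked after computation. Illustrative only.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def fuse_step(query, modality_feats, present):
    """One fusion step over the modalities flagged as present.

    query          : (d,) cross-modal query vector (cf. the HCP prompts)
    modality_feats : dict name -> (n_tokens, d) feature matrix
    present        : dict name -> bool, which modalities were observed
    """
    active = [m for m, ok in present.items() if ok]
    if not active:                  # nothing observed: pass the query through
        return query
    keys = np.concatenate([modality_feats[m] for m in active], axis=0)
    # Attention scores are computed only for tokens of present modalities.
    scores = keys @ query / np.sqrt(query.shape[0])
    weights = softmax(scores)
    return weights @ keys           # (d,) fused representation

rng = np.random.default_rng(0)
feats = {"image": rng.normal(size=(4, 8)), "text": rng.normal(size=(6, 8))}
q = rng.normal(size=8)
full = fuse_step(q, feats, {"image": True, "text": True})
text_only = fuse_step(q, feats, {"image": False, "text": True})
print(full.shape, text_only.shape)
```

Because the absent modality's tokens never enter the score computation, the text-only output is identical to fusing over the text features alone, which is the "no feature contamination" property the abstract claims for CAR.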

0 Introduction

Humans understand the world through multimodal information such as vision, language, and audio, and these modalities are inherently complementary. (remaining 16,542 characters omitted)
