Research on a Prompt-Learning-Based Multimodal Joint Representation Method under Missing Modalities


CLC number: TN911.7-34; TP391.4  Document code: A  Article ID: 1004-373X(2026)07-0145-09

Abstract: In multimodal learning, modality missing due to privacy constraints, device failures, and data collection limitations significantly degrades model performance and practicability. In view of this, the study proposes a dual-mechanism unified framework to enhance robustness and generalization in such scenarios. The framework integrates a query-prompt hierarchical collaboration prompting (HCP) mechanism and a conditional attention routing (CAR) mechanism: the former introduces cross-modal query vectors at each feature hierarchy level to guide information fusion, enabling hierarchical aggregation for joint modeling of local and global semantics, thereby alleviating the alignment challenges caused by modality heterogeneity; the latter employs a conditional routing strategy based on the presence or absence of modalities to selectively activate attention computation paths, which allows the model to skip redundant computations associated with missing modalities and effectively prevents feature contamination. Meanwhile, the fine-tuning parameter count is constrained to approximately 3% of the total parameters of the pre-trained model, which significantly reduces training cost. Extensive experiments on public datasets including UPMC Food-101, Hateful Memes, and MM-IMDb demonstrate that, under various modality missing scenarios, the proposed method outperforms existing mainstream approaches by up to approximately 8% in AUROC on Hateful Memes, 3.6% in accuracy on UPMC Food-101, and 3% in Macro-F1 on MM-IMDb. These results confirm the effectiveness and synergy gain of the proposed mechanisms in dealing with modality missing. The framework offers a scalable and resource-efficient solution for multimodal applications in constrained or incomplete environments.

Keywords: multimodal learning; modality missing; cross-modal alignment; prompt learning; pre-training; parameter-efficient fine-tuning
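The conditional routing idea described in the abstract can be illustrated with a minimal sketch: attention is computed only over the tokens of modalities that were actually observed, so absent modalities neither cost compute nor contaminate the fused features. This is a hypothetical illustration of the general technique, not the paper's implementation; the names (`fuse_step`, `present`) and the single-query dot-product attention are assumptions introduced here.

```python
# Hypothetical sketch of conditional attention routing (CAR): the attention
# path over a modality's tokens is skipped entirely when that modality is
# missing, rather than masked after computation. Illustrative only.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def fuse_step(query, modality_feats, present):
    """One fusion step over the modalities flagged as present.

    query          : (d,) cross-modal query vector (cf. the HCP prompts)
    modality_feats : dict name -> (n_tokens, d) feature matrix
    present        : dict name -> bool, which modalities were observed
    """
    active = [m for m, ok in present.items() if ok]
    if not active:                  # nothing observed: pass the query through
        return query
    keys = np.concatenate([modality_feats[m] for m in active], axis=0)
    # Attention scores are computed only for tokens of present modalities.
    scores = keys @ query / np.sqrt(query.shape[0])
    weights = softmax(scores)
    return weights @ keys           # (d,) fused representation

rng = np.random.default_rng(0)
feats = {"image": rng.normal(size=(4, 8)), "text": rng.normal(size=(6, 8))}
q = rng.normal(size=8)
full = fuse_step(q, feats, {"image": True, "text": True})
text_only = fuse_step(q, feats, {"image": False, "text": True})
print(full.shape, text_only.shape)
```

Because the absent modality's tokens never enter the score computation, the text-only output is identical to fusing over the text features alone, which is the "no feature contamination" property the abstract claims for CAR.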

0 Introduction

Humans understand the world through multimodal information such as vision, language, and audio, and these modalities are inherently complementary. (remaining 16,542 characters omitted)
