A Cross-Modal Retrieval Method Based on Information Complementarity and Cross-Attention


Keywords: information complementarity; cross-attention; graph convolutional network; cross-modal retrieval

CLC number: TP391  Document code: A  Article ID: 1001-3695(2025)07-015-2032-07

doi:10.19734/j.issn.1001-3695.2025.01.0003

Abstract: With the rapid growth of multimodal data on the Internet, cross-modal retrieval technology has attracted widespread attention. However, some multimodal data often lack semantic information, which prevents models from accurately extracting the inherent semantic features. Additionally, some multimodal data contain redundant information unrelated to semantics, which interferes with the model's extraction of key information. To address this, this paper proposed a cross-modal retrieval method based on information complementarity and cross-attention (ICCA). The method used a GCN to model the relationships between multi-labels and data, supplementing the missing semantic information in multimodal data and the missing sample detail information in multi-labels. Moreover, a cross-attention submodule used multi-label information to filter out redundant, semantically irrelevant data. To achieve better matching of semantically similar images and texts in the common representation space, this paper proposed a semantic matching loss. This loss integrated multi-label embeddings into the image-text matching process, further enhancing the semantic quality of the common representation. Experimental results on three widely used datasets, NUS-WIDE, MIRFlickr-25K, and MS-COCO, demonstrate that ICCA achieves mAP values of 0.808, 0.859, and 0.837, respectively, significantly outperforming existing methods.
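The cross-attention filtering step described in the abstract (label embeddings used to suppress semantically irrelevant parts of the modality features) can be sketched as follows. This is a minimal illustrative implementation, not the paper's actual architecture; the weight matrices, dimensions, and random data are all hypothetical:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(label_emb, modal_feat, Wq, Wk, Wv):
    """Label embeddings act as queries over modality features
    (image regions or text words, the keys/values), so parts of
    the input irrelevant to the labels receive low attention
    weight and are effectively filtered out."""
    Q = label_emb @ Wq                        # (L, d) queries
    K = modal_feat @ Wk                       # (N, d) keys
    V = modal_feat @ Wv                       # (N, d) values
    scores = Q @ K.T / np.sqrt(Q.shape[-1])   # (L, N) similarities
    attn = softmax(scores, axis=-1)           # rows sum to 1
    return attn @ V                           # (L, d) label-guided features

# Hypothetical toy inputs: 3 label embeddings, 5 region/word features.
rng = np.random.default_rng(0)
d = 8
labels = rng.standard_normal((3, d))
regions = rng.standard_normal((5, d))
Wq, Wk, Wv = (rng.standard_normal((d, d)) * 0.1 for _ in range(3))
out = cross_attention(labels, regions, Wq, Wk, Wv)
print(out.shape)  # (3, 8)
```

Each output row is a weighted mixture of the modality features, with weights set by similarity to one label; regions unrelated to every label contribute little to any row, which is the filtering effect the abstract describes.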

Key words: information complementarity; cross-attention; graph convolutional network (GCN); cross-modal retrieval

0 Introduction

In recent years, with the rapid development of Internet technology, multimedia data such as videos, images, and text have grown dramatically.
