Scroll painting image caption model combining multi-scale and multi-level aggregation

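DOI: 10.16652/j.issn.1004-373x.2025.17.007

Citation format: YUE Chaoyang, HU Wenjin, ZHANG Fujun. Scroll painting image caption model combining multi-scale and multi-level aggregation[J]. Modern Electronics Technique (现代电子技术), 2025, 48(17): 41-47.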

CLC number: TN919-34; TP391  Document code: A  Article ID: 1004-373X(2025)17-0041-07

Scroll painting image caption model combining multi-scale and multi-level aggregation

YUE Chaoyang1, HU Wenjin1,2, ZHANG Fujun1 (1. KeyLabotofigsticduraloutinstrutiostnUesityou; 2. School of Mathematics and Computer Science, Northwest Minzu University, Lanzhou 730030, China)

Abstract: Scroll painting images vary in size and have certain spatial distribution characteristics, and a Transformer-based encoding layer is prone to losing key image information, so a scroll painting image caption model, MMA, combining multi-scale and multi-level aggregation is proposed. In the encoding stage, asymmetric convolution and multi-scale feature modules are introduced to improve the ability of the convolution layer to obtain spatial information and to integrate the global and local multi-scale contextual information of the scroll painting image, yielding a feature representation with rich semantic information. In the decoding stage, a multi-level aggregation network is designed. By aggregating the features of the coding layers, the semantic information of the high-level coding layer and the content information of the low-level coding layer are both utilized effectively, thereby alleviating the information loss. Experimental results show that the model achieves good results on the scroll painting dataset, improving BLEU-4 and METEOR by 26.7% and 0.9%, respectively, in comparison with the NIC (neural image caption) model, and generates more accurate description sentences.

Keywords: image caption; scroll painting image; multi-scale feature; asymmetric convolution; multi-level aggregation decoding; Transformer
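
The abstract names asymmetric convolution as one of the encoder additions but does not spell out its form in this excerpt. The following is a minimal sketch, assuming the common three-branch arrangement in which a square 3×3 kernel is complemented by a horizontal 1×3 kernel and a vertical 3×1 kernel and the branch outputs are summed. It is plain Python on nested lists, and every function name is hypothetical rather than taken from the MMA model. The sketch also illustrates a general property of convolution, its linearity in the kernel: the three branches give the same result as a single 3×3 kernel with the strip kernels added onto its central row and column.

```python
def conv2d_same(image, kernel):
    """Zero-padded 'same' 2-D cross-correlation for odd-sized kernels."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2  # padding that keeps the output size equal to the input
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(kh):
                for dj in range(kw):
                    y, x = i + di - ph, j + dj - pw
                    if 0 <= y < h and 0 <= x < w:  # positions outside the image count as zero
                        acc += image[y][x] * kernel[di][dj]
            out[i][j] = acc
    return out


def asymmetric_block(image, k_square, k_row, k_col):
    """Assumed design: sum of a 3x3 branch, a 1x3 branch, and a 3x1 branch."""
    a = conv2d_same(image, k_square)
    b = conv2d_same(image, k_row)
    c = conv2d_same(image, k_col)
    h, w = len(image), len(image[0])
    return [[a[i][j] + b[i][j] + c[i][j] for j in range(w)] for i in range(h)]


def fuse_kernels(k_square, k_row, k_col):
    """Add the 1x3 kernel to the central row and the 3x1 kernel to the central column."""
    fused = [row[:] for row in k_square]
    for j in range(3):
        fused[1][j] += k_row[0][j]
    for i in range(3):
        fused[i][1] += k_col[i][0]
    return fused


# A wide, scroll-like toy input (3 rows, 6 columns); values are illustrative only.
image = [[1, 2, 3, 4, 5, 6],
         [0, 1, 0, 1, 0, 1],
         [2, 2, 2, 2, 2, 2]]
k_square = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]
k_row = [[1, 1, 1]]        # horizontal strip kernel
k_col = [[1], [2], [1]]    # vertical strip kernel

branch_sum = asymmetric_block(image, k_square, k_row, k_col)
single_pass = conv2d_same(image, fuse_kernels(k_square, k_row, k_col))
```

Here `branch_sum` and `single_pass` agree element by element. This is why the extra horizontal and vertical branches can strengthen the central "skeleton" of a square kernel without adding inference cost once the kernels are fused. Whether the paper fuses its branches in this way is not stated in the excerpt.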

0 Introduction

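Scroll painting is a distinctive form of painting art in China. Generating image captions for scroll paintings can help people appreciate and understand them and can promote cultural exchange, while also providing technical support for the research and digital preservation of scroll paintings.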
