Unsupervised image stitching method based on multi-scale attention and dynamic soft mask

Keywords: image stitching; deep learning; image registration; image fusion; unsupervised
CLC number: TP311  Document code: A  Article ID: 1001-3695(2025)12-033-3785-08
doi:10.19734/j.issn.1001-3695.2025.04.0102
Unsupervised image stitching approach based on multi-scale attention and dynamic soft mask
Zhao Wenlong¹, Wang Liewei¹,², Wang Junhua², Yang Jixiang¹ (1. School of Computer Science & Engineering, Anhui University of Science & Technology, Huainan Anhui 232001, China; 2. Nanjing Paiguang Intelligent Perception Information Technology Co., Ltd., Nanjing 210032, China)
Abstract: To address the issues of artifacts and distortion in existing image stitching technologies caused by reliance on feature matching, this paper proposed an unsupervised image stitching method. The method consisted of two stages: unsupervised image registration and unsupervised image fusion. In the registration stage, the paper improved residual blocks by fusing multi-scale features and the efficient local attention mechanism (ELA) to perform cross-scale feature fusion and dynamic feature enhancement. Meanwhile, it constructed an interactive perception enhancement module by incorporating cross-attention to promote the deep interaction and fusion of feature information between image pairs. Additionally, the paper proposed a multi-scale progressive transformation registration module. This module adopted a hierarchical optimization strategy to gradually calibrate the image transformation relationship, which significantly improved the alignment accuracy. In the fusion stage, the paper introduced a dynamic soft mask prediction mechanism. Based on pixel-level continuous weight learning, this mechanism achieved smooth transitions and detail preservation in overlapping regions. To support unsupervised training, the paper constructed a real-world image stitching dataset covering complex lighting and multi-parallax scenes. Experiments show that compared with existing traditional and deep learning stitching algorithms, the method achieves PSNR and SSIM values of 27.31 and 0.84, respectively. Visually, it provides better stitching effects and stronger anti-interference capabilities.
Key words: image stitching; deep learning; image registration; image fusion; unsupervised
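As a rough illustration of the soft-mask fusion idea summarized in the abstract, the sketch below blends two already-aligned images with a per-pixel continuous weight and scores a result with the standard PSNR formula. It is a minimal sketch, not the paper's network: the function names `blend_with_soft_mask` and `psnr`, the hand-made linear ramp mask, and the assumption that pixel values lie in [0, 1] are all hypothetical choices made here for illustration. In the proposed method the mask would be predicted by a learned module rather than fixed.

```python
import numpy as np


def blend_with_soft_mask(warped_ref, warped_tgt, soft_mask):
    """Blend two aligned H x W x C images with a per-pixel weight in [0, 1].

    soft_mask has shape H x W; 1 keeps the reference image, 0 keeps the target.
    (Hypothetical helper; the paper's mask comes from a learned predictor.)
    """
    m = np.clip(soft_mask, 0.0, 1.0)[..., None]  # add a channel axis for broadcasting
    return m * warped_ref + (1.0 - m) * warped_tgt


def psnr(a, b, max_val=1.0):
    """Peak signal-to-noise ratio in dB for images scaled to [0, max_val]."""
    mse = np.mean((a - b) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(max_val ** 2 / mse)


# Toy data: two constant images and a left-to-right linear ramp as the mask.
h, w = 4, 6
ref = np.full((h, w, 3), 0.2)
tgt = np.full((h, w, 3), 0.8)
mask = np.tile(np.linspace(1.0, 0.0, w), (h, 1))

fused = blend_with_soft_mask(ref, tgt, mask)
print(fused[0, :, 0])    # values move smoothly from the reference to the target
print(psnr(ref, tgt))    # PSNR between the two unblended inputs
```

A continuous mask of this kind avoids the hard seam that a binary mask produces, which is the property the abstract attributes to pixel-level continuous weight learning.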
0 Introduction
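Image stitching, one of the core research topics in computer vision, aims to combine multiple images that share overlapping regions and were captured from different positions and angles into a single panoramic image with a wider field of view [1].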