Visible-Infrared Object Tracking Network Using Attention and Online Template Update

CLC number: TP391.4; Document code: A; DOI: 10.7652/xjtuxb202508018; Article ID: 0253-987X(2025)08-0187-12
RGB-Thermal Object Tracking Network Based on Attention Mechanism and Online Template Update
HAN Xiangdong¹, ZHONG Ao², LIU Chongao², SUN Yanxin¹, ZHANG Xiangyong¹, XU Linzhi² (1. Intelligent Technology Center, China Ordnance Industry Testing and Research Institute, Xi'an, China; 2. School of Electronic Engineering, Xidian University, Xi'an 710071, China)
Abstract: Current RGB-thermal object tracking algorithms suffer from insufficient feature interaction between the RGB and thermal modalities and from inadequate modeling of dynamic target variations. To address these problems, a dual-modal tracking network based on attention and online template update is proposed, built on a convolutional masked autoencoder. With the convolutional masked autoencoder as the backbone network, the model extracts RGB and thermal features through an architecture with dual embedding layers and a shared-weight backbone, exploring the intrinsic relationships between RGB and thermal data. To enhance the correlation between the template and search images, a channel-spatial self-attention mechanism is introduced to strengthen their interaction and extract discriminative, heterogeneous, complementary features across modalities. An online template update module with a template scoring head is proposed to update the template dynamically. Through a confidence-based fusion mechanism, it balances the stability of the initial template against the adaptability of the online template, mitigating model drift caused by target variations over time. Experimental results show that the proposed algorithm achieves precision/success rates of 93.3%/75.6% on the GTOT dataset and 87.2%/63.8% on the RGBT234 dataset, enabling accurate tracking under dynamic target conditions. Visualization analysis shows that the algorithm adaptively complements the dual-modal heatmaps and maintains precise target localization even when one modality fails.
Keywords: visual object tracking; RGB-thermal; self-attention mechanism; online template update; convolutional masked autoencoder
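The abstract describes an online template that is updated dynamically, scored by a template scoring head, and fused with the initial template according to confidence. The abstract does not give the exact rule, so the following is only a minimal sketch under stated assumptions: templates are plain feature vectors, the score is a number in [0, 1], the update uses a hypothetical threshold, and the fusion is a simple convex blend. The names `TemplateState`, `update_online_template`, `fuse_templates`, and the `threshold` parameter are illustrative and do not come from the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TemplateState:
    """Holds the fixed first-frame template and the current online template."""
    initial: List[float]  # features of the initial (first-frame) template
    online: List[float]   # features of the most recently accepted online template


def update_online_template(state: TemplateState, candidate: List[float],
                           score: float, threshold: float = 0.5) -> bool:
    """Accept the candidate as the new online template only if its score is high enough.

    `score` stands in for the output of a template scoring head;
    `threshold` is an assumed hyperparameter, not a value from the paper.
    """
    if score >= threshold:
        state.online = list(candidate)
        return True
    return False


def fuse_templates(state: TemplateState, online_confidence: float) -> List[float]:
    """Blend the two templates; higher confidence gives the online template more weight.

    The convex combination is an assumption chosen for illustration.
    """
    w = min(max(online_confidence, 0.0), 1.0)  # clamp the weight to [0, 1]
    return [(1.0 - w) * a + w * b for a, b in zip(state.initial, state.online)]


state = TemplateState(initial=[1.0, 0.0], online=[1.0, 0.0])
update_online_template(state, [0.0, 1.0], score=0.2)  # rejected: low score
update_online_template(state, [0.0, 1.0], score=0.9)  # accepted
fused = fuse_templates(state, online_confidence=0.25)
```

In this sketch a low-scoring candidate never replaces the online template, and the initial template always contributes unless the confidence saturates at 1, which is one simple way to trade stability against adaptability.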
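Visual object tracking predicts a target's position in the subsequent frames of a video sequence from its position in the first frame, and is widely used in video surveillance, autonomous driving, and military reconnaissance [1-3].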