RGB-T tracking network based on multi-modal feature fusion

JIN Jing, LIU Jianqin*, ZHAI Fengwen
(School of Electronic and Information Engineering, Lanzhou Jiaotong University, Lanzhou 730070, China) * Corresponding author, E-mail: 1970477938@qq.com
Abstract: In recent years, RGB-T tracking methods have been widely used in visual tracking tasks due to the complementarity of visible and thermal infrared images. However, existing RGB-T moving target tracking methods have not yet made full use of the complementary information between the two modalities, which limits the performance of the tracker. In particular, existing Transformer-based RGB-T tracking algorithms still lack direct interaction between the two modalities, which limits the full use of the original semantic information of the RGB and TIR modalities. To solve this problem, this paper proposed a Multi-modal Feature Fusion Tracking Network for RGB-T tracking (MMFFTN). Firstly, after the preliminary features were extracted by the backbone network, a Channel Feature Fusion Module (CFFM) was introduced to realize the direct interaction and fusion of RGB and TIR channel features. Secondly, in order to solve the problem of unsatisfactory fusion caused by the difference between the RGB and TIR modalities, a Cross-Modal Feature Fusion Module (CMFM) was designed, in which the global features of RGB and TIR were further fused through an adaptive fusion strategy to improve tracking accuracy. The proposed tracking model was evaluated in detail on three datasets: GTOT, RGBT234 and LasHeR. Experimental results demonstrate that MMFFTN improves the success rate and precision rate by 3.0% and 4.7%, respectively, compared with the current advanced Transformer-based tracker ViPT, and by 2.4% and 3.3%, respectively, compared with the Transformer-based tracker SDSTrack.
Key words: RGB-T tracking; Transformer; channel feature fusion; cross-modal feature fusion
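The adaptive fusion strategy summarized in the abstract can be illustrated with a minimal sketch. The code below assumes a PyTorch-style implementation; the module name AdaptiveCrossModalFusion, the tensor shapes and the channel-wise gating design are illustrative assumptions rather than the authors' exact CMFM: a gate predicted from the global descriptors of both modalities weights the RGB and TIR features before they are merged.

```python
import torch
import torch.nn as nn

class AdaptiveCrossModalFusion(nn.Module):
    """Hypothetical sketch of adaptive RGB/TIR feature fusion.

    A per-channel gate, predicted from the concatenated global descriptors of
    the two modalities, decides how much each modality contributes to the
    fused feature map.
    """

    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Linear(2 * channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, 2 * channels),  # weights for RGB and TIR
        )

    def forward(self, rgb_feat: torch.Tensor, tir_feat: torch.Tensor) -> torch.Tensor:
        # rgb_feat, tir_feat: (B, C, H, W) feature maps from the two branches
        b, c, _, _ = rgb_feat.shape
        # Global descriptors of each modality (global average pooling)
        g_rgb = rgb_feat.mean(dim=(2, 3))                     # (B, C)
        g_tir = tir_feat.mean(dim=(2, 3))                     # (B, C)
        logits = self.gate(torch.cat([g_rgb, g_tir], dim=1))  # (B, 2C)
        w = logits.view(b, 2, c).softmax(dim=1)               # normalize across modalities
        w_rgb, w_tir = w[:, 0], w[:, 1]                       # (B, C) each
        # Weighted sum of the two modalities, broadcast over spatial dimensions
        return w_rgb[..., None, None] * rgb_feat + w_tir[..., None, None] * tir_feat

# Example usage with dummy features
if __name__ == "__main__":
    rgb = torch.randn(2, 256, 16, 16)
    tir = torch.randn(2, 256, 16, 16)
    fused = AdaptiveCrossModalFusion(channels=256)(rgb, tir)
    print(fused.shape)  # torch.Size([2, 256, 16, 16])
```

The gate is normalized across the two modalities, so when one modality is degraded (e.g. RGB at night) its contribution can be suppressed in favor of the other; the actual CMFM may of course differ in structure.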
1 Introduction
In recent years, moving target tracking, as one of the fundamental tasks in computer vision, has demonstrated broad application value in many fields such as robotics[1], UAV missions, autonomous driving, and video surveillance[2].