A Transformer-based visual tracker via knowledge distillation

LI Na*, LIU Mengqiao, PAN Jinting, HUANG Kai, JIA Xingxuan (School of Communication and Information Engineering, Xi'an University of Posts and Telecommunications, Xi'an 710121, China) *Corresponding author, E-mail: linall4@xupt.edu.cn

Abstract: To achieve high-precision, real-time tracking with limited computing resources, a Transformer-based visual tracker via knowledge distillation was proposed. An image dynamic correction module fused the search image of the current frame with an image predicted from optical flow, which could effectively deal with challenges such as fast motion and motion blur. To reduce model complexity, a knowledge distillation learning strategy was adopted to compress the model. By introducing homoscedastic uncertainty into the loss function, the loss weights of the different subtasks could be learned by the network, thereby avoiding cumbersome and difficult manual parameter tuning. Additionally, a random blurring strategy was employed while training the student network to enhance model robustness. Two tracking frameworks of different complexity, named KTransT-T and KTransT, were proposed and compared with 12 algorithms on 5 public datasets. Experimental results show that KTransT-T has significant advantages in precision and success rate, while KTransT has lower model complexity and competitive tracking performance. KTransT runs at up to 158 frames per second, which meets the requirements of real-time tracking.

Key words: computer vision; object tracking; transformer; knowledge distillation; homoscedastic uncertainty
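
The abstract states that the subtask loss weights are learned through homoscedastic uncertainty but does not give the formula. As a minimal sketch, assuming the widely used simplified form in which each subtask loss L_i is scaled by exp(-s_i) and regularized by s_i (with s_i the learned log-variance of subtask i), the combined loss could be computed as below. The function name and arguments are hypothetical, and the paper's actual loss may differ.

```python
import math


def uncertainty_weighted_loss(losses, log_vars):
    """Combine subtask losses with per-task log-variances s_i = log(sigma_i^2).

    Each term is exp(-s_i) * L_i + s_i. A large s_i down-weights its subtask,
    while the additive s_i term penalizes letting s_i grow without bound.
    This is an assumed, simplified formulation, not the paper's confirmed loss.
    """
    if len(losses) != len(log_vars):
        raise ValueError("need exactly one log-variance per subtask loss")
    return sum(math.exp(-s) * loss + s for loss, s in zip(losses, log_vars))


# Hypothetical subtask losses (e.g. classification and box regression).
total = uncertainty_weighted_loss([1.0, 2.0], [0.0, 0.0])
```

In training, the log-variances would be trainable parameters updated by the optimizer together with the network weights; plain floats are used here only to keep the sketch self-contained.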

1 Introduction

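Visual object tracking refers to predicting and updating the position and state of a target of interest in subsequent frames, given its initial information in the first frame of a video.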
