Cross-modal Monocular Metric Depth Estimation Algorithm Based on Text-Prior Scale Extraction

CLC number: TP391    Document code: A

Textual Scale Prior Guided Cross-modal Monocular Metric Depth Estimation Algorithm

TAN Li, WANG Wenlong, LIU Zhenyu, YANG Jie

Abstract: Precise depth recovery from image information alone is hindered by the loss of scale information caused by perspective projection. To address this limitation, a Textual Prior Guided Scale Extraction Cross-Modal Monocular Metric Depth Estimation method (TPSE-MMDE) is proposed, which consists of three parts: a scale extraction module, a relative depth estimation module, and an absolute depth estimation module. The scale extraction module in turn includes a scene feature extraction submodule, a scale prediction submodule, and a shift prediction submodule. The scene feature extraction submodule decouples scene-level global features from high-dimensional semantics. The scale and shift prediction submodules then separately predict the corresponding parameters required for absolute depth estimation. Once relative depth features are obtained from the relative depth estimation model, the absolute (metric) depth estimation module applies a linear transformation in inverse depth to map the relative features into absolute depth with real-world physical units. Experiments are conducted on the indoor dataset NYU Depth V2 and the outdoor dataset KITTI. Results indicate that key evaluation metrics improve over the baseline model. Furthermore, the inference frame rate is increased by 107% and the average inference time is reduced by 51.7%, enhancing the accuracy and robustness of the model.
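
The inverse depth linear transformation mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formula, so the form used here, metric depth = 1 / (scale × relative inverse depth + shift), is an assumption based on the common affine-in-inverse-depth convention for relative depth models. The function name `relative_to_metric_depth`, the `eps` clamp, and the numeric values of `scale` and `shift` are hypothetical; in TPSE-MMDE these two parameters would come from the scale and shift prediction submodules rather than being hard-coded.

```python
def relative_to_metric_depth(rel_inv_depth, scale, shift, eps=1e-6):
    """Map a relative inverse-depth map to metric depth (assumed form).

    Assumption: metric_depth = 1 / (scale * r + shift), where r is the
    relative inverse depth at a pixel. This is an illustrative convention,
    not the formula confirmed by the paper.
    """
    metric = []
    for row in rel_inv_depth:
        out_row = []
        for r in row:
            # Linear (affine) transform in inverse-depth space.
            inv = scale * r + shift
            # Clamp to avoid division by zero or negative inverse depth.
            inv = max(inv, eps)
            out_row.append(1.0 / inv)
        metric.append(out_row)
    return metric


# Hypothetical 2x2 relative inverse-depth map (larger value = closer).
rel = [[1.0, 0.5],
       [0.25, 0.0]]

# Hypothetical scale and shift, standing in for the predicted parameters.
depth = relative_to_metric_depth(rel, scale=0.4, shift=0.1)
```

Because the transform is applied in inverse-depth space, pixels with larger relative inverse depth map to smaller metric distances, and the shift term bounds the maximum recoverable depth at 1 / shift.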

Keywords: textual prior; monocular metric depth estimation; cross-modal; deep learning; computer vision

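With the rise of deep learning, monocular depth estimation has been widely applied in tasks such as robot navigation, autonomous driving, and 3D reconstruction [1-2].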
