Research on an Open-World Multidimensional Feature Fusion Scene Graph Generation Algorithm

CLC number: TP391 Document code: A
Open-world Multidimensional Feature Fusion Scene Graph Generation
GU Feifan1, ZHOU Mengmeng2, SONG Shimiao1, GE Jiashang1, YANG Jie1 (1. College of Mechanical and Electrical Engineering, Qingdao University, Qingdao 266071, China; 2. Qingdao QCIT Technology Co., Ltd., Qingdao 266100, China)
Abstract: Open-world scene graph generation struggles to detect unknown objects and their relationships. To address this issue, a relation-reasoning model based on multidimensional feature fusion (MDFF) is proposed. The model is combined with an open-world object detector to form a two-stage open-world scene graph generation algorithm. First, the pretrained open-world object detector identifies objects in the input image. The MDFF model then infers relationships from the detection results. Comparative experiments between traditional methods and the MDFF model are conducted on the VG-150 dataset. The results indicate that the MDFF model achieves a 7% improvement in recall on the predicate classification task. Moreover, performance improves by 3% on the open-world scene graph generation and zero-shot inference tasks. Furthermore, ablation studies confirm the contribution of the different feature dimensions to model performance.
Keywords: scene graph generation; feature fusion; object detection; deep learning
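In open-world environments, scene graph generation is a complex task. In particular, when unknown scenes and unseen objects are encountered, generating accurate scene graphs that generalize well becomes the core research problem [1].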