Hierarchical Video Summarization Algorithm Based on Independent Recurrent Neural Networks

Abstract: Existing video summarization algorithms neglect shot-level information and therefore struggle to preserve the integrity of activities, and traditional RNNs and LSTMs capture long-range dependencies poorly in lengthy videos. To address these limitations, this paper proposes a hierarchical video summarization algorithm based on independent recurrent neural networks (HIRVS) that leverages the inherent hierarchical structure of video sequences. Specifically, HIRVS consists of three components: (1) an independent recurrent neural network (IndRNN) generates the visual feature of each shot, where the final hidden state represents a temporally weighted aggregation of all frame features within that shot; (2) a bidirectional IndRNN models the temporal relationships of the shot-level feature sequence, capturing long-range dependencies between shots; (3) a self-attention video encoder extracts global dependencies across the entire video.
Key shots are then selected based on predicted importance scores to generate the video summary. Experiments are conducted on two public datasets, SumMe and TvSum. On SumMe, HIRVS achieves an F-score of 51.0%, a 1.2% improvement over VOGNet. On TvSum, it achieves an F-score of 61.3%, surpassing the current state-of-the-art method VJMHT by 0.3%. The experimental results validate the effectiveness of HIRVS for video summarization and demonstrate improved summary generation efficiency.
Keywords: video summarization; independent recurrent neural networks; hierarchical structure; self-attention networks
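The three components described in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the feature dimensions, the random initialization, the fixed shot boundaries, the concatenation of forward and backward states, and the projection-free single-head attention are all assumptions made for illustration, and the helper names (`indrnn_step`, `encode_shots`, `bi_indrnn`, `self_attention`) are hypothetical. The only part taken from the IndRNN formulation itself is the recurrence h_t = ReLU(W x_t + u ⊙ h_{t-1} + b), in which the recurrent weight u is a vector, so each hidden unit depends only on its own previous state.

```python
import math
import random


def indrnn_step(x, h_prev, W, u, b):
    """One IndRNN step: h_t = relu(W x_t + u * h_{t-1} + b), u applied element-wise."""
    return [
        max(0.0, sum(W[i][j] * x[j] for j in range(len(x))) + u[i] * h_prev[i] + b[i])
        for i in range(len(u))
    ]


def indrnn(seq, W, u, b, reverse=False):
    """Run an IndRNN over a sequence; hidden states are returned in input order."""
    h = [0.0] * len(u)
    states = []
    for x in (reversed(seq) if reverse else seq):
        h = indrnn_step(x, h, W, u, b)
        states.append(h)
    return states[::-1] if reverse else states


def encode_shots(frames, shot_bounds, W, u, b):
    """Component (1): the final hidden state of each shot is its shot feature."""
    return [indrnn(frames[s:e], W, u, b)[-1] for s, e in shot_bounds]


def bi_indrnn(shots, fwd, bwd):
    """Component (2): concatenate forward and backward states (an assumed merge rule)."""
    forward = indrnn(shots, *fwd)
    backward = indrnn(shots, *bwd, reverse=True)
    return [f + r for f, r in zip(forward, backward)]


def self_attention(X):
    """Component (3): scaled dot-product self-attention with Q = K = V = X (assumed)."""
    d = len(X[0])
    out = []
    for q in X:
        scores = [sum(a * k_i for a, k_i in zip(q, k)) / math.sqrt(d) for k in X]
        m = max(scores)  # subtract the max for numerical stability
        w = [math.exp(s - m) for s in scores]
        z = sum(w)
        out.append([sum(w[j] / z * X[j][i] for j in range(len(X))) for i in range(d)])
    return out


def init_params(in_dim, hid_dim, rng):
    """Hypothetical random initialization; a trained model would learn these."""
    W = [[rng.uniform(-0.5, 0.5) for _ in range(in_dim)] for _ in range(hid_dim)]
    u = [rng.uniform(0.0, 1.0) for _ in range(hid_dim)]
    return W, u, [0.0] * hid_dim


rng = random.Random(0)
frames = [[rng.random() for _ in range(4)] for _ in range(10)]  # 10 frames, 4-dim features
shot_bounds = [(0, 3), (3, 7), (7, 10)]                         # assumed shot segmentation
shot_feats = encode_shots(frames, shot_bounds, *init_params(4, 6, rng))
context = bi_indrnn(shot_feats, init_params(6, 5, rng), init_params(6, 5, rng))
encoded = self_attention(context)
print(len(encoded), len(encoded[0]))
```

In a full model, a regression head on top of `encoded` would predict one importance score per shot; that head and the training objective are omitted here because the abstract does not specify them.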
1 Introduction
With the arrival of the 5G era and the widespread adoption of smart devices, the volume of video data has grown explosively at an unprecedented rate [1].