A Pure Convolutional Video Prediction Model Fused with the U-net Network


CLC number: TP391.41  Document code: A

DOI: 10.7652/xjtuxb202506012  Article ID: 0253-987X(2025)06-0112-10

A Pure Convolutional Model Fused with U-net Network for Video Prediction

XIE Yumei¹, CAI Yuanli², GAO Haiyan³, GUANG Xiangfeng¹, TANG Weiqiang⁴
(1. School of Electronic Information Science, Fujian Jiangxia University, Fuzhou 350108, China; 2. Faculty of Electrical and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, China; 3. School of Electrical Engineering and Automation, Xiamen University of Technology, Xiamen, Fujian 361024, China; 4. College of Electrical and Information Engineering, Lanzhou University of Technology, Lanzhou 730050, China)

Abstract: To address insufficient spatiotemporal feature extraction and inadequate preservation of image detail in deep learning-based video prediction, a pure convolutional video prediction model (CUnet) is proposed that fuses the U-net network with the Inception unit from the SimVP model. The CUnet model consists of three core modules. First, the Cell module uses 2D convolutional layers to extract spatial features and feeds them into multiple Inception units to capture spatiotemporal features. Second, the DeCell module captures spatiotemporal features through Inception units and upsamples with 2D deconvolutional layers to restore the original image size. Finally, U-net is introduced as the backbone network to integrate the Cell and DeCell modules, preserving image detail and achieving high-quality image reconstruction. Experimental results show that on the TaxiBJ dataset, the prediction accuracy of the CUnet model is 5.23% higher than that of the TAU model, currently the best-performing model; on the Human3.6M dataset, it is 12.88% higher than that of the FFINet model, currently the best-performing model. The CUnet model demonstrates exceptional predictive capability and offers valuable insights for applying pure convolutional neural networks to video prediction.

Keywords: deep learning; video prediction; spatiotemporal features; U-net; pure convolutional neural network
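
The abstract states that the DeCell module uses 2D deconvolutional layers to restore the original image size. The sketch below illustrates how that restoration can work out arithmetically, assuming the Cell convolutions are strided. It applies the standard output-size formulas for a 2D convolution and a 2D transposed convolution. The kernel size, stride, padding, depth, and the 32-pixel input are all illustrative assumptions and are not taken from the paper; the function names are hypothetical.

```python
# Hypothetical shape walk-through; every hyperparameter here is an assumption,
# not a value reported for CUnet.

def conv2d_out(size, kernel=3, stride=2, padding=1):
    """Output length of a 2D convolution along one spatial axis."""
    return (size + 2 * padding - kernel) // stride + 1


def deconv2d_out(size, kernel=3, stride=2, padding=1, output_padding=1):
    """Output length of a 2D transposed convolution along one spatial axis."""
    return (size - 1) * stride - 2 * padding + kernel + output_padding


def unet_shapes(size, depth):
    """Trace spatial sizes down `depth` encoder stages and back up.

    Returns (encoder_sizes, decoder_sizes). A decoder stage and an encoder
    stage with the same size are where a U-net skip connection could join
    the two feature maps.
    """
    encoder = [size]
    for _ in range(depth):
        encoder.append(conv2d_out(encoder[-1]))
    decoder = [encoder[-1]]
    for _ in range(depth):
        decoder.append(deconv2d_out(decoder[-1]))
    return encoder, decoder


enc, dec = unet_shapes(32, depth=2)
print(enc)  # [32, 16, 8]
print(dec)  # [8, 16, 32]
```

With these assumed settings each decoder size matches an encoder size, so feature maps at equal resolution can be combined without cropping, and the final output returns to the input size.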

Video prediction aims to accurately predict future frames by learning from historical frames.
