Parallelization of the Winograd Convolution Algorithm on GPU


CLC number: TP183 Document code: A Article ID: 1001-3695(2025)08-026-2446-06

doi:10.19734/j.issn.1001-3695.2024.11.0502

GPU-based parallelization of Winograd convolution algorithm

Wang Xin†, Zhen Xueru (Key Laboratory …)

Abstract: This paper proposes an innovative GPU-based parallel Winograd convolution algorithm to address the problem of excessive computational load in modern convolutional neural networks. The algorithm uses load-balanced task mapping, optimizes the data loading strategy to hide latency, and incorporates a dynamic padding method to fully exploit the synergy between the Winograd convolution algorithm and the GPU architecture. Experimental results show that, on multiple convolutional layers of the classic convolutional neural network model ResNet, the proposed algorithm outperforms the standard Winograd convolution algorithm in the NVIDIA cuDNN 8.3.0 library. It achieves a speed-up ratio of up to 2.46 on the Turing-architecture RTX 2080Ti GPU while maintaining high computational accuracy. Compared with the standard GPU-based Winograd convolution algorithm, the proposed algorithm significantly improves the efficiency of convolution computation.

Key words: Winograd algorithm; parallel computing; CUDA; convolutional neural network

0 Introduction

Convolutional neural networks (CNNs), a core technology in deep learning (DL), have been widely applied in many fields, such as image classification [1] and object segmentation [2].
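For readers unfamiliar with the Winograd convolution named in the abstract, the following is a minimal single-threaded sketch of the textbook one-dimensional F(2,3) case, which produces two outputs of a 3-tap filter with four multiplications instead of six. It is not the GPU algorithm proposed in this paper: the function names are hypothetical, and the tail handling is plain zero padding chosen for illustration, not the paper's dynamic padding method.

```python
def direct_corr(d, g):
    """Reference: sliding-window correlation, as used by CNN 'convolution' layers."""
    r = len(g)
    return [sum(d[i + k] * g[k] for k in range(r)) for i in range(len(d) - r + 1)]


def winograd_f23_tile(d, g):
    """Two outputs of a 3-tap filter over a 4-element tile, using 4 multiplications."""
    d0, d1, d2, d3 = d
    g0, g1, g2 = g
    # Filter transform (in practice computed once per filter and reused).
    u0, u1, u2, u3 = g0, (g0 + g1 + g2) / 2, (g0 - g1 + g2) / 2, g2
    # Input transform: additions and subtractions only.
    v0, v1, v2, v3 = d0 - d2, d1 + d2, d2 - d1, d1 - d3
    # Element-wise products: the only multiplications that depend on the input.
    m0, m1, m2, m3 = u0 * v0, u1 * v1, u2 * v2, u3 * v3
    # Output transform.
    return [m0 + m1 + m2, m1 - m2 - m3]


def winograd_corr_1d(d, g):
    """Tile a 1D signal (length >= 3) into overlapping 4-element tiles.

    Assumption for illustration: the tail is zero-padded so the number of
    outputs becomes a multiple of 2, and the extra output is discarded.
    """
    n_out = len(d) - 2
    padded = list(d) + [0] * (n_out % 2)
    out = []
    for i in range(0, n_out, 2):  # consecutive tiles overlap by 2 elements
        out.extend(winograd_f23_tile(padded[i:i + 4], g))
    return out[:n_out]
```

Each tile depends only on its own four inputs, so tiles can be computed independently of one another, which is what makes the method a natural candidate for mapping onto parallel GPU threads.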
