GPU-Based Parallelization of the Winograd Convolution Algorithm

CLC number: TP183  Document code: A  Article ID: 1001-3695(2025)08-026-2446-06

doi: 10.19734/j.issn.1001-3695.2024.11.0502

GPU-based parallelization of Winograd convolution algorithm

Wang Xin†, Zhen Xueru (Key Laboratory …)

Abstract: This paper proposes an innovative GPU-based parallel Winograd convolution algorithm to address the problem of excessive computational load in modern convolutional neural networks. The algorithm uses load-balanced task mapping, optimizes the data loading strategy to hide latency, and incorporates a dynamic padding method to fully exploit the synergy between the Winograd convolution algorithm and the GPU architecture. Experimental results show that, on multiple convolutional layers of the classic convolutional neural network model ResNet, the proposed algorithm outperforms the standard Winograd convolution algorithm in the NVIDIA cuDNN 8.3.0 library. It achieves a speed-up ratio of up to 2.46 on the Turing-architecture RTX 2080Ti GPU while maintaining high computational accuracy. Compared with the standard GPU-based Winograd convolution algorithm, the proposed algorithm significantly improves the efficiency of convolution computation.

Key words: Winograd algorithm; parallel computing; CUDA; convolutional neural network
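
For readers unfamiliar with the Winograd algorithm named in the abstract, the following is a minimal one-dimensional sketch of the textbook F(2,3) minimal-filtering form, which produces two outputs of a 3-tap filter from four inputs using four multiplications instead of six. It is an illustration only and not the paper's GPU implementation: the function names (`winograd_f23`, `winograd_conv1d`, `direct_conv1d`) are hypothetical, and the zero padding used for odd output counts is a simple stand-in, not the dynamic padding method the abstract refers to.

```python
def winograd_f23(d, g):
    """Two outputs of a 3-tap filter over 4 inputs, using 4 multiplications."""
    d0, d1, d2, d3 = d
    g0, g1, g2 = g
    # Filter transform; in practice this is precomputed once per filter.
    u0 = g0
    u1 = (g0 + g1 + g2) / 2
    u2 = (g0 - g1 + g2) / 2
    u3 = g2
    # Input transform followed by the four element-wise multiplications.
    m1 = (d0 - d2) * u0
    m2 = (d1 + d2) * u1
    m3 = (d2 - d1) * u2
    m4 = (d1 - d3) * u3
    # Output transform.
    return [m1 + m2 + m3, m2 - m3 - m4]


def direct_conv1d(signal, g):
    """Reference: sliding-window (cross-correlation) form with 3 taps."""
    return [sum(signal[i + k] * g[k] for k in range(3))
            for i in range(len(signal) - 2)]


def winograd_conv1d(signal, g):
    """Tile the signal into overlapping 4-element tiles with stride 2."""
    n_out = len(signal) - 2
    # Assumption: pad with one zero when the output count is odd, so the
    # last tile is complete. The surplus output is discarded afterwards.
    padded = list(signal) + [0] * (n_out % 2)
    out = []
    for i in range(0, n_out, 2):
        out.extend(winograd_f23(padded[i:i + 4], g))
    return out[:n_out]


if __name__ == "__main__":
    signal = [1, 2, 3, 4, 5, 6, 7]
    taps = [1, 2, 3]
    print(winograd_conv1d(signal, taps))
    print(direct_conv1d(signal, taps))
```

Each tile is independent of the others, which is what makes the method amenable to parallel mapping onto GPU threads. The two-dimensional variants used in CNN layers nest the same transforms along rows and columns.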

0 Introduction

Convolutional neural networks (CNNs), a core technique in deep learning (DL), have been widely applied in many fields, including image classification [1] and object segmentation [2].


Application Research of Computers (计算机应用研究), 2025, Issue 08
