Noise-Robust Speech Recognition for the Lhasa Dialect with Joint CTC/Attention Decoding

CLC number: TP319    Document code: A
Abstract: Lhasa Dialect is an important branch of the Tibetan language, and research on speech recognition of Lhasa Dialect holds significant theoretical and practical value. Addressing the issues of scarce research on noisy Lhasa Dialect speech and high Word Error Rate (WER), this study integrates speech enhancement algorithms and speech recognition models into a noise-robust speech recognition system for Lhasa Dialect. Meanwhile, it improves the Conformer speech recognition model by introducing the CTC/Attention joint decoding mechanism, forming the Conformer-CTC/Attention speech recognition model to further reduce WER. Experimental results show that the constructed Conformer-CTC/Attention speech recognition model achieves a WER of 25.46% on the public Tibetan language speech dataset, which is generally lower than that of mainstream models. Furthermore, the WER of the constructed noise-robust speech recognition model for Lhasa Dialect on noisy speech is reduced by 8.9% compared with that of non-noise-robust speech recognition models.
Keywords: Lhasa Dialect; Speech Recognition; Conformer; Speech Enhancement
0 Introduction
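The Lhasa dialect is not only an important branch of Tibetan but also the common language of radio and television broadcasting, and it occupies a central place in the dissemination of Tibetan culture and in everyday spoken interaction.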