A Method for Checking Part-of-Speech Tagging Consistency in a Russian-Chinese Parallel Speech Corpus

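Keywords: Russian-Chinese parallel speech corpus; speech recognition; part-of-speech tagging; hidden Markov model; bidirectional recurrent neural network; consistency check
CLC number: TN911.6-34; TP183    Document code: A    Article ID: 1004-373X(2025)20-0142-05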

Method of part-of-speech tagging consistency check in Russian-Chinese parallel speech corpus

CHONG Huifang (National University of Defense Technology, Nanjing 210039, China)

Abstract: Russian-Chinese parallel speech corpora have extensive application value in language research, cross-cultural communication, language engineering and other fields. On this basis, a method for checking part-of-speech tagging consistency in a Russian-Chinese parallel speech corpus is proposed to ensure the accuracy and uniformity of part-of-speech tagging in the corpus, improve the quality of that tagging, improve the processing efficiency of Russian-Chinese bilingual tasks, and provide a reliable data basis for subsequent language research, machine translation and other applications. A bidirectional recurrent neural network is used to recognize the Russian-Chinese parallel speech in the corpus and convert the speech data into text data, forming Russian-Chinese parallel text. The set of part-of-speech tags for the Russian-Chinese parallel texts is obtained by means of a hidden Markov model, and the optimal part-of-speech tag sequence is solved by means of the Viterbi algorithm, so as to realize part-of-speech tagging of the Russian-Chinese parallel texts. Following the clustering idea, the barycentric clustering algorithm is used to judge whether the part-of-speech tagging of the Russian-Chinese parallel texts is consistent, so as to realize the consistency check of part-of-speech tagging in the Russian-Chinese parallel speech corpus. The experimental results show that the proposed method can effectively realize the consistency check of part-of-speech tagging in a Russian-Chinese parallel speech corpus, and has good accuracy and reliability.

Keywords: Russian-Chinese parallel speech corpus; speech recognition; part-of-speech tagging; hidden Markov model; bidirectional recurrent neural network; consistency check
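
The abstract's tagging step (a hidden Markov model decoded with the Viterbi algorithm) can be illustrated with a minimal sketch. This is not the article's implementation: the tag set, the three-word Russian sentence, every probability, and the smoothing floor below are hypothetical values chosen only to make the example runnable.

```python
import math

def viterbi(observations, states, start_p, trans_p, emit_p, floor=1e-12):
    """Return the most probable tag sequence for `observations` under an HMM.

    `floor` is a hypothetical smoothing value used for events that are
    missing from the probability tables (an assumption of this sketch).
    """
    if not observations:
        return []

    def logp(table, key):
        # Work in log space so long sentences do not underflow.
        return math.log(table.get(key, floor))

    # best[t][s]: log-probability of the best path that ends in tag s at step t
    best = [{s: logp(start_p, s) + logp(emit_p[s], observations[0])
             for s in states}]
    back = [{}]  # back[t][s]: predecessor tag on that best path

    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, best[t - 1][p] + logp(trans_p[p], s)) for p in states),
                key=lambda pair: pair[1],
            )
            best[t][s] = score + logp(emit_p[s], observations[t])
            back[t][s] = prev

    # Start from the best final tag and follow the back-pointers.
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path

# Toy model; all names and numbers here are illustrative assumptions.
TAGS = ["PRON", "VERB", "NOUN"]
START_P = {"PRON": 0.6, "NOUN": 0.3, "VERB": 0.1}
TRANS_P = {
    "PRON": {"PRON": 0.1, "VERB": 0.8, "NOUN": 0.1},
    "VERB": {"PRON": 0.2, "VERB": 0.1, "NOUN": 0.7},
    "NOUN": {"PRON": 0.2, "VERB": 0.5, "NOUN": 0.3},
}
EMIT_P = {
    "PRON": {"я": 0.9},
    "VERB": {"читаю": 0.8},
    "NOUN": {"книгу": 0.7},
}

sentence = ["я", "читаю", "книгу"]  # "I am reading a book"
tags = viterbi(sentence, TAGS, START_P, TRANS_P, EMIT_P)
print(list(zip(sentence, tags)))
```

In practice the start, transition, and emission tables would be estimated from an annotated corpus rather than written by hand, and the same decoder would be run separately on the Russian and Chinese sides of each sentence pair before any consistency comparison.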

0 Introduction

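As an important component of language resources, Russian-Chinese parallel corpora provide rich data support for Russian-Chinese contrastive language studies, machine translation, natural language understanding, and related work [1-2].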
