A Tibetan Pronunciation Similarity Calculation Method Based on Multi-Feature CNN and DTW
CLC number: TP311.1  Document code: A  Article ID: 2095-2945(2025)29-0010-0
Abstract: Speech similarity evaluation technology has become a hot spot in speech information processing research today. There is a relative lack of research on speech similarity evaluation for Tibetan and other minority languages in China. Based on this, this paper discusses using a hybrid model of one-dimensional convolutional neural network (1D-CNN) feature compression and dynamic time warping (DTW) alignment to design a multi-feature fusion framework, and uses contrastive loss and anti-noise training to solve the problems of feature redundancy, robustness to temporal deformation, and low-resource generalization in Tibetan pronunciation similarity calculation, finally realizing pronunciation similarity evaluation for commonly used Tibetan words. Experiments on a self-built dataset covering three major dialects show that the overall accuracy of the model reaches 93.2%, verifying the effectiveness of this method.
Keywords: speech similarity evaluation; one-dimensional convolutional neural network (1D-CNN); dynamic time warping; multi-feature fusion; Tibetan pronunciation
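This paper proposes a hybrid model based on 1D-CNN feature compression and DTW alignment to calculate Tibetan pronunciation similarity. The model not only handles feature redundancy and robustness to temporal deformation in Tibetan pronunciation similarity calculation effectively, but also copes well with the problem of low-resource generalization [1].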
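The DTW alignment step can be illustrated with a minimal sketch. It is not the paper's implementation: the input arrays stand in for the 1D-CNN-compressed feature sequences, and the function names `dtw_distance` and `similarity`, the Euclidean frame cost, the length normalization, and the `1 / (1 + d)` mapping to a score are all assumptions made here for illustration.

```python
import numpy as np


def dtw_distance(a, b):
    """Length-normalized DTW distance between two feature sequences.

    a has shape (n, d) and b has shape (m, d); each row is one frame.
    Assumption: frames are compared with plain Euclidean distance.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = len(a), len(b)

    # Pairwise frame-to-frame costs, shape (n, m).
    cost = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)

    # acc[i, j] = cheapest cumulative cost of aligning a[:i] with b[:j].
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best_prev = min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
            acc[i, j] = cost[i - 1, j - 1] + best_prev

    # Normalize so that long utterances are not penalized for their length.
    return acc[n, m] / (n + m)


def similarity(a, b):
    """Map a DTW distance to a score in (0, 1]; this mapping is hypothetical."""
    return 1.0 / (1.0 + dtw_distance(a, b))


if __name__ == "__main__":
    reference = np.array([[0.0], [1.0], [2.0]])
    # Same contour spoken at half speed: every frame is repeated once.
    slow = np.array([[0.0], [0.0], [1.0], [1.0], [2.0], [2.0]])
    different = np.array([[5.0], [5.0], [5.0]])
    print(similarity(reference, slow))
    print(similarity(reference, different))
```

Because DTW lets one frame match several consecutive frames in the other sequence, the half-speed utterance aligns with the reference at zero cost, while a sequence with a different contour receives a lower score. This tolerance to differences in speaking rate is the kind of temporal deformation robustness that the alignment step is meant to provide.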