Multi-prototype driven graph neural network for speaker diarization

Abstract: Recently, the utilization of graph neural networks for session-level modeling has demonstrated its efficacy for speaker diarization. However, most existing variants rely solely on local structure information and ignore the importance of global speaker information, so they cannot fully compensate for the lack of speaker information in the speaker diarization task. This paper proposes a multi-prototype driven graph neural network (MPGNN) for representation learning, which effectively combines local and global speaker information within each session and simultaneously remaps x-vectors to a new embedding space that is more suitable for clustering. Specifically, the design of prototype learning with a dynamic and adaptive approach is a critical component, through which more accurate global speaker information can be captured. Experimental results show that the proposed MPGNN approach significantly outperforms the baseline systems, achieving diarization error rates (DER) of 3.33%, 3.52%, 5.66%, and 6.52% on the AMI_SDM and CALLHOME datasets, respectively.
Keywords: speaker diarization; graph neural network; local structure information; global speaker information; multi-prototype learning
0 Introduction
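Speaker diarization (SD) aims to answer the question of "who spoke when": given a long audio signal in which multiple speakers converse, it performs speaker identification and speaker localization simultaneously.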
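
To make the "who spoke when" formulation concrete, the following minimal sketch shows one common way a diarization result can be represented: a list of time-stamped segments, each tagged with a speaker label. The names `Segment`, `speakers_at`, `speaking_time`, and the example `hypothesis` are hypothetical and chosen only for illustration; they are not taken from the paper and do not reflect the MPGNN method itself.

```python
from typing import Dict, List, NamedTuple


class Segment(NamedTuple):
    """One diarization segment (hypothetical structure, for illustration only)."""
    start: float   # segment start time in seconds
    end: float     # segment end time in seconds (exclusive)
    speaker: str   # session-local speaker label


def speakers_at(segments: List[Segment], t: float) -> List[str]:
    """Return the sorted labels of all speakers active at time t.

    More than one label is returned when speech overlaps.
    """
    return sorted({s.speaker for s in segments if s.start <= t < s.end})


def speaking_time(segments: List[Segment]) -> Dict[str, float]:
    """Accumulate the total speaking time per speaker label."""
    totals: Dict[str, float] = {}
    for s in segments:
        totals[s.speaker] = totals.get(s.speaker, 0.0) + (s.end - s.start)
    return totals


# Illustrative, made-up output for a 12-second recording with two speakers.
hypothesis = [
    Segment(0.0, 4.0, "spk_A"),
    Segment(3.5, 9.0, "spk_B"),   # overlaps with spk_A between 3.5 s and 4.0 s
    Segment(9.0, 12.0, "spk_A"),
]

if __name__ == "__main__":
    print(speakers_at(hypothesis, 3.75))
    print(speaking_time(hypothesis))
```

In this representation, the speaker labels are only meaningful within one session, which is why evaluation metrics such as DER compare a hypothesis against the reference segmentation after mapping labels between the two.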