Multi-prototype driven graph neural network method for speaker diarization


Multi-prototype driven graph neural network for speaker diarization

Abstract: Recently, the use of graph neural networks for session-level modeling has demonstrated its efficacy for speaker diarization. However, most existing variants rely solely on local structure information and ignore the importance of global speaker information, so they cannot fully compensate for the lack of speaker information in the speaker diarization task. This paper proposes a multi-prototype driven graph neural network (MPGNN) for representation learning, which effectively combines local and global speaker information within each session and simultaneously remaps x-vectors to a new embedding space that is more suitable for clustering. Specifically, prototype learning with a dynamic and adaptive design is a critical component that allows more accurate global speaker information to be captured. Experimental results show that the proposed MPGNN approach significantly outperforms the baseline systems, achieving diarization error rates (DER) of 3.33%, 3.52%, 5.66%, and 6.52% on the AMI_SDM and CALLHOME datasets, respectively.
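The idea of mixing local and global speaker information can be illustrated with a toy, hand-written sketch. It is not the MPGNN model itself: there are no learned weights, and the function names (`refine`, `update_prototypes`), the neighbour count `k`, the mixing weight `alpha`, and the k-means-style prototype update are all assumptions made for illustration only.

```python
import math


def normalize(v):
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]


def cosine(a, b):
    """Dot product; equals cosine similarity for unit-norm inputs."""
    return sum(x * y for x, y in zip(a, b))


def mean(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def refine(embeddings, prototypes, k=2, alpha=0.5):
    """One toy propagation step (hypothetical, not the paper's layer).

    Each segment embedding is mixed with
      - the mean of its k most similar segments (local structure), and
      - its most similar prototype (global speaker information).
    """
    emb = [normalize(e) for e in embeddings]
    protos = [normalize(p) for p in prototypes]
    refined = []
    for i, e in enumerate(emb):
        others = sorted(
            (j for j in range(len(emb)) if j != i),
            key=lambda j: cosine(e, emb[j]),
            reverse=True,
        )
        neighbours = [emb[j] for j in others[:k]]
        local = mean(neighbours) if neighbours else e
        global_ = max(protos, key=lambda p: cosine(e, p))
        mixed = [x + alpha * l + (1 - alpha) * g
                 for x, l, g in zip(e, local, global_)]
        refined.append(normalize(mixed))
    return refined


def update_prototypes(embeddings, prototypes):
    """Re-estimate each prototype as the mean of the embeddings assigned
    to it (a k-means-style update, assumed here for illustration)."""
    emb = [normalize(e) for e in embeddings]
    protos = [normalize(p) for p in prototypes]
    groups = [[] for _ in protos]
    for e in emb:
        idx = max(range(len(protos)), key=lambda c: cosine(e, protos[c]))
        groups[idx].append(e)
    # Prototypes with no assigned embeddings are kept as they were.
    return [normalize(mean(g)) if g else p for g, p in zip(groups, protos)]


# Four 2-D "segment embeddings" from two well-separated speakers.
segments = [[1.0, 0.1], [1.0, -0.1], [0.1, 1.0], [-0.1, 1.0]]
prototypes = update_prototypes(segments, [[1.0, 0.0], [0.0, 1.0]])
refined = refine(segments, prototypes, k=1)
```

In this toy setting, alternating `update_prototypes` and `refine` pulls same-speaker segments together and pushes different-speaker segments apart, which is the property that makes a remapped embedding space easier to cluster.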

Keywords: speaker diarization; graph neural network; local structure information; global speaker information; multi-prototype learning

0 Introduction

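Speaker diarization (SD) aims to answer the question of "who spoke when": given a long audio signal in which multiple speakers interact, it performs speaker identification and speaker localization at the same time.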
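A "who spoke when" output is commonly scored with the diarization error rate (DER) quoted in the abstract: missed speech, false-alarm speech, and speaker confusion, divided by the total reference speech time. The sketch below computes a simplified frame-level DER. It assumes one label per frame with `None` for non-speech, no overlapping speech, and no forgiveness collar, and it finds the label mapping by brute force, so it only suits a handful of speakers. The function name `frame_der` is hypothetical, and this is not the scoring setup used in the paper.

```python
from itertools import permutations


def frame_der(reference, hypothesis):
    """Simplified frame-level DER.

    reference, hypothesis: equal-length lists with one speaker label per
    frame, or None where nobody is speaking.
    """
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must have equal length")

    total_speech = sum(r is not None for r in reference)
    if total_speech == 0:
        raise ValueError("reference contains no speech")

    pairs = list(zip(reference, hypothesis))
    missed = sum(r is not None and h is None for r, h in pairs)
    false_alarm = sum(r is None and h is not None for r, h in pairs)
    both = [(r, h) for r, h in pairs if r is not None and h is not None]

    ref_speakers = sorted({r for r in reference if r is not None})
    hyp_speakers = sorted({h for h in hypothesis if h is not None})

    # Try every one-to-one mapping from hypothesis labels to reference
    # labels; the None padding lets a hypothesis label stay unmatched.
    targets = ref_speakers + [None] * len(hyp_speakers)
    best_correct = 0
    for perm in permutations(targets, len(hyp_speakers)):
        mapping = dict(zip(hyp_speakers, perm))
        correct = sum(mapping[h] == r for r, h in both)
        best_correct = max(best_correct, correct)

    confusion = len(both) - best_correct
    return (missed + false_alarm + confusion) / total_speech


reference = ["A", "A", "A", "B", "B", None]
hypothesis = ["x", "x", "y", "y", "y", "y"]
der = frame_der(reference, hypothesis)  # one false alarm + one confusion over 5 speech frames
```

Because the hypothesis labels are arbitrary cluster names, the error is measured under the mapping that is most favourable to the system, which is why the search over label assignments is needed.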


Application Research of Computers

2025, Issue 6
