Audio Forgery Detection Method Based on Feature Fusion

Key words: audio deepfake detection; deep learning; feature fusion; vocoder artifacts

CLC number: TN912.3  Document code: A  Article ID: 1001-3695(2025)07-025-2109-07

doi:10.19734/j.issn.1001-3695.2024.11.0460

Abstract: Advancements in artificial intelligence have made distinguishing synthesized speech from genuine speech increasingly challenging, complicating audio deepfake detection. Existing methods often exhibit low accuracy, poor generalization, and weak robustness. This study proposed MFF-STViT, a method integrating three audio features with vocoder artifact features through a novel feature fusion module to enhance representation. The fused features were processed using an improved Transformer model, STViT, to reduce redundancy and improve detection performance. On the ASVspoof2019 LA test set, the method reduced the equal error rate (EER) by 71.38% on average. On the ASVspoof2021 LA dataset, it achieved average reductions of 44.41% in EER and 18.11% in the minimum tandem detection cost function (min-tDCF). For the ASVspoof2021 DF dataset, the average EER decreased by 57.81%, with reductions exceeding 80% in specific partitions. These findings demonstrate the effectiveness of MFF-STViT in improving accuracy, generalization, and robustness.

Keywords: audio deepfake detection; deep learning; feature fusion; vocoder artifacts

0 Introduction

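In recent years, automatic speaker verification (ASV) systems have been widely used in voicemail, telephone banking, call centers, biometric authentication, forensic applications, and other fields, owing to their simple acquisition, high specificity, and low cost [1].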
