Audio Forgery Detection Method Based on Feature Fusion

Keywords: audio deepfake detection; deep learning; feature fusion; vocoder artifacts

CLC number: TN912.3  Document code: A  Article ID: 1001-3695(2025)07-025-2109-07

doi: 10.19734/j.issn.1001-3695.2024.11.0460

Abstract: Advancements in artificial intelligence have made distinguishing synthesized speech from genuine speech increasingly challenging, complicating audio deepfake detection. Existing methods often exhibit low accuracy, poor generalization, and weak robustness. This study proposed MFF-STViT, a method integrating three audio features with vocoder artifact features through a novel feature fusion module to enhance representation. The fused features were processed using an improved Transformer model, STViT, to reduce redundancy and improve detection performance. On the ASVspoof 2019 LA test set, the method reduced the equal error rate (EER) by 71.38% on average. On the ASVspoof 2021 LA dataset, it achieved average reductions of 44.41% in EER and 18.11% in the minimum tandem detection cost function (min-tDCF). For the ASVspoof 2021 DF dataset, the average EER decreased by 57.81%, with reductions exceeding 80% in specific partitions. These findings demonstrate the effectiveness of MFF-STViT in improving accuracy, generalization, and robustness.

Keywords: audio deepfake detection; deep learning; feature fusion; vocoder artifacts
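
The abstract reports results as reductions in the equal error rate (EER). As a reading aid, the following minimal sketch shows one common way to estimate EER from detector scores, plus a relative-reduction helper. It is not taken from the paper: the function names `compute_eer` and `relative_reduction` are hypothetical, the convention that higher scores mean "more likely bona fide" is an assumption, and reading the reported percentages as relative reductions against a baseline is also an assumption. The scores below are toy values, not results from the study.

```python
import numpy as np


def compute_eer(bonafide_scores, spoof_scores):
    """Estimate the equal error rate (assumes higher score = more bona fide)."""
    bonafide = np.asarray(bonafide_scores, dtype=float)
    spoof = np.asarray(spoof_scores, dtype=float)
    # Every observed score is a candidate decision threshold.
    thresholds = np.sort(np.concatenate([bonafide, spoof]))
    # False rejection rate: bona fide trials scored below the threshold.
    frr = np.array([(bonafide < t).mean() for t in thresholds])
    # False acceptance rate: spoof trials scored at or above the threshold.
    far = np.array([(spoof >= t).mean() for t in thresholds])
    # The EER is taken where the two error rates are closest to each other.
    idx = np.argmin(np.abs(far - frr))
    return (far[idx] + frr[idx]) / 2.0


def relative_reduction(baseline, proposed):
    """Relative reduction in percent (assumed reading of 'reduced by X%')."""
    return 100.0 * (baseline - proposed) / baseline


# Toy scores for illustration only.
toy_bonafide = [0.9, 0.6, 0.4, 0.8]
toy_spoof = [0.1, 0.5, 0.7, 0.2]
toy_eer = compute_eer(toy_bonafide, toy_spoof)
```

Official ASVspoof scoring interpolates along the detection error trade-off curve, so this threshold sweep should be treated as an approximation that is adequate for small experiments.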

0 Introduction

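In recent years, automatic speaker verification (ASV) systems have been widely used in voicemail, telephone banking, call centers, biometric authentication, forensic applications, and other fields because of their simple acquisition, high specificity, and low cost [1].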


Application Research of Computers, 2025, Issue 07
