Tibetan music generation network based on an emotion-guided diffusion model


Keywords: Tibetan music generation; diffusion model; emotion guidance; token drop; self-conditioning
CLC number: TP183  Document code: A  Article ID: 1001-3695(2025)08-005-2283-07
doi: 10.19734/j.issn.1001-3695.2025.01.0014

Emotion-driven diffusion model for Tibetan music generation

Song Ziniu(a,b), Peng Chunyan(a,b)†, Wang Longhui(a,b), Zheng Yuhui(a,b) (a. College of Computer; b. State Key Laboratory of Tibetan Intelligence, Qinghai Normal University, Xining, China)

Abstract: Artificial intelligence has achieved remarkable progress in music creation, yet research on the automatic generation of Tibetan music remains limited. Current studies face three key challenges: inadequate expression of specific emotions, inefficiency in handling high-dimensional features, and insufficient contextual consistency in generated music. To address these issues, this paper proposed an emotion-driven diffusion model (EDDM) based on the VAE-diffusion framework. This model utilized a variational autoencoder (VAE) to extract essential latent features from audio data and modeled them during the diffusion process. EDDM introduced three core innovations: it embedded an emotion feature encoder via cross-attention to enable precise expression of Tibetan music's unique emotions and styles, introduced a token drop strategy to filter redundant features and enhance diversity and robustness, and proposed a self-conditioning mechanism to ensure contextual coherence by leveraging prior-step information for next-step generation. Experimental results show that EDDM achieves state-of-the-art performance, outperforming existing methods in objective metrics such as FAD (2.35↓), JSD (0.08↓), and NDB (18↑), while also excelling in subjective evaluations by producing music with strong emotional expression and feature consistency, showcasing its innovation and value in ethnic music generation. The emotionally guided Tibetan music generated in this work is publicly available at https://szn1998.github.io/.

Key words: Tibetan music generation; diffusion model; emotion-driven; token drop; self-conditioning

0 Introduction

In recent years, AI-empowered music generation has made significant progress [1,2].
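The token drop and self-conditioning ideas named in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the drop probability, and the toy sampler update are all assumptions made for illustration, and the denoiser is passed in as a hypothetical stand-in for the trained network.

```python
import random


def token_drop(tokens, drop_prob, rng):
    """Randomly drop tokens with probability drop_prob (assumed scheme).

    At least one token is always kept so the sequence is never empty.
    """
    kept = [t for t in tokens if rng.random() >= drop_prob]
    return kept if kept else [rng.choice(tokens)]


def sample_with_self_conditioning(x_T, steps, denoiser):
    """Toy sampling loop that feeds the previous clean-signal estimate
    back into the denoiser at every step (self-conditioning).

    denoiser(x_t, x0_prev) is a hypothetical callable returning a new
    estimate of the clean signal. The update x_t = x0_est is a toy
    simplification; a real diffusion sampler would re-noise the estimate
    according to its noise schedule.
    """
    x_t = list(x_T)
    x0_est = [0.0] * len(x_t)  # no estimate exists before the first step
    for _ in range(steps):
        x0_est = denoiser(x_t, x0_est)  # condition on the prior-step estimate
        x_t = list(x0_est)
    return x0_est
```

The design point in both functions is small: token drop shortens the sequence the model has to process, and self-conditioning costs only one extra input to the denoiser, since the previous estimate is already available from the preceding step.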


Application Research of Computers, 2025, Issue 08
