Survey on Value Alignment Evaluation of AI Large Language Models

[CLC number] G250.7 [Document code] A [DOI] 10.19764/j.cnki.tsgjs.20250631
[Citation format] .Survey on Value Alignment Evaluation of AI Large Language Models[J].,2025(5):142-152
Survey on Value Alignment Evaluation of AI Large Language Models
Pu Hongyu, He Yunfan, Zhao Xing
[Abstract] Sorting out the core indicators and methods for the value alignment evaluation of large language models can provide theoretical support for promoting the construction of safe and reliable AI systems and facilitate the safe deployment and application of large language models. By sorting out the value alignment evaluation indicators of large language models using the 4H framework (helpfulness, harmlessness, honesty and [...]) and comparing static evaluation models (including question answering dataset testing and game theory evaluation) with dynamic evaluation models (including value alignment evaluation in red team mode, value alignment evaluation based on large models, and value alignment evaluation in agent mode), we can clarify the current research focus of value alignment evaluation for AI large language models. The research methods of the two models have different [...]. For example, static evaluation [...] interactive tests [...], and [...] scenario modeling leads to a detachment between evaluation conclusions and real-world performance. Based on this, the future research directions of value alignment evaluation for AI large language models should focus on three aspects: the paradigm innovation of automated alignment evaluation frameworks, the construction of value alignment evaluation mechanisms in cross-cultural contexts, and the design of value alignment evaluation models in multimodal interaction scenarios.
[Keywords] Artificial intelligence; Large language model; Value alignment; Evaluation system
0 Introduction
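As artificial intelligence has steadily boomed, the problem of its value alignment has drawn wide attention. At the Paris AI Action Summit in February 2025, 61 countries jointly signed the Paris Declaration on Artificial Intelligence, which emphasizes that, in the area of AI ethics, efforts must be made to ensure that AI systems follow human values and ethical principles and serve the public interest [1]; in other words, that the value alignment of artificial intelligence must be achieved.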