A Review of Research on Value Alignment Evaluation of Artificial Intelligence Large Language Models

[CLC number] G250.7 [Document code] A [DOI] 10.19764/j.cnki.tsgjs.20250631

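[Cite this article as] . A Review of Research on Value Alignment Evaluation of Artificial Intelligence Large Language Models [J]. , 2025(5): 142-152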

Survey on Value Alignment Evaluation of AI Large Language Models

Pu Hongyu, He Yunfan, Zhao Xing

[Abstract] Sorting out the core indicators and methods for the value alignment evaluation of large language models can provide theoretical support for promoting the construction of safe and reliable AI systems and facilitate the safe deployment and application of large language models. By sorting out the value alignment evaluation indicators of large language models using the 4H framework (helpfulness, harmlessness, honesty, and [...]) [...] models (including question answering dataset testing and game theory evaluation) with dynamic evaluation models (including value alignment evaluation in red team mode, value alignment evaluation based on large language models, and value alignment evaluation in agent mode), we can clarify the current research focus of value alignment evaluation for AI large language models. The research methods of the two models have different [...]. For example, static evaluation [...] to interactive tests [...] scenario modeling leads to a detachment between evaluation conclusions and real-world performance. Based on this, the future research directions of value alignment evaluation for AI large language models should focus on three aspects: the paradigm innovation of automated alignment evaluation frameworks, the construction of value alignment evaluation mechanisms in [...] contexts, and the design of value alignment evaluation models in multimodal interaction scenarios.

[Keywords] Artificial intelligence; Large language model; Value alignment; Evaluation system

0 Introduction

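With the rapid rise of artificial intelligence, the question of its value alignment has drawn wide attention. At the Paris AI Action Summit in February 2025, 61 countries jointly signed the Paris AI Declaration, which stresses that, in the area of AI ethics, efforts must be made to ensure that AI systems follow human values and ethical norms and serve the public interest [1]. In other words, the value alignment of artificial intelligence must be achieved.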
