A Review of Research on Value Alignment Evaluation of AI Large Language Models

[CLC Number] G250.7 [Document Code] A [DOI] 10.19764/j.cnki.tsgjs.20250631

[Citation format] Survey on Value Alignment Evaluation of AI Large Language Models[J]. 2025(5): 142-152

Survey on Value Alignment Evaluation of AI Large Language Models

Pu Hongyu, He Yunfan, Zhao Xing

[Abstract] Sorting out the core indicators and methods for the value alignment evaluation of large language models can provide theoretical support for promoting the construction of safe and reliable AI systems and facilitate the safe deployment and application of large language models. By sorting out the value alignment evaluation indicators of large language models using the 4H framework (helpfulness, harmlessness, honesty, and […]) and comparing static evaluation models (including question-answering dataset testing and game theory evaluation) with dynamic evaluation models (including value alignment evaluation in red team mode, value alignment evaluation based on large language models, and value alignment evaluation in agent mode), we can clarify the current research focus of value alignment evaluation for AI large language models. The research methods of the two models have different functional applications. For example, static evaluation depends on test datasets, while dynamic evaluation is suited to interactive tests; […] scenario modeling leads to a detachment between evaluation conclusions and real-world performance. Based on this, the future research directions of value alignment evaluation for AI large language models should focus on three aspects: the paradigm innovation of automated alignment evaluation frameworks, the construction of value alignment evaluation mechanisms in cross-cultural contexts, and the design of value alignment evaluation models in multimodal interaction scenarios.

[Keywords] Artificial intelligence; Large language model; Value alignment; Evaluation system

0 Introduction

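As artificial intelligence has progressively taken off, the problem of its value alignment has drawn wide attention. At the Paris AI Action Summit in February 2025, 61 countries jointly signed the Paris Artificial Intelligence Declaration (《巴黎人工智能宣言》), which emphasizes that, with respect to AI ethical norms, signatories should work to ensure that AI systems follow human values and moral principles and serve the public interest [1]. In other words, the value alignment of artificial intelligence must be achieved.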

图书馆建设 (Library Development)

2025, Issue 5
