Research and Practice on Integrated Evaluation of Large Language Models

CLC number: TP391.1
Document code: A    Article ID: 2096-4706(2025)11-0059-06
HE Qi, HAN Xiao, MAO Haotian, QIU Jianmin (China Telecom Corporation Limited Jiangsu Branch, Nanjing 210037, China)
Abstract: As LLMs are applied ever more widely, how to evaluate the capabilities of large models accurately, objectively and comprehensively has become an important topic of common concern in academia and industry. In recent years, Jiangsu Telecom has actively explored and put LLMs into practice, and has rebuilt multiple applications in the BMO domains on top of large models. This paper introduces Jiangsu Telecom's integrated evaluation scheme and the system built to implement it, based on the current open-source large model ecosystem. The scheme can quickly onboard newly released open-source large models and supports blind-test selection of large models based on practical applications, providing a useful reference for building a more scientific and complete LLM evaluation system.
Keywords: LLMs; evaluation; framework
0 Introduction
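In the early stage of large model application practice, computing resources were usually allocated to individual application teams, each of which carried out its own large model practice.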