Xunzi the LLM: A Way for People to Access Ancient Chinese Texts

Thousands of years ago, texts appeared on animal bones, bronzes, bamboo slips, and silk brocades before they were written on paper. But now these ancient Chinese texts have a new container.
In December 2023, a research team from Nanjing Agricultural University rolled out Xunzi, a large language model (LLM), and XunziChat, in association with Gulian, a professional publisher of ancient Chinese texts.
Wang Dongbo, the leader of the research team, said that the large language model was named after Xunzi because Xunzi was not only a prominent Confucian philosopher during the late Warring States Period (475–221 BC), but also a pioneer in presenting and explaining theories of linguistics in ancient China.
When asked why he and his partners built the large language model, Wang explained that traditional Chinese characters, vertical layout, and the absence of punctuation and pause marks are all obstacles that readers must overcome when reading traditional texts.
To create Xunzi the LLM, Wang and his partners first carried out extensive research. Since 2013, his team has worked tirelessly to digitize Chinese classics such as the Siku Quanshu, or the Complete Library in Four Sections. “The hard work involves a large-scale corpus of two billion Chinese characters, which has laid a solid foundation for the large language model,” said Wang.