Research on a Multimodal Document Knowledge Fusion Framework Driven by Large Models

CLC number: TP391.1  Document code: A  Article ID: 2096-4706(2026)06-0046-10

Research on Multimodal Document Knowledge Fusion Framework Driven by Large Model

YAN Xiaofeng¹, LANG Yujie¹, YIN Lanbing¹, HUANG Chongfu² (1. China Industrial Control Systems Cyber Emergency Response Team, Beijing 100040, China; 2. Morefun Network Technology Co., Ltd., Shenzhen 518172, China)

Abstract: Traditional text information analysis paradigms have fundamental limitations in the deep parsing of complex multimodal documents, which leads to an inability to effectively mine the core information value they contain. This study aims to use the breakthrough capabilities of large models to construct a knowledge fusion framework that achieves structured parsing and cross-modal fusion of multimodal knowledge. The paper designs a closed-loop intelligent framework covering four stages: acquisition, parsing, integration, and fusion. The framework uses the large model as its cognitive center to realize intelligent acquisition, deep parsing, high-quality integration, and final cross-modal knowledge fusion of documents. The knowledge fusion framework constructed in this study realizes the systematic transformation from multimodal documents to structured knowledge, and its key efficiency is significantly improved compared with traditional paradigms. The framework depends heavily on the underlying capabilities of large models and consumes substantial computing resources. Future work needs to explore technical paths for model lightweighting and inference optimization.

Keywords: multimodal document; knowledge fusion; large language model; information analysis; method system

0 Introduction

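Against the backdrop of global digitalization, open-source intelligence (OSINT) has become a key pillar supporting national security and business decision-making.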

Modern Information Technology (现代信息科技), 2026, Issue 06
