Research on a Large-Model-Driven Knowledge Fusion Framework for Multimodal Documents

CLC number: TP391.1  Document code: A  Article ID: 2096-4706(2026)06-0046-10
Research on Multimodal Document Knowledge Fusion Framework Driven by Large Model
YAN Xiaofeng¹, LANG Yujie¹, YIN Lanbing¹, HUANG Chongfu² (1. China Industrial Control Systems Cyber Emergency Response Team, Beijing 100040, China; 2. Morefun Network Technology Co., Ltd., Shenzhen 518172, China)
Abstract: Traditional text-based information analysis paradigms are fundamentally limited in the deep parsing of complex multimodal documents, which prevents them from effectively mining the core information value these documents contain. This study aims to use the breakthrough capabilities of large models to construct a knowledge fusion framework that achieves structured parsing and cross-modal fusion of multimodal knowledge. The paper designs a closed-loop intelligent framework covering four stages: acquisition, parsing, integration, and fusion. With the large model as its cognitive center, the framework performs intelligent acquisition, deep parsing, high-quality integration, and, finally, cross-modal knowledge fusion of documents. The framework realizes a systematic transformation from multimodal documents to structured knowledge and significantly improves key efficiency compared with traditional paradigms. However, it depends heavily on the underlying capabilities of large models and consumes substantial computing resources, so future work should explore technical paths for model lightweighting and inference optimization.
Keywords: multimodal document; knowledge fusion; large language model; information analysis; methodological system
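The abstract names four stages (acquisition, parsing, integration, fusion) organized around a large model. The following is a minimal Python sketch of how such a pipeline could be wired. It is not the authors' implementation. The `Document`/`Triple` schema, the representation of images by their captions, the `subject|relation|object` prompt format, the normalization rules, and the `fake_llm` stub are all hypothetical assumptions, and the closed-loop feedback the paper describes is not modeled. A real system would replace `fake_llm` with an actual model call.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical stand-in type for a large-model call: prompt in, text out.
LLM = Callable[[str], str]


@dataclass
class Document:
    """Hypothetical multimodal document: body text plus captions standing in for images."""
    doc_id: str
    text: str
    image_captions: list[str] = field(default_factory=list)


@dataclass
class Triple:
    subject: str
    relation: str
    obj: str
    source: str  # modality the fact came from: "text" or "image"


def acquire(raw: list[dict]) -> list[Document]:
    """Stage 1 (acquisition): normalize raw records, dropping records with no content."""
    docs = []
    for r in raw:
        text = r.get("text", "").strip()
        captions = [c.strip() for c in r.get("images", []) if c.strip()]
        if text or captions:
            docs.append(Document(r["id"], text, captions))
    return docs


def parse(doc: Document, llm: LLM) -> list[Triple]:
    """Stage 2 (parsing): ask the model for 'subject|relation|object' lines per modality."""
    triples = []
    inputs = [("text", doc.text)] + [("image", c) for c in doc.image_captions]
    for source, content in inputs:
        if not content:
            continue
        reply = llm(f"Extract facts as subject|relation|object lines:\n{content}")
        for line in reply.splitlines():
            parts = [p.strip() for p in line.split("|")]
            if len(parts) == 3 and all(parts):  # skip malformed lines
                triples.append(Triple(*parts, source=source))
    return triples


def _norm(s: str) -> str:
    # Lowercase and collapse internal whitespace.
    return " ".join(s.lower().split())


def integrate(triples: list[Triple]) -> list[Triple]:
    """Stage 3 (integration): normalize, drop self-loops and per-modality duplicates."""
    seen, kept = set(), []
    for t in triples:
        n = Triple(_norm(t.subject), _norm(t.relation), _norm(t.obj), t.source)
        key = (n.subject, n.relation, n.obj, n.source)
        if n.subject == n.obj or key in seen:
            continue
        seen.add(key)
        kept.append(n)
    return kept


def fuse(triples: list[Triple]) -> dict[tuple[str, str, str], list[str]]:
    """Stage 4 (fusion): merge identical facts across modalities, recording their sources."""
    fused: dict[tuple[str, str, str], set[str]] = {}
    for t in triples:
        fused.setdefault((t.subject, t.relation, t.obj), set()).add(t.source)
    return {k: sorted(v) for k, v in fused.items()}


def run_pipeline(raw: list[dict], llm: LLM) -> dict[tuple[str, str, str], list[str]]:
    triples = [t for doc in acquire(raw) for t in parse(doc, llm)]
    return fuse(integrate(triples))


def fake_llm(prompt: str) -> str:
    # Deterministic hypothetical stub keyed on prompt content; not a real model.
    if "firewall" in prompt:
        return "Plant A|uses|Firewall X\nPlant A | uses | firewall x"
    if "diagram" in prompt:
        return "plant a|uses|firewall x\nPLC 7|connects to|PLC 7"
    return ""


raw = [
    {"id": "d1", "text": "Plant A deploys a firewall.", "images": ["network diagram of Plant A"]},
    {"id": "d2", "text": "   ", "images": []},
]
print(run_pipeline(raw, fake_llm))
# → {('plant a', 'uses', 'firewall x'): ['image', 'text']}
```

In this toy run the empty record is dropped at acquisition, the two text-derived variants of the same fact collapse during integration, the degenerate `PLC 7` self-loop is discarded, and fusion reports that the surviving fact is supported by both the text and the image modality. Keeping integration (per-modality cleanup) separate from fusion (cross-modal merging) mirrors the stage boundaries the abstract lists.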
0 Introduction
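Against the backdrop of global digitalization, open-source intelligence (OSINT) has become a key pillar supporting national security and business decision-making.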