Optimization of Fusion Storage Technology for Multi-Source Chemical and Pharmaceutical Data in a Big Data Context

CLC number: TP311.13; TQ460  Document code: A  Article ID: 1001-5922(2026)1-0201-04
Abstract: To address the problem that the storage and computing strategies of the traditional Hadoop framework are insufficient for handling complex data correlations, this study first implemented cross-system and cross-platform data migration and standardization through a multi-level data integration method and constructed a unified data dictionary. Subsequently, a data distribution mechanism based on the hash bucketing algorithm was introduced to optimize the HDFS storage strategy, reducing the network transmission overhead during associated data queries. Meanwhile, the MapReduce computing framework was optimized in a targeted manner to improve the efficiency of associated queries. To verify the effectiveness of the proposed optimization strategy, the study compared the associated-query running time of a MySQL database with that of the Hadoop framework before and after optimization, using multi-source chemical and pharmaceutical data of the same scale. The results showed that when the optimized Hadoop framework was used to store multi-source chemical and pharmaceutical data, the running time required for associated queries was significantly reduced and query efficiency was greatly improved.
Key words: big data storage optimization; multi-source data integration; Hadoop framework; hash bucket algorithm; correlation analysis
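The chemical and pharmaceutical sector is a typical data-intensive industry. As production automation and informatization continue to advance, its various business systems generate massive volumes of multi-source heterogeneous data every day [1].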
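The hash-bucketing idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the bucket count, the CRC32 hash, the field name `batch_id`, and the helper names `bucket_of`, `partition`, and `bucketed_join` are all assumptions made for illustration. The point it shows is that when records from different sources are assigned to buckets by the same hash of the join key, matching records always land in the same bucket, so an associated query can be answered bucket by bucket without moving data between buckets.

```python
import zlib
from collections import defaultdict

NUM_BUCKETS = 4  # assumed bucket count, purely illustrative


def bucket_of(key: str, num_buckets: int = NUM_BUCKETS) -> int:
    """Map a join key to a bucket index using a stable hash (CRC32)."""
    return zlib.crc32(key.encode("utf-8")) % num_buckets


def partition(records, key_field, num_buckets=NUM_BUCKETS):
    """Group records into buckets according to the hash of their join key."""
    buckets = defaultdict(list)
    for rec in records:
        buckets[bucket_of(rec[key_field], num_buckets)].append(rec)
    return buckets


def bucketed_join(left, right, key_field, num_buckets=NUM_BUCKETS):
    """Inner-join two record sets one bucket at a time.

    Equal keys hash to the same bucket, so each bucket can be joined
    in isolation (in a cluster: on the node that stores that bucket).
    """
    left_b = partition(left, key_field, num_buckets)
    right_b = partition(right, key_field, num_buckets)
    joined = []
    for b in range(num_buckets):
        # Index the left-side records of this bucket by join key.
        index = defaultdict(list)
        for rec in left_b.get(b, []):
            index[rec[key_field]].append(rec)
        # Probe the index with the right-side records of the same bucket.
        for rec in right_b.get(b, []):
            for match in index.get(rec[key_field], []):
                joined.append({**match, **rec})
    return joined
```

As a usage example, joining hypothetical production records with hypothetical quality-control records on `batch_id` returns only the batches present in both sources, and each match is found inside a single bucket.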