A Parallel Enumeration-Space Mining Algorithm for Closed High Utility Itemsets

Keywords: high utility itemset; big data; closed itemset; parallel computing; data mining

DOI: 10.15938/j.jhust.2025.06.004

CLC number: TP311.1; TP399    Document code: A    Article ID: 1007-2683(2025)06-0029-14

Abstract: To address the issues of result redundancy and time overhead in high-dimensional data environments, a closed high utility itemset mining algorithm, SpCHUM (Closed High Utility Itemsets Mining on Spark), is proposed. The concepts of suffix sets defined for closed itemsets are applied to high utility itemset mining to simplify the results and reduce memory consumption. By exploiting the closed property of high utility itemsets, the weighted utility calculation is simplified, and a prefix partitioning strategy reduces the number of itemset intersection operations, thereby lowering the time cost. When building itemset supersets, the algorithm uses depth-first search to construct the enumeration space, which ensures the completeness of the generated itemsets. The parallel algorithm is implemented on the Spark framework to perform closed itemset mining in big data environments. Experiments on mushroom and other datasets show that the algorithm's running efficiency is improved by 50% compared to existing algorithms. An ablation experiment on dense datasets shows that the running time increases by 30% once the prefix partitioning strategy is removed.

Keywords: high utility itemset; big data; closed itemset; parallel computing; data mining

0 Introduction

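Traditional frequent itemset mining (FIM) assumes by default that all itemsets are equally important, so its results often include itemsets that occur frequently but yield low profit.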
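The gap between frequency and profit can be illustrated with a small sketch. It assumes the standard definition of itemset utility (purchased quantity times unit profit, summed over the transactions that contain the whole itemset); the transactions, item names, and profit values are hypothetical and are not taken from the paper's experiments.

```python
# Hypothetical toy data: each transaction maps item -> purchased quantity.
transactions = [
    {"bread": 1, "milk": 1},
    {"bread": 2, "milk": 1},
    {"bread": 1, "milk": 2, "wine": 1},
    {"wine": 2, "cheese": 1},
]
# Hypothetical unit profit of each item.
unit_profit = {"bread": 1, "milk": 1, "wine": 20, "cheese": 5}


def support(itemset):
    """Count the transactions that contain every item of the itemset."""
    return sum(1 for t in transactions if all(i in t for i in itemset))


def utility(itemset):
    """Sum quantity * unit profit over the transactions containing the itemset."""
    return sum(
        sum(t[i] * unit_profit[i] for i in itemset)
        for t in transactions
        if all(i in t for i in itemset)
    )


for itemset in [("bread", "milk"), ("wine", "cheese")]:
    print(itemset, "support =", support(itemset), "utility =", utility(itemset))
```

In this toy data, the pair (bread, milk) appears in three transactions but has a utility of only 8, whereas (wine, cheese) appears once and has a utility of 45. A frequency threshold alone would keep the first pair and discard the second.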

Journal of Harbin University of Science and Technology

2025, No. 6
