An Enumeration-Space Parallel Mining Algorithm for Closed High-Utility Itemsets

DOI: 10.15938/j.jhust.2025.06.004
CLC number: TP311.1; TP399    Document code: A    Article ID: 1007-2683(2025)06-0029-14
Abstract: To address result redundancy and high time overhead in high-dimensional data environments, this paper proposes SpCHUM (Closed High Utility Itemsets Mining on Spark), a closed high-utility itemset mining algorithm. The concept of suffix sets defined for closed itemsets is applied to high-utility itemset mining to condense the results and reduce memory consumption. Exploiting the closure property of high-utility itemsets, the algorithm simplifies the weighted utility calculation and uses a prefix partitioning strategy to reduce the number of intersection operations, thereby lowering the time cost. When building itemset supersets, the algorithm constructs the enumeration space by depth-first search to ensure the completeness of the generated itemsets. The parallel algorithm is implemented on the Spark framework to mine closed itemsets in big data environments. Experiments on mushroom and other datasets show that the algorithm's running efficiency is improved by 50% compared with existing algorithms. An ablation experiment on dense datasets shows that removing the prefix partitioning strategy increases the algorithm's running time by 30%.
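To illustrate the general ingredients named in the abstract (transaction-weighted utility pruning, depth-first enumeration of supersets, a closure check, and splitting the search space by prefix), here is a minimal single-machine sketch. It is not SpCHUM: the suffix-set representation and the paper's exact prefix partitioning strategy are not reproduced, the toy database `DB`, the unit profits `PROFIT`, and all function names (`mine_closed_huis`, `mine_partition`, etc.) are hypothetical, and a `ThreadPoolExecutor` merely stands in for Spark executors. It uses the standard definitions u(X) = Σ quantity × unit profit over transactions containing X, and TWU(X) = Σ TU(T) over those transactions, which upper-bounds the utility of X and every superset of X.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy database: each transaction maps item -> purchased quantity.
DB = [
    {"a": 1, "b": 2, "c": 1},
    {"a": 2, "c": 6, "e": 2},
    {"b": 4, "c": 3, "d": 3, "e": 1},
    {"a": 1, "b": 1, "c": 1, "d": 1},
    {"b": 2, "c": 2, "e": 1},
]
# Hypothetical external utility (unit profit) of each item.
PROFIT = {"a": 5, "b": 2, "c": 1, "d": 2, "e": 3}


def utility(itemset, tids, db, profit):
    """u(X): sum of quantity * unit profit of X's items over the transactions containing X."""
    return sum(db[t][i] * profit[i] for t in tids for i in itemset)


def twu(tids, tu):
    """TWU(X): sum of the transaction utilities TU(T) of the transactions containing X."""
    return sum(tu[t] for t in tids)


def is_closed(itemset, tids, db):
    """X is closed iff the items shared by all transactions containing X are exactly X."""
    common = set.intersection(*(set(db[t]) for t in tids))
    return common == set(itemset)


def mine_partition(first, order, tidsets, tu, db, profit, minutil):
    """Depth-first enumeration of all itemsets whose first item in `order` is `first`."""
    results = []

    def dfs(itemset, tids, start):
        # TWU is anti-monotone and bounds the utility of X and all its supersets.
        if not tids or twu(tids, tu) < minutil:
            return
        u = utility(itemset, tids, db, profit)
        if u >= minutil and is_closed(itemset, tids, db):
            results.append((tuple(itemset), u))
        for k in range(start, len(order)):
            item = order[k]
            # Vertical representation: tidset(X + i) = tidset(X) & tidset(i).
            dfs(itemset + [item], tids & tidsets[item], k + 1)

    dfs([first], tidsets[first], order.index(first) + 1)
    return results


def mine_closed_huis(db, profit, minutil):
    """Return (itemset, utility) pairs for all closed itemsets with utility >= minutil."""
    tu = [sum(q * profit[i] for i, q in t.items()) for t in db]
    tidsets = {}
    for t, trans in enumerate(db):
        for i in trans:
            tidsets.setdefault(i, set()).add(t)
    # Drop items whose TWU is below minutil and order the rest by ascending TWU.
    order = sorted(
        (i for i in tidsets if twu(tidsets[i], tu) >= minutil),
        key=lambda i: (twu(tidsets[i], tu), i),
    )
    # One independent task per prefix partition (threads stand in for Spark executors).
    with ThreadPoolExecutor() as pool:
        parts = list(pool.map(
            lambda f: mine_partition(f, order, tidsets, tu, db, profit, minutil), order
        ))
    return sorted(r for part in parts for r in part)


print(mine_closed_huis(DB, PROFIT, 25))
```

On this toy data, `mine_closed_huis(DB, PROFIT, 25)` returns `[(('a', 'c'), 28), (('b', 'c'), 25)]`. The itemset {b} is never reported even though it is enumerated, because every transaction containing b also contains c, so {b} is not closed. Items inside each reported tuple follow the ascending-TWU processing order rather than alphabetical order. Partitioning by the first item makes each task independent, which is what lets such a search be distributed; the brute-force closure check here is deliberately simple and is exactly the kind of cost a real implementation would try to reduce.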
Keywords: high-utility itemset; big data; closed itemset; parallel computing; data mining
0 Introduction
Traditional frequent itemset mining (FIM) assumes that all itemsets are equally important, so its results often contain itemsets that occur frequently but yield low profit.
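The gap between frequency and profit can be seen on a small example. The sketch below uses hypothetical transactions, unit profits, and thresholds (none of them come from the paper) and compares support, which FIM counts, with the usual utility measure (quantity × unit profit summed over the transactions that contain the itemset).

```python
# Hypothetical toy data: item -> quantity per transaction, and unit profit per item.
transactions = [
    {"bread": 2, "milk": 1},
    {"bread": 1, "milk": 2, "wine": 1},
    {"bread": 3},
    {"bread": 1, "milk": 1},
]
profit = {"bread": 1, "milk": 1, "wine": 20}


def support(itemset):
    """Number of transactions containing every item of the itemset (what FIM counts)."""
    return sum(1 for t in transactions if itemset.issubset(t))


def utility(itemset):
    """Sum of quantity * unit profit of the itemset's items over the transactions containing it."""
    return sum(t[i] * profit[i] for t in transactions if itemset.issubset(t) for i in itemset)


MIN_SUP, MIN_UTIL = 2, 10  # hypothetical thresholds
frequent = sorted(i for i in profit if support({i}) >= MIN_SUP)
high_utility = sorted(i for i in profit if utility({i}) >= MIN_UTIL)
for item in sorted(profit):
    print(item, "support =", support({item}), "utility =", utility({item}))
print("frequent:", frequent)
print("high utility:", high_utility)
```

Here bread appears in all four transactions but contributes a utility of only 7, while wine appears once yet reaches 20, so a support threshold keeps bread and milk and discards wine, whereas a utility threshold keeps only wine.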