Privacy-preserving clustering algorithm based on random mapping

Keywords: high-dimensional data; privacy protection; clustering; random mapping; K-means
CLC number: TP309  Document code: A  Article ID: 1001-3695(2025)08-035-2511-07
doi: 10.19734/j.issn.1001-3695.2024.10.0503
Privacy-preserving algorithm for clustering high-dimensional data based on random mapping
He Lili a,b,c, Zhang Chenglin a,b,c, Cao Mingzeng a,b,c, Zhang Lei a,b,et (aScloffadEeoicogcalboaofuooutellgee&fr cessing,c.JiusiKbotoftelittoo&qeEeengecsiUJi longjiang 154007,China)
Abstract: To address the challenge of increasing privacy costs with rising data dimensions in clustering privacy protection algorithms, this paper proposed a random projection-based privacy-preserving algorithm (RPPP). RPPP selected relevant features using the symmetrical uncertainty method and generated random matrices through independent and identically distributed Gaussian sequences. To strengthen distance-preserving properties, it applied Gram-Schmidt orthogonalization to ensure the orthogonality of the random matrices. These matrices were decomposed into multiple independent sub-matrices to map the reduced-dimensional features, creating a feature-matching domain and a noise-perturbed domain. To further enhance privacy protection, the algorithm injected random noise into the noise-perturbed domain. Experimental results demonstrate that RPPP effectively defends against privacy attacks. Tests conducted on the Cancer and Diabetes datasets show that RPPP outperforms traditional algorithms in both privacy protection and clustering efficiency. Specifically, RPPP improves clustering efficiency by approximately 16.34%, 23.44%, and 32.94% compared with the UPA, GCCG, and AKA algorithms, respectively. Overall, RPPP significantly enhances privacy protection while boosting clustering efficiency, confirming its effectiveness and practical applicability.
Key words: high-dimensional data; privacy protection; clustering; random projection; K-means
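
The orthogonalization step named in the abstract can be illustrated with a minimal sketch. It assumes a square Gaussian matrix, and the helper names (`gaussian_matrix`, `gram_schmidt`, `project`, `distance`) are hypothetical, not taken from the paper. It does not reproduce RPPP's feature selection, sub-matrix decomposition, or noise injection; it only shows why orthonormalizing a random Gaussian matrix leaves pairwise Euclidean distances unchanged, the property that distance-based clustering such as K-means depends on.

```python
import math
import random


def gaussian_matrix(n, seed=0):
    """Return an n x n matrix of i.i.d. standard Gaussian entries (illustrative helper)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(n)]


def gram_schmidt(rows):
    """Orthonormalize the rows with the (modified) Gram-Schmidt procedure."""
    basis = []
    for v in rows:
        w = list(v)
        # Remove the component of w along every basis vector found so far.
        for b in basis:
            coeff = sum(x * y for x, y in zip(w, b))
            w = [x - coeff * y for x, y in zip(w, b)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm < 1e-12:
            raise ValueError("rows are linearly dependent")
        basis.append([x / norm for x in w])
    return basis


def project(matrix, point):
    """Map a point through the matrix (matrix-vector product)."""
    return [sum(r * x for r, x in zip(row, point)) for row in matrix]


def distance(a, b):
    """Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Assumed toy data: two 4-dimensional records.
Q = gram_schmidt(gaussian_matrix(4, seed=42))
p1 = [1.0, 2.0, 3.0, 4.0]
p2 = [2.0, 0.0, -1.0, 5.0]

d_before = distance(p1, p2)
d_after = distance(project(Q, p1), project(Q, p2))
print(f"distance before mapping: {d_before:.6f}")
print(f"distance after mapping:  {d_after:.6f}")
```

Because the orthonormalized square matrix is an orthogonal transform, the two printed distances agree up to floating-point error, while the mapped coordinates no longer reveal the original feature values directly.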
0 Introduction
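In recent years, with the rapid rise of big data technology and the fast development of information technology [1], organizations such as medical and educational institutions generate large volumes of data every day. These data span a wide range of fields, and analysis and processing with data mining techniques can turn them into information of practical value.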