Privacy-Preserving Clustering Algorithm Based on Random Mapping

Keywords: high-dimensional data; privacy protection; clustering; random mapping; K-means

CLC number: TP309  Document code: A  Article ID: 1001-3695(2025)08-035-2511-07

doi:10.19734/j.issn.1001-3695.2024.10.0503

Privacy-preserving algorithm for clustering high-dimensional data based on random mapping

He Lili (a,b,c), Zhang Chenglin (a,b,c), Cao Mingzeng (a,b,c), Zhang Lei (a,b), et al. (… Heilongjiang 154007, China)

Abstract: To address the challenge of privacy costs that increase with data dimensionality in privacy-preserving clustering algorithms, this paper proposed a random projection-based privacy-preserving algorithm (RPPP). RPPP selected relevant features using the symmetrical uncertainty method and generated random matrices from independent and identically distributed Gaussian sequences. To strengthen the distance-preserving property, it applied Gram-Schmidt orthogonalization to ensure that the random matrices were orthogonal. These matrices were decomposed into multiple independent sub-matrices that mapped the reduced-dimensional features, creating a feature-matching domain and a noise-perturbed domain. To further enhance privacy protection, the algorithm injected random noise into the noise-perturbed domain. Experimental results demonstrate that RPPP effectively defends against privacy attacks. Tests on the Cancer and Diabetes datasets show that RPPP outperforms traditional algorithms in both privacy protection and clustering efficiency. Specifically, RPPP improves clustering efficiency by approximately 16.34%, 23.44%, and 32.94% compared with the UPA, GCCG, and AKA algorithms, respectively. Overall, RPPP significantly enhances privacy protection while improving clustering efficiency, confirming its effectiveness and practical applicability.

Key Words: high-dimensional data; privacy protection; clustering; random projection; K-means
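The abstract names RPPP's building blocks without giving formulas, so the following Python/NumPy sketch shows one plausible way to chain them. It is not the authors' implementation, and several details are assumptions not stated in the source: continuous features are discretised into equal-width bins before computing symmetrical uncertainty (SU), feature relevance is scored against a reference label vector `y`, the orthogonalised Gaussian matrix is split into two column blocks (one per domain), and the perturbation is zero-mean Gaussian noise with a hypothetical `noise_scale`. All function names (`select_features`, `orthogonal_gaussian_matrix`, `rppp_transform`) are illustrative.

```python
import numpy as np


def _entropy_from_counts(counts: np.ndarray) -> float:
    """Shannon entropy in bits from a vector of occurrence counts."""
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())


def entropy(values: np.ndarray) -> float:
    _, counts = np.unique(values, return_counts=True)
    return _entropy_from_counts(counts)


def joint_entropy(x: np.ndarray, y: np.ndarray) -> float:
    # Each distinct (x, y) pair is treated as one joint outcome.
    _, counts = np.unique(np.column_stack([x, y]), axis=0, return_counts=True)
    return _entropy_from_counts(counts)


def symmetrical_uncertainty(x: np.ndarray, y: np.ndarray) -> float:
    """SU(X, Y) = 2 * IG(X; Y) / (H(X) + H(Y)); 0 = independent, 1 = fully dependent."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0.0:
        return 0.0
    info_gain = hx + hy - joint_entropy(x, y)
    return 2.0 * info_gain / (hx + hy)


def select_features(X: np.ndarray, y: np.ndarray, n_keep: int, n_bins: int = 10) -> np.ndarray:
    """Return indices of the n_keep columns with the highest SU against y.

    Assumption: each column is discretised into n_bins equal-width bins first.
    """
    scores = []
    for col in X.T:
        edges = np.linspace(col.min(), col.max(), n_bins + 1)[1:-1]
        scores.append(symmetrical_uncertainty(np.digitize(col, edges), y))
    return np.argsort(scores)[::-1][:n_keep]


def gram_schmidt(a: np.ndarray) -> np.ndarray:
    """Modified Gram-Schmidt: orthonormalise the columns of a (d x k, k <= d)."""
    q = a.astype(float)  # astype returns a copy, so `a` is left untouched
    for j in range(q.shape[1]):
        for i in range(j):
            # Remove the component along the already-orthonormalised column i.
            q[:, j] -= (q[:, i] @ q[:, j]) * q[:, i]
        q[:, j] /= np.linalg.norm(q[:, j])
    return q


def orthogonal_gaussian_matrix(d: int, k: int, rng: np.random.Generator) -> np.ndarray:
    """d x k matrix of i.i.d. N(0, 1) entries, orthonormalised column by column."""
    return gram_schmidt(rng.standard_normal((d, k)))


def rppp_transform(X: np.ndarray, k_match: int, k_noise: int,
                   noise_scale: float, rng: np.random.Generator):
    """Hypothetical RPPP-style mapping using two disjoint column blocks of one orthogonal matrix."""
    n, d = X.shape
    if k_match + k_noise > d:
        raise ValueError("k_match + k_noise must not exceed the number of features")
    r = orthogonal_gaussian_matrix(d, k_match + k_noise, rng)
    r_match, r_noise = r[:, :k_match], r[:, k_match:]
    z_match = X @ r_match  # "feature-matching domain": projection only
    # "Noise-perturbed domain": projection plus i.i.d. Gaussian noise (noise model assumed).
    z_noise = X @ r_noise + rng.normal(0.0, noise_scale, size=(n, k_noise))
    return z_match, z_noise


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 12))
    y = (X[:, 0] > 0).astype(int)  # hypothetical reference labels for SU ranking
    keep = select_features(X, y, n_keep=6)
    z_match, z_noise = rppp_transform(X[:, keep], k_match=4, k_noise=2, noise_scale=0.1, rng=rng)
    print(z_match.shape, z_noise.shape)  # (100, 4) (100, 2)
```

In this sketch the two outputs would then be concatenated (for example with `np.hstack`) and passed to any K-means implementation. Gram-Schmidt is written out rather than delegated to `np.linalg.qr`, which uses Householder reflections, so that the code matches the technique the abstract names. Because the columns are orthonormal, a square matrix (`k = d`) is an exact rotation that preserves every pairwise distance, while `k < d` can only shrink distances, never stretch them.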

0 Introduction

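In recent years, with the rapid rise of big data technology and the fast development of information technology [1,…], organizations such as medical and educational institutions generate large volumes of data every day. These data span a wide range of domains and, once analyzed and processed with data mining techniques, can be transformed into information of practical value.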