A taint specification extraction method based on large language models

Keywords: privacy leakage; taint specification; large language model; semi-supervised learning
CLC number: TP391  Document code: A  Article ID: 1001-3695(2025)10-022-3070-06
doi:10.19734/j.issn.1001-3695.2025.03.0077
Taint specification extraction based on LLM
Fang Mengqi, Xu Jian† (School of Computer Science & Engineering, Nanjing University of Science & Technology, Nanjing 210094, China)
Abstract: With the widespread adoption of mobile applications, privacy leakage poses a critical challenge. Existing privacy leakage analysis techniques typically depend on predefined lists of source and sink APIs, known as taint specifications. Traditional methods for extracting taint specifications relied on manual screening or machine learning algorithms, which struggled to scale and yielded high false-positive rates. To address these limitations, this paper developed a novel method, named TaintLM, for taint specification extraction based on large language models. TaintLM took official API documentation as its primary input and used carefully designed instructions to drive multi-task learning for source-sink classification and semantic categorization. To mitigate data imbalance, a multi-task iterative fine-tuning strategy incorporated semi-supervised learning, generating pseudo-labels to iteratively optimize model performance. Experiments show that TaintLM achieves F1 scores of 0.92 and 0.94 for source-sink classification and semantic categorization, respectively, surpassing existing mainstream methods. TaintLM improves the accuracy of taint specification extraction, demonstrates its effectiveness in this domain, and provides robust technical support for mobile application privacy protection.
Key words: privacy leakage; taint specification; large language model (LLM); semi-supervised learning
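The semi-supervised strategy summarized in the abstract (train on labeled data, adopt confident predictions on unlabeled data as pseudo-labels, retrain, repeat) can be illustrated with a minimal self-training sketch. This is not TaintLM's implementation: the fine-tuned LLM is replaced by a hypothetical multinomial naive Bayes stand-in over API documentation text, the sketch covers only source-sink classification, and the label set (`source`, `sink`, `neither`), the example documents, the 0.9 confidence threshold, the round limit, and all function names are assumptions chosen for illustration.

```python
import math
from collections import Counter, defaultdict

# Hypothetical example data: API documentation snippets with labels.
LABELED = [
    ("Returns the unique device ID, for example the IMEI", "source"),
    ("Returns the last known location from the given provider", "source"),
    ("Sends a text based SMS message to the destination address", "sink"),
    ("Writes the bytes to the output stream", "sink"),
    ("Returns the number of elements in this list", "neither"),
    ("Compares this string to the specified object", "neither"),
]
UNLABELED = [
    "Sends SMS message to the destination address",
    "Returns the device ID of the subscription",
    "Returns the hash code for this string",
]


def tokenize(text):
    """Crude lower-case tokenizer that strips surrounding punctuation."""
    tokens = (t.strip(".,()").lower() for t in text.split())
    return [t for t in tokens if t]


class NaiveBayes:
    """Multinomial naive Bayes with Laplace smoothing.

    Hypothetical stand-in for the fine-tuned LLM: the self-training
    loop only needs a label distribution for each document.
    """

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for doc, label in zip(docs, labels):
            tokens = tokenize(doc)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict_proba(self, doc):
        n_docs = sum(self.class_counts.values())
        log_scores = {}
        for label, count in self.class_counts.items():
            n_words = sum(self.word_counts[label].values())
            score = math.log(count / n_docs)  # log prior
            for tok in tokenize(doc):
                freq = self.word_counts[label][tok]
                score += math.log((freq + 1) / (n_words + len(self.vocab)))
            log_scores[label] = score
        # Convert log scores to probabilities (max-shifted softmax).
        top = max(log_scores.values())
        exp = {k: math.exp(v - top) for k, v in log_scores.items()}
        total = sum(exp.values())
        return {k: v / total for k, v in exp.items()}


def self_train(labeled, unlabeled, threshold=0.9, max_rounds=5):
    """Iteratively add confident predictions as pseudo-labels and retrain."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(max_rounds):
        model = NaiveBayes().fit([d for d, _ in labeled], [y for _, y in labeled])
        confident, remaining = [], []
        for doc in pool:
            probs = model.predict_proba(doc)
            label = max(probs, key=probs.get)
            if probs[label] >= threshold:
                confident.append((doc, label))
            else:
                remaining.append(doc)
        if not confident:
            break  # nothing cleared the threshold, so stop early
        labeled.extend(confident)
        pool = remaining
    model = NaiveBayes().fit([d for d, _ in labeled], [y for _, y in labeled])
    return model, labeled, pool


if __name__ == "__main__":
    model, grown, pool = self_train(LABELED, UNLABELED)
    for doc, label in grown[len(LABELED):]:
        print(f"pseudo-label {label!r}: {doc}")
    print("still unlabeled:", pool)
```

Raising `threshold` admits fewer but cleaner pseudo-labels; lowering it grows the training set faster at the risk of reinforcing the model's own mistakes. The abstract does not state how TaintLM sets this trade-off or how its pseudo-labels interact with the multi-task instructions.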
0 Introduction
With the widespread adoption of mobile applications, privacy leakage has become an increasingly serious problem.