Taint specification extraction method based on large models

Keywords: privacy leakage; taint specification; large model; semi-supervised learning

CLC number: TP391    Document code: A    Article ID: 1001-3695(2025)10-022-3070-06

doi:10.19734/j.issn.1001-3695.2025.03.0077

Taint specification extraction based on LLM

Fang Mengqi, Xu Jian† (School of Computer Science & Engineering, Nanjing University of Science & Technology, Nanjing 21094, China)

Abstract: With the widespread adoption of mobile applications, privacy leakage poses a critical challenge. Existing privacy leakage analysis techniques typically depend on predefined lists of source and sink APIs, known as taint specifications. Traditional methods for extracting taint specifications often rely on manual screening or machine learning algorithms, which struggle to scale and yield high false-positive rates. To address these limitations, this paper develops a novel method, named TaintLM, for taint specification extraction based on large language models. TaintLM takes official API documentation as its primary input and uses carefully designed instructions to drive multi-task learning for source-sink classification and semantic categorization. To mitigate data imbalance, a multi-task iterative fine-tuning strategy incorporates semi-supervised learning, generating pseudo-labels to iteratively optimize model performance. Experiments show that TaintLM achieves F1 scores of 0.92 and 0.94 for source-sink classification and semantic categorization, respectively, surpassing existing mainstream methods. TaintLM enhances the accuracy of taint specification extraction, demonstrates its effectiveness in this domain, and provides robust technical support for mobile application privacy protection.

Key words: privacy leakage;taint specification; large language model (LLM);semi-supervised learning
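
The pseudo-labeling idea summarized in the abstract can be illustrated with a minimal self-training sketch. This is not TaintLM itself: the word-count classifier below is a toy stand-in for a fine-tuned large language model, the semantic categorization task is omitted, and the names `train`, `predict`, `self_train`, the `neither` label, and the confidence threshold are all assumptions introduced for illustration.

```python
from collections import Counter

# Assumed label set: the abstract names sources and sinks; "neither" is added here.
LABELS = ("source", "sink", "neither")

def train(examples):
    """Toy stand-in for fine-tuning: count the words seen under each label."""
    counts = {label: Counter() for label in LABELS}
    for text, label in examples:
        counts[label].update(text.lower().split())
    return counts

def predict(model, text):
    """Return (label, confidence) from add-one smoothed word-count scores."""
    words = text.lower().split()
    scores = {label: 1.0 + sum(model[label][w] for w in words) for label in LABELS}
    best = max(LABELS, key=lambda label: scores[label])
    return best, scores[best] / sum(scores.values())

def self_train(labeled, unlabeled, threshold=0.9, max_rounds=5):
    """Iteratively promote confident predictions to pseudo-labels and retrain."""
    labeled = list(labeled)
    pool = list(unlabeled)
    model = train(labeled)
    for _ in range(max_rounds):
        confident, remaining = [], []
        for text in pool:
            label, confidence = predict(model, text)
            if confidence >= threshold:
                confident.append((text, label))  # accept as a pseudo-label
            else:
                remaining.append(text)           # keep for a later round
        if not confident:
            break  # nothing new to learn from; stop early
        labeled.extend(confident)
        pool = remaining
        model = train(labeled)  # retrain on gold labels plus pseudo-labels
    return model, labeled, pool
```

Calling `self_train(seed_examples, api_descriptions)` returns the retrained model, the enlarged training set, and the descriptions that never crossed the threshold. A high threshold keeps noisy pseudo-labels out of the training set, at the cost of leaving more of the unlabeled pool unused.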

0 Introduction

With the widespread adoption of mobile applications, the problem of privacy leakage is becoming increasingly severe.
