Code Clone Detection Method Based on an Enhanced Control Flow Graph and Siamese Network Architecture

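Keywords: control flow graph; Siamese network architecture; code representation; semantic similarity; clone detection
CLC number: TP311  Document code: A  Article ID: 1001-3695(2025)07-028-2132-09  doi: 10.19734/j.issn.1001-3695.2024.11.0441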
Abstract: To address the issues of missing contextual information and weak semantic learning capabilities in existing code clone detection methods, this paper proposed a method based on an enhanced control flow graph (ECFG) and a Siamese network architecture. Firstly, it designed ECFG, which embedded cross-node correlation edges to strengthen contextual awareness. Then, it introduced CGSMN, a semantic matching model based on Siamese networks. This model integrated a multi-head attention mechanism to extract key information from the nodes, then improved the relational graph attention network to capture inter-node associations and generate graph feature vectors. Finally, it explored the semantic relationships between these feature vectors and computed the semantic similarity. Empirical evaluation was conducted on two representative datasets. The results show that, compared to methods such as ASTNN, FA-AST, and DHAST, the F1-score improves by 0.5 to 15.5 percentage points on the BigCloneBench dataset and by 1.5 to 16.5 percentage points on the Google Code Jam dataset, demonstrating the effectiveness of the proposed method for semantic clone detection.
Key words: control flow graph; siamese neural network; code representation; code semantic similarity; code clone detection
0 Introduction
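In recent years, the flourishing of open-source culture has given rise to a series of developer communities built on collaboration and sharing, fundamentally changing how software is developed [1]. Cloning practices such as copying, pasting, and modifying code have become a common and efficient way of working.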