Journal of Jishou University (Natural Sciences Edition) ›› 2022, Vol. 43 ›› Issue (6): 20-25. DOI: 10.13438/j.cnki.jdzk.2022.06.005

• Computer Science •

Improvement of a Code Plagiarism Detection Method Based on Abstract Syntax Trees

刘飞翔,龙冬冬,欧幸茹,陈昌奉   

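  1. (College of Information Science and Engineering, Jishou University, Jishou 416000, Hunan, China)
  • Publication date: 2022-11-25 Release date: 2023-01-10
  • Corresponding author: 陈昌奉 (CHEN Changfeng, born 1993), male, from Qidong, Hunan; lecturer at the College of Information Science and Engineering, Jishou University; research interests include graph representation learning, natural language processing, and code plagiarism detection and its applications.
  • Funding:
    Scientific Research Project of the Hunan Provincial Department of Education (21C0363); Hunan Provincial Undergraduate Innovation and Entrepreneurship Training Program (JDCX2020379); Jishou University 2020 University-Level Faculty Research Project (JD20001)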

Improvement of Source Code Plagiarism Detection Based on Abstract Syntax Tree

LIU Feixiang, LONG Dongdong, OU Xingru, CHEN Changfeng

  1. (College of Information Science and Engineering, Jishou University, Jishou 416000, Hunan, China)
  • Online: 2022-11-25 Published: 2023-01-10

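Abstract: Traditional code plagiarism detection methods based on the abstract syntax tree suffer from low detection accuracy and cannot detect plagiarism at the semantic level. To address these problems, a code plagiarism detection method based on an improved abstract syntax tree is designed. The method raises detection accuracy by applying TF-IDF weighting to a simplified syntax tree. Feature extraction and similarity calculation methods are then designed on the weighted, simplified abstract syntax tree so that some semantic plagiarism can be detected. Experimental results show that the improved method is more accurate than the traditional abstract syntax tree based detection method and can effectively detect some code plagiarism at the semantic level.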

Keywords: code plagiarism, plagiarism detection, abstract syntax tree

Abstract: To address the low detection accuracy of traditional abstract syntax tree based code plagiarism detection methods and their inability to detect plagiarism at the semantic level, an improved code plagiarism detection approach based on the abstract syntax tree is designed. The approach achieves a higher plagiarism detection rate by using a TF-IDF weighted, simplified syntax tree. Moreover, feature extraction and similarity calculation methods based on the weighted simplified abstract syntax tree are designed to realize partial detection of semantic plagiarism. Experimental results show that the proposed approach can effectively detect some semantic-level plagiarism and achieves higher accuracy than traditional tree-based methods.

Key words: abstract syntax tree, code plagiarism detection, feature extraction
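
The general idea named in the abstract (TF-IDF weighting over a simplified syntax tree, followed by a similarity calculation) can be illustrated with a small sketch. This is not the authors' method: the abstract does not give their simplification rules, features, or similarity measure. The sketch assumes Python sources parsed with the standard `ast` module, treats dropping `Load`/`Store`/`Del` context nodes as a stand-in for "simplification", uses a smoothed IDF, and compares files by cosine similarity. All function names here are hypothetical.

```python
import ast
import math
from collections import Counter
from typing import Dict, List


def node_type_counts(source: str) -> Counter:
    """Count AST node types in one source file.

    Assumption: dropping Load/Store/Del context nodes stands in for the
    paper's (unspecified) tree simplification step.
    """
    skipped = (ast.Load, ast.Store, ast.Del)
    tree = ast.parse(source)
    return Counter(
        type(node).__name__
        for node in ast.walk(tree)
        if not isinstance(node, skipped)
    )


def tfidf_vectors(corpus: List[str]) -> List[Dict[str, float]]:
    """Weight each file's node-type frequencies by a smoothed IDF."""
    counts = [node_type_counts(src) for src in corpus]
    n_docs = len(counts)
    # Document frequency: in how many files does each node type occur?
    doc_freq = Counter(name for c in counts for name in c)
    vectors = []
    for c in counts:
        total = sum(c.values())
        vectors.append({
            name: (freq / total)
            * (math.log((1 + n_docs) / (1 + doc_freq[name])) + 1.0)
            for name, freq in c.items()
        })
    return vectors


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity between two sparse weight vectors."""
    dot = sum(w * b.get(name, 0.0) for name, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Illustrative inputs: a function, a copy with renamed identifiers,
# and an unrelated class definition.
original = "def total(xs):\n    s = 0\n    for x in xs:\n        s += x\n    return s\n"
renamed = "def acc(values):\n    result = 0\n    for v in values:\n        result += v\n    return result\n"
unrelated = "class Point:\n    def __init__(self, x, y):\n        self.x = x\n        self.y = y\n"

vectors = tfidf_vectors([original, renamed, unrelated])
sim_renamed = cosine(vectors[0], vectors[1])
sim_unrelated = cosine(vectors[0], vectors[2])
print(sim_renamed, sim_unrelated)
```

Because identifier names never enter the node-type counts, the renamed copy scores as essentially identical to the original, while the unrelated class scores lower. A bag of node types ignores tree shape entirely, so it is far weaker than a real tree-based comparison and says nothing about semantic-level plagiarism.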
