Duplicate detection is a data compression technique for identifying duplicate copies of repeating data. Today, duplicate detection techniques need to process ever larger datasets in ever shorter time while maintaining the quality of those datasets. We present adaptive and progressive approaches that significantly increase the efficiency of finding duplicates. In this paper, adaptive and progressive approaches and different algorithms are used to detect and calculate the percentage of duplication in source code. Duplication is a major concern in academics and can be a problem in every course. Duplication occurs when someone copies or presents another's work as their own. Students duplicate work in different areas: homework assignments, essays, projects, coding, etc. In this paper we focus on programming languages and detect the percentage of duplication in programming assignments.
Keywords—Data Cleaning, Stop Word Elimination, Stemming, Code Clone, Duplicate Detection