[PYTHON] Do you want me to fix that copy?

It's not uncommon to say that copy-and-paste code clones of source code are more reliable, but in many cases the presence of code clones often afflicts us at the last minute.

After the person in charge ran away, corrected the source code until late at night and released it, in fact, he said, "I'm making it with copy and paste, so please fix everything else and test it" and start over. Is very, very sad.

It is a humiliation to modify almost the same source code by subtly changing it like a spot the difference.

At the last minute, to avoid such punishment games, the frequency of copying should be monitored at all times, and any unusual frequency should be corrected immediately.

This section describes the tools for detecting copy and paste.

PMD-CPD PMD is a tool for detecting potential problems in Java source code implemented in Java. http://pmd.sourceforge.net/snapshot/

Part of this feature is the CPD command, which detects duplicate code and can detect duplicates in the following programming languages: ・ Java ・ JSP ・ C ++ ・ Ruby ・ Fortran ・ PHP ・ C # ・ PLSQL ・ Ecmascript

Installation method

Download one of the following files, unzip it, and extract it to any folder. http://sourceforge.net/projects/pmd/files/pmd/

Execution example

Execution example on Windows

cpd --minimum-tokens 50 --language ecmascript --format text --encoding utf8 --files C:\tool\clonedigger\test\ > result.txt

Example of execution on Linux

bin/run.sh cpd --minimum-tokens 35 --format xml --language ruby --files /var/lib/redmine/app/ > result.xml

Parameters

Parameters are shared between windows and linux.

Parameters Description
--minimum-tokens Specify the number of tokens to detect duplicates.
--format text,xml,You can select csv. If you output it in xml, you can use it in jenkins.
--language Specifies the type of programming language.
--files Specify the directory of the source code to be inspected. This is detected recursively.
--encoding Specifies the encoding of the source code to be inspected

GUI You can also work with GUI by running bin / cpdgui.bat. clone.png

Clonedigger Clonedigger is a copy / paste detection tool implemented in Python and Java. http://clonedigger.sourceforge.net/

The discoverable programming languages are:

・ Python ・ Java ・ Lua ・ Javascript

There are some tips below for detecting programming languages other than python.

Installation method

easy_install clonedigger

Execution example

Python detection example

__ If you specify a file __

clonedigger -l python -o ./test.html C:\tool\clonedigger\test\test_utf8.py

__If you specify a folder __

clonedigger -l python -o ./test.html C:\tool\clonedigger\test\test_utf8.py

If you specify a folder, subfolders are also detected. Creates the following HTML in the file specified by -o.

clone.png

If you use the --cpd-output option as follows, it will be output in XML format. This output format is the same as PMD / CPD.

clonedigger -l python --cpd-output -o test.xml C:\tool\clonedigger\test\python\

Java detection example

When detecting source code other than Python, it will not work unless java_antlr exists in the current directory. For Windows and Python 2.7, the following operations are required.

cd C:\Python27\Lib\site-packages\clonedigger-1.1.0-py2.7.egg\clonedigger
clonedigger -l java --cpd-output -o test.xml C:\tool\clonedigger\test\test.java

JavaScript detection example

When detecting JavaScript, it will not work unless js_antlr exists in the current directory. For Windows and Python 2.7, the following operations are required.

cd C:\Python27\Lib\site-packages\clonedigger-1.1.0-py2.7.egg\clonedigger
clonedigger -l js --cpd-output -o test.xml C:\tool\clonedigger\test\test.js

JavaScript that runs in the browser also works with the following sloppy implementation.

  //Extra last comma
  var questions = [
    {message: "Ah ah", category: Category.emotionalExhaustion} ,
  ];

However, in clonedigger, if there is such an invalid description, the analysis will be interrupted and an error will occur. At this time, the following error message will be displayed, which will be a hint for correction.

line 14:2 rule arrayItem failed predicate: { input.LA(1) == COMMA }?

AIST CCFinderX You can download AIST CC Finder X, which was told in the comments, from the following page. http://www.ccfinder.net/ccfinderxos-j.html

It requires 32-bit Java and Python 2.6 (no top or bottom) to work. If you download the Windows binary, it will not work unless you run it on 32bit, so you need to modify gemx.bat as follows.

gemx.bat


set PATH=C:\Windows\SysWOW64;C:\TracLight\python;C:\TracLight\python\python\Scripts\;%~dp0\scripts
set CCFINDERX_PYTHON_INTERPRETER_PATH=C:\TracLight\python\python.exe

The following image is an example of the screen detected by gemx.bat. 無題.png

In addition, you can also display scatter plots. These GUIs are fascinating.

When running from the command line:

ccfx p java -d c:\dev\java\

The a.ccfxd output at this time is a binary file that can be opened with GemX. You can convert it to text format with the following command.

>ccfx p a.ccfxd > test.txt

Visual Basic support

Visual Basic is rarely supported as a tool of this kind. It seems that it is also analyzing the code made with VB6 and VBA. However, it does not recognize the cls file, so modify ccfx_prep_scripts.ini as follows.

visualbasic=.vb;.bas;.frm;.cls

Other

-Processing for sources handling multibyte may have failed. ・ As far as I checked, it seems unlikely that cooperation with Jenkins will be easy. -Since the update history page is a dead link, it seems that it is no longer maintained. I have the source code, so I'll have to fix it myself if needed.

Monitoring by Jenkins

You can monitor code clone transitions using the Jenkins Violations plugin. https://wiki.jenkins-ci.org/display/JENKINS/Violations

  1. Generate XML on the workspace on the Jenkins script.

  2. In the post-build process of the project settings, specify the XML of 1. in the cpd of Report Violations. clone.png

  3. Each build will create the following report. clone.png

Recommended Posts

Do you want me to fix that copy?
Let's summarize what you want to do.
Q. Do you want to do something like generics that takes value from []?
Links to do what you want with Sublime Text
[AWS] What to do when you want to pip with Lambda
[AWS EC2] Settings you want to do on Amazon Linux 2
I want to do ○○ with Pandas
I want to copy yolo annotations
Do you want to wait for general purpose in Python Selenium?
Don't you want to say that you made a face recognition program?
Two document generation tools that you definitely want to use if you write python
How to write environment variables that you don't want to put on [GitHub] Python
Handle CSV that contains the element you want to parse in the file name
[Linux] You do not have root privileges. But I want to yum install.
Linux: Netplan configuration guide to see when you want to fix the IP address
What to do when you want to receive files from a Windows client remotely
I want to do Dunnett's test in Python
What to do if you can't pipenv shell
If you want to create a Word Cloud.
When you want to update the chrome driver.
I want to do pyenv + pipenv on Windows
What to do if you don't want to use Japanese column names when using ortoolpy.logistics_network
What to do if you can't pip install mysqlclient
No module named What to do if you get'libs.resources'
ModuleNotFoundError: No module What to do if you get'tensorflow.contrib'
Settings when you want to run python-mecab with travis
If you want to use Cython, also include python-dev
Don't ask "Are you sure you want to continue connecting"
When you want to filter with Django REST framework
When you want to play a game via Proxy
[ML Ops] I want to do multi-project with Python
Things to do when you start developing with Django
When you want to plt.save in a for statement
The programming language you want to be able to use
I want to say that there is data preprocessing ~
I want to do something in Python when I finish
I want to do Wake On LAN fully automatically
What to do when you can't bind CaboCha to Python
Super easy molecular phylogenetic tree creation technique that I do not want to teach anyone
[Python3] Code that can be used when you want to resize images in folder units
Golang: The matter that you want to execute the process at exactly 00:00 or 30 minutes at the specified interval.