A Python script that compares the contents of two directories


I have been copying data from recording media (USB memory, SD cards, etc.) used in digital cameras and action cams to a large-capacity external SSD for storage, but the SSD is running low on free space. The SSD seems to contain duplicate data, so I want to delete it. The duplicate files seem to have the following causes:

- After copying to the external SSD, I added more data before erasing the source files on the recording media, and then backed it up again.
- The case of the file names may have changed because of differences in the OS environment where the backup was performed. On that occasion, I was mistaken about whether the backup had already been completed, and backed up twice.

Checking the contents of two directories by hand while ignoring case would be a hassle, so I decided to write a script.
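As an illustration of the case-insensitive matching mentioned above, here is a minimal sketch that pairs the entries of two directories by their case-folded names. It is not taken from the script itself; the function name `pair_by_casefold` is made up for this example.

```python
import os


def pair_by_casefold(dir1, dir2):
    """Pair directory entries whose names match when case is ignored.

    Returns a dict that maps the case-folded name to a tuple
    (name_in_dir1, name_in_dir2); a missing side is None.
    Assumption: no directory holds two names that differ only in case
    (if it does, only one of them is kept in this sketch).
    """
    names1 = {name.casefold(): name for name in os.listdir(dir1)}
    names2 = {name.casefold(): name for name in os.listdir(dir2)}
    return {
        key: (names1.get(key), names2.get(key))
        for key in sorted(names1.keys() | names2.keys())
    }
```

For example, `pair_by_casefold("./data0", "./data1")` would pair `C0001.MP4` in one directory with `c0001.mp4` in the other, and entries that exist on only one side come back with `None` for the missing side.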

File storage

Usage

Run the script with the two directories you want to compare as arguments. In each subdirectory, it compares the timestamp and file size of every file, directory, and symbolic link. Files of the same size are additionally compared with the filecmp.cmp(..., shallow=False) function. The following symbols, which indicate the result of the comparison, are placed in front of the file name.

- If a file with the same name exists in the comparison target: a three-character symbol that combines the timestamp (mtime) symbol (>, =, <) with the content comparison symbol (++, --, !=, ==, two blank characters, !!)

For example, === means "same timestamp and same content", >++ means "newer and larger than the file with the same name in the comparison directory", and =!= means "the timestamp and file size are the same as those of the file with the same name in the comparison directory, but the contents (data) differ". For a symbolic link, '=  ' (= followed by two blank characters) means "the timestamp and the link target path are the same", and '=!!' means "the timestamp is the same but the link target path differs".
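The symbol rules above can be sketched roughly as follows. This is a simplified illustration, not the script's actual code: the function name `compare_symbol` is hypothetical, and returning == for non-regular files (such as directories) of equal size without reading any data is an assumption based on the example output. Mixed cases, such as a symbolic link compared against a regular file, are not treated specially.

```python
import filecmp
import os
import stat


def compare_symbol(path1, path2):
    """Return a 3-character symbol describing path1 relative to path2."""
    st1, st2 = os.lstat(path1), os.lstat(path2)

    # First character: timestamp (mtime) comparison.
    if st1.st_mtime > st2.st_mtime:
        time_sym = ">"
    elif st1.st_mtime < st2.st_mtime:
        time_sym = "<"
    else:
        time_sym = "="

    # Symbolic links: compare the link target paths, not the data.
    if stat.S_ISLNK(st1.st_mode) and stat.S_ISLNK(st2.st_mode):
        same_target = os.readlink(path1) == os.readlink(path2)
        return time_sym + ("  " if same_target else "!!")

    # Different sizes: the content cannot be identical.
    if st1.st_size > st2.st_size:
        return time_sym + "++"
    if st1.st_size < st2.st_size:
        return time_sym + "--"

    # Same size, both regular files: compare the data byte by byte.
    if stat.S_ISREG(st1.st_mode) and stat.S_ISREG(st2.st_mode):
        same = filecmp.cmp(path1, path2, shallow=False)
        return time_sym + ("==" if same else "!=")

    # Assumption: other entries of equal size are reported as "==".
    return time_sym + "=="
```

Passing `shallow=False` makes `filecmp.cmp` read and compare the file contents instead of trusting the size and timestamp alone, which matters here because copies of the same data may carry different timestamps.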

Execution example

% ./cmp_dirtree ./data0 ./data1
### ========================================
1: ./data0
2: ./data1
### ========================================
###-----------------< M4ROOT >--------------------
1: ./data0/M4ROOT
2: ./data1/M4ROOT
### Sub directories: ---
1:  =++ : 2019/02/09 12:43:35 :          952 : CLIP
2:  =-- : 2019/02/09 12:43:35 :          476 : CLIP
1:  === : 2019/02/09 12:43:35 :           68 : GENERAL
2:  === : 2019/02/09 12:43:35 :           68 : GENERAL
### File lists: ---
1:  >== : 2019/11/05 03:47:45 :         6148 : .DS_Store
2:  <== : 2019/11/05 01:10:13 :         6148 : .DS_Store
1:  >++ : 2019/11/02 19:09:34 :         5122 : MEDIAPRO.XML
2:  <-- : 2019/07/01 19:07:35 :         2595 : MEDIAPRO.XML
1:  >== : 2019/11/02 19:09:34 :            7 : STATUS.BIN
2:  <== : 2019/07/01 19:07:35 :            7 : STATUS.BIN
###-----------------< M4ROOT/CLIP >--------------------
1: ./data0/M4ROOT/CLIP
2: ./data1/M4ROOT/CLIP
### File lists: ---
1:  === : 2019/02/09 14:53:23 :   1878686635 : C0001.MP4
2:  === : 2019/02/09 14:53:23 :   1878686635 : C0001.MP4
1:  === : 2019/02/09 14:53:23 :         2008 : C0001M01.XML
2:  === : 2019/02/09 14:53:23 :         2008 : C0001M01.XML
1:  === : 2019/07/01 19:07:35 :   7627022896 : C0006.MP4
2:  === : 2019/07/01 19:07:35 :   7627022896 : C0006.MP4
1:  === : 2019/07/01 19:07:35 :         2009 : C0006M01.XML
2:  === : 2019/07/01 19:07:35 :         2009 : C0006M01.XML
1:      : 2019/07/28 14:15:53 :  15709053750 : C0007.MP4
2: !    : ~~
1:      : 2019/07/28 14:15:53 :         2008 : C0007M01.XML
2: !    : ~~
(The rest is omitted)

If you have a lot of directories and files to compare, you can easily find duplicate files by filtering the output, for example with ./cmp_dirtree ./data0 ./data1 | grep -e '.=='.
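The same filtering can also be done in Python when the result needs further processing. The sketch below assumes the line layout seen in the execution example (side number, three-character symbol, timestamp, size, name) and only looks at the symbol column of side-1 lines under a "File lists" header. The names `LINE_RE` and `same_content_names` are made up for this example.

```python
import re

# Layout assumed from the example output:
#   "1:  === : 2019/02/09 14:53:23 :   1878686635 : C0001.MP4"
LINE_RE = re.compile(r"^([12]):  (.{3}) : (.{19}) : +(\d+) : (.*)$")


def same_content_names(lines):
    """Return names of side-1 files whose content symbol is '=='."""
    names = []
    in_files = False
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("###"):
            # Header lines switch between directory and file listings.
            in_files = "File lists" in line
            continue
        m = LINE_RE.match(line)
        if in_files and m and m.group(1) == "1" and m.group(2).endswith("=="):
            names.append(m.group(5))
    return names
```

Reading from a pipe, this could be called as `same_content_names(sys.stdin)` after `import sys`. Unlike a plain pattern match on the whole line, it skips the `###` header lines and the subdirectory entries.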
