Sorting image files with Python

Preface

I'm an old-timer, so I basically belong to the "if it doesn't exist, make it yourself" camp.

Theme

This time, as the title suggests, I am sorting images. Going back to around 2000, I have no idea how many files there are: photos taken with digital cameras and smartphones, downloaded wallpapers, screenshots, and so on. Snapshot-like backups taken at various points are stored on multiple HDDs, so everything is in a very chaotic state. Well, you could say that distributed, redundant storage is a kind of risk hedge.

  - HDD1: up to 2000 images
  - HDD2: up to 2500 images
  - HDD3: up to 3000 images

There is a lot of duplication, and the debris from an earlier, half-finished attempt at sorting is still lying around as files named like filename (1).jpg. So I decided to eliminate the duplicates, put everything on a new HDD, and upload it to Amazon Photos.

  1. The destination folder is dst\yyyymm (classify by year and month for the time being).
  2. yyyymm is determined from the Exif DateTimeOriginal tag.
  3. For images that have Exif but no DateTimeOriginal, or that have no Exif at all, yyyymm is determined from the file timestamp (last modified date and time).
  4. If a file with the same name already exists in the destination folder, append a serial number such as "(1)" to the file name so that no file is overwritten.
  5. Since I want to delete duplicate files later, generate a CSV file for each yyyymm folder with one line per file containing the file name, CRC32, and file size.
  6. The only target formats are .jpg and .png (there should be almost no .gif or .bmp files, and .q4 and .mag were abandoned long ago).

Development environment

Code

The part that computes the CRC32 may be reused later, so I put it in a separate module.

mycrc32.py


import binascii
import glob
import os

# Get the CRC32 of a file
def get_crc32(file):
    with open(file, "rb") as f:
        barray = f.read()
    return binascii.crc32(barray, 0)

# Write one "file name,CRC32,file size" line per file to info.csv
def output_info(srcFolder):
    infofile = os.path.join(srcFolder, "info.csv")

    # Remove an old info.csv before listing so that it does not list itself
    if os.path.isfile(infofile):
        os.remove(infofile)

    # The paths returned by glob already include srcFolder
    files = glob.glob(os.path.join(srcFolder, "*.*"))

    # Open the output once instead of reopening it for every file
    with open(infofile, "w") as f:
        for file in files:
            if not os.path.isfile(file):
                continue    # skip folders whose names contain a dot
            crc32 = get_crc32(file)
            f.write(f"{os.path.basename(file)},{hex(crc32)},{os.path.getsize(file)}\n")
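
get_crc32 reads the whole file into memory, which is fine for photos but can get heavy for very large files. As a rough sketch only (the name get_crc32_chunked and the 1 MiB chunk size are my own assumptions, not part of the original module), the same checksum can be computed piece by piece by feeding the running value back into binascii.crc32:

```python
import binascii


def get_crc32_chunked(path, chunk_size=1024 * 1024):
    """Return the CRC32 of a file, reading it in fixed-size chunks.

    Hypothetical helper: the name and the default chunk size are assumptions.
    """
    crc = 0
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read() until it returns b"" (EOF)
        for chunk in iter(lambda: f.read(chunk_size), b""):
            # Passing the running value back in continues the same checksum
            crc = binascii.crc32(chunk, crc)
    return crc
```

The result should match get_crc32 for the same file, so the two could be swapped without changing the contents of info.csv.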

This is the main body.

PictureSorter.py


import datetime
import os
import glob
from PIL import Image
from PIL.ExifTags import TAGS
import shutil
import sys
import mycrc32

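# Get Exif information as a {tag name: value} dictionary
def get_exif_of_image(file):
    # Use a with block so the image file is closed once the Exif data is read
    with Image.open(file) as img:
        try:
            # _getexif() is a private Pillow method and is not defined for every format
            exif = img._getexif()
        except AttributeError:
            return {}

    exifTable = {}
    if exif is not None:
        for key, value in exif.items():
            tag = TAGS.get(key, key)
            exifTable[tag] = value
    return exifTable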

#Get the last modified date and time of the file
def get_last_write_time(file):
    t = os.path.getmtime(file)
    return datetime.datetime.fromtimestamp(t)

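# Get the destination subfolder name (yyyymm)
def get_destination_folder(file):
    # get_exif_of_image() returns an empty dict when there is no Exif
    exifTable = get_exif_of_image(file)
    t = exifTable.get('DateTimeOriginal')

    if t is not None:
        try:
            t = datetime.datetime.strptime(t, '%Y:%m:%d %H:%M:%S')
        except (TypeError, ValueError):
            # The tag exists but cannot be parsed as a date
            t = None

    if t is None:
        # Fall back to the last modified date and time of the file
        t = get_last_write_time(file)

    return t.strftime("%Y%m")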

# Decide on a unique file name when the name is already taken
def ensure_filename(dfull):
    path = os.path.dirname(dfull)
    pureName, ext = os.path.splitext(os.path.basename(dfull))

    newName = f"{os.path.join(path, pureName)}{ext}"

    i = 0
    while os.path.isfile(newName):
        i += 1
        newName = f"{os.path.join(path, pureName)}({i}){ext}"

    return newName

if __name__ == "__main__":
    def get_ext(file):
        return os.path.splitext(os.path.basename(file))[1].lower()

    # Check the argument count before reading sys.argv[1] and sys.argv[2]
    if len(sys.argv) != 3:
        print("PictureSorter.py srcFolder dstFolder")
        x = input()
        sys.exit(1)

    _SOURCE_FOLDER = sys.argv[1]
    _DESTINATION_FOLDER = sys.argv[2]

    print(f"srcFolder={_SOURCE_FOLDER}")
    print(f"dstFolder={_DESTINATION_FOLDER}")
    print("Press Enter to go!")
    x = input()

    # "*.*" only matches entries directly under the source folder;
    # recursive=True has no effect unless the pattern contains "**"
    files = glob.glob(os.path.join(_SOURCE_FOLDER, "*.*"))

    # Remember every destination folder so info.csv can be written for each one
    dstFolders = set()

    for file in filter(lambda file: get_ext(file) in [".jpg", ".png"], files):
        dstfol = get_destination_folder(file)
        dfol = os.path.join(_DESTINATION_FOLDER, dstfol)
        os.makedirs(dfol, exist_ok=True)
        dstFolders.add(dfol)

        # Move the file (losing files would be scary, so copy now and delete the originals later)
        dfull = os.path.join(dfol, os.path.basename(file))
        efull = ensure_filename(dfull)
        print(f"{file} -> {efull}")
        shutil.copy2(file, efull)       # If you want to move, use move instead of copy2

    # Output file name / CRC32 / file size for each yyyymm folder
    for dfol in sorted(dstFolders):
        mycrc32.output_info(dfol)
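
Once the yyyymm folders contain an info.csv, those files can be used to look for the duplicates mentioned in requirement 5. The following is only a sketch under my own assumptions: find_duplicates is a hypothetical helper, it expects the "file name,CRC32,file size" line format written by output_info, and it treats files with the same CRC32 and size as duplicate candidates rather than proven duplicates.

```python
import glob
import os
from collections import defaultdict


def find_duplicates(dst_folder):
    """Group the files listed in dst_folder/*/info.csv by (CRC32, size).

    Hypothetical helper; returns a list of path lists, one per candidate group.
    """
    groups = defaultdict(list)
    for info in glob.glob(os.path.join(dst_folder, "*", "info.csv")):
        folder = os.path.dirname(info)
        with open(info) as f:
            for line in f:
                # Split from the right so commas inside a file name survive
                parts = line.rstrip("\n").rsplit(",", 2)
                if len(parts) != 3:
                    continue  # skip blank or malformed lines
                name, crc, size = parts
                groups[(crc, int(size))].append(os.path.join(folder, name))
    # Only keys shared by two or more files are duplicate candidates
    return [sorted(paths) for paths in groups.values() if len(paths) > 1]
```

Because different files can share a CRC32, it would be safer to compare each candidate pair byte by byte, for example with filecmp.cmp(a, b, shallow=False), before deleting anything.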

I tried using it

This works as required, but how does it hold up in practice? Everyday photos taken with a smartphone (so-called family photos) and game screenshots should not end up in the same folder just because they share a date. A folder per category is needed one level up. Once the category is decided, this code handles the move into the year-month subfolder beneath it.
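
To illustrate the layout this implies, here is a minimal sketch that places the yyyymm folder under a category folder. The function name category_destination and the example category "family" are assumptions of mine; how the category gets chosen is left open.

```python
import os


def category_destination(dst_root, category, yyyymm):
    """Build dst_root/category/yyyymm and create it if it does not exist.

    Hypothetical helper: the category is assumed to be chosen by the user.
    """
    folder = os.path.join(dst_root, category, yyyymm)
    os.makedirs(folder, exist_ok=True)
    return folder
```

For example, category_destination(dst, "family", "202008") would return the path of the family/202008 folder under dst, which could then be used in place of the plain yyyymm folder when copying.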

In that case, tedious as it is, the sensible approach is to select the destination category on a GUI form and then drag and drop the image files you want to move onto it while looking at the thumbnails in Explorer. It would be cool to let machine learning decide the destination category, but with my current knowledge that is beyond me (sweat).

So I did a little research, but there are various options, from calling the Windows API directly to an implementation using wxPython to an implementation in 3.9.0, and it looks like I would not finish during the holidays (I am actually doing this over my summer vacation). So I will fudge it by writing only the GUI part in C# (I will omit the C# side, since there is not much to it).

Afterword

There are probably more Pythonic ways to write this code, but I will leave it here. Recently the feel of my membrane keyboard had started to bother me, so I switched to a mechanical keyboard (brown switches) for the first time in several years. It's fun to hit the keys!
