Introduction

Continuing from I tried to sort out objects from the images of the steak set meal-several overlaps, this time I compared the histograms to detect similar images. I went.

reference

-Calculate image similarity with Python + OpenCV

Source code

There are some parts that have not been refactored, but I hope you can ignore them.

By the way, this time we will put together rectangles of almost the same image size, so we have not dared to resize the image.

`group_image.py`


# -*- coding: utf-8 -*-

import cv2
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
import selectivesearch
import os

def main():
    # loading lena image
    img = cv2.imread("{Image file}")

    # perform selective search
    img_lbl, regions = selectivesearch.selective_search(
        img,
        scale=500,
        sigma=0.9,
        min_size=10
    )

    candidates = set()

    for r in regions:
        # excluding same rectangle (with different segments)
        if r['rect'] in candidates:
            continue

        # excluding regions smaller than 2000 pixels
        if r['size'] < 2000:
            continue

        # distorted rects
        x, y, w, h = r['rect']

        if w / h > 1.2 or h / w > 1.2:
            continue

        candidates.add(r['rect'])

    # draw rectangles on the original image
    fig, ax = plt.subplots(ncols=1, nrows=1, figsize=(6, 6))
    ax.imshow(img)

    overlaps = {}

    #Count the number of overlaps and assign them to the array.
    for x, y, w, h in candidates:
        group = x + y + w + h

        for x2, y2, w2, h2 in candidates:
            if x2 - w < x < x2 + w2 and y2 - h < y < y2 + h2:

                if not group in overlaps:
                    overlaps[group] = 0

                overlaps[group] = overlaps[group] + 1

    #Outputs files with 30 or more overlaps (30 is arbitrarily thresholded).
    for key, overlap in enumerate(overlaps):
        if overlap > 30:
            for x, y, w, h in candidates:
                group = x + y + w + h

                if group in overlaps:
                    cv2.imwrite("{File Path}" + str(group) + '.jpg', img[y:y + h, x:x + w])

    #Calculate the similarity of images by comparing histograms
    image_dir = "{File Path}/"
    target_files = os.listdir(image_dir)
    files = os.listdir(image_dir)

    for target_file in target_files:
        if target_file == '.DS_Store':
            continue

        target_image_path = image_dir + target_file
        target_image = cv2.imread(target_image_path)
        target_hist = cv2.calcHist([target_image], [0], None, [256], [0, 256])

        for file in files:
            if file == '.DS_Store' or file == target_file:
                continue

            comparing_image_path = image_dir + file
            comparing_image = cv2.imread(comparing_image_path)
            comparing_hist = cv2.calcHist([comparing_image], [0], None, [256], [0, 256])

            ret = cv2.compareHist(target_hist, comparing_hist, 0)
            probability = ret * 100

            print("target file: " + target_file, "file: " + file, "similarity: " + str(probability) + "%")

if __name__ == "__main__":
    main()

result

The one with 100% similarity seems to be the exact same image.

Sometimes it's negative, but ...

('target file: 1018.jpg', 'file: 250.jpg', 'similarity: 24.8911807562%')
('target file: 1018.jpg', 'file: 369.jpg', 'similarity: 6.78223462382%')
('target file: 1018.jpg', 'file: 389.jpg', 'similarity: 11.1974626968%')
('target file: 1018.jpg', 'file: 432.jpg', 'similarity: 35.179639392%')
('target file: 1018.jpg', 'file: 463.jpg', 'similarity: 79.5281353144%')
('target file: 1018.jpg', 'file: 477.jpg', 'similarity: 51.5870749875%')
('target file: 1018.jpg', 'file: 480.jpg', 'similarity: 55.1832671208%')
('target file: 1018.jpg', 'file: 492.jpg', 'similarity: 88.2822944972%')
('target file: 1018.jpg', 'file: 522.jpg', 'similarity: 76.9528435542%')
('target file: 1018.jpg', 'file: 547.jpg', 'similarity: 84.9997652385%')
('target file: 1018.jpg', 'file: 559.jpg', 'similarity: 77.6441098189%')
('target file: 1018.jpg', 'file: 575.jpg', 'similarity: 76.3571281251%')
('target file: 1018.jpg', 'file: 581.jpg', 'similarity: 76.7456283874%')
('target file: 1018.jpg', 'file: 594.jpg', 'similarity: 31.9957806646%')
('target file: 1018.jpg', 'file: 603.jpg', 'similarity: 85.3813480299%')
('target file: 1018.jpg', 'file: 629.jpg', 'similarity: 88.0957855275%')
('target file: 1018.jpg', 'file: 632.jpg', 'similarity: 60.7236277665%')
('target file: 1018.jpg', 'file: 634.jpg', 'similarity: 62.3073009307%')
('target file: 1018.jpg', 'file: 635.jpg', 'similarity: 65.5935422037%')
('target file: 1018.jpg', 'file: 657.jpg', 'similarity: 56.6421422253%')
('target file: 1018.jpg', 'file: 658.jpg', 'similarity: 82.0967550779%')
('target file: 1018.jpg', 'file: 659.jpg', 'similarity: 89.7396556858%')
('target file: 1018.jpg', 'file: 754.jpg', 'similarity: 78.3236083079%')
('target file: 1018.jpg', 'file: 758.jpg', 'similarity: 79.0903410039%')
('target file: 1018.jpg', 'file: 799.jpg', 'similarity: 89.3025985059%')
('target file: 1018.jpg', 'file: 806.jpg', 'similarity: 97.2873823376%')
('target file: 1018.jpg', 'file: 815.jpg', 'similarity: 93.3345515745%')
('target file: 1018.jpg', 'file: 867.jpg', 'similarity: 81.8261095798%')
('target file: 1018.jpg', 'file: 920.jpg', 'similarity: 93.4987208053%')
('target file: 1018.jpg', 'file: 921.jpg', 'similarity: 90.3518029292%')
('target file: 1018.jpg', 'file: 932.jpg', 'similarity: 94.4258967857%')
('target file: 1018.jpg', 'file: 964.jpg', 'similarity: 10.5652113467%')
('target file: 1018.jpg', 'file: 972.jpg', 'similarity: 98.8755231495%')
----The following is omitted

Details

What is the similarity of 90% or more?

('target file: 1018.jpg', 'file: 972.jpg', 'similarity: 98.8755231495%')

('target file: 754.jpg', 'file: 758.jpg', 'similarity: 99.8932682258%')

I feel that the accuracy is quite good.

Summary

It seems that I got a pretty good result once, so next time I would like to group images with high similarity and put them together.

All page links

-I tried object detection using Python and OpenCV -I tried to sort out objects from the image of steak set meal-① Object detection -I tried to sort out the objects from the image of the steak set meal-② Overlap number sorting -I tried to sort out the objects from the image of the steak set meal-③ Similar image heat map detection -I tried to sort out the objects from the image of the steak set meal-④ Clustering -I tried to sort out objects from the image of steak set meal-⑤ Similar image feature point detection edition

[PYTHON] I tried to sort out the objects from the image of the steak set meal --③ Similar image Heat map detection