[PYTHON] Search for homeomorphic idioms with opencv

Introduction

Do you know the card game Topolo Memory? Players compete to claim cards by spotting figures that are homeomorphic to each other. While playing it with a friend, we added an extra game: finding idioms (compound words) that are homeomorphic to one another. In this post I try to solve that game with the help of opencv.

Environment

OS: macOS
Python: 3.6.8
opencv: 4.0.0.21
numpy: 1.15.0

Method

The mecab dictionary (mecab-ipadic) serves as the word list. The search covers every entry in the noun dictionaries (228,297 words).

Extraction

Open the CSV files one by one and extract only the headwords (the first column). Each file is converted to a numpy.array and sliced, and the per-file results are then concatenated into a single list.

csvnoun.py


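import csv
import codecs
import os
from functools import reduce

import numpy as np

# CSV files of the noun dictionaries shipped with mecab-ipadic
csvs = [
    "Noun.csv",
    "Noun.adjv.csv",
    "Noun.adverbal.csv",
    "Noun.demonst.csv",
    "Noun.nai.csv",
    "Noun.name.csv",
    "Noun.number.csv",
    "Noun.org.csv",
    "Noun.others.csv",
    "Noun.place.csv",
    "Noun.proper.csv",
    "Noun.verbal.csv"
]
# open() does not expand "~", so resolve the home directory explicitly
dict_dir = os.path.expanduser("~/mecab-ipadic-2.7.0-20070801/")

def csv_1(csv_file):
    """Return the headwords (first column) of one dictionary CSV file."""
    # The dictionary files are read as EUC-JP
    with codecs.open(os.path.join(dict_dir, csv_file), "r", "euc_jp") as f:
        reader = csv.reader(f)
        csv_words = [k for k in reader]
        csv_words_np = np.array(csv_words)
        return csv_words_np[:, 0].tolist()

# Concatenate the headword lists of all files
words = reduce(lambda x, y: x + y, [csv_1(k) for k in csvs])
print(words[0:10])
print("Quantity:", len(words))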

It takes about 2 seconds in the local environment.

Generate

Rendering Japanese characters directly with opencv is troublesome, so the character image is generated with pillow.

char_img.py


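import cv2
from PIL import Image, ImageDraw, ImageFont
import numpy as np

# 500x500 grayscale ("L") canvas with a white background
img = Image.new("L", (500, 500), "white")
char = "あ"
# Hiragino Kaku Gothic W4, a Japanese font bundled with macOS
jpfont = ImageFont.truetype("/System/Library/Fonts/ヒラギノ角ゴシック W4.ttc", 500)
draw = ImageDraw.Draw(img)
# Draw the character in black
draw.text((0, 0), char, font=jpfont, fill="black")
# Convert to a numpy array so that opencv can process it
img_cv = np.array(img, dtype=np.uint8)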

If you run this on anything other than macOS, change jpfont to a suitable Japanese font.
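
One way to make the font choice portable is to try a list of candidate paths and use the first one that exists. The following is only a minimal sketch: the helper name first_existing_font and every path in CANDIDATE_FONTS are assumptions for illustration, not part of the original script, and the paths may not exist on your machine.

```python
import os

# Hypothetical candidate list: these paths are assumptions and may differ
# on your system; edit them to match the fonts you actually have installed.
CANDIDATE_FONTS = [
    "/System/Library/Fonts/ヒラギノ角ゴシック W4.ttc",          # macOS
    "/usr/share/fonts/opentype/noto/NotoSansCJK-Regular.ttc",  # some Linux distributions
    "C:/Windows/Fonts/msgothic.ttc",                            # Windows
]

def first_existing_font(candidates):
    """Return the first path in candidates that is an existing file, or None."""
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None

font_path = first_existing_font(CANDIDATE_FONTS)
```

If font_path is not None, it can be passed to ImageFont.truetype(font_path, 500) in place of the hard-coded path.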

Analysis

opencv provides cv2.findContours, a function that detects the contours of a binary image, so I use it here. (Reference: http://labs.eecs.tottori-u.ac.jp/sd/Member/oyamada/OpenCV/html/py_tutorials/py_imgproc/py_contours/py_contours_begin/py_contours_begin.html)

When the retrieval mode (the second argument) is set to cv2.RETR_TREE, the returned hierarchy keeps the full nesting of the contours, and that is what I use. Each entry of hierarchy is stored as [Next, Previous, First_Child, Parent]. Only Parent is used, because the whole structure can be recovered by examining every contour's parent. Walking First_Child and Next would take less computation, but even complicated characters do not exceed about 20 parts, so I simply scan everything.

With cv2.findContours the background becomes contour 0, so the contours whose parent is 0 are the outermost outlines. The code treats a contour as a stroke part when its parent index is even, and for each such contour it counts how many contours name it as their parent, that is, how many holes it has.

Finally, the string is converted character by character, the per-character lists are concatenated, and the result is sorted so that it can be compared during the search.
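
The parent-based counting described above can be tried in plain Python, without opencv, on a hand-written list of parent indices. The code in this article decides whether a contour is a stroke part by the parity of its parent's index. The sketch below is a variant that tracks nesting depth instead, which does not depend on how the contours happen to be numbered. The function name component_hole_counts and the depth convention (depth 0 is the background, odd depths are stroke outlines) are assumptions for illustration, not the method used in this article.

```python
def component_hole_counts(parents):
    """Count holes per stroke component from a list of parent indices.

    parents[i] is the parent index of contour i (-1 for no parent), i.e.
    the fourth field of each hierarchy row returned with cv2.RETR_TREE.
    Assumption: depth 0 is the white background, so odd depths are stroke
    outlines and even depths >= 2 are holes inside strokes.
    """
    def depth(i):
        # Walk up the parent chain until the root (-1) is reached
        d = 0
        while parents[i] != -1:
            i = parents[i]
            d += 1
        return d

    counts = []
    for i in range(len(parents)):
        if depth(i) % 2 == 1:                # contour i outlines a stroke component
            counts.append(parents.count(i))  # its direct children are its holes
    return sorted(counts)

# Background (0), a stroke with one hole (1 -> 2), and a stroke without holes (3)
print(component_hole_counts([-1, 0, 1, 0]))  # [0, 1]
```

With a numbering such as [-1, 0, 0, 2], where a hole's parent happens to have an even index, the depth-based variant still returns [0, 1], whereas an index-parity test would also count the hole itself as a part.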

string_topology.py


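import cv2
import numpy as np
from functools import reduce
from PIL import Image, ImageDraw, ImageFont

# Cache: character -> list of child counts, so each glyph is rendered only once
topology_dic = {}
jpfont = ImageFont.truetype("/System/Library/Fonts/ヒラギノ角ゴシック W4.ttc", 500)

def char_topology(char):
    if char in topology_dic:
        return topology_dic[char]
    else:
        # Render the character in black on a white canvas
        img = Image.new("L", (500, 500), "white")
        draw = ImageDraw.Draw(img)
        draw.text((0, 0), char, font=jpfont, fill="black")
        img_cv = np.array(img, dtype=np.uint8)
        # Binarize, then extract the contours together with the full hierarchy
        ret, thresh = cv2.threshold(img_cv, 127, 255, cv2.THRESH_BINARY)
        # OpenCV 4.x returns (contours, hierarchy)
        contours, hierarchy = cv2.findContours(thresh, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
        # Contour image for visual inspection only; it does not affect the result
        img_cv_con = np.zeros((500, 500, 3), np.uint8)
        cv2.drawContours(img_cv_con, contours, -1, (0, 255, 0), 3)
        # hierarchy[0][i] = [Next, Previous, First_Child, Parent]
        parent = [k[3] for k in hierarchy[0]]
        # For each contour whose parent index is even, count its direct children
        topology = [parent.count(k)
                    for k in range(len(parent)) if parent[k] % 2 == 0]
        topology_dic[char] = topology
        return topology

def string_topology(string):
    # Concatenate the per-character lists. The initial [] yields a fresh list
    # (and handles the empty string), so sorting never touches the cache.
    topology = reduce(lambda x, y: x + y, [char_topology(k) for k in string], [])
    topology.sort()
    return topology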

Search

Convert each word in the list to its topology and print the words that match the input.

search.py


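import sys

# Topology of the query string given as the first command-line argument
in_topology = string_topology(sys.argv[1])
print(in_topology)
# Print every dictionary word whose topology matches the query
for k in words:
    if in_topology == string_topology(k):
        print(k)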

Code

Putting it all together gives the following.

same_topology.py


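import cv2
import os
from PIL import Image, ImageDraw, ImageFont
import numpy as np
from functools import reduce
import csv
import codecs
import sys

# CSV files of the noun dictionaries shipped with mecab-ipadic
csvs = [
    "Noun.csv",
    "Noun.adjv.csv",
    "Noun.adverbal.csv",
    "Noun.demonst.csv",
    "Noun.nai.csv",
    "Noun.name.csv",
    "Noun.number.csv",
    "Noun.org.csv",
    "Noun.others.csv",
    "Noun.place.csv",
    "Noun.proper.csv",
    "Noun.verbal.csv"
]
# open() does not expand "~", so resolve the home directory explicitly
dict_dir = os.path.expanduser("~/mecab-ipadic-2.7.0-20070801/")

def csv_1(csv_file):
    """Return the headwords (first column) of one dictionary CSV file."""
    with codecs.open(os.path.join(dict_dir, csv_file), "r", "euc_jp") as f:
        reader = csv.reader(f)
        csv_words = [k for k in reader]
        csv_words_np = np.array(csv_words)
        return csv_words_np[:, 0].tolist()

# Concatenate the headword lists of all files
words = reduce(lambda x, y: x + y, [csv_1(k) for k in csvs])

# Cache: character -> list of child counts, so each glyph is rendered only once
topology_dic = {}
jpfont = ImageFont.truetype("/System/Library/Fonts/ヒラギノ角ゴシック W4.ttc", 500)

def char_topology(char):
    if char in topology_dic:
        return topology_dic[char]
    else:
        # Render the character in black on a white canvas
        img = Image.new("L", (500, 500), "white")
        draw = ImageDraw.Draw(img)
        draw.text((0, 0), char, font=jpfont, fill="black")
        img_cv = np.array(img, dtype=np.uint8)
        # Binarize, then extract the contours together with the full hierarchy
        ret, thresh = cv2.threshold(img_cv, 127, 255, cv2.THRESH_BINARY)
        # OpenCV 4.x returns (contours, hierarchy)
        contours, hierarchy = cv2.findContours(thresh, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
        # Contour image for visual inspection only; it does not affect the result
        img_cv_con = np.zeros((500, 500, 3), np.uint8)
        cv2.drawContours(img_cv_con, contours, -1, (0, 255, 0), 3)
        # hierarchy[0][i] = [Next, Previous, First_Child, Parent]
        parent = [k[3] for k in hierarchy[0]]
        # For each contour whose parent index is even, count its direct children
        topology = [parent.count(k)
                    for k in range(len(parent)) if parent[k] % 2 == 0]
        topology_dic[char] = topology
        return topology

def string_topology(string):
    # Concatenate the per-character lists. The initial [] yields a fresh list
    # (and handles the empty string), so sorting never touches the cache.
    topology = reduce(lambda x, y: x + y, [char_topology(k) for k in string], [])
    topology.sort()
    return topology


# Topology of the query string given as the first command-line argument
in_topology = string_topology(sys.argv[1])
print(in_topology)
# Print every dictionary word whose topology matches the query
for k in words:
    if in_topology == string_topology(k):
        print(k)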

Output result


$ python same_topology.py 東京
[0, 0, 0, 1, 4]
Grilled skewers
Lotus stand
Shiritori
Gorenshi
Extraction
Wasan
Beautiful child
To reach
future affairs
Case
coition
Fire car
Fineness
Slush fund
clasp
huge projectile
...

Compound words that are homeomorphic to the first argument are printed. In my local environment, the output takes about 30 seconds.
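
Because every query rescans the whole word list, repeated lookups could be sped up by grouping the words by signature once and then looking a query up in that table. The following is a minimal sketch of that idea. The names build_index, lookup and toy_signature are hypothetical and not part of the original script; toy_signature is only a stand-in (sorted letters) so that the example runs without opencv or the dictionary.

```python
from collections import defaultdict

def build_index(words, signature):
    """Group words by signature; signature(word) must return a list."""
    index = defaultdict(list)
    for word in words:
        # Lists are unhashable, so the signature is stored as a tuple key
        index[tuple(signature(word))].append(word)
    return index

def lookup(index, query, signature):
    """Return all indexed words that share the query's signature."""
    return index.get(tuple(signature(query)), [])

def toy_signature(word):
    # Stand-in for string_topology: the sorted letters of the word
    return sorted(word)

index = build_index(["listen", "silent", "google"], toy_signature)
print(lookup(index, "enlist", toy_signature))  # ['listen', 'silent']
```

In the actual script, string_topology would be passed as the signature function and words as the word list, so the cost of rendering every word is paid once rather than on every query.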

Supplement

This time I did not handle tofu characters (glyphs that are missing from the font and are rendered as blank boxes), although that should be done (I think the words in the ipa dictionary can be rendered without turning into tofu ...). It was pointed out that "回" and "ロロ" are not homeomorphic, but as far as continuity is concerned they should be homeomorphic. ~~Maybe.~~ (I think my knowledge of mathematics is lacking, so I would appreciate comments from an expert.)
