Cut out an image with Python

A record of my struggles to get something done with Python and OCR (Tesseract-OCR is used as the OCR engine).

First, I tried running OCR on a whole sheet of the document to be read. Because it also tried hard to read things like stamps and photos, the output made no sense. Even where the text was read correctly, I could not tell which data was which or where one item ended and the next began, so the result was useless. Therefore, I decided to cut out only the necessary parts and read those.

With the following code, you can cut out any region of an image file and save the cropped image under a different name.

from PIL import Image

# Open image with PIL
img_trim = Image.open('original image file name')
# Cut out the specified coordinates
img_trim.crop((x1, y1, x2, y2)).save('Save name of cropped image')
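One thing that is easy to get wrong here: the tuple passed to `crop` is `(left, upper, right, lower)`, that is, two corner points, not a corner plus a width and height. The following is a minimal sketch of converting between the two forms. The helper names `box_from_origin_size` and `box_size` are my own for illustration and are not part of Pillow.

```python
def box_from_origin_size(x, y, width, height):
    """Convert a top-left corner plus a size into a Pillow-style crop box.

    Hypothetical helper: Pillow's crop() expects (left, upper, right, lower).
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    return (x, y, x + width, y + height)


def box_size(box):
    """Return the (width, height) of the region a crop box describes."""
    left, upper, right, lower = box
    return (right - left, lower - upper)


box = box_from_origin_size(120, 40, 300, 80)
print(box)            # (120, 40, 420, 120)
print(box_size(box))  # (300, 80)
```

The resulting tuple can be passed directly as the argument of `img_trim.crop(...)`, and the cropped image should then have the size reported by `box_size`.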

At this rate, the coordinates have to be adjusted many times before the desired part is cut out. To streamline the work of hunting for coordinates, let's create an application that acquires them by mouse operation. There are various ways to create a GUI, such as C# and VB, but this time I will try Kivy, a framework for creating GUIs in Python. As for installation, I believe I simply installed Kivy with pip... (details omitted)

UI part definition (main.kv):

#:import hex_color kivy.utils.get_color_from_hex
<ImageWidget>:
    canvas.before:
        Color:
            rgb: 1,1,1
        Rectangle:
            pos: self.pos
            size: self.size
    BoxLayout:
        orientation: 'horizontal'
        height: root.height
        width: root.width

        Image:
            id: img
            allow_stretch: True
            source: root.image_src

        BoxLayout:
            size: root.size
            orientation: 'vertical'
            width: 200

            Label:
                id: lbl_file_name
                color: 0, 0, 0, 1
                font_size: 20
                background_color: hex_color('#000000')
            Label:
                id: lbl_result
                color: 0, 0, 0, 1
                font_size: 20

The kv file is written like a simplified version of HTML. Next, the source of the main body (main.py):

from kivy.app import App

from kivy.core.text import LabelBase, DEFAULT_FONT  # added
from kivy.config import Config
from kivy.resources import resource_add_path  # added
from kivy.properties import StringProperty
from kivy.uix.widget import Widget
from kivy.graphics import Line
from kivy.graphics import Color
from kivy.utils import get_color_from_hex
from PIL import Image
import math
import os
import pyocr
import pyocr.builders

resource_add_path('c:/Windows/Fonts')  # added
LabelBase.register(DEFAULT_FONT, 'msgothic.ttc')  # added

Config.set('graphics', 'width', '1224')
Config.set('graphics', 'height', '768')  # 16:9

class ImageWidget(Widget):
    image_src = StringProperty('')

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.image_src = 'read_img/0112-3.png'
        self.ids.lbl_file_name.text = "filename:\n{}".format(self.image_src)
        self.lines = []

    def on_touch_down(self, touch):
        # Remember the start point; the end point is unknown until a drag happens
        self.x1 = touch.x
        self.y1 = touch.y
        self.x2 = None
        self.y2 = None

    def on_touch_move(self, touch):
        img = self.ids.img
        # Clamp the end point to the Image widget
        if touch.x > img.width:
            self.x2 = img.width
        else:
            self.x2 = touch.x
        if touch.y > img.height:
            self.y2 = img.height
        else:
            self.y2 = touch.y

        # Remove the frame drawn on the previous move event
        for line in self.lines:
            self.canvas.remove(line)
        self.lines = []

        with self.canvas:
            # Settings for the red line (Kivy color components range from 0 to 1)
            Color(1, 0, 0)
            touch.ud['line'] = Line(points=[self.x1, self.y1, self.x2, self.y1,
                                            self.x2, self.y2, self.x1, self.y2],
                                    close=True)
            self.lines.append(touch.ud['line'])

            # Settings for making a dashed line
            Color(1, 1, 1)
            touch.ud['line'] = Line(points=[self.x1, self.y1, self.x2, self.y1,
                                            self.x2, self.y2, self.x1, self.y2],
                                    dash_offset=5, dash_length=3,
                                    close=True)
            self.lines.append(touch.ud['line'])

    def on_touch_up(self, touch):
        # Exit if the touch_move event has not occurred
        if self.x2 is None:
            return

        # Initialization: get the Image widget
        img = self.ids.img

        # Find the size of the resized (displayed) image
        vs = img.norm_image_size

        # Open image with PIL
        img_trim = Image.open(self.image_src)

        # Get the size of the original image
        rs = img_trim.size

        # Calculate the image scale
        ratio = rs[0] / vs[0]

        # Find the padding applied
        # MEMO: assuming center alignment, (Image widget size - displayed size) / 2
        px = 0
        py = 0
        if img.width > vs[0]:
            px = (img.width - vs[0]) / 2
        if img.height > vs[1]:
            py = (img.height - vs[1]) / 2

        # Remove the padding from the Image widget coordinates
        # (Kivy's y axis points up, PIL's points down, so y is flipped)
        x1 = (self.x1 - px) * ratio
        x2 = (self.x2 - px) * ratio
        y1 = (img.height - self.y1 - py) * ratio
        y2 = (img.height - self.y2 - py) * ratio

        # Sort the coordinates of the cutout position from small to large
        if x1 < x2:
            real_x1 = math.floor(x1)
            real_x2 = math.ceil(x2)
        else:
            real_x1 = math.floor(x2)
            real_x2 = math.ceil(x1)
        if y1 < y2:
            real_y1 = math.floor(y1)
            real_y2 = math.ceil(y2)
        else:
            real_y1 = math.floor(y2)
            real_y2 = math.ceil(y1)

        # Cut out the specified coordinates
        img_trim.crop((real_x1, real_y1, real_x2, real_y2)).save('write_img/test.png')

        # Read text from the image
        self.read_image_to_string()

    def read_image_to_string(self):
        try:
            # 1. Add the installed Tesseract path to PATH
            path_tesseract = r"C:\Program Files\Tesseract-OCR"
            if path_tesseract not in os.environ["PATH"].split(os.pathsep):
                os.environ["PATH"] += os.pathsep + path_tesseract

            # 2. Get the OCR engine
            tools = pyocr.get_available_tools()
            tool = tools[0]

            # 3. Read the cropped image
            img = Image.open("write_img/test.png")

            # 4. Run OCR
            builder = pyocr.builders.TextBuilder(tesseract_layout=6)
            result = tool.image_to_string(img, lang="jpn", builder=builder)

            self.ids.lbl_result.text = f"Reading result:\n{result}"
            print(result)
        except Exception as ex:
            print(ex)
            self.ids.lbl_result.text = "Read result:\nFailure"

class MainApp(App):
    def __init__(self, **kwargs):
        super(MainApp, self).__init__(**kwargs)
        self.title = 'test'

    def build(self):
        return ImageWidget()


if __name__ == '__main__':
    app = MainApp()
    app.run()

Program flow:

1. When the mouse button is pressed, get the clicked point.
2. While dragging, keep redrawing the rectangular frame.
3. When the button is released, get the coordinates of the end point.
4. Cut out the image and have OCR read it.

Points: Coordinates clicked on the screen cannot be applied to the original image as they are; the actual coordinates must be calculated taking the display scale into account. The Image object on the GUI also includes padding, which has to be removed. Finally, the mouse may be dragged in any direction (for example, toward the upper left), so the corner coordinates have to be sorted before cropping.
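To make these points concrete, here is a minimal sketch of the same conversion as a standalone function that does not depend on Kivy. It assumes, as the code above does, that the Image widget sits at the window origin and that the picture is scaled to fit while keeping its aspect ratio and centered. The function name `widget_to_image_box` and the clamping to the image bounds are my own additions for illustration.

```python
import math


def widget_to_image_box(p1, p2, widget_size, image_size):
    """Convert two drag points in widget coordinates to a PIL crop box.

    Hypothetical helper. Widget coordinates have the origin at the
    bottom-left (y grows upward); the returned box is
    (left, upper, right, lower) in source-image pixels (y grows downward).
    """
    ww, wh = widget_size
    iw, ih = image_size

    # Display scale: displayed pixels per source pixel (aspect ratio kept)
    scale = min(ww / iw, wh / ih)

    # Padding around the centered picture inside the widget
    pad_x = (ww - iw * scale) / 2
    pad_y = (wh - ih * scale) / 2

    def to_image(x, y):
        # Remove the padding, flip the y axis, and undo the display scale
        return (x - pad_x) / scale, (wh - y - pad_y) / scale

    ax, ay = to_image(*p1)
    bx, by = to_image(*p2)

    # Sort the corners so any drag direction works, then clamp to the image
    left = max(0, math.floor(min(ax, bx)))
    upper = max(0, math.floor(min(ay, by)))
    right = min(iw, math.ceil(max(ax, bx)))
    lower = min(ih, math.ceil(max(ay, by)))
    return (left, upper, right, lower)


# A 400x200 image shown in an 800x600 widget, dragged toward the upper left
print(widget_to_image_box((600, 300), (200, 400), (800, 600), (400, 200)))
# (100, 50, 300, 100)
```

In this example the picture is displayed at twice its size with 100 pixels of padding above and below, so the drag maps to a 200x50 pixel region of the original image.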

As for the OCR results: accuracy is not very good when it is run without any adjustment. Adjusting the parameters and trying various settings may help. There may also be a limit to the performance of the OCR engine itself. In the future, I would like to look into choosing an OCR engine, among other things.
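As one possible way to start adjusting parameters, the sketch below runs OCR on the same image with several Tesseract page segmentation modes (the `tesseract_layout` value used above) and collects the results for comparison. The modes 3, 6 and 7 are only a starting assumption, not a recommendation, and the helper names `compare_layouts` and `make_pyocr_runner` are hypothetical. I have not measured whether this actually improves accuracy.

```python
def compare_layouts(run_ocr, layouts=(3, 6, 7)):
    """Run OCR once per layout and return {layout: text}.

    run_ocr is any callable taking a layout number and returning a string,
    so the comparison can be tried without a real OCR engine.
    """
    results = {}
    for layout in layouts:
        try:
            results[layout] = run_ocr(layout).strip()
        except Exception as ex:
            # Keep going so one failing mode does not hide the others
            results[layout] = f"<error: {ex}>"
    return results


def make_pyocr_runner(image_path, lang="jpn"):
    """Build a run_ocr callable backed by pyocr (imported lazily)."""
    import pyocr
    import pyocr.builders
    from PIL import Image

    tools = pyocr.get_available_tools()
    if not tools:
        raise RuntimeError("No OCR tool found; is Tesseract on PATH?")
    tool = tools[0]
    image = Image.open(image_path)

    def run_ocr(layout):
        builder = pyocr.builders.TextBuilder(tesseract_layout=layout)
        return tool.image_to_string(image, lang=lang, builder=builder)

    return run_ocr


# Example with a stand-in OCR function instead of a real engine
print(compare_layouts(lambda layout: f" text{layout} \n", layouts=(6, 7)))
# {6: 'text6', 7: 'text7'}
```

With Tesseract installed, something like `compare_layouts(make_pyocr_runner('write_img/test.png'))` should return one result per mode for the cropped image.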
