[PYTHON] I tried to extract characters from subtitles (OpenCV: tesseract-ocr edition)

Introduction

Here, I will try to extract the characters from the subtitles displayed under a political broadcast. Since the subtitle band has a plain background, binarization seems likely to work well.

The Google Cloud Vision API can extract characters and their positions with considerable accuracy, but here I will try other methods.

tesseract-ocr / pyocr

First, try character recognition using tesseract and pyocr.

This is the source image. sentences.png

Extract the characters and positions with the script below.

import sys

import pyocr
import pyocr.builders

import cv2
from PIL import Image

def imageToText(src):
	tools = pyocr.get_available_tools()
	if len(tools) == 0:
		print("No OCR tool found")
		sys.exit(1)

	tool = tools[0]

	# Recognize Japanese text; tesseract_layout=6 assumes a single uniform block of text
	dst = tool.image_to_string(
		Image.open(src),
		lang='jpn',
		builder=pyocr.builders.WordBoxBuilder(tesseract_layout=6)
	)
	return dst

if __name__ == '__main__':
	img_path = sys.argv[1]

	out = imageToText(img_path)

	img = cv2.imread(img_path)
	sentence = []
	for d in out:
		sentence.append(d.content)
		# d.position holds ((x1, y1), (x2, y2)) for each recognized word box
		cv2.rectangle(img, d.position[0], d.position[1], (0, 0, 255), 2)

	# Add a line break after each Japanese full stop for readability
	print("".join(sentence).replace("。","。\n"))

	cv2.imshow("img", img)
	cv2.imwrite("output.png", img)
	cv2.waitKey(0)
	cv2.destroyAllWindows()

Article 25 All citizens have the right to live a healthy and culturally minimal life.
The two countries must endeavor to improve and promote social welfare, social security and public health in all aspects of life.
(Right to education and agenda]Article 26 All citizens have the right to equal education according to their abilities, as provided for by law.
2 All citizens are obliged to have their children receive general education as required by law.
Compulsory education is free of charge.
[Rights and obligations of work, standards of working conditions and prohibition of child abuse] Article 27 All citizens have the right to work and are obliged to do so.
2 Standards for wages, working hours, rest and other working conditions are stipulated by law.
3 Children must not use this.
Workers' right to organize and collective bargaining] Article 28 The right to collective workers and the right to collective bargaining and other collective actions shall be guaranteed.
Property rights] Article 29 Property rights must not be infringed.
2 The content of property rights shall be stipulated by law so as to conform to the public welfare.
3 Private property may be used for the public with just compensation.

--Character positions: output.png

For images that contain only clean text, such as renders of Word documents or HTML pages, the characters themselves can be extracted, but their exact positions seem difficult to obtain. What I want here is the position in sentence units, but even after adjusting the tesseract_layout=6 parameter, it seems I can only get positions in character units.
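One workaround for this (a sketch of my own, not from the article; `merge_boxes` is a hypothetical helper) is to merge the word-level boxes returned by WordBoxBuilder into a single enclosing box:

```python
def merge_boxes(positions):
    # Each position is ((x1, y1), (x2, y2)) as returned by WordBoxBuilder
    x1 = min(p[0][0] for p in positions)
    y1 = min(p[0][1] for p in positions)
    x2 = max(p[1][0] for p in positions)
    y2 = max(p[1][1] for p in positions)
    return ((x1, y1), (x2, y2))

print(merge_boxes([((10, 20), (30, 40)), ((25, 15), (60, 45))]))
# ((10, 15), (60, 45))
```

pyocr also ships a LineBoxBuilder, which returns line-level boxes and may be closer to the sentence-level positions wanted here.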

Method

output.png

range.png

I tried extracting lines with binarization and the Hough transform, but instead I will apply OCR to an ROI (a cropped part of the image) covering only the region where the subtitles are likely to appear.
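An ROI in OpenCV is plain NumPy slicing (rows first, then columns); a minimal sketch with made-up frame dimensions:

```python
import numpy as np

frame = np.zeros((720, 1280), dtype=np.uint8)  # dummy grayscale frame
roi = frame[435:600, :]                        # rows 435-599, all columns

print(roi.shape)  # (165, 1280)
```

The slice is a view into the original array, so cropping costs no copy.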

I also wondered if I could extract only the gray subtitle band itself, but the speaker overlaps that area, so I could not find a reliable way to do it :scream:

Implementation

import sys

import cv2
import os
import numpy as np

import pyocr
import pyocr.builders

from PIL import Image, ImageDraw, ImageFont

import time

def process(src):
	kernel = np.ones((3,3),np.uint8)
	# Convert to grayscale
	gray = cv2.cvtColor(src, cv2.COLOR_BGR2GRAY)

	# Binarize with Otsu's method, then remove noise with an opening
	o_ret, o_dst = cv2.threshold(gray, 0, 255, cv2.THRESH_OTSU)
	dst = cv2.morphologyEx(o_dst, cv2.MORPH_OPEN, kernel)
	# Invert so the text becomes dark on a light background
	return cv2.bitwise_not(dst)

def imageToText(tool, src):
	tmp_path = "temp.png"

	cv2.imwrite(tmp_path, src)
	dst = tool.image_to_string(
		Image.open(tmp_path),
		lang='jpn',
		builder=pyocr.builders.WordBoxBuilder(tesseract_layout=6)
	)

	sentence = []
	for item in dst:
		sentence.append(item.content)

	return "".join(sentence)


def createTextImage(src, sentence, px, py, color=(8,8,8), fsize=28):

	tmp_path = "src_temp.png"
	cv2.imwrite(tmp_path, src)

	img = Image.open(tmp_path)
	draw = ImageDraw.Draw(img)

	# Draw the recognized text with PIL, since OpenCV cannot render Japanese
	font = ImageFont.truetype("./IPAfont00303/ipag.ttf", fsize)
	draw.text((px, py), sentence, fill=color, font=font)
	img.save(tmp_path)
	return cv2.imread(tmp_path)



if __name__ == '__main__':

	tools = pyocr.get_available_tools()
	if len(tools) == 0:
		print("No OCR tool found")
		sys.exit(1)

	tool = tools[0]

	cap = cv2.VideoCapture('one_minutes.mp4')

	cap_width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
	cap_height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
	fps = cap.get(cv2.CAP_PROP_FPS)

	telop_height = 50

	fourcc = cv2.VideoWriter_fourcc('m','p','4','v')
	writer = cv2.VideoWriter('extract_telop_text.mp4',fourcc, fps, (cap_width, cap_height + telop_height))

	start = time.time()
	count = 0
	try :
		while True:
			if not cap.isOpened():
				break

			if cv2.waitKey(1) & 0xFF == ord('q'):
				break

			ret, frame = cap.read()

			if frame is None:
				break

			telop = np.zeros((telop_height, cap_width, 3), np.uint8)
			telop[:] = (128, 128, 128)

			gray_frame = process(frame)
			# Crop the region where the subtitles appear
			roi = gray_frame[435:600, :]
			txt = imageToText(tool, roi)

			images = [frame, telop]

			frame = np.concatenate(images, axis=0)
			font = cv2.FONT_HERSHEY_SIMPLEX

			seconds = round(count/fps, 4)

			cv2.putText(frame, "{:.4f} [sec]".format(seconds), 
						(cap_width - 250, cap_height + telop_height - 10), 
						font, 
						1, 
						(0, 0, 255), 
						2, 
						cv2.LINE_AA)

			writer.write(createTextImage(frame, txt, 20, cap_height + 10))
			count += 1

			print("{}[sec]".format(seconds))

	except cv2.error as e:
		print(e)	

	writer.release()
	cap.release()

	print("Done!!! {}[sec]".format(round(time.time() - start,4)))

Supplement

--I use PIL instead of OpenCV to draw the Japanese characters, but to pass the data between the two libraries I temporarily save an image file. Because of that, it took more than 10 minutes to generate the video :sweat_smile: Is there a better way? :disappointed_relieved:

Example)


tmp_path = "src_temp.png"
# Write out the image data used by OpenCV
cv2.imwrite(tmp_path, src)

# Read the data back with PIL
img = Image.open(tmp_path)
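One way to avoid the temporary file (an alternative I am sketching, not what the article does): an OpenCV image is just a NumPy array in BGR channel order, so reversing the last axis yields RGB data that Image.fromarray can wrap directly:

```python
import numpy as np

def bgr_to_rgb(frame):
    # Reverse the channel axis: OpenCV stores BGR, PIL expects RGB
    return frame[..., ::-1]

# OpenCV -> PIL without touching the disk:
#   img = Image.fromarray(bgr_to_rgb(frame))
# PIL -> OpenCV:
#   frame = np.asarray(img)[..., ::-1].copy()

pixel = np.array([[[255, 0, 0]]], dtype=np.uint8)  # pure blue in BGR
print(bgr_to_rgb(pixel)[0, 0].tolist())  # [0, 0, 255]
```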

--The font is the IPA font: https://ipafont.ipa.go.jp/IPAfont/IPAfont00303.zip

--The flow before character recognition is as follows.

  1. Convert the image to grayscale
  2. Binarize it with Otsu's method
  3. Remove noise with an opening (erosion -> dilation)
  4. Invert the image
def process(src):
	kernel = np.ones((3,3),np.uint8)
	gray = cv2.cvtColor(src, cv2.COLOR_BGR2GRAY)

	o_ret, o_dst = cv2.threshold(gray, 0, 255, cv2.THRESH_OTSU)
	dst = cv2.morphologyEx(o_dst, cv2.MORPH_OPEN, kernel)
	return cv2.bitwise_not(dst)
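As a side note on step 4: on uint8 images, cv2.bitwise_not flips every bit, which is the same as subtracting each pixel from 255. This matters because tesseract tends to do better with dark text on a light background. A NumPy-only illustration of the equivalence:

```python
import numpy as np

a = np.array([[0, 128, 255]], dtype=np.uint8)
inverted = 255 - a  # identical to cv2.bitwise_not(a) for uint8 data

print(inverted.tolist())  # [[255, 127, 0]]
```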

Result

extract_telop_text.gif

The cue card (kanpe) visible in the frame interferes with character recognition a little, but I think the text can be read to some extent.

Conclusion

Next, I will try recognizing characters using the Google Cloud Vision API. I tried the demo at https://cloud.google.com/vision/, and as expected, the accuracy is high. demo.png

Helpful links

- [Python] Read the expiration date with OCR (tesseract-ocr / pyocr) (image → string) [Home IT # 19]
- Convert image data to text with pyOCR in a Mac environment
- Put Japanese characters in an image with Python
