1. First of all
2. Demo
3. Get the string with OCR
4. Take a screen capture of Windows
5. Create GUI with wxPython
6. Make an exe with pyinstaller
7. Finally
Pressing the "Skip Ads" button while watching YouTube felt like a hassle, so I wondered if I could build a tool for it myself. I considered various approaches, but decided that an implementation close to human behavior would be the easiest to understand: a tool that recognizes text with OCR and clicks the specified spot.
As a result, although there are still plenty of performance issues, I was able to create roughly the tool I had envisioned. This article is a note of the know-how I picked up while building it. I hope it helps someone.
The tool is distributed as an exe file. It automatically clicks the ad-skip link on YouTube.
chunta auto click (executable)
OCR processing is done by calling software called tesseract from Python. To use tesseract from Python on Windows, a pip install is not enough; I had to download the tesseract module myself and put it on the PATH. Concretely, it looks like this:
# Add tesseract (the OCR tool) to the PATH environment variable
os.environ["PATH"] += os.pathsep + os.path.dirname(os.path.abspath(__file__)) + os.sep + RESORSES_FOLDER_NAME
This is the part that sets the PATH so tesseract can be used. I add the RESORSES_FOLDER_NAME folder under the directory containing the script, so that the tool keeps working after being converted to an exe. The tesseract module is stored in this RESORSES_FOLDER_NAME folder.
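Since pyocr discovers tesseract through PATH, a quick way to verify the folder layout works is to check whether the binary is resolvable. A minimal sketch (the folder-name value and the check itself are my own illustration, not from the original tool):

import os
import shutil
import sys

RESORSES_FOLDER_NAME = "RESORSES"  # assumed value, for illustration only

# Same PATH setup as above
os.environ["PATH"] += os.pathsep + os.path.dirname(os.path.abspath(__file__)) + os.sep + RESORSES_FOLDER_NAME

# pyocr locates tesseract through PATH, so this should find the bundled binary
if shutil.which("tesseract") is None:
    print("tesseract was not found on PATH", file=sys.stderr)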
import sys
import wx
import pyocr
import pyocr.builders

# =========================
# OCR processing
# =========================
tools = pyocr.get_available_tools()
if len(tools) == 0:
    wx.MessageBox('The OCR tool is not installed on this machine.', 'Error')
    sys.exit(1)
tool = tools[0]
dst = tool.image_to_string(
    cap,
    lang='jpn',
    builder=pyocr.builders.WordBoxBuilder(tesseract_layout=6)
)
This is the part that actually extracts text from the image with tesseract. Various parameters can be specified; this combination was reported to work well for recognizing Japanese, and after some experimenting of my own I settled on it.
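For reference, WordBoxBuilder returns a list of box objects carrying both the recognized word and its coordinates, which is what makes the later click step possible. A minimal sketch of scanning the result (the search word and the center calculation are my own illustration, not the author's exact matching logic):

# dst is the list returned by image_to_string with WordBoxBuilder.
# Each box has .content (the recognized word) and .position,
# ((x1, y1), (x2, y2)) in image coordinates.
target = 'skip'  # hypothetical search word
for box in dst:
    if target in box.content:
        (x1, y1), (x2, y2) = box.position
        cx, cy = (x1 + x2) // 2, (y1 + y2) // 2  # center of the word
        print(box.content, cx, cy)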
To take a screen capture on Windows I used win32api. It could not be installed with a plain pip install, so I had to set up win32api separately.
I also had to do some special handling to support multiple displays. The source code from the page below was a great help (in fact, I used it almost as-is).
import numpy as np
import cv2
import win32api
import win32con
import win32gui
import win32ui

#==================================================================
# Desktop capture (multi-monitor compatible)
# The code from the reference URL is used almost as-is.
#
# ref https://se.yuttar-ixm.com/multi-monitor-cap/
#==================================================================
def get_capture(flag_gray: bool = True):
    try:
        # Get the size of the whole virtual desktop
        vscreenwidth = win32api.GetSystemMetrics(win32con.SM_CXVIRTUALSCREEN)
        vscreenheight = win32api.GetSystemMetrics(win32con.SM_CYVIRTUALSCREEN)
        vscreenx = win32api.GetSystemMetrics(win32con.SM_XVIRTUALSCREEN)
        vscreeny = win32api.GetSystemMetrics(win32con.SM_YVIRTUALSCREEN)
        width = vscreenx + vscreenwidth
        height = vscreeny + vscreenheight
        # Get the desktop device context
        hwnd = win32gui.GetDesktopWindow()
        windc = win32gui.GetWindowDC(hwnd)
        srcdc = win32ui.CreateDCFromHandle(windc)
        memdc = srcdc.CreateCompatibleDC()
        # Copy pixel data from the device context into a bitmap
        bmp = win32ui.CreateBitmap()
        bmp.CreateCompatibleBitmap(srcdc, width, height)
        memdc.SelectObject(bmp)
        memdc.BitBlt((0, 0), (width, height), srcdc, (0, 0), win32con.SRCCOPY)
        # Convert the bitmap to a numpy image and adjust
        img = np.frombuffer(bmp.GetBitmapBits(True), np.uint8).reshape(height, width, 4)
        if flag_gray:
            img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        # Release resources
        srcdc.DeleteDC()
        memdc.DeleteDC()
        win32gui.ReleaseDC(hwnd, windc)
        win32gui.DeleteObject(bmp.GetHandle())
        return img
    except Exception:
        # Capture failed
        return None
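One detail worth noting: tool.image_to_string shown earlier expects a PIL image, while get_capture returns a numpy array, so a conversion is needed in between. A minimal bridging sketch of my own, assuming Pillow is installed:

from PIL import Image

# get_capture() returns a numpy array; pyocr wants a PIL image
img = get_capture(flag_gray=True)
if img is not None:
    cap = Image.fromarray(img)  # a grayscale array converts directly
    # cap can now be passed to tool.image_to_string(...) as shown earlier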
I had been thinking of creating a GUI anyway, but this time it became essential for reasons beyond convenience and usability. The reason is that OCR with tesseract turned out to take around 5 to 10 seconds on a large image such as a full-screen capture. So I decided to have the user specify the capture range through the GUI.
# =========================
# Handler for the window-setting button
# =========================
def onclick_window_btn(self, event):
    # Get a full-screen capture scaled to the specified width
    self.img = get_capture_img(CAPTURE_IMG_WIDTH)
    # Image used for display while drawing
    self.img_copy = None
    # Set the window name
    cv2.namedWindow(winname='img')
    # Register the mouse event handler
    cv2.setMouseCallback('img', self.draw_rectangle)
    # Show the image
    cv2.imshow('img', self.img)
    wx.MessageBox('Select the approximate area where ads appear.', 'Select ad area')
# ==================================================
# Draw a rectangle
# ==================================================
def draw_rectangle(self, event, x, y, flags, param):
    global setting_value
    if event == cv2.EVENT_LBUTTONDOWN:
        self.flg_drawing = True
        self.ix, self.iy = x, y
    elif event == cv2.EVENT_MOUSEMOVE:
        if self.flg_drawing:
            self.img_copy = self.img.copy()
            self.img_copy = cv2.rectangle(self.img_copy, (self.ix, self.iy), (x, y), (0, 0, 255), -1)
            cv2.imshow('img', self.img_copy)
    elif event == cv2.EVENT_LBUTTONUP:
        self.flg_drawing = False
        self.img_copy = cv2.rectangle(self.img_copy, (self.ix, self.iy), (x, y), (0, 0, 255), -1)
        cv2.imshow('img', self.img_copy)
        # x increases to the right
        if self.ix < x:
            left = self.ix
            right = x
        else:
            left = x
            right = self.ix
        # y increases downward
        if self.iy < y:
            bottom = y
            top = self.iy
        else:
            bottom = self.iy
            top = y
        setting_value.top = top
        setting_value.bottom = bottom
        setting_value.left = left
        setting_value.right = right
This code displays the screen capture and lets the user draw a rectangle on the image with the mouse. The coordinates of the finally drawn rectangle are taken as the capture range.
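With the rectangle coordinates in hand, cropping the capture down to that region is a plain numpy slice; this is what cuts the 5-10 second full-screen OCR time down. A sketch of my own; the division by rate converts coordinates selected on the scaled display image back to full resolution, mirroring the thread code below:

img = get_capture(flag_gray=True)
if img is not None:
    top = int(setting_value.top / setting_value.rate)
    bottom = int(setting_value.bottom / setting_value.rate)
    left = int(setting_value.left / setting_value.rate)
    right = int(setting_value.right / setting_value.rate)
    region = img[top:bottom, left:right]  # rows first, then columns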
import threading

# ==================================================
# auto_click thread
# ==================================================
class auto_click_Thread(threading.Thread):
    # =========================
    # Constructor
    # =========================
    def __init__(self):
        super(auto_click_Thread, self).__init__()
        # Daemonize so the thread terminates when the caller terminates
        self.daemon = True
    # =========================
    # Stop processing
    # =========================
    def stop(self):
        global setting_value
        setting_value.flg_stop = True
    # =========================
    # Execution
    # =========================
    def run(self):
        global setting_value
        setting_value.flg_stop = False
        txt = setting_value.txt
        min_interval_time = setting_value.min_interval_time
        top = int(setting_value.top * (1 / setting_value.rate))
        bottom = int(setting_value.bottom * (1 / setting_value.rate))
        left = int(setting_value.left * (1 / setting_value.rate))
        right = int(setting_value.right * (1 / setting_value.rate))
        click_text(txt, min_interval_time, top, bottom, left, right)

# ==================================================
# start
# ==================================================
def start():
    global exec_thread
    exec_thread = auto_click_Thread()
    exec_thread.start()

# ==================================================
# stop
# ==================================================
def stop():
    global exec_thread
    exec_thread.stop()
This is the part that invokes the OCR processing from the GUI. It runs in a separate thread so the GUI does not become unresponsive, and the thread is daemonized so that when the parent exits, the child is killed as well. In other words, the OCR processing does not keep running in the background after the GUI is closed.

Also, since it is hard to kill a thread directly from the outside, stopping is implemented by flipping a global flag that the OCR processing loop checks.
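To make the flag mechanism concrete, here is a rough sketch of how click_text could honor it. The real click_text is not shown in the article, and find_and_click is a hypothetical helper of my own, not the actual implementation:

import time

def click_text(txt, min_interval_time, top, bottom, left, right):
    global setting_value
    # Loop until stop() flips the global flag
    while not setting_value.flg_stop:
        find_and_click(txt, top, bottom, left, right)  # hypothetical: capture, OCR, click
        time.sleep(min_interval_time)  # wait at least this long between OCR runs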
With a normal exe conversion, the user would still have to install tesseract themselves. That is a real hassle, so I made the tool usable without installing it.
# Add tesseract (the OCR tool) to the PATH environment variable
os.environ["PATH"] += os.pathsep + os.path.dirname(os.path.abspath(__file__)) + os.sep + RESORSES_FOLDER_NAME
This overlaps with the code shown earlier: it adds the RESORSES_FOLDER_NAME folder next to the script to the PATH. After converting to an exe, create a RESORSES_FOLDER_NAME folder next to the exe file and store the tesseract module in it.
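One caveat worth knowing: with PyInstaller, __file__ does not always resolve to the exe's folder, and the bundled exe sets sys.frozen, so a common defensive pattern is to branch on it. This is my own note, not necessarily what this tool does:

import os
import sys

if getattr(sys, 'frozen', False):
    base_dir = os.path.dirname(sys.executable)  # folder containing the exe
else:
    base_dir = os.path.dirname(os.path.abspath(__file__))  # folder of the script
os.environ["PATH"] += os.pathsep + os.path.join(base_dir, RESORSES_FOLDER_NAME)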
As an aside, I also tried building a single-file exe with pyinstaller's --onefile option, but gave up because startup took more than a minute.
pyinstaller --clean --icon=chunta_auto_click.ico -n chunta_auto_click chunta_auto_click.py --noconsole
The above is the command I used for the exe conversion.
There are still plenty of performance challenges, but I was able to create the tool I had in mind. It really drove home what a good era this is, when even someone like me can easily use an OCR library just by gathering information online. I hope this article helps someone.
chunta auto click (executable)