Create youtube ad auto skip tool with python and OCR

Lottery

1.First of all 2. Demo 3. Get the string with OCR 4. Take a screen capture of Windows 5. Create GUI with wxPython 6. Make an exe with pyinstaller 7. Finally

Introduction

I thought it would be a hassle to press the "skip ads" button while watching youtube, so I wondered if I could make a tool myself. I came up with various methods, but I thought it would be easier to understand the implementation method that is closer to human movement. I decided to make a tool that recognizes characters with OCR and clicks the specified part.

As a result, although there are still many performance issues, we were able to create the tool that we had roughly envisioned. Make a note of the know-how you learned when creating it. I hope this article helps someone.

Source code

demo

The exe file format. It will automatically click the ad skip link on youtube.

chunta auto click (executable)

sample2.jpg

Get string with OCR

OCR processing was realized by using software called tesseract from python. In order to use tesseract from python on Windows, pip install is not good, I had to download the module and put it in the path. Specifically, it is as follows.

    #Tesseract in the environment variable PATH(OCR tool)Pass through
    os.environ["PATH"] += os.pathsep + os.path.dirname(os.path.abspath(__file__)) + os.sep + RESORSES_FOLDER_NAME

This part is the part to pass the path to use tesseract. I put the path to the RESORSES_FOLDER_NAME folder of the directory where the script is located so that it will work without problems when it is converted to an exe. Store the tesseract module in this RESORSES_FOLDER_NAME folder.

    # =========================
    #OCR processing
    # =========================
    tools = pyocr.get_available_tools()
    
    if len(tools) == 0:
        wx.MessageBox('OCR tool is not installed on the terminal.\n The OCR tool is not installed on the terminal.', 'Error error')
        sys.exit(1)
    
    tool = tools[0]
    
    dst = tool.image_to_string(
        cap,
        lang='jpn',
        builder=pyocr.builders.WordBoxBuilder(tesseract_layout=6)
    )

This is the part where the text is actually obtained from the image with tesseract. It seems that various parameters can be specified, but it was said that this parameter is good for recognizing Japanese. I also tried a bit and ended up with this parameter.

Take a screen capture of Windows

I used win32api to get a screen capture on windows. I couldn't install it with pip install, so I had to install win32api separately.

Also, I had to do some special processing to support multiple displays. The source code of the following person was helpful. (Rather, I used it almost as it is)

Business efficiency improvement with python / RPA self-made? Examination part 5 How to acquire the entire screen in a multi-monitor environment

#==================================================================
#Desktop capture acquisition (multi-monitor compatible)
#The code of the reference URL is used almost as it is.
# 
# ref https://se.yuttar-ixm.com/multi-monitor-cap/
#==================================================================
def get_capture(flag_gray: bool = True):
    try:
        #Get the entire desktop size
        vscreenwidth = win32api.GetSystemMetrics(win32con.SM_CXVIRTUALSCREEN)
        vscreenheigth = win32api.GetSystemMetrics(win32con.SM_CYVIRTUALSCREEN)
        vscreenx = win32api.GetSystemMetrics(win32con.SM_XVIRTUALSCREEN)
        vscreeny = win32api.GetSystemMetrics(win32con.SM_YVIRTUALSCREEN)
        width = vscreenx + vscreenwidth
        height = vscreeny + vscreenheigth
    
        #Get desktop device context
        hwnd = win32gui.GetDesktopWindow() 
        windc = win32gui.GetWindowDC(hwnd)
        srcdc = win32ui.CreateDCFromHandle(windc)
        memdc = srcdc.CreateCompatibleDC()

        #Pixel information copy from device context, bmp conversion
        bmp = win32ui.CreateBitmap()
        bmp.CreateCompatibleBitmap(srcdc, width, height)
        memdc.SelectObject(bmp)
        memdc.BitBlt((0, 0), (width, height), srcdc, (0, 0), win32con.SRCCOPY)

        #Image acquisition / adjustment
        img = np.frombuffer(bmp.GetBitmapBits(True), np.uint8).reshape(height, width, 4)
        if flag_gray is True :
            img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

        #release
        srcdc.DeleteDC()
        memdc.DeleteDC()
        win32gui.ReleaseDC(hwnd, windc)
        win32gui.DeleteObject(bmp.GetHandle())

        return img
    
    except Exception as err:
        #Acquisition failure
        return None

GUI creation with wxPython

Originally I was thinking of creating a GUI, but This time, apart from convenience and usability, it became essential to create a GUI. The reason is that the OCR processing by tesseract was found to take about 5 to 10 seconds for a large image such as a full screen capture. Therefore, I decided to ask the user to specify the range to capture with the GUI.

    # =========================
    #Processing when the window setting button is clicked
    # =========================
    def onclick_window_btn(self, event):
        #Get full screen capture image of specified size
        self.img = get_capture_img(CAPTURE_IMG_WIDTH)
        
        #Image for display
        self.img_copy = None
        
        #window name setting
        cv2.namedWindow(winname='img')
        
        #Mouse event settings
        cv2.setMouseCallback('img', self.draw_rectangle)
        
        #Image display
        cv2.imshow('img', self.img)
        
        wx.MessageBox('Choose About Where Your Ads Appear.\n Select the location where your ad will appear about', 'Select advertising area Select advertising area')
    
    # ==================================================
    #Draw a rectangle
    # ==================================================
    def draw_rectangle(self, event, x, y, flags, param):
        if event == cv2.EVENT_LBUTTONDOWN:
            self.flg_drawing = True
            self.ix, self.iy = x, y
        
        elif event == cv2.EVENT_MOUSEMOVE:
            if self.flg_drawing == True:
                self.img_copy = self.img.copy()
                self.img_copy = cv2.rectangle(self.img_copy, (self.ix, self.iy), (x, y), (0, 0, 255), -1)
                cv2.imshow('img', self.img_copy)
        
        elif event == cv2.EVENT_LBUTTONUP:
            self.flg_drawing = False
            self.img_copy = cv2.rectangle(self.img_copy, (self.ix, self.iy), (x, y), (0, 0, 255), -1)
            cv2.imshow('img', self.img_copy)
        
        if event == cv2.EVENT_LBUTTONUP:
            global setting_value
            
            #Increases toward the right
            if self.ix < x:
                left = self.ix
                right = x
            else:
                left = x
                right = self.ix
            
            #Increases as you go down
            if self.iy < y:
                bottom = y
                top = self.iy
            else:
                bottom = self.iy
                top = y
            
            setting_value.top = top
            setting_value.bottom = bottom
            setting_value.left = left
            setting_value.right = right

It is a process to display a screen capture and draw a rectangle on the image with the mouse. Get the coordinates of the finally drawn rectangle and use this as the capture range.

# ==================================================
# auto_click thread
# ==================================================
class auto_click_Thread(threading.Thread):
    # =========================
    #constructor
    # =========================
    def __init__(self):
        super(auto_click_Thread, self).__init__()
        
        #Daemonized to terminate the thread when the caller terminates
        self.setDaemon(True)
    
    # =========================
    #stop processing
    # =========================
    def stop(self):
        global setting_value
        
        setting_value.flg_stop = True
    
    # =========================
    #Execution processing
    # =========================
    def run(self):
        global setting_value
        
        setting_value.flg_stop = False
        
        txt = setting_value.txt
        min_interval_time = setting_value.min_interval_time
        top = int( setting_value.top * (1 / setting_value.rate) )
        bottom = int( setting_value.bottom * (1 / setting_value.rate) )
        left = int( setting_value.left * (1 / setting_value.rate) )
        right = int( setting_value.right * (1 / setting_value.rate) )
        
        click_text(txt, min_interval_time, top, bottom, left, right)

# ==================================================
#start
# ==================================================
def start():
    global exec_thread
    global setting_value
    
    exec_thread = auto_click_Thread()
    exec_thread.start()

# ==================================================
#stop
# ==================================================
def stop():
    global exec_thread
    
    exec_thread.stop()

This is the part that calls OCR processing from the GUI. In addition to thread kicking the GUI so it doesn't get busy It is daemonized so that if the parent is killed, the child will also be killed. In other words, OCR processing does not run in the background when the GUI is closed.

Also, it is difficult to kill a thread directly, so Stop processing is implemented by changing the value of the global variable referenced by OCR processing.

Make exe with pyinstaller

Normal exe conversion only requires the user to install tesseract. This is really troublesome, so I made it available without installing it.

    #Tesseract in the environment variable PATH(OCR tool)Pass through
    os.environ["PATH"] += os.pathsep + os.path.dirname(os.path.abspath(__file__)) + os.sep + RESORSES_FOLDER_NAME

This overlaps with the above part, but it is the process of passing the path to the RESORSES_FOLDER_NAME folder on the same folder as the script. After converting to exe, create a RESORSES_FOLDER_NAME folder on the exe file folder and store the tesseract module in this folder.

As an aside, I also tried exe conversion of one file with --onefile of pyinstaller, but I quit because it took more than 1 minute to start.

pyinstaller --clean --icon=chunta_auto_click.ico -n chunta_auto_click chunta_auto_click.py --noconsole

The above is the command when it is converted to exe.

Finally

There are still many performance challenges, but I was able to create the tool I had in mind. I was keenly aware that it was a good time for people like me to easily use the OCR library by collecting information online. I hope this article helps someone.

Source code

chunta auto click (executable)

Recommended Posts

Create youtube ad auto skip tool with python and OCR
Create and decrypt Caesar cipher with python
[Python] Python and security-② Port scanning tool made with Python
Automatically search and download YouTube videos with Python
Get comments on youtube Live with [python] and [pytchat]!
Programming with Python and Tkinter
Encryption and decryption with Python
Create 3d gif with python3
Get Youtube data with python
YouTube video management with Python 3
Create an LCD (16x2) game with Raspberry Pi and Python
Let's create a PRML diagram with Python, Numpy and matplotlib.
Create a simple video analysis tool with python wxpython + openCV
python with pyenv and venv
Create a directory with python
Works with Python and R
Create AtCoder Contest appointments on Google Calendar with Python and GAS
Create a C ++ and Python execution environment with WSL2 + Docker + VSCode
Create a simple Python development environment with VS Code and Docker
Communicate with FX-5204PS with Python and PyUSB
Shining life with Python and OpenCV
Create plot animation with Python + Matplotlib
Robot running with Arduino and python
Install Python 2.7.9 and Python 3.4.x with pip.
Create Awaitable with Python / C API
AM modulation and demodulation with python
[Python] font family and font with matplotlib
Create folders from '01' to '12' with python
Scraping with Python, Selenium and Chromedriver
Scraping with Python and Beautiful Soup
Create a virtual environment with Python!
Create an Excel file with Python3
Hadoop introduction and MapReduce with Python
[GUI with Python] PyQt5-Drag and drop-
Reading and writing NetCDF with Python
I played with PyQt5 and Python3
Reading and writing CSV with Python
Multiple integrals with Python and Sympy
Coexistence of Python2 and 3 with CircleCI (1.0)
Easy modeling with Blender and Python
Sugoroku game and addition game with python
FM modulation and demodulation with Python
Create and read messagepacks in Python
[AWS] Create a Python Lambda environment with CodeStar and do Hello World
Create your own virtual camera with Python + OpenCV and apply original effects
Create and edit spreadsheets in any folder on Google Drive with python
Create a Python3 environment with pyenv on Mac and display a NetworkX graph
Create a decision tree from 0 with Python and understand it (5. Information Entropy)
Create "operation log" CSV formatting tool in 5 days with Python Pandas PyInstaller
Communicate between Elixir and Python with gRPC
Data pipeline construction with Python and Luigi
Create a Python function decorator with Class
Monitor Mojo outages with Python and Skype
Automatically create Python API documentation with Sphinx
Create wordcloud from your tweet with python3
Build a blockchain with Python ① Create a class
FM modulation and demodulation with Python Part 3
[Automation] Manipulate mouse and keyboard with Python
Create a dummy image with Python + PIL.
Passwordless authentication with RDS and IAM (Python)
Python installation and package management with pip