1. First of all
2. Demo
3. Get the string with OCR
4. Take a screen capture of Windows
5. Create GUI with wxPython
6. Make an exe with pyinstaller
7. Finally
Pressing the "Skip Ads" button while watching YouTube felt like a hassle, so I wondered if I could build a tool for it myself. I considered various approaches, but decided that an implementation close to human behavior would be the easiest to understand: a tool that recognizes text with OCR and clicks the specified spot.
As a result, although there are still plenty of performance issues, I was able to create roughly the tool I had envisioned. This article is a note of the know-how I picked up while building it. I hope it helps someone.
The tool is distributed as an exe file. It automatically clicks the ad-skip link on YouTube.
chunta auto click (executable)
OCR processing is done by calling software called tesseract from Python. To use tesseract from Python on Windows, a pip install is not enough; I had to download the tesseract module myself and put it on the PATH. Concretely, it looks like this:
# Add tesseract (the OCR tool) to the PATH environment variable
os.environ["PATH"] += os.pathsep + os.path.dirname(os.path.abspath(__file__)) + os.sep + RESORSES_FOLDER_NAME
This is the part that sets the PATH so tesseract can be used. I add the RESORSES_FOLDER_NAME folder under the directory containing the script, so that the tool keeps working after being converted to an exe. The tesseract module is stored in this RESORSES_FOLDER_NAME folder.
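Since pyocr discovers tesseract through PATH, a quick way to verify the folder layout works is to check whether the binary is resolvable. A minimal sketch (the folder-name value and the check itself are my own illustration, not from the original tool):

import os
import shutil
import sys

RESORSES_FOLDER_NAME = "RESORSES"  # assumed value, for illustration only

# Same PATH setup as above
os.environ["PATH"] += os.pathsep + os.path.dirname(os.path.abspath(__file__)) + os.sep + RESORSES_FOLDER_NAME

# pyocr locates tesseract through PATH, so this should find the bundled binary
if shutil.which("tesseract") is None:
    print("tesseract was not found on PATH", file=sys.stderr)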
import sys
import wx
import pyocr
import pyocr.builders

# =========================
# OCR processing
# =========================
tools = pyocr.get_available_tools()
if len(tools) == 0:
    wx.MessageBox('The OCR tool is not installed on this machine.', 'Error')
    sys.exit(1)
tool = tools[0]
dst = tool.image_to_string(
    cap,
    lang='jpn',
    builder=pyocr.builders.WordBoxBuilder(tesseract_layout=6)
)
This is the part that actually extracts text from the image with tesseract. Various parameters can be specified; this combination was reported to work well for recognizing Japanese, and after some experimenting of my own I settled on it.
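For reference, WordBoxBuilder returns a list of box objects carrying both the recognized word and its coordinates, which is what makes the later click step possible. A minimal sketch of scanning the result (the search word and the center calculation are my own illustration, not the author's exact matching logic):

# dst is the list returned by image_to_string with WordBoxBuilder.
# Each box has .content (the recognized word) and .position,
# ((x1, y1), (x2, y2)) in image coordinates.
target = 'skip'  # hypothetical search word
for box in dst:
    if target in box.content:
        (x1, y1), (x2, y2) = box.position
        cx, cy = (x1 + x2) // 2, (y1 + y2) // 2  # center of the word
        print(box.content, cx, cy)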
To take a screen capture on Windows I used win32api. It could not be installed with a plain pip install, so I had to set up win32api separately.
I also had to do some special handling to support multiple displays. The source code from the page below was a great help (in fact, I used it almost as-is).
import numpy as np
import cv2
import win32api
import win32con
import win32gui
import win32ui

#==================================================================
# Desktop capture (multi-monitor compatible)
# The code from the reference URL is used almost as-is.
#
# ref https://se.yuttar-ixm.com/multi-monitor-cap/
#==================================================================
def get_capture(flag_gray: bool = True):
    try:
        # Get the size of the whole virtual desktop
        vscreenwidth = win32api.GetSystemMetrics(win32con.SM_CXVIRTUALSCREEN)
        vscreenheight = win32api.GetSystemMetrics(win32con.SM_CYVIRTUALSCREEN)
        vscreenx = win32api.GetSystemMetrics(win32con.SM_XVIRTUALSCREEN)
        vscreeny = win32api.GetSystemMetrics(win32con.SM_YVIRTUALSCREEN)
        width = vscreenx + vscreenwidth
        height = vscreeny + vscreenheight
        # Get the desktop device context
        hwnd = win32gui.GetDesktopWindow()
        windc = win32gui.GetWindowDC(hwnd)
        srcdc = win32ui.CreateDCFromHandle(windc)
        memdc = srcdc.CreateCompatibleDC()
        # Copy pixel data from the device context into a bitmap
        bmp = win32ui.CreateBitmap()
        bmp.CreateCompatibleBitmap(srcdc, width, height)
        memdc.SelectObject(bmp)
        memdc.BitBlt((0, 0), (width, height), srcdc, (0, 0), win32con.SRCCOPY)
        # Convert the bitmap to a numpy image and adjust
        img = np.frombuffer(bmp.GetBitmapBits(True), np.uint8).reshape(height, width, 4)
        if flag_gray:
            img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        # Release resources
        srcdc.DeleteDC()
        memdc.DeleteDC()
        win32gui.ReleaseDC(hwnd, windc)
        win32gui.DeleteObject(bmp.GetHandle())
        return img
    except Exception:
        # Capture failed
        return None
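One detail worth noting: tool.image_to_string shown earlier expects a PIL image, while get_capture returns a numpy array, so a conversion is needed in between. A minimal bridging sketch of my own, assuming Pillow is installed:

from PIL import Image

# get_capture() returns a numpy array; pyocr wants a PIL image
img = get_capture(flag_gray=True)
if img is not None:
    cap = Image.fromarray(img)  # a grayscale array converts directly
    # cap can now be passed to tool.image_to_string(...) as shown earlier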
I had been thinking of creating a GUI anyway, but this time it became essential for reasons beyond convenience and usability. The reason is that OCR with tesseract turned out to take around 5 to 10 seconds on a large image such as a full-screen capture. So I decided to have the user specify the capture range through the GUI.
# =========================
# Handler for the window-setting button
# =========================
def onclick_window_btn(self, event):
    # Get a full-screen capture scaled to the specified width
    self.img = get_capture_img(CAPTURE_IMG_WIDTH)
    # Image used for display while drawing
    self.img_copy = None
    # Set the window name
    cv2.namedWindow(winname='img')
    # Register the mouse event handler
    cv2.setMouseCallback('img', self.draw_rectangle)
    # Show the image
    cv2.imshow('img', self.img)
    wx.MessageBox('Select the approximate area where ads appear.', 'Select ad area')
# ==================================================
# Draw a rectangle
# ==================================================
def draw_rectangle(self, event, x, y, flags, param):
    global setting_value
    if event == cv2.EVENT_LBUTTONDOWN:
        self.flg_drawing = True
        self.ix, self.iy = x, y
    elif event == cv2.EVENT_MOUSEMOVE:
        if self.flg_drawing:
            self.img_copy = self.img.copy()
            self.img_copy = cv2.rectangle(self.img_copy, (self.ix, self.iy), (x, y), (0, 0, 255), -1)
            cv2.imshow('img', self.img_copy)
    elif event == cv2.EVENT_LBUTTONUP:
        self.flg_drawing = False
        self.img_copy = cv2.rectangle(self.img_copy, (self.ix, self.iy), (x, y), (0, 0, 255), -1)
        cv2.imshow('img', self.img_copy)
        # x increases to the right
        if self.ix < x:
            left = self.ix
            right = x
        else:
            left = x
            right = self.ix
        # y increases downward
        if self.iy < y:
            bottom = y
            top = self.iy
        else:
            bottom = self.iy
            top = y
        setting_value.top = top
        setting_value.bottom = bottom
        setting_value.left = left
        setting_value.right = right
This code displays the screen capture and lets the user draw a rectangle on the image with the mouse. The coordinates of the finally drawn rectangle are taken as the capture range.
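With the rectangle coordinates in hand, cropping the capture down to that region is a plain numpy slice; this is what cuts the 5-10 second full-screen OCR time down. A sketch of my own; the division by rate converts coordinates selected on the scaled display image back to full resolution, mirroring the thread code below:

img = get_capture(flag_gray=True)
if img is not None:
    top = int(setting_value.top / setting_value.rate)
    bottom = int(setting_value.bottom / setting_value.rate)
    left = int(setting_value.left / setting_value.rate)
    right = int(setting_value.right / setting_value.rate)
    region = img[top:bottom, left:right]  # rows first, then columns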
import threading

# ==================================================
# auto_click thread
# ==================================================
class auto_click_Thread(threading.Thread):
    # =========================
    # Constructor
    # =========================
    def __init__(self):
        super(auto_click_Thread, self).__init__()
        # Daemonize so the thread terminates when the caller terminates
        self.daemon = True
    # =========================
    # Stop processing
    # =========================
    def stop(self):
        global setting_value
        setting_value.flg_stop = True
    # =========================
    # Execution
    # =========================
    def run(self):
        global setting_value
        setting_value.flg_stop = False
        txt = setting_value.txt
        min_interval_time = setting_value.min_interval_time
        top = int(setting_value.top * (1 / setting_value.rate))
        bottom = int(setting_value.bottom * (1 / setting_value.rate))
        left = int(setting_value.left * (1 / setting_value.rate))
        right = int(setting_value.right * (1 / setting_value.rate))
        click_text(txt, min_interval_time, top, bottom, left, right)

# ==================================================
# start
# ==================================================
def start():
    global exec_thread
    exec_thread = auto_click_Thread()
    exec_thread.start()

# ==================================================
# stop
# ==================================================
def stop():
    global exec_thread
    exec_thread.stop()
This is the part that invokes the OCR processing from the GUI. It runs in a separate thread so the GUI does not become unresponsive, and the thread is daemonized so that when the parent exits, the child is killed as well. In other words, the OCR processing does not keep running in the background after the GUI is closed.

Also, since it is hard to kill a thread directly from the outside, stopping is implemented by flipping a global flag that the OCR processing loop checks.
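To make the flag mechanism concrete, here is a rough sketch of how click_text could honor it. The real click_text is not shown in the article, and find_and_click is a hypothetical helper of my own, not the actual implementation:

import time

def click_text(txt, min_interval_time, top, bottom, left, right):
    global setting_value
    # Loop until stop() flips the global flag
    while not setting_value.flg_stop:
        find_and_click(txt, top, bottom, left, right)  # hypothetical: capture, OCR, click
        time.sleep(min_interval_time)  # wait at least this long between OCR runs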
With a normal exe conversion, the user would still have to install tesseract themselves. That is a real hassle, so I made the tool usable without installing it.
# Add tesseract (the OCR tool) to the PATH environment variable
os.environ["PATH"] += os.pathsep + os.path.dirname(os.path.abspath(__file__)) + os.sep + RESORSES_FOLDER_NAME
This overlaps with the code shown earlier: it adds the RESORSES_FOLDER_NAME folder next to the script to the PATH. After converting to an exe, create a RESORSES_FOLDER_NAME folder next to the exe file and store the tesseract module in it.
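One caveat worth knowing: with PyInstaller, __file__ does not always resolve to the exe's folder, and the bundled exe sets sys.frozen, so a common defensive pattern is to branch on it. This is my own note, not necessarily what this tool does:

import os
import sys

if getattr(sys, 'frozen', False):
    base_dir = os.path.dirname(sys.executable)  # folder containing the exe
else:
    base_dir = os.path.dirname(os.path.abspath(__file__))  # folder of the script
os.environ["PATH"] += os.pathsep + os.path.join(base_dir, RESORSES_FOLDER_NAME)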
As an aside, I also tried building a single-file exe with pyinstaller's --onefile option, but gave up because startup took more than a minute.
pyinstaller --clean --icon=chunta_auto_click.ico -n chunta_auto_click chunta_auto_click.py --noconsole
The above is the command I used for the exe conversion.
There are still plenty of performance challenges, but I was able to create the tool I had in mind. It really drove home what a good era this is, when even someone like me can easily use an OCR library just by gathering information online. I hope this article helps someone.
chunta auto click (executable)