[Python] Break through Google "reCAPTCHA"! Taking on full automation of browser operation with "2Captcha"

Table of contents

- Introduction
- What is 2Captcha?
- Preparing to use 2Captcha
- Break through "reCAPTCHA v2" with Python + Selenium + 2Captcha
- Finally

Introduction

I think the biggest obstacle in automating scraping and browser operation is getting past the various kinds of CAPTCHA. A CAPTCHA is there precisely to keep robots out, so trying to break through one is questionable to begin with, but there are still times when you want to do something about it. A service called "2Captcha" is one solution for such cases.

I recently learned about this service and tried it, and it got through the CAPTCHA so easily that I will introduce it here.

What is 2Captcha?


It is a service, provided by a Russian company, for getting past CAPTCHAs. The 2Captcha API can be used to automate the CAPTCHA-solving step. It is a paid service, but one API request costs about 0.3 yen, so I think it is cheap enough.

How it works

2Captcha gets past difficult CAPTCHAs through sheer human-wave tactics. When a user sends the details of the CAPTCHA they want solved through the 2Captcha API, one of a large number of workers somewhere solves it, and the necessary information is returned to the user.
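The flow described above (submit a task, wait while a worker solves it, collect the answer) can be sketched roughly as follows. This is only an illustration of the idea, not the actual 2Captcha client: `solve_via_workers`, `submit`, and `fetch_result` are hypothetical names, and the two callables stand in for whatever requests the real API needs.

```python
import time
from typing import Callable, Optional


def solve_via_workers(
    submit: Callable[[], str],
    fetch_result: Callable[[str], Optional[str]],
    poll_interval: float = 5.0,
    timeout: float = 120.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Submit a CAPTCHA task, then poll until a worker has produced an answer.

    submit       -- hypothetical callable that sends the task and returns its ID
    fetch_result -- hypothetical callable that returns the answer, or None if not ready
    """
    task_id = submit()
    waited = 0.0
    while waited < timeout:
        # Give the human worker some time before asking again
        sleep(poll_interval)
        waited += poll_interval
        answer = fetch_result(task_id)
        if answer is not None:
            return answer
    raise TimeoutError(f"no answer for task {task_id} within {timeout} seconds")
```

The Python package used later in this article hides this loop behind a single method call, so you normally do not have to write it yourself.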


Language support

2Captcha provides libraries for multiple programming languages to make the API easier to use.

Preparing to use 2Captcha

Account registration

Go to https://2captcha.com/ and register for an account from the "Registration" button at the upper right.

Set your e-mail address and password, and registration is complete.

After you log in, your account page is displayed.

Payment

Unfortunately, 2Captcha cannot be used for free. After logging in, make a deposit from "Add funds" at the top of the screen.

Select an available payment service and set the amount. I paid with PayPal. For now, deposit the minimum amount of $3.

When the payment is completed, the balance shown on the original screen should change to $3. (It may take some time depending on the payment method.)
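To get a feel for how far the $3 deposit goes at roughly 0.3 yen per request, here is a back-of-the-envelope calculation. The exchange rate of 105 yen per dollar is my own assumption for illustration, and `requests_covered` is a hypothetical helper; actual prices vary by CAPTCHA type.

```python
import math


def requests_covered(deposit_usd: float, yen_per_request: float, yen_per_usd: float) -> int:
    """Whole number of API requests a deposit pays for at a flat per-request price."""
    deposit_yen = deposit_usd * yen_per_usd
    return math.floor(deposit_yen / yen_per_request)


# 105 yen per dollar is an assumed exchange rate, not a figure from 2Captcha
print(requests_covered(deposit_usd=3.0, yen_per_request=0.3, yen_per_usd=105.0))
```

Under these assumptions the minimum deposit covers on the order of a thousand requests, which is plenty for trying the service out.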

Get API key

After logging in, the API key is displayed in the center of the screen. Copy it, because it is needed to use 2Captcha.
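Since the API key is tied to your paid balance, you may prefer not to paste it directly into a script. One possible approach, shown here as a minimal sketch, is to read it from an environment variable. The variable name `TWOCAPTCHA_API_KEY` and the helper `load_api_key` are names I made up for this example, not something defined by 2Captcha.

```python
import os
from typing import Mapping


def load_api_key(env: Mapping[str, str] = os.environ,
                 name: str = 'TWOCAPTCHA_API_KEY') -> str:
    """Return the API key stored in the given environment mapping.

    The variable name is an arbitrary choice for this sketch.
    """
    key = env.get(name, '').strip()
    if not key:
        raise RuntimeError(f'set the {name} environment variable to your 2Captcha API key')
    return key
```

The returned string can then be passed wherever the scripts in this article use 'YOUR_API_KEY'.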


Break through "reCAPTCHA v2" with Python + Selenium + 2Captcha

Let's break through reCAPTCHA v2 using Python.

Package installation

A package for Python is available, so install it first.

pip install 2captcha-python

In addition, the test below drives headless Chrome with Selenium. For preparing Selenium, please refer to this article.

Testing on the reCAPTCHA demo page

This time I would like to test 2Captcha using this demo page. https://recaptcha-demo.appspot.com/recaptcha-v2-checkbox.php

import traceback

import chromedriver_binary  # noqa: F401  (importing this adds chromedriver to PATH)
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from twocaptcha import TwoCaptcha

solver = TwoCaptcha('YOUR_API_KEY')  # Set your own API key here
url = 'https://recaptcha-demo.appspot.com/recaptcha-v2-checkbox.php'


def main():
    # Launch the browser (headless Chrome)
    options = Options()
    options.add_argument('--headless')
    driver = webdriver.Chrome(options=options)

    try:
        # Open the page
        driver.get(url)

        # Read the value of the data-sitekey attribute
        data_sitekey = driver.find_element(By.CSS_SELECTOR, '[data-sitekey]').get_attribute('data-sitekey')

        # Ask 2Captcha for the response token
        response = solver.recaptcha(sitekey=data_sitekey, url=url)
        code = response['code']

        # Write the token into the g-recaptcha-response textarea
        # (passed as a script argument so it needs no quoting)
        textarea = driver.find_element(By.ID, 'g-recaptcha-response')
        driver.execute_script('arguments[0].value = arguments[1];', textarea, code)

        # Click the submit button
        driver.find_element(By.CSS_SELECTOR, 'button[type="submit"]').click()

        # Show the result (success: "Success!", failure: "Something went wrong")
        result = driver.find_element(By.CSS_SELECTOR, 'body>main>h2:nth-child(3)').text
        print(result)

    except Exception:
        print(traceback.format_exc())
    finally:
        # Always close the browser, even if something failed
        driver.quit()


if __name__ == '__main__':
    main()

Execution result: Success!

The response of 2Captcha took about 5 to 20 seconds, but I was able to break through reCAPTCHA.
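If you want to check the response time yourself, one simple way is to wrap the call in a timer. The following is a minimal sketch: `timed` is a hypothetical helper, and `fake_solver` is a stand-in for the real request so that the example runs without network access or an API key.

```python
import time
from typing import Any, Callable, Tuple


def timed(call: Callable[[], Any]) -> Tuple[Any, float]:
    """Run call() and return its result together with the elapsed seconds."""
    start = time.perf_counter()
    result = call()
    return result, time.perf_counter() - start


def fake_solver() -> dict:
    # Stand-in for the real 2Captcha request; it only sleeps briefly
    time.sleep(0.01)
    return {'code': 'dummy-token'}


response, seconds = timed(fake_solver)
print(f'token received after {seconds:.2f} s')
```

In the script above, the same helper could wrap the real call, for example `timed(lambda: solver.recaptcha(sitekey=data_sitekey, url=url))`.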

Finally

What did you think? This time I broke through Google's reCAPTCHA v2, but 2Captcha also seems to support reCAPTCHA v3 and CAPTCHAs from providers other than Google. The drawback is that it costs a little money, but it seems useful to keep as an option for when it is absolutely necessary.
