Want a general-purpose wait in Python Selenium?

Introduction

I got tired of all the mouse clicking in our business system (attendance management).

Common things in crawling

Basically, these systems are built for a human clicking around with a mouse; being driven by a machine is not something they take into account. They assume you will never press a button you cannot see yet. So if you want to drive one the way a person would, you have to control the timing: act only once a button can actually be pressed or a field can actually accept input.

Naturally, Selenium provides a mechanism for this.

Documents: waits, Waits, Wait (a translation of the above)

The second and third documents show two approaches: explicit and implicit waits. An implicit wait sets a single timeout that applies to every element lookup, and if all you want is to wait blindly for a fixed time, you can just call time.sleep yourself without involving Selenium. Both save typing and cannot be forgotten once set up, so they are the easy option. However, response time varies with network conditions and PC load, so a fixed timeout may not give the expected result, and an explicit wait is the preferable choice. Note that with an explicit wait [it polls internally at a fixed interval (default 0.5 s)](https://seleniumhq.github.io/selenium/docs/api/py/_modules/selenium/webdriver/support/wait.html#WebDriverWait), so the processing load is slightly higher.
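To make the polling concrete, here is a minimal sketch of what an explicit wait does conceptually. It uses only the standard library; `poll_until` is a hypothetical name and not part of Selenium, and the real WebDriverWait additionally passes the driver to the condition and ignores certain exceptions while polling.

```python
import time


def poll_until(condition, timeout, poll_frequency=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Simplified illustration of an explicit wait; not Selenium's actual code.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        # This sleep is the polling interval mentioned above.
        time.sleep(poll_frequency)


# Hypothetical stand-in for a page element that becomes ready after 0.2 s.
ready_at = time.monotonic() + 0.2
value = poll_until(
    lambda: time.monotonic() >= ready_at and "ready",
    timeout=2,
    poll_frequency=0.05,
)
print(value)
```

The wait returns as soon as the condition holds instead of always sleeping the full timeout, which is the advantage over a blind time.sleep.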

How to wait

The samples in the documents above look like this.

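from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# driver is an already-created WebDriver instance

# First document
element = WebDriverWait(driver, 10).until(lambda x: x.find_element(By.ID, "someId"))

# Second and third documents
try:
    element = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.ID, "myDynamicElement"))
    )
finally:
    # Runs whether or not the wait succeeded
    driver.quit()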

Customize waiting conditions

The document also describes how to customize the condition. The helper classes defined in expected_conditions, such as presence_of_element_located, work the same way. This approach makes sense when the condition is somewhat complex, but for something as simple as the example shown there, it can be done more easily.

You can create a custom wait condition using a class with a `__call__` method that returns False if the conditions do not match.

All you have to do is honor that contract, so a lambda expression or the like works just as well. (This is mentioned in the very first document.)

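import sys

from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait


def proc(driver, by, name, cname):
    # Custom condition: return the element on success, False when the condition does not hold
    element = driver.find_element(by, name)
    # get_attribute returns None when the element has no class attribute
    classes = (element.get_attribute("class") or "").split()
    if cname in classes:
        return element
    return False


# driver is an already-created WebDriver instance
wait = WebDriverWait(driver, 10)
try:
    element = wait.until(lambda drv: proc(drv, By.ID, 'myNewInput', 'myCSSClass'))
except TimeoutException:
    print("timeout..")
    sys.exit()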

An example using a lambda expression is also given in [this document](https://seleniumhq.github.io/selenium/docs/api/py/webdriver_support/selenium.webdriver.support.wait.html?highlight=webdriverwait#selenium.webdriver.support.wait.WebDriverWait). That document links to the source, so I find it very useful. The point, going by it, is that you can add conditions: not just get an element once it is found, but get an element that has a certain attribute.
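For comparison, the class-based form (a class with a `__call__` method) might look like the following for the same class-attribute check. This is a sketch under assumptions: `ElementHasCssClass` is an illustrative name, and `FakeElement` and `FakeDriver` are hypothetical stand-ins so the block runs without a browser. The locator uses the plain string "id", which is the value of By.ID.

```python
class ElementHasCssClass:
    """Wait condition: the located element once it carries css_class, else False."""

    def __init__(self, locator, css_class):
        self.locator = locator  # (by, value) pair such as ("id", "myNewInput")
        self.css_class = css_class

    def __call__(self, driver):
        element = driver.find_element(*self.locator)
        # get_attribute may return None when the attribute is absent
        classes = (element.get_attribute("class") or "").split()
        return element if self.css_class in classes else False


# Hypothetical stand-ins, for illustration only.
class FakeElement:
    def __init__(self, css):
        self.css = css

    def get_attribute(self, name):
        return self.css if name == "class" else None


class FakeDriver:
    def __init__(self, element):
        self.element = element

    def find_element(self, by, value):
        return self.element


condition = ElementHasCssClass(("id", "myNewInput"), "myCSSClass")
matched = condition(FakeDriver(FakeElement("form-control myCSSClass")))
unmatched = condition(FakeDriver(FakeElement("form-control")))
print(matched is not False, unmatched)
```

With a real driver the condition would be passed to the wait directly, as in `WebDriverWait(driver, 10).until(ElementHasCssClass((By.ID, 'myNewInput'), 'myCSSClass'))`.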

The main subject?

In the first sample I checked that an ID exists in the DOM, but the element you want to check might instead be identified by class or by name, and the element to wait for changes with the page and its structure. While coding, I wanted to avoid having to care that this one is By.CLASS_NAME and that one is By.NAME. Of course, when crawling you still have to think about each element: this one is easy because it can be fetched by ID, that one is a class, so how do I identify it uniquely? Anyway, as a first attempt I wrote a small wait routine based on the example above. I haven't tested it properly.

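from selenium.common.exceptions import (
    InvalidSelectorException,
    NoSuchElementException,
    TimeoutException,
)
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait


def wait(drv, sec, selector):
    def find(by):
        # find_element raises on a miss instead of returning a falsy value,
        # so convert a miss (or a selector invalid for this strategy) to None;
        # otherwise the first failed lookup would end the whole check.
        try:
            return drv.find_element(by, selector)
        except (NoSuchElementException, InvalidSelectorException):
            return None

    def chk():
        elem = find(By.ID)
        if elem:
            return elem
        elem = find(By.CLASS_NAME)
        if elem:
            return elem
        elem = find(By.XPATH)
        if elem:
            return elem
        # find_elements returns an empty list on a miss rather than raising
        elem = drv.find_elements(By.ID, selector)
        if elem:
            return elem
        elem = drv.find_elements(By.CLASS_NAME, selector)
        if elem:
            return elem
        return False

    try:
        elem = WebDriverWait(drv, sec).until(lambda _: chk())
        return elem
    except TimeoutException:
        print(f"wait timeout.. {selector} not found")
        return None


# driver is an already-created WebDriver instance
elem = wait(driver, 10, "elem_name")
if not elem:
    print("wow, unknown error.")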

That's roughly the idea, but I can't accept how redundant chk is and how many times it calls find_element. On top of that, what comes back may be a list. In that sense, it seems better to rethink the approach a little.
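One way to trim the repetition, sketched under assumptions: loop over the locator strategies and use find_elements, which returns an empty list on a miss, so the result is always a list. The strategy strings are the values of By.ID, By.CLASS_NAME and By.XPATH; `first_match` is an illustrative name and `FakeDriver` is a hypothetical stand-in so the sketch runs without a browser.

```python
# Values of By.ID, By.CLASS_NAME and By.XPATH in Selenium.
STRATEGIES = ("id", "class name", "xpath")


def first_match(driver, selector, strategies=STRATEGIES):
    """Return the first non-empty result list, or False if nothing matched."""
    for by in strategies:
        found = driver.find_elements(by, selector)
        if found:
            return found
    return False


class FakeDriver:
    """Hypothetical stand-in for a WebDriver, for illustration only."""

    def __init__(self, page):
        self.page = page  # maps (by, selector) to a list of fake elements

    def find_elements(self, by, selector):
        return self.page.get((by, selector), [])


driver = FakeDriver({("class name", "elem_name"): ["<div class='elem_name'>"]})
print(first_match(driver, "elem_name"))  # the class-name lookup succeeds
print(first_match(driver, "missing"))    # nothing matches
```

With a real driver this could be polled as `WebDriverWait(drv, sec).until(lambda d: first_match(d, selector))`. A real driver may still raise InvalidSelectorException when the string is not valid for a given strategy, which this sketch does not handle.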

Stack Overflow also has an example solution that defines its own class: you pass the checks in as a list, and if any one of them hits, the wait succeeds. It's pretty smart, so let's tidy it up so that it works properly along those lines.

Final form

A wait implementation along those lines

from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support.ui import WebDriverWait


class AnyEc:
    """Use with WebDriverWait to combine several checks in an OR.

    Each check is a (fn, param) pair; fn(param) should return a truthy
    value (for example a non-empty list of elements) on success.
    """

    def __init__(self, *args):
        # Accept both AnyEc(pair, pair, ...) and AnyEc([pair, pair, ...])
        self.ecs = []
        for v in args:
            if isinstance(v, list):
                self.ecs += v
            else:
                self.ecs.append(v)

    def __call__(self, driver):
        # WebDriverWait passes the driver in; the checks here are already
        # bound to it, so it goes unused.
        for fn, param in self.ecs:
            r = fn(param)
            if r:
                return r
        return False


def wait_any(drv, sec, *args):
    try:
        return WebDriverWait(drv, sec).until(AnyEc(*args))
    except TimeoutException:
        print(f"wait timeout.. {args} not found")
        return False

How to use

from selenium import webdriver
from selenium.webdriver.common.by import By


def make_css_selector(key):
    # Candidate CSS selectors: key as an id, as a raw selector, as a name, or as a class
    return [
        '[id="%s"]' % key,
        '#%s' % key,
        key,
        '[name="%s"]' % key,
        '.%s' % key,
    ]


# Usage sample

# URL to access
url = 'https://ja.stackoverflow.com/'
# The selector you want to find
target = 'question-mini-list h3'

# The driver has to exist before the lookups can be bound to it
driver = webdriver.Chrome()
driver.get(url)


def find_css(selector):
    return driver.find_elements(By.CSS_SELECTOR, selector)


# Leave the rest to the helper
fn = [(find_css, x) for x in make_css_selector(target)]

try:
    # Wait until one of the candidates matches; time out after 10 seconds
    elem = wait_any(driver, 10, fn)
    # wait_any returns False on timeout, so fall back to an empty list
    for e in elem or []:
        print(e.text)
finally:
    driver.quit()

It doesn't look that smart after all :sweat_smile:

Digression

Originally I wanted to OR the conditions together in a single XPath so that one lookup would do, but I gave up because converting everything to XPath was a hassle :stuck_out_tongue_winking_eye:

By the way, let's take a look at the source of find_element. https://seleniumhq.github.io/selenium/docs/api/py/_modules/selenium/webdriver/remote/webdriver.html#WebDriver.find_element

        if self.w3c:
            if by == By.ID:
                by = By.CSS_SELECTOR
                value = '[id="%s"]' % value
            elif by == By.TAG_NAME:
                by = By.CSS_SELECTOR
            elif by == By.CLASS_NAME:
                by = By.CSS_SELECTOR
                value = ".%s" % value
            elif by == By.NAME:
                by = By.CSS_SELECTOR
                value = '[name="%s"]' % value
        return self.execute(Command.FIND_ELEMENT, {
            'using': by,
            'value': value})['value']

As you can see, almost everything is rewritten into a CSS_SELECTOR lookup. So I figured that, as long as I didn't need XPath, a single find should be enough, but it didn't work, so I gave up at this point.
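For reference, CSS itself can express OR with a selector list (selectors separated by commas), which is one way a single lookup might cover several candidates. The sketch below only builds such a string. `make_css_group` is a hypothetical helper, it has not been tried against the page used above, and it assumes the key is already valid inside each candidate selector; a selector list is invalid as a whole if any one of its members is invalid.

```python
def make_css_group(key):
    """Join the candidate selectors for key into one CSS selector list."""
    candidates = [
        '[id="%s"]' % key,    # id attribute
        '#%s' % key,          # id shorthand
        key,                  # key used as a selector as-is
        '[name="%s"]' % key,  # name attribute
        '.%s' % key,          # class
    ]
    # A comma means OR in CSS: an element matching any candidate is selected.
    return ", ".join(candidates)


print(make_css_group("myNewInput"))
```

The resulting string would then go to a single `driver.find_elements(By.CSS_SELECTOR, ...)` call instead of one call per candidate.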
