[PYTHON] XPath Basics (3) - Functions often used in XPath

Last time, I introduced the most common ways of writing XPath. This time, I will introduce the functions that are often used in XPath to specify data more precisely.

1. contains(): Specify an element that contains a specific string

contains() is typically used for a partial (fuzzy) match against a string contained in an attribute value or in text.

- **contains(@class, "XXX"): Specify elements whose attribute value contains a specific string**

2.jpg

For example, to get every span whose class attribute contains Red from this HTML, write the following.

//span[contains(@class,"Red")]

In other words, this XPath selects **span elements whose class contains Red**.

3.jpg
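As a minimal sketch of how this expression behaves in Python, the following assumes the third-party lxml package is installed. The markup in `SAMPLE` is hypothetical, since the original HTML is only shown as an image.

```python
# Assumption: the third-party lxml package is available (pip install lxml).
from lxml import html

# Hypothetical sample markup standing in for the page shown in the image.
SAMPLE = """
<div>
  <span class="itemRed">apple</span>
  <span class="itemBlue">sky</span>
  <span class="bigRed bold">rose</span>
  <p class="Red">not a span</p>
</div>
"""

tree = html.fromstring(SAMPLE)

# Select every span whose class attribute contains the substring "Red".
red_spans = tree.xpath('//span[contains(@class, "Red")]')
red_texts = [span.text for span in red_spans]
print(red_texts)
```

Note that contains() is a substring match and is case-sensitive, so a class such as "red" would not be selected here.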

- **contains(text(), "XXX"): Specify elements whose text contains a specific character string**

Harry Potter(html).jpg

For example, if you want to specify an element containing the characters "Rowling" from this HTML, write as follows.

//span[contains(text(),"Rowling")]

**Tip:** When specifying a next-page button, **contains(text(), "next")** is often used. For details, see "How to write an XPath that specifies a page forward button".
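The text-matching form can be tried out with a small sketch like the one below. It assumes the third-party lxml package, and the markup in `SAMPLE` is hypothetical. It also illustrates one XPath 1.0 detail: when text() returns several text nodes, contains() only looks at the first one, whereas contains(., "XXX") checks the element's whole text.

```python
# Assumption: the third-party lxml package is available (pip install lxml).
from lxml import html

# Hypothetical sample markup; the third span has two text nodes split by <br>.
SAMPLE = """
<ul>
  <li><span>Author: J. K. Rowling</span></li>
  <li><span>Publisher: Bloomsbury</span></li>
  <li><span>Series<br>by Rowling</span></li>
</ul>
"""

tree = html.fromstring(SAMPLE)

# text() yields a node-set; XPath 1.0 converts it using only its first node,
# so the third span ("Series" comes first) is not matched.
first_text_hits = tree.xpath('//span[contains(text(), "Rowling")]')

# "." is the element's full string value, so the third span is matched too.
whole_text_hits = tree.xpath('//span[contains(., "Rowling")]')

print(len(first_text_hits), len(whole_text_hits))
```

If an element mixes text with child tags, the contains(., "XXX") form may be the safer choice.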

2. position(): Specify the element at a specific position

In the previous article, I explained that you can get the Nth element by enclosing a number in square brackets []. You can also specify the Nth element with position().

For example, in the sample HTML, "Product 3" is the fourth th element, so write the following.

//tbody/th[4]

Using position(), the same expression is written as follows.

//tbody/th[position()=4]

4 (1).jpg

To get every element other than "advertisement", note that "advertisement" is the first th element, so write the following.

//tbody/th[position()>1]

5 (1).jpg
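The three position expressions above can be compared with a short sketch. It assumes the third-party lxml package, and the table in `SAMPLE` is a hypothetical reconstruction of the HTML in the images. It is parsed as XML so that the th elements stay directly under tbody, matching the paths used in this article.

```python
# Assumption: the third-party lxml package is available (pip install lxml).
from lxml import etree

# Hypothetical sample markup, parsed as XML so the structure is kept as written.
SAMPLE = """
<table>
  <tbody>
    <th>advertisement</th>
    <th>Product 1</th>
    <th>Product 2</th>
    <th>Product 3</th>
  </tbody>
</table>
""".strip()

root = etree.fromstring(SAMPLE)

# XPath positions are 1-based: [4] and [position()=4] select the same node.
by_index = [th.text for th in root.xpath('//tbody/th[4]')]
by_position = [th.text for th in root.xpath('//tbody/th[position()=4]')]

# position()>1 skips the first th ("advertisement") and keeps the rest.
without_ad = [th.text for th in root.xpath('//tbody/th[position()>1]')]

print(by_index, by_position, without_ad)
```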

3. and / not / or: Specify elements using multiple conditions

To combine several conditions in one expression, use the and / or operators and the not() function.

- **and: Specify elements that match all of the conditions**

7.jpg

To get the a elements whose href contains both "S_20" and "pdf" from this HTML, write the following.

//a[contains(@href,"S_20") and contains(@href,"pdf")]
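A minimal sketch of the and condition follows, assuming the third-party lxml package. The links in `SAMPLE` are hypothetical, and the trailing /@href (not part of the expression above) is added only to return the attribute values instead of the elements.

```python
# Assumption: the third-party lxml package is available (pip install lxml).
from lxml import html

# Hypothetical sample links; only the first satisfies both conditions.
SAMPLE = """
<div>
  <a href="/files/S_20.pdf">S 20 (PDF)</a>
  <a href="/files/S_20.html">S 20 (HTML)</a>
  <a href="/files/S_10.pdf">S 10 (PDF)</a>
</div>
"""

tree = html.fromstring(SAMPLE)

# Both contains() tests must be true for an a element to be selected.
pdf_links = tree.xpath(
    '//a[contains(@href, "S_20") and contains(@href, "pdf")]/@href'
)
print(pdf_links)
```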

- **not: Specify elements that do not match a condition**

8.jpg

To get the a elements whose href is anything other than https://helpcenter.octoparse.jp/hc/ja/xpath/S_10.html from this HTML, write the following.

//a[not(contains(@href, "S_10"))]
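The exclusion can be checked with a sketch like this one, assuming the third-party lxml package. The relative links in `SAMPLE` are hypothetical stand-ins for the ones in the image.

```python
# Assumption: the third-party lxml package is available (pip install lxml).
from lxml import html

# Hypothetical sample links; the S_10 page is the one to exclude.
SAMPLE = """
<div>
  <a href="xpath/S_10.html">S 10</a>
  <a href="xpath/S_20.html">S 20</a>
  <a href="xpath/S_30.html">S 30</a>
</div>
"""

tree = html.fromstring(SAMPLE)

# not() inverts the contains() test, keeping links without "S_10" in href.
other_links = tree.xpath('//a[not(contains(@href, "S_10"))]/@href')
print(other_links)
```

Because this predicate only tests the href value, an a element with no href attribute at all would also be selected.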

- **or: Specify elements that match any of the conditions**

9.jpg

To get the a elements whose href contains "M_" or "L_" from this HTML, write the following.

//a[contains(@href,"M_") or contains(@href,"L_")]

To get the a elements whose href contains neither "M_" nor "L_", combine not and or as follows.

//a[not(contains(@href,"M_") or contains(@href,"L_"))]
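Both forms can be compared in a short sketch, assuming the third-party lxml package and hypothetical links in `SAMPLE`. The last query is an equivalent rewrite using and, included only to show that not(A or B) selects the same nodes as not(A) and not(B).

```python
# Assumption: the third-party lxml package is available (pip install lxml).
from lxml import html

# Hypothetical sample links for three sizes.
SAMPLE = """
<div>
  <a href="sizes/S_01.html">Small</a>
  <a href="sizes/M_01.html">Medium</a>
  <a href="sizes/L_01.html">Large</a>
</div>
"""

tree = html.fromstring(SAMPLE)

# or: either condition is enough.
m_or_l = tree.xpath(
    '//a[contains(@href, "M_") or contains(@href, "L_")]/@href'
)

# not(... or ...): neither condition may hold.
neither = tree.xpath(
    '//a[not(contains(@href, "M_") or contains(@href, "L_"))]/@href'
)

# Equivalent form: not(A) and not(B).
neither_alt = tree.xpath(
    '//a[not(contains(@href, "M_")) and not(contains(@href, "L_"))]/@href'
)

print(m_or_l, neither, neither_alt)
```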

These are the functions often used in XPath. To learn more about XPath syntax and functions, please see the original article.

Original article: https://helpcenter.octoparse.jp/hc/ja/articles/360012713639
