Get information equivalent to the Network tab of Chrome developer tools with Python + Selenium

What I want to do

The Network tab of Chrome's developer tools (opened with Ctrl + Shift + I on Windows) is an interesting tool that shows a timeline of the data fetched by the browser and lets you simulate different line speeds.

This time, I will simply use Python + Selenium to get the list of URLs of the files shown in this Network tab.

Environment

- Chrome 79.0.3945.45 beta
- Python 3.7.3
- selenium 3.141.0
- chromedriver-binary 79.0.3945.36.0
- Debian GNU/Linux 9 (Docker container)

Implementation

Up to the point where Selenium loads the page, the code is as follows. Set options such as headless mode as appropriate. The page is fetched with driver.get(); for the basics, this excellent article was very helpful.

- Automatic operation of Chrome with Python + Selenium

netlogs.py


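from selenium.webdriver import Chrome, ChromeOptions
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

# enable the performance log
caps = DesiredCapabilities.CHROME
caps["goog:loggingPrefs"] = {"performance": "ALL"}
# in some environments this key has to be used instead:
# caps["loggingPrefs"] = {"performance": "ALL"}

# options
options = ChromeOptions()
options.add_argument('--headless')
options.add_argument('--no-sandbox')
options.add_argument('--disable-dev-shm-usage')
options.add_argument('--disable-gpu')
options.add_argument('--ignore-certificate-errors')
# _headers is assumed to be a dict defined elsewhere that holds the User-Agent string
options.add_argument('--user-agent=' + _headers["User-Agent"])

# get driver
driver = Chrome(options=options, desired_capabilities=caps)
driver.implicitly_wait(5)
driver.get("https://qiita.com/")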

The log that contains the URLs is named performance, so configure DesiredCapabilities to collect that log [^ 1] and pass it in when creating the driver [^ 2].

The capability name depends on the environment. In some cases it did not work unless I used "loggingPrefs" instead of "goog:loggingPrefs". Perhaps it differs depending on the Chrome version?
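
One small caveat with the snippet above: DesiredCapabilities.CHROME is a dictionary shared at class level, so assigning to it directly also changes it for any driver created later in the same process. Below is a minimal sketch of a helper that works on a copy instead. The function name with_performance_logging and its prefs_key parameter are my own names for illustration, and a plain dict stands in for DesiredCapabilities.CHROME so that the example runs without Selenium installed.

```python
import copy


def with_performance_logging(base_caps, prefs_key="goog:loggingPrefs"):
    """Return a copy of base_caps with the performance log switched on.

    prefs_key is a parameter so that "loggingPrefs" can be tried in
    environments where the "goog:"-prefixed name is not accepted.
    """
    caps = copy.deepcopy(base_caps)  # leave the caller's dict untouched
    caps[prefs_key] = {"performance": "ALL"}
    return caps


# Stand-in for DesiredCapabilities.CHROME (assumed contents, illustration only).
base = {"browserName": "chrome"}
caps = with_performance_logging(base)
legacy_caps = with_performance_logging(base, prefs_key="loggingPrefs")
```

With Selenium available, the same helper could presumably be called as with_performance_logging(DesiredCapabilities.CHROME) and the result passed to Chrome(...) as before.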

netlogs.py


time.sleep(2)

Wait until the page has loaded. The standard approach seems to be waiting with driver.implicitly_wait(), but I could not get the data I wanted that way, so I added a sleep. Please let me know if there is a smarter way.
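
As one possible alternative to a fixed sleep, the performance log itself can be polled until no new entries arrive for a short period. The following is only a sketch under that assumption: collect_until_quiet, its quiet_seconds, timeout and poll parameters, and the fake log source used in the demonstration are all hypothetical names of mine. A quiet log also does not guarantee that the page will never make another request.

```python
import time


def collect_until_quiet(get_log, quiet_seconds=1.0, timeout=10.0, poll=0.2):
    """Poll get_log() until it returns nothing new for quiet_seconds.

    get_log is any zero-argument callable returning a list of new entries,
    for example: lambda: driver.get_log("performance").
    """
    collected = []
    start = time.monotonic()
    last_activity = start
    while True:
        new_entries = get_log()
        now = time.monotonic()
        if new_entries:
            collected.extend(new_entries)
            last_activity = now
        elif now - last_activity >= quiet_seconds:
            break  # nothing new for long enough
        if now - start >= timeout:
            break  # give up waiting and return what has been collected
        time.sleep(poll)
    return collected


# Demonstration with a fake log source instead of a real driver.
batches = [[{"message": "a"}], [], [{"message": "b"}]]


def fake_get_log():
    return batches.pop(0) if batches else []


entries = collect_until_quiet(fake_get_log, quiet_seconds=0.05, poll=0.01)
```

With a real driver, the call would look like netLog = collect_until_quiet(lambda: driver.get_log("performance")). This relies on get_log returning only the entries added since the previous call, which is worth confirming in your own environment.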

netlogs.py


netLog = driver.get_log("performance")

The log returned by driver.get_log("performance") is a list of dictionaries whose "message" values are JSON strings, and looks like the following.

performance


[
    {'level': 'INFO', 'message': '{
            "message": {
                "method": "Page.frameResized",
                "params": {}
            },
            "webview": "***"
        }', 'timestamp': ***
    },
    {'level': 'INFO', 'message': '{

    ...

We will extract only the necessary parts from the acquired performance log.

netlogs.py


import json

def process_browser_log_entry(entry):
    # 'message' is a JSON string; the event itself sits under its "message" key
    response = json.loads(entry['message'])['message']
    return response

events = [process_browser_log_entry(entry) for entry in netLog]
events = [event for event in events if 'Network.response' in event['method']]

detected_url = []
for item in events:
    if "response" in item["params"]:
        if "url" in item["params"]["response"]:
            detected_url.append(item["params"]["response"]["url"])

From the "message" property of each entry, only the events whose "method" name contains Network.response (such as Network.responseReceived) are selected. The extracted events are then a list of items like the following. Finally, the items that have a "url" under "params" => "response" are picked out, and their URLs are stored in detected_url.

network.response


[
    {
        "method": "Network.responseReceivedExtraInfo",
        "params": {
            "blockedCookies": [],
            "headers": {
                "cache-control": "max-age=0, private, must-revalidate",
                "content-encoding": "gzip",
                "content-type": "text/html; charset=utf-8",
                "date": "Sat, 23 Nov 2019 07:41:40 GMT",
                "etag": "W/\"***\"",
                "referrer-policy": "strict-origin-when-cross-origin",
                "server": "nginx",
                "set-cookie": "***",
                "status": "200",
                "strict-transport-security": "max-age=2592000",
                "x-content-type-options": "nosniff",
                "x-download-options": "noopen",
                "x-frame-options": "SAMEORIGIN",
                "x-permitted-cross-domain-policies": "none",
                "x-request-id": "***",
                "x-runtime": "***",
                "x-xss-protection": "1; mode=block"
            },
            "requestId": "***"
        }
    },
    {
    ...
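
To make the filtering step concrete, here is a small self-contained sketch that runs the same logic on hand-written log entries instead of a live browser. The sample entries, the example.com URLs, and the function names extract_response_urls and make_entry are made up for illustration; only the shape of the data follows the log shown above.

```python
import json


def extract_response_urls(raw_entries):
    """Return the URLs found in Network.response* events of a performance log."""
    urls = []
    for entry in raw_entries:
        # Each entry's 'message' is a JSON string wrapping the actual event.
        event = json.loads(entry["message"])["message"]
        if "Network.response" not in event.get("method", ""):
            continue
        response = event.get("params", {}).get("response", {})
        if "url" in response:
            urls.append(response["url"])
    return urls


def make_entry(method, params):
    """Build a log entry in the same shape as driver.get_log('performance')."""
    return {"level": "INFO",
            "message": json.dumps({"message": {"method": method, "params": params}})}


sample_log = [
    make_entry("Page.frameResized", {}),
    make_entry("Network.responseReceived",
               {"response": {"url": "https://example.com/", "status": 200}}),
    # ExtraInfo events match the filter but carry no "response" key.
    make_entry("Network.responseReceivedExtraInfo", {"headers": {}}),
    make_entry("Network.responseReceived",
               {"response": {"url": "https://example.com/app.js", "status": 200}}),
]

print(extract_response_urls(sample_log))
```

The Network.responseReceivedExtraInfo item is skipped because it has headers but no "response" object, which is why the check for "response" in "params" is needed.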

Whole code

netlogs.py


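import json
import time

from selenium.webdriver import Chrome, ChromeOptions
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

caps = DesiredCapabilities.CHROME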
caps["goog:loggingPrefs"] = {"performance": "ALL"}

options = ChromeOptions()
options.add_argument('--headless')
options.add_argument('--no-sandbox')
options.add_argument('--disable-dev-shm-usage')
options.add_argument('--disable-gpu')
options.add_argument('--ignore-certificate-errors')
# _headers is assumed to be a dict defined elsewhere that holds the User-Agent string
options.add_argument('--user-agent=' + _headers["User-Agent"])

driver = Chrome(options=options, desired_capabilities=caps)
driver.implicitly_wait(5)
driver.get("https://qiita.com/")

time.sleep(2)

netLog = driver.get_log("performance")

def process_browser_log_entry(entry):
    response = json.loads(entry['message'])['message']
    return response

events = [process_browser_log_entry(entry) for entry in netLog]
events = [event for event in events if 'Network.response' in event['method']]

detected_url = []
for item in events:
    if "response" in item["params"]:
        if "url" in item["params"]["response"]:
            detected_url.append(item["params"]["response"]["url"])

Another method

It also seems possible to get the same information by executing a script in the page [^ 3].

netlogs_js.py


import json

# ask the page for its performance entries and return them as a JSON string
scriptToExecute = "var performance = window.performance || window.mozPerformance || window.msPerformance || window.webkitPerformance || {}; var network = performance.getEntries() || {}; return JSON.stringify(network);"
netData = driver.execute_script(scriptToExecute)
netJson = json.loads(str(netData))

detected_url = []
for item in netJson:
    detected_url.append(item["name"])

I was able to get the list of URLs with this method as well.

However, the file I wanted was sometimes missing from the result, so it does not feel like a stable method to me. (I have not verified this properly.)
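
One thing that may be worth checking with this approach: performance.getEntries() returns every kind of performance entry, not only downloaded files, so entries such as paint timings have a name that is not a URL. Below is a minimal sketch of narrowing the result to resource entries through the entryType field. The sample data and the function name resource_urls are hypothetical, and I have not verified whether this has any bearing on the missing files mentioned above.

```python
import json


def resource_urls(net_json):
    """Keep only the names (URLs) of entries whose entryType is 'resource'."""
    return [item["name"] for item in net_json
            if item.get("entryType") == "resource"]


# Hand-written stand-in for the JSON string returned by execute_script.
net_data = json.dumps([
    {"name": "https://example.com/", "entryType": "navigation"},
    {"name": "https://example.com/style.css", "entryType": "resource"},
    {"name": "first-paint", "entryType": "paint"},
    {"name": "https://example.com/app.js", "entryType": "resource"},
])

detected_url = resource_urls(json.loads(net_data))
print(detected_url)
```

Note that this filter also drops the navigation entry for the page itself; include "navigation" in the check if the document URL should be listed as well.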

Please point out if there is a better way!

[^ 1]: I referred to this (almost a copy): [Selenium - python. how to capture network traffic's response [duplicate]](https://stackoverflow.com/questions/52633697/selenium-python-how-to-capture-network-traffics-response)
