Introduction

When you download a file with Selenium, you have to wait for the download to complete. The following problems are common:

・ **The download has not finished, and an error occurs when the script tries to handle the file**
・ **The download normally takes a few seconds, but the script always waits a generous 15 seconds, which adds unnecessary waiting time**

Therefore, I built a wait process with the following properties:

・ **Subsequent processing starts as soon as the download completes**
・ **The wait timeout can be set in 1-second increments**
・ **Error handling on timeout is possible**

Environment: Windows 10, Python 3.8.3, Selenium 3.141.0
Browser: Google Chrome, ChromeDriver 83.0.4103.39

Chrome download specifications

This implementation relies on the behavior of Google Chrome as of 2020/07/14. This section describes what Chrome does from the start of a download to its completion.

Suppose there is a link that downloads a file called "test.csv" when you click it.

**1. Clicking the link starts the download**
**2. A file named "test.csv.crdownload" is created in the download folder**
**3. When the download completes, ".crdownload" is removed from the name and "test.csv" becomes available**

In other words, the extension of the downloaded file tells us the state:

**".crdownload": downloading**
**Anything other than ".crdownload": download completed**

So we wait by monitoring whether the extension ".crdownload" is present.
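The rule above can be written as a one-line check. This is a minimal sketch; the helper name `is_downloading` is hypothetical and does not appear in the article's code.

```python
import os


def is_downloading(file_name):
    """Return True while the file still carries Chrome's in-progress extension."""
    # os.path.splitext splits at the last dot, so only the final extension is compared
    return os.path.splitext(file_name)[1] == '.crdownload'


print(is_downloading('test.csv.crdownload'))  # True
print(is_downloading('test.csv'))             # False
```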

Creating a temporary download folder

When you start Chrome with Selenium on Windows, the default download destination is the "C:\Users\username\Downloads" folder. If you use the default folder as it is, it also contains files downloaded in the past, which makes the extension hard to monitor. So create a temporary download folder that holds only the file downloaded this time.

Create the temporary folder in the current working directory, which is assumed here to be the project folder containing this Python file.

`python`


import os

# Get the current working directory
current_dir = os.getcwd()

# Path of the temporary download folder
tmp_download_dir = f'{current_dir}\\tmpDownload'

# Create the temporary folder
# (raises FileExistsError if a folder left over from a previous run still exists)
os.mkdir(tmp_download_dir)

* Caution for path delimiters

When specifying the download folder for Chrome on Windows, use the backslash "\" as the delimiter instead of the slash "/". In this article, each backslash in a path is escaped by writing it twice: "\\". See also: "[For beginners] Unexpected behavior if "\" is included when setting the path in Python".
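The short example below illustrates why the escaping matters. It is only a sketch: the path "C:\work" is a made-up example, and the standard `ntpath` module is used so that Windows path rules apply even when the snippet is run on another OS.

```python
import ntpath  # Windows path rules, importable on any platform

# '\t' inside a normal string literal is a tab character, not backslash + t
broken = 'C:\tmpDownload'
escaped = 'C:\\tmpDownload'  # backslash escaped, as in this article
raw = r'C:\tmpDownload'      # raw string: backslashes are kept as written

print('\t' in broken)   # True: the path was silently corrupted
print(escaped == raw)   # True: both spell the same path

# Building the path from parts avoids writing the delimiter by hand
joined = ntpath.join('C:\\work', 'tmpDownload')
# Forward slashes can be converted to backslashes
normalized = ntpath.normpath('C:/work/tmpDownload')
print(joined == normalized)  # True
```

On Windows itself, `os.path.join` and `os.path.normpath` behave the same way as the `ntpath` functions shown here.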

Specify Chrome download folder

Change the download destination to the temporary download folder created in the previous section. Use the Chrome option to make changes.

`python`


from selenium import webdriver

# Change the download destination through the Chrome options
options = webdriver.ChromeOptions()
prefs = {'download.default_directory': tmp_download_dir}
options.add_experimental_option('prefs', prefs)

# Path to ChromeDriver
driver_path = 'webdriver\\chromedriver.exe'

# Apply the options and launch Chrome
# ("options" replaces the deprecated "chrome_options" argument in Selenium 3.141.0)
driver = webdriver.Chrome(executable_path=driver_path, options=options)

Now Chrome starts with the download folder set, and downloaded files are saved in the temporary download folder.
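If you build the preferences in more than one place, a small helper can keep them together. This is a sketch under assumptions: the function name `build_download_prefs` and the path are hypothetical, and `download.prompt_for_download` is a commonly used Chrome preference that you should verify against your Chrome version before relying on it.

```python
def build_download_prefs(download_dir):
    """Return a Chrome 'prefs' dict that saves downloads into download_dir."""
    return {
        # Same key as in the article's code
        'download.default_directory': download_dir,
        # Assumption: asks Chrome not to show a "Save As" dialog for each download
        'download.prompt_for_download': False,
    }


prefs = build_download_prefs('C:\\work\\tmpDownload')
print(prefs['download.default_directory'])
```

The resulting dict would be passed to `options.add_experimental_option('prefs', prefs)` exactly as in the code above.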

Wait until the download is complete

After starting the download with Selenium, set a wait timeout and wait until the download completes.

`python`


from selenium import webdriver

import os
import sys
import glob
import shutil
import time

# Start the download with Selenium here (click the download link, etc.)

# Wait timeout in seconds
timeout_second = 10

# Check once per second until the timeout
for i in range(timeout_second + 1):
    # Get the list of files in the temporary folder
    download_fileName = glob.glob(f'{tmp_download_dir}\\*.*')

    # If a file exists
    if download_fileName:
        # Split the path into (name, extension)
        extension = os.path.splitext(download_fileName[0])

        # An extension other than '.crdownload' means the download is complete
        if '.crdownload' not in extension[1]:
            break

    # Timed out without finding a file other than .crdownload
    if i >= timeout_second:
        # == Write error handling here ==
        # Close Chrome
        driver.quit()
        # Delete the temporary folder
        shutil.rmtree(tmp_download_dir)
        # Exit with a non-zero status to signal the failure
        sys.exit(1)

    # Wait one second
    time.sleep(1)

# Processing after the download completes goes here (move the file to the regular download folder, etc.)

Commentary

Roughly speaking, the loop runs for the specified number of seconds and repeats the following: **"Check whether the download is complete; if it is, exit the wait loop; if not, wait 1 second."**

Because the loop exits as soon as the download completes, the wait is only as long as needed. For example, if the timeout is set to 10 seconds and the download actually completes after 3 seconds, the subsequent processing starts about 3 to 4 seconds after the download starts.

If completion cannot be confirmed within the specified number of seconds, the loop does not exit normally and error handling runs instead.
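The same polling idea can be packaged as a reusable function. The sketch below is a variation, not the article's code: the name `wait_for_download` and the `poll_interval` parameter are hypothetical, it reports a timeout by raising `TimeoutError` instead of calling `sys.exit`, and it accepts any finished file in the folder rather than looking only at the first entry.

```python
import glob
import os
import time


def wait_for_download(download_dir, timeout_second=10, poll_interval=1.0):
    """Return the path of a finished download, or raise TimeoutError."""
    # glob.escape protects folder names that contain characters such as '['
    pattern = os.path.join(glob.escape(download_dir), '*.*')
    deadline = time.monotonic() + timeout_second
    while True:
        files = glob.glob(pattern)
        # Files that no longer end with .crdownload are treated as complete
        finished = [f for f in files if not f.endswith('.crdownload')]
        if finished:
            return finished[0]
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f'download did not finish within {timeout_second} seconds')
        time.sleep(poll_interval)
```

With Selenium, this would be called right after the click that starts the download, passing the temporary download folder, with `driver.quit()` and the folder cleanup placed in an `except TimeoutError:` block.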

Extension monitoring

To monitor the extension, glob.glob first gets the list of files in the temporary folder.

# Get the file list
download_fileName = glob.glob(f'{tmp_download_dir}\\*.*')

The following list is returned depending on the download status. (Only file names are shown here; the actual entries are full paths.)

■ **The download has not started even after clicking the download button**
No file exists in the folder: **[] (empty list)**

■ **Downloading**
test.csv.crdownload has been created: **[test.csv.crdownload]**

■ **Download completed**
".crdownload" has been removed: **[test.csv]**

Based on this list, the wait loop exits, treating the download as complete, **when the list is not empty (a file exists) and the extension is not ".crdownload"**.
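The three cases can be summarized in a small classifier. This is an illustrative sketch; the function name `download_state` and the returned labels are hypothetical and are not part of the article's code.

```python
def download_state(file_names):
    """Classify the contents of the temporary folder into one of three states."""
    if not file_names:
        # Nothing in the folder yet: the download has not started
        return 'not started'
    if any(name.endswith('.crdownload') for name in file_names):
        # At least one file still carries Chrome's in-progress extension
        return 'downloading'
    return 'completed'


print(download_state([]))                       # not started
print(download_state(['test.csv.crdownload']))  # downloading
print(download_state(['test.csv']))             # completed
```

Note that the pattern `*.*` used with glob.glob only matches names that contain a dot, so a download without any extension would not appear in the list at all.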

os.path.splitext is used to extract only the extension.

# Extract the extension
extension = os.path.splitext(download_fileName[0])

When the file name is "test.csv.crdownload":
extension[0]: **the part before the extension [test.csv]**
extension[1]: **the extension [.crdownload]**

When the file name is "test.csv":
extension[0]: **the part before the extension [test]**
extension[1]: **the extension [.csv]**

The code then checks whether extension[1], which holds the extension, is '.crdownload'.

# If the extension is not .crdownload, the download is complete, so exit the loop
if '.crdownload' not in extension[1] : break
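To see the behavior of os.path.splitext directly, the following snippet prints the split result for a few names. The name "archive.tar.gz" is an extra example added here to show that only the last extension is separated.

```python
import os

results = {}
for name in ('test.csv.crdownload', 'test.csv', 'archive.tar.gz'):
    # splitext returns a (root, extension) tuple, split at the last dot
    root, ext = os.path.splitext(name)
    results[name] = (root, ext)
    print(f'{name}: root={root}, ext={ext}')

# test.csv.crdownload: root=test.csv, ext=.crdownload
# test.csv: root=test, ext=.csv
# archive.tar.gz: root=archive.tar, ext=.gz
```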

Complete

The script below puts the steps above together, with some adjustments, and finishes by moving the file from the temporary download folder to the regular download folder. Just change the wait timeout when using it!

`python`


from selenium import webdriver

import os
import sys
import glob
import shutil
import time

# Get the current working directory
current_dir = os.getcwd()

# Path of the temporary download folder
tmp_download_dir = f'{current_dir}\\tmpDownload'

# Delete the temporary folder if it exists (one may remain from a previous run)
if os.path.isdir(tmp_download_dir):
    shutil.rmtree(tmp_download_dir)

# Create the temporary download folder
os.mkdir(tmp_download_dir)

# Change the download destination through the Chrome options
options = webdriver.ChromeOptions()
prefs = {'download.default_directory': tmp_download_dir}
options.add_experimental_option('prefs', prefs)

# Path to ChromeDriver
driver_path = 'webdriver\\chromedriver.exe'

# Apply the options and launch Chrome
# ("options" replaces the deprecated "chrome_options" argument in Selenium 3.141.0)
driver = webdriver.Chrome(executable_path=driver_path, options=options)

# === Page navigation ===
# driver.get('https://xxxxxxx.co.jp/')

# Click the link to start the download
# driver.find_element_by_xpath('//*[@id="download"]').click()

# Wait timeout in seconds
timeout_second = 10

# Check once per second until the timeout
for i in range(timeout_second + 1):
    # Get the list of files in the temporary folder
    download_fileName = glob.glob(f'{tmp_download_dir}\\*.*')

    # If a file exists
    if download_fileName:
        # Split the path into (name, extension)
        extension = os.path.splitext(download_fileName[0])

        # An extension other than '.crdownload' means the download is complete
        if '.crdownload' not in extension[1]:
            break

    # Timed out without finding a file other than .crdownload
    if i >= timeout_second:
        # == Write error handling here ==
        # Close Chrome
        driver.quit()
        # Delete the temporary folder
        shutil.rmtree(tmp_download_dir)
        # Exit with a non-zero status to signal the failure
        sys.exit(1)

    # Wait one second
    time.sleep(1)

# === Post-processing after the download completes ===
# Close Chrome
driver.quit()

# Move the file to the regular download folder
# (create the folder first; otherwise the file would be renamed to "Download")
download_dir = f'{current_dir}\\Download'
os.makedirs(download_dir, exist_ok=True)
shutil.move(download_fileName[0], download_dir)

# Delete the temporary folder
shutil.rmtree(tmp_download_dir)

However, if a file with the same name already exists in the regular download folder, moving the file as it is raises an error, so rename the file or otherwise handle this case depending on your situation.

download_fileName[0] is the full path of the downloaded file, so rename it as you like.
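One possible way to handle the name collision is to add a counter to the file name before moving. The sketch below is only one option under stated assumptions: the function name `move_with_unique_name` and the " (n)" suffix format are hypothetical choices, not something the article prescribes.

```python
import os
import shutil


def move_with_unique_name(src_path, dest_dir):
    """Move src_path into dest_dir, adding ' (n)' to the name on collision."""
    os.makedirs(dest_dir, exist_ok=True)
    root, ext = os.path.splitext(os.path.basename(src_path))
    candidate = os.path.join(dest_dir, root + ext)
    counter = 1
    # Try "name (1).ext", "name (2).ext", ... until the name is free
    while os.path.exists(candidate):
        candidate = os.path.join(dest_dir, f'{root} ({counter}){ext}')
        counter += 1
    # shutil.move returns the final destination path
    return shutil.move(src_path, candidate)
```

In the script above, this would replace the plain `shutil.move` call, for example `move_with_unique_name(download_fileName[0], f'{current_dir}\\Download')`.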

In conclusion

Thank you for visiting.

If you know a better way or have any suggestions, I would appreciate it if you could leave a comment.


Python Selenium Dynamic download wait
