When you download a file with Selenium, you have to wait for the download to complete. Two problems are common:
- **The download has not finished, so an error occurs when the script tries to handle the file.**
- **The download normally takes a few seconds, but the script waits a generous fixed time such as 15 seconds, so time is wasted.**

So I wrote a wait routine with the following properties:
- **Subsequent processing starts as soon as the download completes.**
- **The wait timeout can be set in 1-second increments.**
- **Error handling on timeout is possible.**
Environment: Windows 10, Python 3.8.3, Selenium 3.141.0
Browser: Google Chrome, ChromeDriver 83.0.4103.39
This implementation relies on the behavior of Google Chrome (as of 2020/07/14). While a download is in progress, Chrome saves the file with ".crdownload" appended to its name (for example, test.csv.crdownload), and removes ".crdownload" when the download completes.
So the extension of the downloaded file tells us the state:
- **".crdownload": downloading**
- **Anything other than ".crdownload": download completed**

The wait routine therefore monitors whether a file with the ".crdownload" extension exists.
When you start Chrome with Selenium on Windows, the download destination is the "C:\Users\username\Downloads" folder. If you use this default folder as it is, it also contains files downloaded in the past, which makes it difficult to monitor the extension. So create a temporary download folder that holds only the file downloaded this time.
The temporary folder is created in the working folder, that is, the project folder containing this Python file.
python
import os

# Get the current directory
current_dir = os.getcwd()
# Path of the temporary download folder
tmp_download_dir = f'{current_dir}\\tmpDownload'
# Create the temporary folder
os.mkdir(tmp_download_dir)
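Note that os.mkdir raises FileExistsError if the folder is left over from a previous run. One possible alternative, shown here only as a sketch, is to let the standard library create a uniquely named folder with tempfile.mkdtemp. The prefix 'tmpDownload_' is an arbitrary choice for this example, not something the article prescribes.

```python
import os
import shutil
import tempfile

# Create a uniquely named temporary folder under the current directory
tmp_download_dir = tempfile.mkdtemp(prefix='tmpDownload_', dir=os.getcwd())
created = os.path.isdir(tmp_download_dir)
print(created)  # True

# ... download into tmp_download_dir here ...

# Remove the folder and everything in it when finished
shutil.rmtree(tmp_download_dir)
removed = not os.path.isdir(tmp_download_dir)
print(removed)  # True
```

Because the folder name is unique on every run, there is no need to delete a stale folder first.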
When specifying the download folder for Chrome on Windows, use the backslash "\" as the path delimiter instead of the slash "/".
In this article, each backslash in a path is escaped by writing it twice, as "\\".
Related article: [For beginners] Unexpected behavior if "\" is included when setting the path in Python
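To illustrate why the escaping matters, the sketch below writes the same Windows path in three ways and then shows what happens without escaping. The path C:\temp\new is a placeholder, and ntpath (the Windows flavor of os.path) is used only so that the result is the same on any OS.

```python
import ntpath  # Windows implementation of os.path; importable on any OS

escaped = 'C:\\temp\\new'               # each backslash escaped
raw = r'C:\temp\new'                    # raw string: backslashes kept as-is
joined = ntpath.join('C:\\temp', 'new')  # separator inserted for us

all_equal = escaped == raw == joined
print(all_equal)  # True

# Without escaping, '\t' becomes a tab and '\n' becomes a newline
broken = 'C:\temp\new'
print('\t' in broken, '\n' in broken)  # True True
```

Any of the first three forms gives the same string, so which one to use is mostly a matter of taste.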
Next, change the download destination to the temporary download folder created in the previous section. This is done through a Chrome option.
python
from selenium import webdriver

# Change the download destination through the Chrome options
options = webdriver.ChromeOptions()
prefs = {'download.default_directory': tmp_download_dir}
options.add_experimental_option('prefs', prefs)
# Path to ChromeDriver
driver_path = 'webdriver\\chromedriver.exe'
# Apply the options and launch Chrome
# (the chrome_options argument is deprecated in Selenium 3.141.0; options replaces it)
driver = webdriver.Chrome(executable_path=driver_path, options=options)
Now, when Chrome starts, its download folder is set to the temporary download folder and downloaded files are saved there.
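If the same settings are needed in several scripts, the preference dictionary can be built by a small function. This is only a sketch: build_download_prefs is a hypothetical helper, and the 'download.prompt_for_download' entry is an extra Chrome preference that does not appear in this article, added on the assumption that you want to suppress the save dialog.

```python
def build_download_prefs(download_dir):
    """Return Chrome preferences that send downloads to download_dir."""
    return {
        # Same key as in the article: where Chrome saves downloaded files
        'download.default_directory': download_dir,
        # Assumption (not in the article): do not ask where to save each file
        'download.prompt_for_download': False,
    }


prefs = build_download_prefs('C:\\work\\tmpDownload')
print(prefs['download.default_directory'])  # C:\work\tmpDownload

# With Selenium installed, the dictionary would be passed on like this:
#   options = webdriver.ChromeOptions()
#   options.add_experimental_option('prefs', prefs)
```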
After starting the download with Selenium, set a wait timeout and wait until the download completes.
python
from selenium import webdriver
import os
import sys
import glob
import shutil
import time

# Start the download with Selenium here (click the download link, etc.)

# Wait timeout in seconds
timeout_second = 10

# Poll once per second until the timeout is reached
for i in range(timeout_second + 1):
    # Get the list of files in the temporary folder
    download_fileName = glob.glob(f'{tmp_download_dir}\\*.*')
    # If a file exists
    if download_fileName:
        # Split the name into (root, extension)
        extension = os.path.splitext(download_fileName[0])
        # If the extension is not '.crdownload', the download is complete: stop waiting
        if '.crdownload' not in extension[1]:
            break
    # Timed out without finding a file other than '.crdownload'
    if i >= timeout_second:
        # == Describe error handling here ==
        # Close Chrome
        driver.quit()
        # Delete the temporary folder
        shutil.rmtree(tmp_download_dir)
        sys.exit()
    # Wait one second
    time.sleep(1)

# Processing after the download is complete (move the file to the regular download folder, etc.)
Roughly speaking, the code loops for the specified number of seconds and does the following on each pass: **"check whether the download is complete; if it is, exit the wait loop; if not, wait 1 second"**.
Because the loop exits as soon as the download is complete, a timeout of 10 seconds does not mean a 10-second wait. For example, if the download actually finishes after 3 seconds, the subsequent processing starts 3 to 4 seconds after the download began.
If completion cannot be confirmed within the specified number of seconds, the loop does not exit normally and the error handling runs instead.
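The same polling idea can also be packaged as a reusable function. The following is a minimal sketch, not the article's exact code: wait_for_download is a hypothetical name, it raises TimeoutError instead of calling sys.exit() so the caller decides how to handle the failure, it uses the pattern '*' so that files without an extension are also seen, and it checks every file in the folder rather than only the first one.

```python
import glob
import os
import shutil
import tempfile
import time


def wait_for_download(download_dir, timeout_second=10, interval=1):
    """Return the path of the finished file, or raise TimeoutError."""
    for i in range(timeout_second + 1):
        # List every entry in the folder (it should hold only this download)
        files = glob.glob(os.path.join(download_dir, '*'))
        # Complete when a file exists and none still ends with .crdownload
        if files and not any(f.endswith('.crdownload') for f in files):
            return files[0]
        # Sleep between checks, but not after the final check
        if i < timeout_second:
            time.sleep(interval)
    raise TimeoutError(f'No finished download in {download_dir} '
                       f'after {timeout_second} seconds')


# Demo with a stand-in file instead of a real Chrome download
demo_dir = tempfile.mkdtemp()
open(os.path.join(demo_dir, 'test.csv'), 'w').close()
finished = wait_for_download(demo_dir, timeout_second=2)
print(os.path.basename(finished))  # test.csv
shutil.rmtree(demo_dir)
```

In a real script the call would look like `path = wait_for_download(tmp_download_dir, timeout_second=10)`, wrapped in try/except TimeoutError for the error handling.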
To monitor the extension, glob.glob first gets a list of files in the temporary folder.
#Get file list
download_fileName = glob.glob(f'{tmp_download_dir}\\*.*')
The following list will be returned depending on the download status.
■ **The download has not started even after the download button was clicked**: no file exists in the folder, so the result is **[] (an empty list)**
■ **Downloading**: test.csv.crdownload has been created, so the result is **[test.csv.crdownload]**
■ **Download completed**: ".crdownload" has been removed, so the result is **[test.csv]**
Based on this list, the code treats the download as complete and exits the wait loop **when the list is not empty (a file exists) and the extension is not ".crdownload"**.
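The three cases above can be expressed as a small classification function. This is only an illustrative sketch; download_state and its return strings are made up for this example.

```python
def download_state(file_list):
    """Classify the result of glob.glob on the temporary folder."""
    if not file_list:
        # Empty list: Chrome has not created any file yet
        return 'not started'
    if any(name.endswith('.crdownload') for name in file_list):
        # A .crdownload file is still present
        return 'downloading'
    # A file exists and it is not a .crdownload file
    return 'completed'


print(download_state([]))                       # not started
print(download_state(['test.csv.crdownload']))  # downloading
print(download_state(['test.csv']))             # completed
```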
os.path.splitext is used to extract only the extension.
#Extraction of extension
extension = os.path.splitext(download_fileName[0])
When the file name is "test.csv.crdownload":
- extension[0] holds the file name before the extension: **test.csv**
- extension[1] holds the extension: **.crdownload**

When the file name is "test.csv":
- extension[0] holds the file name before the extension: **test**
- extension[1] holds the extension: **.csv**
The code then checks whether extension[1], which holds the extension, is '.crdownload'.
# If the extension is not '.crdownload', the download is complete: exit the wait loop
if '.crdownload' not in extension[1] : break
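The behavior of os.path.splitext described above can be checked directly in an interpreter. The short sketch below splits both file names and applies an equality test on the extension, which is slightly stricter than the substring test (`not in`) used in the article.

```python
import os

results = {}
for name in ('test.csv.crdownload', 'test.csv'):
    # splitext splits at the last dot only
    root, ext = os.path.splitext(name)
    finished = ext != '.crdownload'
    results[name] = (root, ext, finished)
    print(name, '->', root, ext, finished)

# test.csv.crdownload -> test.csv .crdownload False
# test.csv -> test .csv True
```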
The complete script below puts the steps above together, with a few adjustments, and finishes by moving the file from the temporary download folder to the regular download folder. To use it, just change the wait timeout!
python
from selenium import webdriver
import os
import sys
import glob
import shutil
import time

# Get the current directory
current_dir = os.getcwd()
# Path of the temporary download folder
tmp_download_dir = f'{current_dir}\\tmpDownload'
# Delete the temporary folder if it exists (one may remain from a previous run)
if os.path.isdir(tmp_download_dir):
    shutil.rmtree(tmp_download_dir)
# Create the temporary download folder
os.mkdir(tmp_download_dir)

# Change the download destination through the Chrome options
options = webdriver.ChromeOptions()
prefs = {'download.default_directory': tmp_download_dir}
options.add_experimental_option('prefs', prefs)
# Path to ChromeDriver
driver_path = 'webdriver\\chromedriver.exe'
# Apply the options and launch Chrome
# (the chrome_options argument is deprecated in Selenium 3.141.0; options replaces it)
driver = webdriver.Chrome(executable_path=driver_path, options=options)

# === Screen transition ===
# driver.get('https://xxxxxxx.co.jp/')
# Click the link to start downloading
# driver.find_element_by_xpath('//*[@id="download"]').click()

# Wait timeout in seconds
timeout_second = 10

# Poll once per second until the timeout is reached
for i in range(timeout_second + 1):
    # Get the list of files in the temporary folder
    download_fileName = glob.glob(f'{tmp_download_dir}\\*.*')
    # If a file exists
    if download_fileName:
        # Split the name into (root, extension)
        extension = os.path.splitext(download_fileName[0])
        # If the extension is not '.crdownload', the download is complete: stop waiting
        if '.crdownload' not in extension[1]:
            break
    # Timed out without finding a file other than '.crdownload'
    if i >= timeout_second:
        # == Describe error handling here ==
        # Close Chrome
        driver.quit()
        # Delete the temporary folder
        shutil.rmtree(tmp_download_dir)
        sys.exit()
    # Wait one second
    time.sleep(1)

# === Post-processing after download completion ===
# Close Chrome
driver.quit()
# Move the file to the regular download folder
# (created first if missing; otherwise shutil.move would rename the file to "Download")
download_dir = f'{current_dir}\\Download'
os.makedirs(download_dir, exist_ok=True)
shutil.move(download_fileName[0], download_dir)
# Delete the temporary folder
shutil.rmtree(tmp_download_dir)
Note that moving the file into the regular download folder as it is raises an error if a file with the same name already exists there, so rename the file as the situation requires. download_fileName[0] is the full path of the downloaded file, so you can rename it as you like.
Thank you for reading. If you know a better way or have any suggestions, I would appreciate a comment.