Download images from URL list in Python

In the articles below, I crawled a specific website and ended up with a list of URLs, so I wrote a script to download the files they point to.

WEB scraping with BeautifulSoup4 (serial number page)

WEB scraping with BeautifulSoup4 (layered page)

Source

simple_downloader.py


# -*- coding: utf-8 -*-

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import os

import requests

headers = { 'User-Agent' : 'Mozilla/5.0' }
cwd = os.getcwd()
result_dir = cwd + '/download/'
list_file = cwd + '/list.txt'
done_file = 'done.txt'
fail_file = 'fail.txt'

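def fetchImage(url):
    # Mirror the URL's host and path under result_dir (scheme stripped).
    path_relative = url.replace('http://', '').replace('https://', '')
    path_local = os.path.join(result_dir, path_relative)
    try:
        # The 30-second timeout is an arbitrary choice; without one a stalled server blocks the loop.
        res = requests.get(url, headers = headers, timeout = 30)
        # Treat HTTP errors (404, 500, ...) as failures instead of saving the error page.
        res.raise_for_status()
        # Create result_dir and every intermediate directory in one call.
        dir_local = os.path.dirname(path_local)
        if not os.path.isdir(dir_local):
            os.makedirs(dir_local)
        with open(path_local, 'wb') as f:
            f.write(res.content)
    except (requests.exceptions.RequestException, IOError, OSError):
        # Network and file-system errors are reported to the caller as a failed download.
        return False
    return True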

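def getUrl():
    # Pop the first URL off list_file and write the remainder back,
    # so an interrupted run can resume where it stopped.
    with open(list_file, 'r') as f:
        url_list = f.read().split('\n')
    # strip() removes stray whitespace such as '\r' from Windows line endings.
    # An empty string means the list is exhausted.
    result = url_list.pop(0).strip()
    with open(list_file, 'w') as f:
        f.write('\n'.join(url_list))
    return result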

def saveUrl(file_name, url):
    with open(file_name, 'a') as f:
        f.write(url + '\n')

def download():
    url = getUrl()
    while url != '':
        if fetchImage(url):
            saveUrl(done_file, url)
            print('done ' + url)
        else:
            saveUrl(fail_file, url)
            print('fail ' + url)
        url = getUrl()

if __name__ == '__main__':
    download()
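
The script builds the local path by stripping the scheme from the URL, so a query string ends up in the file name and a URL ending in `/` has no file name at all. The following is a minimal sketch of one alternative mapping, not part of the original script: `local_path_for`, its `base_dir` parameter, and the fallback name `index` are hypothetical names chosen for illustration. It relies only on the standard library's `urlparse`.

```python
import os

try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse      # Python 2


def local_path_for(url, base_dir='download'):
    """Return a local file path that mirrors url under base_dir.

    Hypothetical helper for illustration; not part of simple_downloader.py.
    """
    parsed = urlparse(url)
    # Keep only real path segments: drop empty ones and '.'/'..',
    # so the result cannot climb out of base_dir.
    segments = [s for s in parsed.path.split('/') if s and s not in ('.', '..')]
    # A URL that ends in '/' (or has no path) names no file.
    # Falling back to 'index' is an assumption made for this sketch.
    if not segments or parsed.path.endswith('/'):
        segments.append('index')
    # parsed.path excludes the query string, so '?size=large' is dropped.
    return os.path.join(base_dir, parsed.netloc, *segments)


print(local_path_for('https://example.com/img/cat.png?size=large'))
print(local_path_for('http://example.com/gallery/'))
```

Dropping the query string keeps file names clean, but two URLs that differ only in their query would then map to the same file, so this trade-off may not suit every site.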

References

Batch download of images from specific URLs with python Modified version
[Python] File / Directory Manipulation
