Bulk download images from specific URLs with python

Introduction

In my previous post, "Download images from a specific site URL in bulk with python", an image referenced with a relative path such as

<img src="../sample.png">

could not be downloaded and the program stopped with an ERROR. This time I fixed that problem. Note, however, that the fix basically assumes the page is one directory below the site root, i.e. its URL has the form http://domain name/directory/page, so that ../ points at http://domain name/. (I'm pretty bad at explaining this, sorry (^^;))
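To show what this assumption means, here is a small Python 3 sketch, separate from the program in this post. It compares replacing .. with the domain root against urllib.parse.urljoin from the standard library, which resolves a relative path against the page URL the way a browser does. The helper replace_dotdot and the sample.com URLs are made up for illustration.

```python
from urllib.parse import urljoin, urlsplit


def replace_dotdot(page_url, src):
    """Hypothetical helper mimicking this post's idea:
    replace the first '..' in src with 'scheme://domain' of the page."""
    parts = urlsplit(page_url)
    root = parts.scheme + "://" + parts.netloc
    return src.replace("..", root, 1)


pages = [
    "http://sample.com/blog/page.html",       # one directory below the root
    "http://sample.com/blog/2015/page.html",  # two directories below the root
]

for page in pages:
    print(page)
    print("  replace:", replace_dotdot(page, "../sample.png"))
    print("  urljoin:", urljoin(page, "../sample.png"))
```

For the first page both approaches give http://sample.com/sample.png. For the second page urljoin gives http://sample.com/blog/sample.png, while the replacement still gives http://sample.com/sample.png, which is the case the assumption above does not cover.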

Added code

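The new code is a function called get_url_root. It takes the URL entered by the user as its argument and returns the URL up to the domain name (for example, http://sample.com for http://sample.com/blog/page.html). It handles both http:// and https:// URLs. A return value of 0 means ERROR.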

python


def get_url_root(url):
    # Returns "scheme://domain name" (e.g. "http://sample.com"), or 0 on ERROR.
    for scheme in ("http://", "https://"):
        if url.startswith(scheme):
            # Cut the scheme off by slicing; str.lstrip() removes a set of
            # characters, not a prefix, so it can eat the start of the domain.
            url_without_scheme = url[len(scheme):]
            if "/" in url_without_scheme:
                return scheme + url_without_scheme[0:url_without_scheme.find("/")]
            # No path after the domain name: the whole URL is already the root
            return url
    return 0
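As an alternative, Python 3's urllib.parse.urlsplit already splits a URL into its scheme and domain name (netloc), so the root can be built without any string searching. A minimal sketch that keeps this post's convention of returning 0 on ERROR; the name get_url_root_split is hypothetical.

```python
from urllib.parse import urlsplit


def get_url_root_split(url):
    """Return 'scheme://domain' for an absolute http/https URL, or 0 on ERROR."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        return 0  # relative path or unsupported scheme
    return parts.scheme + "://" + parts.netloc
```

For example, get_url_root_split("https://sample.com/a/b") returns "https://sample.com". Because netloc is used as-is, a port number such as sample.com:8080 stays in the result.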

Edited code

Finally, I edited the part that actually downloads the images. If an image URL contains "../", the function above is used to get the page URL up to the domain name, and the .. is replaced with it. For example, ../sample.png becomes http://sample.com/sample.png.

python


for j in range(0,(len_url-1)):
    url = number_url[j]
    if("../" in url):
        root_url = get_url_root(serch_url)
        if(root_url!=0):
            # Replace only the first ".." with "http://domain name"
            url = url.replace("..",root_url,1)
            print(url)
            download(url)
    else:
        download(url)
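The loop above only rewrites URLs that contain "../"; other relative forms such as /img/sample.png or sample.png are passed to download() unchanged and will fail. One way to cover every case is to resolve each collected src against the page URL with urllib.parse.urljoin before downloading. A Python 3 sketch; the function name resolve_image_urls is hypothetical, and skipping empty entries and duplicates is my own addition.

```python
from urllib.parse import urljoin


def resolve_image_urls(page_url, srcs):
    """Turn every collected src into an absolute URL, keeping the
    original order and skipping empty entries and duplicates."""
    resolved = []
    seen = set()
    for src in srcs:
        src = src.strip()
        if not src:
            continue  # e.g. the empty string left after the trailing tab
        url = urljoin(page_url, src)  # absolute URLs are returned unchanged
        if url not in seen:
            seen.add(url)
            resolved.append(url)
    return resolved
```

With page http://sample.com/blog/page.html and the srcs "../sample.png", "/img/a.png", "b.png", "http://cdn.example.com/c.png", "../sample.png" and "", the result is http://sample.com/sample.png, http://sample.com/img/a.png, http://sample.com/blog/b.png and http://cdn.example.com/c.png.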

Full code

getimage.py


# -*- coding: utf-8 -*- 

import urllib
import urllib2
import os.path
import sys
from HTMLParser import HTMLParser

def download(url):
    img = urllib.urlopen(url)
    localfile = open(os.path.basename(url),'wb')
    localfile.write(img.read())
    img.close()
    localfile.close()

def get_url_root(url):
    # Returns "scheme://domain name" (e.g. "http://sample.com"), or 0 on ERROR.
    for scheme in ("http://", "https://"):
        if url.startswith(scheme):
            # Cut the scheme off by slicing; str.lstrip() removes a set of
            # characters, not a prefix, so it can eat the start of the domain.
            url_without_scheme = url[len(scheme):]
            if "/" in url_without_scheme:
                return scheme + url_without_scheme[0:url_without_scheme.find("/")]
            # No path after the domain name: the whole URL is already the root
            return url
    return 0

class imgParser(HTMLParser):

    def __init__(self):
        HTMLParser.__init__(self)

    def handle_starttag(self,tagname,attribute):
        if tagname.lower() == "img":
            for i in attribute:
                if i[0].lower() == "src":
                    img_url=i[1]
                    #Creating a file that collects the URLs of the acquired photos
                    f = open("collection_url.txt","a")
                    f.write("%s\t"%img_url)
                    f.close()

if __name__ == "__main__":

    print('Enter the URL of the site where you want to get the photo.')
    input_url = raw_input('>>>  ')
    serch_url = input_url
    htmldata = urllib2.urlopen(serch_url)

    print('Currently getting image files...')

    parser = imgParser()
    parser.feed(htmldata.read())

    parser.close()
    htmldata.close()

    #Read the generated file
    f = open("collection_url.txt","r")
    for row in f:
        row_url = row.split('\t')
        len_url = len(row_url)
    f.close()

    number_url = []

    for i in range(0,(len_url-1)):
        number_url.append(row_url[i])

    for j in range(0,(len_url-1)):
        url = number_url[j]
        if("../" in url):
            root_url = get_url_root(serch_url)
            if(root_url!=0):
                # Replace only the first ".." with "http://domain name"
                url = url.replace("..",root_url,1)
                print(url)
                download(url)
        else:
            download(url)

    print('The image download is complete.')

    #Delete file
    os.remove("collection_url.txt")
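The program above is written for Python 2 (urllib2, raw_input, the HTMLParser module). For reference, here is a Python 3 sketch of the same flow under a few assumptions of mine: the src values are kept in a list on the parser instead of in collection_url.txt, every src is resolved with urllib.parse.urljoin, and each image is saved under its base file name. The names ImgSrcCollector, download and main are my own, and there is no error handling.

```python
import os
import shutil
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit
from urllib.request import urlopen


class ImgSrcCollector(HTMLParser):
    """Collect the src attribute of every <img> tag in a list."""

    def __init__(self):
        super().__init__()
        self.srcs = []

    def handle_starttag(self, tag, attrs):
        # html.parser already lower-cases tag and attribute names
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value:
                    self.srcs.append(value)


def download(url, directory="."):
    """Save url under its base file name in directory and return the path."""
    # "index" is an arbitrary fallback for URLs whose path ends in "/"
    filename = os.path.basename(urlsplit(url).path) or "index"
    path = os.path.join(directory, filename)
    with urlopen(url) as response, open(path, "wb") as localfile:
        shutil.copyfileobj(response, localfile)
    return path


def main():
    page_url = input("Enter the URL of the site where you want to get the photo.\n>>> ")
    with urlopen(page_url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        html = response.read().decode(charset, errors="replace")
    parser = ImgSrcCollector()
    parser.feed(html)
    parser.close()
    for src in parser.srcs:
        url = urljoin(page_url, src)  # handles "../", "/" and plain names
        print(url)
        download(url)
    print("The image download is complete.")
```

Calling main() runs the same interactive steps as getimage.py. Keeping the URLs in memory also removes the need to create and delete collection_url.txt.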
