Get the size of an image file on the web (Python 3, no additional libraries required)

When collecting images from the web with a crawler or a similar tool, you sometimes want only an image's size (its width and height in pixels) without downloading the entire file. GIF, JPEG and PNG all store the dimensions at or near the start of the file, so reading just the header gives you the information without downloading everything.

The source code is below. I tested it with Python 3.4.2 on OS X 10.10 (Yosemite). It supports the GIF, JPEG and PNG formats and needs no additional libraries such as OpenCV.

import struct
import urllib.request

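def parse_jpeg(res):
    # Start-of-frame markers: 0xC0-0xCF except 0xC4 (DHT), 0xC8 (JPG), 0xCC (DAC).
    sof_markers = set(range(0xc0, 0xd0)) - {0xc4, 0xc8, 0xcc}
    while True:
        header = res.read(4)
        if len(header) < 4 or header[0] != 0xff:
            return (-1, -1)  # truncated data or not positioned on a marker
        (marker, size) = struct.unpack('>xBH', header)
        if marker in sof_markers:
            # Frame header: precision (1 byte), height (2), width (2), ...
            (_, height, width) = struct.unpack('>BHH', res.read(5))
            return (width, height)
        res.read(size - 2)  # skip the rest of this segment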

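def parse_png(res):
    # Skip the remaining 6 signature bytes, the IHDR chunk length (4) and type (4),
    # then read width and height as big-endian 32-bit unsigned integers.
    (_,width,height) = struct.unpack(">14sII", res.read(22))
    return (width, height)

def parse_gif(res):
    # Skip the remaining 4 bytes of "GIF87a"/"GIF89a", then read width and height
    # as little-endian 16-bit unsigned integers.
    (_,width,height) = struct.unpack("<4sHH", res.read(8))
    return (width, height)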

def get_image_size(url):
    size = (-1, -1)
    # The context manager closes the connection even if parsing raises.
    with urllib.request.urlopen(url) as res:
        if res.status == 200:
            signature = res.read(2)
            if signature == b'\xff\xd8':  # JPEG
                size = parse_jpeg(res)
            elif signature == b'\x89\x50':  # PNG
                size = parse_png(res)
            elif signature == b'\x47\x49':  # GIF
                size = parse_gif(res)
    return size
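
To see which bytes the parsers actually look at, here is a minimal offline sketch. It builds tiny headers in memory and reads them back with the same struct formats as above. The helper names (png_header, gif_header, jpeg_header, sniff_size) are made up for this illustration and are not part of the code above, and the generated data is only a header, not a valid image.

```python
import io
import struct

def png_header(width, height):
    # 8-byte signature, IHDR chunk length (13), chunk type, then width and height.
    return b'\x89PNG\r\n\x1a\n' + struct.pack('>I4sII', 13, b'IHDR', width, height)

def gif_header(width, height):
    # 6-byte signature and version, then little-endian 16-bit width and height.
    return b'GIF89a' + struct.pack('<HH', width, height)

def jpeg_header(width, height):
    # SOI, one APP0 segment that must be skipped, then a baseline SOF0 segment.
    app0 = b'\xff\xe0' + struct.pack('>H', 16) + b'JFIF\x00' + bytes(9)
    sof0 = b'\xff\xc0' + struct.pack('>HBHHB', 17, 8, height, width, 3) + bytes(9)
    return b'\xff\xd8' + app0 + sof0

def sniff_size(data):
    # Same idea as get_image_size, but reading from bytes instead of a URL.
    stream = io.BytesIO(data)
    signature = stream.read(2)
    if signature == b'\x89\x50':  # PNG
        return struct.unpack('>14sII', stream.read(22))[1:]
    if signature == b'\x47\x49':  # GIF
        return struct.unpack('<4sHH', stream.read(8))[1:]
    if signature == b'\xff\xd8':  # JPEG (baseline SOF0 only in this sketch)
        while True:
            marker, size = struct.unpack('>2sH', stream.read(4))
            if marker == b'\xff\xc0':
                _, height, width = struct.unpack('>BHH', stream.read(5))
                return (width, height)
            stream.read(size - 2)
    return (-1, -1)

print(sniff_size(png_header(640, 480)))
print(sniff_size(gif_header(320, 200)))
print(sniff_size(jpeg_header(1024, 768)))
```

PNG needs the first 24 bytes and GIF the first 10. JPEG depends on how many segments come before the frame header, so the number of bytes read varies from file to file.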

Add error handling as appropriate for your use case.
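
As one possible starting point, the sketch below wraps a size lookup and maps the usual failures to a default value. The name safe_image_size and its getter parameter are assumptions made for this example. Pass in get_image_size, or any function that takes a URL and returns (width, height).

```python
import struct
import urllib.error

def safe_image_size(url, getter, default=(-1, -1)):
    """Call getter(url) and return default if the lookup fails."""
    try:
        size = getter(url)
    except urllib.error.URLError:
        # Network problems and HTTP errors (HTTPError is a subclass of URLError).
        return default
    except struct.error:
        # The response ended before the header could be unpacked.
        return default
    except (ValueError, OSError):
        # Malformed URL, or a lower-level I/O failure such as a timeout.
        return default
    return size if size is not None else default
```

Which exceptions to swallow and which to let through is a judgment call. A crawler may prefer to log the failure and move on, while a one-off script may be better off letting the exception propagate.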

Reference material

Darkside Communication Group "File Format Encyclopedia" (ISBN4-87310-064-X)
