Faster loading of Python images

In image processing and image-based deep learning, reading images is a very frequent operation.

With a few hundred images this is not a problem, but at the scale of tens of thousands, reading alone takes several minutes. That may be acceptable if it happens only once, but when the same images are read many times over the course of experiments, faster loading pays off.

Here, we will compare several libraries and show you how to reduce the loading time.

Conclusion

- **Save the images with pickle or np.save** (the data size will increase)
  - Use a newer pickle protocol
  - Save as np.uint8
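The conversion recommended in this conclusion can be sketched as follows. This is a minimal sketch, not the code used for the measurements: `save_converted` is a hypothetical helper, the file names are placeholders, and a random array stands in for a decoded 512 x 512 image (3 channels assumed).

```python
import os
import pickle
import tempfile

import numpy as np


def save_converted(arr, npy_path, pickle_path, protocol=4):
    """Save a decoded image once as .npy and once as a pickle file."""
    # Assumes pixel values are already in the 0..255 range.
    arr = np.asarray(arr, dtype=np.uint8)  # 1 byte per channel
    np.save(npy_path, arr)
    with open(pickle_path, mode="wb") as f:
        pickle.dump(arr, f, protocol=protocol)
    return arr


# Stand-in for a decoded image (random pixels, not a real file).
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(512, 512, 3), dtype=np.uint8)

with tempfile.TemporaryDirectory() as tmp_dir:
    npy_path = os.path.join(tmp_dir, "sample.npy")
    pickle_path = os.path.join(tmp_dir, "sample.pickle")
    save_converted(img, npy_path, pickle_path)

    # Later runs only need these two fast loads.
    from_npy = np.load(npy_path)
    with open(pickle_path, mode="rb") as f:
        from_pickle = pickle.load(f)

print(from_npy.dtype, from_npy.shape)
```

In real use the array would come from one of the image libraries compared below, and the converted files would be kept on disk instead of in a temporary directory.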

Execution environment

Comparison of reading speed

The libraries used and the time each took to load one image (obtaining the data as a numpy.array) are as follows.

| Library | Load time |
| --- | --- |
| OpenCV | 4.23 ms |
| matplotlib | 4.37 ms |
| keras.preprocessing | 3.49 ms |
| skimage | 2.56 ms |
| PIL | 2.63 ms |
| numpy | 333 µs |
| pickle(protocol=1) | 597 µs |
| pickle(protocol=2) | 599 µs |
| pickle(protocol=3) | 112 µs |
| pickle(protocol=4) | 118 µs |
| _pickle(protocol=4) | 117 µs |

The image that was loaded is a 512 x 512 PNG file.

For numpy and pickle, the image had been saved in advance as .npy and .pickle files, and it is those files that were loaded, so this is not a fair comparison. **The table shows the speed you can get if you convert the images beforehand**; it does not show that numpy and pickle are faster in general.

pickle.dump accepts a protocol argument, and **newer protocols are said to be faster to read**, so the image data was saved once for each protocol.
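Saving one file per protocol might look like the following. This is a sketch under assumptions: the file names are made up for illustration, and a random uint8 array stands in for the decoded image. Protocols 1 to 4 match the ones measured here; newer Python versions also offer protocol 5, which is not covered.

```python
import os
import pickle
import tempfile

import numpy as np

# Stand-in for a decoded 512 x 512 image (3 channels assumed).
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(512, 512, 3), dtype=np.uint8)

file_sizes = {}
with tempfile.TemporaryDirectory() as tmp_dir:
    for protocol in (1, 2, 3, 4):
        path = os.path.join(tmp_dir, f"image_protocol{protocol}.pickle")

        # The protocol is chosen when writing ...
        with open(path, mode="wb") as f:
            pickle.dump(img, f, protocol=protocol)
        file_sizes[protocol] = os.path.getsize(path)

        # ... while pickle.load detects it automatically when reading.
        with open(path, mode="rb") as f:
            restored = pickle.load(f)
        assert np.array_equal(restored, img)

for protocol, size in file_sizes.items():
    print(f"protocol={protocol}: {size} bytes")
```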

There are also fast libraries such as accimage, but I did not use it because it does not support macOS.

HDF5 is another option, but I have not examined it.

The code used is as follows (run in a Jupyter notebook).

import cv2
import matplotlib.pyplot as plt
import pickle
import numpy as np
from keras.preprocessing import image
from PIL import Image
from skimage import io
import _pickle

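def imread1(path):
    # OpenCV: returns a uint8 array in BGR channel order
    return cv2.imread(path)

def imread2(path):
    # matplotlib: returns float32 values in [0, 1] for PNG files
    return plt.imread(path)

def imread3(path):
    # Keras: load as a PIL image, then convert to a float array
    img = image.load_img(path)
    return image.img_to_array(img)

def imread4(path):
    # scikit-image
    return io.imread(path)

def imread5(path):
    # PIL: open the file, then convert to a numpy array
    img = Image.open(path)
    return np.asarray(img)

def numpy_load(path):
    # Load an array saved beforehand with np.save
    return np.load(path)

def pickle_load(path):
    # Load an array saved beforehand with pickle.dump
    with open(path, mode='rb') as f:
        return pickle.load(f)

def _pickle_load(path):
    # Same as pickle_load, but calls the C implementation module directly
    with open(path, mode='rb') as f:
        return _pickle.load(f)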

# img_path, npy_path and pickle_path_1 to pickle_path_4 must point to files prepared in advance
%timeit img = imread1(img_path)
%timeit img = imread2(img_path)
%timeit img = imread3(img_path)
%timeit img = imread4(img_path)
%timeit img = imread5(img_path)
%timeit img = numpy_load(npy_path)
%timeit img = pickle_load(pickle_path_1)
%timeit img = pickle_load(pickle_path_2)
%timeit img = pickle_load(pickle_path_3)
%timeit img = pickle_load(pickle_path_4)
%timeit img = _pickle_load(pickle_path_4)
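Outside a notebook, where the `%timeit` magic is not available, a similar measurement can be made with the standard `timeit` module. The following is a rough sketch, not the measurement used for the table: `time_per_call` is a hypothetical helper, and an all-zero array stands in for the decoded image, so the absolute numbers will differ from those above.

```python
import os
import pickle
import tempfile
import timeit

import numpy as np


def numpy_load(path):
    return np.load(path)


def pickle_load(path):
    with open(path, mode="rb") as f:
        return pickle.load(f)


def time_per_call(func, path, number=50, repeat=5):
    """Return the best average time per call, in seconds."""
    totals = timeit.repeat(lambda: func(path), number=number, repeat=repeat)
    return min(totals) / number


img = np.zeros((512, 512, 3), dtype=np.uint8)  # stand-in for a decoded image

with tempfile.TemporaryDirectory() as tmp_dir:
    npy_path = os.path.join(tmp_dir, "image.npy")
    pickle_path = os.path.join(tmp_dir, "image.pickle")
    np.save(npy_path, img)
    with open(pickle_path, mode="wb") as f:
        pickle.dump(img, f, protocol=4)

    results = {
        "numpy": time_per_call(numpy_load, npy_path),
        "pickle(protocol=4)": time_per_call(pickle_load, pickle_path),
    }

for name, seconds in results.items():
    print(f"{name}: {seconds * 1e6:.0f} µs per load")
```

Taking the minimum over several repeats reduces the influence of other processes. Because the files were just written, they are likely served from the operating system's file cache, so these figures reflect warm reads.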

Data size comparison

The file sizes of a 512 x 512 PNG image after saving it with numpy and pickle are as follows.

| Library | Data type | Size |
| --- | --- | --- |
| raw data | - | 236 KB |
| numpy | np.uint8 | 820 KB |
| pickle(protocol=1) | np.uint8 | 820 KB |
| pickle(protocol=2) | np.uint8 | 820 KB |
| pickle(protocol=3) | np.uint8 | 787 KB |
| pickle(protocol=4) | np.uint8 | 787 KB |
| numpy | np.float32 | 3.1 MB |
| pickle(protocol=1) | np.float32 | 4.9 MB |
| pickle(protocol=2) | np.float32 | 4.8 MB |
| pickle(protocol=3) | np.float32 | 3.1 MB |
| pickle(protocol=4) | np.float32 | 3.1 MB |

Even as np.uint8, the saved data takes more than three times the space of the original file.
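The increase is expected, because PNG stores compressed pixels while .npy and pickle store them uncompressed. The arithmetic can be checked with a short sketch. The channel count of 3 is an assumption (the article does not state it), chosen because it is consistent with the 787 KB and 3.1 MB figures in the table.

```python
import io

import numpy as np

height, width, channels = 512, 512, 3  # channels=3 is an assumption

# Uncompressed pixel payload for each data type.
uint8_bytes = height * width * channels * np.dtype(np.uint8).itemsize
float32_bytes = height * width * channels * np.dtype(np.float32).itemsize

# Measure what np.save writes by saving into an in-memory buffer.
buffer = io.BytesIO()
np.save(buffer, np.zeros((height, width, channels), dtype=np.uint8))
npy_bytes = len(buffer.getvalue())

print(uint8_bytes)              # 786432
print(float32_bytes)            # 3145728
print(npy_bytes - uint8_bytes)  # size of the small .npy header
```

Since float32 uses 4 bytes per value instead of 1, saving as np.uint8 keeps the files about four times smaller.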

If you have enough storage space and want reading to be as fast as possible, it seems best to convert the images once to .npy or pickle files so that they can be loaded quickly afterwards.
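One way to apply this in practice is to convert lazily: decode an image the first time it is requested and read the .npy copy from then on. The sketch below is one possible design, not something taken from the measurements above. `load_cached` and `fake_decode` are hypothetical names, and `decode` stands for any function that returns a numpy array, for example a wrapper around `cv2.imread`.

```python
import os
import tempfile

import numpy as np


def load_cached(image_path, decode, cache_dir):
    """Return the decoded image, reusing a .npy copy when one exists."""
    # Simplification: files with the same name in different folders would collide.
    cache_path = os.path.join(cache_dir, os.path.basename(image_path) + ".npy")
    if os.path.exists(cache_path):
        return np.load(cache_path)  # fast path: no image decoding
    arr = np.asarray(decode(image_path))
    np.save(cache_path, arr)  # slow path runs once per image
    return arr


decode_calls = []


def fake_decode(path):
    """Stand-in decoder that records each call instead of reading a file."""
    decode_calls.append(path)
    return np.full((512, 512, 3), 127, dtype=np.uint8)


with tempfile.TemporaryDirectory() as cache_dir:
    first = load_cached("photo.png", fake_decode, cache_dir)
    second = load_cached("photo.png", fake_decode, cache_dir)

print(len(decode_calls))  # the decoder ran only for the first request
```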
