Introduction

Using the image data collected last time, we now create training data. Passing the image files to TensorFlow as they are makes the computation slow, so the images are converted to NumPy arrays to shorten the computation time.

Source code

import

from PIL import Image
import os, glob
import numpy as np
from sklearn import model_selection

Preparation for conversion process

classes = ["monkey", "boar", "crow"]
num_classes = len(classes)
image_size = 50

X = []
Y = []

This time we classify monkey, boar, and crow, so those keywords are stored in classes. Every image is resized to 50x50. X holds the image data, and Y holds the labels that indicate whether each image is a monkey (0), a boar (1), or a crow (2).
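As a small illustration of where the integer labels come from, the sketch below prints the index that enumerate() assigns to each class name. The label_of dictionary is a hypothetical helper added for illustration and is not part of the original script.

```python
classes = ["monkey", "boar", "crow"]

# enumerate() pairs each class name with its position in the list;
# that position is the integer label that ends up in Y.
for index, classlabel in enumerate(classes):
    print(index, classlabel)

# Hypothetical reverse lookup: class name -> integer label.
label_of = {name: index for index, name in enumerate(classes)}
print(label_of)
```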

for index, classlabel in enumerate(classes):
    photos_dir = "./" + classlabel
    files = glob.glob(photos_dir + "/*.jpg")
    for i, file in enumerate(files):
        # Use at most 141 images per class, the size of the smallest class.
        if i >= 141:
            break
        image = Image.open(file)
        image = image.convert("RGB")  # force 3 channels, 8 bits each
        image = image.resize((image_size, image_size))
        data = np.asarray(image)
        X.append(data)   # image data
        Y.append(index)  # label: 0 = monkey, 1 = boar, 2 = crow
X = np.array(X)
Y = np.array(Y)

glob.glob() is a function that returns a list of the files matching a wildcard pattern. Here, files holds data like the following (the backslashes appear because the script was run on Windows).

['./monkey\\49757184328.jpg', 
 './monkey\\49767449258.jpg', 
 ...
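To make the wildcard matching concrete, here is a minimal sketch that runs glob.glob() against a temporary directory. The file names a.jpg, b.jpg, and c.png are made up for the example and have nothing to do with the collected images.

```python
import glob
import os
import tempfile

# Build a throwaway directory with two .jpg files and one .png file.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.jpg", "b.jpg", "c.png"):
        open(os.path.join(tmp, name), "wb").close()

    # "*" matches any run of characters within a file name.
    # Any other character in the pattern, including a space, is matched literally.
    jpg_files = sorted(glob.glob(os.path.join(tmp, "*.jpg")))
    jpg_names = [os.path.basename(path) for path in jpg_files]

print(jpg_names)
```

Only the two .jpg files are returned. glob.glob() does not guarantee any particular order, which is why the sketch sorts the result.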

Each image is opened, converted to RGB (256 levels per channel), and resized to 50x50. It is then converted to a NumPy array (which seems to be faster to process than a Python list).
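The per-image steps can be tried in isolation with the sketch below. It assumes Pillow and NumPy are installed and uses a synthetic in-memory grayscale image in place of Image.open(file), so no image files are needed.

```python
from PIL import Image
import numpy as np

image_size = 50

# Stand-in for Image.open(file): a 120x80 grayscale image created in memory.
image = Image.new("L", (120, 80), color=128)

image = image.convert("RGB")                    # 3 channels, 8 bits each
image = image.resize((image_size, image_size))  # aspect ratio is not preserved
data = np.asarray(image)

print(data.shape, data.dtype)
```

The resulting array has shape (50, 50, 3): height, width, and the three color channels. Because resize() stretches the image to a square, non-square photos are distorted; the original script accepts this.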

The X and Y created in this way contain the following data.

`X`


Array of shape (423, 50, 50, 3)
[[[[ 89  92  60]
   [ 85  84  52]
   [ 91  84  51]
   ...
   [177 178  24]
   [142 145  15]
   [231 219  35]]
   ...

`Y`


Array of 423 elements
[0 0 ... 1 1 ... 2 2 ...]
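One way to check shapes like these is sketched below. It uses synthetic stand-ins with the same shapes as X and Y (all-zero pixels, 141 labels per class) rather than the real images, so the pixel values are not meaningful.

```python
import numpy as np

# Synthetic stand-ins with the same shapes as the article's X and Y.
X = np.zeros((423, 50, 50, 3), dtype=np.uint8)
Y = np.repeat(np.arange(3), 141)  # 141 labels each of 0, 1, 2

print(X.shape)            # (images, height, width, channels)
print(Y.shape)
counts = np.bincount(Y)   # number of images per class label
print(counts)
```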

Digression

Two functions are used to convert to a NumPy array here: data = np.asarray(image) and X = np.array(X). They behave the same when converting a list to a NumPy array, but differently when the input is already a NumPy array: np.array makes a copy by default, while np.asarray returns the input array itself when no conversion is needed. Reference: https://punhundon-lifeshift.com/array_asarray
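The difference is easy to observe with a short sketch. The variable names source, copied, and shared are chosen for this example only.

```python
import numpy as np

source = np.array([1, 2, 3])

copied = np.array(source)    # np.array copies by default
shared = np.asarray(source)  # no conversion needed, so the same object is returned

source[0] = 99               # modify the original array in place

print(copied[0])         # 1: the copy is unaffected
print(shared[0])         # 99: shared refers to the same data as source
print(shared is source)  # True
```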

Saving training data

Use the train_test_split function to split X and Y into training data and model validation data, and save them under the file name "animal.npy".

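X_train, X_test, y_train, y_test = model_selection.train_test_split(X, Y)
# The four arrays have different shapes, so pack them into an object array;
# recent NumPy versions refuse to build a regular array from them.
xy = np.array((X_train, X_test, y_train, y_test), dtype=object)
np.save("./animal.npy", xy)  # read back with np.load(..., allow_pickle=True)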

X_train and y_train each contain 317 elements, and X_test and y_test each contain 106. That is, about 75% of X and Y goes to the training set and about 25% goes to the test set.
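The split sizes can be reproduced with synthetic arrays of the same shape, as in the sketch below. It also shows np.savez as one possible alternative for storing the four arrays under separate names. That part is an assumption for illustration, not the method used in this article, and the file name animal.npz is hypothetical.

```python
import os
import tempfile

import numpy as np
from sklearn import model_selection

# Synthetic stand-ins with the article's shapes (423 images, 3 classes).
X = np.zeros((423, 50, 50, 3), dtype=np.uint8)
Y = np.repeat(np.arange(3), 141)

# With no test_size given, train_test_split holds out 25% of the samples.
X_train, X_test, y_train, y_test = model_selection.train_test_split(X, Y)
print(len(X_train), len(X_test))

# Alternative to np.save (an assumption, not the article's method):
# np.savez stores each array under its own name in a single .npz file.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "animal.npz")  # hypothetical file name
    np.savez(path, X_train=X_train, X_test=X_test,
             y_train=y_train, y_test=y_test)
    with np.load(path) as loaded:
        restored_shape = loaded["X_train"].shape

print(restored_shape)
```

The split is random, so which images land in each set changes from run to run; passing random_state to train_test_split makes it repeatable.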
