Continuing from last time, we create training data from the collected image data. Passing the image files to TensorFlow as they are makes the calculation slow, so we convert them to NumPy arrays to shorten the calculation time.
from PIL import Image
import os, glob
import numpy as np
from sklearn import model_selection
classes = ["monkey", "boar", "crow"]
num_classes = len(classes)
image_size = 50
X = []  # image data
Y = []  # labels
This time we will classify monkey, boar, and crow, so we store those keywords in classes.
The image size is unified to 50x50.
X and Y hold, respectively, the image data and the labels that indicate whether each image is a monkey (0), boar (1), or crow (2).
for index, classlabel in enumerate(classes):
    photos_dir = "./" + classlabel
    files = glob.glob(photos_dir + "/*.jpg")
    for i, file in enumerate(files):
        # Cap every class at 141 images, the smallest count among monkey, boar, and crow
        if i >= 141:
            break
        image = Image.open(file)
        image = image.convert("RGB")
        image = image.resize((image_size, image_size))
        data = np.asarray(image)
        X.append(data)
        Y.append(index)

X = np.array(X)
Y = np.array(Y)
glob.glob() is a function that returns a list of the file paths matching a wildcard pattern. Here, files ends up holding data like the following.
['./monkey\\49757184328.jpg', './monkey\\49767449258.jpg', ...
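The pattern matching can be tried in isolation. The following is a minimal sketch, assuming a temporary directory with made-up file names (a.jpg, b.jpg, notes.txt) that are not part of the original project. Only the .jpg files match the pattern, and the result is sorted because glob does not promise any particular order.

```python
import glob
import os
import tempfile

# Hypothetical directory and file names, created only for this demonstration.
with tempfile.TemporaryDirectory() as tmp_dir:
    for name in ("a.jpg", "b.jpg", "notes.txt"):
        open(os.path.join(tmp_dir, name), "w").close()

    # "*" matches any run of characters within one path component.
    matches = sorted(glob.glob(os.path.join(tmp_dir, "*.jpg")))
    jpg_names = [os.path.basename(path) for path in matches]

print(jpg_names)  # ['a.jpg', 'b.jpg']
```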
Each image is opened, converted to RGB (256 levels per channel), and resized to 50x50. It is then converted to a NumPy array, which seems to be faster to process than a Python list.
X and Y created in this way contain the following data.
X: an array of shape (423, 50, 50, 3)
[[[[ 89 92 60] [ 85 84 52] [ 91 84 51] ... [177 178 24] [142 145 15] [231 219 35]] ...
Y: an array of length 423
[0 0 ... 1 1 ... 2 2 ...]
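The per-image conversion and the resulting shapes can be checked without any downloaded photos. This is a sketch under the assumption that a synthetic solid-colour image can stand in for a real photo; the 120x80 size and the colour are arbitrary choices for illustration.

```python
import numpy as np
from PIL import Image

image_size = 50

# Synthetic 120x80 image standing in for one downloaded photo (assumption).
image = Image.new("RGB", (120, 80), color=(89, 92, 60))
image = image.convert("RGB")
image = image.resize((image_size, image_size))
data = np.asarray(image)

print(data.shape)  # (50, 50, 3): height, width, RGB channels
print(data.dtype)  # uint8, i.e. 256 levels per channel

# Stacking a list of such arrays adds a leading "number of images" axis.
X_demo = np.array([data, data])
print(X_demo.shape)  # (2, 50, 50, 3)
```

With 423 images instead of 2, the same stacking gives the (423, 50, 50, 3) shape shown above.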
Two functions are used to convert to a NumPy array: data = np.asarray(image) and X = np.array(X). They behave the same when converting a list to a NumPy array, but differently when the input is already a NumPy array: np.array makes a copy by default, while np.asarray returns the input array itself when no conversion is needed.
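The difference is easy to observe in a small experiment. This is a minimal sketch with a made-up three-element array; the variable names are only for illustration.

```python
import numpy as np

original = np.zeros(3, dtype=np.uint8)

copied = np.array(original)     # np.array copies its input by default
aliased = np.asarray(original)  # no conversion needed, so the same object comes back

# Modify the original after both conversions.
original[0] = 255

print(copied[0])            # 0: the copy is unaffected
print(aliased[0])           # 255: the alias sees the change
print(aliased is original)  # True
```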
Use the train_test_split function to split X and Y into training data and model validation data, and save them under the file name "animal.npy".
X_train, X_test, y_train, y_test = model_selection.train_test_split(X, Y)
xy = (X_train, X_test, y_train, y_test)
np.save("./animal.npy", xy)
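One caveat about the save step: the tuple holds arrays of different shapes, and NumPy 1.24 and later raise a ValueError when asked to build a single array from such a sequence unless dtype=object is given. If that error appears, one possible alternative (not the method used in this post) is np.savez, which stores each array under its own name. The sketch below uses small synthetic arrays and a hypothetical file name, animal_demo.npz, in a temporary directory.

```python
import os
import tempfile

import numpy as np

# Small synthetic stand-ins for the real split (assumption, not the post's data).
X_train = np.zeros((6, 50, 50, 3), dtype=np.uint8)
X_test = np.zeros((2, 50, 50, 3), dtype=np.uint8)
y_train = np.array([0, 0, 1, 1, 2, 2])
y_test = np.array([0, 1])

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical file name; np.savez writes a .npz archive, not a .npy file.
    path = os.path.join(tmp_dir, "animal_demo.npz")
    np.savez(path, X_train=X_train, X_test=X_test, y_train=y_train, y_test=y_test)

    # Load while the temporary file still exists.
    with np.load(path) as archive:
        loaded_shapes = {name: archive[name].shape for name in archive.files}

print(loaded_shapes)
```

Each array keeps its own shape and dtype this way, and no pickling is involved when loading.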
X_train and y_train are arrays of 317 items, and X_test and y_test are arrays of 106 items.
That is, about 75% of the data in X and Y goes to the training set and about 25% to the test set. This is the default behavior of train_test_split when neither test_size nor train_size is given.
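The 317/106 split can be reproduced without the images. The sketch below assumes synthetic arrays with the same number of samples (423) and fixes random_state only so that the demonstration is repeatable; the original code does not set it.

```python
import numpy as np
from sklearn import model_selection

# Synthetic stand-ins with the same number of samples as in the post.
X = np.zeros((423, 2))
Y = np.repeat([0, 1, 2], 141)  # 141 labels for each of the three classes

# With no test_size given, 25% of the samples (rounded up) go to the test set.
X_train, X_test, y_train, y_test = model_selection.train_test_split(
    X, Y, random_state=0
)

print(len(X_train), len(X_test))  # 317 106
```

Passing test_size (for example test_size=0.2) changes the proportion.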