A self-made load_data for running Python MNIST sample code on your own dataset

Introduction

Once you become interested in deep learning and AI and start running sample code, you quickly run into many samples that use a dataset called MNIST. MNIST is a dataset of handwritten digits labeled 0 to 9, each stored as a 28x28 grayscale image.

The sample code itself runs as long as you can set up the environment, but I wanted to use a dataset I had created myself. Looking at the MNIST code, the whole dataset is prepared by just the following line.

(x_train, y_train), (x_test, y_test) = mnist.load_data()

Going from this one line to building your own dataset is a big hurdle. So, in this article, I implement a replacement for mnist.load_data that builds a dataset in MNIST format from your own images.

The specification of mnist.load_data() is described in the official Keras documentation: https://keras.io/ja/datasets/

It is used exactly as in the line above: x_train and y_train hold the training images and their labels, and x_test and y_test hold the test images and their labels.
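According to the Keras documentation, mnist.load_data() returns images as uint8 arrays of shape (num_samples, 28, 28) and labels as uint8 arrays of digits 0-9 with shape (num_samples,). A quick way to check that your own arrays match that layout is a small helper like the one below. This is only a sketch: `check_mnist_format` and its `size`/`num_classes` parameters are hypothetical names of mine, not part of Keras.

```python
import numpy as np


def check_mnist_format(x, y, size=28, num_classes=10):
    """Hypothetical helper: True if (x, y) look like one split of mnist.load_data()."""
    return bool(
        isinstance(x, np.ndarray)
        and isinstance(y, np.ndarray)
        and x.dtype == np.uint8                  # pixel values 0-255
        and x.ndim == 3
        and x.shape[1:] == (size, size)          # (num_samples, size, size)
        and y.ndim == 1
        and y.shape[0] == x.shape[0]             # one label per image
        and np.issubdtype(y.dtype, np.integer)   # integer labels, not one-hot
        and (y.size == 0 or (y.min() >= 0 and y.max() < num_classes))
    )


# Random arrays standing in for real images and labels
x = np.random.randint(0, 256, size=(5, 28, 28), dtype=np.uint8)
y = np.array([0, 3, 9, 1, 7])
print(check_mnist_format(x, y))  # True
```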

For an explanation of why data is split into training and test sets, I found the following article very easy to understand: Machine learning training data division and learning / prediction / verification

Self-made load_data()

The only preparation this load_data needs is to save the images in a separate folder for each class. That's all!
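If you have no images yet but want to try the function, you can generate a throwaway dataset with the expected layout (one folder per class, .jpg files inside). This is a sketch under my own assumptions: `make_dummy_dataset` and its parameters are hypothetical, and the images are pure random noise, so they are only useful for checking that loading works, not for training.

```python
import os

import numpy as np
from PIL import Image


def make_dummy_dataset(root, folder_names, images_per_folder=10, seed=0):
    """Hypothetical helper: write random-noise JPEGs into one folder per class,
    e.g. root/f1/*.jpg, root/f2/*.jpg, and return the folder paths."""
    rng = np.random.default_rng(seed)
    paths = []
    for name in folder_names:
        folder = os.path.join(root, name)
        os.makedirs(folder, exist_ok=True)
        for i in range(images_per_folder):
            # 64x64 RGB noise; the loader resizes and converts it to grayscale
            pixels = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)
            Image.fromarray(pixels).save(os.path.join(folder, f'{i:03d}.jpg'))
        paths.append(folder)
    return paths
```

For example, `make_dummy_dataset('.', ['f1', 'f2', 'f3'])` creates the folders f1, f2, and f3 in the current directory.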

Here are the imports and the source code.

import.txt


from PIL import Image
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder
import pandas as pd
import os, glob

my_load_data.py


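def my_load_data(folder_str, size):
    print('load_dataset...')
    # One folder per class; a folder's position in folder_str becomes its label
    folders = folder_str.split('__')
    X = []
    Y = []
    for index, fol_name in enumerate(folders):
        files = glob.glob(fol_name + '/*.jpg')
        for file in files:
            image = Image.open(file)
            image = image.resize((size, size))  # same resolution for every image
            image = image.convert('L')          # grayscale, like MNIST
            data = np.asarray(image)            # uint8 array of shape (size, size)
            X.append(data)
            Y.append(index)
    X = np.array(X)  # shape (n, size, size)
    Y = np.array(Y)  # integer labels, shape (n,)
    # One-hot encode the labels. scikit-learn >= 1.2 names this argument
    # sparse_output; the older name sparse was removed in 1.4.
    oh_encoder = OneHotEncoder(categories='auto', sparse_output=False)
    onehot = oh_encoder.fit_transform(pd.DataFrame(Y))
    # Hold out 20% of the images as test data
    X_train, X_test, y_train, y_test = train_test_split(X, onehot, test_size=0.2)
    return X_train, X_test, y_train, y_test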

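For the parameter folder_str, specify the folders into which the images have been sorted. Labeling needs multiple folders, so pass the folder names joined by '__' (two underscores); the images in the first folder get label 0, those in the second folder get label 1, and so on. The sample code reads files with the .jpg extension, but you can change it. size is the resolution every image is resized to; MNIST is 28x28, so specify 28. Note that mnist.load_data() itself returns integer labels, but the Keras MNIST samples usually convert them to one-hot vectors before training, so this function returns one-hot labels directly. Also, the return order follows train_test_split (X_train, X_test, y_train, y_test) rather than the ((x_train, y_train), (x_test, y_test)) tuple of mnist.load_data(). The following main function shows how to call it.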
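To see how the folder string turns into labels and what the one-hot conversion produces, here is a small numpy-only sketch. `labels_from_folder_str` and `to_one_hot` are hypothetical helpers of mine; `to_one_hot` uses `np.eye` rather than scikit-learn's OneHotEncoder.

```python
import numpy as np


def labels_from_folder_str(folder_str):
    """Map each folder in an 'a__b__c' string to its label (its position)."""
    return {name: index for index, name in enumerate(folder_str.split('__'))}


def to_one_hot(labels, num_classes):
    """One-hot encode integer labels 0..num_classes-1 by indexing rows of an identity matrix."""
    return np.eye(num_classes, dtype=np.float32)[np.asarray(labels)]


mapping = labels_from_folder_str('f1__f2__f3')
print(mapping)  # {'f1': 0, 'f2': 1, 'f3': 2}

onehot = to_one_hot([0, 2, 1], num_classes=len(mapping))
print(onehot)
# [[1. 0. 0.]
#  [0. 0. 1.]
#  [0. 1. 0.]]
print(onehot.argmax(axis=1))  # [0 2 1]  (back to integer labels)
```

With an explicit num_classes, the number of columns always equals the number of folders. OneHotEncoder(categories='auto') only creates columns for labels it actually sees, so an empty folder would silently shift the encoding of the folders after it.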

sample.py


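import argparse

# Assumes my_load_data() and its imports are defined in this same file

def main():
    parser = argparse.ArgumentParser(description='sample')
    parser.add_argument('--folder', '-i', required=True,
                        help="image folders joined by '__', e.g. f1__f2__f3")
    parser.add_argument('--size', '-s', type=int, default=28)
    args = parser.parse_args()
    X_train, X_test, y_train, y_test = my_load_data(args.folder, args.size)

    # Verification
    print('X_train', X_train)
    print('y_train', y_train)

if __name__ == '__main__':
    main()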
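The images come back as uint8 arrays of shape (n, size, size). The Keras MNIST CNN samples typically add a channel axis and scale pixel values to [0, 1] before training, so a step like the following is usually needed before passing X_train to such a model. `prepare_for_cnn` is a hypothetical helper, and the channels-last layout (n, size, size, 1) is an assumption that matches Keras's default data format.

```python
import numpy as np


def prepare_for_cnn(x):
    """Hypothetical helper: uint8 images (n, size, size) -> float32 (n, size, size, 1) in [0, 1]."""
    x = x.astype('float32') / 255.0   # scale 0-255 pixel values to 0.0-1.0
    return x[..., np.newaxis]         # add the single grayscale channel axis


x_train = np.full((2, 28, 28), 255, dtype=np.uint8)
x = prepare_for_cnn(x_train)
print(x.shape, x.dtype, x.max())  # (2, 28, 28, 1) float32 1.0
```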

Example of execution command

python sample.py --folder f1__f2__f3 -s 28

Here f1, f2, and f3 are folders of images in the current directory; their images receive the labels 0, 1, and 2, respectively.
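If you would rather leave the first line of the sample code untouched, one option is a variant that returns the same nested tuple and integer labels as mnist.load_data(). The following is a sketch under those assumptions: `load_data_like_mnist`, its `seed` and `pattern` parameters, and the numpy-based shuffle (used here instead of train_test_split) are my own illustrative choices, not part of Keras.

```python
import glob
import os

import numpy as np
from PIL import Image


def load_data_like_mnist(folder_str, size=28, test_size=0.2, seed=0, pattern='*.jpg'):
    """Hypothetical variant of my_load_data() mirroring mnist.load_data().

    Returns (x_train, y_train), (x_test, y_test) where images are uint8 arrays
    of shape (n, size, size) and labels are integers (the folder position).
    """
    X, Y = [], []
    for index, folder in enumerate(folder_str.split('__')):
        # sorted() fixes the file order so the split below is reproducible
        for path in sorted(glob.glob(os.path.join(folder, pattern))):
            with Image.open(path) as img:
                gray = img.resize((size, size)).convert('L')
            X.append(np.asarray(gray, dtype=np.uint8))
            Y.append(index)
    X = np.array(X, dtype=np.uint8)
    Y = np.array(Y, dtype=np.uint8)  # MNIST labels are uint8 too
    # Shuffle indices with a fixed seed; the first n_test become test data
    order = np.random.default_rng(seed).permutation(len(X))
    n_test = int(round(len(X) * test_size))
    test_idx, train_idx = order[:n_test], order[n_test:]
    return (X[train_idx], Y[train_idx]), (X[test_idx], Y[test_idx])
```

With this, `(x_train, y_train), (x_test, y_test) = load_data_like_mnist('f1__f2__f3')` could stand in for the mnist.load_data() line, leaving the rest of a sample's preprocessing (including its own one-hot conversion) as is. Because the labels are uint8, this sketch supports at most 256 folders.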

In conclusion

In this article, I created my_load_data so that the MNIST sample code can be run with my own data in place of mnist.load_data. I hope you enjoy running the MNIST samples. If you run into problems or have any questions, please feel free to leave a comment.

In writing this article, I drew on the knowledge of many others; their articles are listed below. Thank you for reading. If you found this useful, please give it an LGTM!

References

[How to convert image data to numpy format](https://newtechnologylifestyle.net/%E7%94%BB%E5%83%8F%E3%83%87%E3%83%BC%E3%82%BF%E3%81%8B%E3%82%89numpy%E5%BD%A2%E5%BC%8F%E3%81%AB%E5%A4%89%E6%8F%9B%E3%81%99%E3%82%8B%E6%96%B9%E6%B3%95/)

Understanding Keras VAE Image Anomaly Detection

Image reproduction with convolution autoencoder, noise removal, segmentation
