[PYTHON] Creating a dataset loader

Introduction

- The loader takes a batch size from the training program and returns the corresponding training images and labels, as well as the test images and test labels.
- I referred to the TensorFlow implementation below.
  - https://github.com/tensorflow/tensorflow/blob/r1.12/tensorflow/contrib/learn/python/learn/datasets/mnist.py
- Only the part that reads the previously pickled training images, training labels, test images, and test labels has been modified.
- The complete source is here.

Dataset

- As mentioned at the beginning, I referred to the following implementation.
  - https://github.com/tensorflow/tensorflow/blob/r1.12/tensorflow/contrib/learn/python/learn/datasets/mnist.py
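The code excerpt below omits its imports and the `Datasets` container it relies on (they are in the complete source). A minimal sketch of the missing pieces, assuming the configuration constants live in a module here called `configure.py`:

import collections
import os
import pickle

import numpy

# Assumption: DATASETS_PATH, CLASSES, IMG_ROWS, IMG_COLS, USE_AUGMENT and
# AUGMENT_NUM are defined in a configuration module (assumed to be configure.py).
from configure import (AUGMENT_NUM, CLASSES, DATASETS_PATH, IMG_COLS,
                       IMG_ROWS, USE_AUGMENT)

# Container grouping the three splits, as in the referenced mnist.py
# (there it comes from the accompanying base module).
Datasets = collections.namedtuple('Datasets', ['train', 'validation', 'test'])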

class DataSet():
    """Data set management."""

    def __init__(self, images, labels):
        self._num_examples = images.shape[0]
        # Flatten each image from (rows, cols) into a single vector.
        images = images.reshape(images.shape[0], images.shape[1] * images.shape[2])
        # Convert pixel values from [0, 255] to [0.0, 1.0].
        images = images.astype(numpy.float32)
        images = numpy.multiply(images, 1.0 / 255.0)
        self._images = images
        self._labels = labels
        self._epochs_completed = 0
        self._index_in_epoch = 0

def dense_to_one_hot(labels_dense, num_classes):
    """Convert class labels from scalars to one-hot vectors."""
    num_labels = labels_dense.shape[0]
    index_offset = numpy.arange(num_labels) * num_classes
    labels_one_hot = numpy.zeros((num_labels, num_classes))
    # Flat indexing sets, in each row, the column given by that row's label to 1.
    labels_one_hot.flat[index_offset + labels_dense.ravel()] = 1
    return labels_one_hot
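The excerpt shows only the constructor, but the introduction mentions fetching training images and labels by batch size. In the referenced mnist.py this is handled by a `next_batch` method on `DataSet`; a simplified sketch of such a method (without the partial-batch handling of the original) could look like this:

    def next_batch(self, batch_size):
        """Return the next `batch_size` images and labels."""
        start = self._index_in_epoch
        self._index_in_epoch += batch_size
        if self._index_in_epoch > self._num_examples:
            # The epoch is finished: count it, reshuffle, and start over.
            self._epochs_completed += 1
            perm = numpy.arange(self._num_examples)
            numpy.random.shuffle(perm)
            self._images = self._images[perm]
            self._labels = self._labels[perm]
            start = 0
            self._index_in_epoch = batch_size
            assert batch_size <= self._num_examples
        end = self._index_in_epoch
        return self._images[start:end], self._labels[start:end]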

Pickle data reading

- Load the pickled images and labels into the `DataSet` class above.
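The loader below expects the pickle file to contain the four arrays as a nested tuple. As a hypothetical sketch, such a file could have been created like this (the path, sizes, and zero-filled arrays are placeholder assumptions, not the article's preprocessing code):

import pickle

import numpy

# Placeholder arrays; in practice they come from an earlier preprocessing step.
train_images = numpy.zeros((100, 28, 28), dtype=numpy.uint8)
train_labels = numpy.zeros(100, dtype=numpy.int64)
test_images = numpy.zeros((20, 28, 28), dtype=numpy.uint8)
test_labels = numpy.zeros(20, dtype=numpy.int64)

# The loader expects exactly this nested tuple layout.
with open('example.pickle', 'wb') as fout:
    pickle.dump(((train_images, train_labels), (test_images, test_labels)), fout)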

Data reading

- Read either the original data or the augmented data, depending on the configuration file.

def load_data(one_hot=False, validation_size=0):
    """Read the dataset according to configure.py."""

    # Use the augmented dataset if augmentation is enabled in the configuration.
    train_num = AUGMENT_NUM if USE_AUGMENT else 0
    datasets_file = os.path.join(DATASETS_PATH, ','.join(CLASSES), '{}x{}-{}.pickle'.format(IMG_ROWS, IMG_COLS, train_num))

    # The pickle holds ((train_images, train_labels), (test_images, test_labels)).
    with open(datasets_file, 'rb') as fin:
        (train_images, train_labels), (test_images, test_labels) = pickle.load(fin)

Label one-hot conversion

- Here, the label data that was read in is converted to `one_hot` and passed on to the subsequent processing.
- For example, if the labels are sunny: 1, cloudy: 2, rain: 3, they are converted into vectors such as sunny: (1, 0, 0), cloudy: (0, 1, 0), rain: (0, 0, 1). A worked example follows the code below.
- For more information, search for `machine learning one hot`.

    if one_hot:
        num_classes = len(numpy.unique(train_labels))
        train_labels = dense_to_one_hot(train_labels, num_classes)
        test_labels = dense_to_one_hot(test_labels, num_classes)
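To make the conversion concrete, here is what `dense_to_one_hot` produces for a small, hypothetical label array (note that it assumes 0-based class indices):

labels = numpy.array([0, 1, 2, 1])  # e.g. sunny, cloudy, rain, cloudy
print(dense_to_one_hot(labels, num_classes=3))
# [[1. 0. 0.]
#  [0. 1. 0.]
#  [0. 0. 1.]
#  [0. 1. 0.]]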

Splitting the images and labels into datasets

- Every time the dataset is loaded, its order is shuffled.
- Since validation data may be needed, it is carved out of the training data.
- Finally, the data is split into training, validation, and test datasets.

    # Shuffle the training data so that the order differs on every load.
    perm = numpy.arange(train_images.shape[0])
    numpy.random.shuffle(perm)
    train_images = train_images[perm]
    train_labels = train_labels[perm]

    # Carve the validation split out of the shuffled training data.
    validation_images = train_images[:validation_size]
    validation_labels = train_labels[:validation_size]
    train_images = train_images[validation_size:]
    train_labels = train_labels[validation_size:]

    # Wrap each split in a DataSet and return them together.
    train = DataSet(train_images, train_labels)
    validation = DataSet(validation_images, validation_labels)
    test = DataSet(test_images, test_labels)

    return Datasets(train=train, validation=validation, test=test)
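As a usage sketch from the training side (the validation size and batch size here are arbitrary):

datasets = load_data(one_hot=True, validation_size=100)

# Fetch one mini-batch for training (see the next_batch sketch above).
batch_images, batch_labels = datasets.train.next_batch(32)
print(batch_images.shape, batch_labels.shape)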

In conclusion

- I created a dataset loader. Only the part that reads the previously pickled data was modified from the original.
- That said, these days this layer can be hidden behind libraries, so there is rarely a need to implement it yourself. I expect this to be both the first and the last time I do.
- Next time, I would like to create the learning model.
