[Python] How to read data from CIFAR-10 and CIFAR-100

CIFAR-10 and CIFAR-100 are labeled subsets of the 80 Million Tiny Images dataset. Each consists of 60,000 color images with a size of 32x32.

Download the data from the data provider. https://www.cs.toronto.edu/~kriz/cifar.html

Download the "CIFAR-10 python version" and "CIFAR-100 python version" archives and extract them to a suitable location.
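
The extraction step can also be done from Python. The following is a minimal sketch, assuming the downloaded archives are gzip-compressed tar files; the helper name `extract_archive` and the paths passed to it are hypothetical and not part of the original script.

```python
import os
import tarfile

def extract_archive(archive_path, dest_dir):
    """Extract a .tar.gz archive into dest_dir and return its top-level entries."""
    os.makedirs(dest_dir, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        names = tar.getnames()
        tar.extractall(path=dest_dir)
    # Top-level entries, i.e. the folder(s) the archive creates in dest_dir.
    return sorted({name.split("/")[0] for name in names})

# Hypothetical usage, assuming the archive was saved next to the script:
# extract_archive("cifar-10-python.tar.gz", "./data")
```

The returned list makes it easy to confirm which folder the archive created before pointing the loader at it.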

`input_cifar.py`


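import pickle
import numpy as np
import os

def unpickle(file):
    # The CIFAR files were pickled with Python 2, so Python 3 needs
    # encoding='latin1' to load them with ordinary str keys.
    with open(file, 'rb') as fo:
        return pickle.load(fo, encoding='latin1')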
    
def conv_data2image(data):
    return np.rollaxis(data.reshape((3,32,32)),0,3)
    
def get_cifar10(folder):
    # Each row of 'data' is one flattened 32x32x3 image (3072 values).
    # tr_data and tr_labels are set from the first batch and extended by the rest.
    for i in range(1,6):
        fname = os.path.join(folder, "%s%d" % ("data_batch_", i))
        data_dict = unpickle(fname)
        if i == 1:
            tr_data = data_dict['data']
            tr_labels = data_dict['labels']
        else:
            tr_data = np.vstack((tr_data, data_dict['data']))
            tr_labels = np.hstack((tr_labels, data_dict['labels']))
    
    data_dict = unpickle(os.path.join(folder, 'test_batch'))
    te_data = data_dict['data']
    te_labels = np.array(data_dict['labels'])
    
    bm = unpickle(os.path.join(folder, 'batches.meta'))
    label_names = bm['label_names']
    return tr_data, tr_labels, te_data, te_labels, label_names

def get_cifar100(folder):
    train_fname = os.path.join(folder,'train')
    test_fname  = os.path.join(folder,'test')
    data_dict = unpickle(train_fname)
    train_data = data_dict['data']
    train_fine_labels = data_dict['fine_labels']
    train_coarse_labels = data_dict['coarse_labels']
    
    data_dict = unpickle(test_fname)
    test_data = data_dict['data']
    test_fine_labels = data_dict['fine_labels']
    test_coarse_labels = data_dict['coarse_labels']
    
    bm = unpickle(os.path.join(folder, 'meta'))
    clabel_names = bm['coarse_label_names']
    flabel_names = bm['fine_label_names']
    
    return train_data, np.array(train_coarse_labels), np.array(train_fine_labels), test_data, np.array(test_coarse_labels), np.array(test_fine_labels), clabel_names, flabel_names

if __name__ == '__main__':
    datapath = "./data/cifar-10-batches-py"
    datapath2 = "./data/cifar-100-python"
    
    tr_data10, tr_labels10, te_data10, te_labels10, label_names10 = get_cifar10(datapath)
    tr_data100, tr_clabels100, tr_flabels100, te_data100, te_clabels100, te_flabels100, clabel_names100, flabel_names100 = get_cifar100(datapath2)

Save the above code as input_cifar.py, create a data folder in the same folder as input_cifar.py, and put the extracted datasets there. When input_cifar.py is executed, the result is as follows.
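
Before running the script, it can help to confirm that the folders contain the files the loader opens. The following is a small sketch; the file names are the ones used in the loader code above, while the helper `missing_files` is a hypothetical addition.

```python
import os

# File names opened by get_cifar10 and get_cifar100 in the loader code.
CIFAR10_FILES = ["data_batch_%d" % i for i in range(1, 6)] + ["test_batch", "batches.meta"]
CIFAR100_FILES = ["train", "test", "meta"]

def missing_files(folder, expected):
    """Return the expected file names that are not present in folder."""
    return [name for name in expected
            if not os.path.isfile(os.path.join(folder, name))]

# Hypothetical usage with the paths from the script:
# missing_files("./data/cifar-10-batches-py", CIFAR10_FILES)
# missing_files("./data/cifar-100-python", CIFAR100_FILES)
```

An empty list means every file the loader needs is in place.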

CIFAR-10

`ipython`


In [1]: %run input_cifar.py
In [2]: tr_data10.shape
Out[2]: (50000, 3072)
In [3]: tr_labels10.shape
Out[3]: (50000,)
In [4]: te_data10.shape
Out[4]: (10000, 3072)
In [5]: te_labels10.shape
Out[5]: (10000,)
In [6]: label_names10
Out[6]: 
['airplane',
 'automobile',
 'bird',
 'cat',
 'deer',
 'dog',
 'frog',
 'horse',
 'ship',
 'truck']

Both CIFAR-10 and CIFAR-100 are divided into 50,000 training images and 10,000 test images. To extract the first training image (index 0), do as follows.

`ipython`


In [7]: img0 = tr_data10[0]
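
The label for the same index can be turned into a class name by indexing the name list with it. The following is a minimal sketch with stand-in data: the name list is copied from the label_names10 output above, while the `labels` array holds made-up values in place of tr_labels10, and `label_name` is a hypothetical helper.

```python
import numpy as np

# Class names as printed by label_names10.
label_names = ['airplane', 'automobile', 'bird', 'cat', 'deer',
               'dog', 'frog', 'horse', 'ship', 'truck']

# Made-up label ids standing in for tr_labels10.
labels = np.array([6, 9, 4])

def label_name(index):
    """Return the class name for the sample at the given index."""
    return label_names[int(labels[index])]

name0 = label_name(0)  # label id 6 is 'frog'
```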

The image is a color image with a size of 32x32. The data is stored in planar format in the order R, G, B: the first 1024 values are the R plane, the next 1024 values are the G plane, and the last 1024 values are the B plane.
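
The plane layout can be checked with a few slices. The following is a minimal sketch that uses a stand-in row filled with 0 to 3071 instead of real pixel data (real rows hold uint8 values), so that each value equals its own position; all variable names are illustrative.

```python
import numpy as np

# Stand-in for one data row: the value at each position is the position itself.
row = np.arange(3072)

# Each plane is 1024 values, stored row by row as a 32x32 grid.
r_plane = row[0:1024].reshape(32, 32)
g_plane = row[1024:2048].reshape(32, 32)
b_plane = row[2048:3072].reshape(32, 32)

# The pixel at image row y, column x sits at offset y*32 + x inside each plane.
y, x = 2, 5
offset = y * 32 + x
pixel = (row[offset], row[1024 + offset], row[2048 + offset])
```

With this stand-in row, `pixel` holds the three flat positions that make up one RGB pixel, which agree with `r_plane[y, x]`, `g_plane[y, x]`, and `b_plane[y, x]`.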

To display an image, the flat 3072-element row has to be rearranged to 32x32x3. scikit-image's imshow expects the channel values interleaved per pixel (R, G, B, R, G, B, ...), so do as follows.

`ipython`


In [8]: img0 = img0.reshape((3,32,32))
In [9]: img0.shape
Out[9]: (3, 32, 32)
In [10]: import numpy as np
In [11]: img1 = np.rollaxis(img0, 0, 3)
In [12]: img1.shape
Out[12]: (32, 32, 3)
In [13]: from skimage import io
In [14]: io.imshow(img1)
In [15]: io.show()

According to its label, image 0 is a frog, but because it is only 32x32 pixels it is hard to make out even when displayed.
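
The reshape and axis move used for a single image can also be applied to a whole array of rows at once. The following is a minimal sketch on random stand-in data; `rows_to_images` is a hypothetical helper, and it uses `transpose` where the per-image version uses `np.rollaxis`.

```python
import numpy as np

def rows_to_images(data):
    """Convert (N, 3072) plane-ordered rows into (N, 32, 32, 3) images."""
    return data.reshape(-1, 3, 32, 32).transpose(0, 2, 3, 1)

# Random stand-in for a small slice of tr_data10.
rng = np.random.RandomState(0)
data = rng.randint(0, 256, size=(4, 3072)).astype(np.uint8)

images = rows_to_images(data)

# The per-image conversion from the session above, for comparison.
single = np.rollaxis(data[0].reshape((3, 32, 32)), 0, 3)
```

Here `images[0]` and `single` contain the same values, so either route can feed an image viewer.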

CIFAR-100

In CIFAR-100, images are divided into 100 classes, and the 100 classes are further grouped into 20 superclasses. The superclasses and classes are as follows. The data is stored in the same way as in CIFAR-10.

Superclass	Classes
aquatic mammals	beaver, dolphin, otter, seal, whale
fish	aquarium fish, flatfish, ray, shark, trout
flowers	orchids, poppies, roses, sunflowers, tulips
food containers	bottles, bowls, cans, cups, plates
fruit and vegetables	apples, mushrooms, oranges, pears, sweet peppers
household electrical devices	clock, computer keyboard, lamp, telephone, television
household furniture	bed, chair, couch, table, wardrobe
insects	bee, beetle, butterfly, caterpillar, cockroach
large carnivores	bear, leopard, lion, tiger, wolf
large man-made outdoor things	bridge, castle, house, road, skyscraper
large natural outdoor scenes	cloud, forest, mountain, plain, sea
large omnivores and herbivores	camel, cattle, chimpanzee, elephant, kangaroo
medium-sized mammals	fox, porcupine, possum, raccoon, skunk
non-insect invertebrates	crab, lobster, snail, spider, worm
people	baby, boy, girl, man, woman
reptiles	crocodile, dinosaur, lizard, snake, turtle
small mammals	hamster, mouse, rabbit, shrew, squirrel
trees	maple, oak, palm, pine, willow
vehicles 1	bicycle, bus, motorcycle, pickup truck, train
vehicles 2	lawn-mower, rocket, streetcar, tank, tractor

The superclass label names are in clabel_names100, and the class label names are in flabel_names100.

`ipython`


In [6]: len(clabel_names100)
Out[6]: 20
In [7]: len(flabel_names100)
Out[7]: 100
In [8]: clabel_names100
Out[8]: 
['aquatic_mammals',
 'fish',
 'flowers',
 'food_containers',
 'fruit_and_vegetables',
 'household_electrical_devices',


 'reptiles',
 'small_mammals',
 'trees',
 'vehicles_1',
 'vehicles_2']
In [9]: flabel_names100
Out[9]: 
['apple',
 'aquarium_fish',
 'baby',
 'bear',
 'beaver',
 'bed',
 'bee',
 'beetle',
 'bicycle',
 'bottle',


 'willow_tree',
 'wolf',
 'woman',
 'worm']
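
Because every sample carries both a class label and a superclass label, the class-to-superclass relation can be recovered from the label arrays themselves. The following is a minimal sketch on tiny made-up arrays standing in for tr_flabels100 and tr_clabels100; `fine_to_coarse` is a hypothetical helper.

```python
import numpy as np

def fine_to_coarse(fine_labels, coarse_labels):
    """Map each fine label id to the coarse label id it appears with."""
    mapping = {}
    for fine, coarse in zip(fine_labels, coarse_labels):
        fine, coarse = int(fine), int(coarse)
        # Each class should belong to exactly one superclass.
        if mapping.setdefault(fine, coarse) != coarse:
            raise ValueError("fine label %d maps to more than one coarse label" % fine)
    return mapping

# Made-up label ids standing in for tr_flabels100 and tr_clabels100.
fine = np.array([3, 7, 3, 9, 7])
coarse = np.array([1, 0, 1, 2, 0])

mapping = fine_to_coarse(fine, coarse)
```

With the real arrays, `clabel_names100[mapping[i]]` would give the superclass name for class `i`, assuming the ids index the name lists directly as in CIFAR-10.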