CIFAR-10 and CIFAR-100 are labeled subsets of the 80 Million Tiny Images dataset. Each one contains 60,000 labeled color images with a size of 32x32.
Download the data from the data provider's page: https://www.cs.toronto.edu/~kriz/cifar.html
Download the "CIFAR-10 python version" and "CIFAR-100 python version" archives and extract them to a suitable location.
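If you prefer to unpack the archives from Python rather than with a desktop tool, the following is a minimal sketch using only the standard library. The helper name `extract_archive` and the local archive file names are assumptions for illustration; adjust them to wherever you saved the downloads.

```python
import os
import tarfile


def extract_archive(archive_path, dest_dir):
    """Unpack a .tar.gz archive into dest_dir and return its top-level names."""
    os.makedirs(dest_dir, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        # Collect the top-level folder names contained in the archive.
        top_level = sorted({m.name.split("/")[0] for m in tar.getmembers()})
        tar.extractall(dest_dir)
    return top_level


if __name__ == "__main__":
    # Assumed local file names; nothing happens if the files are not present.
    for name in ("cifar-10-python.tar.gz", "cifar-100-python.tar.gz"):
        if os.path.exists(name):
            print(extract_archive(name, "./data"))
```

Extracting into `./data` matches the folder layout that the loader script below expects.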
input_cifar.py
import os
import sys

import numpy as np

try:
    import cPickle as pickle  # Python 2
except ImportError:
    import pickle  # Python 3


def unpickle(file):
    # Load one pickled CIFAR file and return its dictionary.
    with open(file, 'rb') as fo:
        if sys.version_info[0] >= 3:
            # The files were pickled under Python 2; 'latin1' keeps the keys as str.
            return pickle.load(fo, encoding='latin1')
        return pickle.load(fo)


def conv_data2image(data):
    # Convert one flat 3072-value row (R, G, B planes) into a 32x32x3 image.
    return np.rollaxis(data.reshape((3, 32, 32)), 0, 3)


def get_cifar10(folder):
    # Each data row holds one 32x32x3 image as 3072 values.
    tr_data = np.empty((0, 32 * 32 * 3))
    tr_labels = np.empty(0)
    # The training set is split over data_batch_1 ... data_batch_5.
    for i in range(1, 6):
        fname = os.path.join(folder, "%s%d" % ("data_batch_", i))
        data_dict = unpickle(fname)
        if i == 1:
            tr_data = data_dict['data']
            tr_labels = np.array(data_dict['labels'])
        else:
            tr_data = np.vstack((tr_data, data_dict['data']))
            tr_labels = np.hstack((tr_labels, data_dict['labels']))

    data_dict = unpickle(os.path.join(folder, 'test_batch'))
    te_data = data_dict['data']
    te_labels = np.array(data_dict['labels'])

    bm = unpickle(os.path.join(folder, 'batches.meta'))
    label_names = bm['label_names']
    return tr_data, tr_labels, te_data, te_labels, label_names


def get_cifar100(folder):
    train_fname = os.path.join(folder, 'train')
    test_fname = os.path.join(folder, 'test')

    data_dict = unpickle(train_fname)
    train_data = data_dict['data']
    train_fine_labels = data_dict['fine_labels']
    train_coarse_labels = data_dict['coarse_labels']

    data_dict = unpickle(test_fname)
    test_data = data_dict['data']
    test_fine_labels = data_dict['fine_labels']
    test_coarse_labels = data_dict['coarse_labels']

    bm = unpickle(os.path.join(folder, 'meta'))
    clabel_names = bm['coarse_label_names']
    flabel_names = bm['fine_label_names']
    return (train_data, np.array(train_coarse_labels), np.array(train_fine_labels),
            test_data, np.array(test_coarse_labels), np.array(test_fine_labels),
            clabel_names, flabel_names)


if __name__ == '__main__':
    datapath = "./data/cifar-10-batches-py"
    datapath2 = "./data/cifar-100-python"
    tr_data10, tr_labels10, te_data10, te_labels10, label_names10 = get_cifar10(datapath)
    (tr_data100, tr_clabels100, tr_flabels100,
     te_data100, te_clabels100, te_flabels100,
     clabel_names100, flabel_names100) = get_cifar100(datapath2)
Paste the above code into input_cifar.py, create a data folder in the same folder as input_cifar.py, and put the extracted datasets there. Running input_cifar.py then gives the following results.
CIFAR-10
ipython
In [1]: %run input_cifar.py
In [2]: tr_data10.shape
Out[2]: (50000, 3072)
In [3]: tr_labels10.shape
Out[3]: (50000,)
In [4]: te_data10.shape
Out[4]: (10000, 3072)
In [5]: te_labels10.shape
Out[5]: (10000,)
In [6]: label_names10
Out[6]:
['airplane',
'automobile',
'bird',
'cat',
'deer',
'dog',
'frog',
'horse',
'ship',
'truck']
In both CIFAR-10 and CIFAR-100, the data is divided into 50,000 training images and 10,000 test images. To extract training image 0, do as follows.
ipython
In [7]: img0 = tr_data10[0]
Each image is a 32x32 color image. The data is stored in planar format in the order R, G, B: the first 1024 values are the R plane, the next 1024 values are the G plane, and the last 1024 values are the B plane.
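The planar layout described above can be sketched with plain slicing. The helper name `split_planes` is made up for illustration, and a synthetic row of the values 0 to 3071 stands in for `tr_data10[0]` so that the plane boundaries are easy to see.

```python
import numpy as np


def split_planes(row):
    """Split one flat 3072-value CIFAR row into 32x32 R, G and B planes."""
    row = np.asarray(row)
    r = row[0:1024].reshape(32, 32)      # first 1024 values: red plane
    g = row[1024:2048].reshape(32, 32)   # next 1024 values: green plane
    b = row[2048:3072].reshape(32, 32)   # last 1024 values: blue plane
    return r, g, b


# Synthetic row standing in for tr_data10[0]: values 0..3071 in storage order.
demo_row = np.arange(3072)
r, g, b = split_planes(demo_row)
print(r[0, 0], g[0, 0], b[0, 0])  # 0 1024 2048
```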
To display an image, the flat row of data has to be rearranged into a 32x32x3 array. scikit-image's imshow expects the channels interleaved per pixel (R, G, B, R, G, B, ...), so convert the data as follows.
ipython
In [8]: img0 = img0.reshape((3,32,32))
In [9]: img0.shape
Out[9]: (3, 32, 32)
In [10]: import numpy as np
In [11]: img1 = np.rollaxis(img0, 0, 3)
In [12]: img1.shape
Out[12]: (32, 32, 3)
In [13]: from skimage import io
In [14]: io.imshow(img1)
In [15]: io.show()
According to its label, image 0 is a frog, but because the image is only 32x32 pixels it is hard to recognize by eye.
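The same reshape can be applied to a whole array of rows at once, and the class name can be looked up from the label. This is a sketch with hypothetical helper names (`rows_to_images`, `label_name`) and synthetic stand-ins for `tr_data10`, `tr_labels10` and `label_names10`; it uses `transpose` on the batch instead of calling `np.rollaxis` per image.

```python
import numpy as np


def rows_to_images(data):
    """Convert (N, 3072) plane-ordered rows into (N, 32, 32, 3) images."""
    data = np.asarray(data)
    # (N, 3072) -> (N, channel, row, col) -> (N, row, col, channel)
    return data.reshape(-1, 3, 32, 32).transpose(0, 2, 3, 1)


def label_name(labels, names, index):
    """Look up the class name of the sample at position index."""
    return names[int(labels[index])]


# Synthetic stand-ins for tr_data10, tr_labels10 and label_names10.
demo_data = np.arange(2 * 3072).reshape(2, 3072)
demo_labels = np.array([6, 0])
demo_names = ['airplane', 'automobile', 'bird', 'cat', 'deer',
              'dog', 'frog', 'horse', 'ship', 'truck']

images = rows_to_images(demo_data)
print(images.shape)                            # (2, 32, 32, 3)
print(label_name(demo_labels, demo_names, 0))  # frog
```

With the real arrays, `rows_to_images(tr_data10)[0]` should equal the `img1` produced in the session above.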
CIFAR-100
In CIFAR-100, the images are divided into 100 classes, and the 100 classes are further grouped into 20 superclasses. The superclasses and classes are as follows. The data is stored in the same way as in CIFAR-10.
Superclass | Classes |
---|---|
aquatic mammals | beaver, dolphin, otter, seal, whale |
fish | aquarium fish, flatfish, ray, shark, trout |
flowers | orchids, poppies, roses, sunflowers, tulips |
food containers | bottles, bowls, cans, cups, plates |
fruit and vegetables | apples, mushrooms, oranges, pears, sweet peppers |
household electrical devices | clock, computer keyboard, lamp, telephone, television |
household furniture | bed, chair, couch, table, wardrobe |
insects | bee, beetle, butterfly, caterpillar, cockroach |
large carnivores | bear, leopard, lion, tiger, wolf |
large man-made outdoor things | bridge, castle, house, road, skyscraper |
large natural outdoor scenes | cloud, forest, mountain, plain, sea |
large omnivores and herbivores | camel, cattle, chimpanzee, elephant, kangaroo |
medium-sized mammals | fox, porcupine, possum, raccoon, skunk |
non-insect invertebrates | crab, lobster, snail, spider, worm |
people | baby, boy, girl, man, woman |
reptiles | crocodile, dinosaur, lizard, snake, turtle |
small mammals | hamster, mouse, rabbit, shrew, squirrel |
trees | maple, oak, palm, pine, willow |
vehicles 1 | bicycle, bus, motorcycle, pickup truck, train |
vehicles 2 | lawn-mower, rocket, streetcar, tank, tractor |
The superclass label names are in clabel_names100, and the class label names are in flabel_names100.
ipython
In [6]: len(clabel_names100)
Out[6]: 20
In [7]: len(flabel_names100)
Out[7]: 100
In [8]: clabel_names100
Out[8]:
['aquatic_mammals',
'fish',
'flowers',
'food_containers',
'fruit_and_vegetables',
'household_electrical_devices',
'reptiles',
'small_mammals',
'trees',
'vehicles_1',
'vehicles_2']
In [9]: flabel_names100
Out[9]:
['apple',
'aquarium_fish',
'baby',
'bear',
'beaver',
'bed',
'bee',
'beetle',
'bicycle',
'bottle',
'willow_tree',
'wolf',
'woman',
'worm']
In [10]:
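Because every sample carries both a class label and a superclass label, the class-to-superclass relationship can be recovered from the label arrays themselves. The following is a minimal sketch; `fine_to_coarse` is a hypothetical helper, and the short lists stand in for `tr_flabels100` and `tr_clabels100`.

```python
import numpy as np


def fine_to_coarse(fine_labels, coarse_labels):
    """Build {fine label: coarse label} from paired label arrays."""
    mapping = {}
    pairs = zip(np.asarray(fine_labels).tolist(), np.asarray(coarse_labels).tolist())
    for fine, coarse in pairs:
        # Each class is expected to belong to exactly one superclass.
        if mapping.setdefault(fine, coarse) != coarse:
            raise ValueError("fine label %d maps to two superclasses" % fine)
    return mapping


# Synthetic stand-ins for tr_flabels100 / tr_clabels100.
demo_fine = [0, 1, 2, 0, 1]
demo_coarse = [4, 1, 14, 4, 1]
print(fine_to_coarse(demo_fine, demo_coarse))  # {0: 4, 1: 1, 2: 14}
```

With the real data, the resulting indices can be turned into names via `flabel_names100[fine]` and `clabel_names100[coarse]`.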