[Python] Reading data from CIFAR-10 and CIFAR-100

CIFAR-10 and CIFAR-100 are labeled subsets of the 80 Million Tiny Images dataset. Both consist of 32 x 32 color images.

Download the data from the dataset's home page: https://www.cs.toronto.edu/~kriz/cifar.html

Download "CIFAR-10 python version" and "CIFAR-100 python version" from that page and extract them to a suitable location.
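The download and extraction step can also be scripted. The following is a minimal sketch for Python 3 using only the standard library. The two archive URLs are assumptions based on the file names offered on the page above, and the helper names `download` and `extract` are made up for illustration.

```python
import os
import tarfile
import urllib.request

# Assumed archive locations; check the download page if they have moved.
CIFAR10_URL = "https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz"
CIFAR100_URL = "https://www.cs.toronto.edu/~kriz/cifar-100-python.tar.gz"


def download(url, dest_dir):
    """Download url into dest_dir (skipped if the file already exists)."""
    os.makedirs(dest_dir, exist_ok=True)
    archive_path = os.path.join(dest_dir, os.path.basename(url))
    if not os.path.exists(archive_path):
        urllib.request.urlretrieve(url, archive_path)
    return archive_path


def extract(archive_path, dest_dir):
    """Unpack a .tar.gz archive into dest_dir and return its top-level names."""
    with tarfile.open(archive_path, "r:gz") as tar:
        names = tar.getnames()
        tar.extractall(dest_dir)
    return sorted({name.split("/")[0] for name in names})
```

For example, `extract(download(CIFAR10_URL, "./data"), "./data")` would be expected to leave a `./data/cifar-10-batches-py` folder, which is the path used in `input_cifar.py`.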

`input_cifar.py`


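import sys
try:
    import cPickle as pickle  # Python 2
except ImportError:
    import pickle  # Python 3
import numpy as np
import os

def unpickle(file):
    # Load one pickled CIFAR file and return its dictionary.
    with open(file, 'rb') as fo:
        if sys.version_info[0] >= 3:
            # The files were pickled with Python 2, so an encoding is needed.
            return pickle.load(fo, encoding='latin1')
        return pickle.load(fo)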
    
def conv_data2image(data):
    return np.rollaxis(data.reshape((3,32,32)),0,3)
    
def get_cifar10(folder):
    # Each row of 'data' is one image: 32*32*3 = 3072 values.
    data_list = []
    label_list = []
    for i in range(1, 6):
        fname = os.path.join(folder, "data_batch_%d" % i)
        data_dict = unpickle(fname)
        data_list.append(data_dict['data'])
        label_list.append(data_dict['labels'])
    # Stack the five training batches into single arrays.
    tr_data = np.vstack(data_list)
    tr_labels = np.hstack(label_list)
    
    data_dict = unpickle(os.path.join(folder, 'test_batch'))
    te_data = data_dict['data']
    te_labels = np.array(data_dict['labels'])
    
    bm = unpickle(os.path.join(folder, 'batches.meta'))
    label_names = bm['label_names']
    return tr_data, tr_labels, te_data, te_labels, label_names

def get_cifar100(folder):
    train_fname = os.path.join(folder,'train')
    test_fname  = os.path.join(folder,'test')
    data_dict = unpickle(train_fname)
    train_data = data_dict['data']
    train_fine_labels = data_dict['fine_labels']
    train_coarse_labels = data_dict['coarse_labels']
    
    data_dict = unpickle(test_fname)
    test_data = data_dict['data']
    test_fine_labels = data_dict['fine_labels']
    test_coarse_labels = data_dict['coarse_labels']
    
    bm = unpickle(os.path.join(folder, 'meta'))
    clabel_names = bm['coarse_label_names']
    flabel_names = bm['fine_label_names']
    
    return train_data, np.array(train_coarse_labels), np.array(train_fine_labels), test_data, np.array(test_coarse_labels), np.array(test_fine_labels), clabel_names, flabel_names

if __name__ == '__main__':
    datapath = "./data/cifar-10-batches-py"
    datapath2 = "./data/cifar-100-python"
    
    tr_data10, tr_labels10, te_data10, te_labels10, label_names10 = get_cifar10(datapath)
    tr_data100, tr_clabels100, tr_flabels100, te_data100, te_clabels100, te_flabels100, clabel_names100, flabel_names100 = get_cifar100(datapath2)

Save the code above as input_cifar.py, create a folder named data in the same folder as input_cifar.py, and put the extracted datasets there. Running input_cifar.py then gives the following.
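Before running the script it can help to confirm that the folders contain the files the loader opens. This is a small sketch; the file names are taken from the loader code above, while the helper name `missing_files` is hypothetical.

```python
import os

# File names opened by get_cifar10 and get_cifar100 in the code above.
CIFAR10_FILES = ["data_batch_%d" % i for i in range(1, 6)] + ["test_batch", "batches.meta"]
CIFAR100_FILES = ["train", "test", "meta"]


def missing_files(folder, expected):
    """Return the expected file names that are not present in folder."""
    return [name for name in expected
            if not os.path.isfile(os.path.join(folder, name))]
```

Calling `missing_files("./data/cifar-10-batches-py", CIFAR10_FILES)` returns an empty list when nothing is missing.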

CIFAR-10

`ipython`


In [1]: %run input_cifar.py
In [2]: tr_data10.shape
Out[2]: (50000, 3072)
In [3]: tr_labels10.shape
Out[3]: (50000,)
In [4]: te_data10.shape
Out[4]: (10000, 3072)
In [5]: te_labels10.shape
Out[5]: (10000,)
In [6]: label_names10
Out[6]: 
['airplane',
 'automobile',
 'bird',
 'cat',
 'deer',
 'dog',
 'frog',
 'horse',
 'ship',
 'truck']

In both CIFAR-10 and CIFAR-100, the data is split into 50,000 training images and 10,000 test images. To extract the training image at index 0, do the following.

`ipython`


In [7]: img0 = tr_data10[0]

Each image is a 32 x 32 color image. The data is stored in planar format in the order R, G, B: the first 1024 values are the R plane, the next 1024 values are the G plane, and the final 1024 values are the B plane. Within each plane the pixels are stored in row-major order.
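The planar layout described above can be checked with a few slices. This sketch uses a synthetic 3072-value row as a stand-in for a real image, so the pixel values themselves are meaningless.

```python
import numpy as np

# Stand-in for one row of the data array: 3072 values, not a real CIFAR image.
row = (np.arange(3072) % 251).astype(np.uint8)

red = row[0:1024].reshape(32, 32)       # first 1024 values
green = row[1024:2048].reshape(32, 32)  # next 1024 values
blue = row[2048:3072].reshape(32, 32)   # last 1024 values

# Pixel at image row y, column x, as an (R, G, B) triple.
y, x = 5, 7
pixel = (red[y, x], green[y, x], blue[y, x])

# The same three values read directly from the flat row.
offset = y * 32 + x
flat_pixel = (row[offset], row[1024 + offset], row[2048 + offset])
```

Both `pixel` and `flat_pixel` hold the same three values, which is what the row-major planar layout implies.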

To display an image, note that each image is stored as a single flat row, so it has to be rearranged into a 32 x 32 x 3 array. scikit-image's imshow accepts data whose channels are interleaved per pixel (R, G, B, R, G, B, ...), so proceed as follows.

`ipython`


In [8]: img0 = img0.reshape((3,32,32))
In [9]: img0.shape
Out[9]: (3, 32, 32)
In [10]: import numpy as np
In [11]: img1 = np.rollaxis(img0, 0, 3)
In [12]: img1.shape
Out[12]: (32, 32, 3)
In [13]: from skimage import io
In [14]: io.imshow(img1)
In [15]: io.show()
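The same rearrangement can be applied to the whole data array at once instead of one image at a time. The sketch below uses a small random array as a stand-in for `tr_data10` and compares the result with the per-image `np.rollaxis` conversion from the session above.

```python
import numpy as np

# Stand-in for tr_data10: 4 fake images instead of 50000 real ones.
rng = np.random.RandomState(0)
data = rng.randint(0, 256, size=(4, 3072)).astype(np.uint8)

# (N, 3072) -> (N, 3, 32, 32) -> (N, 32, 32, 3)
images = data.reshape(-1, 3, 32, 32).transpose(0, 2, 3, 1)

# Per-image conversion, as in the session above, for comparison.
first = np.rollaxis(data[0].reshape((3, 32, 32)), 0, 3)
same = np.array_equal(images[0], first)
```

Here `images[k]` can be passed to an image viewer in the same way as `img1`.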

According to its label, image 0 is a frog, but that is hard to tell by looking at it because the image has been reduced to 32 x 32.
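Looking up the class name for a label is a plain list index. In this sketch the class names are copied from the session output above, while the `labels` array is a made-up stand-in for `tr_labels10`; only its first entry (6, frog) follows from the text.

```python
import numpy as np

# Class names as printed by the session above.
label_names = ['airplane', 'automobile', 'bird', 'cat', 'deer',
               'dog', 'frog', 'horse', 'ship', 'truck']

# Made-up stand-in for tr_labels10; only the first entry is from the text.
labels = np.array([6, 9, 9, 4])

# Translate numeric labels into class names.
names = [label_names[i] for i in labels]

# Number of images per class, one slot for each of the 10 classes.
counts = np.bincount(labels, minlength=len(label_names))
```

With the real arrays, `label_names10[tr_labels10[0]]` gives the name for image 0 in the same way.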

CIFAR-100

In CIFAR-100, the images are divided into 100 classes, and the 100 classes are further grouped into 20 superclasses. The superclasses and their classes are listed below. The data is stored in the same way as in CIFAR-10.

Superclass	Classes
aquatic mammals	beaver, dolphin, otter, seal, whale
fish	aquarium fish, flatfish, ray, shark, trout
flowers	orchids, poppies, roses, sunflowers, tulips
food containers	bottles, bowls, cans, cups, plates
fruit and vegetables	apples, mushrooms, oranges, pears, sweet peppers
household electrical devices	clock, computer keyboard, lamp, telephone, television
household furniture	bed, chair, couch, table, wardrobe
insects	bee, beetle, butterfly, caterpillar, cockroach
large carnivores	bear, leopard, lion, tiger, wolf
large man-made outdoor things	bridge, castle, house, road, skyscraper
large natural outdoor scenes	cloud, forest, mountain, plain, sea
large omnivores and herbivores	camel, cattle, chimpanzee, elephant, kangaroo
medium-sized mammals	fox, porcupine, possum, raccoon, skunk
non-insect invertebrates	crab, lobster, snail, spider, worm
people	baby, boy, girl, man, woman
reptiles	crocodile, dinosaur, lizard, snake, turtle
small mammals	hamster, mouse, rabbit, shrew, squirrel
trees	maple, oak, palm, pine, willow
vehicles 1	bicycle, bus, motorcycle, pickup truck, train
vehicles 2	lawn-mower, rocket, streetcar, tank, tractor

The superclass label names are in clabel_names100, and the class label names are in flabel_names100.

`ipython`


In [6]: len(clabel_names100)
Out[6]: 20
In [7]: len(flabel_names100)
Out[7]: 100
In [8]: clabel_names100
Out[8]: 
['aquatic_mammals',
 'fish',
 'flowers',
 'food_containers',
 'fruit_and_vegetables',
 'household_electrical_devices',


 'reptiles',
 'small_mammals',
 'trees',
 'vehicles_1',
 'vehicles_2']
In [9]: flabel_names100
Out[9]: 
['apple',
 'aquarium_fish',
 'baby',
 'bear',
 'beaver',
 'bed',
 'bee',
 'beetle',
 'bicycle',
 'bottle',


 'willow_tree',
 'wolf',
 'woman',
 'worm']
In [10]:
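Because every image carries both a class label and a superclass label, the class-to-superclass mapping can be recovered from the two label arrays. The sketch below uses tiny made-up stand-ins for `tr_flabels100` and `tr_clabels100`; the four names are taken from the positions visible in the session output above.

```python
import numpy as np

# Made-up stand-ins for tr_flabels100 and tr_clabels100 (one entry per image).
fine = np.array([4, 1, 4, 1, 4])
coarse = np.array([0, 1, 0, 1, 0])

# Names at these indices, as visible in the session output above.
fine_names = {1: 'aquarium_fish', 4: 'beaver'}
coarse_names = {0: 'aquatic_mammals', 1: 'fish'}

# Build {class label: superclass label} from the paired arrays.
fine_to_coarse = {}
for f, c in zip(fine.tolist(), coarse.tolist()):
    # Every class should belong to exactly one superclass.
    assert fine_to_coarse.setdefault(f, c) == c

# Readable version of the mapping.
named = {fine_names[f]: coarse_names[c] for f, c in fine_to_coarse.items()}
```

With the real arrays the resulting dictionary would be expected to have 100 entries, one per class.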