[PYTHON] [Introduction to pytorch-lightning] How to use torchvision.transforms and how to freely create your own dataset ♬

Since I want to work with a variety of data, I tried several ways of creating my own dataset, and this article summarizes what I found. Custom datasets are an indispensable technique for tasks such as denoising, colorization, and domain conversion.

This time I cover two topics: how to use the various classes in torchvision.transforms (including how to write your own transform class), and how to build your own dataset with them. The references below cover the latter half, but since I went through a lot of trial and error, I am also posting my own results.

【Reference】
① Explanation of transforms, Datasets, Dataloader of PyTorch and creation and use of a self-made Dataset
② I implemented reading a Dataset with PyTorch
③ TORCHVISION.TRANSFORMS

What I did

・ Organize transforms
・ Apply to autoencoder
・ How to make your own dataset
　① In the case of data-label
　② In the case of data1-data2-label

・ Organize transforms

The transform is defined in the constructor of the pytorch-lightning module as shown below, the data splits are set up in setup, and the transform is actually executed when the data is fetched through the DataLoader. In the code below, transforms.Normalize((0.1307,), (0.3081,)) is applied to the MNIST data. First, I would like to explain where these numbers come from; a quick check is sketched right after the code excerpt.

class LitAutoEncoder(pl.LightningModule):

    def __init__(self, data_dir='./'):
        super().__init__()
        self.data_dir = data_dir
        
        # Hardcode some dataset specific attributes
        self.num_classes = 10
        self.classes = ('0', '1', '2', '3', '4', '5', '6', '7', '8', '9')
        self.dims = (1, 28, 28)
        channels, width, height = self.dims
        self.transform = transforms.Compose([transforms.ToTensor(),
                                             transforms.Normalize((0.1307,), (0.3081,))])
        
        self.encoder = nn.Sequential(nn.Linear(28 * 28, 128), nn.ReLU(), nn.Linear(128, 32))
        self.decoder = nn.Sequential(nn.Linear(32, 128), nn.ReLU(), nn.Linear(128, 28 * 28))

    def forward(self, x):
        # in lightning, forward defines the prediction/inference actions
        embedding = self.encoder(x)
        return embedding
...
    def setup(self, stage=None): #train, val,test data split
        # Assign train/val datasets for use in dataloaders
        mnist_full = MNIST(self.data_dir, train=True, transform=self.transform)
        n_train = int(len(mnist_full)*0.8)
        n_val = len(mnist_full)-n_train
        self.mnist_train, self.mnist_val = torch.utils.data.random_split(mnist_full, [n_train, n_val])
        self.mnist_test = MNIST(self.data_dir, train=False, transform=self.transform)

    def train_dataloader(self):
        self.trainloader = DataLoader(self.mnist_train, shuffle=True, drop_last = True, batch_size=32, num_workers=0)
        # get some random training images
        return self.trainloader
...
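
As a side note, 0.1307 and 0.3081 are simply the mean and standard deviation of the MNIST training images after ToTensor scales them to [0, 1]. A minimal sketch to check this, assuming the MNIST data is (or will be) downloaded to ./ as above:

from torchvision import transforms
from torchvision.datasets import MNIST
import torch

mnist = MNIST('./', train=True, download=True, transform=transforms.ToTensor())
stacked = torch.stack([img for img, _ in mnist])    # shape (60000, 1, 28, 28), values in [0, 1]
print(stacked.mean().item(), stacked.std().item())  # roughly 0.1307 and 0.3081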

These transforms are summarized in Reference ③ above. I have not tried everything, but I did run the functions in the table below that I am likely to use.

| Function | Remarks |
| --- | --- |
| rotate(x, angle) | Rotate by the given angle |
| to_grayscale(x) | Convert to grayscale |
| vflip(x) | Flip vertically (up/down) |
| hflip(x) | Flip horizontally (left/right) |
| Resize(imageSize) | Resize to the specified size |
| Normalize(self.mean, self.std) | Normalize the image with the specified mean and standard deviation |
| Compose() | Perform a series of transformations in sequence |
| ToTensor() | Convert to a torch Tensor |
| ToPILImage() | Convert to a PIL Image |
TORCHVISION.TRANSFORMS classes;
Compose(transforms)
CenterCrop(size)
ColorJitter(brightness=0, contrast=0, saturation=0, hue=0)
FiveCrop(size)
Grayscale(num_output_channels=1)
Pad(padding, fill=0, padding_mode='constant')
RandomAffine(degrees, translate=None, scale=None, shear=None, resample=0, fillcolor=0)
RandomApply(transforms, p=0.5)
RandomCrop(size, padding=None, pad_if_needed=False, fill=0, padding_mode='constant')
RandomGrayscale(p=0.1)
RandomHorizontalFlip(p=0.5)
RandomPerspective(distortion_scale=0.5, p=0.5, interpolation=2, fill=0)
RandomResizedCrop(size, scale=(0.08, 1.0), ratio=(0.75, 1.3333333333333333), interpolation=2)
RandomRotation(degrees, resample=False, expand=False, center=None, fill=None)
RandomSizedCrop(*args, **kwargs)
RandomVerticalFlip(p=0.5)
Resize(size, interpolation=2)
TenCrop(size, vertical_flip=False)
GaussianBlur(kernel_size, sigma=(0.1, 2.0))

Transforms on PIL Image only;
RandomChoice(transforms)
RandomOrder(transforms)

Transforms on torch.*Tensor only;
LinearTransformation(transformation_matrix, mean_vector)
Normalize(mean, std, inplace=False) output[channel] = (input[channel] - mean[channel]) / std[channel]
RandomErasing(p=0.5, scale=(0.02, 0.33), ratio=(0.3, 3.3), value=0, inplace=False)
ConvertImageDtype(dtype: torch.dtype)

Conversion Transforms;
ToPILImage(mode=None)
ToTensor

Generic Transforms;
Lambda(lambd)

Functional Transforms;
Example: you can apply a functional transform with the same parameters to multiple images like this:...
Example: you can use a functional transform to build transform classes with custom behavior:...
adjust_brightness(img: torch.Tensor, brightness_factor: float) → torch.Tensor
adjust_contrast(img: torch.Tensor, contrast_factor: float) → torch.Tensor
adjust_gamma(img: torch.Tensor, gamma: float, gain: float = 1) → torch.Tensor
adjust_hue(img: torch.Tensor, hue_factor: float) → torch.Tensor
adjust_saturation(img: torch.Tensor, saturation_factor: float) → torch.Tensor
...Omitted below
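
These functional transforms take explicit parameters, which makes it easy to apply exactly the same random decision to several images (useful for input/target pairs in denoising or colorization). A minimal sketch of that pattern, written as my own illustration rather than the docs' example:

import random
import torchvision.transforms.functional as TF

def paired_random_hflip(img1, img2, p=0.5):
    # Hypothetical helper: flip both images with the same random decision
    if random.random() < p:
        return TF.hflip(img1), TF.hflip(img2)
    return img1, img2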

I will post the code as a bonus at the end. For how to write a transform class, refer to Reference ④ below. The execution results of various transforms are posted in Reference ⑤. For how to add Gaussian noise, refer to Reference ⑥; the same code is also posted in Reference ⑦. Reference ⑤ also explains that you can use your own transform function via transforms.Lambda(function name), although I do not use it in the main code this time.

from PIL import Image, ImageFilter
from torchvision import transforms

img = Image.open("sample.jpg")

def blur(img):
    """Apply a blur filter."""
    return img.filter(ImageFilter.BLUR)

transform = transforms.Lambda(blur)
img = transform(img)
img

【Reference】
④ vision/docs/source/transforms.rst
⑤ Pytorch – Transform summary that can be used with torchvision
⑥ How to add noise to MNIST dataset when using pytorch
Therefore, Reference ⑦ below can easily be run as a sample of augmentation.
⑦ Pytorch Image Augmentation using Transforms

・ Apply to autoencoder

The pytorch-lightning code is below. In the code below the image is not resized (it stays 32x32), but you can resize it if you also change the network.

Application to autoencoder
class LitAutoEncoder(pl.LightningModule):

    def __init__(self, data_dir='./'):
        super().__init__()
        self.data_dir = data_dir
        
        # Hardcode some dataset specific attributes
        self.num_classes = 10
        self.classes = ('plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck')
        #self.classes = ('0', '1', '2', '3', '4', '5', '6', '7', '8', '9')
        self.dims = (3, 32, 32)
        self.mean = [0.5,0.5,0.5] #[0.485, 0.456, 0.406] #[0.5,0.5,0.5]
        self.std  = [0.25,0.25,0.25] #[0.229, 0.224, 0.225] #[0.5,0.5,0.5]
        self.imageSize = (32,32)
        self.p=0.5
        self.scale=(0.01, 0.05) #(0.02, 0.33)
        self.ratio=(0.3, 0.3) #(0.3, 3.3)
        self.value=0
        self.inplace=False
        #channels, width, height = self.dims
        self.transform = transforms.Compose([
            transforms.Resize(self.imageSize), #Image resizing
            transforms.ToTensor(),
            transforms.Normalize(self.mean, self.std),
            transforms.RandomErasing(p=self.p, scale=self.scale, ratio=self.ratio, value=self.value, inplace=self.inplace),
            MyAddGaussianNoise(0., 0.5)
        ])
        self.encoder = Encoder()
        self.decoder = Decoder()

    def forward(self, x):
        # in lightning, forward defines the prediction/inference actions
        embedding = self.encoder(x)
        return embedding
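
Encoder() and Decoder() are defined elsewhere in the full code and are not shown here. As a rough idea only (my assumption, not the author's actual network), a minimal fully connected pair for 3x32x32 inputs could look like this:

import torch.nn as nn

class Encoder(nn.Module):
    # Hypothetical encoder: flatten a 3x32x32 image and compress it to a 32-dim embedding
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Flatten(),
                                 nn.Linear(3 * 32 * 32, 128), nn.ReLU(), nn.Linear(128, 32))

    def forward(self, x):
        return self.net(x)

class Decoder(nn.Module):
    # Hypothetical decoder: map the 32-dim embedding back to a 3x32x32 image
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(32, 128), nn.ReLU(), nn.Linear(128, 3 * 32 * 32))

    def forward(self, z):
        return self.net(z).view(-1, 3, 32, 32)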

Result: both are the output after 1 epoch, but the output image is better with the noise added.

| | No processing | After applying the transforms in the Compose above |
| --- | --- | --- |
| transforms | ToTensor(), Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)) | Resize(self.imageSize), ToTensor(), Normalize(self.mean, self.std), RandomErasing(...), MyAddGaussianNoise(0., 0.5) |
| input | original_images_cifar10_32_1.png | original_images_cifar10_32_3.png |
| output | original_autoencode_preds_cifar10_32_1_original.png | original_autoencode_preds_cifar10_32_1.png |

・ How to make your own dataset

As shown above, if you want to use a publicly available dataset, you can simply download it into your own directory and have the transform applied when the data is read, as in cifar10_full = CIFAR10(self.data_dir, train=True, transform=self.transform). With your own data, however, you have to start from reading the files and images according to their format.

Normal dataset, Dataloader usage code
    def prepare_data(self):
        # download
        CIFAR10(self.data_dir, train=True, download=True)
        CIFAR10(self.data_dir, train=False, download=True)

    def setup(self, stage=None): #train, val,test data split
        # Assign train/val datasets for use in dataloaders
        cifar10_full = CIFAR10(self.data_dir, train=True, transform=self.transform)
        n_train = int(len(cifar10_full)*0.8)
        n_val = len(cifar10_full)-n_train
        self.cifar10_train, self.cifar10_val = torch.utils.data.random_split(cifar10_full, [n_train, n_val])
        self.cifar10_test = CIFAR10(self.data_dir, train=False, transform=self.transform)
    
    def train_dataloader(self):
        self.trainloader = DataLoader(self.cifar10_train, shuffle=True, drop_last = True, batch_size=32, num_workers=0)
        # get some random training images
        return self.trainloader
    
    def val_dataloader(self):
        return DataLoader(self.cifar10_val, shuffle=False, batch_size=32, num_workers=0)
    
    def test_dataloader(self):
        self.testloader = DataLoader(self.cifar10_test, shuffle=False, batch_size=32, num_workers=0)
        return self.testloader
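
With prepare_data, setup, and the dataloader hooks defined on the LightningModule, training is started through the Trainer. A minimal sketch, assuming the parts of LitAutoEncoder omitted above (training_step, configure_optimizers, etc.) are defined as usual:

import pytorch_lightning as pl

model = LitAutoEncoder()            # uses the prepare_data/setup/dataloader hooks above
trainer = pl.Trainer(max_epochs=1)  # add accelerator/device options as needed
trainer.fit(model)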

#### ① In the case of data-label

First of all, the basics, as in Reference ②, are important. For the earlier mediapipe learning article, I created and used the following dataset. It reads the data from a csv file, converts it to coordinates, and provides out_data together with its class label, out_label.

dataset code for previous mediapipe_hands data
import numpy as np
import pandas as pd
import torch

class HandsDataset(torch.utils.data.Dataset):
    def __init__(self, data_num, transform=None):
        self.transform = transform
        self.data_num = data_num
        self.data = []
        self.label = []
        df = pd.read_csv('./hands/sample_hands7.csv', sep=',')
        print(df.head(3)) #Data confirmation
        df = df.astype(int)
        x = []
        for j in range(self.data_num):
            x_ = []
            for i in range(0,21,1):
                x__ = [df['{}'.format(2*i)][j],df['{}'.format(2*i+1)][j]]
                x_.append(x__)
            x.append(x_)
        y = df['42'][:self.data_num]

        # The float() and long() conversions below are the key point here
        self.data = torch.from_numpy(np.array(x)).float()
        print(self.data)
        self.label = torch.from_numpy(np.array(y)).long()
        print(self.label)

    def __len__(self):
        return self.data_num

    def __getitem__(self, idx):
        out_data = self.data[idx]
        out_label =  self.label[idx]
        if self.transform:
            out_data = self.transform(out_data)
        return out_data, out_label
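
A minimal usage sketch (my addition; data_num and batch_size are arbitrary, and it assumes ./hands/sample_hands7.csv exists):

from torch.utils.data import DataLoader

dataset = HandsDataset(data_num=100)
loader = DataLoader(dataset, batch_size=8, shuffle=True)
for out_data, out_label in loader:
    print(out_data.shape, out_label.shape)  # e.g. torch.Size([8, 21, 2]) and torch.Size([8])
    break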

This time, I show how to provide my own image data as a custom dataset. The result is as follows.

dataset code for your own image data
import glob
import os
import numpy as np
import torch
import matplotlib.pyplot as plt
from PIL import Image
from torchvision import transforms

class ImageDataset(torch.utils.data.Dataset):

    def __init__(self, data_num, transform=None):
        self.transform = transform
        self.data_num = data_num
        self.data = []
        self.label = []
        x = []
        y = []
        from_dir = './face/mayuyu/'
        sk = 0
        for path in glob.glob(os.path.join(from_dir, '*.jpg')):    
            image = Image.open(path)
            x.append(np.array(image)/255.)
            y.append(sk)
            sk += 1
        
        self.data = torch.from_numpy(np.array(x)).float()  # NOTE: shape (N, H, W, 3); transforms that expect (C, H, W), such as Normalize, need a permute first
        self.label = torch.from_numpy(np.array(y)).long()

    def __len__(self):
        return self.data_num

    def __getitem__(self, idx):
        out_data = self.data[idx]
        out_label =  self.label[idx]
        if self.transform:
            out_data = self.transform(out_data)
        return out_data, out_label

mean, std = [0.5,0.5,0.5], [0.25,0.25,0.25]
model = ImageDataset(10, transform = transforms.Normalize(mean, std))
for i in range(10):
    image =  model.data[i]
    print(model.label[i], image)
    plt.title('label_{}'.format(model.label[i]))
    plt.imshow(image)
    plt.pause(1)
    plt.close()
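
Note that the loop above reads model.data[i] directly, so the Normalize transform passed to the dataset is never actually applied; also, the images are stored as (H, W, 3) while transforms.Normalize expects channel-first (C, H, W). A minimal sketch of one way to apply it, as an illustration only rather than the original code:

img = model.data[0].permute(2, 0, 1)                 # (H, W, 3) -> (3, H, W)
normalized = transforms.Normalize(mean, std)(img)
plt.imshow(normalized.permute(1, 2, 0).clamp(0, 1))  # back to (H, W, 3) for display
plt.title('normalized label_{}'.format(model.label[0]))
plt.pause(1)
plt.close()
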
#### ② In the case of data1-data2-label

This code downloads the so-called cifar10 dataset, converts it to grayscale, and provides it as a dataset together with the original color image. Since there are of course situations where the original label is needed, it is provided at the same time as well. The basic dataset code that outputs the gray image as out_data and the color image as out_label is shown in Reference ① above; here the original label is output as well, and I tried to make the code as easy to understand as possible. When creating the dataset you specify the number of data items, and the DataLoader then takes them out batch by batch after processing with trans1 and trans2. In other words, as shown in the execution result, the following code generates 32 data items and takes them out 4 at a time.

dataset = ImageDataset(32,transform1 = trans1, transform2 = trans2)
testloader = DataLoader(dataset, batch_size=4, shuffle=True, num_workers=0)
Dataset code that provides the processed and unprocessed CIFAR-10 data together with the label
import numpy as np
import torch
import torchvision
from torch.utils.data import DataLoader, random_split
from torchvision import transforms
import cv2
import matplotlib.pyplot as plt
from torchvision.datasets import CIFAR10
from PIL import Image

class ImageDataset(torch.utils.data.Dataset):

    def __init__(self, data_num, transform1 = None, transform2 = None,train = True):

        self.transform1 = transform1
        self.transform2 = transform2
        self.ts = torchvision.transforms.ToPILImage()
        self.ts2 = transforms.ToTensor()
        
        self.data_dir = './'
        self.data_num = data_num
        self.data = []
        self.label = []

        # download
        CIFAR10(self.data_dir, train=True, download=True)
        CIFAR10(self.data_dir, train=False, download=True)
        self.data = CIFAR10(self.data_dir, train=True, transform=self.ts2)

    def __len__(self):
        return self.data_num

    def __getitem__(self, idx):
        out_data = self.ts(self.data[idx][0])
        out_label =  np.array(self.data[idx][1])
        if self.transform1:
            out_data1 = self.transform1(out_data)
        if self.transform2:
            out_data2 = self.transform2(out_data)
        return out_data1, out_data2, out_label

trans1 = torchvision.transforms.ToTensor()
trans2 = torchvision.transforms.Compose([torchvision.transforms.Grayscale(), torchvision.transforms.ToTensor()])

dataset = ImageDataset(32,transform1 = trans1, transform2 = trans2)
testloader = DataLoader(dataset, batch_size=4,
                            shuffle=True, num_workers=0)

ts = torchvision.transforms.ToPILImage()

for out_data1, out_data2, out_label in testloader:
    print(len(out_label),out_label)
    for i in range(len(out_label)):
        image =  out_data1[i]
        image_gray = out_data2[i]
        im = ts(image)
        im_gray = ts(image_gray)
        #print(out_label[i])
        plt.imshow(np.array(im_gray),  cmap='gray')
        plt.title('{}'.format(out_label[i]))
        plt.pause(1)
        plt.clf()
        plt.imshow(np.array(im))
        plt.title('{}'.format(out_label[i]))
        plt.pause(1)
        plt.clf()
plt.close() 
The execution result is as follows:
>python dataset_cifar10_original.py
Files already downloaded and verified
Files already downloaded and verified
4 tensor([0, 3, 2, 6], dtype=torch.int32)
tensor(0, dtype=torch.int32)
tensor(3, dtype=torch.int32)
tensor(2, dtype=torch.int32)
tensor(6, dtype=torch.int32)
4 tensor([2, 2, 9, 5], dtype=torch.int32)
tensor(2, dtype=torch.int32)
tensor(2, dtype=torch.int32)
tensor(9, dtype=torch.int32)
tensor(5, dtype=torch.int32)
4 tensor([3, 6, 1, 7], dtype=torch.int32)
tensor(3, dtype=torch.int32)
tensor(6, dtype=torch.int32)
tensor(1, dtype=torch.int32)
tensor(7, dtype=torch.int32)
4 tensor([3, 9, 4, 9], dtype=torch.int32)
tensor(3, dtype=torch.int32)
tensor(9, dtype=torch.int32)
tensor(4, dtype=torch.int32)
tensor(9, dtype=torch.int32)
4 tensor([7, 8, 4, 4], dtype=torch.int32)
tensor(7, dtype=torch.int32)
tensor(8, dtype=torch.int32)
tensor(4, dtype=torch.int32)
tensor(4, dtype=torch.int32)
4 tensor([6, 7, 9, 0], dtype=torch.int32)
tensor(6, dtype=torch.int32)
tensor(7, dtype=torch.int32)
tensor(9, dtype=torch.int32)
tensor(0, dtype=torch.int32)
4 tensor([4, 1, 9, 2], dtype=torch.int32)
tensor(4, dtype=torch.int32)
tensor(1, dtype=torch.int32)
tensor(9, dtype=torch.int32)
tensor(2, dtype=torch.int32)
4 tensor([6, 9, 6, 3], dtype=torch.int32)
tensor(6, dtype=torch.int32)
tensor(9, dtype=torch.int32)
tensor(6, dtype=torch.int32)
tensor(3, dtype=torch.int32)
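
To plug this two-output dataset into the pytorch-lightning module shown earlier, setup() could build it in place of the plain CIFAR10 dataset. A minimal sketch under that assumption (reusing trans1/trans2 from above; the data count and split ratio are arbitrary):

    def setup(self, stage=None):
        full = ImageDataset(1000, transform1=trans1, transform2=trans2)
        n_train = int(len(full) * 0.8)
        self.train_set, self.val_set = random_split(full, [n_train, len(full) - n_train])

    def train_dataloader(self):
        return DataLoader(self.train_set, shuffle=True, drop_last=True, batch_size=32, num_workers=0)
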
### Summary

・ I played with transforms.
・ I made my own dataset and played with it.
・ I can now create my own dataset from my own data.
・ I learned how to use a Dataset and DataLoader that apply various kinds of processing and provide the resulting data at the same time.

・ I want to use this to create new training and application programs for denoising, colorization, image enlargement, image composition, etc.

Bonus

import torchvision.transforms.functional as TF
import random
import matplotlib.pyplot as plt
import cv2
from PIL import Image
import numpy as np
import torch
import torchvision

class MyRotationTransform:
    """Rotate by one of the given angles."""

    def __init__(self, angles):
        self.angles = angles

    def __call__(self, x):
        angle = random.choice(self.angles)
        return TF.rotate(x, angle)
    
class MyGrayscaleTransform:
    """GrayScale by this class."""

    def __init__(self):
        pass

    def __call__(self, x):
        #return TF.rgb_to_grayscale(x)
        return TF.to_grayscale(x)
    
class MyVflipTransform:
    """Vertical flip by this class."""

    def __init__(self):
        pass

    def __call__(self, x):
        return TF.vflip(x)    

class MyHflipTransform:
    """Vertical flip by this class."""

    def __init__(self):
        pass

    def __call__(self, x):
        return TF.hflip(x)   

from torchvision import transforms    
class MyNormalizeTransform:
    """normalization by the image."""

    def __init__(self):
        self.imageSize = (512,512)
        self.mean = [0.485, 0.456, 0.406]
        self.std  = [0.229, 0.224, 0.225]
        
    def __call__(self, x):
        self.transform = transforms.Compose([
            transforms.Resize(self.imageSize), # Image resizing
            transforms.ToTensor(), # Tensorization
            transforms.Normalize(self.mean, self.std), # Standardization
        ])
        return self.transform(x)
    
class MyErasingTransform:
    """normalization by the image."""

    def __init__(self):
        self.imageSize = (512,512)
        self.p=0.5
        self.scale=(0.02, 0.33)
        self.ratio=(0.3, 3.3)
        self.value=0
        self.inplace=False
        
    def __call__(self, x):
        self.transform = transforms.Compose([
            transforms.Resize(self.imageSize), #Image resizing
            transforms.ToTensor(), #Tensorization
            transforms.RandomErasing(p=self.p, scale=self.scale, ratio=self.ratio, value=self.value, inplace=self.inplace)
        ])
        return self.transform(x)     

class MyAddGaussianNoise(object):
    def __init__(self, mean=0., std=0.1):
        self.std = std
        self.mean = mean
        
    def __call__(self, tensor):
        return tensor + torch.randn(tensor.size()) * self.std + self.mean
    
    def __repr__(self):
        return self.__class__.__name__ + '(mean={0}, std={1})'.format(self.mean, self.std)  
    
trans2 = torchvision.transforms.Compose([torchvision.transforms.Grayscale(), torchvision.transforms.ToTensor()])
ts = torchvision.transforms.ToPILImage()

trans3 = MyGrayscaleTransform()
trans4 = MyHflipTransform()
trans5 = MyNormalizeTransform()
trans6 = MyErasingTransform()
trans7 = transforms.Compose([
        transforms.ToTensor(),
        #transforms.Normalize((0.1307,), (0.3081,)),
        MyAddGaussianNoise(0., 0.1)
        ])

angle_list =[i for i in range(-10,10,1)] #[-30, -15, 0, 15, 30]
rotation_transform = MyRotationTransform(angles=angle_list)

x = Image.open('./face/mayuyu/2.jpg')
while 1:
    y = rotation_transform(x)
    #z = trans5(x)
    z = trans7(y)
    plt.imshow(ts(z))
    plt.pause(0.1)
    #z = trans3(x)
    #plt.imshow(z,  cmap='gray')
    #plt.pause(0.1)
    #plt.imshow(np.array(ts(trans2(y))),  cmap='gray')
    #plt.pause(0.1)
    plt.clf()
