TL;DR
When the images are stored in a directory structure like this:

```
.
└── my_img_path
    ├── hogehoge
    │   ├── img_A.png
    │   └── img_B.png
    ├── fugafuga
    │   ├── img_A.png
    │   └── img_B.png
    ...
    └── piyopiyo
        ├── img_A.png
        └── img_B.png
```
```python
from torch.utils.data import DataLoader, Dataset
from torchvision import transforms
from PIL import Image
import torch
import glob
import os

# Create the dataset
class PairImgs(Dataset):  # inherit torch.utils.data.Dataset
    """
    self.img_paths : path to the directory one level above the folders containing the image pairs
    self.imgs_list : list of all folders containing image pairs
    self.transform : the specified transform
    """
    def __init__(self, img_dir, transform):
        # Pass the parent of the directories that contain the image pairs
        self.img_paths = img_dir
        self.imgs_list = glob.glob(os.path.join(self.img_paths, "*"))
        self.transform = transform

    def __getitem__(self, index):
        # Return the pair of files under the directory selected by index
        # Load the images as PIL images; rename the files as you like
        img_A = Image.open(os.path.join(self.imgs_list[index], "img_A.png"))
        img_B = Image.open(os.path.join(self.imgs_list[index], "img_B.png"))
        if self.transform is not None:
            # Apply preprocessing if any; typically transforms.ToTensor() converts to a Tensor
            img_A = self.transform(img_A)
            img_B = self.transform(img_B)
        return img_A, img_B

    def __len__(self):
        # Number of folders containing image pairs = size of the dataset
        return len(glob.glob(os.path.join(self.img_paths, "*")))
```
```python
# When using it

# Create the dataset
data_set = PairImgs("./my_img_path", transform=transforms.ToTensor())

# (If necessary) split the dataset for training and testing
train_size = int(len(data_set) * 0.8)   # training data: 80% of the whole
test_size = len(data_set) - train_size  # test data: the remainder
train_data, test_data = torch.utils.data.random_split(
    data_set,
    [train_size, test_size],
    generator=torch.Generator().manual_seed(0)  # fix the random seed
)

# Feed the datasets into DataLoaders
batch_size = 32  # the batch size can be chosen freely
# Shuffle the training data; do not shuffle the test data
train_loader = DataLoader(train_data, batch_size=batch_size,
                          shuffle=True, num_workers=2)
test_loader = DataLoader(test_data, batch_size=batch_size,
                         shuffle=False, num_workers=2)
```
A slot in the PyTorch Advent Calendar 2020 was open, so I wrote this in a hurry.

This article is about handling image pairs with PyTorch. An **image pair** here means two related images (for example, an input image and its teacher image) that need to be fetched together. As a concrete example, such pairs are used for image-to-image conversion tasks like SRCNN.

This is basically based on @mathlive's Explanation of PyTorch's transforms, Datasets, and DataLoader and Creating and using your own Dataset.
```python
class PairImgs(Dataset):  # inherit torch.utils.data.Dataset
    """
    self.img_paths : path to the directory one level above the folders containing the image pairs
    self.imgs_list : list of all folders containing image pairs
    self.transform : the specified transform
    """
    def __init__(self, img_dir, transform):
        # Pass the parent of the directories that contain the image pairs
        self.img_paths = img_dir
        self.imgs_list = glob.glob(os.path.join(self.img_paths, "*"))
        self.transform = transform
    # --- snip ---
```
Pass ** "parent directory of the folder containing the image pair" ** as an argument to the constructor. This time it will be "./My_img_path "
.
Execute glob.glob ("./ my_img_path/* ")
as a process in the constructor and save the target folder this time as a list.
img_paths
will not be used anymore, it should not be necessary to save it, but it is left because an error occurred when trying to get__len__
with len (imgs_list)
. You may have made a mistake in writing something.python
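As a quick check, here is what the constructor ends up storing for the directory layout shown at the top of this article. This is illustrative only: the order of glob results is not guaranteed.

```python
# Illustrative check, assuming the my_img_path layout from the TL;DR
ds = PairImgs("./my_img_path", transform=None)
print(ds.imgs_list)
# e.g. ['./my_img_path/hogehoge', './my_img_path/fugafuga', './my_img_path/piyopiyo']
print(len(ds))  # one entry per pair folder, so 3 here
```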
```python
    # --- snip ---
    def __getitem__(self, idx):
        # Return the pair of files under the directory selected by idx
        # Load the images as PIL images; rename the files as you like
        img_A = Image.open(os.path.join(self.imgs_list[idx], "img_A.png"))
        img_B = Image.open(os.path.join(self.imgs_list[idx], "img_B.png"))
        if self.transform is not None:
            # Apply preprocessing if any; typically transforms.ToTensor() converts to a Tensor
            img_A = self.transform(img_A)
            img_B = self.transform(img_B)
        return img_A, img_B
    # --- snip ---
```
---Abbreviation---
When accessing Dataset
of torch.utils.data
, the index value of the array is sent as an argument, so the second argument idx
receives it. In other words, the folder containing the target image pair will be self.img_list [index]
, so give it the image name (" img_A.png "
) and use PIL.Image.open ()
. Load the image as PIL.
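As a sanity check (hypothetical usage, reusing the ds created above with transform=None), indexing the dataset calls __getitem__ directly and returns one pair:

```python
# Hypothetical check: ds[idx] calls __getitem__(idx)
img_A, img_B = ds[0]           # the pair from the first folder in imgs_list
print(img_A.size, img_B.size)  # PIL sizes, since transform is None here
```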
```python
    # --- snip ---
    def __len__(self):
        # Number of folders containing image pairs = size of the dataset
        return len(glob.glob(os.path.join(self.img_paths, "*")))
```
As mentioned above, len(imgs_list) should be fine, but in my environment it raised an error, so I run glob again as a workaround.
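For reference, the variant that reuses the list built in the constructor would look like the sketch below; it avoids the second glob call, although, as noted, it raised an error in my environment:

```python
class PairImgs(Dataset):
    # ... as above ...
    def __len__(self):
        # Equivalent in principle: reuse the list built in __init__
        return len(self.imgs_list)
```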
```python
# When using it

# Create the dataset
data_set = PairImgs("./my_img_path", transform=transforms.ToTensor())
# --- snip ---
```

See the reference link at the end for details on transform.
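If more preprocessing than ToTensor() is needed, several steps can be chained with transforms.Compose. A minimal sketch (the 256x256 size is an arbitrary choice for illustration):

```python
from torchvision import transforms

# Both images of a pair go through the same chain inside __getitem__
pair_transform = transforms.Compose([
    transforms.Resize((256, 256)),  # arbitrary example size
    transforms.ToTensor(),
])
data_set = PairImgs("./my_img_path", transform=pair_transform)
```

Note that deterministic transforms are safe here; a random transform would be applied to img_A and img_B independently and could desynchronize the pair.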
```python
# --- snip ---
# (If necessary) split the dataset for training and testing
train_size = int(len(data_set) * 0.8)   # training data: 80% of the whole
test_size = len(data_set) - train_size  # test data: the remainder
train_data, test_data = torch.utils.data.random_split(
    data_set,
    [train_size, test_size],
    generator=torch.Generator().manual_seed(0)  # fix the random seed
)
# --- snip ---
```
Personally, I find it easier to split the dataset just before creating the DataLoader.
Here the dataset is split into training and test data, but if you also want validation data, pass a three-element list such as [train_size, test_size, valid_size] as the second argument of torch.utils.data.random_split() and receive the return value in three variables.
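A sketch of that three-way split (the 70/20/10 ratios are arbitrary; computing the last size as the remainder avoids rounding gaps):

```python
# Arbitrary example ratios; the sizes must sum to len(data_set)
train_size = int(len(data_set) * 0.7)
test_size = int(len(data_set) * 0.2)
valid_size = len(data_set) - train_size - test_size  # remainder
train_data, test_data, valid_data = torch.utils.data.random_split(
    data_set,
    [train_size, test_size, valid_size],
    generator=torch.Generator().manual_seed(0)
)
```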
```python
# --- snip ---
# Feed the datasets into DataLoaders
batch_size = 32  # the batch size can be chosen freely
# Shuffle the training data; do not shuffle the test data
train_loader = DataLoader(train_data, batch_size=batch_size,
                          shuffle=True, num_workers=2)
test_loader = DataLoader(test_data, batch_size=batch_size,
                         shuffle=False, num_workers=2)
```
Pass the datasets split above to DataLoader and you are done.
After that, you can proceed with training while taking out one batch at a time, for example with a for statement.
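A minimal sketch of consuming the loader; model, criterion, and optimizer are placeholders that are not defined in this article:

```python
# Hypothetical training loop: model/criterion/optimizer are assumed to exist
for img_A, img_B in train_loader:
    output = model(img_A)            # e.g. img_A as the input image
    loss = criterion(output, img_B)  # e.g. img_B as the teacher image
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```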
Reference: @mathlive, Explanation of PyTorch's transforms, Datasets, and DataLoader, and Creating and using your own Dataset