TL;DR
When the images are stored in a directory structure like this:

```
.
└── my_img_path
    ├── hogehoge
    │   ├── img_A.png
    │   └── img_B.png
    ├── fugafuga
    │   ├── img_A.png
    │   └── img_B.png
    ...
    └── piyopiyo
        ├── img_A.png
        └── img_B.png
```
```python
from torch.utils.data import DataLoader, Dataset
from torchvision import transforms
from PIL import Image
import torch
import glob
import os

# Create the dataset
class PairImgs(Dataset):  # inherit torch.utils.data.Dataset
    """
    self.img_paths : path to the directory one level above the folders containing the image pairs
    self.imgs_list : list of all folders containing image pairs
    self.transform : the specified transform
    """
    def __init__(self, img_dir, transform):
        # Pass the parent of the directories that contain the image pairs
        self.img_paths = img_dir
        self.imgs_list = glob.glob(os.path.join(self.img_paths, "*"))
        self.transform = transform

    def __getitem__(self, index):
        # Return the pair of files under the directory selected by index
        # Load the images as PIL images; rename the files as you like
        img_A = Image.open(os.path.join(self.imgs_list[index], "img_A.png"))
        img_B = Image.open(os.path.join(self.imgs_list[index], "img_B.png"))
        if self.transform is not None:
            # Apply preprocessing if any; typically transforms.ToTensor() converts to a Tensor
            img_A = self.transform(img_A)
            img_B = self.transform(img_B)
        return img_A, img_B

    def __len__(self):
        # Number of folders containing image pairs = size of the dataset
        return len(glob.glob(os.path.join(self.img_paths, "*")))
```
```python
# When using it

# Create the dataset
data_set = PairImgs("./my_img_path", transform=transforms.ToTensor())

# (If necessary) split the dataset for training and testing
train_size = int(len(data_set) * 0.8)   # training data: 80% of the whole
test_size = len(data_set) - train_size  # test data: the remainder
train_data, test_data = torch.utils.data.random_split(
    data_set,
    [train_size, test_size],
    generator=torch.Generator().manual_seed(0)  # fix the random seed
)

# Feed the datasets into DataLoaders
batch_size = 32  # the batch size can be chosen freely
# Shuffle the training data; do not shuffle the test data
train_loader = DataLoader(train_data, batch_size=batch_size,
                          shuffle=True, num_workers=2)
test_loader = DataLoader(test_data, batch_size=batch_size,
                         shuffle=False, num_workers=2)
```
A slot in the PyTorch Advent Calendar 2020 was open, so I wrote this in a hurry.

This article is about handling image pairs with PyTorch. An **image pair** here means two related images (for example, an input image and its teacher image) that need to be fetched together. As a concrete example, such pairs are used for image-to-image conversion tasks like SRCNN.

This is basically based on @mathlive's Explanation of PyTorch's transforms, Datasets, and DataLoader and Creating and using your own Dataset.
```python
class PairImgs(Dataset):  # inherit torch.utils.data.Dataset
    """
    self.img_paths : path to the directory one level above the folders containing the image pairs
    self.imgs_list : list of all folders containing image pairs
    self.transform : the specified transform
    """
    def __init__(self, img_dir, transform):
        # Pass the parent of the directories that contain the image pairs
        self.img_paths = img_dir
        self.imgs_list = glob.glob(os.path.join(self.img_paths, "*"))
        self.transform = transform
    # --- snip ---
```
Pass ** "parent directory of the folder containing the image pair" ** as an argument to the constructor. This time it will be "./My_img_path "
.
Execute glob.glob ("./ my_img_path/* ")
as a process in the constructor and save the target folder this time as a list.
img_paths
will not be used anymore, it should not be necessary to save it, but it is left because an error occurred when trying to get__len__
with len (imgs_list)
. You may have made a mistake in writing something.python
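As a quick check, here is what the constructor ends up storing for the directory layout shown at the top of this article. This is illustrative only: the order of glob results is not guaranteed.

```python
# Illustrative check, assuming the my_img_path layout from the TL;DR
ds = PairImgs("./my_img_path", transform=None)
print(ds.imgs_list)
# e.g. ['./my_img_path/hogehoge', './my_img_path/fugafuga', './my_img_path/piyopiyo']
print(len(ds))  # one entry per pair folder, so 3 here
```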
```python
    # --- snip ---
    def __getitem__(self, idx):
        # Return the pair of files under the directory selected by idx
        # Load the images as PIL images; rename the files as you like
        img_A = Image.open(os.path.join(self.imgs_list[idx], "img_A.png"))
        img_B = Image.open(os.path.join(self.imgs_list[idx], "img_B.png"))
        if self.transform is not None:
            # Apply preprocessing if any; typically transforms.ToTensor() converts to a Tensor
            img_A = self.transform(img_A)
            img_B = self.transform(img_B)
        return img_A, img_B
    # --- snip ---
```
---Abbreviation---
When accessing Dataset
of torch.utils.data
, the index value of the array is sent as an argument, so the second argument idx
receives it. In other words, the folder containing the target image pair will be self.img_list [index]
, so give it the image name (" img_A.png "
) and use PIL.Image.open ()
. Load the image as PIL.
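As a sanity check (hypothetical usage, reusing the ds created above with transform=None), indexing the dataset calls __getitem__ directly and returns one pair:

```python
# Hypothetical check: ds[idx] calls __getitem__(idx)
img_A, img_B = ds[0]           # the pair from the first folder in imgs_list
print(img_A.size, img_B.size)  # PIL sizes, since transform is None here
```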
```python
    # --- snip ---
    def __len__(self):
        # Number of folders containing image pairs = size of the dataset
        return len(glob.glob(os.path.join(self.img_paths, "*")))
```
As mentioned above, len(imgs_list) should be fine, but in my environment it raised an error, so I run glob again as a workaround.
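For reference, the variant that reuses the list built in the constructor would look like the sketch below; it avoids the second glob call, although, as noted, it raised an error in my environment:

```python
class PairImgs(Dataset):
    # ... as above ...
    def __len__(self):
        # Equivalent in principle: reuse the list built in __init__
        return len(self.imgs_list)
```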
```python
# When using it

# Create the dataset
data_set = PairImgs("./my_img_path", transform=transforms.ToTensor())
# --- snip ---
```

See the reference link at the end for details on transform.
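If more preprocessing than ToTensor() is needed, several steps can be chained with transforms.Compose. A minimal sketch (the 256x256 size is an arbitrary choice for illustration):

```python
from torchvision import transforms

# Both images of a pair go through the same chain inside __getitem__
pair_transform = transforms.Compose([
    transforms.Resize((256, 256)),  # arbitrary example size
    transforms.ToTensor(),
])
data_set = PairImgs("./my_img_path", transform=pair_transform)
```

Note that deterministic transforms are safe here; a random transform would be applied to img_A and img_B independently and could desynchronize the pair.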
```python
# --- snip ---
# (If necessary) split the dataset for training and testing
train_size = int(len(data_set) * 0.8)   # training data: 80% of the whole
test_size = len(data_set) - train_size  # test data: the remainder
train_data, test_data = torch.utils.data.random_split(
    data_set,
    [train_size, test_size],
    generator=torch.Generator().manual_seed(0)  # fix the random seed
)
# --- snip ---
```
Personally, I find it easier to split the dataset just before creating the DataLoader.
Here the dataset is split into training and test data, but if you also want validation data, pass a three-element list such as [train_size, test_size, valid_size] as the second argument of torch.utils.data.random_split() and receive the return value in three variables.
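A sketch of that three-way split (the 70/20/10 ratios are arbitrary; computing the last size as the remainder avoids rounding gaps):

```python
# Arbitrary example ratios; the sizes must sum to len(data_set)
train_size = int(len(data_set) * 0.7)
test_size = int(len(data_set) * 0.2)
valid_size = len(data_set) - train_size - test_size  # remainder
train_data, test_data, valid_data = torch.utils.data.random_split(
    data_set,
    [train_size, test_size, valid_size],
    generator=torch.Generator().manual_seed(0)
)
```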
```python
# --- snip ---
# Feed the datasets into DataLoaders
batch_size = 32  # the batch size can be chosen freely
# Shuffle the training data; do not shuffle the test data
train_loader = DataLoader(train_data, batch_size=batch_size,
                          shuffle=True, num_workers=2)
test_loader = DataLoader(test_data, batch_size=batch_size,
                         shuffle=False, num_workers=2)
```
Pass the datasets split above to DataLoader and you are done.
After that, you can proceed with training while taking out one batch at a time, for example with a for statement.
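A minimal sketch of consuming the loader; model, criterion, and optimizer are placeholders that are not defined in this article:

```python
# Hypothetical training loop: model/criterion/optimizer are assumed to exist
for img_A, img_B in train_loader:
    output = model(img_A)            # e.g. img_A as the input image
    loss = criterion(output, img_B)  # e.g. img_B as the teacher image
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```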
Reference: @mathlive, Explanation of PyTorch's transforms, Datasets, and DataLoader, and Creating and using your own Dataset