In this post I summarize how to feed sequence data to PyTorch. There are surely points I haven't covered, so I would appreciate any technical guidance. What this article shows is how to read a dataset in PyTorch as chunks of fixed-length video. In particular, it targets datasets such as UCSD DATASET, which are saved not as video files but as sequentially numbered images in per-folder form.
```
DATASET/
├ train/
│ ├ img_0001.png ← 1st frame of the video
│ ├ img_0002.png ← 2nd frame of the video
│ ├ img_0003.png
│ :
└ test/
```
I wanted to do unsupervised learning with an LSTM in PyTorch, but I couldn't find a module for loading videos (as far as I searched), so I reluctantly decided to make one myself.
**The assumed flow is to first read the dataset in image form, build fixed-length videos (partial time series) from it, group them into batches, and train the LSTM on those.**
PyTorch provides the Dataset and DataLoader classes for reading training data: the data in the directory given at object creation is served up in batches each epoch, which is very convenient at training time. As explained in the reference here, there are three players involved in data loading:
transforms
--A module in charge of data preprocessing
Dataset
--A module that returns a piece of data and its corresponding label
--Applies the transforms preprocessing when returning data
DataLoader
--A module that returns data from a Dataset in batch-size chunks
In general, the flow is: set up the dataset preprocessing (standardization, resizing, and so on) in transforms, use Dataset to associate the data with labels and apply that preprocessing, and finally have DataLoader return the data in batch-size chunks. However, this works only if the dataset samples are i.i.d., which is a problem when you want to feed in **sequence data**. I wanted a module that can handle sequence data, especially video data, so I thought one up.
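For contrast, here is a minimal sketch of that standard i.i.d. flow; the path, image size, and batch size are just example values:

```python
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# transforms: preprocessing (resizing, tensor conversion, ...)
transform = transforms.Compose([
    transforms.Resize((64, 64)),
    transforms.ToTensor(),
])

# Dataset: pairs each image with its folder label and applies transform
train_set = datasets.ImageFolder("./data/train/", transform=transform)

# DataLoader: returns the data in batch-size chunks
loader = DataLoader(train_set, batch_size=32, shuffle=True)
```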
First, since the base is the Dataset class, we inherit from it: Dataset becomes the parent class (superclass), and we declare a subclass Seq_Dataset. Only the methods you want to change are redefined in Seq_Dataset. (Methods you don't redefine are inherited from the parent class as-is.)
Basically, when you extend the Dataset class, the changes you write go into `__len__` and `__getitem__`. In particular, `__getitem__` describes the processing applied to the loaded Dataset object (here, converting the image data into videos).
The flow assumed this time is: **set up preprocessing with transforms → read and process the image data with ImageFolder (a Dataset) → finally, have Seq_Dataset build fixed-length videos (partial time series), which DataLoader returns in batches**.
Below is the Seq_Dataset class that extends Dataset. I will briefly explain each method.
dataset.py
```python
import torch
from torch.utils.data import Dataset


class Seq_Dataset(Dataset):
    def __init__(self, datasets, time_steps):
        self.dataset = datasets
        self.time_steps = time_steps
        # Channel count and frame size, taken from the first sample;
        # used to allocate the video tensor in __getitem__.
        self.channels = self.dataset[0][0].size(0)
        self.img_size = self.dataset[0][0].size(1)

    def __len__(self):
        # Each sample is a window of time_steps consecutive frames,
        # so there are len(dataset) - time_steps valid start indices.
        return len(self.dataset) - self.time_steps

    def __getitem__(self, index):
        # Allocate a fresh zero-initialized tensor per call so that
        # returned windows do not share storage with one another.
        video = torch.zeros(self.time_steps, self.channels,
                            self.img_size, self.img_size)
        for i in range(self.time_steps):
            video[i] = self.dataset[index + i][0]
        # Labels are unused in this unsupervised setting; return the
        # first frame's label as a placeholder.
        img_label = self.dataset[index][1]
        return video, img_label
```
In `__init__`, we simply define the necessary variables. This time the fixed length, time_steps, is taken as an argument. The channel count and image size of a single frame are read from the first sample of the dataset; `__getitem__` uses them to allocate video, a zero-initialized tensor that stores one fixed-length partial time series of images from the dataset.
In `__len__`, we just return the total number of samples. Since the loaded image data is ultimately returned as fixed-length videos of consecutive frames, the total is len(dataset) - time_steps. For example, with 200 frames and time_steps = 10, `__len__` returns 190, so windows may start at indices 0 through 189.
In `__getitem__`, a partial time series of time_steps frames is generated, stored in video, and returned. Frame-level operations on the images could also be described here. Since this post has an unsupervised-learning background, the label is not actually needed; the code simply returns the label of the first frame as a placeholder. For proper ways of specifying labels, there are plenty of other examples to refer to. (Sorry that this is application-specific.)
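As a quick sanity check, here is a hypothetical snippet that uses 20 random grayscale frames in place of a real ImageFolder dataset (the sizes are arbitrary):

```python
import torch
from dataset import Seq_Dataset

# 20 dummy (image, label) pairs standing in for an ImageFolder dataset
frames = [(torch.rand(1, 64, 64), 0) for _ in range(20)]

sd = Seq_Dataset(frames, time_steps=10)
print(len(sd))        # 10  (= 20 - 10)
video, label = sd[0]
print(video.shape)    # torch.Size([10, 1, 64, 64])
```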
At actual training time, you would use the data_loader object and iterate over it with a for loop to train the model. The procedure for obtaining data_loader is as follows: define each variable and follow the flow ImageFolder → Seq_Dataset → DataLoader.
main.py
data_dir = "./data/train/"
time_steps = 10
num_workers = 4
dataset = datasets.ImageFolder(data_dir, transform=transform)
data_ = dataset.Seq_Dataset(dataset, time_steps)
data_loader = DataLoader(data_, batch_size=opt.batch_size, shuffle=True, num_workers=num_workers)
The partial time-series tensor that finally comes out has the shape [batch_size, time_steps, channels, img_size, img_size], which can be fed straight to an LSTM as sketched below. In the future, I would like to publish a full LSTM implementation in PyTorch that uses this self-made module.
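As a forward pointer, here is a hypothetical sketch of feeding these batches to an LSTM; the values of channels, img_size, and the hidden size of 128 are assumptions for illustration, not part of the original module:

```python
import torch.nn as nn

channels, img_size = 1, 64   # assumed values; must match your data

# nn.LSTM with batch_first=True expects [batch, seq_len, features],
# so each frame is flattened into one feature vector.
lstm = nn.LSTM(input_size=channels * img_size * img_size,
               hidden_size=128, batch_first=True)

for video, _ in data_loader:           # video: [B, time_steps, C, H, W]
    b, t = video.size(0), video.size(1)
    x = video.reshape(b, t, -1)        # -> [B, time_steps, C*H*W]
    output, (h_n, c_n) = lstm(x)       # output: [B, time_steps, 128]
    # ...compute an unsupervised loss on output and backpropagate...
```

Thank you for reading to the end.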