Regression on images comes up in tasks such as predicting a Tabelog rating from a photo of ramen or scoring a human face, but there is little material on it in either Japanese or English. I stumbled several times along the way, so I am summarizing what I learned here as a memo.
Starting from the torchvision tutorial, a regression model works with only small modifications to the existing classification model.
Reference site
Tutorial of torchvision https://medium.com/@benjamin.phillips22/simple-regression-with-neural-networks-in-pytorch-313f06910379
Image regression can reuse the classification pipeline almost as is. Three changes are required: the loss, the model's output layer, and the dataset.
Loss
The tutorial uses cross-entropy loss, so replace it with a loss meant for regression, such as MSE or Smooth L1.
# Setup the loss fxn
criterion = nn.MSELoss()
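To get a feel for how the two regression losses mentioned above differ, here is a minimal sketch. The prediction and target values are made up for illustration and are not from the original experiment.

```python
import torch
import torch.nn as nn

# Made-up predictions and targets, shape (N, 1) like a single-output regression head
pred = torch.tensor([[2.5], [0.0], [2.0]])
target = torch.tensor([[3.0], [-0.5], [5.0]])

# Mean of squared errors
mse = nn.MSELoss()(pred, target)
# Quadratic for small errors, linear for large ones (default beta=1.0)
smooth_l1 = nn.SmoothL1Loss()(pred, target)

print(f"MSE: {mse.item():.4f}, Smooth L1: {smooth_l1.item():.4f}")
```

The third sample is off by 3.0, and it dominates the MSE much more than the Smooth L1 value, so Smooth L1 may be worth trying when the labels contain outliers.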
Model
I want to stay close to the tutorial code, so I simply set the number of classes to 1.
# Number of classes in the dataset
num_classes = 1
However, with only that change, the layer width drops very abruptly at the output layer (for example, 1024 -> 1), so I replace the final layer with a small head that narrows gradually. (Example for ResNet)
# num_ftrs is the input size of the original final layer, as in the tutorial
num_ftrs = model_ft.fc.in_features
model_ft.fc = nn.Sequential(nn.Linear(num_ftrs, 256),
                            nn.LeakyReLU(),
                            nn.Linear(256, 32),
                            nn.LeakyReLU(),
                            nn.Linear(32, 1))
Dataset
PyTorch's dataset utilities are extremely convenient: if you split the training data into folders (as ImageFolder in the tutorial expects), class labels are assigned automatically. However, what I want to do this time is regression, where the labels must be numeric values such as float or int, so I write my own Dataset.
import os
from glob import glob

import numpy as np
import pandas as pd
from PIL import Image
from torch.utils.data import Dataset


class Create_Datasets(Dataset):
    def __init__(self, path, data_transform):
        self.path = path
        self.df = self.create_csv(path)
        self.data_transform = data_transform

    def create_csv(self, path):
        # Collect the base names of all .jpeg files directly under path
        image_path = []
        for file_name in glob(path + '/*.jpeg'):
            basename = os.path.basename(file_name)
            image_path.append(basename)
        df = pd.DataFrame(image_path, columns=['path'])
        '''
        Preprocessing of your choice
        (it must add the target column 'good' that __getitem__ reads)
        '''
        return df

    def __len__(self):
        return len(self.df)

    def __getitem__(self, i):
        file = self.df['path'][i]
        # Regression target for this image
        score = np.array(self.df['good'][i])
        image = Image.open(os.path.join(self.path, file))
        image = self.data_transform(image)
        return image, score
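The preprocessing step inside create_csv is left open above. As one possible sketch, the helper below attaches the 'good' column from a dictionary of scores. The function name attach_scores, the dictionary format, and the file names are all hypothetical; the original does not say where the scores come from.

```python
import pandas as pd


def attach_scores(df, scores):
    """Hypothetical helper: add a float32 'good' column from a {file name: score} dict."""
    out = df.copy()
    # Look up each file name; files without a score become NaN
    out['good'] = out['path'].map(scores).astype('float32')
    # Drop unscored images and renumber rows, because __getitem__ indexes by row label
    out = out.dropna(subset=['good']).reset_index(drop=True)
    return out


# Made-up file names and scores for illustration
df = pd.DataFrame({'path': ['a.jpeg', 'b.jpeg', 'c.jpeg']})
scores = {'a.jpeg': 3.5, 'b.jpeg': 4.1}
labeled = attach_scores(df, scores)
print(labeled)
```

Resetting the index keeps self.df['path'][i] valid for every i below len(self.df). Storing the scores as float32 here is a design choice of this sketch, not something the original code does.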
When I trained with data from the dataset above, the following error occurred in the backward pass of the loss. The code otherwise ran, but the loss would not go down, so this cost me a lot of time.
RuntimeError: Found dtype Double but expected Float
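The solution is to cast the labels to torch.float32. NumPy and pandas default to float64, so the labels arrive as a Double tensor, while the model outputs are float32.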
labels = labels.float().to(device)
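The cast above can be seen in a minimal training step, sketched below. The nn.Linear stand-in model and the sample values are assumptions for illustration. The reshape to (N, 1) is my own addition and is not in the original: nn.MSELoss broadcasts (with a warning) when the output is (N, 1) and the labels are (N,), which can give a misleading loss value.

```python
import numpy as np
import torch
import torch.nn as nn

# Stand-in for the regression model (assumption); any module with an (N, 1) output would do
model = nn.Linear(4, 1)
criterion = nn.MSELoss()
device = torch.device('cpu')

inputs = torch.randn(3, 4)
# NumPy defaults to float64, which becomes a Double tensor
labels = torch.from_numpy(np.array([3.5, 4.1, 2.8]))
dtype_before = labels.dtype

# Cast to float32 and reshape (N,) -> (N, 1) so the labels line up with the model output
labels = labels.float().view(-1, 1).to(device)

outputs = model(inputs.to(device))
loss = criterion(outputs, labels)
loss.backward()

print(dtype_before, labels.dtype, outputs.shape, labels.shape)
```

If the loss runs without errors but refuses to go down, checking that outputs and labels have the same shape may be worth a try.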
In summary, torchvision models can be used for regression almost as they are. From a quick trial run, ResNet seems to be a good match for regression. I could not find any information on these small changes, so I summarized them here.