[PYTHON] Regression with a CNN (torchvision's built-in models)

Introduction

Image-based regression has many uses, for example predicting a Tabelog rating from a photo of ramen or scoring a human face. However, there is little material on it in either Japanese or English, and I stumbled several times, so I am summarizing what I learned as a memo.

Starting from the torchvision tutorial, you can get a working model just by tweaking the existing classification model.

Reference site

Tutorial of torchvision https://medium.com/@benjamin.phillips22/simple-regression-with-neural-networks-in-pytorch-313f06910379

Changes from classification

Image regression can reuse the classification pipeline almost unchanged. Only the following three changes are required.

- Loss
- Output layer
- Granting correct answer data

Loss

The tutorial uses cross-entropy loss, so replace it with a loss suited to regression, such as MSE or Smooth L1.

# Setup the loss fxn
criterion = nn.MSELoss()
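
To get a feel for how the two losses differ, here is a minimal sketch on made-up numbers (the tensors are illustrative and not taken from the tutorial). With one large error in the batch, MSE grows quadratically, while Smooth L1 grows only linearly once the error exceeds its threshold (1.0 by default).

```python
import torch
import torch.nn as nn

# Hypothetical predictions and targets; the last sample is an outlier.
preds = torch.tensor([1.0, 2.0, 10.0])
targets = torch.tensor([1.5, 2.0, 4.0])

# Both losses average over the batch by default (reduction='mean').
mse = nn.MSELoss()(preds, targets)
smooth_l1 = nn.SmoothL1Loss()(preds, targets)

print(f"MSE:       {mse.item():.4f}")
print(f"Smooth L1: {smooth_l1.item():.4f}")
```

Which loss works better likely depends on how noisy the labels are, so it is probably worth trying both.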

Output layer

Since I want to stay as close to the tutorial as possible, the only change to this setting is to make the number of classes 1.

# Number of classes in the dataset
num_classes = 1

However, with only the change above, the output layer narrows very abruptly, for example 1024 -> 1, so I replace the final layer with a head that narrows gradually, as follows (example for ResNet).

num_ftrs = model_ft.fc.in_features  # input size of the original final layer
model_ft.fc = nn.Sequential(
    nn.Linear(num_ftrs, 256),
    nn.LeakyReLU(),
    nn.Linear(256, 32),
    nn.LeakyReLU(),
    nn.Linear(32, 1),
)

Granting correct answer data

PyTorch's dataset utilities are almost too convenient: if you sort the training images into folders, the class labels are assigned automatically. However, this time the goal is regression, which needs numeric targets such as float or int, so I write my own Dataset.

There is plenty of information available on writing your own PyTorch Dataset, so I leave the details to those resources.

import os
from glob import glob

import numpy as np
import pandas as pd
from PIL import Image
from torch.utils.data import Dataset


class Create_Datasets(Dataset):
    def __init__(self, path, data_transform):
        self.path = path
        self.df = self.create_csv(path)
        self.data_transform = data_transform

    def create_csv(self, path):
        # Collect the file names of all JPEG images in the folder
        image_path = []
        for file_name in glob(path + '/*.jpeg'):
            basename = os.path.basename(file_name)
            image_path.append(basename)

        df = pd.DataFrame(image_path, columns=['path'])

        # Preprocessing of your choice goes here.
        # It must add the target column 'good' that __getitem__ reads.

        return df

    def __len__(self):
        return len(self.df)

    def __getitem__(self, i):
        file = self.df['path'][i]
        # Note: a float score becomes a float64 array here
        score = np.array(self.df['good'][i])
        image = Image.open(os.path.join(self.path, file))
        image = self.data_transform(image)

        return image, score
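
The preprocessing step is left open in the article. One possible way to fill it in, sketched here purely as an assumption, is to join the file list with a separate table of scores. The helper name `attach_scores` and the contents of `scores` are hypothetical; only the column names `path` and `good` come from the dataset code.

```python
import pandas as pd


def attach_scores(df, scores):
    """Keep only images that have a score and add the 'good' column."""
    merged = df.merge(scores[['path', 'good']], on='path', how='inner')
    # A clean 0..n-1 index keeps self.df['good'][i] valid for every i.
    return merged.reset_index(drop=True)


# Hypothetical file list, as create_csv would build it.
df = pd.DataFrame({'path': ['a.jpeg', 'b.jpeg', 'c.jpeg']})
# Hypothetical score table, e.g. loaded from a CSV file.
scores = pd.DataFrame({'path': ['a.jpeg', 'b.jpeg'], 'good': [3.5, 4.1]})

labeled = attach_scores(df, scores)
print(labeled)
```

The inner join drops images without a score, so `__len__` and `__getitem__` only ever see rows that have a target.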

Pitfalls I personally fell into

When training with data from the dataset above, the following error was raised in the backward pass of the loss. The rest of the code ran, but the loss would not go down, and it took me a lot of time to find the cause.

RuntimeError: Found dtype Double but expected Float

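The solution is to cast the labels to torch.float32. The score is created with np.array, which gives float64 (Double in PyTorch) for a float value, while the model's output is float32.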

labels = labels.float().to(device)
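
The dtype mismatch can be reproduced without any images. The sketch below uses made-up scores and a stand-in tensor for the model output, both of which are assumptions for illustration. It also reshapes the labels to (N, 1) so they match a head that ends in a single output unit, which avoids unintended broadcasting inside the loss.

```python
import numpy as np
import torch
import torch.nn as nn

# Scores as the dataset returns them: NumPy float64 values.
scores = np.array([3.5, 4.0])
labels = torch.as_tensor(scores)
print(labels.dtype)  # torch.float64, i.e. Double

# Cast to float32 and match the (N, 1) shape of the model output.
labels32 = labels.float().unsqueeze(1)
print(labels32.dtype, labels32.shape)

# Stand-in for the model output: float32, shape (N, 1).
outputs = torch.zeros(2, 1, requires_grad=True)

loss = nn.MSELoss()(outputs, labels32)
loss.backward()  # runs without a dtype error
print(loss.item())
```

Doing the cast once, right after the batch is loaded, keeps the rest of the training loop unchanged.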

In closing

As shown, torchvision models can be used for regression almost as they are. From some light experimentation, ResNet seems to be a good match for regression. The changes are small, but I could not find them written down anywhere, so I summarized them here.