When I read the article below, I thought, "I wish I could understand this content and implement it myself."
Counting cars from aerial photographs with Deep Learning
I learned PyTorch with the goal of being able to implement this myself. Since I managed to do so to some extent, I used the model I built to estimate the number of cars in a publicly released satellite image and produced a car distribution map in the same way.
Copyright © 2016 DigitalGlobe.
We will walk through acquiring the image data for training and validation, building the Dataset and DataLoader for modeling with PyTorch, training, validation, and a demonstration on an image taken by an artificial satellite. As with How to create aerial image building segmentation by PyTorch, this article is intended for those who are new to PyTorch and image classification, so it goes into considerable detail. It is therefore long; if you already know PyTorch, please skim only the main points.
The code used here (Jupyter Lab) is available on GitHub. If you would like to try it in your own environment (including Google Colaboratory), please download it from the following: pytorch_car_counting
The implementation environment of this article is as follows.
OS: Ubuntu 18.04 LTS
GPU: GeForce GTX 1070
Python: 3.7
PyTorch: 1.1.0
Object detection is the first method that comes to mind for identifying objects in an image. This technology first detects the presence and position of objects in the image, and then classifies them. The technology is introduced in detail at the following site, for example.
Object detection in deep learning
For the aerial photographs used this time (COWC: Cars Overhead With Context), a [method using object detection](https://arthurdouillard.com/post/nato-challenge/) is introduced as a learning example.
At first, I thought about implementing a standard object detection method (SSD) in PyTorch and counting the cars in the image, but [this article](https://qiita.com/motokimura/items/d155d532a5f1dd02089c) got me interested in a different approach: treat the number of cars in an image as a pattern (texture) of the image, and count cars by classifying each image according to its car count. Depending on the service or application you are building, you may not need the position of each car; it can be enough to know the number of cars in each grid cell and obtain the distribution. Also, although we target the number of cars this time, you can create a similar distribution map by changing the target. More than anything, though, what attracted me was the appeal and effectiveness of "reducing annotation costs". It's an interesting idea.
Now, prepare the image data and annotation information (the number of cars in each image) for training and validation.
I intended to download the aerial photograph data and annotation data from the COWC site and prepare the training and validation data myself, but effective data had already been prepared, so I used that. Thank you very much.
Please see this article for information on how to download COWC data and how to preprocess it.
Here, we will introduce the processing for building a model with PyTorch after the above preprocessing. The code for the entire preprocessing pipeline, from downloading the COWC data onward, has also been uploaded to GitHub, so please refer to that as well.
First, import each module.
import argparse
import os
import shutil
import math
import numpy as np
from PIL import Image
from skimage import io
from tqdm import tqdm
import matplotlib.pyplot as plt
import cv2
Image.MAX_IMAGE_PIXELS = 1000000000
After that, check the acquired image.
train_path = '../../data/cowc_processed/train_val/crop/train/'
files = os.listdir(train_path)
#Get a file name
print(files[0])
#Read the first training file
im = cv2.imread(train_path + files[0])
im_rgb = cv2.cvtColor(im, cv2.COLOR_BGR2RGB)
plt.imshow(im_rgb)
From this, you can see that each file consists of the aerial photograph in its left half and the corresponding annotation image (white dots mark the car positions) in its right half.
Next, prepare the data of the pair of the aerial photograph and the corresponding number of cars.
#Check image size
v_size = im_rgb.shape[0]
h_size = im_rgb.shape[1]
print(h_size)
#The image is divided into a left image and an annotation image on the right.
clp_l = im_rgb[0:v_size, 0:h_size//2]
clp_r = im_rgb[0:v_size, h_size//2:h_size]
#Confirmation of captured image
plt.imshow(clp_l)
#Confirmation of annotation image
plt.imshow(clp_r)
Estimate the number of cars from the separated annotation images.
#Calculate the number of cars from the integrated pixel value of the annotation image (one car = one white dot = 255 x 3 channels = 765)
car_count = int(np.sum(clp_r) // 765)
print('Number of cars: ', car_count)
Estimating the number of cars from the signal strength of the annotation image gives the following output. output
Number of cars: 6
Using the above method, all of the training (train) and validation (val) images are split, and the number of cars is computed from each annotation image. Each split aerial photograph is then stored in a folder (directory) named after its car count.
train_path = '../../data/cowc_processed/train_val/crop/train/'
files = os.listdir(train_path)
path_train = '../../../data/train/'
for i in range(len(files)):
    im = cv2.imread(train_path + files[i])
    im_rgb = cv2.cvtColor(im, cv2.COLOR_BGR2RGB)
    #Check the image size
    v_size = im_rgb.shape[0]
    h_size = im_rgb.shape[1]
    #Split into the photograph (left) and the annotation image (right)
    clp_l = im_rgb[0:v_size, 0:h_size//2]
    clp_r = im_rgb[0:v_size, h_size//2:h_size]
    car_count = int(np.sum(clp_r) // 765)
    #Create the destination directory if it does not exist
    if not os.path.exists(path_train + str(car_count)):
        os.mkdir(path_train + str(car_count))
    #Save the left (photograph) half
    Image.fromarray(clp_l).save(path_train + str(car_count) + '/' + files[i])
From the processed data, check the distribution of the number of cars and the number of data for learning data.
#Number of classes (the distribution of car counts)
print('Distribution of train numbers: ', len(os.listdir(path_train)))
#Number of training images
import glob
print('Number of train data: ', len(glob.glob(path_train + '/*/*')))
output
Distribution of train numbers: 21
Number of train data: 37981
The same processing is performed on the validation data: the images are split and stored in directories named after their car counts.
When we checked the car counts in the validation data, the maximum was 12, so in the training data we moved all images with 13 or more cars into the 12 directory, as shown below.
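The helper move_glob used below is not defined anywhere in the excerpted notebook; a minimal definition consistent with how it is called might be the following (an assumption, not the original code):
#Hypothetical helper: move every file matching src_pattern into dst_dir
import glob
import shutil
def move_glob(dst_dir, src_pattern):
    for path in glob.glob(src_pattern):
        shutil.move(path, dst_dir)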
for i in range(13, 21, 1):
    move_glob('./train/12/', './train/' + str(i) + '/*.png')
Then, to keep the class-name strings a fixed length when they are later parsed in the Dataset, single-digit directory names were padded to two digits.
#Pad the directory names to two digits
for i in range(0, 10, 1):
    os.rename('./train/' + str(i), './train/0' + str(i))
    os.rename('./val/' + str(i), './val/0' + str(i))
This completes the preparation of learning and verification data.
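At this point, the directory layout should look roughly like the following sketch (each two-digit directory name is the car count used as the class label; exact parent paths vary between the notebooks):
data/
  train/
    00/   (images containing 0 cars)
    01/
    ...
    12/   (12 or more cars)
  val/
    00/
    ...
    12/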
The image classification model was created based on the model introduced in "Learn while making! Development deep learning by PyTorch", which I used as a reference for learning PyTorch.
Book "Learn while making! Deep learning by PyTorch" (Yutaro Ogawa, Mynavi Publishing, 19/07/29)
This book carefully walks you through PyTorch's workflow, from the basic concepts up through a wide range of tasks such as image classification, object detection, segmentation, GANs, natural language processing, and video classification. It was very helpful. Of the image classification approaches introduced there, we adopt fine tuning based on the VGG-16 model. Please refer to the book for the details of fine tuning VGG-16 with PyTorch. The code from the book is also published below; if you are interested in the contents, please refer to it as well.
Learn while making! Deep learning by PyTorch MIT License
Now, create a Dataset and Dataloader for training and verification data, and build and verify the model.
Create a PyTorch Dataset and DataLoader from the aerial photograph tiles prepared in preprocessing and their corresponding car counts, in order to build the car-count prediction model.
First, import the required modules.
#Package import
import glob
import os.path as osp
import random
import numpy as np
import json
from PIL import Image
from tqdm import tqdm
import matplotlib.pyplot as plt
%matplotlib inline
import torch
import torch.nn as nn
import torch.optim as optim
import torch.utils.data as data
import torchvision
from torchvision import models, transforms
Next, the random numbers are fixed to maintain reproducibility.
#Set random number seed
torch.manual_seed(1234)
np.random.seed(1234)
random.seed(1234)
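Note that these seeds alone may not give bit-exact results on a GPU; stricter (and typically slower) reproducibility can optionally be requested from cuDNN:
#Optional: stricter GPU reproducibility (may slow training)
torch.backends.cudnn.deterministic = True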
Then check the version of PyTorch.
print(torch.__version__)
print(torchvision.__version__)
output
1.1.0
0.3.0
These are the versions in my environment.
First, create the Dataset. As mentioned earlier, the Dataset and the fine-tuned model use the settings introduced in "Learn while making! Deep learning by PyTorch". If you want to improve the accuracy of the model, try various changes on top of these settings, such as modifying the augmentation or adding filters; you will get a feel for which parameters are effective and which are not.
The Dataset is configured by the following classes and functions for both training (train) and validation (val).
First, we define the image augmentation, standardization, resizing, and conversion to Tensor needed for handling the data with PyTorch.
class ImageTransform():
    """
    Image preprocessing class. It behaves differently during training and validation.
    Resizes the image and standardizes the colors.
    During training, data augmentation is performed with RandomResizedCrop, random flips, and RandomAffine.

    Attributes
    ----------
    resize : int
        The size of the image after resizing.
    mean : (R, G, B)
        The mean of each color channel.
    std : (R, G, B)
        The standard deviation of each color channel.
    """

    def __init__(self, resize, mean, std):
        self.data_transform = {
            'train': transforms.Compose([
                transforms.RandomResizedCrop(
                    resize, scale=(0.5, 1.0)),  #Data augmentation
                transforms.RandomHorizontalFlip(),  #Data augmentation
                transforms.RandomVerticalFlip(),  #Data augmentation
                transforms.RandomAffine([-30, 30], scale=(0.8, 1.2)),  #Rotate and rescale
                #transforms.RandomErasing(p=0.5),  #Randomly erase a region with probability 0.5
                transforms.ToTensor(),  #Convert to tensor
                transforms.Normalize(mean, std)  #Standardization
            ]),
            'val': transforms.Compose([
                transforms.Resize(resize),  #Resize
                #transforms.CenterCrop(resize),  #Crop a resize x resize patch from the center
                transforms.ToTensor(),  #Convert to tensor
                transforms.Normalize(mean, std)  #Standardization
            ])
        }

    def __call__(self, img, phase='train'):
        """
        Parameters
        ----------
        phase : 'train' or 'val'
            Specifies the preprocessing mode.
        """
        return self.data_transform[phase](img)
Now, how does this transformation change the image? Let's take a look at a training image.
#Check the behavior of the image preprocessing used during training
#The result changes every time this is executed
# 1. Load the image
image_file_path = '../data/train/01/03553_97_597.png'
img = Image.open(image_file_path)  # [height][width][color RGB]
# 2. Display the original image
plt.imshow(img)
plt.show()
# 3. Preprocess the image and display the result
size = 96
mean = (0.485, 0.456, 0.406)
std = (0.229, 0.224, 0.225)
transform = ImageTransform(size, mean, std)
img_transformed = transform(img, phase="train")  # torch.Size([3, 96, 96])
#Convert (color, height, width) to (height, width, color) and clip values to [0, 1] for display
img_transformed = img_transformed.numpy().transpose((1, 2, 0))
img_transformed = np.clip(img_transformed, 0, 1)
plt.imshow(img_transformed)
plt.show()
The execution result is as follows.
You can see that the original image at the top has been rotated, cropped, and standardized (the hue has changed). Next, create the lists of training and validation files used to build the Dataset.
def make_datapath_list(phase="train"):
    """
    Create a list containing the data paths.

    Parameters
    ----------
    phase : 'train' or 'val'
        Specifies training data or validation data.

    Returns
    -------
    path_list : list
        A list containing the paths to the data.
    """
    rootpath = "../data/"
    target_path = osp.join(rootpath + phase + '/**/*.png')
    print(target_path)
    path_list = []  #Store the paths here
    #Collect the file paths in the subdirectories using glob
    for path in glob.glob(target_path):
        path_list.append(path)
    return path_list
#Run
train_list = make_datapath_list(phase="train")
val_list = make_datapath_list(phase="val")
Let's check the number of images for learning and verification prepared here.
len(train_list), len(val_list)
output
(37981, 10267)
It was confirmed that there are 37981 images for training and 10267 for validation.
Next, create the Dataset. Since the car counts in the training and validation images range from 0 to 12 (for training, all images with 13 or more cars were merged into 12), the Dataset class is defined as follows.
class CarCountDataset(torch.utils.data.Dataset):
    """
    Dataset class for car-count images. Inherits from PyTorch's Dataset class.

    Attributes
    ----------
    file_list : list
        A list of image paths.
    transform : object
        An instance of the preprocessing class.
    phase : 'train' or 'val'
        Specifies training or validation.
    """

    def __init__(self, file_list, transform=None, phase='train'):
        self.file_list = file_list  #List of file paths
        self.transform = transform  #Instance of the preprocessing class
        self.phase = phase  #'train' or 'val'

    def __len__(self):
        '''Returns the number of images'''
        return len(self.file_list)

    def __getitem__(self, index):
        '''
        Returns the preprocessed image as a Tensor together with its label.
        '''
        #Load the index-th image
        img_path = self.file_list[index]
        img = Image.open(img_path)  # [height][width][color RGB]
        #Preprocess the image
        img_transformed = self.transform(
            img, self.phase)  # torch.Size([3, 96, 96])
        #Extract the two-digit directory name from the file path
        if self.phase == "train":
            label = img_path[14:16]
        elif self.phase == "val":
            label = img_path[12:14]
        #Convert the label string ("00" to "12") to an integer.
        #Labels 13-20 do not appear: training images with 13 or more cars were merged into class 12.
        label = int(label)
        return img_transformed, label
#Run
train_dataset = CarCountDataset(
    file_list=train_list, transform=ImageTransform(size, mean, std), phase='train')
val_dataset = CarCountDataset(
    file_list=val_list, transform=ImageTransform(size, mean, std), phase='val')
#Operation check
index = 0
print(train_dataset.__getitem__(index)[0].size())
print(train_dataset.__getitem__(index)[1])
PyTorch also has a built-in way to treat directory names as class names, but here we assign the class from the directory name ourselves, following the method explained in the book.
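For reference, the built-in approach would use torchvision.datasets.ImageFolder, which sorts the class directories ('00' to '12') and maps them to indices 0 to 12 (which is exactly why the two-digit names are convenient). This is a minimal sketch of that alternative, not the code used in this article:
#Hypothetical alternative: infer labels from directory names with ImageFolder
from torchvision import datasets
train_dataset_alt = datasets.ImageFolder(
    root='../data/train',
    transform=ImageTransform(size, mean, std).data_transform['train'])
print(train_dataset_alt.classes)  #['00', '01', ..., '12']
Now that the Dataset is ready after executing the above, create the DataLoader. The following processing is performed on both the training and validation data.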
#Specify the size of the mini batch
batch_size = 32
#Create DataLoader
train_dataloader = torch.utils.data.DataLoader(
train_dataset, batch_size=batch_size, shuffle=True)
val_dataloader = torch.utils.data.DataLoader(
val_dataset, batch_size=batch_size, shuffle=False)
#Collect in dictionary variables
dataloaders_dict = {"train": train_dataloader, "val": val_dataloader}
print(type(dataloaders_dict))
print(dataloaders_dict["train"])
print(dataloaders_dict)
#Operation check
batch_iterator = iter(dataloaders_dict["train"]) #Convert to iterator
print(type(batch_iterator))
inputs, labels = next(batch_iterator)  #Extract the first mini-batch
#print(inputs.size())
#print(labels)
#print(labels.size())
Several tensors are printed here, but they are only for checking; once you have confirmed the behavior, feel free to comment them out with a leading # or delete them. (I leave them in as a memo to myself.)
This completes the preparation of the input data for PyTorch. Next, build the model.
The network model is built by fine tuning VGG-16. As many of you may know, VGG-16 is a 16-layer CNN model trained on the large-scale image dataset "ImageNet". Its output is 1000 classes: it returns classification results over 1000 categories such as dogs, birds, and cats.
First, load VGG-16 for use with PyTorch.
#Load the pretrained VGG-16 model
#Instantiate the VGG-16 model
use_pretrained = True  #Use the pretrained parameters
net = models.vgg16(pretrained=use_pretrained)
print(net)
There are many trained models available in PyTorch besides VGG16. If you are interested, please refer to the Official Document.
The printed model is as follows. output
VGG(
(features): Sequential(
(0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): ReLU(inplace)
(2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(3): ReLU(inplace)
(4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(5): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(6): ReLU(inplace)
(7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(8): ReLU(inplace)
(9): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(10): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(11): ReLU(inplace)
(12): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(13): ReLU(inplace)
(14): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(15): ReLU(inplace)
(16): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(17): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(18): ReLU(inplace)
(19): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(20): ReLU(inplace)
(21): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(22): ReLU(inplace)
(23): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(24): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(25): ReLU(inplace)
(26): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(27): ReLU(inplace)
(28): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(29): ReLU(inplace)
(30): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
)
(avgpool): AdaptiveAvgPool2d(output_size=(7, 7))
(classifier): Sequential(
(0): Linear(in_features=25088, out_features=4096, bias=True)
(1): ReLU(inplace)
(2): Dropout(p=0.5)
(3): Linear(in_features=4096, out_features=4096, bias=True)
(4): ReLU(inplace)
(5): Dropout(p=0.5)
(6): Linear(in_features=4096, out_features=1000, bias=True)
)
)
The final layer has out_features=1000, which shows that the output is classified into 1000 categories. We therefore replace the final output layer with one that has 13 outputs, for car counts 0 to 12.
#Replace the last output layer of VGG-16 with the 13 car-count classes (0 to 12)
net.classifier[6] = nn.Linear(in_features=4096, out_features=13)
#Set to training mode
net.train()
print('Network setup complete: Loaded the trained weights and set them to training mode')
Next, configure fine tuning for this model. As in the book, some parameter groups of VGG-16 are frozen while others are updated during training.
#Store the parameters to be trained by fine tuning in params_to_update_1 to _3
params_to_update_1 = []
params_to_update_2 = []
params_to_update_3 = []
#Specify the parameter names of the layers to be trained
update_param_names_1 = ["features"]
update_param_names_2 = ["classifier.0.weight",
                        "classifier.0.bias", "classifier.3.weight", "classifier.3.bias"]
update_param_names_3 = ["classifier.6.weight", "classifier.6.bias"]
#Store each parameter in the corresponding list
for name, param in net.named_parameters():
    if update_param_names_1[0] in name:
        param.requires_grad = True
        params_to_update_1.append(param)
        print("Stored in params_to_update_1:", name)
    elif name in update_param_names_2:
        param.requires_grad = True
        params_to_update_2.append(param)
        print("Stored in params_to_update_2:", name)
    elif name in update_param_names_3:
        param.requires_grad = True
        params_to_update_3.append(param)
        print("Stored in params_to_update_3:", name)
    else:
        param.requires_grad = False
        print("No gradient calculation; not trained:", name)
Next, define the loss function. Since this is a multiclass classification problem, we use cross entropy.
#Loss function settings
criterion = nn.CrossEntropyLoss()
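As a quick sanity check (not part of the original notebook), the loss can be evaluated on dummy logits for the 13 classes:
#Sanity check on dummy data: batch of 2 samples, 13 classes
dummy_logits = torch.randn(2, 13)
dummy_labels = torch.tensor([0, 12])
print(criterion(dummy_logits, dummy_labels))  #A scalar tensor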
Then set the optimization method. We use plain SGD, with the same per-group learning rates as in the book.
#Optimizing method settings
optimizer = optim.SGD([
{'params': params_to_update_1, 'lr': 1e-4},
{'params': params_to_update_2, 'lr': 5e-4},
{'params': params_to_update_3, 'lr': 1e-3}
], momentum=0.9)
I think it would be interesting to experiment with these settings and see how the result (accuracy) changes. Since training takes time, a high-performance GPU is desirable.
Finally, define the training function.
#Function that trains the model
def train_model(net, dataloaders_dict, criterion, optimizer, num_epochs):
    #Initial setup
    #Check whether a GPU is available
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    #print("Device used:", device)
    print("device name", torch.cuda.get_device_name(0))
    #Move the network to the GPU
    net.to(device)
    #Speed up when the network shape is mostly fixed
    torch.backends.cudnn.benchmark = True
    #Track train/val loss and accuracy so they can be graphed later
    x_epoch_data = []
    y_train_loss = []
    y_train_accuracy = []
    y_val_loss = []
    y_val_accuracy = []
    #Epoch loop
    for epoch in range(num_epochs):
        print('Epoch {}/{}'.format(epoch+1, num_epochs))
        print('-------------')
        x_epoch_data.append(epoch)
        #Training and validation loop for each epoch
        for phase in ['train', 'val']:
            if phase == 'train':
                net.train()  #Put the model in training mode
            else:
                net.eval()  #Put the model in evaluation mode
            epoch_loss = 0.0  #Sum of losses over the epoch
            epoch_corrects = 0  #Number of correct answers in the epoch
            #Skip training at epoch 0 to check the untrained validation performance
            if (epoch == 0) and (phase == 'train'):
                continue
            #Loop that retrieves mini-batches from the data loader
            for inputs, labels in tqdm(dataloaders_dict[phase]):
                #Send data to the GPU if available
                inputs = inputs.to(device)
                labels = labels.to(device)
                #Initialize the optimizer
                optimizer.zero_grad()
                #Forward computation
                with torch.set_grad_enabled(phase == 'train'):
                    outputs = net(inputs)
                    loss = criterion(outputs, labels)  #Compute the loss
                    _, preds = torch.max(outputs, 1)  #Predict the labels
                    #Backpropagate during training
                    if phase == 'train':
                        loss.backward()
                        optimizer.step()
                    #Accumulate the results
                    epoch_loss += loss.item() * inputs.size(0)  #Update the total loss
                    #Update the total number of correct answers
                    epoch_corrects += torch.sum(preds == labels.data)
            #Display the loss and accuracy for each epoch
            epoch_loss = epoch_loss / len(dataloaders_dict[phase].dataset)
            epoch_acc = epoch_corrects.double(
            ) / len(dataloaders_dict[phase].dataset)
            if phase == 'train':
                y_train_loss.append(epoch_loss)
                y_train_accuracy.append(epoch_acc)
            else:
                y_val_loss.append(epoch_loss)
                y_val_accuracy.append(epoch_acc)
            print('{} Loss: {:.4f} Acc: {:.4f}'.format(
                phase, epoch_loss, epoch_acc))
    return x_epoch_data, y_train_loss, y_train_accuracy, y_val_loss, y_val_accuracy
Here, as in the book, the first pass computes the accuracy on the validation data using the loaded VGG-16 model without any training. This gives a quantitative sense of how much the training data improves the model. Also, unlike the book, I record the loss and accuracy per epoch so we can graph how they evolve (and when they stop improving) as training proceeds.
Now, run the model below. The number of epochs is 50.
#Perform learning / verification
num_epochs=50
train = train_model(net, dataloaders_dict, criterion, optimizer, num_epochs=num_epochs)
Training takes about 2 hours and 30 minutes in my environment (GPU: GTX 1070). After training completes, we evaluate it as follows.
#Since training skips the first epoch, remove that epoch from the x-axis of the train plot
train_epoch = train[0].copy()
train_epoch.pop(0)
#Graphing the learning pattern
fig = plt.figure(figsize=(14, 5))
ax1 = fig.add_subplot(1, 2, 1)
line1, = ax1.plot(train_epoch,train[1],label='loss')
line2, = ax1.plot(train_epoch,train[2],label='accuracy')
ax1.set_title("train")
ax1.set_xlabel('epoch')
ax1.set_ylabel('loss, accuracy')
ax1.legend(loc='upper right')
ax2 = fig.add_subplot(1, 2, 2)
line1, = ax2.plot(train[0],train[3],label='loss')
line2, = ax2.plot(train[0],train[4],label='accuracy')
ax2.set_ylim(0.5, 1.0) #Fixed y-axis scale.
ax2.set_title("validation")
ax2.set_xlabel('epoch')
ax2.set_ylabel('loss, accuracy')
ax2.legend(loc='upper right')
plt.show()
The result is as follows.
On the training data, the loss decreases and the accuracy improves as training proceeds. The validation accuracy is quite low at the first epoch, when the model is still plain VGG-16, and it also improves with more training, though not as strongly as on the training data. Since the training and validation images were prepared from the same aerial photographs, I would not expect a large gap between them. With more epochs, overfitting may appear, where the training accuracy keeps rising while the validation accuracy falls; if you have time, please try it.
Now, save the parameters of the model learned here.
#Save PyTorch network parameters
save_path = './carcount_weights_fine_tuning.pth'
torch.save(net.state_dict(), save_path)
Next, load the saved model. This is unnecessary if you run everything from training through inference in one session, but it is what you will run when you want to reuse a model trained in the past (training takes time, so this is common when validating or testing after the fact). Note that the loading method differs by situation, such as using a GPU-trained model on a GPU versus using a GPU-trained model in a CPU-only environment. For details, please refer to here.
Here is an example of model playback and execution in a GPU environment.
device = torch.device("cuda")
load_path = './carcount_weights_fine_tuning.pth'
net.load_state_dict(torch.load(load_path))
net.to(device)
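Conversely, if you need to load these GPU-trained weights in a CPU-only environment, map_location can be used (a minimal sketch, not run in this article):
#Loading GPU-trained weights on a CPU-only machine
device = torch.device("cpu")
net.load_state_dict(torch.load(load_path, map_location=device))
net.to(device)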
Now, let's check the accuracy of the read model using the verification data.
correct = 0
total = 0
net.eval()  #Evaluation mode
with torch.no_grad():  #No gradients needed for evaluation
    for i, (x, t) in enumerate(val_dataloader):
        x, t = x.cuda(), t.cuda()  #Send to the GPU
        y = net(x)
        correct += (y.argmax(1) == t).sum().item()
        total += len(x)
print("Correct answer rate:", str(correct/total*100) + "%")
The output was: Correct answer rate: 75.05600467517289%. Next, let's examine what this accuracy means.
Check the verification result of the verification image using the trained model. For example, try running the following code.
test_dataloader = torch.utils.data.DataLoader(
    val_dataset, batch_size=batch_size, shuffle=True)
dataiter = iter(test_dataloader)
images, labels = next(dataiter)  #Take out one mini-batch
plt.imshow(np.transpose(images[0], (1, 2, 0)))  #Move the channel axis to the back
plt.tick_params(labelbottom=False, labelleft=False, bottom=False, left=False)  #Hide labels and ticks
plt.show()
net.eval()  #Evaluation mode
x, t = images.cuda(), labels.cuda()  #Send to the GPU
print(x.shape)
#x, t = images, labels  #CPU version
y = net(x)
print("Predicted number of cars:", y[0].argmax().item())
print('Number of correct cars:', t[0])
The output is as follows.
The image is hard to make out, but it contains 3 cars (the correct label), and the model's prediction agrees: 3 cars. But how does the model do across the other car counts? Let's compute the accuracy per car count.
#Specify the mini-batch size
batch_size = 32
val_dataloader = torch.utils.data.DataLoader(
    val_dataset, batch_size=batch_size, shuffle=False)
classes = list(range(13))
class_correct = list(0. for i in range(13))
class_total = list(0. for i in range(13))
predicted_class = list(0. for i in range(13))
net.eval()  #Evaluation mode
for data in val_dataloader:
    images, labels = data
    x, t = images.cuda(), labels.cuda()  #Send to the GPU
    #x, t = images, labels  #CPU version
    outputs = net(x)
    _, predicted = torch.max(outputs, 1)
    c = (predicted == t).squeeze()
    for i in range(len(c)):
        label = t[i]
        label_p = predicted[i]
        class_correct[label] += c[i].item()
        predicted_class[label_p] += 1
        class_total[label] += 1
for i in range(13):
    print('Accuracy of %2s : %2d %%, %4d images, the number of predicted image: %4d' % (
        classes[i], 100 * class_correct[i] / class_total[i], class_total[i], predicted_class[i]))
output
Accuracy of 0 : 94 %, 5773 images, the number of predicted image: 5618
Accuracy of 1 : 68 %, 1546 images, the number of predicted image: 1419
Accuracy of 2 : 52 %, 1171 images, the number of predicted image: 976
Accuracy of 3 : 47 %, 733 images, the number of predicted image: 878
Accuracy of 4 : 26 %, 465 images, the number of predicted image: 449
Accuracy of 5 : 26 %, 284 images, the number of predicted image: 370
Accuracy of 6 : 12 %, 153 images, the number of predicted image: 107
Accuracy of 7 : 21 %, 76 images, the number of predicted image: 237
Accuracy of 8 : 0 %, 40 images, the number of predicted image: 12
Accuracy of 9 : 25 %, 12 images, the number of predicted image: 106
Accuracy of 10 : 12 %, 8 images, the number of predicted image: 43
Accuracy of 11 : 0 %, 5 images, the number of predicted image: 21
Accuracy of 12 : 100 %, 1 images, the number of predicted image: 31
The accuracy for images containing 0 cars is quite high, at 94%. Most of the validation images contain no cars (5773 images), so it is easy to imagine the 0-car class being predicted well (5618 images were predicted as 0). On the other hand, for the less frequent counts the accuracy drops sharply; for example, the 8-car class scores 0% (12 images were predicted as 8 cars, but none of them actually had the label 8). In general, accuracy worsens as the car count increases, and the estimated counts tend to exceed the actual ones. However, even when the prediction does not match exactly, it is usually close to the true count. As I wrote at the beginning, the purpose this time is not to determine the number of cars exactly but to capture qualitative trends, such as comparing the density of cars in the same area at different times, and for that this level of accuracy may suffice. Still, the accuracy is far from high, so it would be worth trying to improve it by changing the model, tuning the parameters, or reworking the training images.
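To make these near misses visible at a glance, a full confusion matrix is useful. The following is a hypothetical extension, not part of the original notebook; it assumes scikit-learn is installed and that val_dataloader and net are defined as above.
#Hypothetical extension: confusion matrix over the 13 classes
from sklearn.metrics import confusion_matrix
all_preds, all_labels = [], []
net.eval()
with torch.no_grad():
    for images, labels in val_dataloader:
        outputs = net(images.cuda())
        all_preds += outputs.argmax(1).cpu().tolist()
        all_labels += labels.tolist()
print(confusion_matrix(all_labels, all_preds, labels=list(range(13))))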
Now that we have evaluated the accuracy of the constructed model, we will use this model to create a distribution map of the cars obtained from the aerial photographs.
We superimpose the estimated car distribution and counts onto the actual aerial photograph for a validation image. If you have already imported these modules you can skip this, but I list them just in case.
import argparse
import os
import shutil
import math
import numpy as np
from PIL import Image
from skimage import io
from tqdm import tqdm
import matplotlib.pyplot as plt
import cv2
Image.MAX_IMAGE_PIXELS = 1000000000
Check the file name of the image to be processed next. At this time, create a directory to save the divided images for processing by PyTorch.
if not os.path.exists('../data/test'):
    os.mkdir('../data/test')
val_path = '../cowc_car_counting/data/cowc/datasets/ground_truth_sets/Utah_AGRC/'
files = os.listdir(val_path)
#Get the file name
print(files[0])
This time, we target an aerial photograph of Utah.
Now, browse this image and check the size.
#Load the verification image with opencv.
im = cv2.imread(val_path + files[0])
im_rgb = cv2.cvtColor(im, cv2.COLOR_BGR2RGB)
#Image.fromarray(im_rgb).save('test.jpg')
plt.imshow(im_rgb)
#Check image size
v_size = im_rgb.shape[0]
h_size = im_rgb.shape[1]
print('v_size:', v_size)
print('h_size:', h_size)
The image is 7213 x 7226 pixels. I set the grid size to 100 pixels (close to the model's 96-pixel input), and resized the image to an integral multiple of that grid size.
#Prepare a directory for the tiles of the verification image
if not os.path.exists('../data/test/val'):
    os.mkdir('../data/test/val')
height = 100  #The model input is 96 pixels, so use a grid close to it: 100 pixels
width = 100
img_size = 7200  #Resize to a multiple of the 100-pixel grid
#Resize step (not present in the original excerpt; reconstructed to match the satellite section later)
img = Image.open(val_path + files[0])
img_resize = img.resize((img_size, img_size))
DIR_OUTPUTS = '../data/test/val/'
#Image splitting function
def ImgSplit(im):
    #Split the loaded image into 100 x 100 pixel tiles (72 x 72 = 5184 tiles)
    buff = []
    #Number of vertical splits
    for h1 in range(int(img_size/height)):
        #Number of horizontal splits
        for w1 in range(int(img_size/width)):
            w2 = w1 * width
            h2 = h1 * height
            #print(w2, h2, width + w2, height + h2)
            c = im.crop((w2, h2, width + w2, height + h2))
            buff.append(c)
    return buff
#Run the splitting and save the tiles
hi = 0
for ig in ImgSplit(img_resize):
    hi = hi + 1
    #Save each tile to the output directory
    ig.save(DIR_OUTPUTS + str(hi) + ".png")
List the tile files and apply the conversion. Since these are test images, the conversion is limited to resizing, standardization, and Tensor conversion for PyTorch; no augmentation is applied.
#Create the test file list
def make_datapath_list_test(phase="test"):
    """
    Create a list containing the data paths.

    Parameters
    ----------
    phase : 'test'
        Specifies the test data.

    Returns
    -------
    path_list : list
        A list containing the paths to the data.
    """
    rootpath = "../data/"
    target_path = osp.join(rootpath + phase + '/val/*.png')
    print(target_path)
    path_list = []  #Store the paths here
    #Collect the file paths in the subdirectory using glob
    for path in glob.glob(target_path):
        path_list.append(path)
    return path_list
#Run
test_list = make_datapath_list_test(phase="test")
#Class that preprocesses the test images
#Performs resize, normalize, and ToTensor only
class ImageTransform_test():
    """
    Image preprocessing class for test images.
    Resizes the image and standardizes the colors; no augmentation is applied.

    Attributes
    ----------
    resize : int
        The size of the image after resizing.
    mean : (R, G, B)
        The mean of each color channel.
    std : (R, G, B)
        The standard deviation of each color channel.
    """

    def __init__(self, resize, mean, std):
        self.data_transform = {
            'test': transforms.Compose([
                transforms.Resize(resize),  #Resize
                #transforms.CenterCrop(resize),  #Crop a resize x resize patch from the center
                transforms.ToTensor(),  #Convert to tensor
                transforms.Normalize(mean, std)  #Standardization
            ])
        }

    def __call__(self, img, phase='test'):
        """
        Parameters
        ----------
        phase : 'test'
            Specifies the preprocessing mode.
        """
        return self.data_transform[phase](img)
The test file list is not in tile order, so use natsort to restore the natural order. This is needed so that the model's estimates can later be reshaped into the grid matrix of the original image. If natsort is not installed, run the commented pip line below first.
#!pip install natsort
#Sort the files in natural order using natsort
from natsort import natsorted
test_list = natsorted(test_list)
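To see why natural sorting matters here, note that plain sorted() orders file names lexicographically:
#Lexicographic vs. natural sort of numbered file names
names = ['1.png', '10.png', '2.png']
print(sorted(names))     #['1.png', '10.png', '2.png']
print(natsorted(names))  #['1.png', '2.png', '10.png']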
Now, let's check the created test image.
#Check the preprocessing on a test tile
# 1. Load the image
image_file_path = test_list[0]
img = Image.open(image_file_path).convert('RGB')  # [height][width][color RGB]
# 2. Display the original image
plt.imshow(img)
plt.show()
# 3. Preprocess the image and check the result
size = 96
mean = (0.485, 0.456, 0.406)
std = (0.229, 0.224, 0.225)
transform = ImageTransform(size, mean, std)
img_transformed = transform(img, phase="val")  # torch.Size([3, 96, 96])
print(img_transformed.shape)
output
This is the top-left tile of the entire aerial photograph; looking at it, no cars are visible. Create a Dataset for this image data.
#Dataset class for the test images
class CarCountDataset_test(torch.utils.data.Dataset):
    def __init__(self, file_list, transform=None, phase='train'):
        self.file_list = file_list  #List of file paths
        self.transform = transform  #Instance of the preprocessing class
        self.phase = phase  #'train' or 'val'

    def __len__(self):
        '''Returns the number of images'''
        return len(self.file_list)

    def __getitem__(self, index):
        '''
        Returns the preprocessed image as a Tensor together with a dummy label.
        '''
        #Load the index-th image
        img_path = self.file_list[index]
        img = Image.open(img_path).convert('RGB')  # [height][width][color RGB]
        #Preprocess the image
        img_transformed = self.transform(
            img, self.phase)  # torch.Size([3, 96, 96])
        #The test tiles have no labels, so return a dummy label of 0
        label = 0
        return img_transformed, label
#Run
test_dataset = CarCountDataset_test(
file_list=test_list, transform=ImageTransform(size, mean, std), phase='val')
#Operation check
index = 0
print(test_dataset.__getitem__(index)[0].size())
print(test_dataset.__getitem__(index)[1])
After that, create a Dataloader and estimate the number of cars using the model built earlier.
batch_size = 10
test_dataloader = torch.utils.data.DataLoader(
    test_dataset, batch_size=batch_size, shuffle=False)
dataiter = iter(test_dataloader)
images, labels = next(dataiter)  #Take out one mini-batch
plt.imshow(np.transpose(images[0], (1, 2, 0)))  #Move the channel axis to the back
plt.tick_params(labelbottom=False, labelleft=False, bottom=False, left=False)  #Hide labels and ticks
plt.show()
net.eval()  #Evaluation mode
x, t = images.cuda(), labels.cuda()  #Send to the GPU
print(x.shape)
#x, t = images, labels  #CPU version
y = net(x)
print("Predicted number of cars:", y[0].argmax().item())
# 2. Display the original image
print('Display of original image')
plt.imshow(img)
plt.show()
As expected, the model estimates that there is no car in this tile. Next, run the model on all tiles and obtain the distribution.
#Specify the mini-batch size
batch_size = 10
#Create a DataLoader for the target images
test_dataloader = torch.utils.data.DataLoader(
    test_dataset, batch_size=batch_size, shuffle=False)
classes = list(range(13))
class_correct = list(0. for i in range(13))
class_total = list(0. for i in range(13))
net.eval()  #Evaluation mode
test = []
for data in test_dataloader:
    images, labels = data
    x, t = images.cuda(), labels.cuda()  #Send to the GPU
    #x, t = images, labels  #CPU version
    y = net(x)
    for i in range(len(y)):
        result = y[i].argmax().item()  #argmax().item() converts the GPU tensor to a Python int
        test.append(result)
print('Number of split images: ', len(test))
When I checked, the number of tiles was 5184. Next, reshape this list into a matrix matching the grid of the original image.
test2 = np.array(test)
cars_counted =test2.reshape(72, 72)
cars_counted
Since the 7200-pixel image was divided by a 100 x 100 pixel grid, the result is a 72 x 72 matrix, and the output is as follows.
array([[0, 0, 1, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
...,
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0]])
Next, the process of superimposing the estimated distribution result of the number of cars on the test image is performed. First, import the modules required for image processing.
import seaborn as sns
import pandas as pd
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
import itertools
%matplotlib inline
Next, load the test image.
#Prepare the verification image as the base image.
image_path = val_path + files[0]
mosaic_image = io.imread(image_path)[:, :, :3]
Now, overlay the estimated car distribution map on the original image. Here I used code from Counting cars from aerial photographs with Deep Learning, the article that prompted this one. Thank you very much. Please see the following for details: [COWC Car Counting](https://github.com/motokimura/cowc_car_counting) (MIT License)
def get_color_map(sns_palette):
    color_map = np.empty(shape=[0, 3], dtype=np.uint8)
    for color in sns_palette:
        r = int(color[0] * 255)
        g = int(color[1] * 255)
        b = int(color[2] * 255)
        rgb_byte = np.array([[r, g, b]], dtype=np.uint8)
        color_map = np.append(color_map, rgb_byte, axis=0)
    return color_map

def overlay_heatmap(
        cars, background_image, car_max, grid_size, cmap,
        line_rgb=[0, 0, 0], line_thickness=2, alpha=0.5, min_car_to_show=1, background_rgb=[0, 0, 0]):
    yi_max, xi_max = cars.shape
    result = background_image.copy()
    heatmap = background_image.copy()
    sns_palette = sns.color_palette(cmap, n_colors=car_max + 1)
    color_map = get_color_map(sns_palette)
    for yi in range(yi_max):
        for xi in range(xi_max):
            top, left = yi * grid_size, xi * grid_size
            bottom, right = top + grid_size, left + grid_size
            cars_in_grid = cars[yi, xi]
            if cars_in_grid < min_car_to_show:
                if background_rgb is not None:
                    heatmap[top:bottom, left:right] = np.array(background_rgb)
            else:
                heatmap[top:bottom, left:right] = color_map[cars_in_grid]
            if line_thickness > 0:
                cv2.rectangle(heatmap, (left, top), (right, bottom), line_rgb, thickness=line_thickness)
    cv2.addWeighted(heatmap, alpha, result, 1 - alpha, 0, result)
    return result
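Note that car_max and grid_size are used below but are not defined anywhere in this excerpt; values consistent with the rest of the article would be the following (an assumption):
#Assumed values, not shown in the original excerpt
grid_size = 100  #Tile size used when splitting the aerial photograph
car_max = int(cars_counted.max())  #Upper end of the heat map color scale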
Save the processed image in the result directory.
if not os.path.exists('../data/result'):
    os.mkdir('../data/result')
heatmap_overlayed = overlay_heatmap(cars_counted, mosaic_image, car_max, grid_size, cmap='viridis', line_thickness=-1)
fig = plt.figure(figsize=(15, 15))
plt.imshow(heatmap_overlayed)
plt.imsave('../data/result/heatmap_' + files[0], heatmap_overlayed)
output
The resulting heat map is hard to make out here; for details, please check the code and sample images uploaded on GitHub.
Next, print the estimated counts on the test image, again using the published code.
heatmap_overlayed_2 = overlay_heatmap(cars_counted, mosaic_image, car_max, grid_size, cmap='Reds')

def plot_counts_on_heatmap_2(heatmap_overlayed, aoi_tblr, cars, grid_size, min_car_to_show=1, figsize=(15, 15)):
    top, bottom, left, right = aoi_tblr
    yi_min, xi_min = int(math.floor(top / grid_size)), int(math.floor(left / grid_size))
    yi_max, xi_max = int(math.ceil(bottom / grid_size)), int(math.ceil(right / grid_size))
    top, left, bottom, right = yi_min * grid_size, xi_min * grid_size, yi_max * grid_size, xi_max * grid_size
    fig = plt.figure(figsize=figsize)
    plt.imshow(heatmap_overlayed[top:bottom, left:right])
    for (yi, xi) in itertools.product(range(yi_min, yi_max), range(xi_min, xi_max)):
        car_num = cars[yi, xi]
        if car_num < min_car_to_show:
            continue
        plt.text(
            (xi + 0.5) * grid_size - left, (yi + 0.5) * grid_size - top, format(car_num, 'd'),
            horizontalalignment="center", verticalalignment="center", color="black"
        )
    plt.show()
    fig.savefig('../data/result/heatmap_carcount_' + files[0])

top, bottom, left, right = 1000, 4500, 2000, 4500
heatmap_carcount = plot_counts_on_heatmap_2(heatmap_overlayed_2, (top, bottom, left, right), cars_counted, grid_size)
The counts look roughly right. In one spot, perhaps a car processing plant, many cars were parked in a concentrated manner.
We have now built a model that estimates the number of cars from aerial photograph data, verified it on a test image, and superimposed the result on the original image.
It has been a long road, but next, let's apply the same processing to a satellite image and create a car-count map.
First, obtain the satellite image for the demonstration. The aerial photographs used to build the model have a resolution of 15 cm, so I looked for a high-resolution optical satellite image as close to that as possible. A candidate among commercial satellites is the WorldView series from DigitalGlobe (US), whose images have a resolution of 30 cm. High-resolution optical imagery is quite expensive and not easy to purchase, so this time I used publicly released sample data from the following site.
WorldView-3 Satellite Sensor/Satellite image corporation
Now, let's download the data.
#Preparing a directory to store satellite images
if not os.path.exists('../data/test/demo'):
os.mkdir('../data/test/demo')
if not os.path.exists('../data/test/demo/image'):
os.mkdir('../data/test/demo/image')
#Download satellite images (Rio de Janeiro, Brazil)
!wget -P ../data/test/demo https://content.satimagingcorp.com/static/galleryimages/Satellite-Image-2016-Olympics-Rio-De-Janeiro.jpg
Now, check the acquired satellite image and perform preprocessing.
View the image and check the size.
#Get the file path
test_path = '../data/test/demo/'
files = os.listdir(test_path)
#Get the file name
print(files[1])
#Load images with opencv
im = cv2.imread(test_path + files[1])
im_rgb = cv2.cvtColor(im, cv2.COLOR_BGR2RGB)
#Image.fromarray(im_rgb).save('test.jpg')
plt.imshow(im_rgb)
#Check image size
v_size = im_rgb.shape[0]
h_size = im_rgb.shape[1]
print('v_size:', v_size)
print('h_size:', h_size)
This image was taken at the time of the 2016 Rio de Janeiro Olympics. You can see the Olympic stadium, and we can also confirm that many cars were parked around a nearby station. Next, this image is divided into tiles and the car counts are estimated using the model trained on aerial photographs. Since the satellite image has half the resolution of the aerial photographs, the tile size is set to 50 pixels to match the physical scale.
#The tile size for splitting is 50 pixels
vn = v_size // 50  #Number of vertical tiles
hn = h_size // 50  #Number of horizontal tiles
#Resize the image so that it is an integral multiple of the grid size
img = Image.open(test_path + files[1])
img_size_v = vn * 50
img_size_h = hn * 50
#Resize
img_resize = img.resize((img_size_h, img_size_v))
plt.imshow(img_resize)
Now, let's divide the image and create the usual Dataset and Dataloader.
#Divide the demo image by the 50-pixel grid
width = 50
height = 50
DIR_OUTPUTS = '../data/test/demo/image/'
#Image splitting function
def ImgSplit(im):
    #Split the loaded image into 50 x 50 pixel tiles
    buff = []
    #Number of vertical splits
    for h1 in range(int(vn)):
        #Number of horizontal splits
        for w1 in range(int(hn)):
            w2 = w1 * width
            h2 = h1 * height
            #print(w2, h2, width + w2, height + h2)
            c = im.crop((w2, h2, width + w2, height + h2))
            buff.append(c)
    return buff
#Run the splitting and save the tiles
hi = 0
for ig in ImgSplit(img_resize):
    hi = hi + 1
    #Save each tile to the output directory
    ig.save(DIR_OUTPUTS + str(hi) + ".png")
#Create the test file list
def make_datapath_list_test(phase="test"):
    rootpath = "../data/test/demo/image"
    target_path = osp.join(rootpath + '/*.png')
    print(target_path)
    path_list = []  #Store the paths here
    #Collect the file paths using glob
    for path in glob.glob(target_path):
        path_list.append(path)
    return path_list
#Run
test_list = make_datapath_list_test(phase="test")
#Class that preprocesses the test images
#Performs resize, normalize, and ToTensor only
class ImageTransform_test():
    def __init__(self, resize, mean, std):
        self.data_transform = {
            'test': transforms.Compose([
                transforms.Resize(resize),  #Resize
                #transforms.CenterCrop(resize),  #Crop a resize x resize patch from the center
                transforms.ToTensor(),  #Convert to tensor
                transforms.Normalize(mean, std)  #Standardization
            ])
        }

    def __call__(self, img, phase='test'):
        """
        Parameters
        ----------
        phase : 'test'
            Specifies the preprocessing mode.
        """
        return self.data_transform[phase](img)
As with aerial photographs, natsort sorts the files according to the order of the matrix.
#Sort the files in natural order using natsort
from natsort import natsorted
test_list = natsorted(test_list)
Let's read the divided image.
#Check the preprocessing on a tile
# 1. Load the image
image_file_path = test_list[0]
img = Image.open(image_file_path).convert('RGB')  # [height][width][color RGB]
# 2. Display the original image
plt.imshow(img)
plt.show()
# 3. Preprocess the image and check the result
size = 96
mean = (0.485, 0.456, 0.406)
std = (0.229, 0.224, 0.225)
transform = ImageTransform(size, mean, std)
img_transformed = transform(img, phase="val")  # torch.Size([3, 96, 96])
print(img_transformed.shape)
This tile is the top-left corner of the satellite image; no car can be found on the mountain road. Now, let's estimate the car counts with the model. Create the Dataset and DataLoader.
#Dataset class for the test images
class CarCountDataset_test(torch.utils.data.Dataset):
    def __init__(self, file_list, transform=None, phase='train'):
        self.file_list = file_list  #List of file paths
        self.transform = transform  #Instance of the preprocessing class
        self.phase = phase  #'train' or 'val'

    def __len__(self):
        '''Returns the number of images'''
        return len(self.file_list)

    def __getitem__(self, index):
        '''
        Returns the preprocessed image as a Tensor together with a dummy label.
        '''
        #Load the index-th image
        img_path = self.file_list[index]
        img = Image.open(img_path).convert('RGB')  # [height][width][color RGB]
        #Preprocess the image
        img_transformed = self.transform(
            img, self.phase)  # torch.Size([3, 96, 96])
        #The tiles have no labels, so return a dummy label of 0
        label = 0
        return img_transformed, label
#Run
test_dataset = CarCountDataset_test(
file_list=test_list, transform=ImageTransform(size, mean, std), phase='val')
#Operation check
index = 0
print(test_dataset.__getitem__(index)[0].size())
print(test_dataset.__getitem__(index)[1])
batch_size = 10
test_dataloader = torch.utils.data.DataLoader(
    test_dataset, batch_size=batch_size, shuffle=False)
dataiter = iter(test_dataloader)
images, labels = next(dataiter)  #Take out one mini-batch
plt.imshow(np.transpose(images[0], (1, 2, 0)))  #Move the channel axis to the back
plt.tick_params(labelbottom=False, labelleft=False, bottom=False, left=False)  #Hide labels and ticks
plt.show()
net.eval()  #Evaluation mode
x, t = images.cuda(), labels.cuda()  #Send to the GPU
print(x.shape)
#x, t = images, labels  #CPU version
y = net(x)
print("Predicted number of cars:", y[0].argmax().item())
# 2. Display the original image
print('Display of original image')
plt.imshow(img)
plt.show()
The model's estimate was 0 cars, as expected. Now that the image data is ready, we run the estimation on all tiles and superimpose the result on the satellite image, just as we did for the aerial photograph.
The divided satellite images are fed to the model built from aerial photographs.
#Specify the mini-batch size
batch_size = 10
#Create a DataLoader for the target images
test_dataloader = torch.utils.data.DataLoader(
    test_dataset, batch_size=batch_size, shuffle=False)
classes = list(range(13))
class_correct = list(0. for i in range(13))
class_total = list(0. for i in range(13))
net.eval()  #Evaluation mode
test = []
for data in test_dataloader:
    images, labels = data
    x, t = images.cuda(), labels.cuda()  #Send to the GPU
    #x, t = images, labels  #CPU version
    y = net(x)
    for i in range(len(y)):
        result = y[i].argmax().item()  #argmax().item() converts the GPU tensor to a Python int
        test.append(result)
print('Number of divided images: ', len(test))
The number of divided images was 11832. Now, let's convert this estimation result into a matrix for superimposing on the satellite image.
test2 = np.array(test)
cars_counted =test2.reshape(int(vn), int(hn))
Next, read the satellite image.
file_path = test_path + files[1]
mosaic_image = io.imread(file_path)[:, :, :3]
mosaic_image=cv2.resize(mosaic_image, (img_size_h,img_size_v))
print(mosaic_image.shape)
And, as with the aerial photograph, we use the [COWC Car Counting](https://github.com/motokimura/cowc_car_counting) code to overlay the result on the satellite image (a heat map highlighting areas with many cars).
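Note that grid_size must now match the 50-pixel satellite tiles; as before, these values are assumptions not shown in the excerpt:
#Assumed values for the satellite image
grid_size = 50  #Satellite tiles are 50 x 50 pixels
car_max = int(cars_counted.max())  #Upper end of the heat map color scale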
heatmap_overlayed = overlay_heatmap(cars_counted, mosaic_image, car_max, grid_size, cmap='viridis', line_thickness=-1)
fig = plt.figure(figsize=(15, 15))
plt.imshow(heatmap_overlayed)
plt.imsave('../data/result/heatmap_' + files[1], heatmap_overlayed)
Copyright © 2016 DigitalGlobe.
You can see that many cars are parked around the station to the upper left of the Olympic stadium. The image may be difficult to read here, so please run the code yourself or check the sample image on GitHub.
Next, display the number of cars as in the aerial photograph.
heatmap_overlayed_2 = overlay_heatmap(cars_counted, mosaic_image, car_max, grid_size, cmap='Reds')

def plot_counts_on_heatmap_2(heatmap_overlayed, aoi_tblr, cars, grid_size, min_car_to_show=1, figsize=(100, 100)):
    top, bottom, left, right = aoi_tblr
    yi_min, xi_min = int(math.floor(top / grid_size)), int(math.floor(left / grid_size))
    yi_max, xi_max = int(math.ceil(bottom / grid_size)), int(math.ceil(right / grid_size))
    top, left, bottom, right = yi_min * grid_size, xi_min * grid_size, yi_max * grid_size, xi_max * grid_size
    fig = plt.figure(figsize=figsize)
    plt.imshow(heatmap_overlayed[top:bottom, left:right])
    for (yi, xi) in itertools.product(range(yi_min, yi_max), range(xi_min, xi_max)):
        car_num = cars[yi, xi]
        if car_num < min_car_to_show:
            continue
        plt.text(
            (xi + 0.5) * grid_size - left, (yi + 0.5) * grid_size - top, format(car_num, 'd'),
            horizontalalignment="center", verticalalignment="center", color="black", size=25
        )
    plt.show()
    fig.savefig('../data/result/heatmap_carcount_' + files[1])

top, bottom, left, right = 0, 1550, 2000, 4000
heatmap_carcount = plot_counts_on_heatmap_2(heatmap_overlayed_2, (top, bottom, left, right), cars_counted, grid_size)
Copyright © 2016 DigitalGlobe.
The printed numbers are small and hard to see; the font size can be changed via the size argument in the code above, so please try it. Sample images are also uploaded on GitHub, so please check those as well.
In this article, we built a car-count estimation model from aerial photographs with PyTorch and used it to create a car-count map from a satellite image. When targeting satellite images, you should get more accurate results if the training images have the same performance (resolution) as the target; it takes time, but you could count the cars in satellite tiles yourself and build training data that way. We introduced one example of estimating the number of cars, a use case often mentioned for satellite imagery; the services actually provided are presumably built on more accurate models. The model here is just one example, and it may be interesting to try various things: making the training images more diverse, devising better augmentation, or changing the model. Also, while we targeted cars here, this image-classification approach (as opposed to object detection) can be applied to objects other than cars as well, so it is worth trying.
As mentioned in the text, rather than seeking the absolute number of cars, it would be interesting to look at time-series changes using multiple images of the same area taken at different times. You could grasp trends such as how the distribution changes within a year, whether it depends on the time of day, and long-term trends at the annual level.
Last but not least, I would like to thank once again the author of Counting cars from aerial photographs with Deep Learning, which prompted this experiment and article. I hope this article serves as a reference for your own work. If you find any mistakes or have comments, I would appreciate hearing from you.
References:
- Counting cars from aerial photographs with Deep Learning
- How to create aerial image building segmentation with PyTorch
- Book "Learn while making! Deep learning by PyTorch" (Yutaro Ogawa, Mynavi Publishing, 2019/07/29)
- Object detection in deep learning
- Cars Overhead With Context (COWC)
- WorldView-3 Satellite Sensor / Satellite Imaging Corporation