[PYTHON] Using deep learning to make a time-lapse of physical changes dramatically easier to watch

Introduction

"Selfie (body)" is a habit of many trainees (people who love muscle training). It's a blissful time to take a picture of your pumped body after training and look back at it later. In addition, if you animate the captured image like a time lapse, you can see that muscle growth is more pickable! This article uses deep learning to dramatically improve the time-lapse of the body.

The result first

[GIF: changes in my body from December 2017 to March 2020]


Overview

I created a time-lapse from the photos I had taken. The images were so misaligned, however, that I first corrected them by hand to produce a smooth time-lapse. Then, to spare myself the manual work, I automated the correction with deep learning.

1. Manual correction

1-1. Displaying the images as they are

For a start, let's create a time-lapse that simply plays the images one after another, unmodified.

Time-lapse creation code (part)



# You can also make videos with OpenCV alone, but to create an mp4 file
# that plays back on Discord from a Google Colab environment,
# I deliberately used skvideo.
import cv2
import skvideo.io

def create_video(imgs, out_video_path, size_wh):
  vid_out = skvideo.io.FFmpegWriter(out_video_path,
      inputdict={
          "-r": "10"
      },
      outputdict={
          "-r": "10"
      })

  for img in imgs:
    img = cv2.resize(img, size_wh)
    # OpenCV images are BGR; FFmpegWriter expects RGB
    vid_out.writeFrame(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))

  vid_out.close()

imgs = load_images("images_dir")  # omitted
create_video(imgs, "video.mp4", (w, h))  # (w, h): output frame size

The result is as follows.

[GIF: naive time-lapse; the images jump around]

The misalignment bothers me so much that I can't concentrate on my beloved child (my body).

1-2. Fixing the position

I wanted an easy way to eliminate this misalignment. Thinking that I should pick reference points somewhere on my body and pin them in place, I arrived at "nipples" and "navel" in about 0.1 seconds. Here is how to pin the nipples and navel.

1-2-1. A tool for annotating nipple and navel coordinates

First, I created a tool that assigns coordinates to the nipples and navel. It could probably be done with CVAT or similar, but when I weighed the time to learn such a tool against the time to write my own, I concluded that writing my own would be faster, so I did.

The tool works like this: given a folder, it displays the images one after another; for each image you click the three points (nipples and navel), and the clicked coordinates are written to a CSV file. The GUI uses tkinter (source omitted).
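Since the source is omitted, here is a minimal sketch of what such a tool could look like. It is my own reconstruction, not the author's actual tool: the click order (left nipple, right nipple, navel) and the CSV columns (file, p1x, p1y, ..., p3y) are assumptions chosen to match the annotation-reading code later in this article.

# A minimal sketch of a tkinter click-annotation tool (my assumption,
# not the original tool): three clicks per image, coordinates to CSV.
import csv
import glob
import os
import tkinter as tk

from PIL import Image, ImageTk

class ClickAnnotator:
  def __init__(self, img_dir, out_csv):
    self.files = sorted(glob.glob(os.path.join(img_dir, "*.jpg")))
    self.out_csv = out_csv
    self.rows, self.clicks, self.idx = [], [], 0
    self.root = tk.Tk()
    self.canvas = tk.Canvas(self.root)
    self.canvas.pack()
    self.canvas.bind("<Button-1>", self.on_click)
    self.show()
    self.root.mainloop()

  def show(self):
    img = Image.open(self.files[self.idx])
    self.canvas.config(width=img.width, height=img.height)
    self.tkimg = ImageTk.PhotoImage(img)  # keep a reference or it is GC'd
    self.canvas.create_image(0, 0, anchor="nw", image=self.tkimg)

  def on_click(self, event):
    self.clicks += [event.x, event.y]
    if len(self.clicks) == 6:  # all three points clicked
      self.rows.append([self.files[self.idx]] + self.clicks)
      self.clicks, self.idx = [], self.idx + 1
      if self.idx == len(self.files):
        with open(self.out_csv, "w", newline="") as f:
          writer = csv.writer(f)
          writer.writerow(["file", "p1x", "p1y", "p2x", "p2y", "p3x", "p3y"])
          writer.writerows(self.rows)
        self.root.destroy()
      else:
        self.show()

ClickAnnotator("images_dir", "annotations.csv")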

1-2-2. Video creation

Each image is aligned to the first one by an affine transformation that maps its nipple and navel locations onto those of the first image.

Corrected time-lapse creation code (part)


import cv2
import numpy as np

def p3affine_img(img, src_p, dst_p):
    h, w, ch = img.shape
    pts1 = np.float32([src_p[0], src_p[1], src_p[2]])
    pts2 = np.float32([dst_p[0], dst_p[1], dst_p[2]])
    # Affine transform that maps the three source points onto the targets
    M = cv2.getAffineTransform(pts1, pts2)
    dst = cv2.warpAffine(img, M, (w, h))  # dsize is (width, height)
    return dst


df = read_annotations()  # omitted

imgs = []
src_p = None
for index, row in df.iterrows():
    img = cv2.imread(row.file)
    dst_p = [[row.p1x, row.p1y],  # left nipple
             [row.p2x, row.p2y],  # right nipple
             [row.p3x, row.p3y]]  # navel
    if src_p is None:
      src_p = dst_p  # the first image is the alignment reference
    else:
      img = p3affine_img(img, dst_p, src_p)

    imgs.append(img)

write_video(imgs)  # omitted

The results are as follows.

[GIF: time-lapse with the nipples and navel pinned by manual annotation]

I was able to make the time-lapse I was hoping for. Hooray! **...Not!**

This time I assigned coordinates to 120 images (covering September 9, 2019 to March 2020). But 281 more images, taken since December 2017, still have no coordinates. On top of that, I will keep doing strength training for decades to come, which means I would have to keep assigning coordinates for decades. Just imagining it secretes cortisol and sends me into a catabolic state. I thought about how to solve this while topping up on carbs.

That's right: ~~let's go to the gym~~ deep learning.

2. Automatic correction using deep learning

I built a model that estimates the positions of the nipples and navel. Once that works, the rest is the same affine transformation as before. I approached nipple/navel detection as a segmentation task. Keypoint detection, as used in pose estimation, would probably be a better fit, but I personally have more experience with segmentation, so that is what I chose.

The dataset is as follows. Coordinates have already been assigned for September 9, 2019 to March 2020, so those images serve as training and validation data, and the coordinates for the remaining period will be obtained automatically.

[Figure: dataset breakdown; the annotated images (September 2019 to March 2020) are used for training and validation, the earlier images are to be inferred]

2-1. Annotation data creation

This could be solved as a 4-class problem ("right nipple", "left nipple", "navel", "background"), but this time I reduced it to 2 classes: "right nipple / left nipple / navel" and "background". As long as the three points can be detected, I figured they would be easy to tell apart with simple rules. Now, let's make the mask images: based on the coordinate data created earlier, dilate each coordinate point a little and fill it with 1; everything else is background, so set it to 0.

import cv2
import numpy as np

# img_h, img_w: mask size (same as the photographs)
for index, row in df.iterrows():
  mask = np.zeros((img_h, img_w), dtype=np.uint8)
  # Fill a circle of radius 15 around each annotated point with 1
  mask = cv2.circle(mask, (row.p1x, row.p1y), 15, 1, -1)
  mask = cv2.circle(mask, (row.p2x, row.p2y), 15, 1, -1)
  mask = cv2.circle(mask, (row.p3x, row.p3y), 15, 1, -1)
  save_img(mask, row.file)  # omitted

Visualized (1 as white, 0 as black), the data looks like this.

[Image: example mask; three small white circles on a black background]

Each mask is paired with the corresponding body image.

2-2. Training

For training I used DeepLabV3 (torchvision). The 120 images were split 8:2 into training and validation sets. That is quite a small number, but I did not apply any data augmentation, honestly just because it was a hassle. Augmenting the data would probably help, as sketched below.
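If you did want augmentation, note that the image and its mask must be transformed identically. Here is a minimal sketch of a paired transform that could be passed to MaskDataset (defined below) through its transforms argument; the flip and small-rotation choices are just an example of mine, not part of the original pipeline.

# A minimal paired-augmentation sketch (my assumption): the same random
# flip/rotation is applied to both the image and its mask.
import random

import torchvision.transforms.functional as TF

def paired_augment(item):
  img, mask = item["image"], item["mask"]
  if random.random() < 0.5:
    img, mask = TF.hflip(img), TF.hflip(mask)
  angle = random.uniform(-10, 10)
  img = TF.rotate(img, angle)
  mask = TF.rotate(mask, angle)  # default nearest interpolation keeps the mask binary
  return {"image": img, "mask": mask}

# dataset = MaskDataset("images_dir", "masks_dir", 0.5, transforms=paired_augment)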

Dataset class / training-related functions


import glob
import os

import numpy as np
import torch
from PIL import Image
from torch.utils.data import Dataset

class MaskDataset(Dataset):
  def __init__(self, imgs_dir, masks_dir, scale=1, transforms=None):
    self.imgs_dir = imgs_dir
    self.masks_dir = masks_dir

    self.imgs = list(sorted(glob.glob(os.path.join(imgs_dir, "*.jpg"))))
    self.msks = list(sorted(glob.glob(os.path.join(masks_dir, "*.png"))))
    self.transforms = transforms
    self.scale = scale

  def __len__(self):
    # The number of images, not len(self.imgs_dir) (the path string length)
    return len(self.imgs)

  @classmethod
  def preprocess(cls, pil_img, scale):

    # Grayscale would probably work too, but skipped because it's a hassle
    # pil_img = pil_img.convert("L")

    w, h = pil_img.size
    newW, newH = int(scale * w), int(scale * h)
    pil_img = pil_img.resize((newW, newH))

    img_nd = np.array(pil_img)

    if len(img_nd.shape) == 2:
      img_nd = np.expand_dims(img_nd, axis=2)

    # HWC to CHW
    img_trans = img_nd.transpose((2, 0, 1))
    if img_trans.max() > 1:
      img_trans = img_trans / 255

    return img_trans

  def __getitem__(self, i):
    mask_file = self.msks[i]
    img_file = self.imgs[i]

    mask = Image.open(mask_file)
    img = Image.open(img_file)

    img = self.preprocess(img, self.scale)
    mask = self.preprocess(mask, self.scale)

    item = {"image": torch.from_numpy(img), "mask": torch.from_numpy(mask)}
    if self.transforms:
      item = self.transforms(item)
    return item

import torch.nn as nn
from torchvision import models
from torchvision.models.segmentation.deeplabv3 import DeepLabHead

def create_deeplabv3(num_classes):
  # Start from a pretrained DeepLabV3 and swap in a new classifier head
  model = models.segmentation.deeplabv3_resnet101(pretrained=True, progress=True)
  model.classifier = DeepLabHead(2048, num_classes)

  # Grayscale would probably work too, but skipped because it's a hassle
  # model.backbone.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)

  return model

import copy
import time

import torch
from tqdm import tqdm

def train_model(model, criterion, optimizer, dataloaders, device, num_epochs=25, print_freq=1):
  since = time.time()

  best_model_wts = copy.deepcopy(model.state_dict())
  best_loss = 1e15

  # Keep the history across epochs (initializing this inside the epoch
  # loop would throw away everything but the last epoch)
  loss_history = {"train": [], "val": []}

  for epoch in range(num_epochs):
    print('Epoch {}/{}'.format(epoch + 1, num_epochs))
    print('-' * 10)

    for phase in ["train", "val"]:

      if phase == "train":
        model.train()
      else:
        model.eval()

      running_loss = 0.0
      n_samples = 0

      for sample in tqdm(iter(dataloaders[phase])):
        imgs = sample["image"].to(device, dtype=torch.float)
        msks = sample["mask"].to(device, dtype=torch.float)

        optimizer.zero_grad()

        with torch.set_grad_enabled(phase == "train"):
          outputs = model(imgs)
          loss = criterion(outputs["out"], msks)

          if phase == "train":
            loss.backward()
            optimizer.step()

        running_loss += loss.item() * imgs.size(0)
        n_samples += imgs.size(0)

      # Average loss over the whole phase, not just the last batch
      epoch_loss = running_loss / n_samples
      loss_history[phase].append(epoch_loss)
      if (epoch + 1) % print_freq == 0:
        print("Epoch: [%d/%d], %s Loss: %.4f" % (epoch + 1, num_epochs, phase, epoch_loss))

      # Deep-copy the weights with the best validation loss
      if phase == "val" and epoch_loss < best_loss:
        best_loss = epoch_loss
        best_model_wts = copy.deepcopy(model.state_dict())

  time_elapsed = time.time() - since
  print("Training complete in {:.0f}m {:.0f}s".format(time_elapsed // 60, time_elapsed % 60))
  print("Best val loss: {:4f}".format(best_loss))

  model.load_state_dict(best_model_wts)

  return model, loss_history

Running the training



import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, random_split

dataset = MaskDataset("images_dir", "masks_dir", 0.5, transforms=None)

# Split into training and validation sets
val_percent = 0.2
batch_size = 4
n_val = int(len(dataset) * val_percent)
n_train = len(dataset) - n_val
train, val = random_split(dataset, [n_train, n_val])
train_loader = DataLoader(train, batch_size=batch_size, shuffle=True, num_workers=8, pin_memory=True, drop_last=True)
val_loader = DataLoader(val, batch_size=batch_size, shuffle=False, num_workers=8, pin_memory=True, drop_last=True)

dataloaders = {"train": train_loader, "val": val_loader}

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# With BCEWithLogitsLoss, binary classification uses a single output channel
num_classes = 1

model = create_deeplabv3(num_classes)

# To resume from a saved checkpoint
# model.load_state_dict(torch.load("model.pth"))

model.to(device)

# The background overwhelmingly dominates, so compensate with pos_weight
criterion = nn.BCEWithLogitsLoss(pos_weight=torch.tensor(10000.0).to(device))

params = [p for p in model.parameters() if p.requires_grad]

# optimizer = torch.optim.SGD(params, lr=0.005, momentum=0.9, weight_decay=0.0005)
optimizer = optim.Adam(params)

total_epoch = 50

model, loss_dict = train_model(model, criterion, optimizer, dataloaders, device, total_epoch)

After about 50 epochs, the loss had more or less converged.
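Since train_model returns the loss history, convergence can be eyeballed with a quick plot. A minimal sketch of my own, assuming loss_dict = {"train": [...], "val": [...]} as built above:

# Plot the per-epoch training and validation loss
import matplotlib.pyplot as plt

plt.plot(loss_dict["train"], label="train")
plt.plot(loss_dict["val"], label="val")
plt.xlabel("epoch")
plt.ylabel("BCE loss")
plt.legend()
plt.show()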

2-3. Applying the model to unseen images

The results were generally good and the three points responded properly, but occasionally I got results like the following (heat-map visualization).

[Image: output heat map with a small false-positive spot at the top right]

Of course, nobody has two left nipples, so the small dot at the top right is a false positive. Incidentally, there were no false negatives.

2-4. Post-processing

Given the inference results above, post-processing proceeds as follows:

  1. Discard pixels whose output value is below a threshold
  2. Split the remaining pixels into connected components (clusters)
  3. If there are 4 or more clusters, keep the 3 largest by area and discard the rest
  4. Compute the centroid of each cluster
  5. Sort by ascending centroid x-coordinate (right nipple → navel → left nipple)

2-4-1. Discard pixels below the threshold

Pixels the model is not clearly confident about are discarded before the next step. The threshold was set empirically to 0.995.

2-4-2. Split into connected components

Object splitting (partitioning into clusters) uses cv2.connectedComponents. For details, see "How to label connected components with OpenCV (connectedComponents) - pynote".
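As a quick reminder of how cv2.connectedComponents behaves (a toy example of my own): the background counts as label 0, so the three expected blobs produce nlabels == 4, which is exactly what the check in the inference code below relies on.

# Toy example: two separate blobs plus background => nlabels == 3
import cv2
import numpy as np

img = np.zeros((10, 10), dtype=np.uint8)
img[1:3, 1:3] = 255   # blob 1
img[6:9, 6:9] = 255   # blob 2

nlabels, labels = cv2.connectedComponents(img)
print(nlabels)            # 3 (background label 0 + two blobs)
print(np.unique(labels))  # [0 1 2]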

2-4-3. If there are 4 or more clusters, keep the 3 largest by area and discard the rest

Inspecting the failure cases showed that the false positives (anything other than the nipples and navel) had small areas, so I keep the three clusters with the largest areas. Frankly, I don't think this kind of countermeasure is very robust, but it worked this time, so I adopted it.

2-4-4. Compute the centroid of each cluster

cv2.moments gives the centroid of each cluster. For details, see "Calculating the center of gravity with Python + OpenCV (Introduction to CV image analysis)".
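The centroid falls out of the image moments as (m10 / m00, m01 / m00). A tiny self-contained example of my own:

# Centroid of a binary blob via image moments
import cv2
import numpy as np

blob = np.zeros((20, 20), dtype=np.uint8)
blob[4:10, 6:14] = 255

mu = cv2.moments(blob, False)
cx, cy = mu["m10"] / mu["m00"], mu["m01"] / mu["m00"]
print(cx, cy)  # (9.5, 6.5), the blob's center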

2-4-5. Sort by ascending centroid x-coordinate (right nipple → navel → left nipple)

The affine transformation needs corresponding points, so the ordering of the nipple and navel coordinates must be consistent across images. All the images were taken standing upright, so along the horizontal axis they are guaranteed to appear as nipple → navel → nipple; a simple sort by x-coordinate is enough.

Inference code (part)



import glob
import os

import cv2
import numpy as np
import torch
from PIL import Image
from torchvision import transforms

# Detect the three points from the predicted mask
def triangle_pt(heatmask, thresh=0.995):
  mask = heatmask.copy()

  # 2-4-1. Discard pixels whose output value is below the threshold
  mask[mask > thresh] = 255
  mask[mask <= thresh] = 0
  mask = mask.astype(np.uint8)
  # 2-4-2. Split into connected components (the background counts as one label)
  nlabels, labels = cv2.connectedComponents(mask)

  pt = []
  if nlabels != 4:

    # If there are fewer, do nothing
    # (I'd really like to lower the threshold and retry, but it's a hassle)
    if nlabels < 4:
      return None

    # 2-4-3. If there are 4 or more clusters, keep the 3 largest by area
    elif nlabels > 4:
      sum_px = []
      for i in range(1, nlabels):
        sum_px.append((labels == i).sum())
      # +1 to skip the background label (0)
      indices = [x + 1 for x in np.argsort(-np.array(sum_px))[:3]]

  else:
    indices = [x for x in range(1, nlabels)]

  # 2-4-4. Compute the centroid of each cluster
  for i in indices:
    base = np.zeros_like(mask, dtype=np.uint8)
    base[labels == i] = 255
    mu = cv2.moments(base, False)
    x, y = int(mu["m10"] / mu["m00"]), int(mu["m01"] / mu["m00"])
    pt.append([x, y])

  # 2-4-5. Sort by ascending centroid x-coordinate (right nipple → navel → left nipple)
  sort_key = lambda v: v[0]
  pt.sort(key=sort_key)
  return np.array(pt, dtype=np.int32)


def correct_img(model, device, in_dir, out_dir,
                draw_heatmap=True, draw_triangle=True, correct=True):

  imgs = []

  base_3p = None
  model.eval()
  with torch.no_grad():
    imglist = sorted(glob.glob(os.path.join(in_dir, "*.jpg")))

    for idx, img_path in enumerate(imglist):

      # Batch size 1, because anything else is a hassle
      full_img = Image.open(img_path)
      img = torch.from_numpy(MaskDataset.preprocess(full_img, 0.5))
      img = img.unsqueeze(0)
      img = img.to(device=device, dtype=torch.float32)

      output = model(img)["out"]
      probs = torch.sigmoid(output)
      probs = probs.squeeze(0)

      # Resize the predicted mask back up to the original image size
      tf = transforms.Compose(
                [
                    transforms.ToPILImage(),
                    transforms.Resize(full_img.size[0]),
                    transforms.ToTensor()
                ]
            )

      probs = tf(probs.cpu())
      full_mask = probs.squeeze().cpu().numpy()

      full_img = np.asarray(full_img).astype(np.uint8)
      full_img = cv2.cvtColor(full_img, cv2.COLOR_RGB2BGR)

      # Triangle
      triangle = triangle_pt(full_mask)
      if draw_triangle and triangle is not None:
        cv2.drawContours(full_img, [triangle], 0, (0, 0, 255), 5)

      # Heat map
      if draw_heatmap:
        full_mask = (full_mask * 255).astype(np.uint8)
        jet = cv2.applyColorMap(full_mask, cv2.COLORMAP_JET)

        alpha = 0.7
        full_img = cv2.addWeighted(full_img, alpha, jet, 1 - alpha, 0)

      # Affine transformation
      if correct:
        if base_3p is None and triangle is not None:
          base_3p = triangle  # the first successful detection becomes the reference
        elif triangle is not None:
          full_img = p3affine_img(full_img, triangle, base_3p)

      if out_dir is not None:
        cv2.imwrite(os.path.join(out_dir, os.path.basename(img_path)), full_img)

      imgs.append(full_img)

  return imgs

imgs = correct_img(model, device,
                   "images_dir", None,
                   draw_heatmap=False, draw_triangle=False, correct=True)

2-5. Results

The time-lapse before correction looks like this.

[GIF: time-lapse before automatic correction]

The corrected time-lapse looks like this.

[GIF: time-lapse after automatic correction]

Summary

By detecting the nipples and navel with deep learning and correcting the images automatically, the time-lapse became dramatically easier to watch. This has motivated me to train even harder. Of course, some people will think, **"Couldn't you have done this with classical, non-deep CV?"** But in my case, any time spent devising rules is time I would rather spend raising a barbell, so I went with the brute-force solution. Everything except the annotation tool was developed on Google Colab. Colab, you're the best!

As for remaining challenges: overdoing it secretes cortisol, so let's not push too hard!

Let's all enjoy the muscle-training life!
