[PYTHON] What I learned by implementing how to create a Default Box for SSD

I tried to analyze how to create a Default Box for SSD

A few months ago, I was a little worried about my lack of knowledge about SSD (Single Shot Multibox Detector), so I decided to build my own SSD little by little while reading the paper and the implementations I found on GitHub. To be honest, it isn't finished yet, but I've learned how to create the Default Box (Prior), an important part of SSD, so I'd like to share it with you.

What is a Default Box?

If you're reading this article, you probably know what a Default Box is, but I'll explain it briefly just in case. When an SSD processes an image, it outputs convolved versions of that image called Feature Maps. The number and size of the Feature Maps are specified in the model settings, but there are usually about 5 or 6. To recognize objects from a Feature Map, the areas where an object is likely to appear are specified in advance. Such an area is called a Default Box (sometimes a Prior) and is used for classification and regression.

It looks like this when you visualize one set of Default Boxes. With my model settings, a total of 8732 are created.

(Figure: example.png. This is an image from the COCO Dataset 2017.)

How the SSD paper calculates the Default Box

According to the paper, to calculate the Default Boxes we need the Scale (there is no specific definition, but it is something like the size of an object), the Aspect Ratios (the aspect ratios of the Default Boxes), and the size of each Feature Map.

In the paper, the scale is defined by the function below. m is the number of feature maps and k is the index of the feature map (from 1 to m). s_min and s_max are chosen according to the size of the objects in the images. I would like to say that there is a fixed way to decide them, but there does not seem to be one.

s_k = s_min + (s_max - s_min) / (m - 1) * (k - 1), k in [1, m]

The function itself looks a bit complicated, but in a nutshell it divides the range from s_min to s_max into equally spaced values, one per feature map. For example, if s_min = 0.2, s_max = 0.9 and m = 6, then s_k would be [0.2, 0.34, 0.48, 0.62, 0.76, 0.9]. Each scale is 0.14 apart from the next.
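As a quick check of the example numbers, here is a minimal sketch of the scale calculation. The function name linear_scales is my own and not taken from the paper or from any implementation, and the rounding is only there to keep the printed values tidy.

```python
def linear_scales(s_min, s_max, m):
    """Return m scales spaced evenly from s_min to s_max (k = 1..m)."""
    if m == 1:
        return [s_min]
    step = (s_max - s_min) / (m - 1)
    # Round only for readable output; the formula itself has no rounding.
    return [round(s_min + step * (k - 1), 4) for k in range(1, m + 1)]


scales = linear_scales(0.2, 0.9, 6)
print(scales)  # [0.2, 0.34, 0.48, 0.62, 0.76, 0.9]
```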

Aspect Ratios, like s_min and s_max, are chosen according to the objects. The paper states that the aspect ratios {1, 2, 3, 1/2, 1/3} were used. It was a little unclear from the paper alone, but (2, 1/2) and (3, 1/3) are treated as pairs, so reducing the number of aspect ratios basically means dropping both 3 and 1/3.

The Feature Map size was (personally) the most obvious part: it is simply the size of the convolution layer output that you pass to the classification and regression heads in the SSD. For example, in my model the first output is 38 x 38 and the last output is 1 x 1.

Once you have the parts above, you can calculate the Default Boxes. These functions calculate the width and height of a Default Box, where a_r is the aspect ratio.

w = s_k * sqrt(a_r)
h = s_k / sqrt(a_r)
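The width and height calculation (w = s_k * sqrt(a_r), h = s_k / sqrt(a_r) in the paper) can be sketched as follows. box_size is a name I made up for illustration. Note that the area w * h stays at s_k squared no matter which aspect ratio is used; only the shape changes.

```python
import math


def box_size(scale, aspect_ratio):
    """Return (w, h) of a Default Box, relative to the image size."""
    w = scale * math.sqrt(aspect_ratio)
    h = scale / math.sqrt(aspect_ratio)
    return w, h


w, h = box_size(0.2, 2)
print(w / h)  # close to 2, the aspect ratio
print(w * h)  # close to 0.04, which is 0.2 squared
```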

Next we can calculate cx and cy, the center point of the Default Box. Here f_k is simply the size of the feature map (e.g. 38), and i and j are the cell indices, each running from 0 to f_k - 1.

(cx, cy) = ((i + 0.5) / f_k, (j + 0.5) / f_k)
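Here is a small sketch of the center calculation, using the paper's definition (cx, cy) = ((i + 0.5) / f_k, (j + 0.5) / f_k). The function name box_centers and the row-by-row ordering are my own choices; implementations may order the cells differently.

```python
from itertools import product


def box_centers(f_k):
    """Return the (cx, cy) center of every cell in an f_k x f_k feature map."""
    # j is the row index and i is the column index; both run from 0 to f_k - 1.
    return [((i + 0.5) / f_k, (j + 0.5) / f_k)
            for j, i in product(range(f_k), repeat=2)]


print(len(box_centers(38)))  # 1444 cells in a 38 x 38 feature map
print(box_centers(1))        # [(0.5, 0.5)]
```

Because the values are divided by f_k, every center falls between 0 and 1, so the same code works for any input image size.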

By the way, the [cx, cy, w, h] format is common when representing a Default Box.

Creating the Default Boxes

The calculations for cx, cy and w, h described above are done for each aspect ratio in each feature map. However, the aspect ratio 1 is a small exception: for it, two Default Boxes are calculated. One uses the normal scale s_k and the other uses an additional scale. The additional scale is defined by the following function, from the scale of the current feature map and the scale one level higher.

s'_k = sqrt(s_k * s_(k+1))
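In the paper the additional scale is the geometric mean of the current scale and the next one, s'_k = sqrt(s_k * s_(k+1)). A minimal sketch, with extra_scale as a hypothetical helper name and the example scales 0.2 and 0.34 from earlier in this article:

```python
import math


def extra_scale(s_k, s_next):
    """Additional scale for aspect ratio 1: geometric mean of two scales."""
    return math.sqrt(s_k * s_next)


print(round(extra_scale(0.2, 0.34), 4))  # 0.2608
```

The result always lies between the two input scales, so the second box for aspect ratio 1 is a square slightly larger than the first.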

Every other Aspect Ratio gets one Default Box. So for {1, 2, 3, 1/2, 1/3}, six Default Boxes are created at each feature map location. In the case of {1, 2, 1/2}, four are created.
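The counting rule can be written down in a few lines. This is only a sketch of the rule as described here (one box per aspect ratio, plus one more for the ratio 1); boxes_per_location is a name I chose for illustration.

```python
def boxes_per_location(aspect_ratios):
    """Count the Default Boxes created at one feature map location."""
    count = len(aspect_ratios)  # one box per aspect ratio
    if 1 in aspect_ratios:
        count += 1              # ratio 1 gets a second box with the additional scale
    return count


print(boxes_per_location({1, 2, 3, 1/2, 1/3}))  # 6
print(boxes_per_location({1, 2, 1/2}))          # 4
```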

Differences between the SSD paper and SSD implementations

I looked at several implementations to understand how to create the Default Boxes, and there were some points I could not understand from the paper alone.

First of all, we often see a variable called steps. f_k is calculated by dividing the image size by the step. There is no explanation anywhere and it is not written in the paper, but steps is roughly the image size divided by the feature map size. For example, dividing 300 by 38 gives 7.89 -> 8.

steps: [8, 16, 32, 64, 100, 300]
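To see what the steps setting does to f_k, here is a small sketch. It assumes an image size of 300 (the value used in the example above), and the helper name center is mine. Because the step values are rounded, the f_k obtained from them is not always a whole number.

```python
image_size = 300  # assumed input size, matching the 300 / 38 example

# From the article: 300 / 38 = 7.89, written as 8 in the config.
print(round(image_size / 38))  # 8

steps = [8, 16, 32, 64, 100, 300]
effective_f_k = [image_size / step for step in steps]
print(effective_f_k)  # [37.5, 18.75, 9.375, 4.6875, 3.0, 1.0]


def center(i, j, step, image_size=300):
    """Center of cell (i, j) when f_k is derived from the step."""
    f_k = image_size / step
    return ((i + 0.5) / f_k, (j + 0.5) / f_k)


print(center(0, 0, 8))
```

So with steps, the first feature map is treated as if it were 37.5 cells wide rather than 38, which shifts the centers very slightly compared with the formula in the paper.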

The other thing I got stuck on is the Aspect Ratio setting. It is often written like this.

aspect_ratios: [[2], [2, 3], [2, 3], [2, 3], [2], [2]]

There was no 1/2 or 1/3, so I thought, "What's this?", but 2 simply stands for both 2 and 1/2, and 3 stands for both 3 and 1/3. The ratio 1 is always included implicitly, so [2, 3] means {1, 2, 3, 1/2, 1/3}.
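The shorthand can be expanded mechanically. The following is a sketch of that expansion, not code from any particular implementation; expand_aspect_ratios is a hypothetical name.

```python
def expand_aspect_ratios(short):
    """Expand a config entry such as [2, 3] into the full list of ratios."""
    ratios = [1.0]  # ratio 1 is always present
    for r in short:
        ratios.append(float(r))  # r itself
        ratios.append(1.0 / r)   # and its reciprocal
    return ratios


print(len(expand_aspect_ratios([2, 3])))  # 5 ratios: 1, 2, 1/2, 3, 1/3
print(expand_aspect_ratios([2]))          # [1.0, 2.0, 0.5]
```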

Scales are commonly written like this.

scales: [30, 60, 111, 162, 213, 264, 315]

These are the already calculated s_k values multiplied by the image size. If s_k is 0.1 and the image size is 300, the value in scales is 30.
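Going the other way, dividing the config values by the image size gives back s_k. The sketch below assumes an image size of 300. The list has seven entries; my reading, which is an assumption, is that with six feature maps the last entry is there only to provide the "one level higher" scale for the last feature map.

```python
import math

image_size = 300  # assumed input size
scales_px = [30, 60, 111, 162, 213, 264, 315]

# Convert pixel scales back to relative scales s_k.
s = [round(px / image_size, 2) for px in scales_px]
print(s)  # [0.1, 0.2, 0.37, 0.54, 0.71, 0.88, 1.05]

# Pair each scale with the next one to get the additional scale for ratio 1.
extra = [math.sqrt(a * b) for a, b in zip(s, s[1:])]
print(len(extra))  # 6, one per feature map under the assumption above
```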

Finally, this may seem obvious to some, but when calculating the Default Boxes for 2 and 1/2, only the w and h for 2 are calculated. The 1/2 Default Box reuses them swapped: the h of the 2 box is used as w, and the w is used as h. The reason is that sqrt(2) == 1 / sqrt(1/2) and sqrt(1/2) == 1 / sqrt(2).
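Putting the pieces of this article together, here is a rough sketch of a complete Default Box generator that uses the w/h swap. It is not taken from any specific implementation. The function name default_boxes is mine, and the feature map sizes 19, 10, 5 and 3 are assumptions (this article only states 38 and 1); the scales and aspect_ratios are the config values quoted above, with an assumed image size of 300.

```python
import math
from itertools import product


def default_boxes(feature_maps, scales, aspect_ratios):
    """Return every Default Box as [cx, cy, w, h], relative to the image."""
    boxes = []
    for k, f_k in enumerate(feature_maps):
        s_k, s_next = scales[k], scales[k + 1]
        s_extra = math.sqrt(s_k * s_next)
        for j, i in product(range(f_k), repeat=2):
            cx, cy = (i + 0.5) / f_k, (j + 0.5) / f_k
            boxes.append([cx, cy, s_k, s_k])          # ratio 1, normal scale
            boxes.append([cx, cy, s_extra, s_extra])  # ratio 1, additional scale
            for r in aspect_ratios[k]:
                w, h = s_k * math.sqrt(r), s_k / math.sqrt(r)
                boxes.append([cx, cy, w, h])  # ratio r
                boxes.append([cx, cy, h, w])  # ratio 1/r: same values, swapped
    return boxes


boxes = default_boxes(
    feature_maps=[38, 19, 10, 5, 3, 1],  # middle sizes are assumed
    scales=[px / 300 for px in [30, 60, 111, 162, 213, 264, 315]],
    aspect_ratios=[[2], [2, 3], [2, 3], [2, 3], [2], [2]],
)
print(len(boxes))  # 8732
```

With these settings the count matches the 8732 boxes mentioned at the start of the article.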

Finally

Thank you for reading to the end! Japanese is not my mother tongue, so there may be things I explained poorly or strange words I used. If you have any questions, please comment and I will answer as best I can!

I'd like to keep posting as I implement SSD, so please look forward to the next post!
