Previously, I ran an ["image anomaly detection" benchmark](https://qiita.com/shinmura0/items/06d81c72601c7578c6d3#%E3%83%99%E3%83%B3%E3%83%81%E3%83%9E%E3%83%BC%E3%82%AF) using **deep metric learning**. Since then, **the latest deep metric learning method**, "AdaCos", has appeared.
In this article, I apply AdaCos to "anomaly detection" on the same simple benchmark. The entire code is here
By applying AdaCos to anomaly detection, we found the following:
AdaCos
Although AdaCos is called the latest deep metric learning method, as of January 2020 more than half a year has passed since the paper was published. Still, as far as I know, it remains SOTA within the frame of "pure deep metric learning".
AdaCos is a method built on ArcFace and its relatives, and it automatically determines the parameters that ArcFace and others require by hand. The accuracy of ArcFace changes dramatically depending on the parameter choice, so parameter tuning was very demanding. With AdaCos, the parameters are determined automatically, which frees you from this tuning work.
In addition, accuracy has been confirmed to improve as well, so it is a highly useful method that raises both work efficiency and accuracy. For details, refer to the following article.
Read AdaCos: Adaptively Scaling Cosine Logits for Effectively Learning Deep Face Representations
There are two variants of AdaCos, Fixed and Dynamic. The one used in this article is "Fixed AdaCos".
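Fixed AdaCos sets the scale once from the number of classes via the formula from the AdaCos paper, s = √2 · log(C − 1). A minimal sketch (my own illustration, not the article's code):

```python
import math

def fixed_adacos_scale(num_classes: int) -> float:
    """Fixed AdaCos scale: s = sqrt(2) * log(C - 1)."""
    return math.sqrt(2.0) * math.log(num_classes - 1)

# For the 9 classes used in this experiment:
print(fixed_adacos_scale(9))  # → about 2.94
```

With this, the scale that had to be hand-tuned in ArcFace is fixed by the class count alone.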
Also, AdaCos can only be applied when the number of classes is 3 or more. This becomes a bottleneck when combining it with self-supervised learning, but I will examine that in the next article. It is not a problem here, since this experiment uses 9 classes.
The experiment is performed under the same conditions as the [previous experiment](https://qiita.com/shinmura0/items/06d81c72601c7578c6d3#%E3%83%99%E3%83%B3%E3%83%81%E3%83%9E%E3%83%BC%E3%82%AF).
The AdaCos code itself is almost the same as that of ArcFace; only the way the parameters are given has been changed to the AdaCos style.
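To make the difference concrete, here is a hypothetical NumPy sketch (not the article's linked code) contrasting the two: ArcFace takes a hand-chosen scale `s` and margin `m`, while Fixed AdaCos drops the margin and derives the scale from the class count.

```python
import numpy as np

def cosine_logits(embeddings, weights):
    """Cosine similarity between L2-normalized embeddings and class weight vectors."""
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    w = weights / np.linalg.norm(weights, axis=0, keepdims=True)
    return e @ w

def arcface_logits(cos, labels, s=30.0, m=0.5):
    """ArcFace: add angular margin m to the target class, then scale by s.
    Both s and m are hyperparameters that must be tuned."""
    theta = np.arccos(np.clip(cos, -1.0, 1.0))
    target = np.eye(cos.shape[1])[labels].astype(bool)
    out = cos.copy()
    out[target] = np.cos(theta[target] + m)
    return s * out

def fixed_adacos_logits(cos, num_classes):
    """Fixed AdaCos: no margin; the scale is determined by the class count."""
    s = np.sqrt(2.0) * np.log(num_classes - 1)
    return s * cos
```

Swapping `arcface_logits` for `fixed_adacos_logits` removes both hyperparameters from the loss head, which is exactly the tuning relief described above.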
The entire code is here
Fashion-MNIST
The breakdown of the data is as follows.
Normal is "sneakers", abnormal is "boots".
| Data | Quantity | Number of classes | Remarks |
|---|---|---|---|
Reference data for learning | 8000 | 8 | Excluding sneakers and boots |
Normal data for learning | 1000 | 1 | sneakers |
Test data (normal) | 1000 | 1 | sneakers |
Test data (abnormal) | 1000 | 1 | boots |
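The split above can be sketched as follows. This is my own illustration under the assumption of Fashion-MNIST's usual label numbering (7 = sneaker, 9 = ankle boot); the article's actual preprocessing is in the linked code.

```python
import numpy as np

SNEAKER, BOOT = 7, 9  # assumed Fashion-MNIST class indices

def build_anomaly_split(x, y, normal=SNEAKER, anomaly=BOOT,
                        n_ref=8000, n=1000, seed=0):
    """Split labeled data into: reference data (the 8 other classes),
    normal training data, normal test data, and anomalous test data."""
    rng = np.random.default_rng(seed)
    ref_idx = np.where((y != normal) & (y != anomaly))[0]
    normal_idx = np.where(y == normal)[0]
    anomaly_idx = np.where(y == anomaly)[0]
    rng.shuffle(normal_idx)
    return (x[rng.choice(ref_idx, n_ref, replace=False)],  # reference (8 classes)
            x[normal_idx[:n]],        # normal, for training
            x[normal_idx[n:2 * n]],   # normal, for testing
            x[anomaly_idx[:n]])       # abnormal, for testing
```

The same function covers the cifar-10 split below by passing the deer and horse class indices instead.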
cifar-10
The breakdown of the data is as follows.
Normal is "deer", abnormal is "horse".
| Data | Quantity | Number of classes | Remarks |
|---|---|---|---|
Reference data for learning | 8000 | 8 | Excluding deer and horse |
Normal data for learning | 1000 | 1 | deer |
Test data (normal) | 1000 | 1 | deer |
Test data (abnormal) | 1000 | 1 | horse |
"L2-Softmax Loss" and "ArcFace" show the results of the previous experiment; "AdaCos" shows the result of this experiment.
The median AUC of "AdaCos" is now about **the same as ArcFace's.**
Above all, I am grateful that parameter tuning is no longer necessary.
"L2-Softmax Loss" and "ArcFace" show the results of the previous experiment; "AdaCos" shows the result of this experiment.
As with Fashion-MNIST, the median AUC of "AdaCos" is about the same as ArcFace's.