Image recognition models using deep learning in 2016

Introduction

Hello, I'm @aiskoaskosd from CodeNext. I use Chainer regularly and have benefited from it a great deal, so I wrote this article as a small way of giving something back. Here I focus on the image recognition models that have attracted attention over the last one or two years, publish my implementations, and explain the contents of some of the papers. A few papers from 2013 and earlier are also included as exceptions. **22 of the 24 models are implemented in Chainer.** Unfortunately, as of December 22, not all of the implementations and CIFAR-10 verifications are finished; I will update them one by one. There are probably some misinterpretations and implementation mistakes, and I would be very happy if you could point them out.

Papers

1. Network In Network
2. Very Deep Convolutional Networks for Large-Scale Image Recognition
3. Going deeper with convolutions
4. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
5. Rethinking the Inception Architecture for Computer Vision
6. Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification
7. Training Very Deep Networks
8. Deep Residual Learning for Image Recognition
9. Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning
10. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size
11. Identity Mappings in Deep Residual Networks
12. Resnet in Resnet: Generalizing Residual Architectures
13. Deep Networks with Stochastic Depth
14. Swapout: Learning an ensemble of deep architectures
15. Wide Residual Networks
16. FractalNet: Ultra-Deep Neural Networks without Residuals
17. Weighted Residuals for Very Deep Networks
18. Residual Networks of Residual Networks: Multilevel Residual Networks
19. Densely Connected Convolutional Networks
20. Xception: Deep Learning with Depthwise Separable Convolutions
21. Deep Pyramidal Residual Networks
22. Neural Architecture Search with Reinforcement Learning
23. Aggregated Residual Transformations for Deep Neural Networks
24. Deep Pyramidal Residual Networks with Separated Stochastic Depth

| Paper | Date (YYMMDD) | Model | Parameters (10^6) | CIFAR-10 accuracy in paper (%) | CIFAR-10 accuracy of this implementation | ImageNet top-5 error (%) |
|---|---|---|---|---|---|---|
| 1 | 131116 | Caffe reference implementation | 0.1 | 91.19 | \bigtriangleup (90.10) | \times |
| 1 | 131116 | Caffe reference implementation with BN | 0.1 | not in the paper | 91.52% | not in the paper |
| 2 | 140904 | Model A | 129 | \times | 92.1 (Model A) | 6.8 (Model E) |
| 3 | 140917 | GoogLeNet | 6 | \times | 91.33% | 6.67 |
| 4 | 150211 | Inception-v2 | 10 | \times | 94.89% | 4.9 |
| 5 | 151202 | Inception-v3 (reference) | 22.5 | \times | 94.74% | 3.58 |
| 6 | 150206 | Model A | 43.9 (global average pooling instead of SPP) | \times | 94.98% | 4.94 |
| 7 | 150722 | Highway (FitNet19) | 2.8 | 92.46 | \bigcirc (93.35%; however, BN is added and the highway part is configured differently) | \times |
| 8 | 151210 | ResNet-110 | 1.6 | 93.57 | \bigcirc (93.34%) | 3.57 |
| 9 | 160223 | Inception-v4 | \times | \times | \phi | 3.1 |
| 10 | 160224 | SqueezeNet with BN | 0.7 | 82% (AlexNet, without data augmentation) | \bigcirc (92.63%) | 17.5 (without BN, single model) |
| 11 | 160316 | ResNet-164 | 1.6 | 94.54 | \bigcirc (94.39%) | 4.8 (single) |
| 12 | 160325 | 18-layer + wide RiR | 9.5 | 94.99 | \bigtriangleup (94.43%) | \times |
| 13 | 160330 | ResNet-110 | 1.7 | 94.75 | \bigcirc (94.76%) | \times |
| 14 | 160520 | Swapout v2 (32) W×4 | 7.1 | 95.24 | \bigcirc (95.34%) | \times |
| 15 | 160523 | WRN-28-10 | 36.2 | 96.0 | \bigcirc (95.76%) | \times |
| 16 | 160524 | 20 layers | 33.7 | 95.41 | \bigtriangleup (93.77%) | 7.39% (FractalNet-34) |
| 17 | 160528 | WResNet-d | 19.1 | 95.3 | \times | \times |
| 18 | 160809 | RoR-3-WRN58-4 | 13.6 | 96.23 | \times | \times |
| 19 | 160825 | k=24, depth=100 | 27.0 | 96.26 | \bigcirc (95.12%: k=12, depth=40) | \times |
| 20 | 161007 | Xception | \times | \times | \phi | 5.5 (single) |
| 21 | 161010 | \alpha = 270 | 28.4 | 96.23 | \bigcirc (95.9%) | 4.7 (\alpha = 450) |
| 22 | 161105 | depth=49 | 32 (from the paper) | 96.16 | \bigtriangleup (90.35%; Appendix A: 4.1M) | \times |
| 23 | 161116 | ResNeXt-29, 16×64d | 68.3 | 96.42 | \bigcirc (95.72%: 2×64d) | \times |
| 24 | 161205 | depth=182, \alpha=150 | 16.5 | 96.69 | \times | \times |

Implementation

https://github.com/nutszebra/chainer_image_recognition

Be careful when using a model that has not been verified

A sense of the numbers on CIFAR-10

At the moment, I think anything above 97% would count as state of the art. As far as I know, the highest reported accuracy is 96.69%. It may be time to move on to CIFAR-100 or another dataset.

Recent trends

I think this year was the year of the ResNet family. The characteristic point is that "deeper = more accurate" is over. GoogLeNet and others have argued this for a while, but this year various papers showed that **once a network is reasonably deep, widening it improves accuracy more than deepening it further**. From around March, results appeared showing that widening ResNet is better, and I think Wide Residual Networks, released on May 23, made it decisive. It became clear this year that width matters. Looking at the papers from a bird's-eye view, most of the ResNet-family work amounts to **modifications of the Res block**.

It is hard to tell which Res block variant is best. As everyone notices, the number of parameters and the forward FLOPS differ between the models in these papers, so simply comparing accuracies does not mean much. As a result, it is difficult to understand which method is essential and pointing in the right direction even after reading the papers. I think we currently need a rule where everyone builds a model under a fixed budget (FLOPS, perhaps?) and then reports the test accuracy of a single model. All of the papers are hard to evaluate, but the overall tendency is that **ReLU is not applied to the final output of the Res block**. At present I think it is better to base the Res block on the BN-ReLU-Conv-BN-ReLU-Conv form proposed in 11 rather than the original one from 8. It is a personal impression, but it feels like a calm year in which accuracy improved steadily. I do not think any genuinely new structure on the level of the residual connection came out this year. ImageNet 2016 was also dominated by ensembles based on Residual Networks and Inception.

Impressions

Personally, I find the Google paper, 22, very shocking, and I think there is some chance that something will develop from it (no real basis for this). In 22, the network is searched for with 800 GPUs using an RNN plus policy gradients, and it reached 96.16% on CIFAR-10. If it were just 99% on MNIST, that would be the end of the story, but getting a value close to SoTA on CIFAR-10 is very impressive. Letting the data determine the network structure is a fascinating idea, reminiscent of the advent of DNNs (making feature design data-driven). Also, although I did not introduce it here, I remember being very shocked by how interesting HyperNetworks was when I read it. HyperNetworks is a network that generates the weights of a network, and it is a technique that seems likely to be fused with or developed further in the future. If nothing else appears, I expect that remodeling the Res block and the ways of connecting Res blocks will keep developing, but who knows what will happen.

Notes on some of the papers

4. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift Batch Normalization is, in a nutshell, "normalize each input across the batch at the channel level." It is extremely useful; there is no reason not to put it in a network. Convergence becomes faster and accuracy is slightly higher. Because Batch Normalization is so important, I will explain it carefully: first the phenomenon called internal covariate shift, which is the motivation behind Batch Normalization, and then the concrete algorithm.

Internal covariate shift

Suppose the weights of layer $l_{i}$ are changed by backpropagation. Then the distribution of the output values produced with those weights (the nature of the outputs) changes. Layer $l_{i+1}$ has to learn the proper nonlinear map while also adapting to the changed distribution of its inputs. The problem is that learning the nonlinear map needed for discrimination becomes very slow, because so much effort goes into adapting to the shifting output distribution. The paper defines this phenomenon as internal covariate shift, and it causes learning to stagnate. Suppose the SGD learning rate is set large. Then the weights of layer $l_{i}$ change greatly, layer $l_{i+1}$ cannot adapt to the changed outputs (or takes a tremendous amount of time to adapt), and learning stagnates. As a result, the learning rate must be set small in order to learn at all, which in turn slows down convergence. Fix one side and the other breaks. Internal covariate shift becomes a more serious problem the deeper the model gets: even a slight change in the outputs of a lower layer is amplified in the upper layers, so a small change becomes a big one, like a butterfly effect. The solution is very simple: if the distribution of output values changes, adjust it each time so that the distribution stays the same. BN (Batch Normalization) normalizes its input (mean 0, variance 1) and outputs it. When the output of BN is used as the next input, the distribution of that input is stable, so the need to keep re-learning in response to changes in the output distribution is reduced. The network can focus on learning the nonlinear map it essentially has to learn, and as a result training converges quickly. Furthermore, since the distribution is stable, the learning rate can be set large, which also contributes greatly to fast convergence. With BN, GoogLeNet reaches the same accuracy in roughly 7% of the training steps. It is amazing.

Batch Normalization algorithm

The input is $x_{i,cxy}$, meaning the value at position $(x, y)$ of channel $c$ in the $i$-th example of the batch. Let the mean over channel $c$ be $\mu_{c}$ and the variance over channel $c$ be $\sigma_{c}^2$. Then, with batch size $m$, input height $Y$, and input width $X$, $\mu_{c}$ and $\sigma_{c}^2$ can be written as follows.

\begin{equation} \mu_{c} = \frac{1}{mXY}\sum_{i=1}^{m} \sum_{y=1}^{Y} \sum_{x=1}^{X} x_{i,cxy} \tag{4.1} \end{equation}
\begin{equation} \sigma_{c}^2 = \frac{1}{mXY}\sum_{i=1}^{m} \sum_{y=1}^{Y} \sum_{x=1}^{X} (x_{i,cxy} -\mu_c)^2 \tag{4.2} \end{equation}

Looking at equations (4.1) and (4.2), the mean $\mu_{c}$ and the variance $\sigma_{c}^2$ are computed for each channel across the batch; nothing here is learned. Next, let $\hat{x_{i,cxy}}$ be the normalized version of each input $x_{i,cxy}$. We also define a scale $\gamma_{c}$ and a shift $\beta_{c}$ for each channel; $\gamma_{c}$ and $\beta_{c}$ are parameters learned by backpropagation. The reason for introducing them is described below. When $x_{i,cxy}$ is fed into Batch Normalization, the final output $y_{i,cxy}$ and the normalized value $\hat{x_{i,cxy}}$ are defined as follows.

\begin{equation} \hat{x_{i,cxy}} = \frac{x_{i,cxy} - \mu_{c}}{\sqrt{\sigma_{c}^2 + \epsilon}} \tag{4.3} \end{equation}
\begin{equation} y_{i,cxy} = BN(x_{i,cxy}) = \gamma_{c} \hat{x_{i,cxy}} + \beta_{c} \tag{4.4} \end{equation}

Equation (4.3) simply normalizes $x_{i,cxy}$ using equations (4.1) and (4.2); the small constant $\epsilon$ is added so that $\hat{x_{i,cxy}}$ does not blow up when the variance $\sigma_{c}^2$ is 0 (Chainer uses $2.0 \times 10^{-5}$ as the default). The question is equation (4.4): it applies a linear map with $\gamma_{c}$ and $\beta_{c}$, but equation (4.3) has already done the main job of normalizing, so what is it for? Set $\gamma_{c} = \sigma_{c}$ and $\beta_{c} = \mu_{c}$. Then, ignoring the small $\epsilon$, $y_{i,cxy}$ becomes the following. \begin{equation} y_{i,cxy} = \gamma_{c} \hat{x_{i,cxy}} + \beta_{c} = \sigma_{c} \times \frac{x_{i,cxy} - \mu_{c}}{\sqrt{\sigma_{c}^2}} + \mu_{c} = x_{i,cxy} \tag{4.5} \end{equation} Equation (4.5) shows that with $\gamma_{c} = \sigma_{c}$ and $\beta_{c} = \mu_{c}$, the normalized $\hat{x_{i,cxy}}$ is mapped back to the original input $x_{i,cxy}$. By introducing $\gamma_{c}$ and $\beta_{c}$, BN can recover significant features that would otherwise be destroyed by normalization. The only parameters learned by Batch Normalization are the scale $\gamma_{c}$ and the shift $\beta_{c}$. $x_{i,cxy}, \mu_{c}, \sigma_{c}^2, \hat{x_{i,cxy}}, \gamma_{c}, \beta_{c}$ are all differentiable; the derivation is in the paper, and when I checked it, it was indeed differentiable. If you are interested, try deriving it yourself. Batch Normalization can be written in one line in Chainer, which is easy and very nice.
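
To make equations (4.1)-(4.4) concrete, here is a minimal NumPy sketch of the training-time forward pass of Batch Normalization for a batch of feature maps (the variable names are mine, not from the paper; a real implementation would also keep running statistics for test time):

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=2e-5):
    """x: (m, C, Y, X) batch of feature maps.
    gamma, beta: learnable per-channel scale and shift, shape (C,)."""
    # (4.1), (4.2): per-channel mean and variance over the batch and spatial positions
    mu = x.mean(axis=(0, 2, 3), keepdims=True)    # shape (1, C, 1, 1)
    var = x.var(axis=(0, 2, 3), keepdims=True)    # shape (1, C, 1, 1)
    # (4.3): normalize
    x_hat = (x - mu) / np.sqrt(var + eps)
    # (4.4): scale and shift
    return gamma.reshape(1, -1, 1, 1) * x_hat + beta.reshape(1, -1, 1, 1)

x = np.random.randn(8, 16, 32, 32).astype(np.float32)
y = batch_norm_forward(x, gamma=np.ones(16, np.float32), beta=np.zeros(16, np.float32))
```

In Chainer the corresponding one-liner is adding `L.BatchNormalization(16)` to a chain and calling it in the forward pass.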

5. Rethinking the Inception Architecture for Computer Vision The network itself is a natural extension of GoogLeNet. The good thing about this paper is that it puts the network design principles into words. In particular, **increasing the number of channels before downsampling** is very important.

6. Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification This is a ReLU-aware extension of Xavier initialization, which is the standard initialization method for current DNNs. It is also called ReLU initialization, msra initialization, or He initialization. Xavier initialization is a very revolutionary method: choose the weight variance so that the variance of the outputs does not change during the forward pass. Data-driven initializations such as All you need is a good init and Data-dependent Initializations of Convolutional Neural Networks (arXiv:1511.06856) were proposed this year, and these are also extensions of Xavier initialization. I basically use the variance computed by msra initialization to generate diagonalized random matrices and use those as the initial weights. Besides initialization, this paper proposes PReLU, an extension of ReLU. The idea is simple: instead of mapping $x < 0$ to 0 as ReLU does, map it to $ax$, where $a$ is learned by backpropagation. The interesting part of the paper is the learned values of $a$. [Figure: learned values of $a$ per layer] $a$ is large in the early layers and becomes small in the upper layers. The early layers seem to keep the information and discard it as you go up; the early layers stay nearly linear and the network becomes more nonlinear toward the top. This result is very close to CReLU, a nonlinearity that outputs the concatenation of ReLU(x) and ReLU(-x). Another interesting point is that the value of $a$ increases just before downsampling (pooling); since information is dropped in the pool, it looks as if the network tries not to lose it beforehand. Looking at the values of $a$, I feel like I can understand how the CNN feels, and I like this kind of result.
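
As a concrete illustration of msra (He) initialization as I understand it: for a layer followed by ReLU, the weights are drawn from a zero-mean Gaussian with standard deviation $\sqrt{2/\mathrm{fan\_in}}$. A small NumPy sketch (helper names are my own):

```python
import numpy as np

def he_init(out_ch, in_ch, kh, kw):
    """He/msra initialization for a conv weight of shape (out_ch, in_ch, kh, kw):
    zero-mean Gaussian with variance 2 / fan_in, where fan_in = in_ch * kh * kw."""
    fan_in = in_ch * kh * kw
    std = np.sqrt(2.0 / fan_in)
    return np.random.normal(0.0, std, (out_ch, in_ch, kh, kw)).astype(np.float32)

W = he_init(64, 3, 3, 3)   # e.g. the first 3x3 conv from RGB to 64 channels
print(W.std())             # should be close to sqrt(2 / 27) ~ 0.27
```

In Chainer, an equivalent initializer is available as `chainer.initializers.HeNormal`, which can be passed as `initialW` to a convolution link.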

7. Training Very Deep Networks This paper proposes Highway Networks. Mathematically, a highway layer is $y = H(x, W_H)\,T(x, W_T) + x\,C(x, W_C)$, where $H(x, W_H)$ is an ordinary nonlinear function, $T(x, W_T)$ is a nonlinear gate, and $C(x, W_C)$ is a function that decides how much of the input $x$ to carry through. Residual Networks are the simplification with $T(x, W_T) = 1$ and $C(x, W_C) = 1$, which is why Residual Networks are often described as a simplified form of Highway Networks. In the paper the network is built with $C(x, W_C) = 1 - T(x, W_T)$ and $0 \le T(x, W_T) \le 1$. The interesting observation in this paper is that the accuracy of a trained network barely changes even when upper layers are removed. In fact the same phenomenon has been confirmed in Residual Networks: accuracy drops when too many layers are tampered with, but it barely changes when a few layers are removed or shuffled (Residual Networks Behave Like Ensembles of Relatively Shallow Networks). I was surprised when I learned this.
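
A minimal Chainer-style sketch of a fully connected highway layer under the $C = 1 - T$ coupling used in the paper (layer size and names are my own choice; the v1-era keyword constructor is used):

```python
import chainer
import chainer.functions as F
import chainer.links as L


class HighwayLayer(chainer.Chain):
    """y = H(x) * T(x) + x * (1 - T(x)); input and output must have the same size."""

    def __init__(self, size):
        super(HighwayLayer, self).__init__(
            plain=L.Linear(size, size),      # H(x, W_H)
            transform=L.Linear(size, size),  # T(x, W_T), squashed to [0, 1]
        )

    def __call__(self, x):
        h = F.relu(self.plain(x))
        t = F.sigmoid(self.transform(x))
        return h * t + (1 - t) * x
```

The paper also initializes the bias of the transform gate to a negative value, so each layer initially tends to just carry its input through.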

8. Deep Residual Learning for Image Recognition This is the network that won the classification task of ILSVRC 2015. It is a simplified version of the Highway Networks structure (7), and the characteristic part is called the residual connection. Most of the networks that came out this year are based on this and improve on it. [Figure: the residual block] The figure in the paper shows the residual structure clearly: simply add the input $x$ to $F(x)$, the result of applying the nonlinear function. $F$ is composed of several layers of conv, BN, and ReLU. [Figure: overall network architecture]
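
A minimal Chainer-style sketch of a basic Res block in the original Conv-BN-ReLU ordering of this paper (assuming the input and output have the same shape; downsampling blocks with a projection shortcut are omitted):

```python
import chainer
import chainer.functions as F
import chainer.links as L


class ResBlock(chainer.Chain):
    """y = ReLU(F(x) + x), with F = Conv-BN-ReLU-Conv-BN (original ResNet ordering)."""

    def __init__(self, channels):
        super(ResBlock, self).__init__(
            conv1=L.Convolution2D(channels, channels, 3, stride=1, pad=1),
            bn1=L.BatchNormalization(channels),
            conv2=L.Convolution2D(channels, channels, 3, stride=1, pad=1),
            bn2=L.BatchNormalization(channels),
        )

    def __call__(self, x):
        h = F.relu(self.bn1(self.conv1(x)))
        h = self.bn2(self.conv2(h))
        return F.relu(h + x)   # add the identity shortcut, then ReLU
```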

10. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size SqueezeNet is a very cost-effective model. It proposes a module called the fire module and builds the network out of it. [Figure: the fire module] The idea of the fire module is simple: a 1x1 conv first reduces the number of channels (squeezing the input of the next stage), its output is fed to both a 3x3 conv and a 1x1 conv, and their outputs are concatenated (a small sketch of the fire module follows after 11 below). If BN is added to the network in the paper, about 0.7M weights are enough to reach 92.6% on CIFAR-10. If you are solving 2-100 class image recognition tasks in practice, this is usually plenty.

11. Identity Mappings in Deep Residual Networks A paper that tries several kinds of Res block and shows that BN-ReLU-Conv-BN-ReLU-Conv works best. Nowadays "Residual Networks" generally refers to this version. [Figure: Res block variants compared in the paper]
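
Here is that minimal Chainer-style sketch of a fire module (channel counts are illustrative, not taken from the paper):

```python
import chainer
import chainer.functions as F
import chainer.links as L


class FireModule(chainer.Chain):
    """Squeeze with a 1x1 conv, then expand with parallel 1x1 and 3x3 convs and concatenate."""

    def __init__(self, in_ch, squeeze_ch, expand1_ch, expand3_ch):
        super(FireModule, self).__init__(
            squeeze=L.Convolution2D(in_ch, squeeze_ch, 1),
            expand1=L.Convolution2D(squeeze_ch, expand1_ch, 1),
            expand3=L.Convolution2D(squeeze_ch, expand3_ch, 3, pad=1),
        )

    def __call__(self, x):
        h = F.relu(self.squeeze(x))
        # concatenate the 1x1 and 3x3 expansions along the channel axis
        return F.concat((F.relu(self.expand1(h)), F.relu(self.expand3(h))), axis=1)


# e.g. 96 input channels -> squeeze to 16 -> expand to 64 + 64 = 128 output channels
fire = FireModule(96, 16, 64, 64)
```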

13. Deep Networks with Stochastic Depth This proposes stochastic depth, a regularization method that randomly drops entire Res blocks during training. It has already been adopted in several papers' models and seems to have some effect; however, 24 reports that plain stochastic depth did not help, so the verdict on stochastic depth is still tentative and worth watching. Decreasing the survival probability of the Res blocks linearly from the lower layers to the upper layers works best: in the paper, the bottom block is given a survival probability of about 1 (never dropped) and the top block about 0.5 (dropped with 50% probability), with linearly interpolated values for the blocks in between, and this gives the best result. [Figure: linearly decaying survival probability over the Res blocks]
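
A small sketch of the linear survival-probability schedule and of how a Res block would use it during training (plain NumPy, names my own; `f` stands for the residual branch):

```python
import numpy as np

def survival_probs(num_blocks, p_last=0.5):
    """Linear decay from 1.0 at the first Res block to p_last at the last one."""
    return [1.0 - (l / float(num_blocks - 1)) * (1.0 - p_last) for l in range(num_blocks)]

def stochastic_res_block(x, f, p, train=True):
    """Apply one Res block with survival probability p.
    f is the residual branch (conv-BN-ReLU stack); x is the identity path."""
    if train:
        if np.random.rand() < p:
            return f(x) + x       # block survives
        return x                  # block is dropped: identity only
    return p * f(x) + x           # at test time, scale the residual by p

print(survival_probs(5))  # [1.0, 0.875, 0.75, 0.625, 0.5]
```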

14. Swapout: Learning an ensemble of deep architectures This is a method that drops out the residual branch and the identity branch of the Res block independently. As a formula, $y = p_{1}F(x) + p_{2}x$, where $y$ is the output of the Res block, $x$ is the input, $F$ is the nonlinear function, and $p_{1}, p_{2}$ are Bernoulli variables taking the value 0 or 1. The figure in the paper makes the meaning intuitive. [Figure: swapout and related units] E in the figure is swapout, and as you can see, the output can be 0, $x$, $F(x)$, or $F(x) + x$; the formula above expresses exactly that. Swapout is the network that applies this to all Res blocks. If $p$ outputs 1 with probability $T$ and 0 with probability $1-T$, the pattern of using a large $T$ in the lower layers and a smaller $T$ in the upper layers works well.
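
A tiny sketch of the training-time forward rule above, sampling the two Bernoulli gates independently (the paper samples the gates per unit; here one value per block is drawn for simplicity, and the names are my own):

```python
import numpy as np

def swapout_block(x, f, theta1=0.8, theta2=0.8):
    """y = p1 * F(x) + p2 * x with independent Bernoulli gates p1, p2.
    Depending on the draw, the block outputs 0, x, F(x), or F(x) + x."""
    p1 = np.random.rand() < theta1
    p2 = np.random.rand() < theta2
    return p1 * f(x) + p2 * x
```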

15. Wide Residual Networks The paper shows that increasing the width of the Res blocks instead of making the network deeper improves accuracy. The whole network looks like the figure in the paper. [Figure: Wide ResNet architecture]

The network configuration is determined by the number of blocks N and the width parameter k; N=4 with k=10 works best in the paper. Of course this alone would not make a paper, so the authors also verify which Res block gives the best accuracy. The conclusion is that using two 3x3 convs inside the Res block is best; interestingly, stacking 1, 3, or 4 of them instead of 2 degrades performance. It does not help on CIFAR-10, but for tasks such as CIFAR-100, inserting dropout between the convs in the Res block seems to improve accuracy. Making the network wide makes training fast (8 times the training speed of ResNet-1001), even when using up to about 5 times as many parameters as a normal ResNet. Training really is fast with the model I actually built: on CIFAR-10, it reaches about 60% accuracy at the 1st epoch.
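
For reference, a small sketch of how N and k determine the per-group channel widths in a WRN-28-10-style network (the 16/32/64 base widths follow the CIFAR ResNet convention; the exact bookkeeping is my own):

```python
def wrn_config(N=4, k=10):
    """Return the output channel width of each residual group and the total depth.
    Each group has N two-conv Res blocks; depth = 6N + 4 (initial conv + 3 groups + fc)."""
    widths = [16 * k, 32 * k, 64 * k]   # base widths 16/32/64 multiplied by k
    depth = 6 * N + 4
    return widths, depth

print(wrn_config(4, 10))   # ([160, 320, 640], 28)  -> WRN-28-10
```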

16. FractalNet: Ultra-Deep Neural Networks without Residuals FractalNet has no residual connections; its characteristic features are join layers that take an average and a fractal network structure. The "without Residuals" in the title refers to taking this average instead. The fractal structure is easiest to see in the figure. [Figure: the fractal expansion rule] The network is built by repeatedly applying the fractal expansion rule (see the small sketch after 17 below). In this paper, the join layer takes the mean of the outputs of the two conv paths. I am very curious what happens to accuracy if the join keeps more than just the average (in the paper, the join is only an average). The title says without residuals, but I feel that averaging is essentially the same as a residual connection; the authors claim it is different, but personally I have my doubts. ~~What I found interesting when training the network was that learning stagnated for the first 20 epochs (as described in the paper), a behavior I have rarely seen.~~ When I increased the learning rate, it learned from the 1st epoch (tears).

17. Weighted Residuals for Very Deep Networks With H a nonlinear function, x the input, $\alpha$ a learned parameter, and y the output of a Res block, the main idea of this paper is to define the Res block as $y = \alpha H(x) + x$. [Figure: the weighted residual block] The interesting part of this paper is the value of $\alpha$ learned by backpropagation. [Figure: learned $\alpha$ per layer] Looking at the figure, $\alpha$ tends to be large in the upper layers, which suggests the nonlinearity is used actively there because $H(x)$ is added with a large weight. However, the output values of $H(x)$ themselves might simply be very small, and the paper does not verify that point. [Figure: how $\alpha$ changed during training] The figure showing how $\alpha$ changed is also interesting: it ends up symmetric. I do not know why this happens, but it is very intriguing.
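
Going back to 16 for a moment, the fractal expansion rule is easy to state as a recursion: the next level joins (averages) a single conv with two stacked copies of the previous level. A small sketch with plain functions standing in for conv layers (names are mine):

```python
def expand(f_prev, conv, join):
    """One application of the fractal expansion rule:
    f_next(x) = join(conv(x), f_prev(f_prev(x)))."""
    return lambda x: join(conv(x), f_prev(f_prev(x)))

def fractal(levels, conv, join):
    """Build a fractal column with the given number of expansion levels."""
    f = conv
    for _ in range(levels - 1):
        f = expand(f, conv, join)
    return f

# toy usage with numbers instead of feature maps: conv adds 1, join averages
f3 = fractal(3, conv=lambda x: x + 1, join=lambda a, b: 0.5 * (a + b))
print(f3(0))   # prints 2.0
```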

18. Residual Networks of Residual Networks: Multilevel Residual Networks A simple idea: add skip connections that span other layers as well. The figure makes it easy to see; that is the whole idea. [Figure: multilevel skip connections]

19. Densely Connected Convolutional Networks DenseNet is characterized by concatenating the outputs inside a block to the inputs of the following layers; each output recursively becomes part of the next input, as in the figure. [Figure: a dense block] It seems to be doing essentially the same thing as a Residual Network, except that it recursively concatenates the outputs instead of adding a residual.
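
A minimal Chainer-style sketch of a dense block in which each layer receives the concatenation of the block input and all previous outputs ("growth rate" follows the paper's terminology; the rest of the bookkeeping is my own):

```python
import chainer
import chainer.functions as F
import chainer.links as L


class DenseBlock(chainer.ChainList):
    """Each 3x3 conv sees the concatenation of the block input and all previous outputs."""

    def __init__(self, in_ch, growth_rate, n_layers):
        convs = [L.Convolution2D(in_ch + i * growth_rate, growth_rate, 3, pad=1)
                 for i in range(n_layers)]
        super(DenseBlock, self).__init__(*convs)

    def __call__(self, x):
        h = x
        for conv in self:
            h = F.concat((h, F.relu(conv(h))), axis=1)   # append the new feature maps
        return h
```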

20. Xception: Deep Learning with Depthwise Separable Convolutions This is a network built by replacing the convs in the Res block with separable convs. A separable conv applies a channel-wise conv followed by a 1x1 conv, which reduces the number of weights. A channel-wise conv is a conv that does not look at correlations between channels: each channel is convolved with its own filter, and the results are output without being summed across channels. As the paper claims, a VGG-style network built this way was observed to be comparable to Inception-v4 even without a residual structure; however, the paper also verifies that convergence is faster with the residual structure. So the residual structure should probably just be put in without arguing.
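
A minimal Chainer-style sketch of a depthwise separable conv. The Chainer version I used had no dedicated depthwise (grouped) conv layer, so here the channel-wise part is written as one small conv per channel; this is inefficient but shows the idea (names and channel handling are my own):

```python
import chainer
import chainer.functions as F
import chainer.links as L


class SeparableConv2D(chainer.Chain):
    """Channel-wise 3x3 conv (one filter per input channel, no summation across
    channels) followed by a 1x1 conv that mixes channels."""

    def __init__(self, in_ch, out_ch):
        depthwise = chainer.ChainList(
            *[L.Convolution2D(1, 1, 3, pad=1) for _ in range(in_ch)])
        super(SeparableConv2D, self).__init__(
            depthwise=depthwise,
            pointwise=L.Convolution2D(in_ch, out_ch, 1),
        )
        self.in_ch = in_ch

    def __call__(self, x):
        # apply each 3x3 filter to its own channel, then re-concatenate
        xs = F.split_axis(x, self.in_ch, axis=1)
        h = F.concat([conv(xi) for conv, xi in zip(self.depthwise, xs)], axis=1)
        return self.pointwise(h)
```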

21. Deep Pyramidal Residual Networks The simple idea is to increase the number of output channels of the Res blocks gradually. In ordinary Residual Networks, the number of channels is doubled only at the blocks that downsample with a stride-2 conv; a pyramidal network instead increases the number of channels a little at every Res block, like (d) in the figure, rather than increasing it abruptly just before downsampling. [Figure: channel growth patterns (a)-(d)]
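
For illustration, a small sketch of the additive widening schedule as I understand it: with widening factor $\alpha$ and $N$ Res blocks in total, block $k$ has roughly $16 + \alpha k / N$ channels, so the width grows linearly from 16 up to about $16 + \alpha$ (the helper and the rounding are my own):

```python
def pyramid_widths(alpha=270, n_blocks=18, base=16):
    """Additive widening: the channel count grows linearly from `base`
    to roughly `base + alpha` over the n_blocks Res blocks."""
    return [base + int(alpha * (k + 1) / float(n_blocks)) for k in range(n_blocks)]

widths = pyramid_widths()
print(widths[:3], widths[-3:])   # starts near 16 + 15, ends at 16 + 270
```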

22. Neural Architecture Search with Reinforcement Learning This is the Google paper that generates networks using 800 GPUs and an RNN. The RNN that emits candidate CNN architectures is trained with policy gradients, using the accuracy on validation data as the reward. What is surprising is that the result on CIFAR-10 is about as accurate as SoTA. The reasons it works seem to be that the search space is simplified (filter sizes are only 3, 5, 7, and so on) and that the generated networks are simply evaluated with SGD. The generated network itself is very interesting; the following is the generated lightweight network. [Figure: the generated network; the points where arrows meet are connected] There is no residual structure, and the configuration is similar to the DenseNet of 19. Because the data determines the network structure, the connectivity is so complex that humans cannot interpret it. Interestingly, removing or adding arrows reduced the test accuracy. I feel that the more freedom you give to network generation, the more accurate it can become; but if the degree of freedom is too high, networks apparently cannot be generated properly, so hardware for tuning and experimentation is essential. There are probably only a handful of environments in the world where this experiment can be verified...

23. Aggregated Residual Transformations for Deep Neural Networks [Figure: the ResNeXt block] In the network called ResNeXt, the Res block on the left of the figure is replaced with the one on the right. The Res block on the right is equivalent to the forms (a), (b), (c) in the second figure; since Chainer has no grouped conv, I implemented form (b). [Figure: equivalent forms (a), (b), (c)] In the paper, the number of branches inside the Res block is called the cardinality, and the claim is that, at the same parameter count, increasing the cardinality together with the block width appropriately is more accurate than a Wide Residual Network that simply increases the block width.
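
A minimal Chainer-style sketch in the spirit of form (b): each of the `cardinality` branches is a 1x1-then-3x3 bottleneck, the branch outputs are concatenated, and a final 1x1 conv maps back to the block width before the identity is added (channel numbers are illustrative; the real block also uses BN, omitted here for brevity):

```python
import chainer
import chainer.functions as F
import chainer.links as L


class ResNeXtBlockB(chainer.Chain):
    """Form (b): concatenate the per-branch bottlenecks, then one 1x1 conv, plus identity."""

    def __init__(self, channels, cardinality=32, branch_width=4):
        branches = chainer.ChainList(*[
            chainer.ChainList(
                L.Convolution2D(channels, branch_width, 1),
                L.Convolution2D(branch_width, branch_width, 3, pad=1))
            for _ in range(cardinality)])
        super(ResNeXtBlockB, self).__init__(
            branches=branches,
            fuse=L.Convolution2D(cardinality * branch_width, channels, 1),
        )

    def __call__(self, x):
        hs = []
        for reduce_conv, conv3 in self.branches:
            hs.append(F.relu(conv3(F.relu(reduce_conv(x)))))
        h = self.fuse(F.concat(hs, axis=1))
        return F.relu(h + x)
```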

24. Deep Pyramidal Residual Networks with Separated Stochastic Depth 24 is the pyramidal net of 21 with separated stochastic depth applied. Separated stochastic depth is the idea of applying stochastic depth independently to the part of the output where the channels were added. It looks like the figure in the paper. [Figure: separated stochastic depth]
