Start studying: Saturday, December 7th
Teaching materials, etc.:
・Miyuki Oshige, "Details! Python 3 Introductory Note" (Sotec, 2017): read 12/7 (Sat) – 12/19 (Thu)
・Progate Python course (5 courses in total): finished 12/19 (Thu) – 12/21 (Sat)
・Andreas C. Müller & Sarah Guido, "Introduction to Machine Learning with Python" (O'Reilly Japan, 2017): 12/21 (Sat) – 12/23 (Mon)
・Kaggle "Real or Not? NLP with Disaster Tweets": submission and tuning, 12/28 (Sat) – 1/3 (Fri)
・Wes McKinney, "Python for Data Analysis" (O'Reilly Japan, 2018): read 1/4 (Sat) – 1/13 (Mon)
・Yasuki Saito, "Deep Learning from Scratch" (O'Reilly Japan, 2016): 1/15 (Wed) – 1/20 (Mon)
Finished reading on 1/20 (Monday).
・A deep neural network ("deep learning") is essentially the CNN from Chapter 7 with more layers.
・For small-scale problems such as the MNIST dataset, deepening the layers brings little benefit, but for larger-scale general object recognition it contributes greatly to recognition accuracy.
・Ensembles and Data Augmentation also help improve recognition accuracy. Data Augmentation artificially inflates the training set by applying rotations and small vertical/horizontal shifts to the input images.
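As a minimal sketch of Data Augmentation (a toy 4×4 "image" and two of the perturbations mentioned above, flips and small shifts; this is an illustration, not the book's code):

```python
import numpy as np

def augment(img, rng):
    """Return a randomly perturbed copy of a (H, W) image:
    a random horizontal flip plus a shift of up to 2 pixels."""
    out = img.copy()
    if rng.random() < 0.5:                    # random horizontal flip
        out = out[:, ::-1]
    dy, dx = rng.integers(-2, 3, size=2)      # small vertical/horizontal shift
    out = np.roll(out, (int(dy), int(dx)), axis=(0, 1))
    return out

rng = np.random.default_rng(0)
img = np.arange(16.0).reshape(4, 4)
batch = np.stack([augment(img, rng) for _ in range(8)])  # 8 new variants
print(batch.shape)  # (8, 4, 4)
```

Each call yields a slightly different image, so one training example becomes many.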
・Advantage of deepening the layers (1): the number of parameters can be reduced. By stacking small filters, a deep network covers the same wide receptive field (the local spatial region that influences a neuron) as one large filter, with fewer weights.
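A quick arithmetic check of the claim (the channel count C is illustrative, and biases are ignored): one 5×5 convolution and two stacked 3×3 convolutions both cover a 5×5 receptive field, but the stacked version needs fewer weights.

```python
# Parameter count for C input and C output channels, biases ignored.
C = 64
one_5x5 = C * C * 5 * 5        # a single layer of 5x5 filters
two_3x3 = 2 * C * C * 3 * 3    # two stacked layers of 3x3 filters
print(one_5x5, two_3x3)        # 102400 73728
```

The saving grows as more small layers are stacked to cover even larger receptive fields.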
・Advantage of deepening the layers (2): learning becomes more efficient. A deep network decomposes the problem hierarchically into simpler subproblems: the first layer learns to respond to edges, and the second and later layers can efficiently learn progressively more advanced patterns on top of what the previous layers extracted.
・These advantages became attainable because the techniques and environment (big data, GPUs, etc.) needed to train networks correctly even as they get deeper are now available.
・**VGG**: a basic CNN built from convolution and pooling layers. It repeats a pattern of several consecutive convolution layers with small filters followed by a pooling layer that halves the spatial size, and finally outputs the result through fully connected layers. Its simple, versatile structure makes it a popular base network among practitioners.
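The "convolve, then halve with pooling" pattern can be traced with a small size calculator (the layer stack below is illustrative, not the exact VGG-16 configuration; 'conv' is assumed to be 3×3 with stride 1 and padding 1, 'pool' a 2×2 max pool with stride 2):

```python
def track_sizes(size, layers):
    """Trace the spatial size of a square feature map through a VGG-style stack."""
    sizes = [size]
    for layer in layers:
        if layer == 'conv':
            size = size + 2 * 1 - 3 + 1   # (H + 2*pad - filter)/stride + 1 -> unchanged
        else:                             # 'pool'
            size = size // 2              # 2x2 pooling halves the size
        sizes.append(size)
    return sizes

stack = ['conv', 'conv', 'pool', 'conv', 'conv', 'pool']
print(track_sizes(224, stack))  # [224, 224, 224, 112, 112, 112, 56]
```

Padding keeps the convolutions size-preserving, so only the pooling layers shrink the map.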
・**GoogLeNet**: a network with depth not only in the vertical direction but also breadth in the horizontal direction. Multiple filters and pooling operations of different sizes are applied in parallel, and their results are combined; this so-called inception structure is used as a single building block.
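The "combine the results" step is a channel-wise concatenation. In this toy sketch the parallel branches are stand-ins (random arrays rather than real convolutions); only the merge is the point:

```python
import numpy as np

rng = np.random.default_rng(0)
H = W = 28
# Outputs of parallel branches over the same spatial grid (channels, H, W).
branch_1x1  = rng.standard_normal((16, H, W))   # e.g. from 1x1 convolutions
branch_3x3  = rng.standard_normal((24, H, W))   # e.g. from 3x3 convolutions
branch_5x5  = rng.standard_normal((8, H, W))    # e.g. from 5x5 convolutions
branch_pool = rng.standard_normal((8, H, W))    # e.g. from 3x3 pooling

# The inception block's output stacks all branches along the channel axis.
block_out = np.concatenate(
    [branch_1x1, branch_3x3, branch_5x5, branch_pool], axis=0)
print(block_out.shape)  # (56, 28, 28)
```

Because every branch preserves the spatial size, the concatenation is always well-defined and the block can be repeated as a unit.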
・**ResNet**: a network with a skip structure (also called a bypass or shortcut). The input to a block is added to the output of its convolution layers, which makes efficient learning possible even in very deep networks. Because the input data flows through unchanged, the gradient does not shrink as it propagates backward, a meaningful gradient can be expected to reach earlier layers, and the vanishing-gradient problem caused by deepening is mitigated.
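The skip connection itself is just an addition. A minimal sketch (the transformation `f` here is a placeholder for a block's convolution layers):

```python
import numpy as np

def residual_block(x, f):
    """Skip connection: the block outputs f(x) + x,
    so the input x also flows through unchanged."""
    return f(x) + x

x = np.array([1.0, 2.0, 3.0])
y = residual_block(x, lambda v: 0.1 * v)   # toy stand-in for the conv layers
print(y)  # [1.1 2.2 3.3]
```

Differentiating f(x) + x gives f'(x) + 1, so even if f's gradient is tiny, at least the identity's gradient of 1 flows back through the shortcut.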
・Deep learning requires an enormous number of product-sum (multiply-accumulate) operations, i.e. matrix calculations, and GPUs are a great help here. GPUs were originally dedicated graphics boards, but they also excel at massive parallel computation, so they have had a major influence on the development of deep learning. (CPUs, by contrast, are better at sequential, complicated computation.)
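To make "product-sum calculation" concrete: a fully connected layer's forward pass is exactly this kind of operation, and it reduces to one matrix multiplication (the sizes below are arbitrary illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((32, 100))   # a batch of 32 inputs, 100 features each
W = rng.standard_normal((100, 10))   # weights mapping 100 features to 10 outputs

# Each output element is sum_j x[i, j] * W[j, k]: a product-sum.
# One matmul performs all 32 * 10 * 100 multiply-adds, and it is this
# uniform, parallel workload that GPUs accelerate so well.
out = x @ W
print(out.shape)  # (32, 10)
```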
・Practical example 1: object detection. The problem of classifying the objects in an image while also identifying their positions.
・Practical example 2: segmentation. The problem of classifying an image at the pixel level: in the output, every pixel is assigned a class.
・Practical example 3: image caption generation. A well-known model is NIC (Neural Image Caption), which consists of a deep CNN and an RNN (Recurrent Neural Network) for handling natural language. The CNN extracts a feature vector from the image and passes it to the RNN, which uses it as the initial state to generate the caption text recursively. Processing that combines multiple kinds of information in this way is called multimodal processing, and it has attracted attention in recent years.
・Practical example 4: autonomous driving. Among its component technologies, deep learning contributes most to recognizing the surroundings, which is said to be especially important. Once the driving environment can be recognized robustly in any situation, practical autonomous driving comes closer to reality.
・Practical example 5: Deep Q-Network (reinforcement learning). The basic framework is that an agent selects actions according to the state of the environment, and the environment changes in response to those actions. From those changes the agent receives rewards (along with observations), which it uses to improve its future action policy. AlphaGo is one such example.
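The agent/environment loop above can be sketched with tabular Q-learning (a DQN replaces the Q table with a deep network; the toy environment here, a 1-D walk with a reward at the right end, is entirely made up for illustration):

```python
import numpy as np

n_states, n_actions = 5, 2            # actions: 0 = left, 1 = right
Q = np.zeros((n_states, n_actions))   # action-value table (a deep net in DQN)
alpha, gamma, eps = 0.5, 0.9, 0.3
rng = np.random.default_rng(0)

def step(s, a):
    """Environment: the state changes with the action; reward 1 at the goal."""
    s2 = max(0, s - 1) if a == 0 else min(n_states - 1, s + 1)
    return s2, float(s2 == n_states - 1)

for _ in range(300):                          # episodes
    s = 0
    for _ in range(100):                      # step cap per episode
        if rng.random() < eps:                # explore
            a = int(rng.integers(n_actions))
        else:                                 # exploit, random tie-breaking
            a = int(rng.choice(np.flatnonzero(Q[s] == Q[s].max())))
        s2, r = step(s, a)
        # the agent updates its action-value estimate from the reward
        Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])
        s = s2
        if s == n_states - 1:                 # goal reached, episode ends
            break

print(np.argmax(Q, axis=1))   # greedy action per state after learning
```

After training, the greedy policy in every non-terminal state is "right", the action that leads toward the reward.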