[PYTHON] Deep Learning Specialization (Coursera) Self-study record (C2W3)

Introduction

This is the content of Course 2, Week 3 (C2W3) of the Deep Learning Specialization.

(C2W3L01) Tuning process

Contents

- Explanation of how to tune hyperparameters
- The rough order of hyperparameter importance:
  - Most important: $\alpha$
  - Second most important: $\beta$ ($\sim 0.9$), #hidden units, mini-batch size
  - Third most important: #layers, learning rate decay
  - Do not tune: the Adam optimization algorithm's $\beta_1$, $\beta_2$, $\epsilon$

- When trying hyperparameter values, sample at random; don't use a grid
- Coarse to fine: if you find a value that looks good, search more finely in its vicinity

(C2W3L02) Using an appropriate scale to pick hyperparameters

Contents

- Hyperparameters whose values can be sampled randomly on a linear scale
  - $n^{[l]}$
  - #layers $L$
- Hyperparameters whose values should be sampled randomly on a log scale
  - $\alpha$ ; 0.0001 ~ 1
  - $\beta$ ; 0.9 ~ 0.999

r = -4 \ast \textrm{np.random.rand()} \\
\alpha = 10^r \\
r = -2 \ast \textrm{np.random.rand()} - 1 \\
1-\beta = 10^r\\
\beta = 1-10^r
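
A minimal NumPy sketch of this log-scale sampling (the variable names are mine):

```python
import numpy as np

# Learning rate alpha: sample uniformly on a log scale over [10^-4, 1].
r = -4 * np.random.rand()      # r in (-4, 0]
alpha = 10 ** r                # alpha in (1e-4, 1]

# Momentum beta: sample 1 - beta on a log scale over [10^-3, 10^-1],
# so beta lands between 0.9 and 0.999.
r = -2 * np.random.rand() - 1  # r in (-3, -1]
beta = 1 - 10 ** r
```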

(C2W3L03) Hyperparameter Tuning in Practice : Panda vs. Caviar

Contents

- Babysitting one model: focus on a single model and tune it as it trains (Panda)
- Training many models in parallel (Caviar)

- When you have enough computational resources: Caviar
- Large amount of data and a large model: Panda

(C2W3L04) Normalizing Activations in a Network

Contents

- Applying the normalization used on the input data to the hidden layers as well speeds up the learning of $W$ and $b$
- For $z^{(i)}$ of hidden layer $l$:

\mu = \frac{1}{m}\sum_{i} z^{(i)} \\
\sigma^2 = \frac{1}{m} \sum_{i} \left( z^{(i)} - \mu \right)^2 \\
z^{(i)}_{norm} = \frac{z^{(i)} - \mu}{\sqrt{\sigma^2 + \epsilon}} \\
\tilde{z}^{(i)} = \gamma z^{(i)}_{norm} + \beta

- $\gamma$ and $\beta$ are learnable parameters
- Through $\gamma$ and $\beta$, the mean and variance of $\tilde{z}^{(i)}$ can be set to whatever values suit the network (sketched below)
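
A minimal NumPy sketch of the transformation above (the function name and shapes are my own; $\epsilon$ guards against division by zero):

```python
import numpy as np

def batch_norm_forward(z, gamma, beta, eps=1e-8):
    """Normalize z over the mini-batch, then scale and shift, as in the formulas above.

    z has shape (units, m); gamma and beta have shape (units, 1)."""
    mu = np.mean(z, axis=1, keepdims=True)   # per-unit mean over the mini-batch
    var = np.var(z, axis=1, keepdims=True)   # per-unit variance
    z_norm = (z - mu) / np.sqrt(var + eps)   # zero mean, unit variance
    return gamma * z_norm + beta             # learnable scale (gamma) and shift (beta)
```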

(C2W3L05) Fitting Batch Norm into a Neural Network

Contents

- Explanation of how the forward computation changes with Batch Norm
- Without Batch Norm: $z^{[l]} \rightarrow a^{[l]}$; with Batch Norm: $z^{[l]} \rightarrow \tilde{z}^{[l]} \rightarrow a^{[l]}$ (sketched below)
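
A sketch of one layer's forward pass with Batch Norm inserted (names are my own; I use ReLU as the activation, and drop the bias $b^{[l]}$ since the mean subtraction cancels it and $\beta$ plays its role):

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def layer_forward_with_bn(a_prev, W, gamma, beta, eps=1e-8):
    """One layer's forward pass: z -> z_tilde -> a."""
    z = W @ a_prev                           # linear step (b omitted; beta absorbs it)
    mu = np.mean(z, axis=1, keepdims=True)
    var = np.var(z, axis=1, keepdims=True)
    z_tilde = gamma * (z - mu) / np.sqrt(var + eps) + beta
    return relu(z_tilde)                     # activation step
```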

Impressions

- So many calculation procedures were introduced that, honestly, I could not digest them all (the content is not difficult, but there is a lot of volume)

(C2W3L06) Why Does Batch Norm Work?

Contents

- Intuitive explanation of why Batch Norm works
- Because it uses statistics from only the current mini-batch, Batch Norm mixes some noise into the activations, which has a slight regularization effect (like dropout)

Impressions

- Honestly, I didn't understand this one

(C2W3L07) Batch Norm at Test Time

Contents

- During training, $\mu$ and $\sigma^2$ are calculated on each mini-batch
- At test time the number of examples may be small, so use an exponentially weighted average of $\mu$ and $\sigma^2$ kept across mini-batches instead (sketched below)
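
A sketch of keeping those running averages during training (names and the momentum value are my own choices):

```python
def update_running_stats(mu_run, var_run, mu_batch, var_batch, momentum=0.9):
    """Exponentially weighted average of per-mini-batch statistics.

    At test time, mu_run and var_run are used in place of mu and sigma^2."""
    mu_run = momentum * mu_run + (1.0 - momentum) * mu_batch
    var_run = momentum * var_run + (1.0 - momentum) * var_batch
    return mu_run, var_run
```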

(C2W3L08) Softmax regression

Contents

- Softmax regression handles the multi-class classification problem

z^{[L]} = W^{[L]} a^{[L-1]} + b^{[L]} \\
t = e^{z^{[L]}} \ \textrm{(element-wise)}\\
a^{[L]}_i = \frac{t_i}{\sum^C_{j=1}t_j}
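
A minimal NumPy sketch of these formulas (subtracting the per-column max is a standard numerical-stability tweak I added, not from the lecture):

```python
import numpy as np

def softmax(z):
    """Softmax activation a^{[L]} for output z^{[L]} (columns are examples)."""
    t = np.exp(z - np.max(z, axis=0, keepdims=True))  # element-wise exponential
    return t / np.sum(t, axis=0, keepdims=True)       # each column sums to 1
```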

- With no hidden layer, softmax regression divides the input space with straight-line (linear) decision boundaries
- With a more complex neural network, the decision boundaries become correspondingly more complex

(C2W3L09) Training a Softmax Classifier

Contents

L\left( \hat{y}, y \right) = - \sum^{C}_{j=1} y_j \log \hat{y}_j\\
J = \frac{1}{m} \sum^m_{i=1} L\left( \hat{y}^{(i)}, y^{(i)} \right)
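
These are the per-example cross-entropy loss and the overall cost; a minimal NumPy sketch (I assume labels are one-hot with shape $(C, m)$; the eps is my addition to avoid log(0)):

```python
import numpy as np

def softmax_cost(y_hat, y, eps=1e-12):
    """Cross-entropy loss L and cost J from the formulas above."""
    losses = -np.sum(y * np.log(y_hat + eps), axis=0)  # L(y_hat, y) per example
    return np.mean(losses)                             # J: average over the m examples
```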

(C2W3L10) Deep Learning Frameworks

Contents

(C2W3L11) TensorFlow

Contents

- Explanation of how to use TensorFlow, using the example of finding the $w$ that minimizes $J(w) = w^2 - 10w + 25$
- Once you define the cost function, backprop is implemented automatically (a sketch follows)
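
A minimal sketch in TensorFlow 2 style (the lecture uses the older TF 1.x session API; the learning rate and iteration count here are my own choices):

```python
import tensorflow as tf

w = tf.Variable(0.0)
optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)

for _ in range(1000):
    with tf.GradientTape() as tape:
        cost = w ** 2 - 10 * w + 25           # J(w) = (w - 5)^2
    grads = tape.gradient(cost, [w])          # backprop derived automatically
    optimizer.apply_gradients(zip(grads, [w]))

print(w.numpy())  # approaches the minimizer w = 5
```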

