This is a summary of the content of Course 2, Week 3 (C2W3) of the Deep Learning Specialization.
(C2W3L01) Tuning process
- Explanation of how to tune hyperparameters
- The relative importance of the hyperparameters is roughly as follows
  - Most important: learning rate $\alpha$
  - Next: momentum $\beta$, number of hidden units, mini-batch size
  - Then: number of layers, learning rate decay (the Adam parameters $\beta_1$, $\beta_2$, $\epsilon$ are rarely tuned)
- When trying hyperparameter values, sample them at random rather than on a grid
- Coarse to fine: if you find a value that looks good, search more finely in its vicinity
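A minimal sketch of random search followed by a coarse-to-fine zoom, assuming two illustrative hyperparameters (learning rate and number of hidden units); the ranges and the evaluate function are placeholders of my own, not from the lecture.

```python
import numpy as np

def evaluate(alpha, n_hidden):
    # Placeholder for "train the model and return a validation metric"
    return -(np.log10(alpha) + 3) ** 2 - (n_hidden - 64) ** 2 / 1e4

# Coarse stage: sample random combinations instead of a grid
trials = [(10 ** (-4 * np.random.rand()), np.random.randint(16, 257))
          for _ in range(25)]
best_alpha, best_h = max(trials, key=lambda t: evaluate(*t))

# Fine stage: zoom into the neighbourhood of the best coarse point
fine = [(best_alpha * 10 ** np.random.uniform(-0.3, 0.3),
         int(np.clip(best_h + np.random.randint(-16, 17), 16, 256)))
        for _ in range(25)]
best_alpha, best_h = max(fine + [(best_alpha, best_h)], key=lambda t: evaluate(*t))
print(best_alpha, best_h)
```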
(C2W3L02) Using an appropriate scale to pick hyperparameters
- Some hyperparameters can be sampled uniformly on a linear scale, but the learning rate $\alpha$ and the momentum parameter $\beta$ should be sampled on a log scale
- For example, to sample $\alpha$ in $[10^{-4}, 1]$:

r = -4 \ast \textrm{np.random.rand()} \\
\alpha = 10^r

- To sample $\beta$ in $[0.9, 0.999]$, sample $1-\beta$ on a log scale:

r = -2 \ast \textrm{np.random.rand()} - 1 \\
1-\beta = 10^r \\
\beta = 1-10^r
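A minimal sketch of the log-scale sampling above, assuming the same ranges as the formulas ($\alpha \in [10^{-4}, 1]$, $\beta \in [0.9, 0.999]$):

```python
import numpy as np

# Learning rate: uniform in the exponent, so each decade is equally likely
r = -4 * np.random.rand()          # r in [-4, 0]
alpha = 10 ** r                    # alpha in [1e-4, 1]

# Momentum: sample 1 - beta on a log scale instead of beta itself
r = -2 * np.random.rand() - 1      # r in [-3, -1]
beta = 1 - 10 ** r                 # beta in [0.9, 0.999]
print(alpha, beta)
```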
(C2W3L03) Hyperparameter Tuning in Practice: Panda vs. Caviar
- Panda: focus on one model and babysit it, tuning it as it trains
- Caviar: train many models with different hyperparameters in parallel
- With enough computational resources: Caviar
- With a large amount of data and a large model (so you can only afford to train one at a time): Panda
(C2W3L04) Normalizing Activations in a Network
- Applying the same kind of normalization used on the input data to the hidden layers speeds up the learning of $W$ and $b$
- For the $z^{(i)}$ of hidden layer $l$:
\mu = \frac{1}{m}\sum_{i} z^{(i)} \\
\sigma^2 = \frac{1}{m} \sum_{i} \left( z^{(i)} - \mu \right)^2 \\
z^{(i)}_{norm} = \frac{z^{(i)} - \mu}{\sqrt{\sigma^2 + \epsilon}} \\
\tilde{z}^{(i)} = \gamma z^{(i)}_{norm} + \beta
- $\gamma$ and $\beta$ are learnable parameters
- With $\gamma$ and $\beta$, the mean and variance of $\tilde{z}^{(i)}$ can be set to whatever values work best (not necessarily mean 0 and variance 1)
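A minimal NumPy sketch of the forward computation above for one mini-batch; the variable names are my own, not from the lecture.

```python
import numpy as np

def batch_norm_forward(Z, gamma, beta, eps=1e-8):
    """Z: (n_units, m) pre-activations of one layer for a mini-batch."""
    mu = Z.mean(axis=1, keepdims=True)            # per-unit mean over the batch
    sigma2 = Z.var(axis=1, keepdims=True)         # per-unit variance over the batch
    Z_norm = (Z - mu) / np.sqrt(sigma2 + eps)     # zero mean, unit variance
    Z_tilde = gamma * Z_norm + beta               # learnable rescale and shift
    return Z_tilde

Z = np.random.randn(4, 32) * 3 + 1                # toy mini-batch (4 units, 32 examples)
gamma = np.ones((4, 1)); beta = np.zeros((4, 1))
print(batch_norm_forward(Z, gamma, beta).mean(axis=1))   # ≈ 0 for each unit
```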
(C2W3L05) Fitting Batch Norm into a Neural Network
- Explanation of how the computation changes when Batch Norm is added to a network
- Without batch norm: $z^{[l]} \rightarrow a^{[l]}$; with batch norm: $z^{[l]} \rightarrow \tilde{z}^{[l]} \rightarrow a^{[l]}$
- In TensorFlow this is provided as tf.nn.batch_normalization
- Since several different computation steps are introduced, honestly I could not digest it all (the content is not difficult, but there is a lot of it)
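A minimal sketch of how tf.nn.batch_normalization could be applied to a layer's pre-activations; the tensor shapes and variable names are my own assumptions, not the lecture's code.

```python
import tensorflow as tf

z = tf.random.normal((64, 100))          # mini-batch of pre-activations (64 examples, 100 units)
gamma = tf.Variable(tf.ones((100,)))     # learnable scale
beta = tf.Variable(tf.zeros((100,)))     # learnable shift
mean, variance = tf.nn.moments(z, axes=[0])               # batch statistics per unit
z_tilde = tf.nn.batch_normalization(z, mean, variance, beta, gamma, 1e-8)
```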
(C2W3L06) Why Does Batch Norm Work?
- Intuitive explanation of why batch norm works
- Because the mean and variance are estimated on each mini-batch, batch norm adds a little noise to $\tilde{z}^{[l]}$, which gives a slight regularization effect (similar in spirit to dropout)
- Honestly, I did not fully understand this part
(C2W3L07) Batch Norm at Test Time
- During training, $\mu$ and $\sigma^2$ are computed for each mini-batch
- At test time there may be too few examples to compute them, so use an exponentially weighted average of the $\mu$ and $\sigma^2$ values seen across mini-batches during training
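A minimal sketch of keeping such running estimates during training, assuming a decay rate of 0.9 (my choice, not specified in the lecture):

```python
import numpy as np

running_mu, running_sigma2 = 0.0, 1.0     # running estimates used at test time
momentum = 0.9                            # assumed decay rate for the weighted average

for _ in range(100):                      # one iteration per mini-batch
    z = np.random.randn(64) * 2 + 3       # toy pre-activations of one unit
    mu, sigma2 = z.mean(), z.var()        # mini-batch statistics (used during training)
    running_mu = momentum * running_mu + (1 - momentum) * mu
    running_sigma2 = momentum * running_sigma2 + (1 - momentum) * sigma2

print(running_mu, running_sigma2)         # ≈ 3 and ≈ 4; used in place of mu, sigma2 at test time
```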
(C2W3L08) Softmax regression
- Softmax is used for multi-class classification problems (with $C$ classes)
z^{[L]} = W^{[L]} a^{[L-1]} + b^{[L]} \\
t = e^{z^{[L]}} \ \textrm{(element-wise)}\\
a^{[L]}_i = \frac{t_i}{\sum^C_{j=1}t_j}
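A minimal, numerically stable NumPy sketch of this softmax activation (the shift by the maximum is a standard trick, not mentioned in the lecture):

```python
import numpy as np

def softmax(z):
    """z: (C,) or (C, m) pre-activations of the output layer."""
    t = np.exp(z - z.max(axis=0, keepdims=True))   # element-wise exp, shifted for stability
    return t / t.sum(axis=0, keepdims=True)        # normalize so each column sums to 1

z = np.array([5.0, 2.0, -1.0, 3.0])
print(softmax(z))          # ≈ [0.842, 0.042, 0.002, 0.114]
print(softmax(z).sum())    # 1.0
```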
- With no hidden layer, softmax divides the input space into regions separated by straight (linear) boundaries
- With a deeper neural network, the decision boundaries can become much more complicated
(C2W3L09) Training a Softmax Classifier
L\left( \hat{y}, y \right) = - \sum^{C}_{j=1} y_j \log \hat{y}_j\\
J = \frac{1}{m} \sum^m_{i=1} L\left( \hat{y}^{(i)}, y^{(i)} \right)
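With one-hot labels, the loss above reduces to $-\log \hat{y}_{j}$ for the true class $j$; a small NumPy check (the numbers reuse the softmax sketch above):

```python
import numpy as np

y_hat = np.array([0.842, 0.042, 0.002, 0.114])   # softmax output from the previous sketch
y = np.array([1.0, 0.0, 0.0, 0.0])               # one-hot label: true class is class 0

loss = -np.sum(y * np.log(y_hat))                # only the true-class term survives
print(loss, -np.log(y_hat[0]))                   # both ≈ 0.172
```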
(C2W3L10) Deep Learning Frameworks
(C2W3L11) TensorFlow
- Explanation of how to use TensorFlow, using the example of finding the $w$ that minimizes $J(w) = w^2 - 10w + 25$
- Once you define the cost function (forward propagation), back propagation is implemented automatically
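A minimal sketch of this example in TF2 style (the lecture uses the older TF1 session API); the minimum is at $w = 5$:

```python
import tensorflow as tf

w = tf.Variable(0.0, dtype=tf.float32)
optimizer = tf.keras.optimizers.SGD(learning_rate=0.1)

for _ in range(1000):
    with tf.GradientTape() as tape:
        J = w ** 2 - 10 * w + 25      # only the cost (forward pass) is written by hand
    grads = tape.gradient(J, [w])     # back prop is derived automatically
    optimizer.apply_gradients(zip(grads, [w]))

print(w.numpy())                      # ≈ 5.0
```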
- Deep Learning Specialization (Coursera) self-study record (table of contents)