Deep Learning from Scratch, Chapter 5

Chapters 5 through 7 seem to be the important part of this book, so I cover each of them separately and in more detail.

There are two ways to understand backpropagation:
・ Understanding through mathematical formulas
・ Understanding through a computational graph

This book explains the latter.

Computational graph

Computational graph: a graphical representation of a computation process.
Graph: a graph in the data-structure sense, made up of nodes and edges.

The figure shows the computational graph for buying one apple at 100 yen with a consumption tax of 10%.

Forward propagation: propagation from the start to the end of the computational graph.
Backward propagation: propagation in the opposite direction to forward propagation.

Feature of the computational graph: the final result is obtained by propagating "local computations". The figure only involves an apple, but things get complicated once other purchases are added. Whatever the graph as a whole is doing, each node can produce its output using only the information that relates to it (the apple in this example).

Advantage of the computational graph: derivatives can be computed efficiently by propagating in the reverse direction.
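The apple example can be sketched in a few lines of Python. This is only an illustration: the function names `forward` and `backward` are hypothetical, and the graph is reduced to the single multiplication node "price × tax rate".

```python
# Minimal sketch of the apple example (100 yen, 10% tax).
# The function names are hypothetical and not taken from the book.

def forward(apple_price, tax_rate):
    """Forward propagation: price -> price including tax."""
    return apple_price * tax_rate

def backward(apple_price, tax_rate, dout=1.0):
    """Backward propagation through the single multiplication node."""
    d_apple_price = dout * tax_rate   # d(total) / d(apple_price)
    d_tax_rate = dout * apple_price   # d(total) / d(tax_rate)
    return d_apple_price, d_tax_rate

total = forward(100, 1.1)             # about 110 yen
d_apple, d_tax = backward(100, 1.1)
print(total, d_apple, d_tax)
```

Here `d_apple` is 1.1: if the apple price rises by 1 yen, the total rises by 1.1 yen. The backward pass yields this without recomputing the forward pass.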

Backpropagation on the computational graph

Computing the gradient of the loss function with respect to the weight parameters of a neural network by numerical differentiation takes time, so backpropagation is used instead. Backpropagation (error backpropagation): a method for computing the gradients of the weight parameters efficiently.
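To see why numerical differentiation is slow, consider the following sketch of a central-difference gradient. It is an illustrative implementation, not necessarily identical to the book's `numerical_gradient`; the `counter` dictionary is a hypothetical helper that counts how often the loss is evaluated.

```python
import numpy as np

def numerical_gradient(f, x, h=1e-4):
    """Central-difference gradient: two evaluations of f per element of x."""
    grad = np.zeros_like(x)
    for i in range(x.size):
        original = x.flat[i]
        x.flat[i] = original + h
        f_plus = f(x)
        x.flat[i] = original - h
        f_minus = f(x)
        x.flat[i] = original              # restore the parameter
        grad.flat[i] = (f_plus - f_minus) / (2 * h)
    return grad

counter = {"calls": 0}                    # hypothetical call counter

def loss(w):
    counter["calls"] += 1
    return np.sum(w ** 2)                 # toy loss, true gradient is 2w

w = np.array([1.0, -2.0, 3.0])
grad = numerical_gradient(loss, w)
print(grad, counter["calls"])
```

With 3 parameters the loss is evaluated 6 times. A network with tens of thousands of weights would need twice that many full forward passes for a single gradient, whereas backpropagation needs one forward and one backward pass.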

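Backpropagation for a computation y = f(x): the signal arriving from downstream is multiplied by the node's local derivative \frac{\partial y}{\partial x} and passed on to the previous node.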
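As a small illustration of this rule, assume f(x) = x² and an upstream gradient of 0.5. Both choices are arbitrary and only serve as an example.

```python
def square_forward(x):
    """Forward pass of the node y = f(x) = x**2."""
    return x ** 2

def square_backward(x, dout):
    """Backward pass: upstream gradient times the local derivative dy/dx = 2x."""
    return dout * 2 * x

x = 3.0
y = square_forward(x)
upstream = 0.5                  # assumed value of dL/dy coming from the next node
dx = square_backward(x, upstream)
print(y, dx)                    # 9.0 3.0
```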

Addition layer

For z = x + y, the partial derivatives are \\
\frac{\partial z}{\partial x} = 1 \\
\frac{\partial z}{\partial y} = 1

Represented as a computational graph

In code

class AddLayer:
    # Constructor
    def __init__(self):
        self.x = None
        self.y = None
    
    def forward(self, x, y):
        self.x = x
        self.y = y
        out = x+y
        
        return out

    def backward(self, dout):
        dx = dout * 1
        dy = dout * 1
        
        return dx, dy

Multiplication layer

For z = x * y, the partial derivatives are \\
\frac{\partial z}{\partial x} = y \\
\frac{\partial z}{\partial y} = x

Represented as a computational graph

In code

class MulLayer:
    # Constructor
    # self plays the role of `this` in Java
    def __init__(self):
        self.x = None
        self.y = None
    
    def forward(self, x, y):
        self.x = x
        self.y = y
        out = x*y
        
        return out

    def backward(self, dout):
        dx = dout * self.y
        dy = dout * self.x


        return dx, dy
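The two layers can be combined into a small graph. In the sketch below the classes are restated compactly so that the snippet runs on its own. The oranges and the quantities are hypothetical values added for illustration. Only the 100-yen apple and the 10% tax come from the example above.

```python
class AddLayer:
    def forward(self, x, y):
        return x + y

    def backward(self, dout):
        return dout * 1, dout * 1

class MulLayer:
    def __init__(self):
        self.x = None
        self.y = None

    def forward(self, x, y):
        self.x, self.y = x, y
        return x * y

    def backward(self, dout):
        return dout * self.y, dout * self.x

apple, apple_num = 100, 2
orange, orange_num = 150, 3       # hypothetical second item
tax = 1.1

mul_apple, mul_orange = MulLayer(), MulLayer()
add_fruit, mul_tax = AddLayer(), MulLayer()

# Forward propagation: call the layers from inputs to output
apple_price = mul_apple.forward(apple, apple_num)          # 200
orange_price = mul_orange.forward(orange, orange_num)      # 450
all_price = add_fruit.forward(apple_price, orange_price)   # 650
price = mul_tax.forward(all_price, tax)                    # about 715

# Backward propagation: call the layers in the opposite order
dprice = 1
dall_price, dtax = mul_tax.backward(dprice)
dapple_price, dorange_price = add_fruit.backward(dall_price)
dapple, dapple_num = mul_apple.backward(dapple_price)
dorange, dorange_num = mul_orange.backward(dorange_price)

print(price, dapple, dapple_num, dorange, dorange_num, dtax)
```

Each `backward` call only needs the upstream gradient and the inputs that the same layer stored during `forward`, which is the "local computation" property described earlier.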

ReLU layer

y = \left\{
\begin{array}{ll}
x & (x \gt 0) \\
0 & (x \leq 0)
\end{array}
\right.

\frac{\partial y}{\partial x} = \left\{
\begin{array}{ll}
1 & (x \gt 0) \\
0 & (x \leq 0)
\end{array}
\right.

In code

# ReLU layer
class Relu:
    def __init__(self):
        self.mask = None
    
    def forward(self, x):
        self.mask = (x<=0)
        out = x.copy()
        out[self.mask] = 0

        return out

    def backward(self, dout):
        dout[self.mask] = 0
        dx = dout
    
        return dx
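The role of the mask may be easier to see on a concrete array. The following sketch repeats the forward and backward steps outside the class, with arbitrarily chosen input values. Unlike `Relu.backward` above, which overwrites `dout` in place, it works on a copy so that the caller's array stays untouched.

```python
import numpy as np

x = np.array([[1.0, -0.5],
              [-2.0, 3.0]])

mask = (x <= 0)            # True where the input is not positive
out = x.copy()
out[mask] = 0              # forward: non-positive inputs become 0

dout = np.ones_like(x)     # assumed upstream gradient of all ones
dx = dout.copy()           # copy, so dout itself is not modified
dx[mask] = 0               # backward: gradient is blocked where x <= 0

print(mask)
print(out)
print(dx)
```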

Sigmoid layer

Sigmoid function: y=\frac{1}{1+\exp(-x)}

Derivatives of the nodes that appear in the computational graph

\begin{align}
f(x)&=-x \\
\Rightarrow f'(x)&=-1\\
f(x)&=\exp(x) \\
\Rightarrow f'(x)&=\exp(x)\\
f(x)&=x+1 \\
\Rightarrow f'(x)&=1\\
f(x)&=1/x \\
\Rightarrow f'(x)&=-1/x^2=-f(x)^2
\end{align}

Propagating \frac{\partial L}{\partial y} backward through these nodes gives

\begin{align}
\frac{\partial L}{\partial y}y^2\exp(-x) &=\frac{\partial L}{\partial y}y\frac{\exp(-x)}{1+\exp(-x)} \\
&=\frac{\partial L}{\partial y}y(1-y)
\end{align}

In code

import numpy as np

# Sigmoid layer
class Sigmoid:
    def __init__(self):
        self.out = None
        
    def forward(self, x):
        out = 1 / (1 + np.exp(-x))
        self.out = out
        
        return out

    def backward(self, dout):
        dx = dout * (1.0 - self.out) * self.out
        
        return dx
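The result y(1 - y) can be checked numerically. The sketch below compares it with a central difference at a few arbitrarily chosen points. The step size `h` is an assumption.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

x = np.array([-2.0, 0.0, 1.5])
y = sigmoid(x)

analytic = y * (1.0 - y)                                  # derivative from the graph
h = 1e-5
numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)     # central difference

max_diff = np.max(np.abs(analytic - numeric))
print(analytic, numeric, max_diff)
```

The two results should agree up to the truncation and rounding error of the finite difference.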

Affine layer

In code

# Batch version of the Affine layer
class Affine:
    def __init__(self, W, b):
        self.W = W
        self.b = b
        self.x = None
        self.dW = None
        self.db = None
    
    def forward(self, x):
        self.x = x
        out = np.dot(x, self.W) + self.b
        
        return out

    def backward(self, dout):
        dx = np.dot(dout, self.W.T)
        self.dW = np.dot(self.x.T, dout)
        self.db = np.sum(dout, axis=0)
        
        return dx
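A quick way to sanity-check the batch version is to look at the array shapes. The sizes below (batch of 4, 2 inputs, 3 outputs) and the all-ones upstream gradient are arbitrary assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D_in, D_out = 4, 2, 3                    # assumed sizes
x = rng.standard_normal((N, D_in))
W = rng.standard_normal((D_in, D_out))
b = np.zeros(D_out)

out = np.dot(x, W) + b                      # (N, D_out); b is broadcast over the batch
dout = np.ones((N, D_out))                  # assumed upstream gradient

dx = np.dot(dout, W.T)                      # (N, D_in), same shape as x
dW = np.dot(x.T, dout)                      # (D_in, D_out), same shape as W
db = np.sum(dout, axis=0)                   # (D_out,), summed over the batch axis

print(out.shape, dx.shape, dW.shape, db.shape)
```

Because the bias is added to every sample in the forward pass, its gradient is the sum of `dout` over the batch axis, which is why `np.sum(dout, axis=0)` is used.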

Proof (simple case, N = 1)

\begin{align}
\frac{\partial L}{\partial Y} \cdot W^T&=
\bigl(\begin{matrix}
\frac{\partial L}{\partial y_1} &
\frac{\partial L}{\partial y_2} &
\frac{\partial L}{\partial y_3}
\end{matrix}\bigr)
\Biggl(
\begin{matrix}w_{11} & w_{21} \\
w_{12} & w_{22} \\
w_{13} & w_{23}
\end{matrix}\Biggr)\\
&=\bigl(\begin{matrix}
\frac{\partial L}{\partial y_1}w_{11}+
\frac{\partial L}{\partial y_2}w_{12}+
\frac{\partial L}{\partial y_3}w_{13} &
\frac{\partial L}{\partial y_1}w_{21}+
\frac{\partial L}{\partial y_2}w_{22}+
\frac{\partial L}{\partial y_3}w_{23}
\end{matrix}\bigr)\\
&=\bigl(\begin{matrix}
\frac{\partial L}{\partial y_1}\frac{\partial y_1}{\partial x_1}
+\frac{\partial L}{\partial y_2}\frac{\partial y_2}{\partial x_1}
+\frac{\partial L}{\partial y_3}\frac{\partial y_3}{\partial x_1} &
\frac{\partial L}{\partial y_1}\frac{\partial y_1}{\partial x_2}
+\frac{\partial L}{\partial y_2}\frac{\partial y_2}{\partial x_2}
+\frac{\partial L}{\partial y_3}\frac{\partial y_3}{\partial x_2}
\end{matrix}\bigr)\\
&=\bigl(
\begin{matrix}
\frac{\partial L}{\partial Y}\frac{\partial Y}{\partial x_1} &
\frac{\partial L}{\partial Y}\frac{\partial Y}{\partial x_2}
\end{matrix}\bigr)\\
&=\frac{\partial L}{\partial X}\\
X^T \cdot \frac{\partial L}{\partial Y}
&=\Bigl(\begin{matrix}
x_1\\
x_2
\end{matrix}\Bigr)
\cdot
\bigl(\begin{matrix}
\frac{\partial L}{\partial y_1} &
\frac{\partial L}{\partial y_2} &
\frac{\partial L}{\partial y_3}
\end{matrix}\bigr)\\
&=
\Bigl(\begin{matrix}
x_1\frac{\partial L}{\partial y_1} &
x_1\frac{\partial L}{\partial y_2} &
x_1\frac{\partial L}{\partial y_3}\\
x_2\frac{\partial L}{\partial y_1} &
x_2\frac{\partial L}{\partial y_2} &
x_2\frac{\partial L}{\partial y_3}
\end{matrix}\Bigr)\\
&=
\Bigl(\begin{matrix}
\frac{\partial L}{\partial y_1}x_1 &
\frac{\partial L}{\partial y_2}x_1 &
\frac{\partial L}{\partial y_3}x_1\\
\frac{\partial L}{\partial y_1}x_2 &
\frac{\partial L}{\partial y_2}x_2 &
\frac{\partial L}{\partial y_3}x_2
\end{matrix}\Bigr)\\
&=
\Bigl(\begin{matrix}
\frac{\partial L}{\partial y_1}\frac{\partial y_1}{\partial w_{11}} &
\frac{\partial L}{\partial y_2}\frac{\partial y_2}{\partial w_{12}} &
\frac{\partial L}{\partial y_3}\frac{\partial y_3}{\partial w_{13}}\\
\frac{\partial L}{\partial y_1}\frac{\partial y_1}{\partial w_{21}} &
\frac{\partial L}{\partial y_2}\frac{\partial y_2}{\partial w_{22}} &
\frac{\partial L}{\partial y_3}\frac{\partial y_3}{\partial w_{23}}
\end{matrix}\Bigr)\\
&=
\Bigl(\begin{matrix}
\frac{\partial L}{\partial w_{11}} &
\frac{\partial L}{\partial w_{12}} &
\frac{\partial L}{\partial w_{13}}\\
\frac{\partial L}{\partial w_{21}} &
\frac{\partial L}{\partial w_{22}} &
\frac{\partial L}{\partial w_{23}}
\end{matrix}\Bigr)\\
&=\frac{\partial L}{\partial W}
\end{align}

Supplementary calculation

\begin{align}
Y&=X \cdot W+B\\
y_i&=x_1w_{1i}+x_2w_{2i}+b_i\\
\text{Example: }\frac{\partial y_3}{\partial w_{23}}&=x_2
\end{align}
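The two identities can also be checked numerically for N = 1. The sketch below assumes a toy loss L = Σ G·Y with a fixed array G, so that ∂L/∂Y = G. All concrete numbers and the helper `numeric_grad` are hypothetical.

```python
import numpy as np

X = np.array([[1.0, 2.0]])                  # shape (1, 2)
W = np.array([[0.1, 0.2, 0.3],
              [0.4, 0.5, 0.6]])             # shape (2, 3)
B = np.array([0.0, 0.1, 0.2])
G = np.array([[1.0, -1.0, 0.5]])            # assumed dL/dY

def loss():
    """Toy loss whose gradient with respect to Y is exactly G."""
    return np.sum((np.dot(X, W) + B) * G)

# Gradients from the formulas derived above
dX = np.dot(G, W.T)
dW = np.dot(X.T, G)

def numeric_grad(f, A, h=1e-6):
    """Central difference of f() with respect to every element of A."""
    grad = np.zeros_like(A)
    for idx in np.ndindex(A.shape):
        old = A[idx]
        A[idx] = old + h
        f_plus = f()
        A[idx] = old - h
        f_minus = f()
        A[idx] = old                         # restore the element
        grad[idx] = (f_plus - f_minus) / (2 * h)
    return grad

num_dX = numeric_grad(loss, X)
num_dW = numeric_grad(loss, W)
print(np.abs(dX - num_dX).max(), np.abs(dW - num_dW).max())
```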

Softmax-with-Loss layer

\begin{align}
&(y_1, y_2, y_3): \text{output of the Softmax layer}\\
&(t_1, t_2, t_3): \text{teacher labels}\\
&\text{The backward pass yields } (y_1-t_1, y_2-t_2, y_3-t_3) \text{, in other words the difference}\\
&\text{between the output of the Softmax layer and the teacher labels.}
\end{align}

# SoftmaxWithLoss
# softmax and cross_entropy_error are not defined here;
# in the book's sample code they live in common/functions.py
class SoftmaxWithLoss:
    def __init__(self):
        self.loss = None  # loss value
        self.y = None     # output of softmax
        self.t = None     # teacher labels

    def forward(self, x, t):
        self.t = t
        self.y = softmax(x)
        self.loss = cross_entropy_error(self.y, self.t)

        return self.loss

    def backward(self, dout=1):
        batch_size = self.t.shape[0]
        dx = (self.y - self.t) / batch_size

        return dx
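The class relies on `softmax` and `cross_entropy_error`. The following is a minimal sketch of both, assuming 2-D inputs and one-hot teacher labels. The book's own versions may differ in detail. It also checks the backward result (y - t) / batch_size against a central difference on arbitrarily chosen inputs.

```python
import numpy as np

def softmax(x):
    """Row-wise softmax; the maximum is subtracted for numerical stability."""
    x = x - np.max(x, axis=-1, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=-1, keepdims=True)

def cross_entropy_error(y, t):
    """Mean cross-entropy over the batch, assuming one-hot labels t."""
    batch_size = y.shape[0]
    return -np.sum(t * np.log(y + 1e-7)) / batch_size

x = np.array([[0.3, 2.9, 4.0],
              [1.0, 0.5, -1.0]])
t = np.array([[0.0, 0.0, 1.0],
              [1.0, 0.0, 0.0]])

y = softmax(x)
dx = (y - t) / x.shape[0]                   # backward result of the layer

# Numerical check with a central difference on every input element
h = 1e-5
num = np.zeros_like(x)
for idx in np.ndindex(x.shape):
    xp = x.copy()
    xp[idx] += h
    xm = x.copy()
    xm[idx] -= h
    num[idx] = (cross_entropy_error(softmax(xp), t)
                - cross_entropy_error(softmax(xm), t)) / (2 * h)

max_diff = np.max(np.abs(dx - num))
print(dx)
print(max_diff)
```

The small remaining difference comes from the finite step size and from the constant 1e-7 inside the logarithm.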

Implementing backpropagation

Implementing a neural network that supports backpropagation

A neural network can be built simply by stacking the layers above like Lego blocks. I added some comments while going through the source.

# coding: utf-8
import sys, os
sys.path.append(os.pardir)  # allow importing files from the parent directory
import numpy as np
from common.layers import *
from common.gradient import numerical_gradient
from collections import OrderedDict


class TwoLayerNet:

    #-------------------------------------------------
    # __init__: initialization
    #     @self
    #     @input_size: number of neurons in the input layer
    #     @hidden_size: number of neurons in the hidden layer
    #     @output_size: number of neurons in the output layer
    #     @weight_init_std: scale of the Gaussian distribution used for weight initialization
    #-------------------------------------------------
    def __init__(self, input_size, hidden_size, output_size, weight_init_std = 0.01):


        # params: dictionary holding the parameters of the neural network
        # Weight initialization
        self.params = {}
        self.params['W1'] = weight_init_std * np.random.randn(input_size, hidden_size)
        self.params['b1'] = np.zeros(hidden_size)
        self.params['W2'] = weight_init_std * np.random.randn(hidden_size, output_size) 
        self.params['b2'] = np.zeros(output_size)

        # layers: ordered dictionary holding the layers of the neural network
        # Layer creation: the point is that the layers are stored in order
        # As a result, forward propagation just calls the layers in order, and backward propagation calls them in reverse order.
        self.layers = OrderedDict()
        self.layers['Affine1'] = Affine(self.params['W1'], self.params['b1'])
        self.layers['Relu1'] = Relu()
        self.layers['Affine2'] = Affine(self.params['W2'], self.params['b2'])

        # Last layer of the neural network: here a SoftmaxWithLoss layer
        self.lastLayer = SoftmaxWithLoss()
        

    #-------------------------------------------------
    # predict: run recognition (inference)
    #     @self
    #     @x: image data (input data)
    #-------------------------------------------------
    def predict(self, x):
        for layer in self.layers.values():
            x = layer.forward(x)
        
        return x

        
    #-------------------------------------------------
    # loss: compute the value of the loss function
    #     @self
    #     @x: image data (input data)
    #     @t: teacher labels
    #-------------------------------------------------
    def loss(self, x, t):
        y = self.predict(x)
        return self.lastLayer.forward(y, t)
    

    #-------------------------------------------------
    # accuracy: compute the recognition accuracy
    #     @self
    #     @x: image data (input data)
    #     @t: teacher labels
    #-------------------------------------------------
    def accuracy(self, x, t):
        y = self.predict(x)
        y = np.argmax(y, axis=1)
        if t.ndim != 1 : t = np.argmax(t, axis=1)
        
        accuracy = np.sum(y == t) / float(x.shape[0])
        return accuracy
        

    #-------------------------------------------------
    # numerical_gradient: compute the gradients of the weight parameters by numerical differentiation (as done up to Chapter 4)
    #     @self
    #     @x: image data (input data)
    #     @t: teacher labels
    #-------------------------------------------------
    def numerical_gradient(self, x, t):
        loss_W = lambda W: self.loss(x, t)
        
        grads = {}
        grads['W1'] = numerical_gradient(loss_W, self.params['W1'])
        grads['b1'] = numerical_gradient(loss_W, self.params['b1'])
        grads['W2'] = numerical_gradient(loss_W, self.params['W2'])
        grads['b2'] = numerical_gradient(loss_W, self.params['b2'])
        
        return grads


    #-------------------------------------------------
    # gradient: compute the gradients of the weight parameters by backpropagation
    #     @self
    #     @x: image data (input data)
    #     @t: teacher labels
    #-------------------------------------------------
    def gradient(self, x, t):

        # Key point: this actually runs the propagation implemented in each layer

        # forward: forward propagation
        self.loss(x, t)

        # backward: backpropagation
        dout = 1
        dout = self.lastLayer.backward(dout)
        
        layers = list(self.layers.values())
        layers.reverse()
        for layer in layers:
            dout = layer.backward(dout)

        # Collect the gradients stored in each Affine layer
        grads = {}
        grads['W1'], grads['b1'] = self.layers['Affine1'].dW, self.layers['Affine1'].db
        grads['W2'], grads['b2'] = self.layers['Affine2'].dW, self.layers['Affine2'].db

        return grads
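The "Lego block" idea behind `TwoLayerNet` can be shown in isolation. In the sketch below, `Scale` and `Shift` are hypothetical toy layers that stand in for Affine and ReLU. What matters is the pattern: iterate over the ordered dictionary for the forward pass and over its reverse for the backward pass.

```python
from collections import OrderedDict

class Scale:
    """Hypothetical toy layer: multiplies the input by a constant k."""
    def __init__(self, k):
        self.k = k

    def forward(self, x):
        return x * self.k

    def backward(self, dout):
        return dout * self.k

class Shift:
    """Hypothetical toy layer: adds a constant c to the input."""
    def __init__(self, c):
        self.c = c

    def forward(self, x):
        return x + self.c

    def backward(self, dout):
        return dout

layers = OrderedDict()
layers['Scale1'] = Scale(2.0)
layers['Shift1'] = Shift(1.0)
layers['Scale2'] = Scale(3.0)

# Forward propagation: layers are visited in insertion order
x = 5.0
for layer in layers.values():
    x = layer.forward(x)                    # ((5 * 2) + 1) * 3

# Backward propagation: the same layers in reverse order
dout = 1.0
for layer in reversed(list(layers.values())):
    dout = layer.backward(dout)             # 3 * 1 * 2

print(x, dout)                              # 33.0 6.0
```

Adding or removing a layer only changes the dictionary. The two loops stay the same.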

Gradient check for backpropagation

This script simply confirms that the gradients obtained by numerical differentiation and by backpropagation are almost identical.

# coding: utf-8
import sys, os
sys.path.append(os.pardir)  # allow importing files from the parent directory
import numpy as np
from dataset.mnist import load_mnist
from two_layer_net import TwoLayerNet

# Load the data
(x_train, t_train), (x_test, t_test) = load_mnist(normalize=True, one_hot_label=True)

network = TwoLayerNet(input_size=784, hidden_size=50, output_size=10)

x_batch = x_train[:3]
t_batch = t_train[:3]

grad_numerical = network.numerical_gradient(x_batch, t_batch)
grad_backprop = network.gradient(x_batch, t_batch)

for key in grad_numerical.keys():
    diff = np.average( np.abs(grad_backprop[key] - grad_numerical[key]) )
    print(key + ":" + str(diff))

Result in my environment:
W1: 2.61413510374e-13 (about 2.61 × 10^-13)
W2: 1.04099504538e-12 (about 1.04 × 10^-12)
b1: 9.1090807423e-13 (about 9.11 × 10^-13)
b2: 1.20348173094e-10 (about 1.20 × 10^-10)

Training with backpropagation

This script basically runs mini-batch training iteratively, updating the weights and biases at each step.

# coding: utf-8
import sys, os
sys.path.append(os.pardir)

import numpy as np
from dataset.mnist import load_mnist
from two_layer_net import TwoLayerNet

# Load the data
(x_train, t_train), (x_test, t_test) = load_mnist(normalize=True, one_hot_label=True)

network = TwoLayerNet(input_size=784, hidden_size=50, output_size=10)

iters_num = 10000
train_size = x_train.shape[0]
batch_size = 100
learning_rate = 0.1

train_loss_list = []
train_acc_list = []
test_acc_list = []

iter_per_epoch = max(train_size / batch_size, 1)

for i in range(iters_num):
    batch_mask = np.random.choice(train_size, batch_size)
    x_batch = x_train[batch_mask]
    t_batch = t_train[batch_mask]
    
    # Gradient
    #grad = network.numerical_gradient(x_batch, t_batch)
    grad = network.gradient(x_batch, t_batch)
    
    # Update the parameters
    for key in ('W1', 'b1', 'W2', 'b2'):
        network.params[key] -= learning_rate * grad[key]
    
    loss = network.loss(x_batch, t_batch)
    train_loss_list.append(loss)
    
    if i % iter_per_epoch == 0:
        train_acc = network.accuracy(x_train, t_train)
        test_acc = network.accuracy(x_test, t_test)
        train_acc_list.append(train_acc)
        test_acc_list.append(test_acc)
        print(train_acc, test_acc)
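The script above needs the MNIST loader and the `TwoLayerNet` module. To try the same mini-batch update loop without them, here is a self-contained sketch on toy data. It swaps the network for a one-parameter linear model with a mean-squared-error loss, so the data, the model, and all hyperparameters below are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data generated from t = 3x + 1 (noise-free)
x_train = rng.uniform(-1, 1, size=(200, 1))
t_train = 3.0 * x_train + 1.0

params = {'W': np.zeros((1, 1)), 'b': np.zeros(1)}
iters_num = 1000
train_size = x_train.shape[0]
batch_size = 20
learning_rate = 0.1

for i in range(iters_num):
    # Draw a random mini-batch, as in the script above
    batch_mask = rng.choice(train_size, batch_size)
    x_batch = x_train[batch_mask]
    t_batch = t_train[batch_mask]

    # Forward pass of the linear model, then backward pass by hand
    y = np.dot(x_batch, params['W']) + params['b']
    dy = 2 * (y - t_batch) / batch_size      # gradient of the mean squared error
    grads = {'W': np.dot(x_batch.T, dy), 'b': np.sum(dy, axis=0)}

    # Gradient descent update, same form as in the script above
    for key in ('W', 'b'):
        params[key] -= learning_rate * grads[key]

print(params['W'], params['b'])
```

After training, `W` and `b` should end up close to 3 and 1. The gradient expressions are the Affine-layer formulas from earlier, applied to a single layer.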
