I thought about a perceptron for division in the form of deep learning.
# coding=utf-8
import numpy as np
import matplotlib.pyplot as plt
#Initial value
#number of learning iterations
N = 1000
#layer
layer = [2, 2, 1]
#bias
#bias = [0.0, 0.0]
#learning rate
η = [0.001, 0.001]
#η = [0.000001, 0.000001]
#number of middle layers
H = len(η) - 1
#teacher value
t = [None for _ in range(N)]
#function output value
f_out = [[None for _ in range(H + 1)] for _ in range(N)]
#function input value
f_in = [[None for _ in range(H + 1)] for _ in range(N)]
#weight
w = [[None for _ in range(H + 1)] for _ in range(N + 1)]
for h in range(H + 1):
    w[0][h] = np.random.uniform(-1.0, 1.0, (layer[h + 1], layer[h]))
for h in range(H + 1):
    print(w[0][h])
#squared error
dE = [None for _ in range(N)]
#∂E/∂IN
δ = [[None for _ in range(H + 1)] for _ in range(N)]
#Learning
for n in range(N):
    #input value
    f_out[n][0] = np.random.uniform(-10.0, 10.0, (layer[0]))
    #teacher value
    t[n] = f_out[n][0][0] / f_out[n][0][1]
    #forward propagation
    f_in[n][0] = np.dot(w[n][0], f_out[n][0])
    f_out[n][1] = np.log(f_in[n][0] * f_in[n][0])
    f_in[n][1] = np.dot(w[n][1], f_out[n][1])
    #output value
    div = np.exp(f_in[n][1])
    #squared error
    dE[n] = div - t[n]  #value after differentiating the squared error (the error itself is not computed)
    #δ
    δ[n][1] = div * dE[n]
    δ[n][0] = (2.0 / f_in[n][0]) * np.dot(w[n][1].T, δ[n][1])
    #back propagation
    for h in range(H + 1):
        w[n + 1][h] = w[n][h] - η[h] * np.real(δ[n][h].reshape(len(δ[n][h]), 1) * f_out[n][h])
#Output
#Weight
for h in range(H + 1):
    print(w[N][h])
#figure
#area height
py = np.amax(layer)
#area width
px = (H + 1) * 2
#area size
plt.figure(figsize = (16, 9))
#horizontal axis
x = np.arange(0, N + 1, 1)
#drawing
for h in range(H + 1):
    for l in range(layer[h + 1]):
        #area matrix
        plt.subplot(py, px, px * l + h * 2 + 1)
        for m in range(layer[h]):
            #line
            plt.plot(x, np.array([w[n][h][l, m] for n in range(N + 1)]), label = "w[" + str(h) + "][" + str(l) + "," + str(m) + "]")
        #grid line
        plt.grid(True)
        #legend
        plt.legend(bbox_to_anchor = (1, 1), loc = 'upper left', borderaxespad = 0, fontsize = 10)
#save
plt.savefig('graph_div.png')
#show
plt.show()
The concept of the weights is shown below.\\
w[0]=
\begin{pmatrix}
△ & □\\
▲ & ■
\end{pmatrix},
w[1]=
\begin{pmatrix}
〇 & ●
\end{pmatrix}\\
\\
Product of the input values and w[0]\\
\begin{pmatrix}
△ & □\\
▲ & ■
\end{pmatrix}
\begin{pmatrix}
a\\
b
\end{pmatrix}\\
=
\begin{pmatrix}
△a+□b\\
▲a+■b
\end{pmatrix}\\
\\
Entering the first layer, the activation \log(x^2) is applied:\\
\begin{pmatrix}
log(△a+□b)^2\\
log(▲a+■b)^2
\end{pmatrix}\\
\\
The argument of the logarithm is squared so that negative numbers can also be handled.\\
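A one-line check of this (my own sketch, not part of the article's script): squaring before the logarithm keeps the result real even when the pre-activation is negative.

import numpy as np

x = -3.0
print(np.log(x * x))   #2.1972... = 2*log(3), defined even though x < 0
#np.log(x) alone would return nan (with a warning) for a negative float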
\\
Product of the first-layer output and w[1]\\
\begin{align}
\begin{pmatrix}
〇 & ●
\end{pmatrix}
\begin{pmatrix}
log(△a+□b)^2\\
log(▲a+■b)^2
\end{pmatrix}
=&〇log(△a+□b)^2+●log(▲a+■b)^2\\
=&log(△a+□b)^{2〇}-log(▲a+■b)^{-2●}\\
=&log\frac{(△a+□b)^{2〇}}{(▲a+■b)^{-2●}}\\
\end{align}\\
\\
Entering the output layer, \exp is applied:\\
e^{log\frac{(△a+□b)^{2〇}}{(▲a+■b)^{-2●}}}=\frac{(△a+□b)^{2〇}}{(▲a+■b)^{-2●}}\\
\\
\left\{
\begin{array}{l}
△=1,□=0,〇=0.5 \\
▲=0,■=1,●=-0.5
\end{array}
\right.\\
\\
\frac{a}{b}\\
\\
In the simplest case, when the above conditions are met, the quotient a/b can be output, as verified in the sketch below.\\
After initializing the weights with random values in (-1.0, 1.0), I checked whether repeated learning would make them converge to these target values.\\
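As a minimal numeric check of this derivation (my own sketch, with example inputs a = 7, b = 4 chosen arbitrarily): plugging the ideal weights into the forward pass of the script above reproduces a/b for positive inputs; because of the squaring, negative inputs would yield |a/b|.

import numpy as np

#ideal weights from the derivation: △=1, □=0, 〇=0.5 / ▲=0, ■=1, ●=-0.5
w0 = np.array([[1.0, 0.0],
               [0.0, 1.0]])
w1 = np.array([[0.5, -0.5]])

a, b = 7.0, 4.0                   #arbitrary positive example inputs
x = np.array([a, b])

h = np.log(np.dot(w0, x) ** 2)    #first layer: log((w0·x)^2)
out = np.exp(np.dot(w1, h))       #output layer: exp(w1·h)

print(out[0], a / b)              #both print 1.75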
\\
Target values\\
w[0]=
\begin{pmatrix}
△ & □\\
▲ & ■
\end{pmatrix}
,w[1]=
\begin{pmatrix}
○ & ●
\end{pmatrix}\\
\left\{
\begin{array}{l}
△=1,□=0,〇=0.5 \\
▲=0,■=1,●=-0.5
\end{array}
\right.\\
\\
Initial values\\
w[0]=
\begin{pmatrix}
-0.18845444 & -0.56031414\\
-0.48188658 & 0.6470921
\end{pmatrix}
,w[1]=
\begin{pmatrix}
0.80395641 & 0.80365676
\end{pmatrix}\\
\left\{
\begin{array}{l}
△=-0.18845444,□=-0.56031414,〇=0.80395641 \\
▲=-0.48188658,■=0.6470921,●=0.80365676
\end{array}
\right.\\
\\
Calculated values\\
w[0]=
\begin{pmatrix}
14601870.60282903 & -14866110.02378938\\
13556781.27758209 & -13802110.45958244
\end{pmatrix}
,w[1]=
\begin{pmatrix}
-1522732.53915774 & -6080851.59710287
\end{pmatrix}\\
\left\{
\begin{array}{l}
△=14601870.60282903,□=-14866110.02378938,〇=-1522732.53915774 \\
▲=13556781.27758209,■=-13802110.45958244,●=-6080851.59710287
\end{array}
\right.\\
It is a failure. No matter how many times I run it, the weights diverge to absurd values.\\
I investigated the cause.\\
\\
In the chain rule of backpropagation:\\
(log(x^2))'=\frac{2}{x}\\
\lim_{x \to ±∞} \frac{2}{x}=0\\
\\
(e^x)'=e^x\\
\lim_{x \to -∞} e^x=0\\
It turned out that with such extremely large values the gradient vanishes; a numeric illustration follows below.\\
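A quick numeric illustration of these two limits (my own sketch): once an intermediate value has blown up, both derivative factors used in the backward pass are essentially zero, so the weights can no longer be corrected.

import numpy as np

#derivative of log(x^2), used in δ[0]
x = np.array([1.0, 1.0e3, 1.0e8, -1.0e8])
print(2.0 / x)                                  #[ 2.e+00  2.e-03  2.e-08 -2.e-08]

#derivative of e^x, used in δ[1]
print(np.exp(np.array([0.0, -50.0, -700.0])))   #[1.0  ~2e-22  ~1e-304]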
\\
I reconsidered.
# coding=utf-8
import numpy as np
import matplotlib.pyplot as plt
#Initial value
#number of learning iterations
N = 200000
#layer
layer = [2, 2, 1]
#bias
#bias = [0.0, 0.0]
#learning rate
η = [0.1, 0.1]
#η = [0.000001, 0.000001]
#clip value
#clip = 709
clip = 700
#number of middle layers
H = len(η) - 1
#teacher value
t = [None for _ in range(N)]
#function output value
f_out = [[None for _ in range(H + 1)] for _ in range(N)]
#function input value
f_in = [[None for _ in range(H + 1)] for _ in range(N)]
#weight
w = [[None for _ in range(H + 1)] for _ in range(N + 1)]
for h in range(H):
    w[0][h] = np.random.uniform(-1.0, 1.0, (layer[h + 1], layer[h]))
w[0][H] = np.zeros((layer[H + 1], layer[H]))
for h in range(H + 1):
    print(w[0][h])
#squared error
dE = [None for _ in range(N)]
#∂E/∂IN
δ = [[None for _ in range(H + 1)] for _ in range(N)]
#Learning
for n in range(N):
    #input value
    t[n] = clip
    while np.abs(t[n]) > np.log(np.log(clip)):  #gradient vanishing countermeasure: resample until the teacher value is in a safe range
        f_out[n][0] = np.random.uniform(0.0, 10.0, (layer[0]))
        f_out[n][0] = np.array(f_out[n][0], dtype=complex)  #np.complex was removed from NumPy; the built-in complex is equivalent
        #teacher value
        t[n] = f_out[n][0][0] / f_out[n][0][1]
    #forward propagation
    f_in[n][0] = np.dot(w[n][0], f_out[n][0])
    f_out[n][1] = np.log(f_in[n][0])
    f_in[n][1] = np.dot(w[n][1], f_out[n][1])
    #output value
    div = np.exp(f_in[n][1])
    #squared error
    dE[n] = np.real(div - t[n])  #value after differentiating the squared error (the error itself is not computed)
    dE[n] = np.clip(dE[n], -clip, clip)
    dE[n] = np.nan_to_num(dE[n])
    #δ
    δ[n][1] = np.real(div * dE[n])
    δ[n][1] = np.clip(δ[n][1], -clip, clip)
    δ[n][1] = np.nan_to_num(δ[n][1])
    δ[n][0] = np.real((1.0 / f_in[n][0]) * np.dot(w[n][1].T, δ[n][1]))
    δ[n][0] = np.clip(δ[n][0], -clip, clip)
    δ[n][0] = np.nan_to_num(δ[n][0])
    #back propagation
    for h in range(H + 1):
        #gradient vanishing countermeasure:
        #keep only the a part of a*10^b
        w10_u = np.real(δ[n][h].reshape(len(δ[n][h]), 1) * f_out[n][h])
        w10_u = np.clip(w10_u, -clip, clip)
        w10_u = np.nan_to_num(w10_u)
        w10_d = np.where(
            w10_u != 0.0,
            np.modf(np.log10(np.abs(w10_u)))[1],
            0.0
        )
        #decimal exponents are not supported: negative exponents are clipped to 0, so values below 1 are left unscaled
        w10_d = np.clip(w10_d, 0.0, clip)
        w[n + 1][h] = w[n][h] - η[h] * (w10_u / np.power(10.0, w10_d))
#Output
#Weight
for h in range(H + 1):
    print(w[N][h])
#figure
#area height
py = np.amax(layer)
#area width
px = (H + 1) * 2
#area size
plt.figure(figsize = (16, 9))
#horizontal axis
x = np.arange(0, N + 1, 1) #0 to N in steps of 1
#drawing
for h in range(H + 1):
    for l in range(layer[h + 1]):
        #area matrix
        plt.subplot(py, px, px * l + h * 2 + 1)
        for m in range(layer[h]):
            #line
            plt.plot(x, np.array([w[n][h][l, m] for n in range(N + 1)]), label = "w[" + str(h) + "][" + str(l) + "," + str(m) + "]")
        #grid line
        plt.grid(True)
        #legend
        plt.legend(bbox_to_anchor = (1, 1), loc = 'upper left', borderaxespad = 0, fontsize = 10)
#save
plt.savefig('graph_div.png')
#show
plt.show()
As countermeasures, the teacher values are restricted to a safe range, the forward pass uses the complex logarithm instead of \log(x^2), the error and the δ values are clipped, w[1] starts from zero, and each weight update is rescaled so that only the mantissa (the a in a\cdot 10^b) remains.
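The last of these is the "keep only the a part of a*10^b" trick in the update step: each gradient entry is divided by its own power of ten, so only the mantissa and the sign drive the step. A standalone sketch of just that rescaling (my own illustration, using the same NumPy calls as the script, with made-up gradient values):

import numpy as np

grad = np.array([[ 3.2e+6, -7.5e-1],
                 [ 4.0e+2,  0.0   ]])

with np.errstate(divide='ignore'):                          #log10(0) would warn; np.where discards that branch
    exp10 = np.where(grad != 0.0,
                     np.modf(np.log10(np.abs(grad)))[1],    #integer part of log10(|grad|)
                     0.0)
exp10 = np.clip(exp10, 0.0, None)                           #values below 1 are kept as they are

print(grad / np.power(10.0, exp10))
#[[ 3.2  -0.75]
# [ 4.    0.  ]]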
Target values\\
w[0]=
\begin{pmatrix}
△ & □\\
▲ & ■
\end{pmatrix}
,w[1]=
\begin{pmatrix}
○ & ●
\end{pmatrix}\\
\left\{
\begin{array}{l}
△=1,□=0,〇=1 \\
▲=0,■=1,●=-1
\end{array}
\right.\\
\\
Initial values\\
w[0]=
\begin{pmatrix}
-0.12716087 & 0.34977234\\
0.85436489 & 0.65970844
\end{pmatrix}
,w[1]=
\begin{pmatrix}
0.0 & 0.0
\end{pmatrix}\\
\left\{
\begin{array}{l}
△=-0.12716087,□=0.34977234,〇=0.0 \\
▲=0.85436489,■=0.65970844,●=0.0
\end{array}
\right.\\
\\
Calculated values\\
w[0]=
\begin{pmatrix}
-1.71228449e-08 & 1.00525062e+00\\
1.00525061e+00 & -4.72288257e-09
\end{pmatrix}
,w[1]=
\begin{pmatrix}
-0.99999998 & 0.99999998
\end{pmatrix}\\
\left\{
\begin{array}{l}
△=-1.71228449e-08,□=1.00525062e+00,〇=-0.99999998\\
▲=1.00525061e+00,■=-4.72288257e-09,●=0.99999998
\end{array}
\right.\\
\\
Success. The values of △□ and ▲■ are swapped relative to the target; the network reaches a correct answer, just not in the arrangement I intended, which I am not entirely happy about (a quick check of the swapped weights follows below).\\
Even so, just to teach division I had to reach for log, exp and complex numbers, extending into high-school mathematics, which was a struggle.\\
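A quick check of the "swapped" result (my own sketch, with example inputs a = 7, b = 4 chosen arbitrarily): because w[1] is close to (-1, 1), the common factor of roughly 1.005 in w[0] cancels inside exp(-log(c·b) + log(c·a)) = a/b, so the learned network really does divide.

import numpy as np

w0 = np.array([[-1.71228449e-08, 1.00525062e+00],
               [ 1.00525061e+00, -4.72288257e-09]])
w1 = np.array([[-0.99999998, 0.99999998]])

a, b = 7.0, 4.0                       #arbitrary positive example inputs
x = np.array([a, b], dtype=complex)   #the script feeds complex inputs

out = np.exp(np.dot(w1, np.log(np.dot(w0, x))))
print(out.real[0], a / b)             #≈1.75 and 1.75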