The error backpropagation of matrix multiplication was hard for me to grasp, so I will summarize it here.
First, let's review the error backpropagation of a scalar product.

Let $L$ be the quantity whose gradient we want, and assume $\frac{\partial L}{\partial y}$ is already known. The chain rule then gives the gradients directly, as sketched below.
This part poses no problem.
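As a minimal sketch, assuming the scalar product is $y = xw$ (these variable names are my own, chosen to match the matrix case below):

```math
\frac{\partial L}{\partial x} = \frac{\partial L}{\partial y}\frac{\partial y}{\partial x} = \frac{\partial L}{\partial y}\,w,
\qquad
\frac{\partial L}{\partial w} = \frac{\partial L}{\partial y}\frac{\partial y}{\partial w} = \frac{\partial L}{\partial y}\,x
```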
However, when it comes to matrix multiplication, the result is less intuitive; somehow it just doesn't click. So let's verify it concretely.
The setting is two input neurons X connected to the output Y through four weights W via the dot product, i.e. $Y = XW$.
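Written out explicitly, assuming $X$ is a $1 \times 2$ row vector and $W$ a $2 \times 2$ weight matrix (so that $Y$ is also $1 \times 2$):

```math
Y = XW = \begin{pmatrix} x_1 & x_2 \end{pmatrix}
\begin{pmatrix} w_{11} & w_{12} \\ w_{21} & w_{22} \end{pmatrix}
= \begin{pmatrix} x_1 w_{11} + x_2 w_{21} & x_1 w_{12} + x_2 w_{22} \end{pmatrix}
```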
**1) First, find $\frac{\partial L}{\partial X}$.** Start by computing the partial derivatives $\frac{\partial y_j}{\partial x_i}$ in advance.
Using these intermediate results, the gradient with respect to $X$ follows, as sketched below.
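Under the shapes assumed above, the element-wise chain rule gives

```math
\frac{\partial L}{\partial x_1} = \frac{\partial L}{\partial y_1} w_{11} + \frac{\partial L}{\partial y_2} w_{12},
\qquad
\frac{\partial L}{\partial x_2} = \frac{\partial L}{\partial y_1} w_{21} + \frac{\partial L}{\partial y_2} w_{22}
```

which in matrix form is

```math
\frac{\partial L}{\partial X} = \frac{\partial L}{\partial Y}\, W^T
```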
**2) Next, find $\frac{\partial L}{\partial W}$.** Again, start by computing the partial derivatives $\frac{\partial y_j}{\partial w_{ij}}$ in advance.
Using these intermediate results, the gradient with respect to $W$ follows, as sketched below.
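Collecting the element-wise derivatives (for example $\frac{\partial L}{\partial w_{11}} = \frac{\partial L}{\partial y_1} x_1$) into a matrix gives

```math
\frac{\partial L}{\partial W} =
\begin{pmatrix}
\frac{\partial L}{\partial y_1} x_1 & \frac{\partial L}{\partial y_2} x_1 \\
\frac{\partial L}{\partial y_1} x_2 & \frac{\partial L}{\partial y_2} x_2
\end{pmatrix}
= X^T \, \frac{\partial L}{\partial Y}
```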
Implementing this as a layer with x1 = X, x2 = W, and grad = $\frac{\partial L}{\partial Y}$:
```python
import numpy as np

class MatMul(object):
    def __init__(self, x1, x2):
        self.x1 = x1
        self.x2 = x2

    def forward(self):
        # Y = XW
        y = np.dot(self.x1, self.x2)
        self.y = y
        return y

    def backward(self, grad):
        # dL/dX = dL/dY . W.T,  dL/dW = X.T . dL/dY
        grad_x1 = np.dot(grad, self.x2.T)
        grad_x2 = np.dot(self.x1.T, grad)
        return (grad_x1, grad_x2)
```
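A quick check of the class above with hypothetical values (the arrays and numbers here are just my own example):

```python
X = np.array([[1.0, 2.0]])      # input, shape (1, 2)
W = np.array([[0.1, 0.2],
              [0.3, 0.4]])      # weights, shape (2, 2)

layer = MatMul(X, W)
Y = layer.forward()             # Y ≈ [[0.7 1.0]]

dL_dY = np.array([[1.0, 1.0]])  # pretend upstream gradient
dL_dX, dL_dW = layer.backward(dL_dY)
# dL_dX ≈ [[0.3 0.7]]          (= dL/dY . W.T)
# dL_dW ≈ [[1. 1.], [2. 2.]]   (= X.T . dL/dY)
```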