3.8 Gradient descent for neural networks

$$dZ^{[2]} = A^{[2]} - Y$$
$$dW^{[2]} = \frac{1}{m}\, dZ^{[2]} A^{[1]T}$$
$$db^{[2]} = \frac{1}{m}\, \text{np.sum}(dZ^{[2]},\ \text{axis}=1,\ \text{keepdims}=\text{True})$$
$$dZ^{[1]} = W^{[2]T} dZ^{[2]} \ast g'(Z^{[1]})$$
$$dW^{[1]} = \frac{1}{m}\, dZ^{[1]} X^{T}$$
$$db^{[1]} = \frac{1}{m}\, \text{np.sum}(dZ^{[1]},\ \text{axis}=1,\ \text{keepdims}=\text{True})$$
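The six equations above map almost line for line onto NumPy. The following is a minimal sketch, not a definitive implementation. It assumes a two-layer network with a tanh hidden layer (so that g'(Z^{[1]}) = 1 - tanh^2(Z^{[1]})), a sigmoid output trained with cross-entropy loss (which is what makes dZ^{[2]} = A^{[2]} - Y hold), and examples stacked as columns of X. The function names `forward`, `backward`, and `gradient_step`, and the learning rate `lr`, are hypothetical and chosen only for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(X, W1, b1, W2, b2):
    # Assumed architecture: tanh hidden layer, sigmoid output layer.
    Z1 = W1 @ X + b1
    A1 = np.tanh(Z1)
    Z2 = W2 @ A1 + b2
    A2 = sigmoid(Z2)
    return Z1, A1, A2

def backward(X, Y, W2, Z1, A1, A2):
    m = X.shape[1]  # number of training examples (columns of X)
    dZ2 = A2 - Y
    dW2 = (dZ2 @ A1.T) / m
    db2 = np.sum(dZ2, axis=1, keepdims=True) / m
    # Element-wise product with g'(Z1); for tanh, g'(z) = 1 - tanh(z)^2.
    dZ1 = (W2.T @ dZ2) * (1.0 - np.tanh(Z1) ** 2)
    dW1 = (dZ1 @ X.T) / m
    db1 = np.sum(dZ1, axis=1, keepdims=True) / m
    return dW1, db1, dW2, db2

def gradient_step(X, Y, W1, b1, W2, b2, lr=0.1):
    # One gradient descent update; lr is an illustrative learning rate.
    Z1, A1, A2 = forward(X, W1, b1, W2, b2)
    dW1, db1, dW2, db2 = backward(X, Y, W2, Z1, A1, A2)
    return W1 - lr * dW1, b1 - lr * db1, W2 - lr * dW2, b2 - lr * db2
```

Passing `keepdims=True` keeps `db1` and `db2` as column vectors of shape (n, 1) rather than rank-1 arrays, so they have the same shape as `b1` and `b2` and the update broadcasts as intended.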
