The Adam (Adaptive Moment Estimation) algorithm combines gradient descent with momentum and RMSprop. The procedure is as follows:
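Initialize: $V_{dW}=0,\ S_{dW}=0,\ V_{db}=0,\ S_{db}=0$

On iteration $t$:

Compute $dW$, $db$.

Update the first-moment (momentum) estimates:

$$V_{dW}=\beta_1 V_{dW}+(1-\beta_1)\,dW,\qquad V_{db}=\beta_1 V_{db}+(1-\beta_1)\,db$$

Update the second-moment (RMSprop) estimates, where the square is taken elementwise:

$$S_{dW}=\beta_2 S_{dW}+(1-\beta_2)\,dW^2,\qquad S_{db}=\beta_2 S_{db}+(1-\beta_2)\,db^2$$

Apply bias correction:

$$V_{dW}^{corrected}=\frac{V_{dW}}{1-\beta_1^t},\qquad V_{db}^{corrected}=\frac{V_{db}}{1-\beta_1^t}$$

$$S_{dW}^{corrected}=\frac{S_{dW}}{1-\beta_2^t},\qquad S_{db}^{corrected}=\frac{S_{db}}{1-\beta_2^t}$$

Update the parameters, where $\alpha$ is the learning rate and $\varepsilon$ is a small constant that prevents division by zero:

$$W := W-\alpha\,\frac{V_{dW}^{corrected}}{\sqrt{S_{dW}^{corrected}}+\varepsilon},\qquad b := b-\alpha\,\frac{V_{db}^{corrected}}{\sqrt{S_{db}^{corrected}}+\varepsilon}$$

Adam combines the respective strengths of gradient descent with momentum and RMSprop: the momentum term smooths the update direction, while the RMSprop term scales the step size per parameter. Together they can speed up neural network training considerably.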
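The steps above can be sketched in a few lines of Python. This is a minimal illustration for a single scalar parameter, not a reference implementation: the function name `adam_step` is hypothetical, and the default values `alpha=0.001`, `beta1=0.9`, `beta2=0.999`, and `eps=1e-8` are commonly used settings assumed here rather than taken from the text. The final parameter update, with learning rate `alpha` and a small constant `eps`, follows the standard Adam rule. The toy objective $f(w)=(w-3)^2$ is likewise chosen only for demonstration.

```python
import math


def adam_step(w, dw, v, s, t, alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter w with gradient dw.

    v and s are the running first- and second-moment estimates;
    t is the 1-based iteration counter used for bias correction.
    Hyperparameter defaults are assumed, commonly used values.
    """
    v = beta1 * v + (1 - beta1) * dw           # momentum term (V_dW)
    s = beta2 * s + (1 - beta2) * dw ** 2      # RMSprop term (S_dW)
    v_corrected = v / (1 - beta1 ** t)         # bias-corrected V_dW
    s_corrected = s / (1 - beta2 ** t)         # bias-corrected S_dW
    w = w - alpha * v_corrected / (math.sqrt(s_corrected) + eps)
    return w, v, s


# Toy example (assumed for illustration): minimize f(w) = (w - 3)^2.
w, v, s = 0.0, 0.0, 0.0                        # V and S start at zero
for t in range(1, 2001):
    dw = 2 * (w - 3)                           # gradient of f at w
    w, v, s = adam_step(w, dw, v, s, t, alpha=0.1)

print(w)  # should end up close to the minimizer w = 3
```

Because of the bias correction, the very first step has magnitude close to `alpha` regardless of the gradient's scale, which is easy to check by calling `adam_step` once with `t=1`. For weight matrices and bias vectors, the same formulas apply elementwise.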