2.8 Adam optimization algorithm

The Adam (Adaptive Moment Estimation) algorithm combines momentum gradient descent with RMSprop. The procedure is as follows:

$V_{dW}=0,\ S_{dW}=0,\ V_{db}=0,\ S_{db}=0$

$On\ iteration\ t:$

$\quad Compute\ dW,\ db$ on the current mini-batch

$\quad V_{dW}=\beta_1V_{dW}+(1-\beta_1)dW,\ V_{db}=\beta_1V_{db}+(1-\beta_1)db$

$\quad S_{dW}=\beta_2S_{dW}+(1-\beta_2)dW^2,\ S_{db}=\beta_2S_{db}+(1-\beta_2)db^2$

$\quad V_{dW}^{corrected}=\frac{V_{dW}}{1-\beta_1^t},\ V_{db}^{corrected}=\frac{V_{db}}{1-\beta_1^t}$

$\quad S_{dW}^{corrected}=\frac{S_{dW}}{1-\beta_2^t},\ S_{db}^{corrected}=\frac{S_{db}}{1-\beta_2^t}$

$\quad W:=W-\alpha\frac{V_{dW}^{corrected}}{\sqrt{S_{dW}^{corrected}}+\varepsilon},\ b:=b-\alpha\frac{V_{db}^{corrected}}{\sqrt{S_{db}^{corrected}}+\varepsilon}$
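To make the procedure concrete, below is a minimal NumPy sketch of a single Adam step for one parameter pair $(W, b)$, following the formulas above. The function name `adam_update`, the state dictionaries `v` and `s`, and the way gradients are passed in are illustrative assumptions, not part of the original notes.

```python
import numpy as np

def adam_update(W, b, dW, db, v, s, t,
                alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for (W, b); v and s hold the exponentially weighted averages."""
    # Momentum-style first-moment estimates
    v["dW"] = beta1 * v["dW"] + (1 - beta1) * dW
    v["db"] = beta1 * v["db"] + (1 - beta1) * db
    # RMSprop-style second-moment estimates (element-wise squares)
    s["dW"] = beta2 * s["dW"] + (1 - beta2) * dW**2
    s["db"] = beta2 * s["db"] + (1 - beta2) * db**2
    # Bias correction for iteration t (t starts at 1)
    v_dW_corr = v["dW"] / (1 - beta1**t)
    v_db_corr = v["db"] / (1 - beta1**t)
    s_dW_corr = s["dW"] / (1 - beta2**t)
    s_db_corr = s["db"] / (1 - beta2**t)
    # Parameter update
    W = W - alpha * v_dW_corr / (np.sqrt(s_dW_corr) + eps)
    b = b - alpha * v_db_corr / (np.sqrt(s_db_corr) + eps)
    return W, b, v, s
```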

The Adam algorithm has several hyperparameters: $\alpha$, $\beta_1$, $\beta_2$, and $\varepsilon$. $\beta_1$ is usually set to 0.9, $\beta_2$ to 0.999, and $\varepsilon$ to $10^{-8}$. In practice, $\beta_1$, $\beta_2$, and $\varepsilon$ can normally be left at these default values, and only the learning rate $\alpha$ needs to be tuned.
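As a usage sketch for the `adam_update` function above (again with hypothetical names), the moving averages are initialized to zero and the defaults $\beta_1=0.9$, $\beta_2=0.999$, $\varepsilon=10^{-8}$ are kept, leaving $\alpha$ as the value to experiment with; the toy gradients here simply minimize $\frac{1}{2}(\|W\|^2+\|b\|^2)$:

```python
W, b = np.random.randn(3, 2), np.random.randn(3, 1)
v = {"dW": np.zeros_like(W), "db": np.zeros_like(b)}  # V_dW = 0, V_db = 0
s = {"dW": np.zeros_like(W), "db": np.zeros_like(b)}  # S_dW = 0, S_db = 0

for t in range(1, 1001):  # t starts at 1 so the bias correction is well defined
    dW, db = W, b         # gradients of the toy loss 0.5 * (||W||^2 + ||b||^2)
    W, b, v, s = adam_update(W, b, dW, db, v, s, t)  # defaults: alpha=0.001, beta1=0.9, beta2=0.999
```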

By combining the strengths of momentum gradient descent and RMSprop, Adam substantially speeds up neural network training.
