
# 3.9 Training a Softmax classifier

With C=4 classes, suppose the predicted output $$\hat y$$ and the true label $$y$$ for one sample are:

$$
\hat y=\left[
\begin{matrix}
0.3 \\
0.2 \\
0.1 \\
0.4
\end{matrix}
\right]
$$

$$
y=\left[
\begin{matrix}
0 \\
1 \\
0 \\
0
\end{matrix}
\right]
$$

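Judging from $$\hat y$$, $$P(y=4|x)=0.4$$ is the largest probability, whereas the sample actually belongs to class 2, so this prediction is poor.

The loss function of the softmax classifier is defined as: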

$$
L(\hat y,y)=-\sum_{j=1}^4 y_j\cdot \log \hat y_j
$$

Because only $$y_2=1$$ and every other $$y_j=0$$, $$L(\hat y,y)$$ simplifies to:

$$
L(\hat y,y)=-y_2\cdot \log \hat y_2=-\log \hat y_2
$$

To make $$L(\hat y,y)$$ smaller, $$\hat y_2$$ should be as large as possible. $$\hat y_2$$ represents a probability, so it can be at most 1.

The cost function over m samples is:
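As a quick check of the worked example, here is a minimal NumPy sketch that evaluates the loss for the $$\hat y$$ and $$y$$ given above. The function name `cross_entropy_loss` is made up for illustration and is not from the course.

```python
import numpy as np

def cross_entropy_loss(y_hat, y):
    """Loss for one sample: L = -sum_j y_j * log(y_hat_j)."""
    return float(-np.sum(y * np.log(y_hat)))

y_hat = np.array([0.3, 0.2, 0.1, 0.4])  # predicted probabilities from the example
y = np.array([0.0, 1.0, 0.0, 0.0])      # one-hot label: the true class is 2

loss = cross_entropy_loss(y_hat, y)          # equals -log(0.2), about 1.609
predicted_class = int(np.argmax(y_hat)) + 1  # 1-based class index, here 4

print(loss, predicted_class)
```

Only the entry of $$\hat y$$ at the true class contributes to the loss, so raising $$\hat y_2$$ toward 1 drives the loss toward 0.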

$$
J=\frac{1}{m}\sum_{i=1}^m L(\hat y^{(i)},y^{(i)})
$$

The prediction output matrix $$A^{[L]}$$, i.e. $$\hat Y$$, has dimensions (4, m).

Backpropagation for the softmax classifier:

First derive $$dZ^{[L]}$$:
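The cost over a batch can be sketched in vectorized form as follows. The helper name `softmax_cost` and the numbers in `A` and `Y` are invented for illustration; the only assumption carried over from the text is the (C, m) layout with one sample per column.

```python
import numpy as np

def softmax_cost(A, Y):
    """Average cross-entropy cost J over the m columns of A and Y, each shaped (C, m)."""
    m = A.shape[1]
    per_sample_loss = -np.sum(Y * np.log(A), axis=0)  # shape (m,), one loss per sample
    return float(np.sum(per_sample_loss) / m)

# Hypothetical predictions for C=4 classes and m=2 samples (each column sums to 1).
A = np.array([[0.3, 0.1],
              [0.2, 0.7],
              [0.1, 0.1],
              [0.4, 0.1]])
# One-hot labels: both samples belong to class 2.
Y = np.array([[0.0, 0.0],
              [1.0, 1.0],
              [0.0, 0.0],
              [0.0, 0.0]])

J = softmax_cost(A, Y)  # mean of -log(0.2) and -log(0.7)
print(J)
```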

$$
da^{[L]}=-\frac{1}{a^{[L]}}
$$

![](/files/-Le0cu0_qkLHL8RCEl16)

$$
\frac{\partial a^{[L]}}{\partial z^{[L]}}=\frac{\partial}{\partial z^{[L]}}\cdot \left(\frac{e^{z^{[L]}_i}}{\sum_{i=1}^C e^{z^{[L]}_i}}\right)=a^{[L]}\cdot (1-a^{[L]})
$$

![](/files/-Le0cu0cqh_x9Co4oUXR)

![](/files/-Le0cu0eFYBxfRSGIihb)

![](/files/-Le0cu0gWv6UdF6fa9tV)
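For readers following along in text, the standard chain-rule argument for a single sample can be sketched as below. It assumes a one-hot label $$y$$ (so $$\sum_k y_k=1$$), drops the layer superscript $$[L]$$ for readability, and introduces the Kronecker delta $$\delta_{kj}$$ (1 if $$k=j$$, else 0), which is notation not used elsewhere in these notes.

$$
\begin{aligned}
\frac{\partial L}{\partial a_k} &= -\frac{y_k}{a_k}, \qquad
\frac{\partial a_k}{\partial z_j} = a_k(\delta_{kj}-a_j) \\
dz_j = \frac{\partial L}{\partial z_j} &= \sum_{k=1}^C \frac{\partial L}{\partial a_k}\cdot\frac{\partial a_k}{\partial z_j}
= -\sum_{k=1}^C y_k(\delta_{kj}-a_j) \\
&= -y_j + a_j\sum_{k=1}^C y_k = a_j - y_j
\end{aligned}
$$

The case $$k=j$$ of the softmax derivative gives the $$a\cdot(1-a)$$ term, and the cases $$k\neq j$$ give $$-a_k a_j$$.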

For all m training samples:

$$
dZ^{[L]}=A^{[L]}-Y
$$
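One way to gain confidence in this result is a numerical gradient check. The sketch below compares $$A^{[L]}-Y$$ with central finite differences of the summed per-sample losses. The helper names, the random inputs, and the step size `eps` are all assumptions chosen for illustration. The comparison uses the sum of the losses rather than the cost $$J$$, so no $$\frac{1}{m}$$ factor appears in $$dZ^{[L]}$$.

```python
import numpy as np

def softmax(Z):
    """Column-wise softmax for Z of shape (C, m)."""
    Z_shifted = Z - np.max(Z, axis=0, keepdims=True)  # subtract max for numerical stability
    exp_Z = np.exp(Z_shifted)
    return exp_Z / np.sum(exp_Z, axis=0, keepdims=True)

def total_loss(Z, Y):
    """Sum of the per-sample cross-entropy losses (not divided by m)."""
    return float(-np.sum(Y * np.log(softmax(Z))))

rng = np.random.default_rng(0)
C, m = 4, 3
Z = rng.standard_normal((C, m))      # hypothetical pre-activations of the last layer
labels = rng.integers(0, C, size=m)  # hypothetical 0-based class labels
Y = np.eye(C)[:, labels]             # one-hot labels, shape (C, m)

dZ_analytic = softmax(Z) - Y         # the formula dZ = A - Y

# Central finite differences, one entry of Z at a time.
eps = 1e-6
dZ_numeric = np.zeros_like(Z)
for j in range(C):
    for i in range(m):
        Z_plus, Z_minus = Z.copy(), Z.copy()
        Z_plus[j, i] += eps
        Z_minus[j, i] -= eps
        dZ_numeric[j, i] = (total_loss(Z_plus, Y) - total_loss(Z_minus, Y)) / (2 * eps)

max_diff = float(np.max(np.abs(dZ_analytic - dZ_numeric)))
print(max_diff)  # should be tiny if the formula holds
```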
