# 1.10 Long Short-Term Memory (LSTM) Unit

The LSTM is another, more powerful approach to the vanishing-gradient problem. The structure of the corresponding RNN hidden-layer unit is shown below:

![](https://2314428465-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-Le0cHhI0S0DK8pwlrmD%2F-Le0cKOp1vaxoORIi4ak%2F-Le0csRSBGVLrIVAIhfQ%2FXnip2018-07-22_16-44-14.jpg?generation=1556953140251318\&alt=media)

The corresponding equations are:

$$
\tilde c^{<t>}=\tanh(W_c[a^{<t-1>},x^{<t>}]+b_c)
$$

$$
\Gamma_u=\sigma(W_u[a^{<t-1>},x^{<t>}]+b_u)
$$

$$
\Gamma_f=\sigma(W_f[a^{<t-1>},x^{<t>}]+b_f)
$$

$$
\Gamma_o=\sigma(W_o[a^{<t-1>},x^{<t>}]+b_o)
$$

$$
c^{<t>}=\Gamma_u*\tilde c^{<t>}+\Gamma_f*c^{<t-1>}
$$

$$
a^{<t>}=\Gamma_o*\tanh(c^{<t>})
$$

The LSTM has three gates: $$\Gamma_u,\Gamma_f,\Gamma_o$$, corresponding to the update gate, the forget gate, and the output gate.

In the **LSTM** it is no longer the case that $$a^{<t>} = c^{<t>}$$ (as it was in the GRU).
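To make the update rule concrete, here is a minimal numpy sketch of one forward step implementing the six equations above. The function name `lstm_step` and the `params` dictionary layout are illustrative assumptions, not from the course:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(a_prev, c_prev, x_t, params):
    """One LSTM forward step, following the equations above.

    a_prev : (n_a, 1) previous hidden state a^{<t-1>}
    c_prev : (n_a, 1) previous memory cell c^{<t-1>}
    x_t    : (n_x, 1) current input x^{<t>}
    params : dict with W_c, W_u, W_f, W_o of shape (n_a, n_a + n_x)
             and b_c, b_u, b_f, b_o of shape (n_a, 1)
    """
    concat = np.vstack([a_prev, x_t])                 # [a^{<t-1>}, x^{<t>}]

    c_tilde = np.tanh(params["W_c"] @ concat + params["b_c"])   # candidate value
    gamma_u = sigmoid(params["W_u"] @ concat + params["b_u"])   # update gate
    gamma_f = sigmoid(params["W_f"] @ concat + params["b_f"])   # forget gate
    gamma_o = sigmoid(params["W_o"] @ concat + params["b_o"])   # output gate

    c_t = gamma_u * c_tilde + gamma_f * c_prev        # new memory cell
    a_t = gamma_o * np.tanh(c_t)                      # new hidden state

    return a_t, c_t
```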

The red line shows that, as long as the forget gate and update gate are set appropriately, the **LSTM** can easily pass the value of $$c^{<0>}$$ all the way to the right, so that, for example, $$c^{<3>} = c^{<0>}$$. This is why the **LSTM**, like the **GRU**, is very good at remembering a value over long time spans.

[![](https://github.com/fengdu78/deeplearning_ai_books/raw/master/images/94e871edbd87337937ce374e71d56e42.png)](https://github.com/fengdu78/deeplearning_ai_books/blob/master/images/94e871edbd87337937ce374e71d56e42.png)
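This preservation behavior is easy to verify numerically: with the forget gate saturated near 1 and the update gate near 0, the cell update $$c^{<t>}=\Gamma_u*\tilde c^{<t>}+\Gamma_f*c^{<t-1>}$$ reduces to copying the previous cell. A tiny sketch, with values made up purely for illustration:

```python
import numpy as np

c_prev  = np.array([[0.7], [-1.2]])   # c^{<t-1>}, the value to remember
c_tilde = np.array([[0.3], [0.9]])    # candidate \tilde{c}^{<t>} (ignored here)

gamma_u = np.zeros_like(c_prev)       # update gate saturated at 0
gamma_f = np.ones_like(c_prev)        # forget gate saturated at 1

c_t = gamma_u * c_tilde + gamma_f * c_prev
print(np.allclose(c_t, c_prev))       # True: the memory cell passes through unchanged
```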

"**Peephole connection**": the gate values depend not only on $$a^{<t-1>}$$ and $$x^{<t>}$$, but also on the previous memory cell value $$c^{<t-1>}$$; that is, $$c^{<t-1>}$$ can also influence the gate values.

Taking the influence of $$c^{<t-1>}$$ on $$\Gamma_u,\Gamma_f,\Gamma_o$$ into account, we can add peephole connections and modify the LSTM equations as follows:

$$
\tilde c^{<t>}=\tanh(W_c[a^{<t-1>},x^{<t>}]+b_c)
$$

$$
\Gamma_u=\sigma(W_u[a^{<t-1>},x^{<t>},c^{<t-1>}]+b_u)
$$

$$
\Gamma_f=\sigma(W_f[a^{<t-1>},x^{<t>},c^{<t-1>}]+b_f)
$$

$$
\Gamma_o=\sigma(W_o[a^{<t-1>},x^{<t>},c^{<t-1>}]+b_o)
$$

$$
c^{<t>}=\Gamma_u*\tilde c^{<t>}+\Gamma_f*c^{<t-1>}
$$

$$
a^{<t>}=\Gamma_o*\tanh(c^{<t>})
$$

One key detail of the peephole **LSTM** (label 13 in the figure above): if the memory cell has, say, 100 units, then the $$i$$-th element of $$c^{<t-1>}$$ affects only the $$i$$-th element of the corresponding gate, so the relationship is one-to-one; it is not the case that any of the 100 elements of $$c^{<t-1>}$$ can affect all elements of the gates.
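Here is a minimal sketch of the peephole gate computation in the same numpy style, with that one-to-one relationship made explicit: the peephole contribution is an elementwise product with per-unit weight vectors (the names `p_u`, `p_f`, `p_o` are illustrative assumptions), rather than a full matrix over $$c^{<t-1>}$$:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def peephole_gates(a_prev, c_prev, x_t, params):
    """Gate values with peephole connections.

    The peephole weights p_u, p_f, p_o are vectors of shape (n_a, 1):
    the i-th element of c_prev only enters the i-th gate element,
    which is exactly the one-to-one relationship described above.
    """
    concat = np.vstack([a_prev, x_t])   # [a^{<t-1>}, x^{<t>}]

    gamma_u = sigmoid(params["W_u"] @ concat + params["p_u"] * c_prev + params["b_u"])
    gamma_f = sigmoid(params["W_f"] @ concat + params["p_f"] * c_prev + params["b_f"])
    gamma_o = sigmoid(params["W_o"] @ concat + params["p_o"] * c_prev + params["b_o"])

    return gamma_u, gamma_f, gamma_o
```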

**LSTM** forward propagation diagram:

[![ST](https://github.com/fengdu78/deeplearning_ai_books/raw/master/images/LSTM.png)](https://github.com/fengdu78/deeplearning_ai_books/blob/master/images/LSTM.png)

![](https://2314428465-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-Le0cHhI0S0DK8pwlrmD%2F-Le0cKOp1vaxoORIi4ak%2F-Le0csR_vwB9OhLINSPL%2FXnip2018-07-22_16-50-24.jpg?generation=1556953140584587\&alt=media)

**GRU**: a simpler model, which makes it easier to build a larger network; it has only two gates, so it runs faster computationally and is easier to scale up.

**LSTM**: more powerful and more flexible, since it has three gates instead of two.
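To make the size difference concrete, a rough parameter count under the formulations in these notes, using hypothetical sizes $$n_a = 100$$ and $$n_x = 10$$: the GRU has three weight/bias blocks (candidate plus two gates), the LSTM four:

```python
n_a, n_x = 100, 10                 # hypothetical hidden and input sizes

def param_count(num_blocks):
    # each block: one weight matrix (n_a, n_a + n_x) plus one bias (n_a,)
    return num_blocks * (n_a * (n_a + n_x) + n_a)

print("GRU :", param_count(3))     # 33,300 parameters
print("LSTM:", param_count(4))     # 44,400 parameters
```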

