# 2.10 Debiasing Word Embeddings

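Depending on the text used to train the model, word embeddings can reflect biases related to gender, race, age, sexual orientation, and other attributes: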

[![](https://github.com/fengdu78/deeplearning_ai_books/raw/master/images/25430afa93f24dc6caa6f85503bbad27.png)](https://github.com/fengdu78/deeplearning_ai_books/blob/master/images/25430afa93f24dc6caa6f85503bbad27.png)

Suppose a word embedding has already been learned, with the words positioned as shown in the figure:

[![](https://github.com/fengdu78/deeplearning_ai_books/raw/master/images/cf60f429ef532a2b3bbad3db98b054c5.png)](https://github.com/fengdu78/deeplearning_ai_books/blob/master/images/cf60f429ef532a2b3bbad3db98b054c5.png)

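The first step is to identify the direction of the specific bias we want to reduce or eliminate. The bias is then removed from the words that should be neutral. The procedure has three steps: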

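Step 1 (identify the bias direction): for gender bias, take the difference between the embeddings of each gender-opposite word pair, then average these differences over the N pairs: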

$$
\text{bias direction} = \frac{1}{N}\left[(e_{\text{he}} - e_{\text{she}}) + (e_{\text{male}} - e_{\text{female}}) + \cdots\right]
$$
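The averaging in Step 1 can be sketched in a few lines of NumPy. This is a minimal illustration, assuming the embeddings are stored in a plain dict from word to vector; the helper name `bias_direction` and the 3-D toy vectors are made up for the example (real embeddings have hundreds of dimensions).

```python
import numpy as np

def bias_direction(embeddings, pairs):
    """Average the differences e_a - e_b over all word pairs (a, b).

    embeddings: dict mapping word -> 1-D NumPy array (hypothetical toy lookup)
    pairs: list of (word_a, word_b) tuples, e.g. ("he", "she")
    """
    diffs = [embeddings[a] - embeddings[b] for a, b in pairs]
    return np.mean(diffs, axis=0)

# Toy 3-D embeddings invented for illustration; axis 0 plays the role of "gender".
emb = {
    "he":     np.array([1.0, 0.2, 0.1]),
    "she":    np.array([-1.0, 0.2, 0.1]),
    "male":   np.array([0.9, 0.5, 0.0]),
    "female": np.array([-0.9, 0.5, 0.0]),
}
g = bias_direction(emb, [("he", "she"), ("male", "female")])
# g is approximately [1.9, 0, 0]: coordinates that each pair shares cancel in the
# difference, leaving only the gender offset.
```

Averaging over several pairs keeps the direction the pairs have in common and dampens the quirks of any single pair.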

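Step 2 (neutralize): words whose meaning is not inherently gendered, such as **doctor** and **babysitter**, should be made gender-neutral. Project them onto this axis (marked 1) to reduce or eliminate their component along the gender-bias direction, i.e. remove their horizontal offset (the projection shown in box 2).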

[![](https://github.com/fengdu78/deeplearning_ai_books/raw/master/images/4102795b004ff090ed83dc654f585852.png)](https://github.com/fengdu78/deeplearning_ai_books/blob/master/images/4102795b004ff090ed83dc654f585852.png)
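The neutralize step is a projection: subtract from the word vector its component along the bias direction g. Below is a minimal NumPy sketch; the `neutralize` helper and the toy vectors are hypothetical, chosen so the effect is easy to see.

```python
import numpy as np

def neutralize(e, g):
    """Return e with its component along the bias direction g removed.

    The projection of e onto g is (e . g / ||g||^2) * g; subtracting it
    leaves a vector orthogonal to g.
    """
    e_bias = (np.dot(e, g) / np.dot(g, g)) * g
    return e - e_bias

g = np.array([1.9, 0.0, 0.0])        # toy bias direction
doctor = np.array([0.3, 0.8, 0.4])   # made-up vector with a gender component of 0.3
doctor_neutral = neutralize(doctor, g)
# doctor_neutral is approximately [0, 0.8, 0.4]; its dot product with g is ~0.
```

Only the component along g changes; the orthogonal coordinates, which carry the rest of the word's meaning, are left untouched.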

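Step 3 (equalize): the distance between **babysitter** and **grandmother** is actually smaller than the distance between **babysitter** and **grandfather**, i.e. babysitter is more similar to grandmother (marked 1). This can reinforce an undesirable, unintended bias: **grandmothers** end up more likely than **grandfathers** to be associated with **babysitting**. The final equalize step therefore ensures that pairs such as **grandmother** and **grandfather** have the same similarity, or equal distance, to gender-neutral words. It does this by moving **grandmother** and **grandfather** to a pair of points equidistant from the middle axis (marked 2). After this, the gender effect is gone: the two words are exactly the same distance from **babysitter** (marked 3).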

[![](https://github.com/fengdu78/deeplearning_ai_books/raw/master/images/9b27d865dff73a2f10abbdc1c7fc966b.png)](https://github.com/fengdu78/deeplearning_ai_books/blob/master/images/9b27d865dff73a2f10abbdc1c7fc966b.png)
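A simplified version of the equalize step is sketched below, assuming the bias direction g has already been computed. The hypothetical `equalize` helper gives both words the pair's shared midpoint component orthogonal to g, plus equal and opposite components along g. Published formulations of this step additionally rescale the pair to unit length; that is omitted here to keep the geometry visible.

```python
import numpy as np

def equalize(e1, e2, g):
    """Simplified equalize for a definitional pair, e.g. (grandmother, grandfather).

    Both returned vectors share the midpoint's component orthogonal to g and
    have equal-and-opposite components along g.
    """
    g_hat = g / np.linalg.norm(g)               # unit bias direction
    mu = (e1 + e2) / 2                          # midpoint of the pair
    mu_orth = mu - np.dot(mu, g_hat) * g_hat    # bias-free part shared by both words
    b = np.dot(e1 - e2, g_hat) / 2              # half the pair's gap along g
    return mu_orth + b * g_hat, mu_orth - b * g_hat

def dist(a, b):
    """Euclidean distance between two vectors."""
    return np.linalg.norm(a - b)

g = np.array([1.0, 0.0, 0.0])                   # toy bias direction
grandmother = np.array([-0.6, 0.7, 0.2])        # made-up vectors
grandfather = np.array([0.8, 0.5, 0.1])
babysitter = np.array([0.0, 0.9, 0.3])          # already neutralized (orthogonal to g)

# Before: babysitter is closer to grandmother (~0.64) than to grandfather (~0.92).
gm_eq, gf_eq = equalize(grandmother, grandfather, g)
# After: gm_eq ~ [-0.7, 0.6, 0.15], gf_eq ~ [0.7, 0.6, 0.15], equally far from babysitter.
```

The equal distances follow from orthogonality: for any neutralized vector v, the squared distance to either output is ||mu_orth - v||^2 + b^2, which is the same for both words.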

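Finally, it is important to know which words should be neutralized. In general, most English words, such as occupations and identity terms, should be neutralized to remove the influence of the gender direction from their embedding vectors.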
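Putting the pieces together, a debiasing pass over a vocabulary neutralizes every word except the definitional ones whose gender is part of their meaning. In this sketch the helper names, the toy vectors, and the hand-written `definitional` set are all hypothetical; in practice that set must be chosen carefully, since it decides which words keep their gender component.

```python
import numpy as np

def neutralize(e, g):
    """Remove the component of e along the bias direction g."""
    return e - (np.dot(e, g) / np.dot(g, g)) * g

def debias_vocabulary(embeddings, g, definitional):
    """Neutralize every word that is not in the definitional set."""
    return {
        word: vec if word in definitional else neutralize(vec, g)
        for word, vec in embeddings.items()
    }

g = np.array([1.0, 0.0, 0.0])  # toy bias direction
emb = {
    "doctor":      np.array([0.3, 0.8, 0.4]),
    "nurse":       np.array([-0.4, 0.7, 0.5]),
    "grandmother": np.array([-0.6, 0.7, 0.2]),
    "grandfather": np.array([0.8, 0.5, 0.1]),
}
definitional = {"grandmother", "grandfather"}  # hand-picked, hypothetical list
debiased = debias_vocabulary(emb, g, definitional)
# Occupations lose their gender coordinate; definitional words are left for equalize.
```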

