# 3.8 Anchor Boxes

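One problem with the object detection approach so far is that each grid cell can detect only one object. To let a single grid cell detect multiple objects, you can use **anchor boxes**.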

[![](https://github.com/fengdu78/deeplearning_ai_books/raw/master/images/49b7d68a17e89dd109f96efecc223f5a.png)](https://github.com/fengdu78/deeplearning_ai_books/blob/master/images/49b7d68a17e89dd109f96efecc223f5a.png)

> The midpoints of the pedestrian and the car fall into the same grid cell.

[![](https://github.com/fengdu78/deeplearning_ai_books/raw/master/images/e001f5f3d2afa76a1c3710bd60bcad00.png)](https://github.com/fengdu78/deeplearning_ai_books/blob/master/images/e001f5f3d2afa76a1c3710bd60bcad00.png)

The idea behind **anchor boxes** is to predefine two **anchor boxes** of different shapes and associate the predictions with these two **anchor boxes**.

[![](https://github.com/fengdu78/deeplearning_ai_books/raw/master/images/2e357b5b92122660c550dcfb0901519c.png)](https://github.com/fengdu78/deeplearning_ai_books/blob/master/images/2e357b5b92122660c550dcfb0901519c.png)

Define the label vector:

$$
y = \begin{bmatrix} p_{c} & b_{x} & b_{y} & b_{h} & b_{w} & c_{1} & c_{2} & c_{3} & p_{c} & b_{x} & b_{y} & b_{h} & b_{w} & c_{1} & c_{2} & c_{3} \end{bmatrix}^{T}
$$

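The first 8 entries, $$p_{c},b_{x},b_{y},b_{h},b_{w},c_{1},c_{2},c_{3}$$ (marked with the green box), are the parameters associated with **anchor box 1**. The last 8 entries (marked with the orange box) are associated with **anchor box 2**.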

Pedestrian (assigned to **anchor box 1**): $$p_{c} = 1,b_{x},b_{y},b_{h},b_{w},c_{1} = 1,c_{2} = 0,c_{3} = 0$$

The car's bounding box is more similar to **anchor box 2**, so it is encoded in the last 8 entries: $$p_{c} = 1,b_{x},b_{y},b_{h},b_{w},c_{1} = 0,c_{2} = 1,c_{3} = 0$$

[![](https://github.com/fengdu78/deeplearning_ai_books/raw/master/images/e94aa7ea75300ea4692682b179834bb4.png)](https://github.com/fengdu78/deeplearning_ai_books/blob/master/images/e94aa7ea75300ea4692682b179834bb4.png)

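Each object is now assigned to the grid cell that contains its midpoint and, within that cell, to the **anchor box** whose shape has the highest intersection over union (IoU) with the object's shape. To decide, check which **anchor box** has the higher IoU with the ground-truth bounding box (number 1, the red box).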
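The assignment rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the anchor shapes and box sizes are made-up values, the helper names `shape_iou` and `best_anchor` are hypothetical, and the IoU is computed between shapes alone by placing both boxes on a common center, which is one common way to compare shapes.

```python
def shape_iou(a, b):
    """IoU of two (height, width) shapes placed on a common center.

    Assumption: only the shapes are compared, so the overlap is simply
    the smaller height times the smaller width.
    """
    inter = min(a[0], b[0]) * min(a[1], b[1])
    union = a[0] * a[1] + b[0] * b[1] - inter
    return inter / union


def best_anchor(box_hw, anchors):
    """Return the index of the anchor whose shape has the highest IoU with box_hw."""
    return max(range(len(anchors)), key=lambda i: shape_iou(box_hw, anchors[i]))


# Hypothetical anchor shapes as (height, width), relative to the grid cell:
# index 0 is tall and narrow, index 1 is short and wide.
anchors = [(0.9, 0.3), (0.4, 0.9)]

pedestrian_hw = (0.8, 0.25)  # made-up ground-truth shape
car_hw = (0.35, 0.8)         # made-up ground-truth shape

pedestrian_anchor = best_anchor(pedestrian_hw, anchors)
car_anchor = best_anchor(car_hw, anchors)
```

With these made-up numbers the pedestrian is matched to the first anchor (index 0) and the car to the second (index 1), which mirrors the assignment described in the text.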

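Number 1 corresponds to a grid cell that contains both a car and a pedestrian; number 3 corresponds to a grid cell that contains only a car: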

[![](https://github.com/fengdu78/deeplearning_ai_books/raw/master/images/322b15fe615c739ebd1d36b669748618.png)](https://github.com/fengdu78/deeplearning_ai_books/blob/master/images/322b15fe615c739ebd1d36b669748618.png)
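As a rough sketch of the two cases, the 16-entry label $$y$$ for one grid cell can be built as follows. The class order ($$c_{1}$$ for pedestrian, $$c_{2}$$ for car) follows the text; the box coordinates are made-up values, `encode_slot` is a hypothetical helper, and the "don't care" entries of an empty anchor slot are filled with zeros purely as a placeholder.

```python
NUM_CLASSES = 3
SLOT_SIZE = 5 + NUM_CLASSES  # p_c, b_x, b_y, b_h, b_w, c_1, c_2, c_3


def encode_slot(box, class_index):
    """Return the 8 label entries for one anchor box.

    box is (b_x, b_y, b_h, b_w), or None if no object is assigned to this anchor.
    """
    if box is None:
        # p_c = 0; the other 7 entries are "don't care" (zeros used as placeholders).
        return [0.0] * SLOT_SIZE
    one_hot = [1.0 if i == class_index else 0.0 for i in range(NUM_CLASSES)]
    return [1.0, *box, *one_hot]


# Made-up boxes as (b_x, b_y, b_h, b_w).
pedestrian_box = (0.5, 0.6, 0.8, 0.25)
car_box = (0.5, 0.7, 0.35, 0.8)

# Cell with both objects: pedestrian in the anchor box 1 slot, car in the anchor box 2 slot.
y_both = encode_slot(pedestrian_box, 0) + encode_slot(car_box, 1)

# Cell with only a car: the anchor box 1 slot is empty, the car goes in the anchor box 2 slot.
y_car_only = encode_slot(None, 0) + encode_slot(car_box, 1)
```

In `y_car_only` the first entry ($$p_{c}$$ of anchor box 1) is 0, while the second half of the vector is identical to that of `y_both`.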

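**Anchor boxes** are meant to handle the case where two objects fall into the same grid cell. In practice this rarely happens, especially when a 19×19 grid is used.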

How to choose **anchor boxes**:

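* Usually the **anchor box** shapes are specified by hand. You can choose 5 to 10 **anchor box** shapes that cover the variety of object shapes you want to detect.
* A more advanced option is to use the **k-means algorithm** to cluster the shapes of the object types to be detected and pick the most representative set of **anchor boxes**.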
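The k-means option could look roughly like the sketch below. It is a minimal plain-Python version under stated assumptions: the training shapes are made-up (height, width) pairs, `kmeans_shapes` is a hypothetical helper, the centers are naively initialised from the first `k` shapes, and plain Euclidean distance is used (other distance measures, such as one based on IoU, are also possible).

```python
def kmeans_shapes(shapes, k, iters=20):
    """Cluster (height, width) pairs with basic k-means and return the k centers."""
    centers = list(shapes[:k])  # naive initialisation: the first k shapes
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        # Assignment step: put each shape in the cluster of its nearest center.
        for h, w in shapes:
            nearest = min(
                range(k),
                key=lambda i: (h - centers[i][0]) ** 2 + (w - centers[i][1]) ** 2,
            )
            clusters[nearest].append((h, w))
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center).
        centers = [
            (sum(s[0] for s in c) / len(c), sum(s[1] for s in c) / len(c))
            if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers


# Made-up ground-truth shapes: three tall (pedestrian-like), three wide (car-like).
shapes = [
    (0.9, 0.3), (0.4, 0.9), (0.8, 0.25),
    (0.35, 0.8), (0.85, 0.35), (0.45, 0.85),
]

anchor_shapes = kmeans_shapes(shapes, k=2)
```

On this toy data the two centers settle near one tall, narrow shape and one short, wide shape, which would then serve as the two **anchor box** shapes.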

