# 3.1 Object Localization

[![](https://github.com/fengdu78/deeplearning_ai_books/raw/master/images/0107af10b33fcb955cc3c588dfb78d49.png)](https://github.com/fengdu78/deeplearning_ai_books/blob/master/images/0107af10b33fcb955cc3c588dfb78d49.png)

Classification with localization: the algorithm must not only determine whether the image contains a car, but also mark the car's position in the image, drawing a bounding box (e.g. a red rectangle) around it. "Localization" means finding where exactly in the image the car is.

Classification with localization usually assumes a single, relatively large object near the center of the image, which is to be recognized and localized. In object detection, by contrast, an image may contain several objects, possibly of multiple different classes.

[![](https://github.com/fengdu78/deeplearning_ai_books/raw/master/images/d4a47c2041807f891c0a606d246330c5.png)](https://github.com/fengdu78/deeplearning_ai_books/blob/master/images/d4a47c2041807f891c0a606d246330c5.png)

When building a self-driving car system, the object classes might include: pedestrian, car, motorcycle, and background.

[![](https://github.com/fengdu78/deeplearning_ai_books/raw/master/images/6461ff27c00dff4205688de4cf9d8803.png)](https://github.com/fengdu78/deeplearning_ai_books/blob/master/images/6461ff27c00dff4205688de4cf9d8803.png)

To localize the car in the image, have the neural network output a bounding box, labeled $$b_{x}$$, $$b_{y}$$, $$b_{h}$$, and $$b_{w}$$: a parameterized representation of the detected object's bounding box.

The center of the red box is $$(b_{x}, b_{y})$$, its height is $$b_{h}$$, and its width is $$b_{w}$$. The training set must therefore contain not only the class label the network is to predict, but also these four numbers describing the bounding box. A supervised learning algorithm then outputs a class label plus the four parameter values, giving the position of the detected object's box.
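The four numbers are typically expressed relative to the image dimensions, so each lies in $$[0, 1]$$. A minimal sketch of this encoding (the helper name and pixel-space conventions are assumptions for illustration, not from the course):

```python
def encode_bbox(cx_px, cy_px, h_px, w_px, img_h, img_w):
    """Convert a pixel-space box (center, height, width) into the
    relative parameters b_x, b_y, b_h, b_w, each in [0, 1]."""
    b_x = cx_px / img_w   # center x, as a fraction of image width
    b_y = cy_px / img_h   # center y, as a fraction of image height
    b_h = h_px / img_h    # box height, as a fraction of image height
    b_w = w_px / img_w    # box width, as a fraction of image width
    return b_x, b_y, b_h, b_w
```

For example, a box centered at pixel (50, 70) with height 30 and width 40 in a 100x100 image encodes to `(0.5, 0.7, 0.3, 0.4)`.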

[![](https://github.com/fengdu78/deeplearning_ai_books/raw/master/images/21b37dcb413e7c86464f88484796420c.png)](https://github.com/fengdu78/deeplearning_ai_books/blob/master/images/21b37dcb413e7c86464f88484796420c.png)

How to define the target label $$y$$ for this supervised learning task:

[![](https://github.com/fengdu78/deeplearning_ai_books/raw/master/images/02d85ab36285cd21b5df4d1c253df57e.png)](https://github.com/fengdu78/deeplearning_ai_books/blob/master/images/02d85ab36285cd21b5df4d1c253df57e.png)

The target label $$y$$ is defined as: $$y = \begin{bmatrix} p_{c} \\ b_{x} \\ b_{y} \\ b_{h} \\ b_{w} \\ c_{1} \\ c_{2} \\ c_{3} \end{bmatrix}$$

$$p_{c}$$ indicates whether the image contains an object: if the object belongs to one of the first three classes (pedestrian, car, motorcycle), then $$p_{c} = 1$$; if it is background, then $$p_{c} = 0$$. You can think of $$p_{c}$$ as the probability that the image contains an object of one of the classes, background excluded.

If an object is detected ($$p_{c} = 1$$), output its bounding-box parameters $$b_{x}$$, $$b_{y}$$, $$b_{h}$$, and $$b_{w}$$, and also output $$c_{1}$$, $$c_{2}$$, and $$c_{3}$$, indicating whether the object is a pedestrian, a car, or a motorcycle.

[![](https://github.com/fengdu78/deeplearning_ai_books/raw/master/images/fd37e4750b64a07cc1f29880c9b97261.png)](https://github.com/fengdu78/deeplearning_ai_books/blob/master/images/fd37e4750b64a07cc1f29880c9b97261.png)

If the image contains no object to detect:

[![](https://github.com/fengdu78/deeplearning_ai_books/raw/master/images/131239883224f03709ddc66d9481c3c7.png)](https://github.com/fengdu78/deeplearning_ai_books/blob/master/images/131239883224f03709ddc66d9481c3c7.png)

$$p_{c} = 0$$, and the remaining components of $$y$$ are written as question marks, meaning "don't care" parameters.
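Putting the two cases together, a training label could be built as follows (a sketch; the function name and the use of 0.0 as the "don't care" placeholder are assumptions, since those components are ignored by the loss anyway):

```python
import numpy as np

def make_label(has_object, bbox=None, klass=None):
    """Build the 8-dim target y = [p_c, b_x, b_y, b_h, b_w, c1, c2, c3].
    klass: 0 = pedestrian, 1 = car, 2 = motorcycle."""
    y = np.zeros(8)
    if has_object:
        y[0] = 1.0              # p_c = 1
        y[1:5] = bbox           # b_x, b_y, b_h, b_w
        y[5 + klass] = 1.0      # one-hot class indicator
    # else: p_c = 0 and the remaining components are "don't care"
    return y

car = make_label(True, bbox=[0.5, 0.7, 0.3, 0.4], klass=1)  # a car
background = make_label(False)                               # no object
```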

If a squared-error criterion is adopted, the network's loss function is:

$$
L\left(\hat{y}, y\right) = \left(\hat{y}_{1} - y_{1}\right)^{2} + \left(\hat{y}_{2} - y_{2}\right)^{2} + \ldots + \left(\hat{y}_{8} - y_{8}\right)^{2}
$$

The loss is the sum of the squared differences of the corresponding components.

[![](https://github.com/fengdu78/deeplearning_ai_books/raw/master/images/d50ae3ee809da4c728837fee2d055f00.png)](https://github.com/fengdu78/deeplearning_ai_books/blob/master/images/d50ae3ee809da4c728837fee2d055f00.png)

If the image contains an object to localize, then $$y_{1} = p_{c} = 1$$ and the loss is the sum of squared differences over all components.

If $$y_{1} = p_{c} = 0$$, the loss is just $$\left(\hat{y}_{1} - y_{1}\right)^{2}$$: only the accuracy of the network's $$p_{c}$$ output matters.
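The two cases can be implemented as a single conditional squared-error loss. A minimal sketch for one example (a simplification, not the course's exact implementation):

```python
import numpy as np

def localization_loss(y_hat, y):
    """Squared-error loss for y = [p_c, b_x, b_y, b_h, b_w, c1, c2, c3].
    If p_c = 1, sum squared differences over all 8 components;
    if p_c = 0, only the p_c component contributes."""
    if y[0] == 1:
        return float(np.sum((y_hat - y) ** 2))
    return float((y_hat[0] - y[0]) ** 2)
```

With $$p_c = 0$$, the other seven components of $$\hat{y}$$ never enter the loss, which is exactly why the "don't care" question marks in the label are harmless.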

Squared error is used here only to simplify the exposition. In practice, you could apply a log-likelihood loss to $$c_{1}$$, $$c_{2}$$, $$c_{3}$$ with a **softmax** output over the classes, squared error to the bounding-box coordinates, and a logistic-regression loss to $$p_{c}$$ (although plain squared error would also work reasonably well).

