# 3.3 Object detection

Object detection with a convolutional network can be built on the sliding windows detection algorithm.

[![](https://github.com/fengdu78/deeplearning_ai_books/raw/master/images/2f4e567978bb62fcbec093887de37783.png)](https://github.com/fengdu78/deeplearning_ai_books/blob/master/images/2f4e567978bb62fcbec093887de37783.png)

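Steps for building a car detection algorithm:

1. First create a labeled training set, where $$x$$ is a closely cropped image and $$y$$ is its label. To start, use closely cropped examples in which the car is centered and fills almost the entire image $$x$$.
2. Train a convolutional network on these closely cropped images (labeled 6 in the figure). The network outputs $$y$$: 1 if the image contains a car, 0 if it does not.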

Once the network is trained, use it for sliding windows detection as follows:

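1. Pick a window of a specific size and feed the small region inside the red box into the convolutional network, which predicts whether that region contains a car.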

![](https://github.com/fengdu78/deeplearning_ai_books/raw/master/images/2ac2ab6dcdcc0fe26a9833ff9da49bd2.png)

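2. The algorithm then processes a second region: the red box shifted slightly to the right. That region is fed into the network, which runs again. It then processes a third region, and so on, until the window has passed over every part of the image.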

![](https://github.com/fengdu78/deeplearning_ai_books/raw/master/images/c55f22f302899d5f9d77bef958465660.png)

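The idea is to move the window with a fixed stride across every region of the image, feed each cropped patch into the convolutional network, and classify each position as 0 or 1.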

3. Repeat the process with a larger window: crop a larger region, feed it into the convolutional network, and output 0 or 1.

![](https://github.com/fengdu78/deeplearning_ai_books/raw/master/images/34507c03fbda16049faeb3caf075fe50.png)

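4. Slide this larger window across the whole image with some fixed stride, repeating the steps above and recording the output at each position.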

[![](https://github.com/fengdu78/deeplearning_ai_books/raw/master/images/f2b6d5bfedc5298160bc2628544e315c.png)](https://github.com/fengdu78/deeplearning_ai_books/blob/master/images/f2b6d5bfedc5298160bc2628544e315c.png)

5. Repeat a third time with an even larger window.

[![](https://github.com/fengdu78/deeplearning_ai_books/raw/master/images/c14524aa0534ed78c433e1cd0a8dff50.png)](https://github.com/fengdu78/deeplearning_ai_books/blob/master/images/c14524aa0534ed78c433e1cd0a8dff50.png)

This way, wherever the car is in the image, some window should be able to detect it.

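This algorithm is called sliding windows detection: square windows are slid across the entire image with some stride, and each square region is classified as containing a car or not.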
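The procedure above can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the course's implementation: the function names `crop`, `sliding_window_detect`, and `toy_classifier` are hypothetical, the image is a small grid of 0/1 values, and `toy_classifier` is a stand-in for the trained convolutional network.

```python
def crop(image, top, left, size):
    """Cut a size x size square patch out of a 2-D list of pixels."""
    return [row[left:left + size] for row in image[top:top + size]]


def sliding_window_detect(image, classify, window_sizes, stride):
    """Slide square windows of several sizes over the image.

    Returns (top, left, size) for every window the classifier labels 1.
    """
    height, width = len(image), len(image[0])
    detections = []
    for size in window_sizes:                      # one pass per window size
        for top in range(0, height - size + 1, stride):
            for left in range(0, width - size + 1, stride):
                patch = crop(image, top, left, size)
                # A real pipeline would also resize the patch to the
                # network's fixed input size before classifying it.
                if classify(patch) == 1:
                    detections.append((top, left, size))
    return detections


def toy_classifier(patch):
    """Hypothetical stand-in for the ConvNet: 1 if over 80% of pixels are 1."""
    pixel_count = sum(len(row) for row in patch)
    return 1 if sum(map(sum, patch)) / pixel_count > 0.8 else 0


# Toy 8x8 image with a 4x4 "car" occupying rows 2-5 and columns 2-5.
image = [[1 if 2 <= r <= 5 and 2 <= c <= 5 else 0 for c in range(8)]
         for r in range(8)]

detections = sliding_window_detect(image, toy_classifier,
                                   window_sizes=(2, 4), stride=2)
print(detections)
```

Each window size triggers a full pass over the image, and every window position costs one classifier call, which is where the cost of this approach comes from.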

[![](https://github.com/fengdu78/deeplearning_ai_books/raw/master/images/ef8afff4e50fc1c50a46b8443f1d6976.png)](https://github.com/fengdu78/deeplearning_ai_books/blob/master/images/ef8afff4e50fc1c50a46b8443f1d6976.png)

Drawback of sliding windows detection: **computational cost**

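* A large stride reduces the number of windows fed into the convolutional network, but the coarser granularity may hurt detection performance
* A fine granularity or small stride produces a very large number of windows to pass through the convolutional network, which means a very high computational cost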
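To get a feel for this trade-off, the number of windows for a single window size can be counted directly. The sketch below is illustrative only: the helper `count_windows`, the 512 x 512 image, and the 64 x 64 window are assumptions chosen for the example, not values from the course.

```python
def count_windows(image_h, image_w, window, stride):
    """Number of square windows of one size that fit with the given stride."""
    rows = (image_h - window) // stride + 1
    cols = (image_w - window) // stride + 1
    return rows * cols


# Assumed sizes for illustration: a 512x512 image and a 64x64 window.
counts = {stride: count_windows(512, 512, 64, stride)
          for stride in (64, 16, 4, 1)}

for stride, n in counts.items():
    print(f"stride {stride:>2}: {n} windows")
# stride 64: 64 windows
# stride 16: 841 windows
# stride  4: 12769 windows
# stride  1: 201601 windows
```

Every window is a separate forward pass through the network, and the count is multiplied again by the number of window sizes used, so shrinking the stride quickly becomes expensive.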

