DeepLearning.ai Deep Learning Course Notes

TensorFlow Tutorial

Welcome to this week's programming assignment. Until now, you've always used numpy to build neural networks. Now we will step you through a deep learning framework that will allow you to build neural networks more easily. Machine learning frameworks like TensorFlow, PaddlePaddle, Torch, Caffe, Keras, and many others can speed up your machine learning development significantly. All of these frameworks also have a lot of documentation, which you should feel free to read. In this assignment, you will learn to do the following in TensorFlow:

  • Initialize variables

  • Start your own session

  • Train algorithms

  • Implement a Neural Network

Programming frameworks can not only shorten your coding time, but can sometimes also perform optimizations that speed up your code.

1 - Exploring the Tensorflow Library

To start, you will import the library:

import math
import numpy as np
import h5py
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.python.framework import ops
from tf_utils import load_dataset, random_mini_batches, convert_to_one_hot, predict


%matplotlib inline
np.random.seed(1)

Now that you have imported the library, we will walk you through its different applications. You will start with an example, where we compute for you the loss of one training example.

$$loss = \mathcal{L}(\hat{y}, y) = (\hat{y}^{(i)} - y^{(i)})^2 \tag{1}$$

y_hat = tf.constant(36, name='y_hat') # Define y_hat constant. Set to 36.
y = tf.constant(39, name='y') # Define y. Set to 39


loss = tf.Variable((y - y_hat)**2, name='loss') # Create a variable for the loss


init = tf.global_variables_initializer() # When init is run later (session.run(init)),
# the loss variable will be initialized and ready to be computed
with tf.Session() as session: # Create a session and print the output
    session.run(init) # Initializes the variables
    print(session.run(loss)) # Prints the loss

"""
9
"""

Writing and running programs in TensorFlow has the following steps:

  1. Create Tensors (variables) that are not yet executed/evaluated.

  2. Write operations between those Tensors.

  3. Initialize your Tensors.

  4. Create a Session.

  5. Run the Session. This will run the operations you'd written above.

Therefore, when we created a variable for the loss, we simply defined the loss as a function of other quantities, but did not evaluate its value. To evaluate it, we had to create the initializer with init = tf.global_variables_initializer() and run it with session.run(init). That initialized the loss variable, and in the last line we were finally able to evaluate the value of loss and print it.

Now let us look at an easy example. Run the cell below:

a = tf.constant(2)
b = tf.constant(10)
c = tf.multiply(a,b)
print(c)

"""
Tensor("Mul:0", shape=(), dtype=int32)
"""

As expected, you will not see 20! You got a Tensor object describing the result: its shape is () (a scalar) and its type is "int32". All you did was add the operation to the 'computation graph'; you have not run this computation yet. In order to actually multiply the two numbers, you will have to create a session and run it.

sess = tf.Session()
print(sess.run(c))

"""
20
"""

Great! To summarize, remember to initialize your variables, create a session and run the operations inside the session.
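
The gap between building an operation and running it can be imitated in plain Python. The sketch below is only an analogy: the Const and Mul classes are hypothetical and are not how TensorFlow is implemented. It shows why printing c gives a description of the operation, while the value appears only once something explicitly runs it.

```python
# Toy illustration of deferred evaluation. `Const` and `Mul` are
# hypothetical names, not part of TensorFlow.
class Const:
    def __init__(self, value):
        self.value = value

    def run(self):
        return self.value


class Mul:
    def __init__(self, left, right):
        # Building the node only records its inputs; nothing is multiplied yet.
        self.left, self.right = left, right

    def __repr__(self):
        return "Mul(<not evaluated>)"

    def run(self):
        # Evaluation happens only when run() is called, like sess.run(c).
        return self.left.run() * self.right.run()


a = Const(2)
b = Const(10)
c = Mul(a, b)
print(c)        # a description of the operation, not a number
print(c.run())  # 20
```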

Next, you'll also have to know about placeholders. A placeholder is an object whose value you can specify only later. To specify values for a placeholder, you can pass in values by using a "feed dictionary" (feed_dict variable). Below, we created a placeholder for x. This allows us to pass in a number later when we run the session.

# Change the value of x in the feed_dict


x = tf.placeholder(tf.int64, name = 'x')
print(sess.run(2 * x, feed_dict = {x: 3}))
sess.close()

"""
6
"""

When you first defined x you did not have to specify a value for it. A placeholder is simply a variable that you will assign data to only later, when running the session. We say that you feed data to these placeholders when running the session.

Here's what's happening: When you specify the operations needed for a computation, you are telling TensorFlow how to construct a computation graph. The computation graph can have some placeholders whose values you will specify only later. Finally, when you run the session, you are telling TensorFlow to execute the computation graph.
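
As a rough analogy in plain Python (the Placeholder class and the double helper are made up for illustration and are not TensorFlow APIs), a graph with a placeholder behaves like a function that is defined once and receives its input only when it is run:

```python
# Toy stand-in for a placeholder and a feed dictionary (hypothetical names).
class Placeholder:
    def __init__(self, name):
        self.name = name


def double(node):
    # Returns the "graph" for 2 * node: a function of the feed dictionary.
    return lambda feed: 2 * feed[node]


x = Placeholder("x")
graph = double(x)        # nothing is computed here
print(graph({x: 3}))     # 6
print(graph({x: 5}))     # 10
```

The same graph is run twice with different fed values, which is the role placeholders play when training data is passed in later.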

1.1 - Linear function

Let's start this programming exercise by computing the following equation: $Y = WX + b$, where $W$ and $X$ are random matrices and $b$ is a random vector.

Exercise: Compute $WX + b$ where $W$, $X$, and $b$ are drawn from a random normal distribution. W is of shape (4, 3), X is (3,1) and b is (4,1). As an example, here is how you would define a constant X that has shape (3,1):

X = tf.constant(np.random.randn(3,1), name = "X")

You might find the following functions helpful:

  • tf.matmul(..., ...) to do a matrix multiplication

  • tf.add(..., ...) to do an addition

  • np.random.randn(...) to initialize randomly

# GRADED FUNCTION: linear_function

def linear_function():
    """
    Implements a linear function: 
            Initializes W to be a random tensor of shape (4,3)
            Initializes X to be a random tensor of shape (3,1)
            Initializes b to be a random tensor of shape (4,1)
    Returns: 
    result -- runs the session for Y = WX + b 
    """

    np.random.seed(1)

    ### START CODE HERE ### (4 lines of code)
    X = tf.constant(np.random.randn(3, 1), name = 'X')
    W = tf.constant(np.random.randn(4, 3), name = 'W')
    b = tf.constant(np.random.randn(4, 1), name = 'b')
    Y = tf.add(tf.matmul(W, X), b)
    ### END CODE HERE ### 

    # Create the session using tf.Session() and run it with sess.run(...) on the variable you want to calculate

    ### START CODE HERE ###
    sess = tf.Session()
    result = sess.run(Y)
    ### END CODE HERE ### 

    # close the session 
    sess.close()

    return result

print( "result = " + str(linear_function()))

"""
result = [[-2.15657382]
          [ 2.95891446]
          [-1.08926781]
          [-0.84538042]]
"""
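
For comparison, the same computation can be sketched directly in NumPy. This assumes the values are drawn in the same order as in the solution above (X, then W, then b). Only the output shape is checked here, since the numbers depend on the random generator.

```python
import numpy as np

np.random.seed(1)
# Draw X, W, b in the same order as linear_function() above.
X = np.random.randn(3, 1)
W = np.random.randn(4, 3)
b = np.random.randn(4, 1)

# (4, 3) times (3, 1) gives (4, 1); adding b keeps the shape (4, 1).
Y = np.dot(W, X) + b
print(Y.shape)
```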

1.2 - Computing the sigmoid

Great! You just implemented a linear function. TensorFlow offers a variety of commonly used neural network functions like tf.sigmoid and tf.softmax. For this exercise let's compute the sigmoid function of an input.

You will do this exercise using a placeholder variable x. When running the session, you should use the feed dictionary to pass in the input z. In this exercise, you will have to (i) create a placeholder x, (ii) define the operations needed to compute the sigmoid using tf.sigmoid, and then (iii) run the session.

Exercise: Implement the sigmoid function below. You should use the following:

  • tf.placeholder(tf.float32, name = "...")

  • tf.sigmoid(...)

  • sess.run(..., feed_dict = {x: z})

Note that there are two typical ways to create and use sessions in tensorflow:

Method 1:

sess = tf.Session()
# Run the variables initialization (if needed), run the operations
result = sess.run(..., feed_dict = {...})
sess.close() # Close the session

Method 2:

with tf.Session() as sess:
    # run the variables initialization (if needed), run the operations
    result = sess.run(..., feed_dict = {...})
    # This takes care of closing the session for you :)
# GRADED FUNCTION: sigmoid

def sigmoid(z):
    """
    Computes the sigmoid of z

    Arguments:
    z -- input value, scalar or vector

    Returns: 
    results -- the sigmoid of z
    """

    ### START CODE HERE ### ( approx. 4 lines of code)
    # Create a placeholder for x. Name it 'x'.
    x = tf.placeholder(tf.float32, name = 'x')

    # compute sigmoid(x)
    sigmoid = tf.sigmoid(x)

    # Create a session, and run it. Please use the method 2 explained above. 
    # You should use a feed_dict to pass z's value to x. 
    with tf.Session() as session:
        # Run session and call the output "result"
        result = session.run(sigmoid, feed_dict = {x:z})

    ### END CODE HERE ###

    return result

print ("sigmoid(0) = " + str(sigmoid(0)))
print ("sigmoid(12) = " + str(sigmoid(12)))

"""
sigmoid(0) = 0.5
sigmoid(12) = 0.999994
"""
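
As a sanity check, the sigmoid $\sigma(z) = 1 / (1 + e^{-z})$ can also be evaluated with the Python standard library. The helper name sigmoid_reference is hypothetical, and this sketch handles scalars only.

```python
import math


def sigmoid_reference(z):
    """Plain-Python sigmoid, 1 / (1 + e^{-z}), for a scalar z."""
    return 1.0 / (1.0 + math.exp(-z))


print(sigmoid_reference(0))              # 0.5
print(round(sigmoid_reference(12), 6))   # 0.999994
```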

To summarize, you now know how to:

  1. Create placeholders.

  2. Specify the computation graph corresponding to the operations you want to compute.

  3. Create the session.

  4. Run the session, using a feed dictionary if necessary to specify the values of placeholder variables.

1.3 - Computing the Cost

You can also use a built-in function to compute the cost of your neural network. So instead of needing to write code to compute this as a function of $a^{[2](i)}$ and $y^{(i)}$ for $i = 1 \dots m$:

$$J = - \frac{1}{m} \sum_{i = 1}^m \left( y^{(i)} \log a^{[2](i)} + (1-y^{(i)}) \log (1-a^{[2](i)}) \right) \tag{2}$$

you can do it in one line of code in tensorflow!

Exercise: Implement the cross entropy loss. The function you will use is:

  • tf.nn.sigmoid_cross_entropy_with_logits(logits = ..., labels = ...)

Your code should input z, compute the sigmoid (to get a) and then compute the cross entropy cost $J$. All this can be done using one call to tf.nn.sigmoid_cross_entropy_with_logits, which returns the per-example loss $-\left( y^{(i)} \log \sigma(z^{[2](i)}) + (1-y^{(i)}) \log (1-\sigma(z^{[2](i)})) \right)$ for each $i$, that is, the terms of the sum in (2) with $a^{[2](i)} = \sigma(z^{[2](i)})$. The function does not average over the $m$ examples; taking the mean of its output gives $J$.
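
The per-example losses can be reproduced without TensorFlow. The sketch below uses the overflow-safe form max(z, 0) - z*y + log(1 + exp(-|z|)), which is algebraically equal to the cross entropy with a = sigmoid(z). The function names are illustrative. The inputs mirror the test call used for this exercise, where the values passed as logits are themselves sigmoid outputs.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_cross_entropy(z, y):
    """Per-example loss -[y*log(sigmoid(z)) + (1-y)*log(1-sigmoid(z))],
    written in an equivalent form that avoids overflow for large |z|."""
    return max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))


logits = [sigmoid(v) for v in (0.2, 0.4, 0.7, 0.9)]
labels = [0, 0, 1, 1]

losses = [sigmoid_cross_entropy(z, y) for z, y in zip(logits, labels)]
print(losses)                      # one loss per example
print(sum(losses) / len(losses))   # their mean, i.e. J in formula (2)
```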

# GRADED FUNCTION: cost

def cost(logits, labels):
    """
    Computes the cost using the sigmoid cross entropy

    Arguments:
    logits -- vector containing z, output of the last linear unit (before the final sigmoid activation)
    labels -- vector of labels y (1 or 0) 

    Note: What we've been calling "z" and "y" in this class are respectively called "logits" and "labels" 
    in the TensorFlow documentation. So logits will feed into z, and labels into y. 

    Returns:
    cost -- the per-example cross-entropy losses, evaluated by running the session (the terms averaged in formula (2))
    """

    ### START CODE HERE ### 

    # Create the placeholders for "logits" (z) and "labels" (y) (approx. 2 lines)
    z = tf.placeholder(tf.float32, name = 'z')
    y = tf.placeholder(tf.float32, name = 'y')

    # Use the loss function (approx. 1 line)
    cost = tf.nn.sigmoid_cross_entropy_with_logits(logits = z, labels = y)

    # Create a session (approx. 1 line). See method 1 above.
    sess = tf.Session()

    # Run the session (approx. 1 line).
    cost = sess.run(cost,feed_dict = {z:logits, y:labels})

    # Close the session (approx. 1 line). See method 1 above.
    sess.close()

    ### END CODE HERE ###

    return cost
logits = sigmoid(np.array([0.2,0.4,0.7,0.9]))
cost = cost(logits, np.array([0,0,1,1]))
print ("cost = " + str(cost))

"""
cost = [ 1.00538719 1.03664088 0.41385433 0.39956614]
"""

1.4 - Using One Hot encodings

Many times in deep learning you will have a y vector with numbers ranging from 0 to C-1, where C is the number of classes. If C is for example 4, then you might have the following y vector which you will need to convert as follows:

This is called a "one hot" encoding, because in the converted representation exactly one element of each column is "hot" (meaning set to 1). To do this conversion in numpy, you might have to write a few lines of code. In tensorflow, you can use one line of code:

  • tf.one_hot(labels, depth, axis)

Exercise: Implement the function below to take one vector of labels and the total number of classes $C$, and return the one hot encoding. Use tf.one_hot() to do this.

# GRADED FUNCTION: one_hot_matrix

def one_hot_matrix(labels, C):
    """
    Creates a matrix where the i-th row corresponds to the ith class number and the jth column
                     corresponds to the jth training example. So if example j had a label i. Then entry (i,j) 
                     will be 1. 

    Arguments:
    labels -- vector containing the labels 
    C -- number of classes, the depth of the one hot dimension

    Returns: 
    one_hot -- one hot matrix
    """

    ### START CODE HERE ###

    # Create a tf.constant equal to C (depth), name it 'C'. (approx. 1 line)
    C = tf.constant(C, name = 'C')

    # Use tf.one_hot, be careful with the axis (approx. 1 line)
    one_hot_matrix = tf.one_hot(indices = labels, depth = C, axis = 0)

    # Create the session (approx. 1 line)
    sess = tf.Session()

    # Run the session (approx. 1 line)
    one_hot = sess.run(one_hot_matrix)

    # Close the session (approx. 1 line). See method 1 above.
    sess.close()

    ### END CODE HERE ###

    return one_hot

labels = np.array([1, 2, 3, 0, 2, 1])
one_hot = one_hot_matrix(labels, C = 4)
print ("one_hot = " + str(one_hot))

"""
one_hot = [[ 0. 0. 0. 1. 0. 0.]
           [ 1. 0. 0. 0. 0. 1.]
           [ 0. 1. 0. 0. 1. 0.]
           [ 0. 0. 1. 0. 0. 0.]]
"""
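
For reference, the "few lines" of NumPy mentioned earlier might look like the sketch below. The helper name one_hot_numpy is hypothetical; it uses the same layout as axis = 0, with one row per class and one column per example.

```python
import numpy as np


def one_hot_numpy(labels, C):
    # np.eye(C)[labels] picks one identity row per label -> shape (m, C).
    # Transposing puts classes on rows and examples on columns.
    return np.eye(C)[labels].T


labels = np.array([1, 2, 3, 0, 2, 1])
encoded = one_hot_numpy(labels, C=4)
print(encoded)
```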

1.5 - Initialize with zeros and ones

Now you will learn how to initialize a vector of zeros and ones. The function you will be calling is tf.ones(). To initialize with zeros you could use tf.zeros() instead. These functions take in a shape and return an array of dimension shape full of zeros and ones respectively.

Exercise: Implement the function below to take in a shape and return an array of ones with that shape.

  • tf.ones(shape)

# GRADED FUNCTION: ones

def ones(shape):
    """
    Creates an array of ones of dimension shape

    Arguments:
    shape -- shape of the array you want to create

    Returns: 
    ones -- array containing only ones
    """

    ### START CODE HERE ###

    # Create "ones" tensor using tf.ones(...). (approx. 1 line)
    ones = tf.ones(shape)

    # Create the session (approx. 1 line)
    sess = tf.Session()

    # Run the session to compute 'ones' (approx. 1 line)
    ones = sess.run(ones)

    # Close the session (approx. 1 line). See method 1 above.
    sess.close()

    ### END CODE HERE ###
    return ones
print ("ones = " + str(ones([3])))

"""
ones = [ 1. 1. 1.]
"""

2 - Building your first neural network in tensorflow

In this part of the assignment you will build a neural network using tensorflow. Remember that there are two parts to implement a tensorflow model:

  • Create the computation graph

  • Run the graph

Let's delve into the problem you'd like to solve!

2.0 - Problem statement: SIGNS Dataset

One afternoon, with some friends we decided to teach our computers to decipher sign language. We spent a few hours taking pictures in front of a white wall and came up with the following dataset. It's now your job to build an algorithm that would facilitate communications from a speech-impaired person to someone who doesn't understand sign language.

  • Training set: 1080 pictures (64 by 64 pixels) of signs representing numbers from 0 to 5 (180 pictures per number).

  • Test set: 120 pictures (64 by 64 pixels) of signs representing numbers from 0 to 5 (20 pictures per number).

Note that this is a subset of the SIGNS dataset. The complete dataset contains many more signs.

Figure 1: SIGNS dataset

Run the following code to load the dataset.

# Loading the dataset
X_train_orig, Y_train_orig, X_test_orig, Y_test_orig, classes = load_dataset()

Change the index below and run the cell to visualize some examples in the dataset.

# Example of a picture
index = 0
plt.imshow(X_train_orig[index])
print ("y = " + str(np.squeeze(Y_train_orig[:, index])))

"""
y = 5
"""

As usual you flatten the image dataset, then normalize it by dividing by 255. On top of that, you will convert each label to a one-hot vector as shown in Figure 1. Run the cell below to do so.

# Flatten the training and test images
X_train_flatten = X_train_orig.reshape(X_train_orig.shape[0], -1).T
X_test_flatten = X_test_orig.reshape(X_test_orig.shape[0], -1).T
# Normalize image vectors
X_train = X_train_flatten/255.
X_test = X_test_flatten/255.
# Convert training and test labels to one hot matrices
Y_train = convert_to_one_hot(Y_train_orig, 6)
Y_test = convert_to_one_hot(Y_test_orig, 6)


print ("number of training examples = " + str(X_train.shape[1]))
print ("number of test examples = " + str(X_test.shape[1]))
print ("X_train shape: " + str(X_train.shape))
print ("Y_train shape: " + str(Y_train.shape))
print ("X_test shape: " + str(X_test.shape))
print ("Y_test shape: " + str(Y_test.shape))

"""
number of training examples = 1080
number of test examples = 120
X_train shape: (12288, 1080)
Y_train shape: (6, 1080)
X_test shape: (12288, 120)
Y_test shape: (6, 120)
"""

Note that 12288 comes from $64 \times 64 \times 3$. Each image is square, 64 by 64 pixels, and 3 is for the RGB colors. Please make sure all these shapes make sense to you before continuing.
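
The flattening step can be checked on a small made-up array. The images variable below is a hypothetical stand-in for X_train_orig, with 5 fake RGB images instead of 1080.

```python
import numpy as np

# Hypothetical stand-in for X_train_orig: 5 fake RGB images of 64 x 64 pixels.
images = np.zeros((5, 64, 64, 3))

# Keep the example axis, merge the rest, then transpose: one column per image.
flat = images.reshape(images.shape[0], -1).T
print(flat.shape)  # (12288, 5)
```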

Your goal is to build an algorithm capable of recognizing a sign with high accuracy. To do so, you are going to build a tensorflow model that is almost the same as one you have previously built in numpy for cat recognition (but now using a softmax output). It is a great occasion to compare your numpy implementation to the tensorflow one.

The model is LINEAR -> RELU -> LINEAR -> RELU -> LINEAR -> SOFTMAX. The SIGMOID output layer has been converted to a SOFTMAX. A SOFTMAX layer generalizes SIGMOID to when there are more than two classes.
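
One way to see that softmax generalizes sigmoid: with two classes and logits (z, 0), the first softmax probability equals sigmoid(z). The sketch below checks this numerically in plain Python; the function names and the value of z are illustrative.

```python
import math


def softmax(zs):
    m = max(zs)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


z = 1.3
two_class = softmax([z, 0.0])
# The two numbers printed below agree up to floating-point rounding.
print(two_class[0], sigmoid(z))
```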

2.1 - Create placeholders

Your first task is to create placeholders for X and Y. This will allow you to later pass your training data in when you run your session.

Exercise: Implement the function below to create the placeholders in tensorflow.

# GRADED FUNCTION: create_placeholders

def create_placeholders(n_x, n_y):
    """
    Creates the placeholders for the tensorflow session.

    Arguments:
    n_x -- scalar, size of an image vector (num_px * num_px * 3 = 64 * 64 * 3 = 12288)
    n_y -- scalar, number of classes (from 0 to 5, so -> 6)

    Returns:
    X -- placeholder for the data input, of shape [n_x, None] and dtype "float"
    Y -- placeholder for the input labels, of shape [n_y, None] and dtype "float"

    Tips:
    - You will use None because it lets us be flexible on the number of examples you will use for the placeholders.
      In fact, the number of examples during test/train is different.
    """

    ### START CODE HERE ### (approx. 2 lines)
    X = tf.placeholder(tf.float32, [n_x, None], name="X")
    Y = tf.placeholder(tf.float32, [n_y, None], name="Y")
    ### END CODE HERE ###

    return X, Y

X, Y = create_placeholders(12288, 6)
print("X = " + str(X))
print("Y = " + str(Y))

"""
X = Tensor("Placeholder_3:0", shape=(12288, ?), dtype=float32)
Y = Tensor("Placeholder_4:0", shape=(6, ?), dtype=float32)
"""

2.2 - Initializing the parameters

Your second task is to initialize the parameters in tensorflow.

Exercise: Implement the function below to initialize the parameters in tensorflow. You are going to use Xavier Initialization for weights and Zero Initialization for biases. The shapes are given below. As an example, to help you, for W1 and b1 you could use:

W1 = tf.get_variable("W1", [25,12288], initializer = tf.contrib.layers.xavier_initializer(seed = 1))
b1 = tf.get_variable("b1", [25,1], initializer = tf.zeros_initializer())

Please use seed = 1 to make sure your results match ours.

# GRADED FUNCTION: initialize_parameters

def initialize_parameters():
    """
    Initializes parameters to build a neural network with tensorflow. The shapes are:
                        W1 : [25, 12288]
                        b1 : [25, 1]
                        W2 : [12, 25]
                        b2 : [12, 1]
                        W3 : [6, 12]
                        b3 : [6, 1]

    Returns:
    parameters -- a dictionary of tensors containing W1, b1, W2, b2, W3, b3
    """

    tf.set_random_seed(1)                   # so that your "random" numbers match ours

    ### START CODE HERE ### (approx. 6 lines of code)
    W1 = tf.get_variable("W1", [25, 12288], initializer = tf.contrib.layers.xavier_initializer(seed=1))
    b1 = tf.get_variable("b1", [25, 1], initializer = tf.zeros_initializer())
    W2 = tf.get_variable("W2", [12, 25], initializer = tf.contrib.layers.xavier_initializer(seed=1))
    b2 = tf.get_variable("b2", [12, 1], initializer = tf.zeros_initializer())
    W3 = tf.get_variable("W3", [6, 12], initializer = tf.contrib.layers.xavier_initializer(seed=1))
    b3 = tf.get_variable("b3", [6, 1], initializer = tf.zeros_initializer())
    ### END CODE HERE ###

    parameters = {"W1": W1,
                  "b1": b1,
                  "W2": W2,
                  "b2": b2,
                  "W3": W3,
                  "b3": b3}

    return parameters
tf.reset_default_graph()
with tf.Session() as sess:
    parameters = initialize_parameters()
    print('W1 = ', parameters['W1'])
    print('b1 = ', parameters['b1'])
    print('W2 = ', parameters['W2'])
    print('b2 = ', parameters['b2'])
    print('W3 = ', parameters['W3'])
    print('b3 = ', parameters['b3'])
W1 =  <tf.Variable 'W1:0' shape=(25, 12288) dtype=float32_ref>
b1 =  <tf.Variable 'b1:0' shape=(25, 1) dtype=float32_ref>
W2 =  <tf.Variable 'W2:0' shape=(12, 25) dtype=float32_ref>
b2 =  <tf.Variable 'b2:0' shape=(12, 1) dtype=float32_ref>
W3 =  <tf.Variable 'W3:0' shape=(6, 12) dtype=float32_ref>
b3 =  <tf.Variable 'b3:0' shape=(6, 1) dtype=float32_ref>

As expected, the parameters haven't been evaluated yet.

2.3 - Forward propagation in tensorflow

You will now implement the forward propagation module in tensorflow. The function will take in a dictionary of parameters and it will complete the forward pass. The functions you will be using are:

  • tf.add(...,...) to do an addition

  • tf.matmul(...,...) to do a matrix multiplication

  • tf.nn.relu(...) to apply the ReLU activation

Question: Implement the forward pass of the neural network. The numpy equivalents are given in comments so that you can compare the tensorflow implementation to numpy. Note that the forward propagation stops at Z3: in tensorflow, the output of the last linear layer is passed directly to the function that computes the loss, so you don't need A3!

# GRADED FUNCTION: forward_propagation

def forward_propagation(X, parameters):
    """
    Implements the forward propagation for the model: LINEAR -> RELU -> LINEAR -> RELU -> LINEAR -> SOFTMAX

    Arguments:
    X -- input dataset placeholder, of shape (input size, number of examples)
    parameters -- python dictionary containing your parameters "W1", "b1", "W2", "b2", "W3", "b3"
                  the shapes are given in initialize_parameters

    Returns:
    Z3 -- the output of the last LINEAR unit
    """

    # Retrieve the parameters from the dictionary "parameters" 
    W1 = parameters['W1']
    b1 = parameters['b1']
    W2 = parameters['W2']
    b2 = parameters['b2']
    W3 = parameters['W3']
    b3 = parameters['b3']

    ### START CODE HERE ### (approx. 5 lines)              # Numpy Equivalents:
    Z1 = tf.add(tf.matmul(W1, X), b1)                      # Z1 = np.dot(W1, X) + b1
    A1 = tf.nn.relu(Z1)                                    # A1 = relu(Z1)
    Z2 = tf.add(tf.matmul(W2, A1), b2)                     # Z2 = np.dot(W2, A1) + b2
    A2 = tf.nn.relu(Z2)                                    # A2 = relu(Z2)
    Z3 = tf.add(tf.matmul(W3, A2), b3)                     # Z3 = np.dot(W3, A2) + b3
    ### END CODE HERE ###

    return Z3
tf.reset_default_graph()

with tf.Session() as sess:
    X, Y = create_placeholders(12288, 6)
    parameters = initialize_parameters()
    Z3 = forward_propagation(X, parameters)
    print("Z3 = " + str(Z3))

"""
Z3 = Tensor("Add_2:0", shape=(6, ?), dtype=float32)
"""
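The shape (6, ?) of Z3 follows directly from the weight shapes given in initialize_parameters. As a quick sanity check, the sketch below tracks only the shapes through the three LINEAR layers (ReLU and the bias addition leave the shape unchanged). The function names matmul_shape and forward_shapes are hypothetical, and m = 1080 is used in place of the unknown "?" dimension.

```python
def matmul_shape(left, right):
    """Shape of left @ right for 2-D shapes, or an error if they don't line up."""
    if left[1] != right[0]:
        raise ValueError("inner dimensions differ: %s vs %s" % (left, right))
    return (left[0], right[1])


def forward_shapes(n_x, m, weight_shapes):
    """Return the shape of Z1, Z2, ... for an input X of shape (n_x, m)."""
    shape = (n_x, m)
    shapes = []
    for w_shape in weight_shapes:
        # Z = W A + b: the bias of shape (rows, 1) broadcasts over the m columns.
        shape = matmul_shape(w_shape, shape)
        shapes.append(shape)
    return shapes


weight_shapes = [(25, 12288), (12, 25), (6, 12)]  # W1, W2, W3
z_shapes = forward_shapes(12288, 1080, weight_shapes)
print(z_shapes)  # [(25, 1080), (12, 1080), (6, 1080)]
```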

You may have noticed that the forward propagation doesn't output any cache. You will understand why below, when we get to backpropagation.

2.4 - Compute cost

As seen before, it is very easy to compute the cost using:

tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits = ..., labels = ...))

Question: Implement the cost function below.

  • It is important to know that the "logits" and "labels" inputs of tf.nn.softmax_cross_entropy_with_logits are expected to be of shape (number of examples, num_classes). We have thus transposed Z3 and Y for you.

  • Besides, tf.reduce_mean averages the per-example losses over the examples.
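To see what these two calls compute together, here is a minimal pure-Python sketch: a softmax cross-entropy on one row of logits (written with the usual log-sum-exp trick for numerical stability), followed by a mean over the examples. The function names are hypothetical and this is not tensorflow's implementation, only the same formula on rows of shape (num_classes,).

```python
import math


def softmax_cross_entropy(logits, labels):
    """Loss for one example: logits and one-hot labels are lists of equal length."""
    m = max(logits)
    # log(sum_k exp(z_k)), computed stably by factoring out the largest logit.
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    # -sum_k y_k * log(softmax(z)_k) = sum_k y_k * (log_sum_exp - z_k)
    return sum(y * (log_sum_exp - z) for z, y in zip(logits, labels))


def mean_cost(logits_rows, labels_rows):
    """Average of the per-example losses; rows are indexed by example."""
    losses = [softmax_cross_entropy(z, y) for z, y in zip(logits_rows, labels_rows)]
    return sum(losses) / len(losses)


# Two examples, three classes, laid out as (number of examples, num_classes).
logits_rows = [[2.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
labels_rows = [[1, 0, 0], [0, 1, 0]]
cost = mean_cost(logits_rows, labels_rows)
print(cost)
```

For the second example all logits are equal, so the softmax is uniform and the loss is log(3), whichever class is the true one.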

# GRADED FUNCTION: compute_cost 

def compute_cost(Z3, Y):
    """
    Computes the cost

    Arguments:
    Z3 -- output of forward propagation (output of the last LINEAR unit), of shape (6, number of examples)
    Y -- "true" labels vector placeholder, same shape as Z3

    Returns:
    cost -- Tensor of the cost function
    """

    # to fit the tensorflow requirement for tf.nn.softmax_cross_entropy_with_logits(...,...)
    logits = tf.transpose(Z3)
    labels = tf.transpose(Y)

    ### START CODE HERE ### (1 line of code)
    cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits = logits, labels = labels))
    ### END CODE HERE ###

    return cost
tf.reset_default_graph()

with tf.Session() as sess:
    X, Y = create_placeholders(12288, 6)
    parameters = initialize_parameters()
    Z3 = forward_propagation(X, parameters)
    cost = compute_cost(Z3, Y)
    print("cost = " + str(cost))

"""
cost = Tensor("Mean:0", shape=(), dtype=float32)
"""

2.5 - Backward propagation & parameter updates

This is where you become grateful to programming frameworks. All the backpropagation and the parameter updates are taken care of in 1 line of code. It is very easy to incorporate this line in the model.

After you compute the cost function, you will create an "optimizer" object. You have to call this object along with the cost when running the tf.Session. When called, it will perform an optimization on the given cost with the chosen method and learning rate.

For instance, for gradient descent the optimizer would be:

optimizer = tf.train.GradientDescentOptimizer(learning_rate = learning_rate).minimize(cost)

To make the optimization you would do:

_ , c = sess.run([optimizer, cost], feed_dict={X: minibatch_X, Y: minibatch_Y})

This computes the backpropagation by passing through the tensorflow graph in reverse order, from the cost to the inputs.

Note: When coding, we often use _ as a "throwaway" variable to store values that we won't need to use later. Here, _ takes on the evaluated value of optimizer, which we don't need (and c takes the value of the cost variable).
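To make the idea of "one call = one optimization step" concrete, here is a toy sketch of plain gradient descent on the one-dimensional cost (w - 3)^2. It is only an illustration under simplifying assumptions: the gradient is written by hand instead of being derived from a graph, the function name gradient_descent_step is hypothetical, and the returned cost is the value before the update.

```python
def gradient_descent_step(w, learning_rate):
    """One update w <- w - learning_rate * d(cost)/dw for cost = (w - 3)^2."""
    cost = (w - 3.0) ** 2          # value of the cost at the current w
    grad = 2.0 * (w - 3.0)         # derivative of the cost with respect to w
    return w - learning_rate * grad, cost


w = 0.0
costs = []
for _ in range(100):
    # Each call plays the role of one run of the optimizer on a minibatch.
    w, c = gradient_descent_step(w, learning_rate=0.1)
    costs.append(c)

print(w, costs[0], costs[-1])
```

With a learning rate of 0.1, the distance to the minimum at w = 3 shrinks by a factor of 0.8 at every step, so the cost decreases monotonically.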

2.6 - Building the model

Now, you will bring it all together!

Exercise: Implement the model. You will be calling the functions you had previously implemented.

def model(X_train, Y_train, X_test, Y_test, learning_rate = 0.0001,
          num_epochs = 1500, minibatch_size = 32, print_cost = True):
    """
    Implements a three-layer tensorflow neural network: LINEAR->RELU->LINEAR->RELU->LINEAR->SOFTMAX.

    Arguments:
    X_train -- training set, of shape (input size = 12288, number of training examples = 1080)
    Y_train -- training labels, of shape (output size = 6, number of training examples = 1080)
    X_test -- test set, of shape (input size = 12288, number of test examples = 120)
    Y_test -- test labels, of shape (output size = 6, number of test examples = 120)
    learning_rate -- learning rate of the optimization
    num_epochs -- number of epochs of the optimization loop
    minibatch_size -- size of a minibatch
    print_cost -- True to print the cost every 100 epochs

    Returns:
    parameters -- parameters learnt by the model. They can then be used to predict.
    """

    ops.reset_default_graph()                         # to be able to rerun the model without overwriting tf variables
    tf.set_random_seed(1)                             # to keep consistent results
    seed = 3                                          # to keep consistent results
    (n_x, m) = X_train.shape                          # (n_x: input size, m : number of examples in the train set)
    n_y = Y_train.shape[0]                            # n_y : output size
    costs = []                                        # To keep track of the cost

    # Create Placeholders of shape (n_x, n_y)
    ### START CODE HERE ### (1 line)
    X, Y = create_placeholders(n_x, n_y)
    ### END CODE HERE ###

    # Initialize parameters
    ### START CODE HERE ### (1 line)
    parameters = initialize_parameters()
    ### END CODE HERE ###

    # Forward propagation: Build the forward propagation in the tensorflow graph
    ### START CODE HERE ### (1 line)
    Z3 = forward_propagation(X, parameters)
    ### END CODE HERE ###

    # Cost function: Add cost function to tensorflow graph
    ### START CODE HERE ### (1 line)
    cost = compute_cost(Z3, Y)
    ### END CODE HERE ###

    # Backpropagation: Define the tensorflow optimizer. Use an AdamOptimizer.
    ### START CODE HERE ### (1 line)
    optimizer = tf.train.AdamOptimizer(learning_rate).minimize(cost)
    ### END CODE HERE ###

    # Initialize all the variables
    init = tf.global_variables_initializer()

    # Start the session to compute the tensorflow graph
    with tf.Session() as sess:

        # Run the initialization
        sess.run(init)

        # Do the training loop
        for epoch in range(num_epochs):

            epoch_cost = 0.                       # Defines a cost related to an epoch
            num_minibatches = int(m / minibatch_size) # number of minibatches of size minibatch_size in the train set
            seed = seed + 1
            minibatches = random_mini_batches(X_train, Y_train, minibatch_size, seed)

            for minibatch in minibatches:

                # Select a minibatch
                (minibatch_X, minibatch_Y) = minibatch

                # IMPORTANT: The line that runs the graph on a minibatch.
                # Run the session to execute the "optimizer" and the "cost"; the feed_dict should contain a minibatch for (X,Y).
                ### START CODE HERE ### (1 line)
                _ , minibatch_cost = sess.run([optimizer, cost], feed_dict = {X:minibatch_X, Y:minibatch_Y,})
                ### END CODE HERE ###

                epoch_cost += minibatch_cost / num_minibatches

            # Print the cost every 100 epochs and record it every 5 epochs
            if print_cost == True and epoch % 100 == 0:
                print ("Cost after epoch %i: %f" % (epoch, epoch_cost))
            if print_cost == True and epoch % 5 == 0:
                costs.append(epoch_cost)

        # plot the cost
        plt.plot(np.squeeze(costs))
        plt.ylabel('cost')
        plt.xlabel('epochs (per fives)')
        plt.title("Learning rate =" + str(learning_rate))
        plt.show()

        # let's save the trained parameters in a variable
        parameters = sess.run(parameters)
        print ("Parameters have been trained!")

        # Calculate the correct predictions
        correct_prediction = tf.equal(tf.argmax(Z3), tf.argmax(Y))

        # Define the accuracy (evaluated below on both the train and the test set)
        accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))

        print ("Train Accuracy:", accuracy.eval({X: X_train, Y: Y_train}))
        print ("Test Accuracy:", accuracy.eval({X: X_test, Y: Y_test}))

        return parameters

Run the following cell to train your model! On our machine it takes about 5 minutes. Your "Cost after epoch 100" should be 1.016458. If it's not, don't waste time; interrupt the training by clicking on the square (⬛) in the upper bar of the notebook, and try to correct your code. If it is the correct cost, take a break and come back in 5 minutes!

parameters = model(X_train, Y_train, X_test, Y_test)
Cost after epoch 0: 1.855702
Cost after epoch 100: 1.016458
Cost after epoch 200: 0.733102
Cost after epoch 300: 0.572940
Cost after epoch 400: 0.468774
Cost after epoch 500: 0.381021
Cost after epoch 600: 0.313822
Cost after epoch 700: 0.254158
Cost after epoch 800: 0.203829
Cost after epoch 900: 0.166421
Cost after epoch 1000: 0.141486
Cost after epoch 1100: 0.107580
Cost after epoch 1200: 0.086270
Cost after epoch 1300: 0.059371
Cost after epoch 1400: 0.052228
Parameters have been trained!
Train Accuracy: 0.999074
Test Accuracy: 0.716667

Amazing, your algorithm can recognize a sign representing a figure between 0 and 5 with 71.7% accuracy.
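The accuracy figures come from comparing, for each example (one column), the index of the largest entry of Z3 with the index of the 1 in Y, then averaging the matches. The pure-Python sketch below reproduces that computation on a tiny made-up example with 3 classes and 4 examples; the helper names are hypothetical and the numbers are not taken from the dataset.

```python
def argmax_per_column(matrix):
    """Index of the largest entry in each column of a list-of-rows matrix."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    return [max(range(n_rows), key=lambda r: matrix[r][c]) for c in range(n_cols)]


def accuracy(Z, Y):
    """Fraction of columns where the predicted class equals the true class."""
    predicted = argmax_per_column(Z)
    true = argmax_per_column(Y)
    matches = sum(p == t for p, t in zip(predicted, true))
    return matches / len(predicted)


# Rows are classes, columns are examples, as for Z3 and Y in this notebook.
Z = [[2.0, 0.1, 0.3, 0.0],
     [0.5, 1.5, 0.2, 0.9],
     [0.1, 0.2, 0.9, 0.4]]
Y = [[1, 0, 0, 0],
     [0, 1, 0, 0],
     [0, 0, 1, 1]]
print(argmax_per_column(Z), accuracy(Z, Y))  # [0, 1, 2, 1] 0.75
```

Note that no softmax is needed here: softmax preserves the ordering of the logits, so the argmax of Z3 is also the most probable class.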

Insights:

  • Your model seems big enough to fit the training set well. However, given the difference between train and test accuracy, you could try to add L2 or dropout regularization to reduce overfitting.

  • Think about the session as a block of code to train the model. Each time you run the session on a minibatch, it trains the parameters. In total you have run the session a large number of times (1500 epochs) until you obtained well trained parameters.
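To get a feel for how many times the session is run, the sketch below shuffles the example indices and cuts them into minibatches of 32. It is a hypothetical stand-in for random_mini_batches, written on indices only, and it assumes that the last, smaller minibatch is kept rather than dropped.

```python
import random


def minibatch_indices(m, minibatch_size, seed):
    """Shuffle 0..m-1 with the given seed and cut the result into minibatches."""
    rng = random.Random(seed)
    order = list(range(m))
    rng.shuffle(order)
    # The final slice may be shorter than minibatch_size.
    return [order[i:i + minibatch_size] for i in range(0, m, minibatch_size)]


m, minibatch_size = 1080, 32
batches = minibatch_indices(m, minibatch_size, seed=4)
num_full = m // minibatch_size

print(len(batches), num_full, len(batches[-1]))  # 34 33 24
```

Under that assumption, one epoch over the 1080 training examples corresponds to 34 runs of the session (33 full minibatches and one of 24 examples), so 1500 epochs amount to 51,000 parameter updates.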

2.7 - Test with your own image (optional / ungraded exercise)

Congratulations on finishing this assignment. You can now take a picture of your hand and see the output of your model. To do that:

  1. Click on "File" in the upper bar of this notebook, then click "Open" to go to your Coursera Hub.

  2. Add your image to this Jupyter Notebook's directory, in the "images" folder.

  3. Write your image's name in the following code.

  4. Run the code and check if the algorithm is right!

import scipy
from PIL import Image
from scipy import ndimage


## START CODE HERE ## (PUT YOUR IMAGE NAME)
my_image = "微信图片_20180426205725.jpg"
## END CODE HERE ##


# We preprocess your image to fit your algorithm.
fname = "images/" + my_image
image = np.array(ndimage.imread(fname, flatten=False))
my_image = scipy.misc.imresize(image, size=(64,64)).reshape((1, 64*64*3)).T
my_image_prediction = predict(my_image, parameters)


plt.imshow(image)
print("Your algorithm predicts: y = " + str(np.squeeze(my_image_prediction)))

"""
Your algorithm predicts: y = 1

"""

You indeed deserved a "thumbs-up", although as you can see the algorithm seems to classify it incorrectly. The reason is that the training set doesn't contain any "thumbs-up", so the model doesn't know how to deal with it! We call that a "mismatched data distribution" and it is one of the various topics of the next course on "Structuring Machine Learning Projects".

What you should remember:

  • Tensorflow is a programming framework used in deep learning

  • The two main object classes in tensorflow are Tensors and Operators.

  • When you code in tensorflow you have to take the following steps:

      • Create a graph containing Tensors (Variables, Placeholders ...) and Operations (tf.matmul, tf.add, ...)

      • Create a session

      • Initialize the session

      • Run the session to execute the graph

  • You can execute the graph multiple times as you've seen in model()

  • The backpropagation and optimization are done automatically when running the session on the "optimizer" object.


Last updated 6 years ago


Here are examples for each number, and an explanation of how we represent the labels. These are the original pictures, before we lowered the image resolution to 64 by 64 pixels.
