DeepLearning.ai深度学习课程笔记
  • Introduction
  • 第一门课 神经网络和深度学习(Neural-Networks-and-Deep-Learning)
    • 第一周:深度学习引言(Introduction to Deep Learning)
      • 1.1 神经网络的监督学习(Supervised Learning with Neural Networks)
      • 1.2 为什么神经网络会流行?(Why is Deep Learning taking off?)
    • 第二周:神经网络的编程基础(Basics of Neural Network programming)
      • 2.1 二分类(Binary Classification)
      • 2.2 逻辑回归(Logistic Regression)
      • 2.3 逻辑回归的代价函数(Logistic Regression Cost Function)
      • 2.4 逻辑回归的梯度下降(Logistic Regression Gradient Descent)
      • 2.5 梯度下降的例子(Gradient Descent on m Examples)
      • 2.6 向量化 logistic 回归的梯度输出(Vectorizing Logistic Regression’s Gradient Output)
      • 2.7 (选修)logistic 损失函数的解释(Explanation of logistic regression cost function )
      • Logistic Regression with a Neural Network mindset 代码
      • lr_utils.py
    • 第三周:浅层神经网络(Shallow neural networks)
      • 3.1 神经网络概述(Neural Network Overview)
      • 3.2 神经网络的表示(Neural Network Representation )
      • 3.3 计算一个神经网络的输出(Computing a Neural Network's output )
      • 3.4 多样本向量化(Vectorizing across multiple examples )
      • 3.5 激活函数(Activation functions)
      • 3.6 为什么需要( 非线性激活函数?(why need a nonlinear activation function?)
      • 3.7 激活函数的导数(Derivatives of activation functions )
      • 3.8 神经网络的梯度下降(Gradient descent for neural networks)
      • 3.9 (选修)直观理解反向传播(Backpropagation intuition )
      • 3.10 随机初始化(Random+Initialization)
      • Planar data classification with one hidden layer
      • planar_utils.py
      • testCases.py
    • 第四周:深层神经网络(Deep Neural Networks)
      • 4.1 深层神经网络(Deep L-layer neural network)
      • 4.2 前向传播和反向传播(Forward and backward propagation)
      • 4.3 深层网络中的前向传播(Forward propagation in a Deep Network )
      • 4.4 为什么使用深层表示?(Why deep representations?)
      • 4.5 搭建神经网络块(Building blocks of deep neural networks)
      • 4.6 参数 VS 超参数(Parameters vs Hyperparameters)
      • Building your Deep Neural Network Step by Step
      • dnn_utils.py
      • testCases.py
      • Deep Neural Network Application
      • dnn_app_utils.py
  • 第二门课 改善深层神经网络:超参数调试、 正 则 化 以 及 优 化 (Improving Deep Neural Networks:Hyperparameter tuning, Regulariza
    • 第二门课 改善深层神经网络:超参数调试、正则化以及优化(Improving Deep Neural Networks:Hyperparameter tuning, Regularization and
      • 第一周:深度学习的实用层面(Practical aspects of Deep Learning)
        • 1.1 训练,验证,测试集(Train / Dev / Test sets)
        • 1.2 偏差,方差(Bias /Variance)
        • 1.3 机器学习基础(Basic Recipe for Machine Learning)
        • 1.4 正则化(Regularization)
        • 1.5 为什么正则化有利于预防过拟合呢?(Why regularization reduces overfitting?)
        • 1.6 dropout 正则化(Dropout Regularization)
        • 1.7 理解 dropout(Understanding Dropout)
        • 1.8 其他正则化方法(Other regularization methods)
        • 1.9 归一化输入(Normalizing inputs)
        • 1.10 梯度消失/梯度爆炸(Vanishing / Exploding gradients)
        • 1.11 神经网络的权重初始化(Weight Initialization for Deep Networks)
        • 1.12 梯度的数值逼近(Numerical approximation of gradients)
        • 1.13 梯度检验(Gradient checking)
        • 1.14 梯度检验应用的注意事项(Gradient Checking Implementation Notes)
        • Initialization
        • Gradient Checking
        • Regularization
        • reg_utils.py
        • testCases.py
      • 第二周:优化算法 (Optimization algorithms)
        • 2.1 Mini-batch 梯度下降(Mini-batch gradient descent)
        • 2.2 理解 mini-batch 梯度下降法(Understanding mini-batch gradient descent)
        • 2.3 指数加权平均数(Exponentially weighted averages)
        • 2.4 理解指数加权平均数(Understanding exponentially weighted averages )
        • 2.5 指 数 加 权 平 均 的 偏 差 修 正 ( Bias correction in exponentially weighted averages )
        • 2.6 动量梯度下降法(Gradient descent with Momentum )
        • 2.7 RMSprop( root mean square prop)
        • 2.8 Adam 优化算法(Adam optimization algorithm)
        • 2.9 学习率衰减(Learning rate decay)
        • 2.10 局部最优的问题(The problem of local optima)
        • Optimization
        • opt_utils.py
        • testCases.py
      • 第 三 周 超 参 数 调 试 、 Batch 正 则 化 和 程 序 框 架 (Hyperparameter tuning)
        • 3.1 调试处理(Tuning process)
        • 3.2 为超参数选择合适的范围(Using an appropriate scale to pick hyperparameters)
        • 3.3 超参数训练的实践: Pandas VS Caviar(Hyperparameters tuning in practice: Pandas vs. Caviar)
        • 3.4 归一化网络的激活函数( Normalizing activations in a network)
        • 3.5 将 Batch Norm 拟合进神经网络(Fitting Batch Norm into a neural network)
        • 3.6 Batch Norm 为什么奏效?(Why does Batch Norm work?)
        • 3.7 测试时的 Batch Norm(Batch Norm at test time)
        • 3.8 Softmax 回归(Softmax regression)
        • 3.9 训练一个 Softmax 分类器(Training a Softmax classifier)
        • tensorflow tutorial
        • improv_utils.py
        • tf_utils.py
  • 第三门课 结构化机器学习项目(Structuring Machine Learning Projects)
    • 第三门课 结构化机器学习项目(Structuring Machine Learning Projects)
      • 第一周 机器学习(ML)策略(1)(ML strategy(1))
        • 1.1 为什么是 ML 策略?(Why ML Strategy?)
        • 1.2 正交化(Orthogonalization)
        • 1.3 单一数字评估指标(Single number evaluation metric)
        • 1.4 满足和优化指标(Satisficing and optimizing metrics)
        • 1.5 训练/开发/测试集划分(Train/dev/test distributions)
        • 1.6 开发集和测试集的大小(Size of dev and test sets)
        • 1.7 什么时候该改变开发/测试集和指标?(When to change dev/test sets and metrics)
        • 1.8 为什么是人的表现?( Why human-level performance?)
        • 1.9 可避免偏差(Avoidable bias)
        • 1.10 理解人的表现(Understanding human-level performance)
        • 1.11 超过人的表现(Surpassing human- level performance)
        • 1.12 改善你的模型的表现(Improving your model performance)
      • 第二周:机器学习策略(2)(ML Strategy (2))
        • 2.1 进行误差分析(Carrying out error analysis)
        • 2.2 清楚标注错误的数据(Cleaning up Incorrectly labeled data)
        • 2.3 快速搭建你的第一个系统,并进行迭代(Build your first system quickly, then iterate)
        • 2.4 在不同的划分上进行训练并测试(Training and testing on different distributions)
        • 2.5 不匹配数据划分的偏差和方差(Bias and Variance with mismatched data distributions)
        • 2.6 定位数据不匹配(Addressing data mismatch)
        • 2.7 迁移学习(Transfer learning)
        • 2.8 多任务学习(Multi-task learning)
        • 2.9 什么是端到端的深度学习?(What is end-to-end deep learning?)
        • 2.10 是否要使用端到端的深度学习?(Whether to use end-to-end learning?)
  • 第四门课 卷积神经网络(Convolutional Neural Networks)
    • 第四门课 卷积神经网络(Convolutional Neural Networks)
      • 第一周 卷积神经网络(Foundations of Convolutional Neural Networks)
        • 1.1 计算机视觉(Computer vision)
        • 1.2 边缘检测示例(Edge detection example)
        • 1.3 更多边缘检测内容(More edge detection)
        • 1.4 Padding
        • 1.5 卷积步长(Strided convolutions)
        • 1.6 三维卷积(Convolutions over volumes)
        • 1.7 单层卷积网络(One layer of a convolutional network)
        • 1.8 简单卷积网络示例(A simple convolution network example)
        • 1.9 池化层(Pooling layers)
        • 1.10 卷积神经网络示例(Convolutional neural network example)
        • 1.11 为什么使用卷积?(Why convolutions?)
        • Convolution model Step by Step
        • Convolutional Neural Networks: Application
        • cnn_utils
      • 第二周 深度卷积网络:实例探究(Deep convolutional models: case studies)
        • 2.1 经典网络(Classic networks)
        • 2.2 残差网络(Residual Networks (ResNets))
        • 2.3 残差网络为什么有用?(Why ResNets work?)
        • 2.4 网络中的网络以及 1×1 卷积(Network in Network and 1×1 convolutions)
        • 2.5 谷歌 Inception 网络简介(Inception network motivation)
        • 2.6 Inception 网络(Inception network)
        • 2.7 迁移学习(Transfer Learning)
        • 2.8 数据扩充(Data augmentation)
        • 2.9 计算机视觉现状(The state of computer vision)
        • Residual Networks
        • Keras tutorial - the Happy House
        • kt_utils.py
      • 第三周 目标检测(Object detection)
        • 3.1 目标定位(Object localization)
        • 3.2 特征点检测(Landmark detection)
        • 3.3 目标检测(Object detection)
        • 3.4 卷积的滑动窗口实现(Convolutional implementation of sliding windows)
        • 3.5 Bounding Box预测(Bounding box predictions)
        • 3.6 交并比(Intersection over union)
        • 3.7 非极大值抑制(Non-max suppression)
        • 3.8 Anchor Boxes
        • 3.9 YOLO 算法(Putting it together: YOLO algorithm)
        • 3.10 候选区域(选修)(Region proposals (Optional))
        • Autonomous driving application - Car detection
        • yolo_utils.py
      • 第四周 特殊应用:人脸识别和神经风格转换(Special applications: Face recognition &Neural style transfer)
        • 4.1 什么是人脸识别?(What is face recognition?)
        • 4.2 One-Shot学习(One-shot learning)
        • 4.3 Siamese 网络(Siamese network)
        • 4.4 Triplet 损失(Triplet 损失)
        • 4.5 面部验证与二分类(Face verification and binary classification)
        • 4.6 什么是深度卷积网络?(What are deep ConvNets learning?)
        • 4.7 代价函数(Cost function)
        • 4.8 内容代价函数(Content cost function)
        • 4.9 风格代价函数(Style cost function)
        • 4.10 一维到三维推广(1D and 3D generalizations of models)
        • Art Generation with Neural Style Transfer
        • nst_utils.py
        • Face Recognition for the Happy House
        • fr_utils.py
        • inception_blocks.py
  • 第五门课 序列模型(Sequence Models)
    • 第五门课 序列模型(Sequence Models)
      • 第一周 循环序列模型(Recurrent Neural Networks)
        • 1.1 为什么选择序列模型?(Why Sequence Models?)
        • 1.2 数学符号(Notation)
        • 1.3 循环神经网络模型(Recurrent Neural Network Model)
        • 1.4 通过时间的反向传播(Backpropagation through time)
        • 1.5 不同类型的循环神经网络(Different types of RNNs)
        • 1.6 语言模型和序列生成(Language model and sequence generation)
        • 1.7 对新序列采样(Sampling novel sequences)
        • 1.8 循环神经网络的梯度消失(Vanishing gradients with RNNs)
        • 1.9 GRU单元(Gated Recurrent Unit(GRU))
        • 1.10 长短期记忆(LSTM(long short term memory)unit)
        • 1.11 双向循环神经网络(Bidirectional RNN)
        • 1.12 深层循环神经网络(Deep RNNs)
        • Building your Recurrent Neural Network
        • rnn_utils.py
        • Dinosaurus Island -- Character level language model final
        • utils.py
        • shakespeare_utils.py
        • Improvise a Jazz Solo with an LSTM Network
      • 第二周 自然语言处理与词嵌入(Natural Language Processing and Word Embeddings)
        • 2.1 词汇表征(Word Representation)
        • 2.2 使用词嵌入(Using Word Embeddings)
        • 2.3 词嵌入的特性(Properties of Word Embeddings)
        • 2.4 嵌入矩阵(Embedding Matrix)
        • 2.5 学习词嵌入(Learning Word Embeddings)
        • 2.6 Word2Vec
        • 2.7 负采样(Negative Sampling)
        • 2.8 GloVe 词向量(GloVe Word Vectors)
        • 2.9 情感分类(Sentiment Classification)
        • 2.10 词嵌入除偏(Debiasing Word Embeddings)
        • Operations on word vectors
        • w2v_utils.py
        • Emojify
        • emo_utils.py
      • 第三周 序列模型和注意力机制(Sequence models & Attention mechanism)
        • 3.1 基础模型(Basic Models)
        • 3.2 选择最可能的句子(Picking the most likely sentence)
        • 3.3 集束搜索(Beam Search)
        • 3.4 改进集束搜索(Refinements to Beam Search)
        • 3.5 集束搜索的误差分析(Error analysis in beam search)
        • 3.6 Bleu 得分(选修)(Bleu Score (optional))
        • 3.7 注意力模型直观理解(Attention Model Intuition)
        • 3.8注意力模型(Attention Model)
        • 3.9语音识别(Speech recognition)
        • 3.10触发字检测(Trigger Word Detection)
        • Neural machine translation with attention
        • nmt_utils.py
        • Trigger word detection
        • td_utils.py
Powered by GitBook
On this page
  • Emojify!
  • 1 - Baseline model: Emojifier-V1
  • 2 - Emojifier-V2: Using LSTMs in Keras:
  • 2.3 Building the Emojifier-V2
  • 😀😀😀😀😀😀
  • Acknowledgments

Was this helpful?

  1. 第五门课 序列模型(Sequence Models)
  2. 第五门课 序列模型(Sequence Models)
  3. 第二周 自然语言处理与词嵌入(Natural Language Processing and Word Embeddings)

Emojify

Emojify!

Welcome to the second assignment of Week 2. You are going to use word vector representations to build an Emojifier.

Have you ever wanted to make your text messages more expressive? Your emojifier app will help you do that. So rather than writing "Congratulations on the promotion! Lets get coffee and talk. Love you!" the emojifier can automatically turn this into "Congratulations on the promotion! 👍 Lets get coffee and talk. ☕️ Love you! ❤️"

You will implement a model which inputs a sentence (such as "Let's go see the baseball game tonight!") and finds the most appropriate emoji to be used with this sentence (⚾️). In many emoji interfaces, you need to remember that ❤️ is the "heart" symbol rather than the "love" symbol. But using word vectors, you'll see that even if your training set explicitly relates only a few words to a particular emoji, your algorithm will be able to generalize and associate words in the test set to the same emoji even if those words don't even appear in the training set. This allows you to build an accurate classifier mapping from sentences to emojis, even using a small training set.

In this exercise, you'll start with a baseline model (Emojifier-V1) using word embeddings, then build a more sophisticated model (Emojifier-V2) that further incorporates an LSTM.

Lets get started! Run the following cell to load the package you are going to use.

import numpy as np
from emo_utils import *
import emoji
import matplotlib.pyplot as plt


%matplotlib inline

1 - Baseline model: Emojifier-V1

1.1 - Dataset EMOJISET

Let's start by building a simple baseline classifier.

You have a tiny dataset (X, Y) where:

  • X contains 132 sentences (strings)

  • Y contains a integer label between 0 and 4 corresponding to an emoji for each sentence

Figure 1: EMOJISET - a classification problem with 5 classes. A few examples of sentences are given here.

Let's load the dataset using the code below. We split the dataset between training (132 examples) and testing (56 examples).

X_train, Y_train = read_csv('data/train_emoji.csv')
X_test, Y_test = read_csv('data/tesss.csv')
maxLen = len(max(X_train, key=len).split())

Run the following cell to print sentences from X_train and corresponding labels from Y_train. Change index to see different examples. Because of the font the iPy notebook uses, the heart emoji may be colored black rather than red.

index = 1
print(X_train[index], label_to_emoji(Y_train[index]))
I am proud of your achievements 😄

1.2 - Overview of the Emojifier-V1

In this part, you are going to implement a baseline model called "Emojifier-v1".

Figure 2: Baseline model (Emojifier-V1).

The input of the model is a string corresponding to a sentence (e.g. "I love you). In the code, the output will be a probability vector of shape (1,5), that you then pass in an argmax layer to extract the index of the most likely emoji output.

To get our labels into a format suitable for training a softmax classifier, lets convert YYY from its current shape current shape (m,1)(m, 1)(m,1) into a "one-hot representation" (m,5)(m, 5)(m,5), where each row is a one-hot vector giving the label of one example, You can do so using this next code snipper. Here, Y_oh stands for "Y-one-hot" in the variable names Y_oh_train and Y_oh_test:

Y_oh_train = convert_to_one_hot(Y_train, C = 5)
Y_oh_test = convert_to_one_hot(Y_test, C = 5)

Let's see what convert_to_one_hot() did. Feel free to change index to print out different values.

index = 50
print(Y_train[index], "is converted into one hot", Y_oh_train[index])
0 is converted into one hot [ 1. 0. 0. 0. 0.]

All the data is now ready to be fed into the Emojify-V1 model. Let's implement the model!

1.3 - Implementing Emojifier-V1

As shown in Figure (2), the first step is to convert an input sentence into the word vector representation, which then get averaged together. Similar to the previous exercise, we will use pretrained 50-dimensional GloVe embeddings. Run the following cell to load the word_to_vec_map, which contains all the vector representations.

word_to_index, index_to_word, word_to_vec_map = read_glove_vecs('data/glove.6B.50d.txt')

You've loaded:

  • word_to_index: dictionary mapping from words to their indices in the vocabulary (400,001 words, with the valid indices ranging from 0 to 400,000)

  • index_to_word: dictionary mapping from indices to their corresponding words in the vocabulary

  • word_to_vec_map: dictionary mapping words to their GloVe vector representation.

Run the following cell to check if it works.

word = "cucumber"
index = 289846
print("the index of", word, "in the vocabulary is", word_to_index[word])
print("the", str(index) + "th word in the vocabulary is", index_to_word[index])
the index of cucumber in the vocabulary is 113317
the 289846th word in the vocabulary is potatos

Exercise: Implement sentence_to_avg(). You will need to carry out two steps: 1. Convert every sentence to lower-case, then split the sentence into a list of words. X.lower() and X.split() might be useful. 2. For each word in the sentence, access its GloVe representation. Then, average all these values.

# GRADED FUNCTION: sentence_to_avg

def sentence_to_avg(sentence, word_to_vec_map):
    """
    Converts a sentence (string) into a list of words (strings). Extracts the GloVe representation of each word
    and averages its value into a single vector encoding the meaning of the sentence.

    Arguments:
    sentence -- string, one training example from X
    word_to_vec_map -- dictionary mapping every word in a vocabulary into its 50-dimensional vector representation

    Returns:
    avg -- average vector encoding information about the sentence, numpy-array of shape (50,)
    """

    ### START CODE HERE ###
    # Step 1: Split sentence into list of lower case words (≈ 1 line)
    words = sentence.lower().split()

    # Initialize the average word vector, should have the same shape as your word vectors.
    avg = np.zeros((word_to_vec_map[words[0]].shape))

    # Step 2: average the word vectors. You can loop over the words in the list "words".
    for w in words:
        avg += word_to_vec_map[w]
    avg = avg/len(words)

    ### END CODE HERE ###

    return avg
avg = sentence_to_avg("Morrocan couscous is my favorite dish", word_to_vec_map)
print("avg = ", avg)
avg = [-0.008005 0.56370833 -0.50427333 0.258865 0.55131103 0.03104983
-0.21013718 0.16893933 -0.09590267 0.141784 -0.15708967 0.18525867
0.6495785 0.38371117 0.21102167 0.11301667 0.02613967 0.26037767
0.05820667 -0.01578167 -0.12078833 -0.02471267 0.4128455 0.5152061
0.38756167 -0.898661 -0.535145 0.33501167 0.68806933 -0.2156265
1.797155 0.10476933 -0.36775333 0.750785 0.10282583 0.348925
-0.27262833 0.66768 -0.10706167 -0.283635 0.59580117 0.28747333
-0.3366635 0.23393817 0.34349183 0.178405 0.1166155 -0.076433
0.1445417 0.09808667]

Model

You now have all the pieces to finish implementing the model() function. After using sentence_to_avg() you need to pass the average through forward propagation, compute the cost, and then backpropagate to update the softmax's parameters.

Exercise: Implement the model() function described in Figure (2). Assuming here that YohYohYoh ("Y one hot") is the one-hot encoding of the output labels, the equations you need to implement in the forward pass and to compute the cross-entropy cost are:

z(i)=W.avg(i)+bz^{(i)} = W . avg^{(i)} + bz(i)=W.avg(i)+b
a(i)=softmax(z(i))a^{(i)} = softmax(z^{(i)})a(i)=softmax(z(i))
L(i)=−∑k=0ny−1Yohk(i)∗log(ak(i))\mathcal{L}^{(i)} = - \sum_{k = 0}^{n_y - 1} Yoh^{(i)}_k * log(a^{(i)}_k)L(i)=−k=0∑ny​−1​Yohk(i)​∗log(ak(i)​)

It is possible to come up with a more efficient vectorized implementation. But since we are using a for-loop to convert the sentences one at a time into the avg^{(i)} representation anyway, let's not bother this time.

We provided you a function softmax().

# GRADED FUNCTION: model

def model(X, Y, word_to_vec_map, learning_rate = 0.01, num_iterations = 400):
    """
    Model to train word vector representations in numpy.

    Arguments:
    X -- input data, numpy array of sentences as strings, of shape (m, 1)
    Y -- labels, numpy array of integers between 0 and 7, numpy-array of shape (m, 1)
    word_to_vec_map -- dictionary mapping every word in a vocabulary into its 50-dimensional vector representation
    learning_rate -- learning_rate for the stochastic gradient descent algorithm
    num_iterations -- number of iterations

    Returns:
    pred -- vector of predictions, numpy-array of shape (m, 1)
    W -- weight matrix of the softmax layer, of shape (n_y, n_h)
    b -- bias of the softmax layer, of shape (n_y,)
    """

    np.random.seed(1)

    # Define number of training examples
    m = Y.shape[0]                          # number of training examples
    n_y = 5                                 # number of classes  
    n_h = 50                                # dimensions of the GloVe vectors 

    # Initialize parameters using Xavier initialization
    W = np.random.randn(n_y, n_h) / np.sqrt(n_h)
    b = np.zeros((n_y,))

    # Convert Y to Y_onehot with n_y classes
    Y_oh = convert_to_one_hot(Y, C = n_y) 

    # Optimization loop
    for t in range(num_iterations):                       # Loop over the number of iterations
        for i in range(m):                                # Loop over the training examples

            ### START CODE HERE ### (≈ 4 lines of code)
            # Average the word vectors of the words from the i'th training example
            avg = sentence_to_avg(X[i], word_to_vec_map)

            # Forward propagate the avg through the softmax layer
            z = np.dot(W, avg) + b
            a = softmax(z)

            # Compute cost using the i'th training label's one hot representation and "A" (the output of the softmax)
            cost = - np.sum(Y_oh[i]*np.log(a))
            ### END CODE HERE ###

            # Compute gradients 
            dz = a - Y_oh[i]
            dW = np.dot(dz.reshape(n_y,1), avg.reshape(1, n_h))
            db = dz

            # Update parameters with Stochastic Gradient Descent
            W = W - learning_rate * dW
            b = b - learning_rate * db

        if t % 100 == 0:
            print("Epoch: " + str(t) + " --- cost = " + str(cost))
            pred = predict(X, Y, W, b, word_to_vec_map)

    return pred, W, b
print(X_train.shape)
print(Y_train.shape)
print(np.eye(5)[Y_train.reshape(-1)].shape)
print(X_train[0])
print(type(X_train))
Y = np.asarray([5,0,0,5, 4, 4, 4, 6, 6, 4, 1, 1, 5, 6, 6, 3, 6, 3, 4, 4])
print(Y.shape)


X = np.asarray(['I am going to the bar tonight', 'I love you', 'miss you my dear',
'Lets go party and drinks','Congrats on the new job','Congratulations',
'I am so happy for you', 'Why are you feeling bad', 'What is wrong with you',
'You totally deserve this prize', 'Let us go play football',
'Are you down for football this afternoon', 'Work hard play harder',
'It is suprising how people can be dumb sometimes',
'I am very disappointed','It is the best day in my life',
'I think I will end up alone','My life is so boring','Good job',
'Great so awesome'])


print(X.shape)
print(np.eye(5)[Y_train.reshape(-1)].shape)
print(type(X_train))

"""

(132,)
(132,)
(132, 5)
never talk to me again
<class 'numpy.ndarray'>
(20,)
(20,)
(132, 5)
<class 'numpy.ndarray'>
"""

Run the next cell to train your model and learn the softmax parameters (W,b).

pred, W, b = model(X_train, Y_train, word_to_vec_map)
print(pred)
Epoch: 0 --- cost = 1.95204988128
Accuracy: 0.348484848485
Epoch: 100 --- cost = 0.0797181872601
Accuracy: 0.931818181818
Epoch: 200 --- cost = 0.0445636924368
Accuracy: 0.954545454545
Epoch: 300 --- cost = 0.0343226737879
Accuracy: 0.969696969697
[[ 3.]
 [ 2.]
 [ 3.]
 ...
 ...
 ...
 [ 0.]
 [ 2.]]

Great! Your model has pretty high accuracy on the training set. Lets now see how it does on the test set.

1.4 - Examining test set performance

print("Training set:")
pred_train = predict(X_train, Y_train, W, b, word_to_vec_map)
print('Test set:')
pred_test = predict(X_test, Y_test, W, b, word_to_vec_map)
Training set:
Accuracy: 0.977272727273
Test set:
Accuracy: 0.857142857143

Random guessing would have had 20% accuracy given that there are 5 classes. This is pretty good performance after training on only 132 examples.

In the training set, the algorithm saw the sentence "I love you" with the label ❤️. You can check however that the word "adore" does not appear in the training set. Nonetheless, lets see what happens if you write "I adore you."

X_my_sentences = np.array(["i adore you", "i love you", "funny lol", "lets play with a ball", "food is ready", "not feeling happy"])
Y_my_labels = np.array([[0], [0], [2], [1], [4],[3]])


pred = predict(X_my_sentences, Y_my_labels , W, b, word_to_vec_map)
print_predictions(X_my_sentences, pred)
Accuracy: 0.833333333333
i adore you ❤️
i love you ❤️
funny lol 😄
lets play with a ball ⚾
food is ready 🍴
not feeling happy 😄

Amazing! Because adore has a similar embedding as love, the algorithm has generalized correctly even to a word it has never seen before. Words such as heart, dear, beloved or adore have embedding vectors similar to love, and so might work too---feel free to modify the inputs above and try out a variety of input sentences. How well does it work?

Note though that it doesn't get "not feeling happy" correct. This algorithm ignores word ordering, so is not good at understanding phrases like "not happy."

Printing the confusion matrix can also help understand which classes are more difficult for your model. A confusion matrix shows how often an example whose label is one class ("actual" class) is mislabeled by the algorithm with a different class ("predicted" class).

print(Y_test.shape)
print(' '+ label_to_emoji(0)+ ' ' + label_to_emoji(1) + ' ' + label_to_emoji(2)+ ' ' + label_to_emoji(3)+' ' + label_to_emoji(4))
print(pd.crosstab(Y_test, pred_test.reshape(56,), rownames=['Actual'], colnames=['Predicted'], margins=True))
plot_confusion_matrix(Y_test, pred_test)
(56,)
❤️ ⚾ 😄 😞 🍴
Predicted 0.0 1.0 2.0 3.0 4.0 All
Actual
0 6 0 0 1 0 7
1 0 8 0 0 0 8
2 2 0 16 0 0 18
3 1 1 2 12 0 16
4 0 0 1 0 6 7
All 9 9 19 13 6 56

What you should remember from this part:

  • Even with a 132 training examples, you can get a reasonably good model for Emojifying. This is due to the generalization power word vectors gives you.

  • Emojify-V1 will perform poorly on sentences such as "This movie is not good and not enjoyable" because it doesn't understand combinations of words--it just averages all the words' embedding vectors together, without paying attention to the ordering of words. You will build a better algorithm in the next part.

2 - Emojifier-V2: Using LSTMs in Keras:

Let's build an LSTM model that takes as input word sequences. This model will be able to take word ordering into account. Emojifier-V2 will continue to use pre-trained word embeddings to represent words, but will feed them into an LSTM, whose job it is to predict the most appropriate emoji.

Run the following cell to load the Keras packages.

import numpy as np
np.random.seed(0)
from keras.models import Model
from keras.layers import Dense, Input, Dropout, LSTM, Activation
from keras.layers.embeddings import Embedding
from keras.preprocessing import sequence
from keras.initializers import glorot_uniform
np.random.seed(1)

Using TensorFlow backend.

2.1 - Overview of the model

Here is the Emojifier-v2 you will implement:

Figure 3: Emojifier-V2. A 2-layer LSTM sequence classifier.

2.2 Keras and mini-batching

In this exercise, we want to train Keras using mini-batches. However, most deep learning frameworks require that all sequences in the same mini-batch have the same length. This is what allows vectorization to work: If you had a 3-word sentence and a 4-word sentence, then the computations needed for them are different (one takes 3 steps of an LSTM, one takes 4 steps) so it's just not possible to do them both at the same time.

The common solution to this is to use padding. Specifically, set a maximum sequence length, and pad all sequences to the same length. For example, of the maximum sequence length is 20, we could pad every sentence with "0"s so that each input sentence is of length 20. Thus, a sentence "i love you" would be represented as (ei,elove,eyou,0⃗,0⃗,…,0⃗)(e_{i}, e_{love}, e_{you}, \vec{0}, \vec{0}, \ldots, \vec{0})(ei​,elove​,eyou​,0,0,…,0). In this example, any sentences longer than 20 words would have to be truncated. One simple way to choose the maximum sequence length is to just pick the length of the longest sentence in the training set.

2.3 - The Embedding layer

The Embedding() layer takes an integer matrix of size (batch size, max input length) as input. This corresponds to sentences converted into lists of indices (integers), as shown in the figure below.

Figure 4: Embedding layer.

This example shows the propagation of two examples through the embedding layer. Both have been zero-padded to a length of max_len=5. The final dimension of the representation is (2,max_len,50) because the word embeddings we are using are 50 dimensional.

The largest integer (i.e. word index) in the input should be no larger than the vocabulary size. The layer outputs an array of shape (batch size, max input length, dimension of word vectors).

The first step is to convert all your training sentences into lists of indices, and then zero-pad all these lists so that their length is the length of the longest sentence.

Exercise: Implement the function below to convert X (array of sentences as strings) into an array of indices corresponding to words in the sentences. The output shape should be such that it can be given to Embedding() (described in Figure 4).

# GRADED FUNCTION: sentences_to_indices

def sentences_to_indices(X, word_to_index, max_len):
    """
    Converts an array of sentences (strings) into an array of indices corresponding to words in the sentences.
    The output shape should be such that it can be given to `Embedding()` (described in Figure 4). 

    Arguments:
    X -- array of sentences (strings), of shape (m, 1)
    word_to_index -- a dictionary containing the each word mapped to its index
    max_len -- maximum number of words in a sentence. You can assume every sentence in X is no longer than this. 

    Returns:
    X_indices -- array of indices corresponding to words in the sentences from X, of shape (m, max_len)
    """

    m = X.shape[0]                                   # number of training examples

    ### START CODE HERE ###
    # Initialize X_indices as a numpy matrix of zeros and the correct shape (≈ 1 line)
    X_indices = np.zeros((m, max_len))

    for i in range(m):                               # loop over training examples

        # Convert the ith training sentence in lower case and split is into words. You should get a list of words.
        sentence_words = X[i].lower().split()

        # Initialize j to 0
        j = 0

        # Loop over the words of sentence_words
        for w in sentence_words:
            # Set the (i,j)th entry of X_indices to the index of the correct word.
            X_indices[i, j] = word_to_index[w]
            # Increment j to j + 1
            j = j + 1

    ### END CODE HERE ###

    return X_indices

Run the following cell to check what sentences_to_indices() does, and check your results.

X1 = np.array(["funny lol", "lets play baseball", "food is ready for you"])
X1_indices = sentences_to_indices(X1,word_to_index, max_len = 5)
print("X1 =", X1)
print("X1_indices =", X1_indices)
X1 = ['funny lol' 'lets play baseball' 'food is ready for you']
X1_indices = [[ 155345. 225122. 0. 0. 0.]
              [ 220930. 286375. 69714. 0. 0.]
              [ 151204. 192973. 302254. 151349. 394475.]]

Let's build the Embedding() layer in Keras, using pre-trained word vectors. After this layer is built, you will pass the output of sentences_to_indices() to it as an input, and the Embedding() layer will return the word embeddings for a sentence.

# GRADED FUNCTION: pretrained_embedding_layer

def pretrained_embedding_layer(word_to_vec_map, word_to_index):
    """
    Creates a Keras Embedding() layer and loads in pre-trained GloVe 50-dimensional vectors.

    Arguments:
    word_to_vec_map -- dictionary mapping words to their GloVe vector representation.
    word_to_index -- dictionary mapping from words to their indices in the vocabulary (400,001 words)

    Returns:
    embedding_layer -- pretrained layer Keras instance
    """

    vocab_len = len(word_to_index) + 1                  # adding 1 to fit Keras embedding (requirement)
    emb_dim = word_to_vec_map["cucumber"].shape[0]      # define dimensionality of your GloVe word vectors (= 50)

    ### START CODE HERE ###
    # Initialize the embedding matrix as a numpy array of zeros of shape (vocab_len, dimensions of word vectors = emb_dim)
    emb_matrix = np.zeros((vocab_len, emb_dim))

    # Set each row "index" of the embedding matrix to be the word vector representation of the "index"th word of the vocabulary
    for word, index in word_to_index.items():
        emb_matrix[index, :] = word_to_vec_map[word]

    # Define Keras embedding layer with the correct output/input sizes, make it trainable. Use Embedding(...). Make sure to set trainable=False. 
    embedding_layer = Embedding(vocab_len, emb_dim, trainable = False)
    ### END CODE HERE ###

    # Build the embedding layer, it is required before setting the weights of the embedding layer. Do not modify the "None".
    embedding_layer.build((None,))

    # Set the weights of the embedding layer to the embedding matrix. Your layer is now pretrained.
    embedding_layer.set_weights([emb_matrix])

    return embedding_layer
embedding_layer = pretrained_embedding_layer(word_to_vec_map, word_to_index)
print("weights[0][1][3] =", embedding_layer.get_weights()[0][1][3])
weights[0][1][3] = -0.3403

2.3 Building the Emojifier-V2

Lets now build the Emojifier-V2 model. You will do so using the embedding layer you have built, and feed its output to an LSTM network.

Figure 3: Emojifier-v2. A 2-layer LSTM sequence classifier.

# GRADED FUNCTION: Emojify_V2

def Emojify_V2(input_shape, word_to_vec_map, word_to_index):
    """
    Function creating the Emojify-v2 model's graph.

    Arguments:
    input_shape -- shape of the input, usually (max_len,)
    word_to_vec_map -- dictionary mapping every word in a vocabulary into its 50-dimensional vector representation
    word_to_index -- dictionary mapping from words to their indices in the vocabulary (400,001 words)

    Returns:
    model -- a model instance in Keras
    """

    ### START CODE HERE ###
    # Define sentence_indices as the input of the graph, it should be of shape input_shape and dtype 'int32' (as it contains indices).
    sentence_indices = Input(shape = input_shape, dtype = 'int32')

    # Create the embedding layer pretrained with GloVe Vectors (≈1 line)
    embedding_layer = pretrained_embedding_layer(word_to_vec_map, word_to_index)

    # Propagate sentence_indices through your embedding layer, you get back the embeddings
    embeddings = embedding_layer(sentence_indices)   

    # Propagate the embeddings through an LSTM layer with 128-dimensional hidden state
    # Be careful, the returned output should be a batch of sequences.
    # return_sequences : 是返回输出序列中的最后一个输出,还是返回完整序列
    X = LSTM(128, return_sequences = True)(embeddings)
    # Add dropout with a probability of 0.5
    X = Dropout(0.5)(X)
    # Propagate X trough another LSTM layer with 128-dimensional hidden state
    # Be careful, the returned output should be a single hidden state, not a batch of sequences.
    X = LSTM(128)(X)
    # Add dropout with a probability of 0.5
    X = Dropout(0.5)(X)
    # Propagate X through a Dense layer with softmax activation to get back a batch of 5-dimensional vectors.
    X = Dense(5)(X)
    # Add a softmax activation
    X = Activation('softmax')(X)

    # Create Model instance which converts sentence_indices into X.
    model = Model(inputs = sentence_indices, outputs  = X)

    ### END CODE HERE ###

    return model

Run the following cell to create your model and check its summary. Because all sentences in the dataset are less than 10 words, we chose max_len = 10. You should see your architecture, it uses "20,223,927" parameters, of which 20,000,050 (the word embeddings) are non-trainable, and the remaining 223,877 are. Because our vocabulary size has 400,001 words (with valid indices from 0 to 400,000) there are 400,001*50 = 20,000,050 non-trainable parameters.

model = Emojify_V2((maxLen,), word_to_vec_map, word_to_index)
model.summary()
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_2 (InputLayer) (None, 10) 0
_________________________________________________________________
embedding_3 (Embedding) (None, 10, 50) 20000050
_________________________________________________________________
lstm_3 (LSTM) (None, 10, 128) 91648
_________________________________________________________________
dropout_3 (Dropout) (None, 10, 128) 0
_________________________________________________________________
lstm_4 (LSTM) (None, 128) 131584
_________________________________________________________________
dropout_4 (Dropout) (None, 128) 0
_________________________________________________________________
dense_2 (Dense) (None, 5) 645
_________________________________________________________________
activation_2 (Activation) (None, 5) 0
=================================================================
Total params: 20,223,927
Trainable params: 223,877
Non-trainable params: 20,000,050
_________________________________________________________________

As usual, after creating your model in Keras, you need to compile it and define what loss, optimizer and metrics your are want to use. Compile your model using categorical_crossentropy loss, adam optimizer and ['accuracy'] metrics:

model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

It's time to train your model. Your Emojifier-V2 model takes as input an array of shape (m, max_len) and outputs probability vectors of shape (m, number of classes). We thus have to convert X_train (array of sentences as strings) to X_train_indices (array of sentences as list of word indices), and Y_train (labels as indices) to Y_train_oh (labels as one-hot vectors).

X_train_indices = sentences_to_indices(X_train, word_to_index, maxLen)
Y_train_oh = convert_to_one_hot(Y_train, C = 5)

Fit the Keras model on X_train_indices and Y_train_oh. We will use epochs = 50 and batch_size = 32.

model.fit(X_train_indices, Y_train_oh, epochs = 150, batch_size = 32, shuffle=True)
Epoch 1/50
132/132 [==============================] - 0s - loss: 1.5990 - acc: 0.2652
Epoch 2/50
132/132 [==============================] - 0s - loss: 1.5248 - acc: 0.3636

...
...
...
Epoch 149/150
132/132 [==============================] - 0s - loss: 3.0110e-04 - acc: 1.0000     
Epoch 150/150
132/132 [==============================] - 0s - loss: 3.0041e-04 - acc: 1.0000    


<keras.callbacks.History at 0x7fe4bb3136a0>

Your model should perform close to 100% accuracy on the training set. The exact accuracy you get may be a little different. Run the following cell to evaluate your model on the test set.

X_test_indices = sentences_to_indices(X_test, word_to_index, max_len = maxLen)
Y_test_oh = convert_to_one_hot(Y_test, C = 5)
loss, acc = model.evaluate(X_test_indices, Y_test_oh)
print()
print("Test accuracy = ", acc)
32/56 [================>.............] - ETA: 0s
Test accuracy = 0.839285705771

You should get a test accuracy between 80% and 95%. Run the cell below to see the mislabelled examples.

# This code allows you to see the mislabelled examples
C = 5
y_test_oh = np.eye(C)[Y_test.reshape(-1)]
print('y_test_oh.shape:',y_test_oh.shape)
X_test_indices = sentences_to_indices(X_test, word_to_index, maxLen)
print('X_test_indices.shape:',X_test_indices.shape)
pred = model.predict(X_test_indices)
for i in range(len(X_test)):
x = X_test_indices
num = np.argmax(pred[i])
if(num != Y_test[i]):
print('Expected emoji:'+ label_to_emoji(Y_test[i]) + ' prediction: '+ X_test[i] + label_to_emoji(num).strip())
y_test_oh.shape: (56, 5)
X_test_indices.shape: (56, 10)
Expected emoji:😞 prediction: work is hard    😄
Expected emoji:😞 prediction: This girl is messing with me    ❤️
Expected emoji:😞 prediction: work is horrible    😄
Expected emoji:😄 prediction: you brighten my day    ❤️
Expected emoji:😞 prediction: she is a bully    ❤️
Expected emoji:😄 prediction: will you be my valentine    ❤️
Expected emoji:⚾ prediction: what is your favorite baseball game    😄
Expected emoji:😞 prediction: go away    ⚾
Expected emoji:😞 prediction: yesterday we lost again    ⚾

Now you can try it on your own example. Write your own sentence below.

# Change the sentence below to see your prediction. Make sure all the words are in the Glove embeddings.
x_test = np.array(['not feeling happy'])
X_test_indices = sentences_to_indices(x_test, word_to_index, maxLen)
print(x_test[0] +' '+ label_to_emoji(np.argmax(model.predict(X_test_indices))))
not feeling happy 😞

Previously, Emojify-V1 model did not correctly label "not feeling happy," but our implementation of Emojiy-V2 got it right. (Keras' outputs are slightly random each time, so you may not have obtained the same result.) The current model still isn't very robust at understanding negation (like "not happy") because the training set is small and so doesn't have a lot of examples of negation. But if the training set were larger, the LSTM model would be much better than the Emojify-V1 model at understanding such complex sentences.

Congratulations!

You have completed this notebook! ❤️❤️❤️

What you should remember:

  • If you have an NLP task where the training set is small, using word embeddings can help your algorithm significantly. Word embeddings allow your model to work on words in the test set that may not even have appeared in your training set.

  • Training sequence models in Keras (and in most other deep learning frameworks) requires a few important details:

  • To use mini-batches, the sequences need to be padded so that all the examples in a mini-batch have the same length.

  • An Embedding() layer can be initialized with pretrained values. These values can be either fixed or trained further on your dataset. If however your labeled dataset is small, it's usually not worth trying to train a large pre-trained set of embeddings.

  • LSTM() has a flag called return_sequences to decide if you would like to return every hidden states or only the last one.

  • You can use Dropout() right after LSTM() to regularize your network.

Congratulations on finishing this assignment and building an Emojifier. We hope you're happy with what you've accomplished in this notebook!

😀😀😀😀😀😀

Acknowledgments

Previousw2v_utils.pyNextemo_utils.py

Last updated 6 years ago

Was this helpful?

In Keras, the embedding matrix is represented as a "layer", and maps positive integers (indices corresponding to words) into dense vectors of fixed size (the embedding vectors). It can be trained or initialized with a pretrained embedding. In this part, you will learn how to create an layer in Keras, initialize it with the GloVe 50-dimensional vectors loaded earlier in the notebook. Because our training set is quite small, we will not update the word embeddings but will instead leave their values fixed. But in the code below, we'll show you how Keras allows you to either train or leave fixed this layer.

Exercise: Implement pretrained_embedding_layer(). You will need to carry out the following steps: 1. Initialize the embedding matrix as a numpy array of zeroes with the correct shape. 2. Fill in the embedding matrix with all the word embeddings extracted from word_to_vec_map. 3. Define Keras embedding layer. Use . Be sure to make this layer non-trainable, by setting trainable = False when calling Embedding(). If you were to set trainable = True, then it will allow the optimization algorithm to modify the values of the word embeddings. 4. Set the embedding weights to be equal to the embedding matrix

Exercise: Implement Emojify_V2(), which builds a Keras graph of the architecture shown in Figure 3. The model takes as input an array of sentences of shape (m, max_len, ) defined by input_shape. It should output a softmax probability vector of shape (m, C = 5). You may need Input(shape = ..., dtype = '...'), , , , and .

Thanks to Alison Darcy and the Woebot team for their advice on the creation of this assignment. Woebot is a chatbot friend that is ready to speak with you 24/7. As part of Woebot's technology, it uses word embeddings to understand the emotions of what you say. You can play with it by going to

Embedding()
Embedding()
LSTM()
Dropout()
Dense()
Activation()
http://woebot.io
png