Building a Personal AI Assistant: Part 2

Using intent classification & entity extraction to understand natural language - Intent Classification

Robert MacWha
Nerd For Tech

Photo by Thomas Kolnowski on Unsplash, edited

Welcome back! This is the second part of a mini-series I’m writing on building a personal virtual assistant. I recommend reading the previous article before starting this one, because it contains code this part builds on. If you are coming from the previous article, it’s nice to see you again! It’s time to use the dataset we spent so long preparing and pre-processing. It’s time to build our model.

Transformers

Our intent classification model will consist of a pre-trained transformer with a few extra layers tacked on. Transformers are machine learning models that are especially good at understanding natural language. They build statistical representations of language that let them capture the meaning behind words. The downside is that transformers are often huge, both in terms of layers and training data. GPT-2, as an example, was trained on over 40GB of raw text. Because of this, it’s best to use pre-trained models.

Pre-trained models

Pre-trained transformers, such as BERT, are transformers that have already been trained on enormous amounts of text. This means that someone else has done most of the training work for us. All we need to do is fine-tune the model so it works well for our use case. In this article, I will be using distilBERT as my pre-trained transformer. DistilBERT is a smaller, distilled version of the much larger BERT model. By using distilBERT instead of BERT we get a model that runs about 60% faster and is roughly 40% smaller, while keeping around 97% of BERT’s language-understanding performance.

Building our model

Layout for the intent classification transformer.

Our intent classification model will take two inputs, input_ids and input_attention. Both were generated by the tokenizer from the previous article: input_ids holds the token IDs for each sentence, and input_attention is the attention mask that tells distilBERT which positions contain real tokens and which are just padding. Both inputs have a length of 128, which corresponds to the padding we specified in our tokenizer.
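As a refresher, the tokenized inputs might look something like this (a minimal sketch, assuming the distilbert-base-uncased tokenizer and the 128-token padding from the previous article; the sample sentence is just an illustration):

from transformers import DistilBertTokenizerFast

tokenizer = DistilBertTokenizerFast.from_pretrained('distilbert-base-uncased')

# tokenize one sample sentence, padded/truncated to 128 tokens
tokens = tokenizer(
    'turn on the kitchen lights',
    max_length=128,
    padding='max_length',
    truncation=True,
    return_tensors='np',
)
print(tokens['input_ids'].shape)       # (1, 128) -> feeds the input_ids layer
print(tokens['attention_mask'].shape)  # (1, 128) -> feeds the input_attention layer

With those two arrays in mind, we can define the matching Keras input layers.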

import tensorflow as tf
from transformers import TFDistilBertModel

# define the input layers
input_ids_layer = tf.keras.layers.Input(
    shape=(128,),  # shape of 128 corresponds to the padding length
    name='input_ids',
    dtype='int32',
)
input_attention_layer = tf.keras.layers.Input(
    shape=(128,),  # shape of 128 corresponds to the padding length
    name='input_attention',
    dtype='int32',
)

We can then create the distilbert-base-uncased transformer, the heart of the model. Both input layers are fed into this transformer, which produces the last_hidden_state tensor. This tensor has a shape of [batch_size, 128, 768]: distilBERT takes in our text input and returns a 768-dimensional vector for each of the 128 token positions.

# create the pre-trained model
transformer = TFDistilBertModel.from_pretrained('distilbert-base-uncased')

# feed the inputs into the pre-trained model, producing a tensor of shape
# (batch_size, sequence_length=128, hidden_size=768)
last_hidden_state = transformer([
    input_ids_layer,
    input_attention_layer])[0]
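If you want to sanity-check the shape at this point, you can print it; with the 128-token padding it should come out as (None, 128, 768), where None is the batch dimension:

print(last_hidden_state.shape)  # (None, 128, 768)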

To make things easier for our classifier, we will only use the 768-dimensional vector at the first token position, known as the [CLS] token. Using this position is simply the convention for BERT-style models: the [CLS] vector acts as a condensed summary of the whole sequence. It is then fed into a dense layer which acts as our output. The output layer applies the softmax function to convert its raw scores into probabilities, so we can compare how likely each intent classification is.
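The snippet below refers to intent_class_count and weight_initializer, which are assumed to have been defined earlier in your script. If you don’t already have them, minimal stand-ins might look like this (my assumptions, not necessarily the originals, and assuming the intent labels are one-hot encoded):

# number of distinct intents, taken from the one-hot encoded training labels
intent_class_count = y_train_intents.shape[1]
# a standard Glorot initializer for the dense layer's weights
weight_initializer = tf.keras.initializers.GlorotNormal(seed=42)

With those in place, the output head looks like this: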

# the [CLS] token contains a condensed representation of the entire last_hidden_state tensor
cls_token = last_hidden_state[:, 0, :]

# create the output layer
intent_output = tf.keras.layers.Dense(
    intent_class_count,
    activation='softmax',
    kernel_initializer=weight_initializer,
    kernel_constraint=None,
    bias_initializer='zeros',
    name='intent_output'
)(cls_token)

Finally, a Keras model can be defined to include all of our layers.

# define the model
model = tf.keras.Model(
    [input_ids_layer, input_attention_layer],
    [intent_output])
model.summary()

The last thing to do with the model is to compile it. I have found that the Adam optimizer paired with a categorical crossentropy loss function works best. Feel free to experiment with the optimizer; however, since the intent labels are one-hot encoded, it’s best to keep the loss function as-is.

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=5e-5),
    loss=tf.keras.losses.CategoricalCrossentropy(),
    metrics=[tf.keras.metrics.CategoricalAccuracy('categorical_accuracy')])

Training the model

After compiling the intent classification model you should have ended up with a script similar to this one. Running model.fit on our dataset takes approximately forty seconds per epoch in Google Colab with a GPU runtime. Unless many new intents have been added, the model tends to converge after just two epochs.

# train the model
history = model.fit(
    x=[x_train_ids, x_train_attention],
    y=[y_train_intents],
    epochs=2,
    batch_size=16,
    verbose=1
)

One important thing to note is that no validation data is provided. This is because it’s much easier to judge how well the model works through interaction. By interacting with the model and actively looking for mistakes, you can add targeted sentence templates to the training dataset.

Results

In order to interact with the model, you first need to write a quick interaction script. Since this isn’t directly related to this article’s topic, I suggest that you just copy the final two code blocks from the provided Google Colab. Once they’re in your script, run the entire thing and see how well your intent classifier works.
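For reference, a minimal classify helper could look roughly like the sketch below. This is not the Colab’s exact code; intent_names is an assumed stand-in for the list of intent labels built alongside the training data.

# a rough sketch of an interaction helper
# (intent_names is an assumed name for the list of intent labels)
def classify(text):
    model_input = tokenizer(
        text,
        max_length=128,
        padding='max_length',
        truncation=True,
        return_tensors='tf',
    )
    # convert the tokenizer's dictionary output into the list of inputs the model expects
    model_input = [model_input['input_ids'], model_input['attention_mask']]
    probabilities = model.predict(model_input)[0]
    return intent_names[probabilities.argmax()]

print(classify('what will the weather be like tomorrow?'))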

Interacting with the intent classification model, testing each intent.

With any luck, your intent classifier should work on the first try. If not, here are a few of the problems I encountered.

  • Can’t import transformers: Install the transformers library with pip. This is also required in a Colab Notebook.
!pip install transformers
  • Model not training: Make sure that you are using a padding of 128 and that the model’s inputs are of shape (128,).
  • Long training times: Confirm that your runtime is using a GPU. This can be changed in Runtime > Change runtime type.
  • Loss low but model interaction is broken: Make sure that in the classify function you are converting the tokenized model_input from a dictionary into a list of tensors.
model_input = [
    model_input['input_ids'],
    model_input['attention_mask']]

And that’s everything! With intent classification done, why not try to add on some more functionality like text parsing or entity extraction? Oh wait, I’m actually working on articles for both of those right now! Make sure to follow me so you get notified when I release them and can turn this virtual assistant into one that rivals Siri or Google Home!

Here’s a sneak peek at what’s coming!

Thanks for reading my article! Feel free to check out my portfolio, message me on LinkedIn if you have anything to say, or follow me on Medium to get notified when I post another article!

