Text Classification P2

Understanding the Model Architecture

Here is the sequence of layers we defined for our text classification model:
- Word Embedding Layer
- GlobalAveragePooling1D
- Dense
- Dense
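Assuming the model from the previous part of this series (a vocabulary size of 10,000 and 16-dimensional embeddings, both assumptions here), the stack above can be sketched in Keras like this:

```python
from tensorflow import keras

# Vocabulary size and embedding dimension are assumptions for illustration.
VOCAB_SIZE = 10000
EMBED_DIM = 16

model = keras.Sequential([
    # Maps each integer-encoded word to a 16-dimensional vector.
    keras.layers.Embedding(VOCAB_SIZE, EMBED_DIM),
    # Averages the word vectors into a single 16-dimensional vector.
    keras.layers.GlobalAveragePooling1D(),
    # Hidden layer that looks for patterns in the averaged embedding.
    keras.layers.Dense(16, activation="relu"),
    # Output layer: probability that the review is positive.
    keras.layers.Dense(1, activation="sigmoid"),
])

model.summary()
```

Each of these layers is explained below.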

We are familiar with what the two dense layers do, but what the heck is an embedding layer, and what does GlobalAveragePooling1D do?

What is Word Embedding?

To understand what a word embedding layer is and why it is so important, we will compare two very simple sentences: first in human-readable form and second in integer-encoded form.

Human Readable:
Have a great day

Have a good day

Integer encoded:
[0, 1, 2, 3]

[0, 1, 4, 3]

Mappings: {0: "Have", 1: "a", 2: "great", 3: "day", 4: "good"}
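The encoding above can be produced with a quick sketch like this, using the mapping from the example (inverted so we can look up each word's code):

```python
# Vocabulary mapping from the example above, inverted so each word
# points to its integer code.
word_index = {"Have": 0, "a": 1, "great": 2, "day": 3, "good": 4}

def encode(sentence):
    """Replace each word in the sentence with its integer code."""
    return [word_index[word] for word in sentence.split()]

print(encode("Have a great day"))  # [0, 1, 2, 3]
print(encode("Have a good day"))   # [0, 1, 4, 3]
```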

Looking at the sentences above, we as humans know that the two sentences are very similar and pretty much mean the same thing. However, when we look at the integer-encoded version, all we can tell is that the words at index 2 (position 3) are different. We have no idea how different they are.

This is where a word embedding layer comes in. We want a way to capture not only the contents of a sentence but also its context. A word embedding layer will attempt to determine the meaning of each word by mapping it to a position in vector space. If you don't care about the math or don't understand it, think of it as just grouping similar words together.

An example of something we'd hope an embedding layer would do for us:

Maybe "good", "great", "fantastic" and "awesome" are placed close to each other and words like "bad", "horrible", "sucks" are placed close together. We'd also hope that these groupings of words are placed far apart from each other representing that they have very different meanings.
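We can measure "closeness" in vector space with cosine similarity. Here is a toy sketch with made-up 2-D vectors (a real embedding layer learns these positions during training, and uses far more dimensions):

```python
import numpy as np

# Made-up 2-D "embeddings" purely for illustration; a trained embedding
# layer would learn these positions itself.
vectors = {
    "great":    np.array([0.9, 0.8]),
    "good":     np.array([0.8, 0.9]),
    "horrible": np.array([-0.9, -0.7]),
}

def cosine_similarity(a, b):
    """1.0 means same direction (similar), -1.0 means opposite."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine_similarity(vectors["great"], vectors["good"]))      # close to 1
print(cosine_similarity(vectors["great"], vectors["horrible"]))  # close to -1
```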

GlobalAveragePooling1D Layer

This layer is nothing special: it simply scales down our data's dimensionality to make things computationally easier for the later layers. Because our embedding layer maps thousands and thousands of words into a fairly high-dimensional vector space, its output for a review is one vector per word. GlobalAveragePooling1D averages those word vectors together, collapsing the whole review into a single vector of the embedding dimension.
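The averaging is easy to see with NumPy (shapes here are made up for illustration):

```python
import numpy as np

# Pretend embedding-layer output: 1 review of 4 words,
# each word represented as a 16-dimensional vector.
embedded = np.random.rand(1, 4, 16)

# GlobalAveragePooling1D averages over the sequence axis (axis=1),
# collapsing (batch, words, embedding_dim) -> (batch, embedding_dim).
pooled = embedded.mean(axis=1)

print(embedded.shape)  # (1, 4, 16)
print(pooled.shape)    # (1, 16)
```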

Dense Layers

The last two layers in our network are dense, fully connected layers. The output layer is a single neuron that uses the sigmoid function to produce a value between 0 and 1, which represents the likelihood of the review being positive or negative. The layer before it contains 16 neurons with a ReLU activation function, designed to find patterns between the different words present in the review.
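To see why sigmoid is a natural fit for the output neuron, here is a small sketch of how it squashes any raw value into the range (0, 1):

```python
import numpy as np

def sigmoid(x):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

# Large positive raw output -> confident "positive review".
print(sigmoid(4.0))   # ~0.98
# Raw output of 0 -> completely unsure, 50/50.
print(sigmoid(0.0))   # 0.5
# Large negative raw output -> confident "negative review".
print(sigmoid(-4.0))  # ~0.02
```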