Text Classification P4
Saving the Model
Up to this point we have retrained our models every time we wanted to use them. That is fine for small models that take only a few seconds to train, but it is not realistic for larger ones. Luckily for us, keras provides a very easy way to save a model:
model.save("model.h5") # name it whatever you want but end with .h5
Loading the Model
Now that we have saved a trained model, we don't need to retrain it every time we want to use it. We can load the saved model with the following line. Just make sure the .h5 file is in the same directory as your python script, or pass the path to wherever you saved it.
model = keras.models.load_model("model.h5")
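Putting saving and loading together, a common pattern is to load the model if the file already exists and only train when it does not. Here is a minimal sketch of that idea. The helper name load_or_train and its load_fn, train_fn and save_fn arguments are made up for illustration; they are not part of keras.

```python
import os


def load_or_train(path, load_fn, train_fn, save_fn):
    """Return a model, training it only when no saved copy exists at path."""
    if os.path.exists(path):
        # A saved model is already on disk, so skip training entirely
        return load_fn(path)
    model = train_fn()
    save_fn(model, path)  # store it so the next run can just load it
    return model
```

In this tutorial you would likely call it as load_or_train("model.h5", keras.models.load_model, train_model, lambda m, p: m.save(p)), where train_model is a hypothetical function of your own that builds, fits and returns the model.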
Making Predictions
Now it is time to use our saved model to make predictions. This is a little bit harder than it looks because we need to consider the following:
– our model accepts integer encoded data
– our model needs reviews that are exactly 250 words long
This means we can’t just pass any string of text into our model. It will need to be reshaped and reformed to meet the criteria above.
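To make the length requirement concrete, here is a small pure-python sketch of what fitting a review to a fixed length involves. The function name fit_length is made up for this example, and the pad value of 0 is an assumption. It is only meant to mirror how keras's pad_sequences behaves with padding="post" and its default truncating="pre": short reviews get the pad value added at the end, and long reviews lose words from the front.

```python
def fit_length(seq, maxlen=250, pad_value=0):
    """Pad seq at the end, or drop items from the front, so it has maxlen items."""
    if len(seq) >= maxlen:
        # Too long: keep only the last maxlen items
        return seq[len(seq) - maxlen:]
    # Too short: add pad_value at the end until the length is maxlen
    return seq + [pad_value] * (maxlen - len(seq))


print(fit_length([1, 14, 22], maxlen=5))         # [1, 14, 22, 0, 0]
print(fit_length([1, 2, 3, 4, 5, 6], maxlen=5))  # [2, 3, 4, 5, 6]
```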
Transforming our Data
The data I’ll use for this tutorial is the raw text of a review of one of my favorite movies, The Lion King. I’m storing it in a text file called “test.txt” that you can download here.
To start we will need to integer encode the data. We will do this using the following function:
def review_encode(s):
    encoded = [1]  # every review starts with 1, the <START> token
    for word in s:
        if word.lower() in word_index:
            encoded.append(word_index[word.lower()])
        else:
            encoded.append(2)  # 2 is the <UNK> token for words we don't know
    return encoded
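To see what review_encode produces, here is a small standalone run. The tiny word_index below is invented for the example (the real one comes from the dataset and is far larger), and the function is repeated so the snippet runs on its own.

```python
# Toy vocabulary, made up for this example only
word_index = {"<PAD>": 0, "<START>": 1, "<UNK>": 2, "the": 4, "movie": 5, "great": 6}


def review_encode(s):
    encoded = [1]  # start token
    for word in s:
        if word.lower() in word_index:
            encoded.append(word_index[word.lower()])
        else:
            encoded.append(2)  # unknown word
    return encoded


# "was" is not in the toy vocabulary, so it becomes 2
print(review_encode("The movie was great".split(" ")))  # [1, 4, 5, 2, 6]
```

Note that the function expects a list of words. If you pass it a plain string it will loop over the individual characters instead.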
Next we will open our text file, read in each of the reviews (in this case just one) and use the model to predict whether it is positive or negative.
with open("test.txt", encoding="utf-8") as f:
    for line in f.readlines():
        nline = line.replace(",", "").replace(".", "").replace("(", "").replace(")", "").replace(":", "").replace("\"", "").strip().split(" ")
        encode = review_encode(nline)
        encode = keras.preprocessing.sequence.pad_sequences([encode], value=word_index["<PAD>"], padding="post", maxlen=250)  # make the data 250 words long
        predict = model.predict(encode)
        print(line)
        print(encode)
        print(predict[0])
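The printed prediction is a single number rather than a label. Assuming the model ends in one sigmoid output and was trained with 0 meaning negative and 1 meaning positive, it can be turned into a label with something like the sketch below. The helper name sentiment_label and the 0.5 cutoff are assumptions for this example, not part of keras.

```python
def sentiment_label(score, threshold=0.5):
    """Map a score between 0 and 1 to a sentiment label."""
    return "positive" if score >= threshold else "negative"


print(sentiment_label(0.93))  # positive
print(sentiment_label(0.12))  # negative
```

Under those assumptions, inside the loop above you would call sentiment_label(predict[0][0]) to label each review.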