
KNN P.3 – Implementation

For more detail, see the official scikit-learn KNN documentation.

Training a KNN Classifier

Creating a KNN classifier is almost identical to how we created the linear regression model. The only difference is that we can specify how many neighbors to look for with the n_neighbors argument.

from sklearn.neighbors import KNeighborsClassifier

model = KNeighborsClassifier(n_neighbors=9)

To train our model we follow precisely the same steps as outlined earlier.

model.fit(x_train, y_train)

And once again to score our model we will do the following.

acc = model.score(x_test, y_test)
print(acc)
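To make the effect of n_neighbors and the meaning of the score more concrete, here is a minimal sketch. It uses synthetic data from make_classification as a stand-in for the car data set, and the names X_demo, y_demo, demo_model and scores are made up for this illustration. It trains a classifier for a few values of n_neighbors and checks that score gives the same number as counting correct predictions by hand.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data (an assumption; the tutorial itself uses car.data)
X_demo, y_demo = make_classification(
    n_samples=300, n_features=6, n_informative=4, random_state=0
)
xd_train, xd_test, yd_train, yd_test = train_test_split(
    X_demo, y_demo, test_size=0.1, random_state=0
)

scores = {}
for k in (1, 5, 9):
    demo_model = KNeighborsClassifier(n_neighbors=k)
    demo_model.fit(xd_train, yd_train)
    # score() returns the mean accuracy on the data it is given
    acc_k = demo_model.score(xd_test, yd_test)
    # The same value by hand: the fraction of predictions that match
    manual = np.mean(demo_model.predict(xd_test) == yd_test)
    scores[k] = (acc_k, manual)
    print("k =", k, "score:", acc_k, "manual accuracy:", manual)
```

Which value of n_neighbors works best depends on the data, so trying a few values like this is a reasonable starting point rather than a rule.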

Testing Our Model

If we'd like to see how our model performs on the individual elements of our test data, we can do the following.

predicted = model.predict(x_test)
names = ["acc", "good", "unacc", "vgood"]  # LabelEncoder assigns integer codes in alphabetical order

for x in range(len(predicted)):
    print("Predicted: ", names[predicted[x]], "Data: ", x_test[x], "Actual: ", names[y_test[x]])

# This will display the predicted class, our data and the actual class
# We create a names list so that we can convert our integer predictions into 
# their string representation 

Our output should look like the following.

[Image: kuva_2023-04-23_172525134.png]
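The order of the names list matters, because each integer has to line up with the string it was encoded from. As a small sketch (the demo_le, labels and demo_names names are made up for this illustration), the encoder's classes_ attribute shows which string belongs to each integer, and inverse_transform converts the integers back without a hand-written list.

```python
from sklearn.preprocessing import LabelEncoder

demo_le = LabelEncoder()
labels = ["unacc", "acc", "vgood", "good", "unacc"]
encoded = demo_le.fit_transform(labels)

# classes_ holds the string for each integer code, in sorted order
demo_names = demo_le.classes_.tolist()
print(demo_names)        # ['acc', 'good', 'unacc', 'vgood']
print(encoded.tolist())  # [2, 0, 3, 1, 2]

# inverse_transform maps the integer codes back to their strings
decoded = demo_le.inverse_transform(encoded).tolist()
print(decoded)
```

Note that this only works if the encoder fitted on the class column is kept around. In the full code the same le object is refitted for every column, so after the last fit_transform it holds the class labels.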

Looking at Neighbors

The KNN model has a method that allows us to see the neighbors of a given data point. We can use this information to plot our data and get a better idea of where our model may lack accuracy. We can use model.kneighbors to do this.

Note: the .kneighbors method takes a 2D array as input. This means that if we want to pass a single data point we need to surround it with [] so that it is in the right shape.

Parameters: the parameters for .kneighbors are as follows: X (2D array of data points), n_neighbors (int), return_distance (True or False).

Return: this returns an array with the index, in our training data, of each neighbor. If return_distance is True, it returns two arrays: the distances from our data point to each neighbor first, followed by the indices.

predicted = model.predict(x_test)
names = ["acc", "good", "unacc", "vgood"]  # LabelEncoder assigns integer codes in alphabetical order

for x in range(len(predicted)):
    print("Predicted: ", names[predicted[x]], "Data: ", x_test[x], "Actual: ", names[y_test[x]])
    # Now we will see the neighbors of each point in our testing data
    n = model.kneighbors([x_test[x]], n_neighbors=9, return_distance=True)
    print("N: ", n)

Our output should now be a mess that looks like this.

[Image: kuva_2023-04-23_172610064.png]
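The raw output is easier to read once the two returned arrays are unpacked. Here is a minimal sketch on a tiny made-up data set (toy_x, toy_y, toy_model and point are all hypothetical names, not part of the tutorial's data) that prints each neighbor alongside its label and distance.

```python
from sklearn.neighbors import KNeighborsClassifier

# Tiny made-up training set (for illustration only, not car.data)
toy_x = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
toy_y = [0, 0, 0, 1, 1, 1]

toy_model = KNeighborsClassifier(n_neighbors=3)
toy_model.fit(toy_x, toy_y)

point = [5, 4]
# kneighbors expects 2D input, so the single point is wrapped in a list
distances, indices = toy_model.kneighbors(
    [point], n_neighbors=3, return_distance=True
)

# Both arrays have one row per query point; row 0 belongs to our point.
# The indices refer to positions in the training data passed to fit().
for dist, idx in zip(distances[0], indices[0]):
    print("neighbor:", toy_x[idx], "label:", toy_y[idx],
          "distance:", round(float(dist), 3))

print("predicted:", toy_model.predict([point])[0])
```

The same unpacking could be applied inside the loop above by looking up x_train and y_train at each returned index.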

Full Code

import sklearn.model_selection  # loads the submodule used by train_test_split below
from sklearn.neighbors import KNeighborsClassifier
from sklearn import preprocessing
import pandas as pd

data = pd.read_csv("car.data")

le = preprocessing.LabelEncoder()
buying = le.fit_transform(list(data["buying"]))
maint = le.fit_transform(list(data["maint"]))
door = le.fit_transform(list(data["door"]))
persons = le.fit_transform(list(data["persons"]))
lug_boot = le.fit_transform(list(data["lug_boot"]))
safety = le.fit_transform(list(data["safety"]))
cls = le.fit_transform(list(data["class"]))

predict = "class"

X = list(zip(buying, maint, door, persons, lug_boot, safety))
y = list(cls)

x_train, x_test, y_train, y_test = sklearn.model_selection.train_test_split(X, y, test_size = 0.1)

model = KNeighborsClassifier(n_neighbors=9)

model.fit(x_train, y_train)
acc = model.score(x_test, y_test)
print(acc)

predicted = model.predict(x_test)
names = ["acc", "good", "unacc", "vgood"]  # LabelEncoder assigns integer codes in alphabetical order

for x in range(len(predicted)):
    print("Predicted: ", names[predicted[x]], "Data: ", x_test[x], "Actual: ", names[y_test[x]])
    n = model.kneighbors([x_test[x]], n_neighbors=9, return_distance=True)
    print("N: ", n)