Choosing Correct K Value for kneighbors Clustering Algorithm¶
Soumil Nitin Shah¶
Bachelor in Electronic Engineering | Masters in Electrical Engineering | Master in Computer Engineering |
- Website : https://soumilshah.herokuapp.com
- Github: https://github.com/soumilshah1995
- Linkedin: https://www.linkedin.com/in/shah-soumil/
- Blog: https://soumilshah1995.blogspot.com/
- Youtube : https://www.youtube.com/channel/UC_eOodxvwS_H7x2uLQa-svw?view_as=subscriber
- Facebook Page : https://www.facebook.com/soumilshah1995/
- Email : shahsoumil519@gmail.com
- projects : https://soumilshah.herokuapp.com/project
Hello! I’m Soumil Nitin Shah, a Software and Hardware Developer based in New York City. I have completed by Bachelor in Electronic Engineering and my Double master’s in Computer and Electrical Engineering. I Develop Python Based Cross Platform Desktop Application , Webpages , Software, REST API, Database and much more I have more than 2 Years of Experience in Pythonm
import pandas as pd
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
import scikitplot as skplt
%matplotlib inline
df = pd.read_csv('KNN_Project_Data')
df.head(2)
df = pd.read_csv('KNN_Project_Data')
X_Data = df[['XVPM', 'GWYH', 'TRAT', 'TLLZ', 'IGGA', 'HYKR', 'EDFS', 'GUUB', 'MGJM','JHZC']]
Y_Data = df ['TARGET CLASS']
X_Train, X_Test, Y_Train, Y_Test = train_test_split(X_Data, Y_Data, test_size=0.4, random_state=101)
Choose K values¶
errorrate = []
for i in range(1,50):
newmodel = KNeighborsClassifier(n_neighbors = i)
newmodel.fit(X_Train, Y_Train)
pred = newmodel.predict(X_Test)
errorrate.append(np.mean(pred != Y_Test))
plt.figure(figsize=(10,6))
plt.plot(range(1, 50), errorrate, color='blue', linestyle='dashed', marker='o', markerfacecolor='red', markersize=10)
plt.title('Error Rate vs. K Value')
plt.xlabel('K')
plt.ylabel('Error Rate')
Create a model¶
model = KNeighborsClassifier(n_neighbors=58)
model.fit(X_Train, Y_Train)
pred = model.predict(X_Test)
pred
confusion_matrix(Y_Test,pred)
print(classification_report(Y_Test, pred))