Choosing the Correct K Value for the K-Means Clustering Algorithm¶
Soumil Nitin Shah¶
Bachelor's in Electronic Engineering | Master's in Electrical Engineering | Master's in Computer Engineering
- Website : https://soumilshah.herokuapp.com
- Github: https://github.com/soumilshah1995
- Linkedin: https://www.linkedin.com/in/shah-soumil/
- Blog: https://soumilshah1995.blogspot.com/
- Youtube : https://www.youtube.com/channel/UC_eOodxvwS_H7x2uLQa-svw?view_as=subscriber
- Facebook Page : https://www.facebook.com/soumilshah1995/
- Email : shahsoumil519@gmail.com
- projects : https://soumilshah.herokuapp.com/project
Hello! I'm Soumil Nitin Shah, a software and hardware developer based in New York City. I have completed my Bachelor's in Electronic Engineering and a double Master's in Computer and Electrical Engineering. I develop Python-based cross-platform desktop applications, web pages, software, REST APIs, databases, and much more. I have more than 2 years of experience in Python.
from IPython.display import YouTubeVideo
YouTubeVideo('Q_u3Rak8yyQ')
Step 1:¶
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import MinMaxScaler
from sklearn.cluster import KMeans
%matplotlib inline
df = pd.read_csv("AEP_hourly.csv")
df.head(2)
# Extract date parts (month, year, date, time, ISO week, day name) from the timestamp
dataset = df
ts = pd.to_datetime(df["Datetime"])  # parse the timestamps once and reuse them
dataset["Month"] = ts.dt.month
dataset["Year"] = ts.dt.year
dataset["Date"] = ts.dt.date
dataset["Time"] = ts.dt.time
dataset["Week"] = ts.dt.isocalendar().week  # .dt.week was removed in pandas 2.0
dataset["Day"] = ts.dt.day_name()
dataset = df.set_index("Datetime")
dataset.index = pd.to_datetime(dataset.index)
dataset.head(1)
# Daily mean of the numeric columns only (Date, Time and Day cannot be averaged)
NewDataSet = dataset.select_dtypes(include="number").resample('D').mean()
NewDataSet.head(2)
Create Dataset¶
x = dataset["Time"]
y = dataset["AEP_MW"]
# KMeans expects a 2-D array of shape (n_samples, n_features)
y = y.values.reshape(-1,1)
print(y.shape)
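error = []
k = []
for i in range(1, 15):
    # Unsupervised learning: fit K-Means with i clusters
    kmean = KMeans(n_clusters=i)
    kmean.fit(y)
    # inertia_ is the within-cluster sum of squared distances
    error.append(kmean.inertia_)
    # Record the actual number of clusters (1..14) so the x-axis matches n_clusters
    k.append(i)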
plt.plot(k, error, 'bx-')
plt.xlabel('k')
plt.ylabel('Sum_of_squared_distances')
plt.title('Elbow Method For Optimal k')
plt.grid(True)
plt.show()
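The elbow is the value of k after which adding more clusters barely reduces the sum of squared distances. As a rough sketch of how to read that off numerically, the snippet below repeats the loop on synthetic 1-D data and prints how much the inertia falls at each step. The data (three groups centred at 10, 20 and 30), the variable names, and the choice of `n_init=10` are assumptions made for illustration; they are not taken from the AEP dataset.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic 1-D data (assumed for illustration, not the AEP dataset):
# three groups centred at 10, 20 and 30 with unit spread.
rng = np.random.default_rng(0)
values = np.concatenate([rng.normal(c, 1.0, 200) for c in (10, 20, 30)])
X = values.reshape(-1, 1)  # KMeans expects shape (n_samples, n_features)

ks = list(range(1, 9))
inertias = []
for n in ks:
    model = KMeans(n_clusters=n, n_init=10, random_state=0).fit(X)
    inertias.append(model.inertia_)

# Absolute drop in inertia when moving from k-1 to k clusters.
drops = [inertias[i - 1] - inertias[i] for i in range(1, len(ks))]
for n, d in zip(ks[1:], drops):
    print(f"k={n}: inertia falls by {d:.1f}")
```

On data like this the drops should be large up to the true number of groups and small afterwards; on real data the bend is usually less sharp, so the plot is still worth inspecting by eye.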
Create Model with Correct K Values¶
kmean = KMeans(n_clusters=2,init='k-means++',max_iter=400,random_state=101)
y = dataset["AEP_MW"].values.reshape(-1,1)
x = dataset["Time"].values
kmean.fit(y)
kmean.labels_
dataset["Cluster"] = kmean.labels_
# Take First 2000 POINTS
x = dataset[dataset["Cluster"] == 0]["AEP_MW"][0:2000].values
y = dataset[dataset["Cluster"] == 1]["AEP_MW"][0:2000].values
cluster = dataset["Cluster"][0:2000].values
# CHECK THE SHAPE
print(x.shape)
print(y.shape)
print(cluster.shape)
# seaborn 0.12 and later require x and y to be passed as keyword arguments
sns.scatterplot(x=x, y=y, s=120, hue=cluster)
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(x,
           y,
           zs=0,
           marker="s",
           s=40,
           c=cluster.astype(float),
           depthshade=True,
           label='Cluster 1 and 2')
plt.title("Cluster 1 and Cluster 2")
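The labels that K-Means assigns are arbitrary integers, so label 0 is not guaranteed to be the low-load group. A minimal sketch of how one might check which cluster is which is to look at `cluster_centers_` and the number of points per label. The synthetic load values below (two groups around 12000 MW and 18000 MW) and the names `load`, `centers`, `counts` and `low_label` are assumptions for illustration only.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in for hourly load values in MW (assumed, not the AEP data).
rng = np.random.default_rng(1)
load = np.concatenate([rng.normal(12000, 500, 300), rng.normal(18000, 500, 300)])
X = load.reshape(-1, 1)

model = KMeans(n_clusters=2, init="k-means++", n_init=10,
               max_iter=400, random_state=101).fit(X)

centers = model.cluster_centers_.ravel()  # one centre per cluster label
counts = np.bincount(model.labels_)       # number of points assigned to each label
low_label = int(np.argmin(centers))       # label whose centre is the lower load

for label, (c, n) in enumerate(zip(centers, counts)):
    kind = "low" if label == low_label else "high"
    print(f"cluster {label}: centre={c:.0f} MW, points={n}, {kind} load")
```

The same three lines (`cluster_centers_`, `np.bincount`, `np.argmin`) can be applied to the `kmean` model fitted above to see which label corresponds to the lower `AEP_MW` values.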