Thursday, July 18, 2019

Data Analysis Over 10 years of hourly energy consumption using Python Seaborn Pandas

Exp3

Hourly Energy Consumption

Step 1:

Import Library

In [11]:
import pandas as pd
import numpy as np
import seaborn as sns
import os
import datetime
%matplotlib inline

Step 2:

Read the DataSet

In [6]:
df = pd.read_csv("AEP_hourly.csv")
df.head(3)
Out[6]:
Datetime AEP_MW
0 2004-12-31 01:00:00 13478.0
1 2004-12-31 02:00:00 12865.0
2 2004-12-31 03:00:00 12577.0
In [7]:
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 121273 entries, 0 to 121272
Data columns (total 2 columns):
Datetime    121273 non-null object
AEP_MW      121273 non-null float64
dtypes: float64(1), object(1)
memory usage: 1.9+ MB
In [8]:
df.describe()
Out[8]:
AEP_MW
count 121273.000000
mean 15499.513717
std 2591.399065
min 9581.000000
25% 13630.000000
50% 15310.000000
75% 17200.000000
max 25695.000000

Step 3:

seperate date and time

In [18]:
df["New_Date"] = pd.to_datetime(df["Datetime"]).dt.date
df["New_Time"] = pd.to_datetime(df["Datetime"]).dt.time
In [20]:
df1 = df
df1.head(2)
Out[20]:
Datetime AEP_MW New_Date New_Time
0 2004-12-31 01:00:00 13478.0 2004-12-31 01:00:00
1 2004-12-31 02:00:00 12865.0 2004-12-31 02:00:00

Step 4:

When was the higest Energy Consumption and which Year

Maximum

In [21]:
df1[df1["AEP_MW"] == df["AEP_MW"].max()]
Out[21]:
Datetime AEP_MW New_Date New_Time
30221 2008-10-20 14:00:00 25695.0 2008-10-20 14:00:00

Minimum

In [22]:
df1[df1["AEP_MW"] == df["AEP_MW"].min()]
Out[22]:
Datetime AEP_MW New_Date New_Time
100759 2016-10-02 05:00:00 9581.0 2016-10-02 05:00:00

Conclusion : From Step 4 we can say that Maximum Energy was Consumed during 2016-10-02 at 05:00:00 and it was 9581.0 MW and Minimum was on 2008-10-20 at 14:00:00 and was 25695.0 MW

Step 5:

Plot and Data visualization

In [24]:
sns.distplot(df1["AEP_MW"])
Out[24]:
<matplotlib.axes._subplots.AxesSubplot at 0x1c2c00b550>

Step 7: Extract Date and Time

In [36]:
df1.head(2)
Out[36]:
Datetime AEP_MW New_Date New_Time
0 2004-12-31 01:00:00 13478.0 2004-12-31 01:00:00
1 2004-12-31 02:00:00 12865.0 2004-12-31 02:00:00
In [38]:
df1["Year"] = pd.DatetimeIndex(df['New_Date']).year
In [39]:
df1.head(2)
Out[39]:
Datetime AEP_MW New_Date New_Time Year
0 2004-12-31 01:00:00 13478.0 2004-12-31 01:00:00 2004
1 2004-12-31 02:00:00 12865.0 2004-12-31 02:00:00 2004

Check how many Years are Unique

In [41]:
df1["Year"].unique()
Out[41]:
array([2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014,
       2015, 2016, 2017, 2018])

This Tell us that there are 10 Unique Year from 2004 to 2018

In [43]:
df1[df1["Year"] == 2013].nunique()
Out[43]:
Datetime    8758
AEP_MW      5601
New_Date     365
New_Time      24
Year           1
dtype: int64

shows the Relationship of Energy vs Year

In [44]:
sns.lineplot(x=df1["Year"],y=df1["AEP_MW"], data=df1)
Out[44]:
<matplotlib.axes._subplots.AxesSubplot at 0x1c32358630>

Regression

In [48]:
sns.jointplot(x=df1["Year"],
              y=df1["AEP_MW"],
              data=df1,
             kind="reg")
Out[48]:
<seaborn.axisgrid.JointGrid at 0x1c2e91dbe0>
In [49]:
sns.jointplot(x=df1["Year"],
              y=df1["AEP_MW"],
              data=df1,
             kind="kde")
Out[49]:
<seaborn.axisgrid.JointGrid at 0x1c2e3e8358>

Let us see the relation between Energy and Time

In [51]:
sns.lineplot(x=df1["New_Time"],y=df1["AEP_MW"], data=df1)
Out[51]:
<matplotlib.axes._subplots.AxesSubplot at 0x1c31c3b748>

No comments:

Post a Comment

Learn How to Connect to the Glue Data Catalog using AWS Glue Iceberg REST endpoint

gluecat Learn How to Connect to the Glue Data Catalog using AWS Glue Iceberg REST e...