Machine Learning Model for Diabetes Prediction¶
Introduction:¶
Diabetes is a disease in which your blood glucose, or blood sugar, levels are too high. Glucose comes from the foods you eat. Insulin is a hormone that helps the glucose get into your cells to give them energy. With type 1 diabetes, your body does not make insulin. With type 2 diabetes, the more common type, your body does not make or use insulin well. Without enough insulin, the glucose stays in your blood. You can also have prediabetes. This means that your blood sugar is higher than normal but not high enough to be called diabetes. Having prediabetes puts you at a higher risk of getting type 2 diabetes.
Over time, having too much glucose in your blood can cause serious problems. It can damage your eyes, kidneys, and nerves. Diabetes can also cause heart disease, stroke and even the need to remove a limb. Pregnant women can also get diabetes, called gestational diabetes.
Blood tests can show if you have diabetes. One type of test, the A1C, can also check on how you are managing your diabetes. Exercise, weight control and sticking to your meal plan can help control your diabetes. You should also monitor your blood glucose level and take medicine if prescribed.
Hence, predicting diabetes becomes very important. Thanks to advances in machine learning algorithms, we can now predict whether a patient is likely to develop diabetes from the following parameters:¶
Pregnancies, Glucose, BloodPressure, SkinThickness, Insulin, BMI, DiabetesPedigreeFunction, Age, Class
Author:
Soumil Nitin Shah,
Bachelor in Electronic Engineering,
Master in Electrical Engineering,
Master in Computer Engineering,
Full Stack Python Developer
In this article I will show you how to build a machine learning model that learns to predict whether a person has diabetes or not.
The accuracy obtained for this project was about 78%.
Part 1: Clean the DataSet¶
Step 1: Import Libraries¶
import tensorflow as tf
import numpy as np
from sklearn.model_selection import train_test_split
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
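Note: this walkthrough uses the TensorFlow 1.x estimator API (tf.feature_column, tf.estimator.inputs.pandas_input_fn, tf.estimator.LinearClassifier). If your installation is TensorFlow 2.x, one possible workaround, assuming your version still ships the v1 compatibility shim and the estimator module, is to import the compat module instead:
# Assumption: TensorFlow 2.x install that still ships the tf.compat.v1 shim.
import tensorflow.compat.v1 as tf   # expose the 1.x-style API under TF 2.x
tf.disable_v2_behavior()            # run in graph mode, as the code below expects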
Step 2: Read the DataSet¶
df = pd.read_csv("pima-indians-diabetes.csv")
df.columns
Step 3: Rename the Columns¶
Col_name = ["Pregnancies","Glucose","BloodPressure",
"SkinThickness","Insulin","BMI","DiabetesPedigreeFunction",
"Age","Class"]
df = pd.read_csv("pima-indians-diabetes.csv",header=0,names=Col_name)
df.head(3)
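A quick look at the shape and summary statistics (purely illustrative, nothing below is required for the model) helps confirm the columns were renamed correctly:
print(df.shape)       # number of rows and columns
print(df.describe())  # summary statistics for each renamed column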
Step 4: Normalize the Columns so the Machine Learning Model Can Predict Diabetes More Efficiently¶
col_norm =['Pregnancies', 'Glucose', 'BloodPressure', 'SkinThickness', 'Insulin',
'BMI', 'DiabetesPedigreeFunction']
Step 5: Apply a Custom Lambda Function¶
df1_norm = df[col_norm].apply(lambda x :( (x - x.min()) / (x.max()-x.min()) ) )
Step 6: The Columns Are Now Normalized¶
df1_norm.head(3)
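As a sanity check (this is just an illustration, not part of the original pipeline), the same min-max scaling can be reproduced with scikit-learn's MinMaxScaler, and the two results should match:
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()                      # scales each column to the [0, 1] range
scaled = scaler.fit_transform(df[col_norm])  # same formula: (x - min) / (max - min)
print(np.allclose(scaled, df1_norm.values))  # expect True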
Part 2: Prepare the Machine Learning Model¶
Step 1: Create Feature Columns¶
feat_Pregnancies = tf.feature_column.numeric_column('Pregnancies')
feat_Glucose = tf.feature_column.numeric_column('Glucose')
feat_BloodPressure = tf.feature_column.numeric_column('BloodPressure')
feat_SkinThickness_tricep = tf.feature_column.numeric_column('SkinThickness')
feat_Insulin = tf.feature_column.numeric_column('Insulin')
feat_BMI = tf.feature_column.numeric_column('BMI')
feat_DiabetesPedigreeFunction = tf.feature_column.numeric_column('DiabetesPedigreeFunction')
We do not have any categorical columns, so we can skip that step; otherwise we would have used a hash bucket or vocabulary-list approach¶
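For reference, if the dataset did contain a categorical column (say a hypothetical 'Group' column, which is not in this dataset), a hashed bucket feature column could be built roughly like this:
# Hypothetical example only: this dataset has no 'Group' column.
feat_group = tf.feature_column.categorical_column_with_hash_bucket('Group', hash_bucket_size=10)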
Step 2: Create a Feature column List¶
feature_column = [feat_Pregnancies, feat_Glucose, feat_BloodPressure,
feat_SkinThickness_tricep, feat_Insulin,
feat_BMI , feat_DiabetesPedigreeFunction]
Step 3: Let's Create X_Data and Y_Data¶
X_Data = df1_norm
X_Data.head(2)
Create Y_Data¶
Y_Data = df["Class"]
Y_Data.head(3)
1 = Diabetes and 0 = No Diabetes¶
X_Train, X_Test, Y_Train, Y_Test = train_test_split(X_Data,Y_Data, test_size=0.3,random_state=101)
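Here random_state=101 only fixes the shuffle seed so that the 70/30 split is reproducible across runs. A quick, purely illustrative way to inspect the split:
print(X_Train.shape, X_Test.shape)            # roughly 70% / 30% of the rows
print(Y_Train.value_counts(normalize=True))   # class balance in the training labels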
Create an Estimator Input Function¶
input_func = tf.estimator.inputs.pandas_input_fn(x=X_Train, y=Y_Train,
batch_size=40,num_epochs =1000,
shuffle=True)
LinearClassifier Model¶
model = tf.estimator.LinearClassifier(feature_columns=feature_column,
n_classes=2)
Train Model¶
model.train(input_fn=input_func, steps = 5000)
The Loss Drops as Training Progresses¶
Testing the Model on the Test Data¶
The model has never seen this data; let's see how it performs¶
Create a Test Input Function¶
eval_input_func = tf.estimator.inputs.pandas_input_fn(x=X_Test,
y=Y_Test,
batch_size=40,
num_epochs=1,
shuffle=False)
results = model.evaluate(eval_input_func)
results["accuracy"]
The accuracy is about 72%¶
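To inspect individual predictions rather than just the aggregate accuracy, the trained estimator can also be queried with predict (a minimal sketch using the same TF 1.x input-function API as above; the exact keys in each prediction dict depend on your TensorFlow version):
pred_input_func = tf.estimator.inputs.pandas_input_fn(x=X_Test, batch_size=40,
                                                      num_epochs=1, shuffle=False)
predictions = list(model.predict(pred_input_func))
print(predictions[0])   # per-patient dict with class_ids, probabilities, logits, ...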
Approach 2: Deep Neural Network (DNNClassifier)¶
Create an Estimator Input Function¶
input_func = tf.estimator.inputs.pandas_input_fn(X_Train,
Y_Train,
batch_size=20,
num_epochs=2000,
shuffle=True)
Create the DNN Model with 3 Hidden Layers of 30 Neurons Each¶
dnnmodel = tf.estimator.DNNClassifier(hidden_units=[30, 30, 30],
feature_columns=feature_column,
n_classes=2)
dnnmodel.train(input_fn=input_func,
steps=2000)
Create a Test Input Function¶
eval_input_func = tf.estimator.inputs.pandas_input_fn(x=X_Test,
                                                      y=Y_Test,
                                                      batch_size=20,
                                                      num_epochs=1,
                                                      shuffle=False)
dnn_results = dnnmodel.evaluate(eval_input_func)
dnn_results["accuracy"]
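Finally, it can be handy to print both accuracies side by side (this assumes the results dictionary from the LinearClassifier evaluation above is still in scope):
print("LinearClassifier accuracy:", results["accuracy"])
print("DNNClassifier accuracy:   ", dnn_results["accuracy"])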