Machine Learning: Predicting Whether a Patient Suffers from Diabetes

Diabetes Prediction

Machine Learning Model for Diabetes Prediction


Diabetes is a disease in which your blood glucose, or blood sugar, levels are too high. Glucose comes from the foods you eat. Insulin is a hormone that helps the glucose get into your cells to give them energy. With type 1 diabetes, your body does not make insulin. With type 2 diabetes, the more common type, your body does not make or use insulin well. Without enough insulin, the glucose stays in your blood. You can also have prediabetes. This means that your blood sugar is higher than normal but not high enough to be called diabetes. Having prediabetes puts you at a higher risk of getting type 2 diabetes.

Over time, having too much glucose in your blood can cause serious problems. It can damage your eyes, kidneys, and nerves. Diabetes can also cause heart disease, stroke and even the need to remove a limb. Pregnant women can also get diabetes, called gestational diabetes.

Blood tests can show if you have diabetes. One type of test, the A1C, can also check on how you are managing your diabetes. Exercise, weight control and sticking to your meal plan can help control your diabetes. You should also monitor your blood glucose level and take medicine if prescribed.

Hence, predicting diabetes is very important. Thanks to advances in machine learning algorithms, we can now predict whether a patient is likely to develop diabetes from the following parameters:

Pregnancies Glucose BloodPressure SkinThickness Insulin BMI DiabetesPedigreeFunction Age Class


Soumil Nitin Shah,

Bachelor in Electronic Engineering,

Master in Electrical Engineering,

Master in Computer Engineering

Full Stack Python Developer

In this article I will show you how to build a machine learning model that learns to predict whether a person has diabetes.

The accuracy obtained for this project was about 77%.

Part 1 : Clean the DataSet

Step 1: Import the Libraries

In [3]:
import tensorflow as tf
import numpy as np
from sklearn.model_selection import train_test_split
import pandas as pd
import matplotlib.pyplot as plt

%matplotlib inline

Step 2: Read the DataSet

In [5]:
df = pd.read_csv("pima-indians-diabetes.csv")
df.columns
Index(['6', '148', '72', '35', '0', '33.6', '0.627', '50', '1'], dtype='object')

Step 3 : Rename the Columns

In [12]:
Col_name = ["Pregnancies","Glucose","BloodPressure",
            "SkinThickness","Insulin","BMI",
            "DiabetesPedigreeFunction","Age","Class"]
In [14]:
df = pd.read_csv("pima-indians-diabetes.csv",header=0,names=Col_name)
df.head(3)
Pregnancies Glucose BloodPressure SkinThickness Insulin BMI DiabetesPedigreeFunction Age Class
0 1 85 66 29 0 26.6 0.351 31 0
1 8 183 64 0 0 23.3 0.672 32 1
2 1 89 66 23 94 28.1 0.167 21 0
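
One detail worth checking here: with header=0, pandas treats the first line of the file as a header and replaces it with the supplied names. If the CSV has no header row, as the Index output in Step 2 suggests, the first record is dropped, which would explain why row 0 above is 1, 85, 66 rather than 6, 148, 72. The sketch below shows the difference on a two-line in-memory sample built from those two records; using header=None is a suggestion for keeping the first record, not what the article's code does.

```python
import io

import pandas as pd

col_names = ["Pregnancies", "Glucose", "BloodPressure", "SkinThickness",
             "Insulin", "BMI", "DiabetesPedigreeFunction", "Age", "Class"]

# First two records as they appear in the outputs above (no header line).
sample_csv = "6,148,72,35,0,33.6,0.627,50,1\n1,85,66,29,0,26.6,0.351,31,0\n"

# header=0: the first line is consumed as a header and replaced by col_names.
dropped = pd.read_csv(io.StringIO(sample_csv), header=0, names=col_names)

# header=None: every line is treated as data, so the first record survives.
kept = pd.read_csv(io.StringIO(sample_csv), header=None, names=col_names)

print(len(dropped), len(kept))  # 1 2
```

Switching to header=None would shift every row index by one and slightly change the normalized values shown in the rest of the article.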

Step 4: Normalize the columns so that the machine learning model can predict diabetes more efficiently

In [15]:
col_norm =['Pregnancies', 'Glucose', 'BloodPressure', 'SkinThickness', 'Insulin',
       'BMI', 'DiabetesPedigreeFunction']

Step 5: Let us apply a custom lambda function (min-max scaling to the range 0 to 1)

In [16]:
df1_norm = df[col_norm].apply(lambda x :( (x - x.min()) / (x.max()-x.min()) ) )
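
The lambda above is min-max scaling: each value becomes (x - min) / (max - min), so every column ends up between 0 and 1. Here is a minimal pure-Python sketch of the same arithmetic on one column. The helper name `min_max_scale` is hypothetical, the guard for a constant column is an addition the lambda does not have, and 0 and 199 are assumed to be the Glucose column's minimum and maximum.

```python
def min_max_scale(values):
    """Scale a list of numbers linearly so min -> 0.0 and max -> 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column would divide by zero; map it to all zeros instead.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Glucose values 85, 183, 89 from the first three rows, plus the assumed
# column minimum (0) and maximum (199).
scaled = min_max_scale([0, 85, 183, 89, 199])
print([round(s, 6) for s in scaled])
# [0.0, 0.427136, 0.919598, 0.447236, 1.0]
```

The three middle values agree with the Glucose column of the normalized output, which is what makes the assumed minimum and maximum plausible.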

Step 6: The columns are now normalized

In [17]:
df1_norm.head(3)
Pregnancies Glucose BloodPressure SkinThickness Insulin BMI DiabetesPedigreeFunction
0 0.058824 0.427136 0.540984 0.292929 0.000000 0.396423 0.116567
1 0.470588 0.919598 0.524590 0.000000 0.000000 0.347243 0.253629
2 0.058824 0.447236 0.540984 0.232323 0.111111 0.418778 0.038002

Part 2 : Prepare Machine Learning Model

Step 1: Create a Feature Column

In [20]:
feat_Pregnancies = tf.feature_column.numeric_column('Pregnancies')

feat_Glucose = tf.feature_column.numeric_column('Glucose')

feat_BloodPressure = tf.feature_column.numeric_column('BloodPressure')

feat_SkinThickness_tricep = tf.feature_column.numeric_column('SkinThickness')

feat_Insulin = tf.feature_column.numeric_column('Insulin')

feat_BMI = tf.feature_column.numeric_column('BMI')

feat_DiabetesPedigreeFunction  = tf.feature_column.numeric_column('DiabetesPedigreeFunction')

We do not have any categorical columns, so we leave that part out; otherwise we would have used a hash-bucket (or vocabulary-list) categorical column.

Step 2: Create a Feature column List

In [21]:
feature_column = [feat_Pregnancies, feat_Glucose, feat_BloodPressure, 
                  feat_SkinThickness_tricep, feat_Insulin, 
                 feat_BMI , feat_DiabetesPedigreeFunction] 

Step 3 : Let's create X_Data and Y_Data

In [25]:
X_Data = df1_norm
X_Data.head(2)
Pregnancies Glucose BloodPressure SkinThickness Insulin BMI DiabetesPedigreeFunction
0 0.058824 0.427136 0.540984 0.292929 0.0 0.396423 0.116567
1 0.470588 0.919598 0.524590 0.000000 0.0 0.347243 0.253629

Create Y_Data

In [27]:
Y_Data = df["Class"]
Y_Data.head(3)
0    0
1    1
2    0
Name: Class, dtype: int64

1 = Diabetes and 0 = No Diabetes

In [38]:
X_Train, X_Test, Y_Train, Y_Test = train_test_split(X_Data,Y_Data, test_size=0.3,random_state=101)
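
With test_size=0.3, roughly 30% of the rows are held out for testing and the rest are used for training; random_state=101 makes the shuffle reproducible. Below is a minimal sketch of the idea in plain Python. It is not scikit-learn's actual implementation: the function name `simple_split` is hypothetical, and the row count of 767 is an assumption.

```python
import math
import random

def simple_split(rows, test_size=0.3, seed=101):
    """Shuffle row indices with a fixed seed and hold out a test fraction."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)       # same seed -> same shuffle
    n_test = math.ceil(test_size * len(rows))  # test share is rounded up
    test = [rows[i] for i in indices[:n_test]]
    train = [rows[i] for i in indices[n_test:]]
    return train, test

rows = list(range(767))  # stand-in for 767 patient records (assumed count)
train, test = simple_split(rows)
print(len(train), len(test))  # 536 231
```

Every row lands in exactly one of the two sets, and calling the function again with the same seed returns the same split.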

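Create the Training Input Function

In [39]:
# The original cell is cut off after num_epochs; shuffle=True is an assumed
# completion (pandas_input_fn expects shuffle to be set explicitly).
input_func = tf.estimator.inputs.pandas_input_fn(x=X_Train, y=Y_Train,
                                                 batch_size=40,num_epochs =1000,
                                                 shuffle=True)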
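
batch_size=40 means each training step sees 40 rows, and num_epochs=1000 lets the input function cycle through the training data up to 1000 times before it runs out. Here is a rough back-of-the-envelope sketch of how many steps that allows. The figure of 536 training rows is an assumption (70% of an assumed 767 rows), and the helper name is hypothetical.

```python
import math

def approx_training_steps(n_rows, batch_size, num_epochs):
    """Approximate number of steps before the input function is exhausted."""
    total_examples = n_rows * num_epochs
    return math.ceil(total_examples / batch_size)

steps = approx_training_steps(n_rows=536, batch_size=40, num_epochs=1000)
print(steps)  # 13400
```

A run capped at fewer steps stops long before the input is exhausted. For example, 2000 steps at 40 rows per step is 80,000 examples, or about 149 passes over 536 rows.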

LinearClassifier Model

In [40]:
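# The original cell is cut off here; n_classes=2 and the train call are assumed completions.
model = tf.estimator.LinearClassifier(feature_columns=feature_column,
                                      n_classes=2)
model.train(input_fn=input_func)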
<tensorflow_estimator.python.estimator.canned.linear.LinearClassifier at 0x120b433c8>

Loss Drops from 17 to 25
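
For two classes, a linear classifier of this kind is essentially logistic regression: it computes a weighted sum of the normalized features, adds a bias, and squashes the result through a sigmoid to get a probability of diabetes. Below is a minimal sketch of that prediction step in plain Python. The weights and bias are made-up illustrative numbers, not values learned by the model in this article.

```python
import math

def sigmoid(z):
    """Map any real number to the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(features, weights, bias):
    """Probability of class 1 for one row under a linear (logistic) model."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(z)

# Normalized features of row 1 from the output above (7 columns).
row = [0.470588, 0.919598, 0.524590, 0.000000, 0.000000, 0.347243, 0.253629]

# Hypothetical weights and bias, chosen only for illustration.
weights = [1.0, 4.0, -0.5, 0.2, 0.3, 2.0, 1.0]
bias = -3.0

p = predict_proba(row, weights, bias)
label = 1 if p >= 0.5 else 0  # 1 = Diabetes, 0 = No Diabetes
print(round(p, 3), label)
```

Training adjusts the weights and bias so that these probabilities match the Class column as closely as possible on the training rows.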

Testing the Model on Test Data

The model has never seen this data; let's see how it performs.

Create the Test Input Function

In [43]:
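# The original line is cut off after x=X_Test; the remaining arguments are assumed.
eval_input_func = tf.estimator.inputs.pandas_input_fn(x=X_Test, y=Y_Test,
                                                      num_epochs=1, shuffle=False)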
In [36]:
results = model.evaluate(eval_input_func)
In [45]:

Accuracy is 72 %

Approach 2 Deep Dense Neural Network

In [46]:
dnnmodel = tf.estimator.DNNClassifier(hidden_units=[50,50,50],feature_columns=feature_column,n_classes=2)
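
hidden_units=[50,50,50] describes three fully connected hidden layers of 50 units each, stacked between the 7 input features and the output. To get a feel for the size of such a network, the sketch below counts its weights and biases. It assumes a single output unit for the two-class case, and the helper name `count_parameters` is hypothetical.

```python
def count_parameters(n_inputs, hidden_units, n_outputs=1):
    """Count weights and biases in a fully connected feed-forward network."""
    sizes = [n_inputs] + list(hidden_units) + [n_outputs]
    total = 0
    for fan_in, fan_out in zip(sizes, sizes[1:]):
        total += fan_in * fan_out  # weight matrix between two layers
        total += fan_out           # one bias per unit in the next layer
    return total

n_params = count_parameters(n_inputs=7, hidden_units=[50, 50, 50])
print(n_params)  # 5551
```

That is over five thousand trainable values for a few hundred training rows, compared with 8 (7 weights plus a bias) for a linear model on the same features. This is worth keeping in mind when comparing the two accuracies.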
In [65]:
{'accuracy': 0.77056277,
 'accuracy_baseline': 0.64935064,
 'auc': 0.79971194,
 'auc_precision_recall': 0.6760917,
 'average_loss': 0.52830994,
 'label/mean': 0.35064936,
 'loss': 10.169967,
 'precision': 0.73333335,
 'prediction/mean': 0.3230063,
 'recall': 0.54320985,
 'global_step': 2000}
In [68]:
Accuracy is 77 %
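
The metrics reported above are all derived from the same confusion matrix, so they can be cross-checked against each other. The counts below are not in the original output. They are inferred as the whole numbers that reproduce the reported accuracy, precision, recall, and label mean, assuming a test set of 231 rows.

```python
# Inferred confusion-matrix counts (assumption: 231 test rows).
tp, fp, fn, tn = 44, 16, 37, 134
total = tp + fp + fn + tn

accuracy = (tp + tn) / total                # share of all predictions that are right
precision = tp / (tp + fp)                  # of predicted diabetics, how many truly are
recall = tp / (tp + fn)                     # of actual diabetics, how many are found
label_mean = (tp + fn) / total              # share of diabetics in the test set
baseline = max(label_mean, 1 - label_mean)  # always predict the majority class

print(round(accuracy, 4), round(precision, 4), round(recall, 4),
      round(label_mean, 4), round(baseline, 4))
# 0.7706 0.7333 0.5432 0.3506 0.6494
```

A recall of about 0.54 means the model misses nearly half of the diabetic patients in the test set, which the headline accuracy of 77% does not show. The baseline of about 65% is what always predicting "no diabetes" would score.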


