Friday, May 17, 2019

Create a Linear Classifier Model in 5 Steps Using TensorFlow on a Real-World Data Set

Linear Classification Model

Step 1:

Import Modules

In [4]:
import tensorflow as tf

import numpy as np
import pandas as pd

import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix,classification_report


%matplotlib inline

Step 2:

Process Data

In [5]:
def Data_Process():
    """
    Read the CSV, min-max normalize the feature columns,
    perform a train/test split, and return
    X_Train, X_Test, Y_Train, Y_Test.
    """
    # Column names to assign to the data set
    columns_to_named = ["Pregnancies","Glucose","BloodPressure",
           "SkinThickness","Insulin","BMI","DiabetesPedigreeFunction",
           "Age","Class"]

    # Read the data set; header=0 treats the first row of the file as a
    # header row and replaces it with the names above
    df = pd.read_csv("pima-indians-diabetes.csv",header=0,names=columns_to_named)

    # Columns to normalize and use as features ("Age" is left out)
    col_norm =['Pregnancies', 'Glucose', 'BloodPressure', 'SkinThickness', 'Insulin',
       'BMI', 'DiabetesPedigreeFunction']

    # Min-max normalization: scale each column to the range [0, 1]
    df1_norm = df[col_norm].apply(lambda x :( (x - x.min()) / (x.max()-x.min()) ) )

    X_Data = df1_norm
    Y_Data = df["Class"]

    # Hold out 30% of the rows for evaluation; the fixed seed makes the split reproducible
    X_Train, X_Test, Y_Train, Y_Test = train_test_split(X_Data,Y_Data, test_size=0.3,random_state=101)

    return X_Train, X_Test, Y_Train, Y_Test
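
The lambda in Data_Process applies min-max scaling, (x - min) / (max - min), to each column independently, so every scaled feature lands in the range [0, 1]. Here is a minimal pure-Python sketch of that formula on a single list of values. The helper name `min_max_scale` and the sample readings are made up for illustration, and the constant-column guard is an addition that the original lambda does not have.

```python
def min_max_scale(values):
    """Scale a list of numbers to the range [0, 1] (hypothetical helper)."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # Assumption: map a constant column to 0.0 instead of dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]


# Made-up sample values, not taken from the data set.
sample_glucose = [85, 148, 183, 89, 137]
scaled = min_max_scale(sample_glucose)
print(scaled)
```

Data_Process computes the minimum and maximum over the whole data set before the train/test split, so the test rows influence the scaling. Computing those statistics on the training rows only is a common alternative.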

Step 3:

Define Feature Columns

In [6]:
def create_feature_column():
    
    feat_Pregnancies = tf.feature_column.numeric_column('Pregnancies')
    feat_Glucose = tf.feature_column.numeric_column('Glucose')
    feat_BloodPressure = tf.feature_column.numeric_column('BloodPressure')
    feat_SkinThickness_tricep = tf.feature_column.numeric_column('SkinThickness')
    feat_Insulin = tf.feature_column.numeric_column('Insulin')
    feat_BMI = tf.feature_column.numeric_column('BMI')
    feat_DiabetesPedigreeFunction  = tf.feature_column.numeric_column('DiabetesPedigreeFunction')
    
    feature_column = [feat_Pregnancies, feat_Glucose, feat_BloodPressure, 
                  feat_SkinThickness_tricep, feat_Insulin, 
                 feat_BMI , feat_DiabetesPedigreeFunction] 
    
    return feature_column

Create the Training and Evaluation Input Functions

In [7]:
X_Train, X_Test, Y_Train, Y_Test = Data_Process()
feature_column = create_feature_column()

input_func = tf.estimator.inputs.pandas_input_fn(x=X_Train, y=Y_Train,
                                                 batch_size=40,num_epochs =1000, 
                                                 shuffle=True)

eval_input_func = tf.estimator.inputs.pandas_input_fn(x=X_Test,
                                                      y=Y_Test,
                                                      batch_size=40,
                                                      num_epochs=1,
                                                      shuffle=False)
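
Both `num_epochs` in the input function and the `steps` argument passed to `train` in Step 5 can end training; whichever limit is reached first wins. The arithmetic below is a rough sketch of how many passes over the data 1000 steps correspond to. The training-row count of 537 is an assumption (70% of the commonly distributed 768-row file); substitute `len(X_Train)` for the real value.

```python
def epochs_consumed(n_rows, batch_size, steps):
    """Approximate number of passes over the data made by `steps` batches."""
    return steps * batch_size / n_rows


n_train_rows = 537   # assumption: replace with len(X_Train)
batch_size = 40      # batch size used by the training input function
train_steps = 1000   # value passed as steps= to model.train

passes = epochs_consumed(n_train_rows, batch_size, train_steps)
print(f"{train_steps} steps of {batch_size} rows ~ {passes:.1f} epochs")
```

Under that assumption the step limit is reached after roughly 74 passes, well before the 1000-epoch limit of the input function, so `steps` is what stops each call to `train`.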

Step 4:

Create Linear Classifier Model

In [8]:
model = tf.estimator.LinearClassifier(feature_columns=feature_column, 
                                      n_classes=2)
INFO:tensorflow:Using default config.
WARNING:tensorflow:Using temporary folder as model directory: /var/folders/yh/7gktt0ls0fj77fnrs694ht6m0000gn/T/tmp6ydp5d48
INFO:tensorflow:Using config: {'_model_dir': '/var/folders/yh/7gktt0ls0fj77fnrs694ht6m0000gn/T/tmp6ydp5d48', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x1a308afe48>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}

Step 5:

Train

In [10]:
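# train() returns the estimator itself, not a Keras-style history object.
# Re-running this cell resumes from the latest checkpoint in the model
# directory, which is why the log below starts at step 5001.
history = model.train(input_fn=input_func, steps = 1000)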
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
WARNING:tensorflow:From /anaconda3/lib/python3.7/site-packages/tensorflow/python/training/saver.py:1266: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
INFO:tensorflow:Restoring parameters from /var/folders/yh/7gktt0ls0fj77fnrs694ht6m0000gn/T/tmp6ydp5d48/model.ckpt-5000
WARNING:tensorflow:From /anaconda3/lib/python3.7/site-packages/tensorflow/python/training/saver.py:1070: get_checkpoint_mtimes (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file utilities to get mtimes.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Saving checkpoints for 5000 into /var/folders/yh/7gktt0ls0fj77fnrs694ht6m0000gn/T/tmp6ydp5d48/model.ckpt.
INFO:tensorflow:loss = 22.763405, step = 5001
INFO:tensorflow:global_step/sec: 341.279
INFO:tensorflow:loss = 18.86869, step = 5101 (0.294 sec)
INFO:tensorflow:global_step/sec: 277.223
INFO:tensorflow:loss = 20.387632, step = 5201 (0.363 sec)
INFO:tensorflow:global_step/sec: 372.544
INFO:tensorflow:loss = 18.063845, step = 5301 (0.269 sec)
INFO:tensorflow:global_step/sec: 472.739
INFO:tensorflow:loss = 15.5112705, step = 5401 (0.210 sec)
INFO:tensorflow:global_step/sec: 562.262
INFO:tensorflow:loss = 18.075052, step = 5501 (0.180 sec)
INFO:tensorflow:global_step/sec: 562.68
INFO:tensorflow:loss = 21.951363, step = 5601 (0.178 sec)
INFO:tensorflow:global_step/sec: 318.397
INFO:tensorflow:loss = 20.899546, step = 5701 (0.312 sec)
INFO:tensorflow:global_step/sec: 331.633
INFO:tensorflow:loss = 19.533257, step = 5801 (0.301 sec)
INFO:tensorflow:global_step/sec: 428.697
INFO:tensorflow:loss = 21.393614, step = 5901 (0.235 sec)
INFO:tensorflow:Saving checkpoints for 6000 into /var/folders/yh/7gktt0ls0fj77fnrs694ht6m0000gn/T/tmp6ydp5d48/model.ckpt.
INFO:tensorflow:Loss for final step: 19.936054.

Test

In [16]:
results = model.evaluate(eval_input_func)
INFO:tensorflow:Calling model_fn.
WARNING:tensorflow:Trapezoidal rule is known to produce incorrect PR-AUCs; please switch to "careful_interpolation" instead.
WARNING:tensorflow:Trapezoidal rule is known to produce incorrect PR-AUCs; please switch to "careful_interpolation" instead.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Starting evaluation at 2019-05-17T16:07:31Z
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from /var/folders/yh/7gktt0ls0fj77fnrs694ht6m0000gn/T/tmp6ydp5d48/model.ckpt-6000
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Finished evaluation at 2019-05-17-16:07:31
INFO:tensorflow:Saving dict for global step 6000: accuracy = 0.74458873, accuracy_baseline = 0.64935064, auc = 0.7916461, auc_precision_recall = 0.6702014, average_loss = 0.52323383, global_step = 6000, label/mean = 0.35064936, loss = 20.144503, precision = 0.7037037, prediction/mean = 0.3514547, recall = 0.4691358
INFO:tensorflow:Saving 'checkpoint_path' summary for global step 6000: /var/folders/yh/7gktt0ls0fj77fnrs694ht6m0000gn/T/tmp6ydp5d48/model.ckpt-6000
In [20]:
results["accuracy"]
Out[20]:
0.74458873
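
Accuracy alone hides how the model treats the positive class. As a rough sketch, the values below are copied from the evaluation log and combined into an F1 score and a comparison against the majority-class baseline. The variable names are illustrative.

```python
# Metrics copied from the evaluation log.
accuracy = 0.74458873
accuracy_baseline = 0.64935064   # accuracy of always predicting the majority class
precision = 0.7037037
recall = 0.4691358

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

# Improvement over the majority-class baseline, in percentage points.
gain_points = (accuracy - accuracy_baseline) * 100

print(f"F1 score: {f1:.3f}")
print(f"Gain over baseline: {gain_points:.1f} percentage points")
```

A recall of about 0.47 means the model misses roughly half of the positive cases, even though its accuracy is about 9.5 percentage points above the baseline.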
