Wednesday, May 15, 2019

Machine Learning Model for Wine DataSet

Untitled

Machine Learning Model for Wine DataSet

Results 98.14 %

In [79]:
skplt.metrics.plot_confusion_matrix(Y_Test, prediction)
Out[79]:
<matplotlib.axes._subplots.AxesSubplot at 0x1a33958be0>

Import the Library

In [75]:
from sklearn.datasets import load_wine
import tensorflow as tf
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import confusion_matrix,classification_report
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
import pandas as pd
import numpy as np
import scikitplot as skplt
In [74]:
!pip install scikit-plot
Collecting scikit-plot
  Downloading https://files.pythonhosted.org/packages/7c/47/32520e259340c140a4ad27c1b97050dd3254fdc517b1d59974d47037510e/scikit_plot-0.3.7-py3-none-any.whl
Requirement already satisfied: scikit-learn>=0.18 in /anaconda3/lib/python3.7/site-packages (from scikit-plot) (0.20.3)
Collecting joblib>=0.10 (from scikit-plot)
  Downloading https://files.pythonhosted.org/packages/cd/c1/50a758e8247561e58cb87305b1e90b171b8c767b15b12a1734001f41d356/joblib-0.13.2-py2.py3-none-any.whl (278kB)
    100% |████████████████████████████████| 286kB 8.4MB/s eta 0:00:01
Requirement already satisfied: matplotlib>=1.4.0 in /anaconda3/lib/python3.7/site-packages (from scikit-plot) (3.0.3)
Requirement already satisfied: scipy>=0.9 in /anaconda3/lib/python3.7/site-packages (from scikit-plot) (1.2.1)
Requirement already satisfied: numpy>=1.8.2 in /anaconda3/lib/python3.7/site-packages (from scikit-learn>=0.18->scikit-plot) (1.16.2)
Requirement already satisfied: cycler>=0.10 in /anaconda3/lib/python3.7/site-packages (from matplotlib>=1.4.0->scikit-plot) (0.10.0)
Requirement already satisfied: kiwisolver>=1.0.1 in /anaconda3/lib/python3.7/site-packages (from matplotlib>=1.4.0->scikit-plot) (1.0.1)
Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 in /anaconda3/lib/python3.7/site-packages (from matplotlib>=1.4.0->scikit-plot) (2.3.1)
Requirement already satisfied: python-dateutil>=2.1 in /anaconda3/lib/python3.7/site-packages (from matplotlib>=1.4.0->scikit-plot) (2.8.0)
Requirement already satisfied: six in /anaconda3/lib/python3.7/site-packages (from cycler>=0.10->matplotlib>=1.4.0->scikit-plot) (1.12.0)
Requirement already satisfied: setuptools in /anaconda3/lib/python3.7/site-packages (from kiwisolver>=1.0.1->matplotlib>=1.4.0->scikit-plot) (40.8.0)
Installing collected packages: joblib, scikit-plot
Successfully installed joblib-0.13.2 scikit-plot-0.3.7
In [ ]:
 

Load the Dataset

In [2]:
wine_data = load_wine()
In [3]:
type(wine_data)
Out[3]:
sklearn.utils.Bunch
In [4]:
wine_data.keys()
Out[4]:
dict_keys(['data', 'target', 'target_names', 'DESCR', 'feature_names'])
In [5]:
print(wine_data["DESCR"])
.. _wine_dataset:

Wine recognition dataset
------------------------

**Data Set Characteristics:**

    :Number of Instances: 178 (50 in each of three classes)
    :Number of Attributes: 13 numeric, predictive attributes and the class
    :Attribute Information:
   - Alcohol
   - Malic acid
   - Ash
  - Alcalinity of ash  
   - Magnesium
  - Total phenols
   - Flavanoids
   - Nonflavanoid phenols
   - Proanthocyanins
  - Color intensity
   - Hue
   - OD280/OD315 of diluted wines
   - Proline

    - class:
            - class_0
            - class_1
            - class_2
  
    :Summary Statistics:
    
    ============================= ==== ===== ======= =====
                                   Min   Max   Mean     SD
    ============================= ==== ===== ======= =====
    Alcohol:                      11.0  14.8    13.0   0.8
    Malic Acid:                   0.74  5.80    2.34  1.12
    Ash:                          1.36  3.23    2.36  0.27
    Alcalinity of Ash:            10.6  30.0    19.5   3.3
    Magnesium:                    70.0 162.0    99.7  14.3
    Total Phenols:                0.98  3.88    2.29  0.63
    Flavanoids:                   0.34  5.08    2.03  1.00
    Nonflavanoid Phenols:         0.13  0.66    0.36  0.12
    Proanthocyanins:              0.41  3.58    1.59  0.57
    Colour Intensity:              1.3  13.0     5.1   2.3
    Hue:                          0.48  1.71    0.96  0.23
    OD280/OD315 of diluted wines: 1.27  4.00    2.61  0.71
    Proline:                       278  1680     746   315
    ============================= ==== ===== ======= =====

    :Missing Attribute Values: None
    :Class Distribution: class_0 (59), class_1 (71), class_2 (48)
    :Creator: R.A. Fisher
    :Donor: Michael Marshall (MARSHALL%PLU@io.arc.nasa.gov)
    :Date: July, 1988

This is a copy of UCI ML Wine recognition datasets.
https://archive.ics.uci.edu/ml/machine-learning-databases/wine/wine.data

The data is the results of a chemical analysis of wines grown in the same
region in Italy by three different cultivators. There are thirteen different
measurements taken for different constituents found in the three types of
wine.

Original Owners: 

Forina, M. et al, PARVUS - 
An Extendible Package for Data Exploration, Classification and Correlation. 
Institute of Pharmaceutical and Food Analysis and Technologies,
Via Brigata Salerno, 16147 Genoa, Italy.

Citation:

Lichman, M. (2013). UCI Machine Learning Repository
[http://archive.ics.uci.edu/ml]. Irvine, CA: University of California,
School of Information and Computer Science. 

.. topic:: References

  (1) S. Aeberhard, D. Coomans and O. de Vel, 
  Comparison of Classifiers in High Dimensional Settings, 
  Tech. Rep. no. 92-02, (1992), Dept. of Computer Science and Dept. of  
  Mathematics and Statistics, James Cook University of North Queensland. 
  (Also submitted to Technometrics). 

  The data was used with many others for comparing various 
  classifiers. The classes are separable, though only RDA 
  has achieved 100% correct classification. 
  (RDA : 100%, QDA 99.4%, LDA 98.9%, 1NN 96.1% (z-transformed data)) 
  (All results using the leave-one-out technique) 

  (2) S. Aeberhard, D. Coomans and O. de Vel, 
  "THE CLASSIFICATION PERFORMANCE OF RDA" 
  Tech. Rep. no. 92-01, (1992), Dept. of Computer Science and Dept. of 
  Mathematics and Statistics, James Cook University of North Queensland. 
  (Also submitted to Journal of Chemometrics).

In [49]:
label = ["Alcohol","Malic Acid","Ash","Alcalinity of Ash","Magnesium","Total Phenols","Flavanoids",
         "Nonflavanoid Phenols", "Proanthocyanins","Colour Intensity","Hue","OD280/OD315 of diluted wines",
        "Proline"]

Get the X_Data

In [6]:
feat_data = wine_data["data"]

Get the Y_Data

In [7]:
label = wine_data["target"]

Create Train and Test Data using Train Test Split

In [8]:
X_Train, X_Test, Y_Train, Y_Test = train_test_split(feat_data, label, test_size=0.3,random_state=101)

Create a Min Max Scalar to Scale the Dataset

In [9]:
scalar = MinMaxScaler()

Scale the X_Data

In [10]:
scaled_X_Data = scalar.fit_transform(X_Train)

Scale the Y_ Data

In [11]:
scaled_X_Test = scalar.fit_transform(X_Test)

Check the Shape of Data set

In [12]:
scaled_X_Data.shape
Out[12]:
(124, 13)
In [13]:
scaled_X_Test.shape
Out[13]:
(54, 13)

Create a Feature Colummn

In [14]:
feat_cols = [tf.feature_column.numeric_column('x',shape=[13])]

Create Model

In [21]:
model = tf.estimator.DNNClassifier(
                                    feature_columns=feat_cols,
                                    hidden_units=[13,13,13],
                                    optimizer=tf.train.GradientDescentOptimizer(learning_rate=0.01),
                                    n_classes=3,
                                    dropout=None,
                                    batch_norm=False
)
INFO:tensorflow:Using default config.
WARNING:tensorflow:Using temporary folder as model directory: /var/folders/yh/7gktt0ls0fj77fnrs694ht6m0000gn/T/tmp183_az3l
INFO:tensorflow:Using config: {'_model_dir': '/var/folders/yh/7gktt0ls0fj77fnrs694ht6m0000gn/T/tmp183_az3l', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x1a32b9d7b8>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}

Create a input Function

In [22]:
input_func = tf.estimator.inputs.numpy_input_fn({'x':scaled_X_Data},
                                               y=Y_Train,
                                               shuffle=True,
                                               batch_size=10,
                                               num_epochs=100)

Create a Test input Function

In [23]:
test_func = tf.estimator.inputs.numpy_input_fn({'x':scaled_X_Test},
                                               y=Y_Test,
                                               shuffle=False,
                                               batch_size=10,
                                               num_epochs=1)
In [24]:
model.train(input_fn=input_func,steps=500)
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Saving checkpoints for 0 into /var/folders/yh/7gktt0ls0fj77fnrs694ht6m0000gn/T/tmp183_az3l/model.ckpt.
INFO:tensorflow:loss = 11.588047, step = 1
INFO:tensorflow:global_step/sec: 659.638
INFO:tensorflow:loss = 2.1665013, step = 101 (0.153 sec)
INFO:tensorflow:global_step/sec: 1021.02
INFO:tensorflow:loss = 6.5470495, step = 201 (0.098 sec)
INFO:tensorflow:global_step/sec: 1008.66
INFO:tensorflow:loss = 1.2689465, step = 301 (0.099 sec)
INFO:tensorflow:global_step/sec: 997.743
INFO:tensorflow:loss = 0.32332096, step = 401 (0.100 sec)
INFO:tensorflow:Saving checkpoints for 500 into /var/folders/yh/7gktt0ls0fj77fnrs694ht6m0000gn/T/tmp183_az3l/model.ckpt.
INFO:tensorflow:Loss for final step: 0.054181002.
Out[24]:
<tensorflow_estimator.python.estimator.canned.dnn.DNNClassifier at 0x1a32b9d6a0>

Test the Data

In [35]:
preds = list(model.predict(input_fn=test_func))
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from /var/folders/yh/7gktt0ls0fj77fnrs694ht6m0000gn/T/tmp183_az3l/model.ckpt-500
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
In [37]:
prediction = [p["class_ids"][0] for p in preds]
In [ ]:
label = []
In [41]:
data = classification_report(Y_Test,prediction,label)
In [54]:
print(data)
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        19
           1       0.96      1.00      0.98        22
           2       1.00      0.92      0.96        13

   micro avg       0.98      0.98      0.98        54
   macro avg       0.99      0.97      0.98        54
weighted avg       0.98      0.98      0.98        54

In [57]:
conmat = confusion_matrix(Y_Test,prediction)
In [61]:
df = pd.DataFrame(data=conmat)
df
Out[61]:
0 1 2
0 19 0 0
1 0 22 0
2 0 1 12

Confusion Matrix

In [78]:
skplt.metrics.plot_confusion_matrix(Y_Test, prediction)
Out[78]:
<matplotlib.axes._subplots.AxesSubplot at 0x1a338c1a58>

Accuracy is 98.14

No comments:

Post a Comment

Learn How to configure your Spark Session to Join Managed (S3 Table Buckets) and Unmanaged Iceberg Tables | Hands on Labs

test-tble-bucket-joins Learn How to configure your Spark Session to Join Managed (S...