Machine Learning Model for the Wine Dataset

Result: 98.14% Accuracy


Import the Libraries

from sklearn.datasets import load_wine
import tensorflow as tf
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import confusion_matrix,classification_report
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
import pandas as pd
import numpy as np
import scikitplot as skplt  # provided by the scikit-plot package; install it first if this import fails

!pip install scikit-plot

Collecting scikit-plot
  Downloading https://files.pythonhosted.org/packages/7c/47/32520e259340c140a4ad27c1b97050dd3254fdc517b1d59974d47037510e/scikit_plot-0.3.7-py3-none-any.whl
Requirement already satisfied: scikit-learn>=0.18 in /anaconda3/lib/python3.7/site-packages (from scikit-plot) (0.20.3)
Collecting joblib>=0.10 (from scikit-plot)
  Downloading https://files.pythonhosted.org/packages/cd/c1/50a758e8247561e58cb87305b1e90b171b8c767b15b12a1734001f41d356/joblib-0.13.2-py2.py3-none-any.whl (278kB)
    100% |████████████████████████████████| 286kB 8.4MB/s eta 0:00:01
Requirement already satisfied: matplotlib>=1.4.0 in /anaconda3/lib/python3.7/site-packages (from scikit-plot) (3.0.3)
Requirement already satisfied: scipy>=0.9 in /anaconda3/lib/python3.7/site-packages (from scikit-plot) (1.2.1)
Requirement already satisfied: numpy>=1.8.2 in /anaconda3/lib/python3.7/site-packages (from scikit-learn>=0.18->scikit-plot) (1.16.2)
Requirement already satisfied: cycler>=0.10 in /anaconda3/lib/python3.7/site-packages (from matplotlib>=1.4.0->scikit-plot) (0.10.0)
Requirement already satisfied: kiwisolver>=1.0.1 in /anaconda3/lib/python3.7/site-packages (from matplotlib>=1.4.0->scikit-plot) (1.0.1)
Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 in /anaconda3/lib/python3.7/site-packages (from matplotlib>=1.4.0->scikit-plot) (2.3.1)
Requirement already satisfied: python-dateutil>=2.1 in /anaconda3/lib/python3.7/site-packages (from matplotlib>=1.4.0->scikit-plot) (2.8.0)
Requirement already satisfied: six in /anaconda3/lib/python3.7/site-packages (from cycler>=0.10->matplotlib>=1.4.0->scikit-plot) (1.12.0)
Requirement already satisfied: setuptools in /anaconda3/lib/python3.7/site-packages (from kiwisolver>=1.0.1->matplotlib>=1.4.0->scikit-plot) (40.8.0)
Installing collected packages: joblib, scikit-plot
Successfully installed joblib-0.13.2 scikit-plot-0.3.7

Load the Dataset

wine_data = load_wine()

type(wine_data)

sklearn.utils.Bunch

wine_data.keys()

dict_keys(['data', 'target', 'target_names', 'DESCR', 'feature_names'])

print(wine_data["DESCR"])

.. _wine_dataset:

Wine recognition dataset
------------------------

**Data Set Characteristics:**

    :Number of Instances: 178 (50 in each of three classes)
    :Number of Attributes: 13 numeric, predictive attributes and the class
    :Attribute Information:
        - Alcohol
        - Malic acid
        - Ash
        - Alcalinity of ash
        - Magnesium
        - Total phenols
        - Flavanoids
        - Nonflavanoid phenols
        - Proanthocyanins
        - Color intensity
        - Hue
        - OD280/OD315 of diluted wines
        - Proline

    - class:
            - class_0
            - class_1
            - class_2
  
    :Summary Statistics:
    
    ============================= ==== ===== ======= =====
                                   Min   Max   Mean     SD
    ============================= ==== ===== ======= =====
    Alcohol:                      11.0  14.8    13.0   0.8
    Malic Acid:                   0.74  5.80    2.34  1.12
    Ash:                          1.36  3.23    2.36  0.27
    Alcalinity of Ash:            10.6  30.0    19.5   3.3
    Magnesium:                    70.0 162.0    99.7  14.3
    Total Phenols:                0.98  3.88    2.29  0.63
    Flavanoids:                   0.34  5.08    2.03  1.00
    Nonflavanoid Phenols:         0.13  0.66    0.36  0.12
    Proanthocyanins:              0.41  3.58    1.59  0.57
    Colour Intensity:              1.3  13.0     5.1   2.3
    Hue:                          0.48  1.71    0.96  0.23
    OD280/OD315 of diluted wines: 1.27  4.00    2.61  0.71
    Proline:                       278  1680     746   315
    ============================= ==== ===== ======= =====

    :Missing Attribute Values: None
    :Class Distribution: class_0 (59), class_1 (71), class_2 (48)
    :Creator: R.A. Fisher
    :Donor: Michael Marshall (MARSHALL%PLU@io.arc.nasa.gov)
    :Date: July, 1988

This is a copy of UCI ML Wine recognition datasets.
https://archive.ics.uci.edu/ml/machine-learning-databases/wine/wine.data

The data is the results of a chemical analysis of wines grown in the same
region in Italy by three different cultivators. There are thirteen different
measurements taken for different constituents found in the three types of
wine.

Original Owners: 

Forina, M. et al, PARVUS - 
An Extendible Package for Data Exploration, Classification and Correlation. 
Institute of Pharmaceutical and Food Analysis and Technologies,
Via Brigata Salerno, 16147 Genoa, Italy.

Citation:

Lichman, M. (2013). UCI Machine Learning Repository
[http://archive.ics.uci.edu/ml]. Irvine, CA: University of California,
School of Information and Computer Science. 

.. topic:: References

  (1) S. Aeberhard, D. Coomans and O. de Vel, 
  Comparison of Classifiers in High Dimensional Settings, 
  Tech. Rep. no. 92-02, (1992), Dept. of Computer Science and Dept. of  
  Mathematics and Statistics, James Cook University of North Queensland. 
  (Also submitted to Technometrics). 

  The data was used with many others for comparing various 
  classifiers. The classes are separable, though only RDA 
  has achieved 100% correct classification. 
  (RDA : 100%, QDA 99.4%, LDA 98.9%, 1NN 96.1% (z-transformed data)) 
  (All results using the leave-one-out technique) 

  (2) S. Aeberhard, D. Coomans and O. de Vel, 
  "THE CLASSIFICATION PERFORMANCE OF RDA" 
  Tech. Rep. no. 92-01, (1992), Dept. of Computer Science and Dept. of 
  Mathematics and Statistics, James Cook University of North Queensland. 
  (Also submitted to Journal of Chemometrics).
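The description printed above says "50 in each of three classes" on one line and gives a class distribution of 59, 71 and 48 on another. A quick way to see which one matches the data is to count the targets directly. The sketch below is only an illustration; the names `wine_df` and `class_counts` are made up for this example and are not part of the original notebook.

```python
import pandas as pd
from sklearn.datasets import load_wine

wine = load_wine()

# Put the 13 features into a DataFrame using the dataset's own column names.
wine_df = pd.DataFrame(wine.data, columns=wine.feature_names)
wine_df["target"] = wine.target

# Count the samples per class, ordered by class id.
class_counts = wine_df["target"].value_counts().sort_index()

print(wine_df.shape)
print(class_counts.tolist())
```

The counts agree with the "Class Distribution" line, so the classes are mildly imbalanced rather than equal in size.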

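# Human-readable feature names, kept under their own name so they are not
# overwritten when the target labels are loaded later
feature_names = ["Alcohol","Malic Acid","Ash","Alcalinity of Ash","Magnesium","Total Phenols","Flavanoids",
                 "Nonflavanoid Phenols", "Proanthocyanins","Colour Intensity","Hue","OD280/OD315 of diluted wines",
                 "Proline"]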

Get the X_Data

feat_data = wine_data["data"]

Get the Y_Data

label = wine_data["target"]

Create Train and Test Data using Train Test Split

X_Train, X_Test, Y_Train, Y_Test = train_test_split(feat_data, label, test_size=0.3,random_state=101)
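To see what this split produces, the sketch below repeats it on its own and prints the resulting shapes and per-class counts. The second call with `stratify=y` is not in the original notebook; it is shown only as an optional variant that keeps the class proportions similar in both splits.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split

X, y = load_wine(return_X_y=True)

# Same arguments as in the notebook: 30% of the 178 samples go to the test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=101
)
print(X_train.shape, X_test.shape)
print(np.bincount(y_train), np.bincount(y_test))

# Optional variant (an assumption, not the notebook's choice): stratify on y
# so each class keeps roughly the same share in train and test.
_, _, _, y_test_strat = train_test_split(
    X, y, test_size=0.3, random_state=101, stratify=y
)
print(np.bincount(y_test_strat))
```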

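Create a Min-Max Scaler to Scale the Dataset

scalar = MinMaxScaler()

Scale the Training Data

# Fit the scaler on the training data only, then scale the training data
scaled_X_Data = scalar.fit_transform(X_Train)

Scale the Test Data

# Reuse the min/max learned from the training data; refitting on the test set
# would scale the two splits differently and leak test statistics
scaled_X_Test = scalar.transform(X_Test)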
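Min-max scaling maps each feature to x' = (x - min) / (max - min). The usual convention is to take min and max from the training split only and apply them unchanged to the test split. The sketch below is a small check of that convention, not the notebook's own code: it reproduces the transform by hand and compares it with refitting a separate scaler on the test data. All variable names are chosen for this example.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=101
)

scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learns per-feature min and max
X_test_scaled = scaler.transform(X_test)        # reuses the training min and max

# Reproduce the test transform by hand from the training statistics.
train_min = X_train.min(axis=0)
train_max = X_train.max(axis=0)
manual_test = (X_test - train_min) / (train_max - train_min)
print(np.allclose(manual_test, X_test_scaled))

# Refitting on the test split uses different min/max values, so the same raw
# measurement ends up at a different scaled value.
refit_test = MinMaxScaler().fit_transform(X_test)
print(np.allclose(refit_test, X_test_scaled))
```

With training statistics, scaled test values can fall slightly outside the range 0 to 1. That is expected and harmless for this model.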

Check the Shape of the Dataset

scaled_X_Data.shape

(124, 13)

scaled_X_Test.shape

(54, 13)

Create a Feature Column

feat_cols = [tf.feature_column.numeric_column('x',shape=[13])]

Create the Model

model = tf.estimator.DNNClassifier(
    feature_columns=feat_cols,
    hidden_units=[13, 13, 13],  # three hidden layers with 13 units each
    optimizer=tf.train.GradientDescentOptimizer(learning_rate=0.01),
    n_classes=3,  # class_0, class_1, class_2
    dropout=None,
    batch_norm=False,
)

INFO:tensorflow:Using default config.
WARNING:tensorflow:Using temporary folder as model directory: /var/folders/yh/7gktt0ls0fj77fnrs694ht6m0000gn/T/tmp183_az3l
INFO:tensorflow:Using config: {'_model_dir': '/var/folders/yh/7gktt0ls0fj77fnrs694ht6m0000gn/T/tmp183_az3l', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x1a32b9d7b8>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
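To get a feel for how small this network is, the sketch below counts its trainable parameters. It assumes plain fully connected layers with one bias per unit and no batch normalization, which matches the arguments passed above. The helper `dense_param_count` is a hypothetical function written for this example.

```python
def dense_param_count(n_inputs, hidden_units, n_classes):
    """Count the weights and biases of a fully connected network."""
    layer_sizes = [n_inputs] + list(hidden_units) + [n_classes]
    total = 0
    for fan_in, fan_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += fan_in * fan_out + fan_out  # weight matrix plus bias vector
    return total


# 13 input features, hidden layers [13, 13, 13], 3 output classes
n_params = dense_param_count(13, [13, 13, 13], 3)
print(n_params)  # 588
```

Each of the three hidden layers contributes 13 * 13 + 13 = 182 parameters and the output layer 13 * 3 + 3 = 42, for 588 in total.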

Create an Input Function

input_func = tf.estimator.inputs.numpy_input_fn({'x':scaled_X_Data},
                                               y=Y_Train,
                                               shuffle=True,
                                               batch_size=10,
                                               num_epochs=100)

Create a Test Input Function

test_func = tf.estimator.inputs.numpy_input_fn({'x':scaled_X_Test},
                                               y=Y_Test,
                                               shuffle=False,
                                               batch_size=10,
                                               num_epochs=1)

Train the Model

model.train(input_fn=input_func,steps=500)

INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Saving checkpoints for 0 into /var/folders/yh/7gktt0ls0fj77fnrs694ht6m0000gn/T/tmp183_az3l/model.ckpt.
INFO:tensorflow:loss = 11.588047, step = 1
INFO:tensorflow:global_step/sec: 659.638
INFO:tensorflow:loss = 2.1665013, step = 101 (0.153 sec)
INFO:tensorflow:global_step/sec: 1021.02
INFO:tensorflow:loss = 6.5470495, step = 201 (0.098 sec)
INFO:tensorflow:global_step/sec: 1008.66
INFO:tensorflow:loss = 1.2689465, step = 301 (0.099 sec)
INFO:tensorflow:global_step/sec: 997.743
INFO:tensorflow:loss = 0.32332096, step = 401 (0.100 sec)
INFO:tensorflow:Saving checkpoints for 500 into /var/folders/yh/7gktt0ls0fj77fnrs694ht6m0000gn/T/tmp183_az3l/model.ckpt.
INFO:tensorflow:Loss for final step: 0.054181002.

<tensorflow_estimator.python.estimator.canned.dnn.DNNClassifier at 0x1a32b9d6a0>
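The training call stops at `steps=500` even though the input function offers 100 epochs of data. The arithmetic below shows how the two limits interact. It is a rough sketch that assumes batches are drawn continuously across epoch boundaries; the variable names are made up for this example.

```python
import math

n_train = 124       # rows in the training split
batch_size = 10     # from the training input function
num_epochs = 100    # from the training input function
train_steps = 500   # from model.train(..., steps=500)

# Batches the input function could supply before running out of data.
batches_available = math.ceil(n_train * num_epochs / batch_size)

# Training ends at whichever limit is reached first.
steps_run = min(train_steps, batches_available)

# How many passes over the training data those steps amount to.
epochs_seen = steps_run * batch_size / n_train

print(batches_available, steps_run, round(epochs_seen, 2))  # 1240 500 40.32
```

So the `steps` argument is the binding limit here, and the model sees the training set roughly 40 times.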

Test the Data

preds = list(model.predict(input_fn=test_func))

INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from /var/folders/yh/7gktt0ls0fj77fnrs694ht6m0000gn/T/tmp183_az3l/model.ckpt-500
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.

prediction = [p["class_ids"][0] for p in preds]
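Each item yielded by `model.predict` is a dictionary, and `class_ids` holds the predicted class as a one-element array, which is why the list comprehension takes index `[0]`. The sketch below mimics that structure with hand-written values so it runs without TensorFlow. The numbers in `mock_preds` are hypothetical and only illustrate the shape of the data.

```python
import numpy as np

# Hypothetical stand-ins for two estimator predictions (values are made up).
mock_preds = [
    {"probabilities": np.array([0.90, 0.07, 0.03]), "class_ids": np.array([0])},
    {"probabilities": np.array([0.10, 0.15, 0.75]), "class_ids": np.array([2])},
]

# Same extraction as in the notebook: take the single class id from each dict.
mock_prediction = [int(p["class_ids"][0]) for p in mock_preds]

# The class id is the position of the largest probability.
argmax_check = [int(np.argmax(p["probabilities"])) for p in mock_preds]

print(mock_prediction, argmax_check)  # [0, 2] [0, 2]
```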

# Report on every class that appears in Y_Test or prediction (the default labels)
data = classification_report(Y_Test, prediction)

print(data)

              precision    recall  f1-score   support

           0       1.00      1.00      1.00        19
           1       0.96      1.00      0.98        22
           2       1.00      0.92      0.96        13

   micro avg       0.98      0.98      0.98        54
   macro avg       0.99      0.97      0.98        54
weighted avg       0.98      0.98      0.98        54

conmat = confusion_matrix(Y_Test,prediction)

df = pd.DataFrame(data=conmat)
df
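The headline accuracy can be recomputed from a confusion matrix. The DataFrame output is not shown above, so the matrix in this sketch is inferred from the printed report (supports of 19, 22 and 13, precision of 0.96 for class 1, recall of 0.92 for class 2). Treat it as a reconstruction under that assumption, not as the notebook's actual output.

```python
import numpy as np

# Rows are true classes, columns are predicted classes.
# Inferred from the classification report: one class_2 wine predicted as class_1.
conmat_inferred = np.array([
    [19,  0,  0],
    [ 0, 22,  0],
    [ 0,  1, 12],
])

correct = np.trace(conmat_inferred)   # correct predictions sit on the diagonal
total = conmat_inferred.sum()
accuracy = correct / total

# Per-class precision divides by column sums, recall by row sums.
precision = np.diag(conmat_inferred) / conmat_inferred.sum(axis=0)
recall = np.diag(conmat_inferred) / conmat_inferred.sum(axis=1)

print(correct, total, round(accuracy, 4))  # 53 54 0.9815
print(precision.round(2), recall.round(2))
```

53 correct out of 54 gives 0.98148, which is the 98.14% quoted in the title when truncated to two decimals.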

Confusion Matrix

skplt.metrics.plot_confusion_matrix(Y_Test, prediction)

<matplotlib.axes._subplots.AxesSubplot at 0x1a338c1a58>
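The notebook relies on TensorFlow 1.x estimator APIs. For readers without that environment, the sketch below runs a comparable pipeline with scikit-learn's `MLPClassifier` instead. This is a swapped-in technique, not the model used above: the layer sizes, learning rate and batch size are copied from the notebook, while `momentum=0.0`, `max_iter=40` (roughly the 500 steps of 10 samples) and `random_state=0` are assumptions made for this example. The accuracy it prints will not necessarily match 98.14%.

```python
from sklearn.datasets import load_wine
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import MinMaxScaler

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=101
)

# Fit the scaler on the training split and reuse it for the test split.
scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

mlp = MLPClassifier(
    hidden_layer_sizes=(13, 13, 13),  # same layer sizes as the notebook
    solver="sgd",                     # stochastic gradient descent
    learning_rate_init=0.01,
    momentum=0.0,                     # assumption: plain gradient descent
    batch_size=10,
    max_iter=40,                      # assumption: about 40 passes over the data
    random_state=0,                   # assumption: fixed seed for repeatability
)
mlp.fit(X_train_scaled, y_train)  # may warn that training has not converged

mlp_prediction = mlp.predict(X_test_scaled)
mlp_accuracy = accuracy_score(y_test, mlp_prediction)
print(f"accuracy: {mlp_accuracy:.4f}")
```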

Pythonist

Wednesday, May 15, 2019


Accuracy is 98.14%
