Saturday, March 27, 2021

Build Powerful Text Classification Model with Spacy NLP | Amazon product review dataset | python

Spacy Text Classification

Authors

  • Soumil Nitin Shah

Soumil Nitin Shah

Bachelor in Electronic Engineering | Master's in Electrical Engineering | Master's in Computer Engineering

Experienced in building scalable, high-performance software applications, combining skills in Internet of Things (IoT), machine learning, and full-stack web development in Python.

Step 1: Imports

In [80]:
import os
import pandas as pd
import spacy 
In [81]:
df=pd.read_csv("https://raw.githubusercontent.com/hanzhang0420/Women-Clothing-E-commerce/master/Womens%20Clothing%20E-Commerce%20Reviews.csv")
In [82]:
df.head(3)
Out[82]:
(index) | Unnamed: 0 | Clothing ID | Age | Title | Review Text | Rating | Recommended IND | Positive Feedback Count | Division Name | Department Name | Class Name
0 | 0 | 767 | 33 | NaN | Absolutely wonderful - silky and sexy and comf... | 4 | 1 | 0 | Initmates | Intimate | Intimates
1 | 1 | 1080 | 34 | NaN | Love this dress! it's sooo pretty. i happene... | 5 | 1 | 4 | General | Dresses | Dresses
2 | 2 | 1077 | 60 | Some major design flaws | I had such high hopes for this dress and reall... | 3 | 0 | 0 | General | Dresses | Dresses
In [83]:
df.isna().sum()
Out[83]:
Unnamed: 0                    0
Clothing ID                   0
Age                           0
Title                      3810
Review Text                 845
Rating                        0
Recommended IND               0
Positive Feedback Count       0
Division Name                14
Department Name              14
Class Name                   14
dtype: int64
In [84]:
df.shape
Out[84]:
(23486, 11)
In [85]:
df = df[['Review Text','Recommended IND']].dropna()
In [86]:
df.head(6)
Out[86]:
(index) | Review Text | Recommended IND
0 | Absolutely wonderful - silky and sexy and comf... | 1
1 | Love this dress! it's sooo pretty. i happene... | 1
2 | I had such high hopes for this dress and reall... | 0
3 | I love, love, love this jumpsuit. it's fun, fl... | 1
4 | This shirt is very flattering to all due to th... | 1
5 | I love tracy reese dresses, but this one is no... | 0
In [87]:
# Negative Text 
df.iloc[5]['Review Text']
Out[87]:
'I love tracy reese dresses, but this one is not for the very petite. i am just under 5 feet tall and usually wear a 0p in this brand. this dress was very pretty out of the package but its a lot of dress. the skirt is long and very full so it overwhelmed my small frame. not a stranger to alterations, shortening and narrowing the skirt would take away from the embellishment of the garment. i love the color and the idea of the style but it just did not work on me. i returned this dress.'
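
Since Recommended IND is used as the label, it can help to check how balanced the two classes are before training. The sketch below uses a few made-up rows (they are not taken from the dataset) and only the standard library. On the real DataFrame, the equivalent check would be df['Recommended IND'].value_counts().

```python
from collections import Counter

# Hypothetical (text, label) rows standing in for the DataFrame;
# None marks a missing review, like NaN in the CSV.
rows = [
    ("Love this dress!", 1),
    ("Runs small, returned it.", 0),
    (None, 1),
    ("Very flattering.", 1),
]

# Drop rows with a missing review text (what dropna() does for that column).
clean_rows = [(text, label) for text, label in rows if text is not None]

# Count how many examples each label has and the share of positives.
label_counts = Counter(label for _, label in clean_rows)
positive_share = label_counts[1] / len(clean_rows)

print(label_counts)
print(round(positive_share, 2))
```

If one class dominates heavily, overall accuracy alone can look better than the model really is, so it is worth keeping these counts in mind when judging results.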

Loading Spacy Library

In [88]:
nlp=spacy.load("en_core_web_sm")
nlp.pipe_names
c:\python38\lib\site-packages\spacy\util.py:275: UserWarning: [W031] Model 'en_core_web_sm' (2.2.0) requires spaCy v2.2 and is incompatible with the current spaCy version (2.3.2). This may lead to unexpected results or runtime errors. To resolve this, download a newer compatible model or retrain your custom model with the current spaCy version. For more details and available updates, run: python -m spacy validate
  warnings.warn(warn_msg)
Out[88]:
['tagger', 'parser', 'ner']
In [89]:
# Create a text categorizer (textcat) component with mutually exclusive classes and the simple_cnn architecture
textcat = nlp.create_pipe( "textcat", config={"exclusive_classes": True, "architecture": "simple_cnn"})
In [90]:
# Add the textcat component to the end of the pipeline
nlp.add_pipe(textcat, last=True)
In [91]:
nlp.pipe_names
Out[91]:
['tagger', 'parser', 'ner', 'textcat']
In [92]:
# Adding the labels to textcat
textcat.add_label("POSITIVE")
textcat.add_label("NEGATIVE")
Out[92]:
1
In [93]:
textcat.labels
Out[93]:
('POSITIVE', 'NEGATIVE')
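
Setting exclusive_classes to True tells the component that every review has exactly one of the two labels. As far as we understand, the scores are then normalized so that they add up to roughly 1, which matches the prediction printed at the end of this post. The sketch below only illustrates that kind of normalization (a softmax) in plain Python. The raw scores are made up and this is not spaCy's internal code.

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    highest = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores for ("POSITIVE", "NEGATIVE"); not from the model.
raw_scores = [-5.0, 6.0]
probs = dict(zip(("POSITIVE", "NEGATIVE"), softmax(raw_scores)))

print(probs)
print(sum(probs.values()))
```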

Step 2:

  • Preprocessing
In [94]:
# Pair each review text with its label as a (text, label) tuple
df['tuples'] = df.apply(lambda row: (row['Review Text'],row['Recommended IND']), axis=1)
In [95]:
df.head(1)
Out[95]:
(index) | Review Text | Recommended IND | tuples
0 | Absolutely wonderful - silky and sexy and comf... | 1 | (Absolutely wonderful - silky and sexy and com...

This is what the data looks like:

In [96]:
# Inspect the first (text, label) tuple
df["tuples"][0]
Out[96]:
('Absolutely wonderful - silky and sexy and comfortable', 1)
In [97]:
# Convert the column of tuples to a Python list
train = df['tuples'].tolist()
In [98]:
print(train[0])
print(len(train))
('Absolutely wonderful - silky and sexy and comfortable', 1)
22641
In [99]:
texts, labels = zip(*train)
In [100]:
texts[0]
Out[100]:
'Absolutely wonderful - silky and sexy and comfortable'
In [101]:
labels[0]
Out[101]:
1
  • Next, we turn each label into a pair of boolean flags: a recommended review gets POSITIVE=True and NEGATIVE=False, and any other review gets the reverse
In [102]:
cats = []
for y in labels:
    if y:  # Recommended IND is 1 for positive, 0 for negative
        cats.append({"POSITIVE": True, "NEGATIVE":False})
    else:
        cats.append({"POSITIVE": False, "NEGATIVE":True})
In [103]:
TrainX = texts
TrainY = cats
In [104]:
n_texts = len(TrainX)  # 22641 reviews remain after dropna(); 23486 was the raw row count
In [105]:
len(TrainX)
Out[105]:
22641
In [106]:
len(TrainY)
Out[106]:
22641
In [107]:
train_data = list(zip(TrainX,[{'cats': cats} for cats in TrainY]))
  • Each training example has to be a (text, {'cats': {...}}) tuple, as shown here
In [108]:
train_data[0]
Out[108]:
('Absolutely wonderful - silky and sexy and comfortable',
 {'cats': {'POSITIVE': True, 'NEGATIVE': False}})
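
The conversion steps above can also be wrapped in a single function. This is only a sketch: to_spacy_format is a hypothetical helper name that does not appear in the notebook, and the second sample review is shortened for illustration.

```python
def to_spacy_format(pairs):
    """Convert (text, label) pairs to (text, {'cats': {...}}) examples."""
    examples = []
    for text, label in pairs:
        is_positive = bool(label)
        cats = {"POSITIVE": is_positive, "NEGATIVE": not is_positive}
        examples.append((text, {"cats": cats}))
    return examples

# Two sample pairs: label 1 means recommended, 0 means not recommended.
sample = [
    ("Absolutely wonderful - silky and sexy and comfortable", 1),
    ("i returned this dress.", 0),
]
train_examples = to_spacy_format(sample)

print(train_examples[0])
print(train_examples[1])
```

Keeping the conversion in one place makes it easy to apply exactly the same formatting to any held-out data later.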
In [109]:
len(train_data)
Out[109]:
22641
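
Note that all 22641 examples go into training, so nothing is left over to measure accuracy on. One option is to hold out part of the data first. The sketch below assumes an 80/20 split and a fixed seed. split_examples is a hypothetical helper and the examples are placeholders, not real reviews.

```python
import random

def split_examples(examples, test_fraction=0.2, seed=0):
    """Shuffle a copy of the examples and split it into (train, test)."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # seeded, so the split is repeatable
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Placeholder examples in the same (text, {'cats': ...}) format.
examples = [
    ("review {}".format(i),
     {"cats": {"POSITIVE": i % 2 == 0, "NEGATIVE": i % 2 == 1}})
    for i in range(10)
]
train_part, test_part = split_examples(examples)

print(len(train_part), len(test_part))
```

The training loop would then use train_part only, and test_part would stay untouched until evaluation.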

Model

In [113]:
n_iter=10
In [114]:
from spacy.util import minibatch, compounding

# Disable every component except textcat so only the classifier is updated
other_pipes = [pipe for pipe in nlp.pipe_names if pipe != 'textcat']
with nlp.disable_pipes(*other_pipes):  # only train textcat
    optimizer = nlp.begin_training()
    
    print("Training the model...")
    
    # Run n_iter passes (epochs) over the training data
    for i in range(n_iter):
        print("Epoch : {} ".format(i))
        losses = {}  # accumulates the textcat loss for this epoch
        # Batch size starts at 4 and grows by a factor of 1.001 per batch, up to 32
        batches = minibatch(train_data, size=compounding(4., 32., 1.001))
        for batch in batches:
            texts, annotations = zip(*batch)
            # drop=0.2 applies 20% dropout during the update
            nlp.update(texts, annotations, sgd=optimizer, drop=0.2,
                       losses=losses)
Training the model...
Epoch : 0 
Epoch : 1 
Epoch : 2 
Epoch : 3 
Epoch : 4 
Epoch : 5 
Epoch : 6 
Epoch : 7 
Epoch : 8 
Epoch : 9 
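
The batch sizes in the loop come from compounding(4., 32., 1.001): the size starts at 4 and is multiplied by 1.001 after every batch until it reaches 32. The plain-Python sketch below mimics that behaviour to show what the batches look like. compounding_sizes and make_batches are illustrative names, not spaCy's own implementation, and the growth factor is raised to 1.5 so the effect is visible on 100 items.

```python
from itertools import islice

def compounding_sizes(start, stop, compound):
    """Yield start, start*compound, start*compound**2, ... capped at stop."""
    current = float(start)
    while True:
        yield min(current, stop)
        current *= compound

def make_batches(items, sizes):
    """Cut items into consecutive batches, taking each length from sizes."""
    iterator = iter(items)
    for size in sizes:
        batch = list(islice(iterator, int(size)))
        if not batch:  # no items left
            break
        yield batch

# Growth factor 1.5 (instead of 1.001) so the increase shows up quickly.
batches = list(make_batches(range(100), compounding_sizes(4.0, 32.0, 1.5)))
batch_lengths = [len(b) for b in batches]

print(batch_lengths)
```

With the factor of 1.001 used in the notebook, the batch size grows much more slowly, so early updates use small batches and later ones gradually use larger batches.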
In [115]:
nlp.to_disk("sentiment")
In [116]:
nlp = spacy.load("sentiment")
In [117]:
# Testing the model
test_text = "I had such high hopes for this dress and really crappy worst product hate it wporst bad "
doc=nlp(test_text)
doc.cats 
Out[117]:
{'POSITIVE': 8.598086424171925e-06, 'NEGATIVE': 0.9999914169311523}
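
doc.cats is a dictionary of scores, so the predicted label is simply the key with the highest score. The sketch below reuses the scores printed above. predicted_label and accuracy are hypothetical helpers, and the lists passed to accuracy are made up to show the calculation, not results from this model.

```python
def predicted_label(cats):
    """Return the label with the highest score in a doc.cats-style dict."""
    return max(cats, key=cats.get)

def accuracy(predictions, gold_labels):
    """Fraction of predictions that match the gold labels."""
    correct = sum(1 for p, g in zip(predictions, gold_labels) if p == g)
    return correct / len(gold_labels)

# Scores copied from the output above.
scores = {"POSITIVE": 8.598086424171925e-06, "NEGATIVE": 0.9999914169311523}
label = predicted_label(scores)
print(label)

# Made-up predictions and gold labels, only to show how accuracy is computed.
example_accuracy = accuracy(["POSITIVE", "NEGATIVE"], ["POSITIVE", "POSITIVE"])
print(example_accuracy)
```

Running predicted_label over reviews the model has not seen during training and passing the results to accuracy would give a first, rough estimate of how well the classifier generalizes.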
