Saturday, March 27, 2021

Build a Powerful Text Classification Model with spaCy NLP | Women's Clothing E-Commerce review dataset | Python

spaCy Text Classification

Authors

  • Soumil Nitin Shah


Bachelor's in Electronic Engineering | Master's in Electrical Engineering | Master's in Computer Engineering

Experienced in building scalable, high-performance software applications, combining skills in the Internet of Things (IoT), machine learning, and full-stack web development in Python.

Step 1: Imports

In [80]:
import os
import pandas as pd
import spacy 
In [81]:
df=pd.read_csv("https://raw.githubusercontent.com/hanzhang0420/Women-Clothing-E-commerce/master/Womens%20Clothing%20E-Commerce%20Reviews.csv")
In [82]:
df.head(3)
Out[82]:
  | Unnamed: 0 | Clothing ID | Age | Title | Review Text | Rating | Recommended IND | Positive Feedback Count | Division Name | Department Name | Class Name
0 | 0 | 767 | 33 | NaN | Absolutely wonderful - silky and sexy and comf... | 4 | 1 | 0 | Initmates | Intimate | Intimates
1 | 1 | 1080 | 34 | NaN | Love this dress! it's sooo pretty. i happene... | 5 | 1 | 4 | General | Dresses | Dresses
2 | 2 | 1077 | 60 | Some major design flaws | I had such high hopes for this dress and reall... | 3 | 0 | 0 | General | Dresses | Dresses
In [83]:
df.isna().sum()
Out[83]:
Unnamed: 0                    0
Clothing ID                   0
Age                           0
Title                      3810
Review Text                 845
Rating                        0
Recommended IND               0
Positive Feedback Count       0
Division Name                14
Department Name              14
Class Name                   14
dtype: int64
In [84]:
df.shape
Out[84]:
(23486, 11)
In [85]:
df = df[['Review Text','Recommended IND']].dropna()
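Selecting the two columns before calling dropna() matters: the 3,810 missing Titles no longer count, and of the two kept columns only Review Text has gaps (845, per the isna() output above). So 23,486 - 845 = 22,641 rows should survive, which is the length that shows up later as len(train). A minimal pure-Python sketch of the same select-then-drop step on a few toy rows (select_and_dropna is a hypothetical helper, not a pandas API, and None stands in for NaN):

```python
from typing import Any, Dict, List


def select_and_dropna(rows: List[Dict[str, Any]], columns: List[str]) -> List[Dict[str, Any]]:
    """Keep only `columns` from each row, then drop rows where any kept value is None.

    Roughly mirrors df[columns].dropna() for a list of dicts.
    """
    kept = []
    for row in rows:
        subset = {col: row.get(col) for col in columns}
        if all(value is not None for value in subset.values()):
            kept.append(subset)
    return kept


# Toy rows: the first has no Title (ignored after selection),
# the second has no review text (dropped, like the 845 NaNs).
rows = [
    {"Title": None, "Review Text": "Love this dress!", "Recommended IND": 1},
    {"Title": "Meh", "Review Text": None, "Recommended IND": 0},
    {"Title": "Flaws", "Review Text": "Not for the very petite.", "Recommended IND": 0},
]
clean = select_and_dropna(rows, ["Review Text", "Recommended IND"])
print(len(clean))  # 2

# The same arithmetic with the counts from the notebook output.
total_rows, missing_reviews = 23486, 845
print(total_rows - missing_reviews)  # 22641
```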
In [86]:
df.head(6)
Out[86]:
  | Review Text | Recommended IND
0 | Absolutely wonderful - silky and sexy and comf... | 1
1 | Love this dress! it's sooo pretty. i happene... | 1
2 | I had such high hopes for this dress and reall... | 0
3 | I love, love, love this jumpsuit. it's fun, fl... | 1
4 | This shirt is very flattering to all due to th... | 1
5 | I love tracy reese dresses, but this one is no... | 0
In [87]:
# A negative example: a review with Recommended IND = 0
df.iloc[5]['Review Text']
Out[87]:
'I love tracy reese dresses, but this one is not for the very petite. i am just under 5 feet tall and usually wear a 0p in this brand. this dress was very pretty out of the package but its a lot of dress. the skirt is long and very full so it overwhelmed my small frame. not a stranger to alterations, shortening and narrowing the skirt would take away from the embellishment of the garment. i love the color and the idea of the style but it just did not work on me. i returned this dress.'

Loading the spaCy Model

In [88]:
nlp=spacy.load("en_core_web_sm")
nlp.pipe_names
c:\python38\lib\site-packages\spacy\util.py:275: UserWarning: [W031] Model 'en_core_web_sm' (2.2.0) requires spaCy v2.2 and is incompatible with the current spaCy version (2.3.2). This may lead to unexpected results or runtime errors. To resolve this, download a newer compatible model or retrain your custom model with the current spaCy version. For more details and available updates, run: python -m spacy validate
  warnings.warn(warn_msg)
Out[88]:
['tagger', 'parser', 'ner']
In [89]:
# Create a text categorizer ("textcat") using the simple_cnn architecture;
# exclusive_classes=True means every text gets exactly one label
textcat = nlp.create_pipe("textcat", config={"exclusive_classes": True, "architecture": "simple_cnn"})
In [90]:
# Add the textcat component at the end of the pipeline
nlp.add_pipe(textcat, last=True)
In [91]:
nlp.pipe_names
Out[91]:
['tagger', 'parser', 'ner', 'textcat']
In [92]:
# Adding the labels to textcat
textcat.add_label("POSITIVE")
textcat.add_label("NEGATIVE")
Out[92]:
1
In [93]:
textcat.labels
Out[93]:
('POSITIVE', 'NEGATIVE')
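Setting exclusive_classes to True tells textcat that each review belongs to exactly one of the two labels. Conceptually, the output layer then normalizes the raw scores with a softmax, so POSITIVE and NEGATIVE always sum to 1 (the scores printed for the test sentence at the end of this post do: 8.6e-06 + 0.99999 ≈ 1). With non-exclusive classes each label would instead get its own independent sigmoid score. The sketch below only illustrates those two normalizations in plain Python on made-up logits; it is not spaCy's internal code.

```python
import math
from typing import Dict


def softmax(logits: Dict[str, float]) -> Dict[str, float]:
    """Exclusive classes: the scores compete and sum to 1."""
    m = max(logits.values())  # subtract the max for numerical stability
    exps = {label: math.exp(v - m) for label, v in logits.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


def sigmoid_each(logits: Dict[str, float]) -> Dict[str, float]:
    """Non-exclusive (multi-label): each score lies in [0, 1] independently."""
    return {label: 1.0 / (1.0 + math.exp(-v)) for label, v in logits.items()}


logits = {"POSITIVE": 2.0, "NEGATIVE": 1.5}  # made-up raw scores
exclusive = softmax(logits)
independent = sigmoid_each(logits)
print(round(sum(exclusive.values()), 6))  # 1.0
print(sum(independent.values()) > 1)      # True: the sigmoid scores do not compete
```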

Step 2: Preprocessing

In [94]:
# Pack each review and its label into a (text, label) tuple
df['tuples'] = df.apply(lambda row: (row['Review Text'],row['Recommended IND']), axis=1)
In [95]:
df.head(1)
Out[95]:
  | Review Text | Recommended IND | tuples
0 | Absolutely wonderful - silky and sexy and comf... | 1 | (Absolutely wonderful - silky and sexy and com...

This is what each example looks like:

In [96]:
# Inspect the first (text, label) tuple
df["tuples"][0]
Out[96]:
('Absolutely wonderful - silky and sexy and comfortable', 1)
In [97]:
# Convert the column of tuples into a Python list
train = df['tuples'].tolist()
In [98]:
print(train[0])
print(len(train))
('Absolutely wonderful - silky and sexy and comfortable', 1)
22641
In [99]:
texts, labels = zip(*train)
In [100]:
texts[0]
Out[100]:
'Absolutely wonderful - silky and sexy and comfortable'
In [101]:
labels[0]
Out[101]:
1
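zip(*train) passes every (text, label) pair to zip() as a separate argument, which transposes the list: one tuple with all the texts and one tuple with all the labels. A tiny standalone illustration with three short example pairs:

```python
pairs = [
    ("Absolutely wonderful - silky and sexy and comfortable", 1),
    ("I had such high hopes for this dress", 0),
    ("Love this dress!", 1),
]

# Unpack the pairs into zip(): one tuple of texts, one tuple of labels.
texts, labels = zip(*pairs)
print(texts[1])  # I had such high hopes for this dress
print(labels)    # (1, 0, 1)

# Zipping the two tuples back together restores the original pairs.
assert list(zip(texts, labels)) == pairs
```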
  • Next, turn each 0/1 label into the category flags textcat expects: a recommended review (1) becomes POSITIVE: True, NEGATIVE: False, and a non-recommended review (0) gets the reverse.
In [102]:
cats = []
for y in labels:
    # Recommended IND == 1 -> POSITIVE, 0 -> NEGATIVE
    if y:
        cats.append({"POSITIVE": True, "NEGATIVE": False})
    else:
        cats.append({"POSITIVE": False, "NEGATIVE": True})
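The same mapping can be packaged as a small function, which makes it easy to reuse on new data and to sanity-check. A minimal sketch: label_to_cats is a hypothetical helper name, and the label names are the two added to textcat above.

```python
from typing import Dict, List


def label_to_cats(recommended: int) -> Dict[str, bool]:
    """Map Recommended IND (1 = recommended, 0 = not) to one-hot textcat flags."""
    positive = bool(recommended)
    return {"POSITIVE": positive, "NEGATIVE": not positive}


labels = (1, 1, 0, 1, 1, 0)  # the first six labels shown by df.head(6)
cats: List[Dict[str, bool]] = [label_to_cats(y) for y in labels]
print(cats[2])  # {'POSITIVE': False, 'NEGATIVE': True}

# Exactly one flag is True per example, as exclusive classes require.
assert all(sum(c.values()) == 1 for c in cats)
```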
In [103]:
TrainX = texts
TrainY = cats
In [104]:
n_texts = 23486  # row count of the raw CSV; not used in the rest of the notebook
In [105]:
len(TrainX)
Out[105]:
22641
In [106]:
len(TrainY)
Out[106]:
22641
In [107]:
train_data = list(zip(TrainX, [{'cats': c} for c in TrainY]))
  • spaCy v2's nlp.update() expects each training example in this (text, {'cats': {...}}) format:
In [108]:
train_data[0]
Out[108]:
('Absolutely wonderful - silky and sexy and comfortable',
 {'cats': {'POSITIVE': True, 'NEGATIVE': False}})
In [109]:
len(train_data)
Out[109]:
22641
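The notebook trains on all 22,641 examples, so nothing is left over to measure accuracy on. One common option is to shuffle with a fixed seed and hold out a fraction before training. A minimal sketch, where split_train_dev and the 20% dev fraction are assumptions rather than part of the original notebook:

```python
import random
from typing import List, Tuple, TypeVar

T = TypeVar("T")


def split_train_dev(data: List[T], dev_fraction: float = 0.2, seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of `data` reproducibly and split off the last `dev_fraction` of it."""
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)  # fixed seed -> same split every run
    n_dev = int(len(shuffled) * dev_fraction)
    split_at = len(shuffled) - n_dev
    return shuffled[:split_at], shuffled[split_at:]


# Toy examples in the same (text, {'cats': ...}) format as train_data.
examples = [
    (f"review {i}", {"cats": {"POSITIVE": i % 2 == 0, "NEGATIVE": i % 2 == 1}})
    for i in range(10)
]
train_part, dev_part = split_train_dev(examples)
print(len(train_part), len(dev_part))  # 8 2
```

Applied to train_data with these defaults, the split would be 18,113 training and 4,528 dev examples; the training loop would then iterate over the training part only.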

Step 3: Training the Model

In [113]:
n_iter=10
In [114]:
from spacy.util import minibatch, compounding

# Disable every component except textcat so only it gets trained
other_pipes = [pipe for pipe in nlp.pipe_names if pipe != 'textcat']
with nlp.disable_pipes(*other_pipes):  # only train textcat
    # Initialise the enabled component (textcat) and get an optimizer
    optimizer = nlp.begin_training()
    
    print("Training the model...")
    
    # Performing training
    for i in range(n_iter):
        print("Epoch : {} ".format(i))
        losses = {}  # nlp.update() accumulates the textcat loss here
        # Batch size starts at 4 and grows by 1.001x per batch, capped at 32
        batches = minibatch(train_data, size=compounding(4., 32., 1.001))
        for batch in batches:
            batch_texts, annotations = zip(*batch)
            nlp.update(batch_texts, annotations, sgd=optimizer, drop=0.2,
                       losses=losses)
Training the model...
Epoch : 0 
Epoch : 1 
Epoch : 2 
Epoch : 3 
Epoch : 4 
Epoch : 5 
Epoch : 6 
Epoch : 7 
Epoch : 8 
Epoch : 9 
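compounding(4., 32., 1.001) is an endless generator of batch sizes: it starts at 4, multiplies by 1.001 after every batch, and is capped at 32, while minibatch() pulls the next size for each batch it cuts. The pure-Python sketch below re-implements that behaviour to show the sizes one epoch actually sees (compounding_sizes and minibatch_sketch are stand-ins written for illustration, not spaCy's code). Because the loop above creates a fresh compounding() generator inside every epoch, the size restarts at 4 each time, and with 22,641 examples the epoch ends before the size ever reaches 32.

```python
import itertools
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def compounding_sizes(start: float, stop: float, compound: float) -> Iterator[float]:
    """Yield start, start*compound, start*compound**2, ... capped at stop (assumes start <= stop)."""
    current = float(start)
    while True:
        yield min(current, stop)
        current *= compound


def minibatch_sketch(items: Iterable[T], sizes: Iterator[float]) -> Iterator[List[T]]:
    """Cut `items` into consecutive batches, taking the next size for each batch."""
    iterator = iter(items)
    while True:
        batch = list(itertools.islice(iterator, int(next(sizes))))
        if not batch:
            return
        yield batch


# One epoch over as many items as train_data has.
n_examples = 22641
batch_sizes = [
    len(b) for b in minibatch_sketch(range(n_examples), compounding_sizes(4.0, 32.0, 1.001))
]
print(batch_sizes[:3])  # [4, 4, 4]
print(len(batch_sizes), max(batch_sizes))
```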
In [115]:
nlp.to_disk("sentiment")
In [116]:
nlp = spacy.load("sentiment")
c:\python38\lib\site-packages\spacy\util.py:275: UserWarning: [W031] Model 'en_core_web_sm' (2.2.0) requires spaCy v2.2 and is incompatible with the current spaCy version (2.3.2). This may lead to unexpected results or runtime errors. To resolve this, download a newer compatible model or retrain your custom model with the current spaCy version. For more details and available updates, run: python -m spacy validate
  warnings.warn(warn_msg)
In [117]:
# Testing the model
test_text = "I had such high hopes for this dress and really crappy worst product hate it wporst bad "
doc=nlp(test_text)
doc.cats 
Out[117]:
{'POSITIVE': 8.598086424171925e-06, 'NEGATIVE': 0.9999914169311523}
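doc.cats holds one score per label. To turn it into a single prediction you can take the highest-scoring label, and to measure quality on held-out reviews you can compare those predictions with the gold cats dictionaries. A minimal stdlib sketch: predict_label and evaluate are hypothetical helpers, POSITIVE is treated as the positive class for precision/recall, and in practice each prediction would come from nlp(text).cats.

```python
from typing import Dict, List


def predict_label(cats: Dict[str, float]) -> str:
    """Return the label with the highest score."""
    return max(cats, key=cats.get)


def evaluate(pred_cats: List[Dict[str, float]], gold_cats: List[Dict[str, bool]],
             positive: str = "POSITIVE") -> Dict[str, float]:
    """Accuracy plus precision, recall and F1 for the `positive` label."""
    tp = fp = fn = correct = 0
    for pred, gold in zip(pred_cats, gold_cats):
        predicted = predict_label(pred)
        actual = max(gold, key=gold.get)  # the label flagged True
        correct += predicted == actual
        if predicted == positive and actual == positive:
            tp += 1
        elif predicted == positive:
            fp += 1
        elif actual == positive:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": correct / len(gold_cats), "precision": precision,
            "recall": recall, "f1": f1}


# The scores printed above for the test sentence.
print(predict_label({"POSITIVE": 8.598086424171925e-06, "NEGATIVE": 0.9999914169311523}))  # NEGATIVE

# Made-up predictions and gold labels for four reviews.
preds = [{"POSITIVE": 0.9, "NEGATIVE": 0.1}, {"POSITIVE": 0.2, "NEGATIVE": 0.8},
         {"POSITIVE": 0.6, "NEGATIVE": 0.4}, {"POSITIVE": 0.3, "NEGATIVE": 0.7}]
golds = [{"POSITIVE": True, "NEGATIVE": False}, {"POSITIVE": False, "NEGATIVE": True},
         {"POSITIVE": False, "NEGATIVE": True}, {"POSITIVE": True, "NEGATIVE": False}]
print(evaluate(preds, golds))  # {'accuracy': 0.5, 'precision': 0.5, 'recall': 0.5, 'f1': 0.5}
```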
