Authors¶

Soumil Nitin Shah

Bachelor's in Electronic Engineering | Master's in Electrical Engineering | Master's in Computer Engineering

Website: https://soumilshah.herokuapp.com
GitHub: https://github.com/soumilshah1995
LinkedIn: https://www.linkedin.com/in/shah-soumil/
Blog: https://soumilshah1995.blogspot.com/
YouTube: https://www.youtube.com/channel/UC_eOodxvwS_H7x2uLQa-svw?view_as=subscriber
Facebook Page: https://www.facebook.com/soumilshah1995/
Email: shahsoumil519@gmail.com
Projects: https://soumilshah.herokuapp.com/project

Experienced in building scalable, high-performance software applications, combining skills in Internet of Things (IoT), Machine Learning and Full Stack Web Development in Python.

Step 1: Imports¶

import os
import pandas as pd
import spacy

df=pd.read_csv("https://raw.githubusercontent.com/hanzhang0420/Women-Clothing-E-commerce/master/Womens%20Clothing%20E-Commerce%20Reviews.csv")

df.head(3)

df.isna().sum()

Unnamed: 0                    0
Clothing ID                   0
Age                           0
Title                      3810
Review Text                 845
Rating                        0
Recommended IND               0
Positive Feedback Count       0
Division Name                14
Department Name              14
Class Name                   14
dtype: int64

df.shape

(23486, 11)

df = df[['Review Text','Recommended IND']].dropna()

df.head(6)

# Example of a negative (not recommended) review
df.iloc[5]['Review Text']

'I love tracy reese dresses, but this one is not for the very petite. i am just under 5 feet tall and usually wear a 0p in this brand. this dress was very pretty out of the package but its a lot of dress. the skirt is long and very full so it overwhelmed my small frame. not a stranger to alterations, shortening and narrowing the skirt would take away from the embellishment of the garment. i love the color and the idea of the style but it just did not work on me. i returned this dress.'

Loading the spaCy Model¶

nlp=spacy.load("en_core_web_sm")
nlp.pipe_names

c:\python38\lib\site-packages\spacy\util.py:275: UserWarning: [W031] Model 'en_core_web_sm' (2.2.0) requires spaCy v2.2 and is incompatible with the current spaCy version (2.3.2). This may lead to unexpected results or runtime errors. To resolve this, download a newer compatible model or retrain your custom model with the current spaCy version. For more details and available updates, run: python -m spacy validate
  warnings.warn(warn_msg)

['tagger', 'parser', 'ner']

# Create a text classifier component ("textcat") with mutually exclusive classes
# and the simple CNN architecture
textcat = nlp.create_pipe( "textcat", config={"exclusive_classes": True, "architecture": "simple_cnn"})

# Add the classifier as the last component of the pipeline
nlp.add_pipe(textcat, last=True)

nlp.pipe_names

['tagger', 'parser', 'ner', 'textcat']

# Adding the labels to textcat
textcat.add_label("POSITIVE")
textcat.add_label("NEGATIVE")

1

textcat.labels

('POSITIVE', 'NEGATIVE')

Step 2: Preprocessing¶

# Pair each review text with its label as a (text, label) tuple
df['tuples'] = df.apply(lambda row: (row['Review Text'],row['Recommended IND']), axis=1)

df.head(1)

This is what the data looks like:

# First (text, label) tuple; .iloc selects by position, since dropna() can leave gaps in the index labels
df["tuples"].iloc[0]

('Absolutely wonderful - silky and sexy and comfortable', 1)

# Convert the column of tuples to a plain Python list
train = df['tuples'].tolist()

print(train[0])
print(len(train))

('Absolutely wonderful - silky and sexy and comfortable', 1)
22641

texts, labels = zip(*train)

texts[0]

'Absolutely wonderful - silky and sexy and comfortable'

labels[0]

1
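Before building the training labels, it can help to check how balanced the two classes are, because a classifier trained on skewed data may lean towards the majority class. The sketch below is only an illustration: `class_balance` is a hypothetical helper and `toy_labels` is made-up data. In the notebook you would pass the `labels` tuple created above.

```python
from collections import Counter


def class_balance(labels):
    """Return {label: (count, fraction)} for an iterable of 0/1 labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: (n, n / total) for label, n in counts.items()}


# Toy labels for illustration only (not taken from the dataset).
toy_labels = [1, 1, 0, 1, 1, 0, 1, 1]
balance = class_balance(toy_labels)

for label, (n, frac) in sorted(balance.items()):
    print(f"Recommended IND = {label}: {n} reviews ({frac:.1%})")
```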

Next we convert each label into a pair of boolean flags: a recommended review gets POSITIVE = True and NEGATIVE = False, and a review that is not recommended gets the reverse.

cats = []
for y in labels:
    # Recommended IND is 1 for recommended reviews and 0 otherwise
    if y:
        cats.append({"POSITIVE": True, "NEGATIVE": False})
    else:
        cats.append({"POSITIVE": False, "NEGATIVE": True})

TrainX = texts
TrainY = cats

n_texts = len(TrainX)  # rows left after dropna(); not used below

len(TrainX)

22641

len(TrainY)

22641

train_data = list(zip(TrainX,[{'cats': cats} for cats in TrainY]))

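Each training example has to be a (text, annotations) tuple, where the annotations dict holds the category flags under the 'cats' key: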

train_data[0]

('Absolutely wonderful - silky and sexy and comfortable',
 {'cats': {'POSITIVE': True, 'NEGATIVE': False}})
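The steps so far (tuples, flags, then zipping) can also be folded into one pass. This is a minimal sketch, not the notebook's own code: `to_cats` and `build_train_data` are hypothetical helper names, and the second sample text is a short illustrative snippet.

```python
def to_cats(label):
    """Map a 0/1 'Recommended IND' value to mutually exclusive category flags."""
    is_positive = bool(label)
    return {"POSITIVE": is_positive, "NEGATIVE": not is_positive}


def build_train_data(texts, labels):
    """Pair each text with a {'cats': {...}} annotation dict."""
    return [(text, {"cats": to_cats(label)}) for text, label in zip(texts, labels)]


# Two illustrative examples: one recommended (1), one not recommended (0).
sample = build_train_data(
    ["Absolutely wonderful - silky and sexy and comfortable", "i returned this dress."],
    [1, 0],
)
print(sample[0])
```

In the notebook, `build_train_data(texts, labels)` should give the same structure as `train_data`.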

len(train_data)

22641
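The notebook trains on all 22641 examples, so there is nothing left over to measure how well the model generalizes. One common option is to hold out part of the data first. The sketch below assumes a hypothetical helper `split_train_dev` and an arbitrary 80/20 split; the toy data only mimics the shape of `train_data`.

```python
import random


def split_train_dev(data, dev_fraction=0.2, seed=0):
    """Shuffle a copy of `data` and split it into (train, dev) lists."""
    if not 0.0 < dev_fraction < 1.0:
        raise ValueError("dev_fraction must be between 0 and 1")
    shuffled = list(data)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_dev = int(len(shuffled) * dev_fraction)
    return shuffled[n_dev:], shuffled[:n_dev]


# Toy examples in the same (text, {'cats': {...}}) shape as train_data.
toy = [
    (f"review {i}", {"cats": {"POSITIVE": i % 2 == 0, "NEGATIVE": i % 2 == 1}})
    for i in range(10)
]
train_part, dev_part = split_train_dev(toy, dev_fraction=0.2, seed=0)
print(len(train_part), len(dev_part))  # 8 2
```

You would then train on the first list and keep the second one for evaluation only.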

Model¶

n_iter=10

from spacy.util import minibatch, compounding
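To get a feel for what these two helpers do, here is a pure-Python stand-in. It is a sketch of their behaviour as I understand it in spaCy v2, not the library's source: `compounding_sketch` and `minibatch_sketch` are hypothetical names, and the sketch assumes `start <= stop` and `compound >= 1`.

```python
import itertools


def compounding_sketch(start, stop, compound):
    """Yield start, start*compound, start*compound**2, ... capped at stop."""
    value = float(start)
    while True:
        yield min(value, stop)
        value *= compound


def minibatch_sketch(items, sizes):
    """Cut `items` into consecutive batches whose lengths come from `sizes`."""
    iterator = iter(items)
    while True:
        # Each batch takes the next size from the schedule, truncated to an int.
        batch = list(itertools.islice(iterator, int(next(sizes))))
        if not batch:
            break
        yield batch


# Same schedule as the training loop: start at 4, grow by 0.1% per batch, cap at 32.
sizes = compounding_sketch(4.0, 32.0, 1.001)
batch_lengths = [len(b) for b in minibatch_sketch(range(22641), sizes)]
print(batch_lengths[:3], batch_lengths[-1], len(batch_lengths))
```

Because the training loop calls `compounding(4., 32., 1.001)` inside the epoch loop, the schedule starts again from 4 at the beginning of every epoch.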

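# Disable every component except textcat so that only the classifier is trained
other_pipes = [pipe for pipe in nlp.pipe_names if pipe != 'textcat']
with nlp.disable_pipes(*other_pipes):
    # Initialize the weights of the enabled components and get the optimizer
    optimizer = nlp.begin_training()

    print("Training the model...")

    # Performing training
    for i in range(n_iter):
        print("Epoch : {} ".format(i))
        losses = {}  # accumulated loss per component for this epoch
        # Batch size starts at 4 and grows by a factor of 1.001 per batch, capped at 32
        batches = minibatch(train_data, size=compounding(4., 32., 1.001))
        for batch in batches:
            texts, annotations = zip(*batch)
            # drop=0.2 sets the dropout rate for this update
            nlp.update(texts, annotations, sgd=optimizer, drop=0.2,
                       losses=losses)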

Training the model...
Epoch : 0 
Epoch : 1 
Epoch : 2 
Epoch : 3 
Epoch : 4 
Epoch : 5 
Epoch : 6 
Epoch : 7 
Epoch : 8 
Epoch : 9
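The loop above visits `train_data` in the same order in every epoch. A common practice is to reshuffle the examples at the start of each epoch so the batches differ between passes. The sketch below shows only that idea with a hypothetical generator, `shuffled_epochs`, and toy data; the spaCy calls are left as a comment.

```python
import random


def shuffled_epochs(data, n_iter, seed=0):
    """Yield (epoch_index, freshly shuffled copy of data) for each epoch."""
    rng = random.Random(seed)
    for epoch in range(n_iter):
        order = list(data)  # copy, so the caller's list is left untouched
        rng.shuffle(order)
        yield epoch, order


# Toy examples in the same shape as train_data.
toy_data = [
    ("text %d" % i, {"cats": {"POSITIVE": True, "NEGATIVE": False}})
    for i in range(5)
]
for epoch, order in shuffled_epochs(toy_data, n_iter=2):
    # In the notebook, minibatch(...) and nlp.update(...) would run here on `order`.
    print(epoch, [text for text, _ in order])
```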

nlp.to_disk("sentiment")

nlp = spacy.load("sentiment")

c:\python38\lib\site-packages\spacy\util.py:275: UserWarning: [W031] Model 'en_core_web_sm' (2.2.0) requires spaCy v2.2 and is incompatible with the current spaCy version (2.3.2). This may lead to unexpected results or runtime errors. To resolve this, download a newer compatible model or retrain your custom model with the current spaCy version. For more details and available updates, run: python -m spacy validate
  warnings.warn(warn_msg)

# Testing the model
test_text = "I had such high hopes for this dress and really crappy worst product hate it wporst bad "
doc=nlp(test_text)
doc.cats

{'POSITIVE': 8.598086424171925e-06, 'NEGATIVE': 0.9999914169311523}
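`doc.cats` is a plain dict of scores, so turning it into a single predicted label is just a matter of picking the key with the highest score. The sketch below uses the scores printed above. `predict_label` and `accuracy` are hypothetical helpers, and the `toy_scores` and `toy_gold` values are made up purely to show how accuracy could be computed on held-out reviews.

```python
def predict_label(cats):
    """Return the highest-scoring category name from a doc.cats-style dict."""
    return max(cats, key=cats.get)


def accuracy(score_dicts, gold_labels):
    """Fraction of examples whose top-scoring category equals the gold label."""
    if len(score_dicts) != len(gold_labels):
        raise ValueError("score_dicts and gold_labels must have the same length")
    if not gold_labels:
        return 0.0
    correct = sum(predict_label(c) == g for c, g in zip(score_dicts, gold_labels))
    return correct / len(gold_labels)


# Scores copied from the test sentence above.
scores = {"POSITIVE": 8.598086424171925e-06, "NEGATIVE": 0.9999914169311523}
print(predict_label(scores))  # NEGATIVE

# Made-up score dicts and gold labels, purely to exercise accuracy().
toy_scores = [
    {"POSITIVE": 0.9, "NEGATIVE": 0.1},
    {"POSITIVE": 0.2, "NEGATIVE": 0.8},
    {"POSITIVE": 0.6, "NEGATIVE": 0.4},
    {"POSITIVE": 0.3, "NEGATIVE": 0.7},
]
toy_gold = ["POSITIVE", "NEGATIVE", "NEGATIVE", "NEGATIVE"]
print(accuracy(toy_scores, toy_gold))  # 0.75
```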

Output of df.head(3):

	Unnamed: 0	Clothing ID	Age	Title	Review Text	Rating	Recommended IND	Positive Feedback Count	Division Name	Department Name	Class Name
0	0	767	33	NaN	Absolutely wonderful - silky and sexy and comf...	4	1	0	Initmates	Intimate	Intimates
1	1	1080	34	NaN	Love this dress! it's sooo pretty. i happene...	5	1	4	General	Dresses	Dresses
2	2	1077	60	Some major design flaws	I had such high hopes for this dress and reall...	3	0	0	General	Dresses	Dresses

Output of df.head(6), after keeping only the Review Text and Recommended IND columns:

	Review Text	Recommended IND
0	Absolutely wonderful - silky and sexy and comf...	1
1	Love this dress! it's sooo pretty. i happene...	1
2	I had such high hopes for this dress and reall...	0
3	I love, love, love this jumpsuit. it's fun, fl...	1
4	This shirt is very flattering to all due to th...	1
5	I love tracy reese dresses, but this one is no...	0

Pythonist

Saturday, March 27, 2021

Build Powerful Text Classification Model with Spacy NLP | Amazon product review dataset | python