Authors
Soumil Nitin Shah
Bachelor's in Electronic Engineering | Master's in Electrical Engineering | Master's in Computer Engineering
- Website: https://soumilshah.herokuapp.com
- GitHub: https://github.com/soumilshah1995
- LinkedIn: https://www.linkedin.com/in/shah-soumil/
- Blog: https://soumilshah1995.blogspot.com/
- YouTube: https://www.youtube.com/channel/UC_eOodxvwS_H7x2uLQa-svw?view_as=subscriber
- Facebook Page: https://www.facebook.com/soumilshah1995/
- Email: shahsoumil519@gmail.com
- Projects: https://soumilshah.herokuapp.com/project
Experienced in building scalable, high-performance software applications, combining skills in Internet of Things (IoT), machine learning, and full-stack web development in Python.
Step 1: Imports
import os
import pandas as pd
import spacy
df=pd.read_csv("https://raw.githubusercontent.com/hanzhang0420/Women-Clothing-E-commerce/master/Womens%20Clothing%20E-Commerce%20Reviews.csv")
df.head(3)
df.isna().sum()
df.shape
df = df[['Review Text','Recommended IND']].dropna()
df.head(6)
# Example of a negative review
df.iloc[5]['Review Text']
Loading the spaCy Library
nlp=spacy.load("en_core_web_sm")
nlp.pipe_names
# Create a text categorizer (textcat) component with a simple CNN architecture.
# Note: create_pipe with this config is the spaCy 2.x API.
textcat = nlp.create_pipe("textcat", config={"exclusive_classes": True, "architecture": "simple_cnn"})
# Add it as the last component in the pipeline
nlp.add_pipe(textcat, last=True)
nlp.pipe_names
# Adding the labels to textcat
textcat.add_label("POSITIVE")
textcat.add_label("NEGATIVE")
textcat.labels
Step 2: Preprocessing
# Pair each review text with its label as a (text, label) tuple
df['tuples'] = df.apply(lambda row: (row['Review Text'],row['Recommended IND']), axis=1)
df.head(1)
# This is what the data looks like
df["tuples"][0]
# Convert the column of tuples to a list
train = df['tuples'].tolist()
print(train[0])
print(len(train))
texts, labels = zip(*train)
texts[0]
labels[0]
- Here we build a dictionary of flags for each label: a recommended review gets POSITIVE=True and NEGATIVE=False; otherwise the flags are reversed.
cats = []
for y in labels:
    if bool(y):
        cats.append({"POSITIVE": True, "NEGATIVE": False})
    else:
        cats.append({"POSITIVE": False, "NEGATIVE": True})
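The flag-building loop above can also be written as a small function. This is a minimal sketch, not the notebook's own code: `label_to_cats` and `sample_labels` are hypothetical names, and the sample values stand in for the 0/1 flags in the 'Recommended IND' column.

```python
def label_to_cats(label):
    """Map a 0/1 recommendation flag to mutually exclusive category flags."""
    is_positive = bool(label)
    return {"POSITIVE": is_positive, "NEGATIVE": not is_positive}


# Hypothetical sample labels, standing in for the 'Recommended IND' column
sample_labels = [1, 0, 1]
sample_cats = [label_to_cats(y) for y in sample_labels]
print(sample_cats)
```

Because the two flags are always opposites, exactly one of them is True for every review, which matches the `exclusive_classes` setting used when the textcat component was created.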
TrainX = texts
TrainY = cats
n_texts = len(TrainX)  # number of labeled reviews left after dropna()
len(TrainX)
len(TrainY)
train_data = list(zip(TrainX,[{'cats': cats} for cats in TrainY]))
- The training data has to be in this format: a list of (text, {'cats': {...}}) pairs
train_data[0]
len(train_data)
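To make the expected shape of each training example concrete, here is a small sanity check. It is only a sketch: `is_valid_example` is a hypothetical helper, not part of spaCy, and the two sample records are made up for illustration.

```python
def is_valid_example(example, labels=("POSITIVE", "NEGATIVE")):
    """Return True if example looks like (text, {"cats": {label: bool, ...}})."""
    if not (isinstance(example, tuple) and len(example) == 2):
        return False
    text, annotations = example
    if not isinstance(text, str) or not isinstance(annotations, dict):
        return False
    cats = annotations.get("cats")
    if not isinstance(cats, dict) or set(cats) != set(labels):
        return False
    # With exclusive classes, exactly one label should be True
    return sum(bool(v) for v in cats.values()) == 1


# Made-up records: the second one is missing the "cats" wrapper
good = ("Love this dress", {"cats": {"POSITIVE": True, "NEGATIVE": False}})
bad = ("Love this dress", {"POSITIVE": True, "NEGATIVE": False})
print(is_valid_example(good), is_valid_example(bad))
```

Running a check like `all(is_valid_example(ex) for ex in train_data)` before training can catch a malformed record early, rather than partway through an epoch.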
Model
n_iter = 10
from spacy.util import minibatch, compounding
# Disabling other components
other_pipes = [pipe for pipe in nlp.pipe_names if pipe != 'textcat']
with nlp.disable_pipes(*other_pipes):  # only train textcat
    optimizer = nlp.begin_training()
    print("Training the model...")
    # Performing training
    for i in range(n_iter):
        print("Epoch : {} ".format(i))
        losses = {}
        batches = minibatch(train_data, size=compounding(4., 32., 1.001))
        for batch in batches:
            texts, annotations = zip(*batch)
            nlp.update(texts, annotations, sgd=optimizer, drop=0.2,
                       losses=losses)
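The batch schedule in the training loop may be easier to follow with a plain-Python sketch. The two functions below are simplified stand-ins written for illustration, not spaCy's own code: `compounding_sizes` and `minibatches` are hypothetical names, and the sketch assumes `start` is smaller than `stop`.

```python
from itertools import islice


def compounding_sizes(start, stop, compound):
    """Yield start, start*compound, start*compound**2, ... capped at stop."""
    size = float(start)
    while True:
        yield min(size, stop)
        size *= compound


def minibatches(items, sizes):
    """Cut items into consecutive batches, reading each batch size from sizes."""
    it = iter(items)
    while True:
        batch = list(islice(it, int(next(sizes))))
        if not batch:
            break
        yield batch


# A faster growth rate (2.0) than the notebook's 1.001, to make the cap visible
print(list(islice(compounding_sizes(4.0, 32.0, 2.0), 5)))

# With the notebook's rate, the first batches all contain 4 items
print([len(b) for b in minibatches(range(10), compounding_sizes(4.0, 32.0, 1.001))])
```

With a rate of 1.001 the batch size grows slowly, taking roughly 2,000 updates to go from 4 to the cap of 32. Since the training loop builds a fresh schedule inside each epoch, every epoch starts again from batches of 4.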
nlp.to_disk("sentiment")
nlp = spacy.load("sentiment")
# Testing the model
test_text = "I had such high hopes for this dress and really crappy worst product hate it wporst bad "
doc=nlp(test_text)
doc.cats
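`doc.cats` is a dictionary mapping each label to a score, so the predicted class is simply the label with the highest score. The sketch below shows one way to read it. The scores are invented for illustration and are not output from the trained model, and `predicted_label` is a hypothetical helper.

```python
def predicted_label(cats):
    """Return the label with the highest score in a doc.cats-style dict."""
    return max(cats, key=cats.get)


# Invented scores for illustration only (not real model output)
example_cats = {"POSITIVE": 0.08, "NEGATIVE": 0.92}
label = predicted_label(example_cats)
print(label, example_cats[label])
```

In the notebook this would be called as `predicted_label(doc.cats)`. Because the component was created with exclusive classes, the two scores should sum to roughly 1, so the winning score can be read as a rough confidence.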