Using Google's Pretrained Machine Learning Model MobileNet to Find Similar Images with the Jaccard Index, Cosine Similarity, or Pearson Similarity
About Myself
Hello! I’m Soumil Nitin Shah, a software and hardware developer based in New York City. I completed my Bachelor's in Electronic Engineering and a double master's in Computer and Electrical Engineering. I develop Python-based cross-platform desktop applications, web pages, software, REST APIs, databases, and much more, and I have more than 2 years of experience in Python.
- Website: http://soumilshah.herokuapp.com/
- YouTube: https://www.youtube.com/channel/UC_eOodxvwS_H7x2uLQa-svw
- I currently work as a Software Engineer at JobTarget.
Step 1:
- Define Imports
import tensorflow as tf
import tensorflow_hub as hub
import numpy as np
import os
import pandas as pd
import matplotlib.pyplot as plt
import base64
from PIL import Image
import io
import math
from math import sqrt
%matplotlib inline
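# Load the pretrained MobileNet model extracted into the current working directory.
# (A module-level assignment is already global, so "global embed" is not needed.)
embed = hub.KerasLayer(os.getcwd())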
- I downloaded Google's pretrained MobileNet model and extracted it into the current working directory. After unzipping, you should see three entries: the assets and variables folders and a file ending with the .pb extension.
# List the current directory to confirm the model files are there
for x in os.listdir("."):
    print(x)
Step 2:
To convert images into vectors, I wrote a simple helper class that takes a file name and returns the corresponding feature vector.
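class TensorVector(object):

    def __init__(self, FileName=None):
        self.FileName = FileName

    def process(self):
        # Read and decode the JPEG into a uint8 tensor of shape (H, W, 3)
        img = tf.io.read_file(self.FileName)
        img = tf.io.decode_jpeg(img, channels=3)
        # Convert to float32 in [0, 1] BEFORE resizing: resize_with_pad already
        # returns float32, so converting afterwards would not rescale the pixels
        img = tf.image.convert_image_dtype(img, tf.float32)
        # Letterbox to 224x224 (keeps aspect ratio, pads with zeros), add a batch axis
        img = tf.image.resize_with_pad(img, 224, 224)[tf.newaxis, ...]
        # Run the model and drop the batch axis to get a 1-D feature vector
        features = embed(img)
        feature_set = np.squeeze(features)
        return list(feature_set)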
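resize_with_pad letterboxes the image: it scales it to fit inside 224x224 without changing the aspect ratio, then fills the leftover border with zeros. The arithmetic can be sketched in plain Python as below. The function name letterbox_geometry is hypothetical, and TensorFlow's own rounding may differ from this sketch by a pixel for some sizes.

```python
def letterbox_geometry(height, width, target_h=224, target_w=224):
    """Hypothetical helper: compute the resized size and zero padding used when
    fitting a (height x width) image into (target_h x target_w) while keeping
    its aspect ratio. Returns (new_h, new_w, (top, bottom), (left, right))."""
    # Compare the two aspect ratios with integer cross-multiplication
    # to avoid floating-point rounding
    if width * target_h >= height * target_w:
        # Width is the limiting side: fill the full target width
        new_w = target_w
        new_h = height * target_w // width
    else:
        # Height is the limiting side: fill the full target height
        new_h = target_h
        new_w = width * target_h // height
    pad_h = target_h - new_h
    pad_w = target_w - new_w
    top, left = pad_h // 2, pad_w // 2
    return new_h, new_w, (top, pad_h - top), (left, pad_w - left)


# A 480x640 landscape photo becomes 168x224 with 28 rows of padding above and below
print(letterbox_geometry(480, 640))  # (168, 224, (28, 28), (0, 0))
```

Because nothing is stretched, objects keep their proportions, at the cost of some zero-valued border pixels that the model also sees.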
Step 3:
- Whenever I work with images, I convert them to Base64, since that is the format usually used to send images over the web.
- Let me show how to convert an image to Base64 and back.
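def convertBase64(FileName):
    """
    Encode an image file as a Base64 string, decode it back, and
    return the image as a NumPy array (ready for plt.imshow).
    """
    with open(FileName, "rb") as f:
        data = f.read()
    # Encode the raw bytes as a Base64 text string
    res = base64.b64encode(data)
    base64data = res.decode("UTF-8")
    # Decode the Base64 string back into bytes and open it with PIL
    imgdata = base64.b64decode(base64data)
    image = Image.open(io.BytesIO(imgdata))
    return np.array(image)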
plt.imshow(convertBase64("1000010653_3390.jpg"))
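On the web, Base64 image data is often embedded as a data URI of the form data:image/jpeg;base64,.... Below is a stdlib-only sketch that builds such a string from raw bytes and decodes it back; the helper names to_data_uri and from_data_uri are hypothetical, and the MIME type defaults to image/jpeg as an assumption.

```python
import base64


def to_data_uri(raw_bytes, mime="image/jpeg"):
    """Hypothetical helper: wrap raw image bytes in a Base64 data URI."""
    encoded = base64.b64encode(raw_bytes).decode("ascii")
    return "data:{};base64,{}".format(mime, encoded)


def from_data_uri(uri):
    """Hypothetical helper: recover the raw bytes from a Base64 data URI."""
    header, _, payload = uri.partition(",")
    if not header.endswith(";base64"):
        raise ValueError("not a Base64 data URI")
    return base64.b64decode(payload)


# Round trip on the first bytes of a JPEG header
# (a real call would pass open(FileName, "rb").read())
uri = to_data_uri(b"\xff\xd8\xff")
print(uri)  # data:image/jpeg;base64,/9j/
```

The decoded bytes can be handed to Image.open(io.BytesIO(...)) exactly as convertBase64 does.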
Converting This Image into a Vector Using the Pretrained Model
helper = TensorVector("1000010653_3390.jpg")
vector = helper.process()
len(vector)
- We just converted an image into a vector using the pretrained model. Let's do the same for another image and measure the similarity between the two images.
plt.imshow(convertBase64("1000010653_3415.jpg"))
helper = TensorVector("1000010653_3415.jpg")
vector2 = helper.process()
len(vector2)
Define Cosine, Jaccard, and Pearson Similarity
import math
from math import sqrt

def cosineSim(a1, a2):
    # Accumulate the dot product and both squared norms in one pass
    dot = 0
    suma1 = 0
    sumb1 = 0
    for i, j in zip(a1, a2):
        suma1 += i * i
        sumb1 += j * j
        dot += i * j
    cosine_sim = dot / (sqrt(suma1) * sqrt(sumb1))
    return cosine_sim

def jaccard_similarity(list1, list2):
    # Jaccard index = |A ∩ B| / |A ∪ B| over the SETS of values.
    # On continuous float vectors exact matches are rare, so the score is usually close to 0.
    set1, set2 = set(list1), set(list2)
    intersection = len(set1 & set2)
    union = len(set1 | set2)
    return float(intersection) / union

def average(x):
    assert len(x) > 0
    return float(sum(x)) / len(x)

def pearson_def(x, y):
    # Pearson correlation: covariance divided by the product of standard deviations
    assert len(x) == len(y)
    n = len(x)
    assert n > 0
    avg_x = average(x)
    avg_y = average(y)
    diffprod = 0
    xdiff2 = 0
    ydiff2 = 0
    for idx in range(n):
        xdiff = x[idx] - avg_x
        ydiff = y[idx] - avg_y
        diffprod += xdiff * ydiff
        xdiff2 += xdiff * xdiff
        ydiff2 += ydiff * ydiff
    return diffprod / math.sqrt(xdiff2 * ydiff2)
Similarity Between the Two Images
1. Cosine Similarity
print("similarity Cosine : {} ".format(cosineSim(vector, vector2)))
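cosineSim computes the cosine of the angle between the two feature vectors; it ranges from -1 to 1, and 1 means the vectors point in the same direction. As a worked example with two small made-up vectors:

```latex
\cos(\mathbf{a}, \mathbf{b}) = \frac{\sum_i a_i b_i}{\sqrt{\sum_i a_i^2}\,\sqrt{\sum_i b_i^2}},
\qquad
\mathbf{a} = (1, 2, 3),\ \mathbf{b} = (2, 4, 6):\quad
\frac{2 + 8 + 18}{\sqrt{14}\,\sqrt{56}} = \frac{28}{\sqrt{784}} = \frac{28}{28} = 1
```

Even though b is twice as long as a, the score is exactly 1, because cosine similarity ignores vector length and only compares direction.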
2. Jaccard Similarity
print("similarity Jaccard : {} ".format(jaccard_similarity(vector, vector2)))
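Because the exact-match Jaccard index compares sets of float values, it says little about how close two embeddings are. A commonly used alternative for non-negative real vectors is the weighted (Ruzicka) Jaccard similarity: the sum of element-wise minimums divided by the sum of element-wise maximums. This is a swapped-in technique, not the method used above, and it assumes every entry is non-negative (check that this holds for your model's feature vector).

```python
def weighted_jaccard(a, b):
    """Weighted (Ruzicka) Jaccard similarity for non-negative vectors:
    sum(min(a_i, b_i)) / sum(max(a_i, b_i)). Returns 1.0 for identical vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    if any(v < 0 for v in a) or any(v < 0 for v in b):
        raise ValueError("weighted Jaccard assumes non-negative entries")
    num = sum(min(x, y) for x, y in zip(a, b))
    den = sum(max(x, y) for x, y in zip(a, b))
    # Two all-zero vectors are treated as identical
    return num / den if den > 0 else 1.0


print(weighted_jaccard([1.0, 2.0, 0.0], [2.0, 1.0, 0.0]))  # 0.5
```

With the image vectors from above, the call would be weighted_jaccard(vector, vector2).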
3. Pearson Similarity
print("similarity Pearson : {} ".format(pearson_def(vector, vector2)))
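Pearson correlation is the same as cosine similarity applied after subtracting each vector's mean, so it measures whether the two vectors rise and fall together around their own averages. The sketch below shows this identity in plain Python; pearson_via_cosine is a hypothetical helper name.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def pearson_via_cosine(x, y):
    """Hypothetical helper: Pearson correlation computed as the cosine
    similarity of the mean-centered vectors."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    return cosine([v - mx for v in x], [v - my for v in y])


print(round(pearson_via_cosine([1, 2, 3], [3, 2, 1]), 6))  # -1.0
```

Unlike cosine, Pearson ignores a constant offset: [1, 2, 3] and [11, 12, 13] correlate perfectly even though their plain cosine similarity is below 1.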
4. Euclidean Distance
from scipy.spatial import distance
a = tuple(vector)
b = tuple(vector2)
dst = distance.euclidean(a, b)
print("Euclidean distance : {} ".format(dst))
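The same distance can be computed without SciPy using math.dist (Python 3.8+). For vectors normalized to unit length, Euclidean distance and cosine similarity carry the same information, since ||a - b||² = 2 - 2·cos(a, b). The sketch below checks that identity on two small made-up vectors; l2_normalize is a hypothetical helper name.

```python
import math


def l2_normalize(v):
    """Hypothetical helper: scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


a = l2_normalize([3.0, 4.0])  # [0.6, 0.8]
b = l2_normalize([4.0, 3.0])  # [0.8, 0.6]

dist = math.dist(a, b)                   # Euclidean distance (Python 3.8+)
cos = sum(x * y for x, y in zip(a, b))   # cosine similarity of unit vectors
print(round(dist ** 2, 6), round(2 - 2 * cos, 6))  # 0.08 0.08
```

So after normalizing, ranking images by smallest Euclidean distance gives the same order as ranking by largest cosine similarity.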
5. Average Similarity
similarity = (jaccard_similarity(vector, vector2) + cosineSim(vector, vector2) + pearson_def(vector, vector2)) / 3
similarity
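To turn pairwise scores into the image search the title describes, one option is to embed every image once and rank them by cosine similarity to a query. This sketch assumes the vectors are already computed (for example with TensorVector(...).process()); the helper most_similar and the file names in the gallery are hypothetical, and the toy 3-D vectors stand in for real embeddings.

```python
import heapq
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def most_similar(query, gallery, k=3):
    """Hypothetical helper: return the k (name, score) pairs from `gallery`
    (a dict of name -> vector) with the highest cosine similarity to `query`."""
    scored = ((name, cosine(query, vec)) for name, vec in gallery.items())
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# Toy vectors standing in for real image embeddings
gallery = {
    "cat_1.jpg": [0.9, 0.1, 0.0],
    "cat_2.jpg": [0.8, 0.2, 0.1],
    "car_1.jpg": [0.0, 0.1, 0.9],
}
for name, score in most_similar([1.0, 0.1, 0.0], gallery, k=2):
    print(name, round(score, 3))
```

heapq.nlargest avoids sorting the whole gallery when only the top k results are needed; for very large collections, a vectorized NumPy version or an approximate nearest-neighbor index would scale better.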