Friday, July 10, 2020

Computing Similarity on Images using Machine Learning and Cosine Similarity

Code

Using Google Pre Trained Machine Learning Model Mobile Net to find Similar Images and using Cosine Similarity Algorithms

About Myself

  • Hello! I’m Soumil Nitin Shah, a Software and Hardware Developer based in New York City. I have completed by Bachelor in Electronic Engineering and my Double master’s in Computer and Electrical Engineering. I Develop Python Based Cross Platform Desktop Application , Webpages , Software, REST API, Database and much more I have more than 2 Years of Experience in Python

  • Website : http://soumilshah.herokuapp.com/

  • Youtube :https://www.youtube.com/channel/UC_eOodxvwS_H7x2uLQa-svw

Currently i work as a Software Engineer at JobTarget

Step 1:

  • Define Imports
In [24]:
import tensorflow as tf
import tensorflow_hub as hub
import numpy as np
import os
import pandas as pd
import matplotlib.pyplot as plt 
import base64
from PIL import Image
import io
import math 
from math import sqrt


%matplotlib inline

global embed
embed = hub.KerasLayer(os.getcwd())
  • What i have did is i downloaded Google Pre trained Model and Extracted in current working cirectory when you unzip you should see three files Asset | Varibale and file ending with .pb extension
In [3]:
for x in os.listdir("."):
    print(x)
.ipynb_checkpoints
1000010653_3390.jpg
1000010653_3415.jpg
1000010653_3419.jpg
1000010653_3421.jpg
assets
Code.ipynb
imagenet_mobilenet_v2_140_224_feature_vector_4.tar
saved_model.pb
variables

Step 2:

Converting the Images to vectors i wrote a simple Helper class that takes a file name and outputs its corresponding Vectors

In [5]:
class TensorVector(object):

    def __init__(self, FileName=None):
        self.FileName = FileName

    def process(self):

        img = tf.io.read_file(self.FileName)
        img = tf.io.decode_jpeg(img, channels=3)
        img = tf.image.resize_with_pad(img, 224, 224)
        img = tf.image.convert_image_dtype(img,tf.float32)[tf.newaxis, ...]
        features = embed(img)
        feature_set = np.squeeze(features)
        return list(feature_set)

Step 2:

  • Whenever i work with Images i always convert Image into base64 as on web usually this is format we use
  • let me show how to convert Image into Base64
In [13]:
def convertBase64(FileName):
    """
    Return the Numpy array for a image 
    """
    with open(FileName, "rb") as f:
        data = f.read()
        
    res = base64.b64encode(data)
    
    base64data = res.decode("UTF-8")
    
    imgdata = base64.b64decode(base64data)
    
    image = Image.open(io.BytesIO(imgdata))
    
    return np.array(image)
In [15]:
plt.imshow(convertBase64("1000010653_3390.jpg"))
Out[15]:
<matplotlib.image.AxesImage at 0x1e015c5c550>
Converting this Image into vector using pre tarined model
In [17]:
helper = TensorVector("1000010653_3390.jpg")
vector = helper.process()
In [19]:
len(vector)
Out[19]:
1792
  • We just converted Image into Vector using pre trained Model Lets do iot for another image and see the similarity between two Images
In [20]:
plt.imshow(convertBase64("1000010653_3415.jpg"))
Out[20]:
<matplotlib.image.AxesImage at 0x1e01ba6fca0>
In [21]:
helper = TensorVector("1000010653_3415.jpg")
vector2 = helper.process()
In [22]:
len(vector2)
Out[22]:
1792
Apply Cosine Similarity
In [23]:
def cosineSim(a1,a2):
    sum = 0
    suma1 = 0
    sumb1 = 0
    for i,j in zip(a1, a2):
        suma1 += i * i
        sumb1 += j*j
        sum += i*j
    cosine_sim = sum / ((sqrt(suma1))*(sqrt(sumb1)))
    return cosine_sim
In [25]:
cosineSim(vector, vector2)
Out[25]:
0.6222431779870471

Both Images are 60 % Similar

  • Remember each image relates to another image if you want to see that relation we can plot all these images their 1-D Vector we need to apply PCA to reduce Dimesion and then we can plot this on matplotlib to see pattern how this images relates to each other and also we can take nearest neighbour for fast Search we can use KNN ML on ELK which is new feature on AWS

2 comments:

  1. I high appreciate this post. It’s hard to find the good from the bad sometimes, but I think you’ve nailed it! would you mind updating your blog with more information? machine learning interview questions

    ReplyDelete
  2. Hi I cannot download the files from hub. Could you help me with that?

    ReplyDelete

Develop Full Text Search (Semantics Search) with Postgres (PGVector) and Python Hands on Lab

final-notebook Develop Full Text Search (Semantics Search) with Postgres (PGVector)...