Sunday, January 23, 2022

Power of Semantics Search combined with Elastic Search | ML on ELK

master

Power of Semantics Search combined with Elastic Search | ML on ELK

Soumil Nitin Shah

Bachelor in Electronic Engineering | Masters in Electrical Engineering | Master in Computer Engineering |



Step 1: Define imports
In [4]:
try:
    import json
    import os
    import uuid

    import pandas as pd
    import numpy as np

    import elasticsearch
    from elasticsearch import Elasticsearch
    from elasticsearch import helpers
    from sentence_transformers import SentenceTransformer, util
    from tqdm import tqdm
    from dotenv import load_dotenv
    load_dotenv("secret.env")
except Exception as e:
    print("Some Modules are Missing :{}".format(e))

Step 2: Define helper classes

In [5]:
class Reader(object):

    def __init__(self, file_name):
        self.file_name = file_name

    def run(self):

        df = pd.read_csv(self.file_name, chunksize=3000)
        df = next(df)
        df = df.fillna("")
        return df
  • This class will Convert given text into Tokens
In [6]:
class Tokenizer(object):
    def __init__(self):
        self.model = SentenceTransformer('all-MiniLM-L6-v2')

    def get_token(self, documents):
        sentences  = [documents]
        sentence_embeddings = self.model.encode(sentences)
        _ = list(sentence_embeddings.flatten())
        encod_np_array = np.array(_)
        encod_list = encod_np_array.tolist()
        return encod_list
In [7]:
class ElasticSearchImports(object):
    def __init__(self, df, index_name='posting'):
        self.df = df
        self.index_name = index_name
        self.es = Elasticsearch(timeout=600,hosts=os.getenv("ENDPOINT"))

    def run(self):

        elk_data = self.df.to_dict("records")

        for job in elk_data:
            try:self.es.index(index=self.index_name,body=job)
            except Exception as e:pass

        return True

Step 3: Converting column to Vector Enbeddings

In [25]:
helper = Reader(file_name="data job posts.csv")
df = helper.run()
In [26]:
tqdm.pandas()
helper_token = Tokenizer()
df["vectors"] = df["jobpost"].progress_apply(helper_token.get_token)
100%|████████████████████████████████████████████████████████████████████████████| 3000/3000 [12:00<00:00,  4.17it/s]
In [70]:
helper_elk = ElasticSearchImports(df=df)
helper_elk.run()
c:\python38\lib\site-packages\cryptography\hazmat\backends\openssl\x509.py:15: CryptographyDeprecationWarning: This version of cryptography contains a temporary pyOpenSSL fallback path. Upgrade pyOpenSSL now.
  warnings.warn(
Out[70]:
True

Step 4: test

In [14]:
helper_token = Tokenizer()
INPUT = input("Enter the Input Query ")
token_vector = helper_token.get_token(INPUT)

query ={
  
   "size":50,
   "_source": "Title", 
   "query":{
      "bool":{
         "must":[
            {
               "knn":{
                  "vectors":{
                     "vector":token_vector,
                     "k":20
                  }
               }
            }
         ]
      }
   }
}
es = Elasticsearch(timeout=600, hosts=os.getenv("ENDPOINT"))
res = es.search(index='posting',
                size=50,
                body=query,
                request_timeout=55)

title = [x['_source']  for x in res['hits']['hits']]
title
Enter the Input Query I am looking for someone to make website and Webdeveloper
Out[14]:
[{'Title': 'Web Designer'},
 {'Title': 'Web Developer'},
 {'Title': 'Web Designer'},
 {'Title': 'Web Designer (Independent Contractor)'},
 {'Title': 'Web Designer'},
 {'Title': 'Web Developer/ Programmer'},
 {'Title': 'Web Designer'},
 {'Title': 'Photoshop Graphics Web Designer'},
 {'Title': 'Photoshop Graphics Web Designer'},
 {'Title': 'Web Programmer'},
 {'Title': 'Web designer'},
 {'Title': 'Web Designer'},
 {'Title': 'Web Developer'},
 {'Title': 'Linux Administrator/ Developer'},
 {'Title': 'Web Programmer & Designer'},
 {'Title': 'Web Systems Group Engineer'},
 {'Title': 'PHP Programmer'},
 {'Title': 'Web Developer'},
 {'Title': 'Web Designer'},
 {'Title': 'Web Programmer & Designer'},
 {'Title': 'Web Developer'},
 {'Title': 'Web Designer'},
 {'Title': 'Web Designer'},
 {'Title': 'Web Designer/ Developer'},
 {'Title': 'Web Designer'},
 {'Title': 'Web Designer/ Developer'},
 {'Title': 'Web Designer'},
 {'Title': 'Web Designer'},
 {'Title': 'ASP.NET (C#) Web Developer'},
 {'Title': 'Web-Designer'},
 {'Title': 'Web Developer'},
 {'Title': 'Web Designer'},
 {'Title': 'Web Programmer/ Coder'},
 {'Title': 'Web Developer'},
 {'Title': 'WEb Designer'},
 {'Title': 'Web Developer'},
 {'Title': 'Web Programmer'},
 {'Title': 'Designer'},
 {'Title': 'Web Developer'},
 {'Title': 'Web Designer'},
 {'Title': 'Web Developer'},
 {'Title': 'Graphic Designer (Web/Print)'},
 {'Title': 'Software Developer/ Programmer'},
 {'Title': 'Web Programmer'},
 {'Title': 'Techincal Writer'},
 {'Title': 'Search Engine Optimization Specialists'},
 {'Title': 'Web Developer'},
 {'Title': 'Junior Engineer, Web Systems Department'},
 {'Title': 'Programmer'},
 {'Title': 'Senior ASP/ASP.NET Developer'}]
In [ ]:
 

Learn How to Connect to the Glue Data Catalog using AWS Glue Iceberg REST endpoint

gluecat Learn How to Connect to the Glue Data Catalog using AWS Glue Iceberg REST e...