Soumil Nitin Shah¶
Bachelor's in Electronic Engineering | Master's in Electrical Engineering | Master's in Computer Engineering |
Website : https://soumilshah.herokuapp.com
- Github: https://github.com/soumilshah1995
- Linkedin: https://www.linkedin.com/in/shah-soumil/
- Blog: https://soumilshah1995.blogspot.com/
- Youtube : https://www.youtube.com/channel/UC_eOodxvwS_H7x2uLQa-svw?view_as=subscriber
- Facebook Page : https://www.facebook.com/soumilshah1995/
- Email : shahsoumil519@gmail.com
Method 1:¶
Step 1:¶
In [1]:
try:
    import os
    import sys
    import elasticsearch
    from elasticsearch import Elasticsearch
    import pandas as pd
    print("All Modules Loaded ! ")
except Exception as e:
    print("Some Modules are Missing {}".format(e))
Step 2:¶
In [2]:
def connect_elasticsearch():
    es = Elasticsearch([{'host': 'localhost', 'port': 9200}])
    if es.ping():
        print('Yupiee Connected ')
    else:
        print('Awww it could not connect!')
    return es

es = connect_elasticsearch()
Step 3: Define Query¶
In [3]:
myquey = {
    "_source": [],
    "size": 10,
    "query": {
        "bool": {
            "must": [],
            "filter": [
                {"exists": {"field": "director"}}
            ],
            "should": [
                {"match_phrase": {"director": "Richard "}}
            ],
            "must_not": []
        }
    }
}
Step 4:¶
Elastic Search¶
- index -> name of the index
- scroll -> how long the scroll context should stay alive, in this case 2m
- size -> how many records you need in each batch
- body -> the Elasticsearch query
In [4]:
res = es.search(
index = 'netflix',
scroll = '2m',
size = 10,
body = myquey)
In [5]:
counter = 0
sid = res["_scroll_id"]
scroll_size = res['hits']['total']['value']

# Start scrolling
while scroll_size > 0:
    # print("Scrolling...")
    page = es.scroll(scroll_id=sid, scroll='10m')
    # print("Hits : ", len(page["hits"]["hits"]))
    # Update the scroll ID
    sid = page['_scroll_id']
    # Get the number of results returned in the last scroll
    scroll_size = len(page['hits']['hits'])
    # print("Scroll Size {} ".format(scroll_size))
    # Do something with the obtained page
    counter = counter + 1

print("Total Pages : {}".format(counter))
Method 2:¶
- the idea goes like this: we query once with a large size, then divide the results into pages
- say the size was 500
- Page 1 -> records 0-10
- Page 2 -> records 10-20
- the size and the query stay the same the whole time; all that changes is the slice boundaries
- think of it this way: you query only once and build a hashmap whose keys are page numbers and whose values are the sliced records
In [6]:
res = es.search(
index = 'netflix',
size = 100,
body = myquey)
In [7]:
data = res["hits"]["hits"]
In [17]:
hashmap = {}
In [ ]:
step = 2
hashmap = {}
numPages = -(-len(data) // step)  # ceiling division
for i in range(numPages):
    startIndex = step * i
    endIndex = (i + 1) * step
    hashmap[i] = data[startIndex:endIndex]
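The slicing logic can be checked on standalone dummy records (a minimal sketch; the dummy list below stands in for the hits held in `data`, and `build_pages` is a hypothetical helper wrapping the same loop):

```python
def build_pages(records, step):
    """Map page number -> the slice of records for that page."""
    pages = {}
    num_pages = -(-len(records) // step)  # ceiling division
    for i in range(num_pages):
        pages[i] = records[i * step:(i + 1) * step]
    return pages

# Stand-in for res["hits"]["hits"]
dummy = list(range(10))
pages = build_pages(dummy, 2)
print(pages[0])  # [0, 1]
print(pages[4])  # [8, 9]
```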
Method 3:¶
- The approach we are taking here is basically:
- page 1 corresponds to from = 0
- page 2 corresponds to from = 10
- the idea is that for every page we keep incrementing the from variable
First Time Query Becomes¶
In [ ]:
myquey = {
    "_source": [],
    "size": 10,
    "from": 0,
    "query": {
        "bool": {
            "must": [],
            "filter": [
                {"exists": {"field": "director"}}
            ],
            "should": [
                {"match_phrase": {"director": "Richard "}}
            ],
            "must_not": []
        }
    }
}
Second Time Query Becomes¶
In [20]:
myquey = {
    "_source": [],
    "size": 10,
    "from": 10,
    "query": {
        "bool": {
            "must": [],
            "filter": [
                {"exists": {"field": "director"}}
            ],
            "should": [
                {"match_phrase": {"director": "Richard "}}
            ],
            "must_not": []
        }
    }
}
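The two queries differ only in the value of from, so the whole method can be sketched as one loop with an injectable search function (an assumption for illustration; with the real client, `search_fn` would call `es.search` with the query above, setting "from" and "size" in the body):

```python
def from_size_pages(search_fn, page_size):
    """Yield pages of hits, incrementing "from" until a page comes back short."""
    offset = 0
    while True:
        hits = search_fn(offset, page_size)
        if not hits:
            break
        yield hits
        if len(hits) < page_size:
            break
        offset += page_size

# Fake search function standing in for something like:
#   lambda f, s: es.search(index='netflix',
#                          body=dict(myquey, **{"from": f, "size": s}))["hits"]["hits"]
docs = list(range(25))
fake_search = lambda f, s: docs[f:f + s]
pages = list(from_size_pages(fake_search, 10))
print(len(pages))   # 3
print(pages[2])     # [20, 21, 22, 23, 24]
```

Note that plain from/size pagination gets more expensive the deeper you page, which is why the search_after approach in Method 4 exists.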
Method 4: Search After Query¶
In [19]:
myquey = {
    "_source": [],
    "size": 10,
    "query": {
        "bool": {
            "must": [],
            "filter": [
                {"exists": {"field": "director"}}
            ],
            "should": [
                {"match_phrase": {"director": "Richard "}}
            ],
            "must_not": []
        }
    }
}
In [21]:
res = es.search(
index = 'netflix',
size = 100,
body = myquey)
In [22]:
def create_scroll(res):
    """
    :param res: json
    :return: string
    """
    try:
        data = res.get("hits", None).get("hits", None)
        data = data[-1]
        score = data.get("_score", None)
        scroll_id_ = data.get("_id", None)
        unique_scroll_id = "{},{}".format(score, scroll_id_)
        return unique_scroll_id
    except Exception as e:
        return "Error,scroll error "
In [23]:
scroll = create_scroll(res)
This is our unique Scroll¶
In [24]:
scroll
Out[24]:
Next time we will pass this scroll¶
In [25]:
score, scroll_id = scroll.split(",")
In [26]:
myquey["search_after"] = [score, scroll_id]
myquey["sort"] = [{"_score": "desc", "_id": "desc"}]
New Query for next page becomes¶
In [ ]:
new_query = {
    "_source": [],
    "size": 10,
    "query": {
        "bool": {
            "must": [],
            "filter": [
                {"exists": {"field": "director"}}
            ],
            "should": [
                {"match_phrase": {"director": "Richard "}}
            ],
            "must_not": []
        }
    },
    "search_after": [
        "0.0",
        "8URc93IB135PBBnB55dH"
    ],
    "sort": [
        {"_score": "desc", "_id": "desc"}
    ]
}
Now perform the search on Elasticsearch and you will get the result¶
- make sure to create a scroll again from this page, then for the next page keep repeating the process
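That repeat step can be sketched the same way: pass the sort values of the last hit of each page into the next request. The fake search function below is an assumption standing in for `es.search` with "search_after" set in the body, as in `new_query` above:

```python
def search_after_pages(search_fn, page_size):
    """Yield pages; each next request starts after the last hit's sort values."""
    after = None
    while True:
        hits = search_fn(after, page_size)
        if not hits:
            break
        yield hits
        after = hits[-1]["sort"]

# Fake hits already ordered by (_score, _id); stands in for the real search.
docs = [{"_id": i, "sort": [0.0, i]} for i in range(7)]
def fake_search(after, size):
    start = 0 if after is None else after[1] + 1
    return docs[start:start + size]

for page in search_after_pages(fake_search, 3):
    print([h["_id"] for h in page])  # [0, 1, 2] then [3, 4, 5] then [6]
```

Unlike the scroll API, search_after keeps no server-side context alive, so it is the usual choice for deep pagination.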