Indexing ICD-11 Documents

Documents are indexed in two steps: initializing the index and populating it.

Import the required libraries

from elasticsearch import Elasticsearch
from elasticsearch_dsl import Search, Q

Initialization of Index

  • Prepare the index name, the number of shards, and the number of replicas
INDEX_NAME = "icd11"
NUMBER_SHARDS = 1
NUMBER_REPLICAS = 0
  • Prepare request body
request_body = {
    "settings": {
        "number_of_shards": NUMBER_SHARDS,
        "number_of_replicas": NUMBER_REPLICAS,
    },
    "mappings": {
        "properties": {
            "id": {"type": "keyword"},
            "tree": {"type": "text"},
            "name": {"type": "text"},
            "root": {"type": "text"},
            "degree": {"type": "integer"},
            "definition": {"type": "text"},
            "synonym": {"type": "text"},
        }
    },
}
  • Create the Elasticsearch client
es = Elasticsearch()
  • Delete the old index if it already exists
if es.indices.exists(index=INDEX_NAME):
    res = es.indices.delete(index=INDEX_NAME)
    print("Deleting index %s, Response: %s" % (INDEX_NAME, res))
  • Create the new index
res = es.indices.create(index=INDEX_NAME, body=request_body)
print("Creating index %s, Response: %s" % (INDEX_NAME, res))
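
The delete-then-create steps above can be wrapped in one helper, so that re-running the script always starts from an empty index. The following is a minimal sketch, not the project's own code: `recreate_index` is a hypothetical name, and it assumes a client object that exposes `indices.exists`, `indices.delete`, and `indices.create` with keyword arguments, as the `elasticsearch` Python client used above does.

```python
def recreate_index(es, index_name, request_body):
    """Drop index_name if it exists, then create it from request_body.

    `es` is assumed to be an elasticsearch.Elasticsearch client (or any
    object with the same `indices` methods). Returns the create response.
    """
    # Check for a previous index under the same name and remove it first.
    if es.indices.exists(index=index_name):
        es.indices.delete(index=index_name)
    # Create the index with the settings and mappings in request_body.
    return es.indices.create(index=index_name, body=request_body)
```

With the names defined earlier, a call would look like `res = recreate_index(es, INDEX_NAME, request_body)`.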

Populating the Index

  • For each item in the data list, create the data dictionary and op dictionary
# Create the data dictionary (`item` is the current element of the data list)
data_dict = {}
data_dict["id"] = item["id"]
data_dict["tree"] = item["tree"]
data_dict["root"] = item["root"]
data_dict["name"] = item["name"]
# parents, childs and sibls have no explicit mapping above,
# so Elasticsearch maps them dynamically
data_dict["parents"] = item["parents"]
data_dict["childs"] = item["childs"]
data_dict["sibls"] = item["sibls"]
data_dict["degree"] = item["degree"]
data_dict["synonym"] = item["synonym"]
data_dict["definition"] = item["definition"]

# Create the op dictionary
op_dict = {
    "index": {
        "_index": INDEX_NAME,
        "_id": data_dict["id"],
    }
}
  • Append the op dictionary and the data dictionary to the bulk data list
# Put the current item into the bulk list (bulk_data starts as an empty list)
bulk_data.append(op_dict)
bulk_data.append(data_dict)
  • Send the bulk data to the index
INDEX_NAME = "icd11"
bulk_size = 50  # defined here, but the single bulk call below does not use it
es = Elasticsearch()
es.bulk(index=INDEX_NAME, body=bulk_data, request_timeout=500)
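
The fragments above can be combined into one pass over the data. The sketch below is an illustration under stated assumptions rather than the original script: `build_bulk_data`, `chunk_bulk_data`, and `send_in_chunks` are hypothetical helper names, `items` stands in for the data list (assumed to be a list of dictionaries carrying the ten keys used above), and `es` is assumed to be an `elasticsearch.Elasticsearch` client. It also puts `bulk_size` to use by sending at most `bulk_size` documents per request instead of issuing one large call.

```python
# Keys copied from each item into the indexed document.
FIELDS = ["id", "tree", "root", "name", "parents", "childs",
          "sibls", "degree", "synonym", "definition"]


def build_bulk_data(items, index_name):
    """Return the flat [op, doc, op, doc, ...] list used by the bulk call."""
    bulk_data = []
    for item in items:
        data_dict = {field: item[field] for field in FIELDS}
        op_dict = {"index": {"_index": index_name, "_id": data_dict["id"]}}
        bulk_data.append(op_dict)
        bulk_data.append(data_dict)
    return bulk_data


def chunk_bulk_data(bulk_data, bulk_size):
    """Yield slices holding at most bulk_size documents each.

    Every document occupies two entries (op + data), so the step is
    2 * bulk_size; this keeps each op next to its document.
    """
    step = 2 * bulk_size
    for start in range(0, len(bulk_data), step):
        yield bulk_data[start:start + step]


def send_in_chunks(es, bulk_data, bulk_size=50, request_timeout=500):
    """Send bulk_data chunk by chunk; return the number of requests made."""
    requests_made = 0
    for chunk in chunk_bulk_data(bulk_data, bulk_size):
        # Each op already names its target index, so none is passed here.
        es.bulk(body=chunk, request_timeout=request_timeout)
        requests_made += 1
    return requests_made
```

A typical call would be `send_in_chunks(es, build_bulk_data(items, INDEX_NAME), bulk_size)`. The client library also ships `elasticsearch.helpers.bulk`, which does its own chunking through a `chunk_size` parameter but expects actions in a different shape than the op/document pairs built here.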