Elasticsearch Reindex All Indices and Check the Status

When you’re working with databases, you’ll inevitably need to make changes such as adding, removing, and modifying data.

Modifying an Elasticsearch index in place can lead to downtime while the change completes and the data is reindexed.

This tutorial shows a better way to update indices without downtime for the existing data source. Using the Elasticsearch Reindex API, we will copy data from a source index to a destination index.

Let us get started.

NOTE: Reindexing operations are resource-heavy, especially on large indices. To reduce the time required, disable replicas on the destination index by setting number_of_replicas to 0, then restore the original value once the process is complete.
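
The replica change described in the note can be sketched as a pair of index settings updates. The snippet below is a minimal illustration in Python that only builds the requests for the update index settings API (PUT /<index>/_settings) and does not send them. The helper name replica_settings_request, the index name destination_index, and the restored value of 1 are assumptions made for the example.

```python
import json


def replica_settings_request(index, replicas):
    """Return (method, path, body) for changing number_of_replicas on an index."""
    if replicas < 0:
        raise ValueError("replicas must be zero or greater")
    body = {"index": {"number_of_replicas": replicas}}
    return "PUT", f"/{index}/_settings", json.dumps(body)


# Before reindexing: drop replicas on the (hypothetical) destination index.
before = replica_settings_request("destination_index", 0)

# After reindexing: restore the usual value. 1 is an assumption here;
# use whatever your index normally runs with.
after = replica_settings_request("destination_index", 1)

print(before)
print(after)
```

Each tuple maps directly onto a console request or a cURL call with the same method, path, and JSON body.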

Enable the _source Field

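The reindex operation requires the _source field to be enabled for all documents in the source index. The _source field holds the original JSON document body. It is not indexed and cannot be searched, but it is stored so that it can be returned by requests such as get and search. The field is enabled by default.

To enable the _source field explicitly, add an entry to the index mappings when creating the index, as shown below: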

PUT index_1
{
  "mappings": {
    "_source": {
      "enabled": true
    }
  }
}
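
Before reindexing an existing index, it can help to confirm that _source has not been disabled. The sketch below assumes the response shape of GET /<index>/_mapping in recent Elasticsearch versions (index name at the top level, then a "mappings" object) and treats a missing "_source" entry as enabled, since that is the default. The function name source_enabled and the sample responses are illustrative, not taken from a real cluster.

```python
def source_enabled(mapping_response, index):
    """Return True unless the index mapping explicitly disables _source."""
    mappings = mapping_response[index]["mappings"]
    # A mapping with no "_source" entry uses the default, which is enabled.
    return mappings.get("_source", {}).get("enabled", True)


# Hand-written sample responses (assumed shape, not real cluster output).
explicit = {"index_1": {"mappings": {"_source": {"enabled": True}}}}
default = {"index_1": {"mappings": {"properties": {}}}}
disabled = {"index_1": {"mappings": {"_source": {"enabled": False}}}}

for name, response in (("explicit", explicit), ("default", default), ("disabled", disabled)):
    print(name, source_enabled(response, "index_1"))
```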

Reindex All Documents

To reindex documents, we need to specify the source and the destination. Each can be an existing index, an index alias, or a data stream. The source can be in the local cluster or in a remote cluster.

NOTE: For reindexing to succeed, the source and destination must be different. You must also configure the destination (mappings, shard count, replicas, and so on) before reindexing, because the operation does not copy settings from the source or from any associated template.

The general syntax for reindexing is:

POST /_reindex

Let us start by creating two indices. The first one will be the source, and the other one will be the destination.

PUT /source_index
{
  "settings": {"number_of_replicas": 0, "number_of_shards": 1},
  "mappings": {"_source": {"enabled": true}},
  "aliases": {
    "alias_1": {},
    "alias_2": {
      "filter": {"term": {"user.id": "kibana"}},
      "routing": "1"
    }
  }
}

The cURL command is:

curl -XPUT "http://localhost:9200/source_index" -H 'Content-Type: application/json' -d'{  "settings": {"number_of_replicas": 0, "number_of_shards": 1},  "mappings": {"_source": {"enabled": true}},"aliases": {    "alias_1": {},    "alias_2": {      "filter": {"term": {        "user.id": "kibana"      }},"routing": "1"    }  }}'

Next, create the destination index. You can adapt the command above or use the one given below:

PUT /destination_index
{
  "settings": {"number_of_replicas": 0, "number_of_shards": 1},
  "mappings": {"_source": {"enabled": true}},
  "aliases": {
    "alias_3": {},
    "alias_4": {
      "filter": {"term": {"user.id": "kibana"}},
      "routing": "1"
    }
  }
}

As always, cURL users can use the command:

curl -XPUT "http://localhost:9200/destination_index" -H 'Content-Type: application/json' -d'{  "settings": {"number_of_replicas": 0, "number_of_shards": 1},  "mappings": {"_source": {"enabled": true}},"aliases": {    "alias_3": {},    "alias_4": {      "filter": {"term": {        "user.id": "kibana"      }},"routing": "1"    }  }}'

Now that we have the indices we want to use, we can move on to reindexing the documents.

Consider the request below that copies the data from source_index to destination_index:

POST _reindex
{
  "source": {
    "index": "source_index"
  },
  "dest": {
    "index": "destination_index"
  }
}

The cURL command for this is:

curl -XPOST "http://localhost:9200/_reindex" -H 'Content-Type: application/json' -d'{  "source": {    "index": "source_index"  },  "dest": {    "index": "destination_index"  }}'
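
For readers who prefer to script the request, the same call can be sketched in Python using only the standard library. The helper names build_reindex_request and run_reindex are made up for this example, and the base URL assumes an unsecured local cluster on port 9200, as in the cURL command. The builder also rejects identical source and destination names, in line with the earlier note.

```python
import json
import urllib.request


def build_reindex_request(source, dest, base_url="http://localhost:9200"):
    """Build (but do not send) a POST /_reindex request."""
    if source == dest:
        raise ValueError("source and destination must be different")
    body = {"source": {"index": source}, "dest": {"index": dest}}
    return urllib.request.Request(
        url=f"{base_url}/_reindex",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_reindex(source, dest, base_url="http://localhost:9200"):
    """Send the request and return the parsed JSON reply (needs a live cluster)."""
    request = build_reindex_request(source, dest, base_url)
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Building the request does not contact the cluster.
request = build_reindex_request("source_index", "destination_index")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Calling run_reindex("source_index", "destination_index") against a running cluster should return the same kind of JSON summary that the console request produces.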

NOTE: The source_index must contain documents for the operation to copy anything.

Executing this command should return detailed information about the operation, similar to the output below:

{
  "took" : 2836,
  "timed_out" : false,
  "total" : 13059,
  "updated" : 0,
  "created" : 13059,
  "deleted" : 0,
  "batches" : 14,
  "version_conflicts" : 0,
  "noops" : 0,
  "retries" : {
    "bulk" : 0,
    "search" : 0
  },
  "throttled_millis" : 0,
  "requests_per_second" : -1.0,
  "throttled_until_millis" : 0,
  "failures" : [ ]
}
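
A response like the one above can be checked programmatically. The following sketch is one possible sanity check, not an official validation routine: it flags a timeout, any entries in "failures", any version conflicts, and a mismatch between "total" and the sum of the per-outcome counters. Treating that sum as equal to "total" for a clean run is an assumption of this example, and the function name check_reindex_response is hypothetical.

```python
def check_reindex_response(response):
    """Return a list of problems found in a _reindex response (empty if none)."""
    problems = []
    if response.get("timed_out"):
        problems.append("request timed out")
    if response.get("failures"):
        problems.append(f"{len(response['failures'])} failure(s) reported")
    conflicts = response.get("version_conflicts", 0)
    if conflicts > 0:
        problems.append(f"{conflicts} version conflict(s)")
    # Assumption: every document ends up in exactly one of these counters.
    counters = ("created", "updated", "deleted", "noops", "version_conflicts")
    handled = sum(response.get(key, 0) for key in counters)
    total = response.get("total", 0)
    if handled != total:
        problems.append(f"handled {handled} of {total} documents")
    return problems


# The fields below are copied from the sample response in this tutorial.
sample = {
    "took": 2836,
    "timed_out": False,
    "total": 13059,
    "updated": 0,
    "created": 13059,
    "deleted": 0,
    "version_conflicts": 0,
    "noops": 0,
    "failures": [],
}
print(check_reindex_response(sample))
```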

Checking Reindexing Status

You can view the status of running reindex operations using the _tasks API. For example, consider the request below:

GET /_tasks?detailed=true&actions=*reindex&group_by=parents

The cURL command is:

curl -XGET "http://localhost:9200/_tasks?detailed=true&actions=*reindex&group_by=parents"

That should give you detailed information about the Reindexing process as shown below:

{
  "tasks" : {
    "FTd_2iXjSXudN_Ua4tZhHg:51847" : {
      "node" : "FTd_2iXjSXudN_Ua4tZhHg",
      "id" : 51847,
      "type" : "transport",
      "action" : "indices:data/write/reindex",
      "status" : {
        "total" : 13059,
        "updated" : 9000,
        "created" : 0,
        "deleted" : 0,
        "batches" : 10,
        "version_conflicts" : 0,
        "noops" : 0,
        "retries" : {
          "bulk" : 0,
          "search" : 0
        },
        "throttled_millis" : 0,
        "requests_per_second" : -1.0,
        "throttled_until_millis" : 0
      },
      "description" : "reindex from [source_index] to [destination_index][_doc]",
      "start_time_in_millis" : 1611247308063,
      "running_time_in_nanos" : 2094157836,
      "cancellable" : true,
      "headers" : { }
    }
  }
}
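
The counters in the task status can be turned into a rough progress figure. The sketch below assumes the response shape shown above (a top-level "tasks" object keyed by task ID, as returned with group_by=parents) and estimates progress as the share of "total" already accounted for by the outcome counters. The function name reindex_progress is made up for this example, and the figure is only an approximation of how far the task has advanced.

```python
def reindex_progress(tasks_response):
    """Map each task ID to the percentage of documents handled so far."""
    counters = ("created", "updated", "deleted", "noops", "version_conflicts")
    progress = {}
    for task_id, task in tasks_response.get("tasks", {}).items():
        status = task.get("status", {})
        total = status.get("total", 0)
        handled = sum(status.get(key, 0) for key in counters)
        # A total of 0 means there is nothing to measure yet.
        progress[task_id] = 100.0 * handled / total if total else 0.0
    return progress


# Trimmed copy of the sample _tasks output from this tutorial.
sample_tasks = {
    "tasks": {
        "FTd_2iXjSXudN_Ua4tZhHg:51847": {
            "action": "indices:data/write/reindex",
            "status": {
                "total": 13059,
                "updated": 9000,
                "created": 0,
                "deleted": 0,
                "version_conflicts": 0,
                "noops": 0,
            },
        }
    }
}

for task_id, percent in reindex_progress(sample_tasks).items():
    print(f"{task_id}: {percent:.1f}% complete")
```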

Conclusion

This guide covered the basics of using the Elasticsearch Reindex API to copy documents from one index (the source) to another (the destination), and how to check the status of a running reindex task. There is more to the Reindex API, but this should be enough to get you started.

About the author

John Otieno

My name is John, and I am a fellow geek like you. I am passionate about all things computers, from hardware and operating systems to programming. My dream is to share my knowledge with the world and help out fellow geeks. Follow my content by subscribing to the LinuxHint mailing list.