Index Modules

Features

1. Overview

2. Static Index Settings

3. Dynamic Index Settings

4. Index Shard Allocation

4.1. Shard allocation filtering

4.2. Index allocation filter settings

4.3. Allocation delay on a node leave

4.4. Index recovery order

4.5. Shard allocation

5. Logging Slow Searches

6. Slow Log Index

7. Store Module

1. Overview

Index modules are created per index and control every aspect of how that index behaves.

Through them you configure, for example, the number of primary and replica shards of an index, the compression codec used to store its data, and so on.

There are two types of index settings: Static and Dynamic.

2. Static Index Settings

Static index settings can only be set when an index is created or while the index is closed.

Some basic static index settings:

  • The number of primary shards that the index has (index.number_of_shards, default value 1). This setting can only be set when the index is created; it cannot be changed afterwards, not even on a closed index
  • The compression type (index.codec). The default codec uses LZ4 compression. Setting it to best_compression gives a higher compression ratio at the cost of slower stored-field reads
  • The routing partition size (index.routing_partition_size) is the number of shards a custom routing value can be routed to, instead of a single shard. Defaults to 1
    • If the index has a single shard, the value can only be 1. Otherwise, it must be less than the number of primary shards of the index
    • This value has to be decided at index creation. Increasing it means more shards have to be searched per routing value, but the data is distributed more evenly
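
As a rough sketch of how these static settings might be combined at creation time, the request below creates a hypothetical index named used_cars_v2. The index name and all values are assumptions chosen for illustration, not recommendations. Note that a routing partition size greater than 1 requires routing to be mandatory in the mappings:

```console
# hypothetical index name and illustrative values
PUT used_cars_v2
{
  "settings": {
    "index.number_of_shards": 4,
    "index.codec": "best_compression",
    "index.routing_partition_size": 2
  },
  "mappings": {
    "_routing": {
      "required": true
    }
  }
}
```

With these values, documents that share a routing value are spread over 2 of the 4 shards rather than landing on a single one.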

3. Dynamic Index Settings

Dynamic index settings can be changed at any time at runtime on a live index.

  • The number of replicas each primary shard has (index.number_of_replicas), the default value is 1. On a single-node cluster it should be set to 0, because a replica is never allocated on the same node as its primary and would stay unassigned
  • The number of replicas can be expanded automatically based on the number of data nodes in the cluster (index.auto_expand_replicas). By default, this setting is false. To enable it, define a dash-delimited lower and upper bound, for example 2-6. To use every available data node as the upper bound, set it to 0-all
  • How long a shard can go without receiving a search or get request before it is considered search idle (index.search.idle.after), the default value is 30s
  • The refresh interval (index.refresh_interval) decides how often recent changes to the index are made visible to search. Defaults to 1s. Setting it to -1 disables refresh. If the setting is not set explicitly, search-idle shards are not refreshed in the background until they receive a search request
  • Routing allocation setting (index.routing.allocation.enable) enables shard allocation by various options:
    • all, this is the default value and allows all shards to be allocated
    • primaries, only primary shards can be allocated
    • new_primaries, only newly created primary shards can be allocated
    • none, none of the shards can be allocated
  • Rebalancing the shards (index.routing.rebalance.enable) lets us decide which shards will be rebalanced
    • all, the default setting and all shards can be rebalanced
    • primaries, only primary shards can be rebalanced
    • replicas, only replica shards can be rebalanced
    • none, none of the shards can be rebalanced
  • Delayed allocation timeout (index.unassigned.node_left.delayed_timeout), the default value is 1 minute. It describes how long the master node waits before reallocating the replica shards that became unassigned because a node left the cluster
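
Because these settings are dynamic, they can be changed through the update index settings API without closing the index. The following is a minimal sketch against an index named used_cars (the example index used later in this article); the values are assumptions for illustration only:

```console
# illustrative values on a live index
PUT used_cars/_settings
{
  "index.number_of_replicas": 2,
  "index.refresh_interval": "30s"
}

# alternatively, let the replica count follow the number of data nodes
PUT used_cars/_settings
{
  "index.auto_expand_replicas": "0-all"
}
```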

4. Index Shard Allocation

This module controls where the shards of an index are allocated. Its settings are configured per index.

4.1. Shard allocation filtering

The shards are going to be allocated somewhere, but where? Shard allocation filters can be used to decide the location of the shards of an index.

Alongside the built-in attributes, custom node attributes can also be used in a shard allocation filter. The built-in attributes are _name, _host_ip, _publish_ip, _ip, _host, _id, _tier, and _tier_preference.

A custom attribute is added to a node in the node’s elasticsearch.yml file. For example, if we have 3 nodes and want to filter on their order, the first node can be tagged with node.attr.order: first.
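
To check which attributes each node actually exposes, the cat nodeattrs API can be queried. This is a small sketch that assumes the custom order attribute from the example above has been configured and the nodes have been restarted:

```console
# list node attributes; "order" is the custom attribute assumed above
GET _cat/nodeattrs?v&h=node,attr,value
```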

4.2. Index allocation filter settings

  • index.routing.allocation.include.{attribute}: Assigns the index to a node whose {attribute} has at least one of the comma-separated values
  • index.routing.allocation.require.{attribute}: Assigns the index to a node whose {attribute} has all of the comma-separated values
  • index.routing.allocation.exclude.{attribute}: Assigns the index to a node whose {attribute} has none of the comma-separated values

For example, to allocate the shards of the used_cars index to the first or second node:

PUT used_cars/_settings
{
  "index.routing.allocation.include.order": "first,second"
}
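
The other filter types work the same way, and a filter can be removed again by setting it to null. A short sketch, assuming a third node tagged with node.attr.order: third (that node is a hypothetical addition to the example):

```console
# keep used_cars off the node tagged order: third (hypothetical node)
PUT used_cars/_settings
{
  "index.routing.allocation.exclude.order": "third"
}

# setting a filter to null removes it again
PUT used_cars/_settings
{
  "index.routing.allocation.include.order": null
}
```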

4.3. Allocation delay on a node leave

When a node leaves the cluster, the master node takes action to prevent data loss. It does the following:

  1. If the node held primary shards, the master node promotes a replica of each of them to primary
  2. The master node allocates new replica shards to replace the copies that were lost (assuming there are enough nodes)
  3. The master node rebalances the shards evenly across the remaining nodes

These steps should not all be taken immediately whenever a node goes missing, because the node may only have dropped out briefly, for example due to a communication problem. If it comes back while the master node is already rebuilding the missing replicas, that work wastes resources for nothing. For this reason the allocation of the unassigned replicas is delayed by a timeout, which defaults to 1 minute.

With the following setting, the master node waits 10 minutes before allocating the unassigned shards that the departed node left behind:

PUT _all/_settings
{
  "settings": {
    "index.unassigned.node_left.delayed_timeout": "10m"
  }
}

If the timeout is set to 0, the delay is disabled and the missing shards are allocated as soon as possible.
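
To see how many shards are currently waiting on this timeout, the cluster health API reports a delayed_unassigned_shards count. A minimal sketch, where the filter_path parameter only trims the response to the fields of interest:

```console
# delayed_unassigned_shards counts shards whose allocation is being delayed
GET _cluster/health?filter_path=status,delayed_unassigned_shards
```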

4.4. Index recovery order

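Unallocated shards are recovered in order of priority: first by index.priority (higher before lower), then by index creation date (newer before older), then by index name (higher before lower). By default, this means newer indices are recovered before older ones, but setting the priority of an index changes the order. In the following example, cars-000002 is recovered first because it has the highest priority, then cars-000004, then cars-000003 because it was created more recently than cars-000001, and finally cars-000001.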

PUT cars-000001

PUT cars-000002
{
  "settings": {
    "index.priority": 100
  }
}

PUT cars-000003

PUT cars-000004
{
  "settings": {
    "index.priority": 10
  }
}
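
index.priority is a dynamic setting, so it can also be changed on an existing index. As a sketch, the following would move cars-000003 ahead of cars-000004 in the recovery order (the value 50 is an arbitrary choice for illustration):

```console
# raise the priority of an existing index; 50 is an illustrative value
PUT cars-000003/_settings
{
  "index.priority": 50
}
```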

4.5. Shard allocation

From the perspective of the cluster, the allocator tries to spread the shards of a single index across as many nodes as possible. By default there is no limit on how many shards a node may hold, but a limit can be set.

index.routing.allocation.total_shards_per_node defines the maximum number of shards (primaries and replicas) of a single index that can be allocated to one node. By default it is unbounded.

cluster.routing.allocation.total_shards_per_node defines the maximum number of shards that can be allocated to each node, regardless of which index they belong to. By default it is unbounded.

These limits must be set carefully, because they are hard limits, and if the limit is too low, some shards will be left unallocated.
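
The two limits are set through different APIs, since one is an index setting and the other a cluster setting. A sketch with illustrative values (the numbers are assumptions, not recommendations):

```console
# at most 2 shards of used_cars per node (illustrative value)
PUT used_cars/_settings
{
  "index.routing.allocation.total_shards_per_node": 2
}

# at most 100 shards of any index per node (illustrative value)
PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.total_shards_per_node": 100
  }
}
```

For instance, an index with 3 primaries and 1 replica of each has 6 shards in total, so a per-node limit of 2 needs at least 3 data nodes. With only 2 data nodes, 2 of the shards would stay unassigned.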

5. Logging Slow Searches

The dynamic index settings include thresholds for logging slow searches. Logging happens at the shard level, so each entry reflects the execution of a search on one shard. Slow logs help us understand why searches are slow.

To enable them, set the thresholds on the index:

PUT *cars/_settings
{
  "index.search.slowlog.threshold.query.warn": "15s",
  "index.search.slowlog.threshold.query.info": "8s",
  "index.search.slowlog.threshold.query.debug": "4s",
  "index.search.slowlog.threshold.query.trace": "1s",
  "index.search.slowlog.threshold.fetch.warn": "1500ms",
  "index.search.slowlog.threshold.fetch.info": "800ms",
  "index.search.slowlog.threshold.fetch.debug": "650ms",
  "index.search.slowlog.threshold.fetch.trace": "500ms"
}

Setting a threshold to -1 turns it off, and setting it to 0ms logs every search.

A search runs in two phases: query and fetch.

The time taken in the query phase is the time each searched shard spends running the query and returning the list of matching documents.

The time taken in the fetch phase is the time spent retrieving the actual content of the selected documents from the shards that hold them.

Queries that occasionally take longer are usually not a problem, but queries that are consistently slow should be investigated.
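
When investigating a specific problem, it can help to log every query for a short while and then switch the threshold off again. A sketch of that workflow, using the same *cars index pattern as above (logging everything is noisy, so this is meant for brief debugging only):

```console
# temporarily log every query phase at trace level
PUT *cars/_settings
{
  "index.search.slowlog.threshold.query.trace": "0ms"
}

# switch that threshold off again
PUT *cars/_settings
{
  "index.search.slowlog.threshold.query.trace": "-1"
}
```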

6. Slow Log Index

The indexing slow log follows the same approach as the search slow log, but measures indexing operations instead of searches.

PUT *cars/_settings
{
  "index.indexing.slowlog.threshold.index.warn": "15s",
  "index.indexing.slowlog.threshold.index.info": "8s",
  "index.indexing.slowlog.threshold.index.debug": "4s",
  "index.indexing.slowlog.threshold.index.trace": "1s",
  "index.indexing.slowlog.source": "2000"
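}

index.indexing.slowlog.source sets how many characters of the _source field are logged, 1000 by default. Setting it to false or 0 skips logging the source entirely, and setting it to true logs the whole source regardless of size.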
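
If the documents contain sensitive or very large fields, the source can be left out of the log altogether. A minimal sketch on the same index pattern; the index.indexing.slowlog.reformat line is an optional extra that keeps the original document layout instead of squeezing it onto one log line:

```console
# do not log any part of _source for slow indexing operations
PUT *cars/_settings
{
  "index.indexing.slowlog.source": false
}

# or keep the source but preserve its original formatting
PUT *cars/_settings
{
  "index.indexing.slowlog.reformat": false
}
```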

7. Store Module

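index.store.type is a low-level setting, and changing the storage type is not recommended: some store implementations have poor concurrency or disable optimizations for heap memory usage. If needed, it can be set for all indices in elasticsearch.yml, or for an individual index at index creation time with the index.store.type parameter. It is a static setting.

Some of the storage types:

  • fs, the default, which picks the best implementation for the operating system
  • niofs, which stores the shard index on the file system using NIO and allows multiple threads to read the same file concurrently
  • mmapfs, which stores the shard index on the file system by mapping files into memory
  • hybridfs, a hybrid of niofs and mmapfs that chooses the best access method for each type of file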
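
For completeness, this is how a store type could be set for a single index at creation time. The index name used_cars_archive is hypothetical, and in most cases the default should be left alone:

```console
# hypothetical index name; index.store.type is static, so it is set at creation
PUT used_cars_archive
{
  "settings": {
    "index.store.type": "hybridfs"
  }
}
```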
