{"id":390,"date":"2022-08-05T10:11:45","date_gmt":"2022-08-05T10:11:45","guid":{"rendered":"http:\/\/192.168.1.102\/?p=390"},"modified":"2022-10-19T18:27:42","modified_gmt":"2022-10-19T18:27:42","slug":"index-lifecycle-management","status":"publish","type":"post","link":"https:\/\/192.168.12.139\/blog\/index-lifecycle-management","title":{"rendered":"Index Lifecycle Management"},"content":{"rendered":"
1. Overview<\/a><\/p>\n 2. Basics<\/a><\/p>\n 2.1 Phases<\/a><\/p>\n 2.2 Segment<\/a><\/p>\n 2.3 Rollover<\/a><\/p>\n 2.4 Merge<\/a><\/p>\n 2.5 Shrink<\/a><\/p>\n 2.6 Searchable Snapshots<\/a><\/p>\n 2.7 Fully Mounted Index<\/a><\/p>\n 2.8 Partially Mounted Index<\/a><\/p>\n 2.9 Index Priority<\/a><\/p>\n 2.10 Wait for Snapshot<\/a><\/p>\n 2.11 Some Design Considerations<\/a><\/p>\n <\/a><\/p>\n Like us humans, indices don’t have an infinite lifetime, and they shouldn’t. They are created, live, and die once they have served their purpose.<\/p>\n Indices need to be managed for the sake of performance and stability. An oversized<\/em> index<\/em> causes problems further down the road. Indices’ lifecycles are managed by Index Lifecycle Management (ILM) policies<\/strong>. <\/span><\/p>\n Policies<\/em> decide when to create a new index, when to roll over<\/em> an index depending on the configuration, when to transition<\/em> indices from one phase<\/em> to another, such as hot<\/em>, warm<\/em>, and cold<\/em>, and how long the retention period<\/em> is. Once the retention period has elapsed, the index is deleted.<\/span><\/p>\n <\/a><\/p>\n Let’s start by clarifying some terms and actions.<\/span><\/p>\n <\/a><\/p>\n Hot:<\/span><\/strong><\/span> Read and write operations run in this phase. Documents are updated, deleted, created, or queried.<\/span><\/p>\n Warm:<\/span><\/strong><\/span> The index can still be updated, though this is less likely, and it can still be queried.<\/span><\/p>\n Cold:<\/span><\/strong><\/span> Queries are relatively slower than in the hot and warm tiers.<\/span><\/p>\n Frozen:<\/span><\/strong><\/span> Holds indices that need to be searched less frequently than those in the cold tier. Searching this tier takes longer than searching the cold tier. Used for old indices.<\/span><\/p>\n Delete:<\/span><\/strong><\/span> Indices that have reached the end of the line spend their final days in this phase. 
After their defined retention age, they are gone for good.<\/span><\/p>\n Data stream indices automatically use the hot tier<\/span><\/p>\n<\/blockquote>\n <\/a><\/p>\n <\/a><\/p>\n Inverted indices in shards<\/span><\/p>\n<\/li>\n When a query runs against a shard, its segments are searched in turn and the results are then combined<\/span><\/p>\n<\/li>\n Immutable<\/span><\/p>\n<\/li>\n Elasticsearch flushes periodically; in other words, it fsyncs the segments so that new data is written to the disk<\/span><\/p>\n<\/li>\n When documents are indexed, Elasticsearch collects them in RAM<\/strong> and writes them into a new small segment; at every refresh (once a second by default) it makes this data searchable, but this doesn’t mean that the data has been written to the disk<\/strong><\/span><\/p>\n<\/li>\n<\/ul>\n <\/a><\/p>\n When an index reaches a certain maximum index size<\/strong> or maximum shard size<\/strong>, holds a certain maximum number of documents<\/strong>, or reaches a maximum age<\/strong> (e.g., 30 days), Elasticsearch runs a rollover operation according to the index’s ILM policy and transitions the index to the next stop in the hot-warm-cold<\/em> phase architecture. <\/span><\/p>\n Further transitions between the warm,<\/em> cold<\/em>, and delete<\/em> phases happen at time intervals of the user’s choice.<\/span><\/p>\n The following example has a multi-conditional rollover. The index will be rolled over when it is 20 days old, when a primary shard reaches 25 GB, or when the number of documents in the index reaches a million. 
Whichever condition is met first triggers the rollover.<\/p>\n There are many more options under the “Advanced settings” tab.<\/span><\/p>\n <\/a><\/p>\n Actions can be configured in the Kibana UI under the Stack Management > Index Lifecycle Policies tab for a selected policy.\u00a0<\/span><\/p>\n If the searchable snapshot action is used in the hot phase, the shrink and force merge actions are not available in later phases, because\u00a0<\/span>the force merge must be done in the same phase as the searchable snapshot action or in an earlier one.<\/span><\/strong><\/p><\/blockquote>\n <\/p>\n When a document is updated, the old one is marked as deleted, and a new document is indexed. During the\u00a0<\/span>merge<\/span><\/em>\u00a0operation, documents that are marked as deleted are physically removed.<\/span> Indices that are no longer going to be written to can be shrunk to reduce their\u00a0<\/span>shard count<\/span><\/em>.\u00a0<\/span>The shrink<\/span><\/em>\u00a0action can be performed with the shrink index API or by an ILM action in the warm phase.<\/span><\/p>\n In the hot phase, the shrink action is not taken into account unless rollover is also configured.<\/span><\/p>\n <\/a><\/p>\n Searchable snapshots enable cost savings by freeing us from the need for replica shards. So how do they achieve this?<\/p>\n As mentioned above,\u00a0<\/span>cold<\/span><\/em>\u00a0and\u00a0<\/span>frozen<\/span><\/em>\u00a0tiers contain infrequently searched data. In the cold or frozen tier, an index can be turned into a searchable snapshot index by an ILM policy, and by default, it has no replicas.<\/span><\/p>\n Indices in snapshots that have already been taken can be made searchable by mounting them. If a snapshot is cloned and then mounted, the backup snapshot and the searchable snapshot are decoupled, so they can be managed by\u00a0<\/span>ILM policies<\/span><\/em>\u00a0individually.<\/span><\/p>\n Searchable snapshots are searched the same way indices are searched. 
<\/span><\/p>\n When it comes to dealing with old data, searchable snapshots are worth bearing in mind, since old data doesn’t require a fast response most of the time.<\/p>\n If an index is going to be used as a searchable snapshot, it is much better to have a single segment per shard. Reads are done segment by segment within a shard, so fewer segments mean less search time and less snapshot restore time.<\/span><\/p>\n When a search is run on a searchable snapshot and the relevant data cannot be found locally, it is downloaded from the snapshot repository.<\/span><\/p>\n To run a query on a searchable snapshot, the snapshot first needs to be mounted as an index. This can be done by the user or by ILM. <\/span><\/p>\n Searchable snapshot indices contained in a snapshot are restored as they are. There is one condition: if the original index snapshot has been deleted, the searchable snapshots cannot be restored.<\/span><\/p>\n <\/a><\/p>\n The first option is to mount an index fully, which is available in the hot and cold tiers. This operation produces locally stored, exact copies of the shards of the snapshotted index.<\/span><\/p>\n Because all of the data residing in the index is copied locally, search performance on this type of mounted index is reasonably good once the copy is complete.<\/p>\n If a search query is run while the data is still being copied locally, the query does not wait until the copy operation is done. It runs alongside the copy operation, but it will understandably take longer.<\/p>\n These indices survive restarts after being fully copied to the local storage.<\/span><\/p>\n1. Overview<\/h2>\n
2. Basics<\/span><\/h2>\n
2.1. Phases<\/span><\/h3>\n
\n
2.2. Segment<\/h3>\n
\n
2.3. Rollover<\/h3>\n
PUT _ilm\/policy\/test-pol\r\n{\r\n \u00a0\"policy\": {\r\n \u00a0 \u00a0\"phases\": {\r\n \u00a0 \u00a0 \u00a0\"hot\": {\r\n \u00a0 \u00a0 \u00a0 \u00a0\"actions\": {\r\n \u00a0 \u00a0 \u00a0 \u00a0 \u00a0\"rollover\": {\r\n \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0\"max_age\": \"20d\",\r\n \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0\"max_primary_shard_size\": \"25gb\",\r\n \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0\"max_docs\": 1000000\r\n \u00a0 \u00a0 \u00a0 \u00a0 \u00a0},\r\n \u00a0 \u00a0 \u00a0 \u00a0 \u00a0\"set_priority\": {\r\n \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0\"priority\": 100\r\n \u00a0 \u00a0 \u00a0 \u00a0 \u00a0}\r\n \u00a0 \u00a0 \u00a0 \u00a0},\r\n \u00a0 \u00a0 \u00a0 \u00a0\"min_age\": \"0ms\"\r\n \u00a0 \u00a0 \u00a0},\r\n \u00a0 \u00a0 \u00a0\"delete\": {\r\n \u00a0 \u00a0 \u00a0 \u00a0\"min_age\": \"90d\",\r\n \u00a0 \u00a0 \u00a0 \u00a0\"actions\": {\r\n \u00a0 \u00a0 \u00a0 \u00a0 \u00a0\"delete\": {}\r\n \u00a0 \u00a0 \u00a0 \u00a0}\r\n \u00a0 \u00a0 \u00a0}\r\n \u00a0 \u00a0}\r\n \u00a0}\r\n}<\/pre>\n
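A policy only takes effect once it is attached to indices. As a minimal sketch, assuming a data stream is used, the `test-pol` policy above could be attached through an index template. The template name `test-template` and the pattern `test-logs*` are placeholders invented for this illustration and do not come from the original setup.

```console
# Hypothetical names: test-template, test-logs*
PUT _index_template/test-template
{
  "index_patterns": ["test-logs*"],
  "data_stream": {},
  "template": {
    "settings": {
      "index.lifecycle.name": "test-pol"
    }
  }
}
```

With a data stream, no rollover alias has to be configured; the backing indices that match the pattern pick up the policy from the template.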
2.4. Merge<\/h3>\n
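Outside of ILM, a merge can also be triggered by hand with the force merge API. The request below is only a sketch: `my-index` is a placeholder index name, and merging down to one segment is an assumption that suits an index that is no longer written to.

```console
# my-index is a hypothetical index name
POST my-index/_forcemerge?max_num_segments=1
```

Force merging an index that still receives writes is generally discouraged, so this kind of call fits best after the index has been rolled over.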
\n<\/a><\/p>\n2.5. Shrink<\/h3>\n
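As an illustration of the shrink action as an ILM action in the warm phase, the following sketch shrinks an index to a single primary shard and force merges it seven days after rollover. The policy name `test-pol-warm`, the `7d` age, and the shard and segment counts are assumptions chosen for the example.

```console
# Hypothetical policy name and thresholds
PUT _ilm/policy/test-pol-warm
{
  "policy": {
    "phases": {
      "warm": {
        "min_age": "7d",
        "actions": {
          "shrink": {
            "number_of_shards": 1
          },
          "forcemerge": {
            "max_num_segments": 1
          }
        }
      }
    }
  }
}
```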
2.6. Searchable Snapshots<\/span><\/h3>\n
2.7. Fully Mounted Index<\/h3>\n
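Mounting by hand goes through the mount snapshot API. The request below is a sketch of a full mount, assuming a repository `my-repository`, a snapshot `my-snapshot`, and an index `my-index` inside it; all three names, as well as the `renamed_index` value, are placeholders.

```console
# Hypothetical repository, snapshot, and index names
POST _snapshot/my-repository/my-snapshot/_mount?wait_for_completion=true&storage=full_copy
{
  "index": "my-index",
  "renamed_index": "my-index-mounted",
  "index_settings": {
    "index.number_of_replicas": 0
  }
}
```

`storage=full_copy` asks for a fully mounted index, so every shard is copied to local storage. Leaving out `wait_for_completion` lets the request return before the copy has finished.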