# Elasticsearch Snapshot Lifecycle Management

## Preparation

  1. Install Elasticsearch and Kibana on EC2

  2. Generate the sample data
    pip install requests pytz elasticsearch --user
    python es_createIndex.py
    
  3. Check that the index contains 1000 documents
```bash
curl -XGET 'localhost:9200/_cat/indices?v&pretty'

health status index                       uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   testes-lifecycle-2020-11-25 7fluXBYtRRqz_Ai-J5BqVQ   1   1       1000            0    109.6kb        109.6kb
```


## Install S3 repository plugin

The S3 repository plugin adds support for using AWS S3 as a repository for Snapshot/Restore.

1. Install 

```bash
sudo /usr/share/elasticsearch/bin/elasticsearch-plugin install repository-s3
sudo systemctl daemon-reload
sudo systemctl restart elasticsearch.service
```


2. Configure
- Create the S3 bucket `ray-elasticsearch` with server-side encryption using a CMK, and disable versioning

- Enable the S3 lifecycle rule `es-on-ec2-snapshot-lifecycle`. For testing, I set it to transition objects to Glacier after 1 day and permanently delete them after 4 days

![lifecycle-rule](/aws-is-how/analytics/elasticsearch-lifecycle/images/lifecycle-rule.png)

![lifecycle-rule2](/aws-is-how/analytics/elasticsearch-lifecycle/images/lifecycle-rule2.png)

- Configure the S3 repository
```bash
curl -X PUT "localhost:9200/_snapshot/my_s3_repository?pretty" -H 'Content-Type: application/json' -d'
{
  "type": "s3",
  "settings": {
    "bucket": "ray-elasticsearch",
    "base_path": "es-on-ec2-snapshot"
  }
}
'

curl -X POST "localhost:9200/_nodes/reload_secure_settings?pretty" -H 'Content-Type: application/json' -d'
{
  "secure_settings_password": "YOUR_keystore_Passwd"
}
'
```


- Verify that a snapshot can be taken and stored in the S3 bucket
```bash
curl -X PUT "localhost:9200/_snapshot/my_s3_repository/snapshot_1?wait_for_completion=true&pretty" -H 'Content-Type: application/json' -d'
{
  "indices": ["testes-lifecycle-*"],
  "ignore_unavailable": true,
  "include_global_state": false
}
'

{
  "snapshot" : {
    "snapshot" : "snapshot_1",
    "uuid" : "rtGY13HjTZyQDr9jo_Paow",
    "version_id" : 7100099,
    "version" : "7.10.0",
    "indices" : [
      "testes-lifecycle-2020-11-25"
    ],
    "data_streams" : [ ],
    "include_global_state" : false,
    "state" : "SUCCESS",
    "start_time" : "2020-11-25T09:29:01.859Z",
    "start_time_in_millis" : 1606296541859,
    "end_time" : "2020-11-25T09:29:02.660Z",
    "end_time_in_millis" : 1606296542660,
    "duration_in_millis" : 801,
    "failures" : [ ],
    "shards" : {
      "total" : 1,
      "failed" : 0,
      "successful" : 1
    }
  }
}

aws s3 ls s3://ray-elasticsearch/es-on-ec2-snapshot/

                           PRE indices/
2020-11-25 08:42:29          0
2020-11-25 09:29:04        457 index-0
2020-11-25 09:29:04          8 index.latest
2020-11-25 09:29:03        234 meta-rtGY13HjTZyQDr9jo_Paow.dat
2020-11-25 09:29:04        289 snap-rtGY13HjTZyQDr9jo_Paow.dat

## Automate daily backups of Elasticsearch data using an Elasticsearch SLM policy

1. Define a `daily-snapshots` policy that backs up the `testes-lifecycle-*` indices daily at 10:00 AM UTC, and a `nightly-snapshots` policy that backs up the `testes-*` indices daily at 10:10 AM UTC
```bash
curl -X PUT "localhost:9200/_slm/policy/daily-snapshots?pretty" -H 'Content-Type: application/json' -d'
{
  "schedule": "0 0 10 * * ?",
  "name": "<daily-snap-{now/d}>",
  "repository": "my_s3_repository",
  "config": {
    "indices": ["testes-lifecycle-*"]
  },
  "retention": {
    "expire_after": "2d",
    "min_count": 1,
    "max_count": 3
  }
}
'

curl -X PUT "localhost:9200/_slm/policy/nightly-snapshots?pretty" -H 'Content-Type: application/json' -d'
{
  "schedule": "0 10 10 * * ?",
  "name": "<nightly-snap-{now/d}>",
  "repository": "my_s3_repository",
  "config": {
    "indices": ["testes-*"]
  },
  "retention": {
    "expire_after": "1d",
    "min_count": 1,
    "max_count": 2
  }
}
'
```
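
To reason about what the `retention` settings in these policies do, the Python sketch below models them in a simplified way: snapshots older than `expire_after` become eligible for deletion, at least `min_count` snapshots are always kept, and when more than `max_count` exist the oldest are removed even if they have not expired. This is an illustrative model only, not Elasticsearch's implementation; the function name `select_for_deletion` and the sample timestamps are hypothetical.

```python
from datetime import datetime, timedelta


def select_for_deletion(taken_at, now, expire_after, min_count, max_count):
    """Return the snapshot timestamps a simplified retention run would delete."""
    ordered = sorted(taken_at)  # oldest first
    doomed = []

    # max_count: if too many snapshots exist, drop the oldest even if not expired
    excess = len(ordered) - max_count
    if excess > 0:
        doomed.extend(ordered[:excess])
        ordered = ordered[excess:]

    # expire_after: delete expired snapshots oldest first, but keep min_count
    for ts in list(ordered):
        if len(ordered) <= min_count:
            break
        if now - ts > expire_after:
            doomed.append(ts)
            ordered.remove(ts)
    return doomed


if __name__ == "__main__":
    # Four daily snapshots checked against the daily-snapshots retention settings
    taken = [datetime(2020, 11, d, 10, 0) for d in (25, 26, 27, 28)]
    doomed = select_for_deletion(
        taken,
        now=datetime(2020, 11, 28, 12, 0),
        expire_after=timedelta(days=2),
        min_count=1,
        max_count=3,
    )
    print([ts.date().isoformat() for ts in doomed])  # ['2020-11-25', '2020-11-26']
```

In this example the oldest snapshot goes because of `max_count`, and the next one goes because it is older than two days.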

2. Create a new index or add data to an existing index
- Add data to the existing index
```bash
python es_createIndex.py

# The index now reports 2000 documents
health status index                       uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   testes-lifecycle-2020-11-25 7fluXBYtRRqz_Ai-J5BqVQ   1   1       2000            0      175kb          175kb
```


- To verify the policy, take a snapshot immediately using its configuration
```bash
curl -X POST "localhost:9200/_slm/policy/daily-snapshots/_execute?pretty"
{
  "snapshot_name" : "daily-snap-2020.11.25-igijobi3qimbfxcsedixtq"
}

curl -X GET "localhost:9200/_slm/policy/daily-snapshots?human&pretty"
```

- Snapshot retention task history
```bash
curl -X GET "localhost:9200/_slm/stats?pretty"
{
  "retention_runs" : 0,
  "retention_failed" : 0,
  "retention_timed_out" : 0,
  "retention_deletion_time" : "0s",
  "retention_deletion_time_millis" : 0,
  "total_snapshots_taken" : 1,
  "total_snapshots_failed" : 0,
  "total_snapshots_deleted" : 0,
  "total_snapshot_deletion_failures" : 0,
  "policy_stats" : [
    {
      "policy" : "daily-snapshots",
      "snapshots_taken" : 1,
      "snapshots_failed" : 0,
      "snapshots_deleted" : 0,
      "snapshot_deletion_failures" : 0
    }
  ]
}
```


## Test Elasticsearch SLM policy and S3 lifecycle rule

- Create an Elasticsearch index daily.
- Take a snapshot daily.
- Set the S3 lifecycle rule to transition objects from the Standard class to Glacier 1 day after the snapshot is uploaded to S3.
- Set the S3 lifecycle rule to delete the objects 4 days after the snapshot is uploaded to S3.
- Set the SLM policy to expire snapshots 2 days after they are taken.

**Summary**
You can use the S3 lifecycle rule to control archiving (transitioning the storage class from Standard to Glacier) and use Elasticsearch SLM to control snapshot creation and retention on a schedule.
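
The order in which these rules act on a single snapshot can be worked out with simple date arithmetic. The Python sketch below uses the offsets from this test setup (1 day, 2 days, 4 days). The function name `lifecycle_events` is hypothetical, and the times are approximate because S3 applies lifecycle actions asynchronously.

```python
from datetime import datetime, timedelta

# Offsets used in this guide's test setup; adjust them to your own rules.
OFFSETS = {
    "S3 lifecycle: transition to Glacier": timedelta(days=1),
    "SLM retention: snapshot expires": timedelta(days=2),
    "S3 lifecycle: permanent delete": timedelta(days=4),
}


def lifecycle_events(taken_at):
    """Return (time, event) pairs in chronological order for one snapshot."""
    return sorted((taken_at + delta, name) for name, delta in OFFSETS.items())


if __name__ == "__main__":
    for when, name in lifecycle_events(datetime(2020, 11, 25, 9, 29)):
        print(when.isoformat(), name)
```

With these offsets the snapshot reaches Glacier a day before SLM considers it expired, so the SLM retention run has to act on objects that are already in the Glacier storage class.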

1. Check whether Elasticsearch can get the snapshot status

- Result: Elasticsearch fails to retrieve the snapshot info once the snapshot has been transitioned to the Glacier storage class

```bash
curl -X GET "localhost:9200/_snapshot/my_s3_repository/_all?pretty"
{
  "error" : {
    "root_cause" : [
      {
        "type" : "snapshot_exception",
        "reason" : "[my_s3_repository:daily-snap-2020.11.25-igijobi3qimbfxcsedixtq/lt8KPEloSUiiwaeyxkSBCA] Snapshot could not be read"
      }
    ],
    "type" : "snapshot_exception",
    "reason" : "[my_s3_repository:daily-snap-2020.11.25-igijobi3qimbfxcsedixtq/lt8KPEloSUiiwaeyxkSBCA] Snapshot could not be read",
    "caused_by" : {
      "type" : "amazon_s3_exception",
      "reason" : "The operation is not valid for the object's storage class (Service: Amazon S3; Status Code: 403; Error Code: InvalidObjectState; Request ID: 8BFA2EC093F962BC; S3 Extended Request ID: LfCQ1SNoe4sChGDnUxHZXQDEAszjMEjb4RU/JFjDbM6mUDbWKdkbDsBMjHOWtzzjbgbVj2nFj38=)"
    }
  },
  "status" : 500
}
```
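
The `InvalidObjectState` error indicates that the repository objects are no longer in the Standard storage class. One way to confirm this is to count the objects under the repository prefix by storage class. The Python sketch below assumes `boto3` is installed and configured with credentials; the function name `storage_class_counts` is hypothetical, while the bucket and prefix in the usage comment are the ones used in this guide.

```python
from collections import Counter


def storage_class_counts(s3_client, bucket, prefix):
    """Count the objects under a prefix by S3 storage class."""
    counts = Counter()
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is absent when a page has no objects
        for obj in page.get("Contents", []):
            counts[obj.get("StorageClass", "STANDARD")] += 1
    return counts


# Example usage (requires boto3 and AWS credentials):
#   import boto3
#   print(storage_class_counts(boto3.client("s3"),
#                              "ray-elasticsearch", "es-on-ec2-snapshot/"))
```

The client is passed in as a parameter so the function can be exercised with a stub before pointing it at a real bucket.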

2. Check whether the Elasticsearch snapshot delete API can delete a snapshot that has been transitioned to the Glacier storage class

- Result: The delete API succeeds for both the manually created snapshot and the SLM-created snapshot, even though the latter cannot be read

```bash
# Manually created snapshot
curl -X DELETE "localhost:9200/_snapshot/my_s3_repository/snapshot_1?pretty"
{
  "acknowledged" : true
}

curl -X GET "localhost:9200/_snapshot/my_s3_repository/snapshot_1?pretty"
{
  "error" : {
    "root_cause" : [
      {
        "type" : "snapshot_missing_exception",
        "reason" : "[my_s3_repository:snapshot_1] is missing"
      }
    ],
    "type" : "snapshot_missing_exception",
    "reason" : "[my_s3_repository:snapshot_1] is missing"
  },
  "status" : 404
}

# SLM-created snapshot
curl -X GET "localhost:9200/_snapshot/my_s3_repository/daily-snap-2020.11.25-igijobi3qimbfxcsedixtq?pretty"
{
  "error" : {
    "root_cause" : [
      {
        "type" : "snapshot_exception",
        "reason" : "[my_s3_repository:daily-snap-2020.11.25-igijobi3qimbfxcsedixtq/lt8KPEloSUiiwaeyxkSBCA] Snapshot could not be read"
      }
    ],
    "type" : "snapshot_exception",
    "reason" : "[my_s3_repository:daily-snap-2020.11.25-igijobi3qimbfxcsedixtq/lt8KPEloSUiiwaeyxkSBCA] Snapshot could not be read",
    "caused_by" : {
      "type" : "amazon_s3_exception",
      "reason" : "The operation is not valid for the object's storage class (Service: Amazon S3; Status Code: 403; Error Code: InvalidObjectState; Request ID: 74183000200A1160; S3 Extended Request ID: N9S8m2yVyuHymAlOJfnKJ7tGKn5XVTO/P/MneLO+ZAb9b/rpzljM1LtM6vdGoWRu1Xn47HXXnHQ=)"
    }
  },
  "status" : 500
}

curl -X DELETE "localhost:9200/_snapshot/my_s3_repository/daily-snap-2020.11.25-igijobi3qimbfxcsedixtq?pretty"
{
  "acknowledged" : true
}

curl -X GET "localhost:9200/_snapshot/my_s3_repository/daily-snap-2020.11.25-igijobi3qimbfxcsedixtq?pretty"
{
  "error" : {
    "root_cause" : [
      {
        "type" : "snapshot_missing_exception",
        "reason" : "[my_s3_repository:daily-snap-2020.11.25-igijobi3qimbfxcsedixtq] is missing"
      }
    ],
    "type" : "snapshot_missing_exception",
    "reason" : "[my_s3_repository:daily-snap-2020.11.25-igijobi3qimbfxcsedixtq] is missing"
  },
  "status" : 404
}
```

3. Check whether Elasticsearch SLM can delete a snapshot that has been transitioned to the Glacier storage class

- Result: The retention run succeeds; `total_snapshots_deleted` increases from 1 to 3 with no deletion failures

```bash
# Check SLM status
curl -X GET "localhost:9200/_slm/stats?pretty"
{
  "retention_runs" : 15,
  "retention_failed" : 1,
  "retention_timed_out" : 0,
  "retention_deletion_time" : "1.2s",
  "retention_deletion_time_millis" : 1285,
  "total_snapshots_taken" : 10,
  "total_snapshots_failed" : 15,
  "total_snapshots_deleted" : 1,
  "total_snapshot_deletion_failures" : 0,
  "policy_stats" : [
    {
      "policy" : "daily-snapshots",
      "snapshots_taken" : 6,
      "snapshots_failed" : 10,
      "snapshots_deleted" : 1,
      "snapshot_deletion_failures" : 0
    },
    {
      "policy" : "nightly-snapshots",
      "snapshots_taken" : 4,
      "snapshots_failed" : 5,
      "snapshots_deleted" : 0,
      "snapshot_deletion_failures" : 0
    }
  ]
}

# Delete any snapshots that are expired according to the policy's retention rules
curl -X POST "localhost:9200/_slm/_execute_retention?pretty"
{
  "acknowledged" : true
}

# Check SLM status
curl -X GET "localhost:9200/_slm/stats?pretty"
{
  "retention_runs" : 16,
  "retention_failed" : 1,
  "retention_timed_out" : 0,
  "retention_deletion_time" : "3.4s",
  "retention_deletion_time_millis" : 3438,
  "total_snapshots_taken" : 10,
  "total_snapshots_failed" : 15,
  "total_snapshots_deleted" : 3,
  "total_snapshot_deletion_failures" : 0,
  "policy_stats" : [
    {
      "policy" : "daily-snapshots",
      "snapshots_taken" : 6,
      "snapshots_failed" : 10,
      "snapshots_deleted" : 2,
      "snapshot_deletion_failures" : 0
    },
    {
      "policy" : "nightly-snapshots",
      "snapshots_taken" : 4,
      "snapshots_failed" : 5,
      "snapshots_deleted" : 1,
      "snapshot_deletion_failures" : 0
    }
  ]
}
```
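
Comparing the two `_slm/stats` responses by eye is error-prone, so the difference can be computed instead. The Python sketch below assumes both responses have been parsed into dicts; the helper name `deleted_by_policy` is hypothetical and the sample data is trimmed from the outputs in this guide.

```python
def deleted_by_policy(before, after):
    """Return {policy: snapshots deleted between two _slm/stats responses}."""
    prev = {p["policy"]: p["snapshots_deleted"] for p in before["policy_stats"]}
    return {
        p["policy"]: p["snapshots_deleted"] - prev.get(p["policy"], 0)
        for p in after["policy_stats"]
    }


if __name__ == "__main__":
    before = {"policy_stats": [
        {"policy": "daily-snapshots", "snapshots_deleted": 1},
        {"policy": "nightly-snapshots", "snapshots_deleted": 0},
    ]}
    after = {"policy_stats": [
        {"policy": "daily-snapshots", "snapshots_deleted": 2},
        {"policy": "nightly-snapshots", "snapshots_deleted": 1},
    ]}
    print(deleted_by_policy(before, after))
    # {'daily-snapshots': 1, 'nightly-snapshots': 1}
```

Each policy deleted one snapshot during the retention run, which matches the increase of `total_snapshots_deleted` from 1 to 3.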

4. Before the S3 lifecycle rule deletes the snapshot, try to restore the snapshot into Elasticsearch

- Result: The restore fails with `InvalidObjectState` while the snapshot objects are in the Glacier storage class

```bash
curl -X POST "localhost:9200/_snapshot/my_s3_repository/snapshot_1/_restore?pretty" -H 'Content-Type: application/json' -d'
{
  "indices": "testes-lifecycle-2020-11-25",
  "ignore_unavailable": true,
  "include_global_state": false,
  "rename_pattern": "index_(.+)",
  "rename_replacement": "restored_index_$1",
  "include_aliases": false
}
'
{
  "error" : {
    "root_cause" : [
      {
        "type" : "amazon_s3_exception",
        "reason" : "The operation is not valid for the object's storage class (Service: Amazon S3; Status Code: 403; Error Code: InvalidObjectState; Request ID: B39F7DDAF28ADC5E; S3 Extended Request ID: kBsluVu+LSfdXZ4603C627ZrI6ssVg0Q6McGxhbZQOWFInJ2zZt8K93kutMS1amxqBN4Pzm7DGA=)"
      }
    ],
    "type" : "amazon_s3_exception",
    "reason" : "The operation is not valid for the object's storage class (Service: Amazon S3; Status Code: 403; Error Code: InvalidObjectState; Request ID: B39F7DDAF28ADC5E; S3 Extended Request ID: kBsluVu+LSfdXZ4603C627ZrI6ssVg0Q6McGxhbZQOWFInJ2zZt8K93kutMS1amxqBN4Pzm7DGA=)"
  },
  "status" : 500
}
```

5. Before the S3 lifecycle rule deletes the snapshot, restore the snapshot objects to the S3 Standard storage class, then try to restore the snapshot into Elasticsearch again
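
This guide does not show commands for this step. One possible approach, sketched below in Python, is to request a Glacier restore for every archived object under the repository prefix and, once S3 reports the restores as complete, copy each object onto itself with the `STANDARD` storage class. It assumes `boto3` is installed and configured; the function names, the 2-day restore window, and the `Standard` retrieval tier are illustrative choices rather than values from this test. Objects larger than 5 GB would need a multipart copy, which this sketch does not handle.

```python
def request_glacier_restore(s3_client, bucket, prefix, days=2, tier="Standard"):
    """Request a temporary restore for every GLACIER object under the prefix."""
    requested = []
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            if obj.get("StorageClass") != "GLACIER":
                continue  # already readable, nothing to restore
            s3_client.restore_object(
                Bucket=bucket,
                Key=obj["Key"],
                RestoreRequest={
                    "Days": days,
                    "GlacierJobParameters": {"Tier": tier},
                },
            )
            requested.append(obj["Key"])
    return requested


def copy_back_to_standard(s3_client, bucket, keys):
    """After the restores complete, rewrite each object in place as STANDARD."""
    for key in keys:
        s3_client.copy_object(
            Bucket=bucket,
            Key=key,
            CopySource={"Bucket": bucket, "Key": key},
            StorageClass="STANDARD",
        )


# Example usage (requires boto3 and AWS credentials):
#   import boto3
#   s3 = boto3.client("s3")
#   keys = request_glacier_restore(s3, "ray-elasticsearch", "es-on-ec2-snapshot/")
#   # ...wait until the restores have finished, then:
#   copy_back_to_standard(s3, "ray-elasticsearch", keys)
```

Glacier restores are not immediate, so the copy step has to wait until the restore jobs have finished. Whether Elasticsearch can then restore the snapshot was not verified in this test.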

## Reference

- Snapshot S3 repository plugin
- ES snapshot-lifecycle-management