Migrating an ElasticSearch cluster from version 2 to 5 can be challenging, even more so if it is a big cluster.
In this post we will explore one of the strategies that can be used to do such a migration, setting up a playground environment in which we can learn the migration procedure and test things out safely.
Definition List
Terms we will use:
- ES2: ElasticSearch v2.x.y
- ES5: ElasticSearch v5.x.y
Introduction
Let’s say you need to do a migration like that. You may come up with three strategies:
- use rack awareness to split the cluster and selectively upgrade one rack to ES5 later on;
- snapshot on ES2 and restore on ES5 using NFS;
- snapshot on ES2 and restore on ES5 using S3 (the subject of this post).
The rack awareness strategy requires more manual intervention and is therefore probably less safe (there will be a post about that). About NFS… well, S3 is simpler and probably safer.
Since production is no place to play around and test things out, I created two docker-compose environments to learn, test, and polish the procedure so I can eventually run it on a production cluster with more confidence.
So, without further ado, let’s get started!
ES2 Cluster Setup
First, we need an ES2 cluster. For testing purposes, we’ll create a 4-node cluster: one master and three data nodes:
# es2/docker-compose.yml
version: '2.2'
services:
  master:
    build: .
    image: es2-plugs
    env_file:
      - .env
    volumes:
      - ./master.yml:/usr/share/elasticsearch/config/elasticsearch.yml
    ports:
      - "9200:9200"
  data:
    image: es2-plugs
    env_file:
      - .env
    volumes:
      - ./data.yml:/usr/share/elasticsearch/config/elasticsearch.yml
    scale: 3
    depends_on:
      - master
Note that this docker-compose file builds a new image called es2-plugs and also depends on some config files. The Dockerfile looks like this:
# es2/Dockerfile
FROM elasticsearch:2.4.6-alpine
# cloud-aws is required for s3 snapshots
RUN /usr/share/elasticsearch/bin/plugin install --batch cloud-aws
# elasticsearch-migration shows us what we need to change in order to migrate
# to es5
RUN /usr/share/elasticsearch/bin/plugin install --batch https://github.com/elastic/elasticsearch-migration/releases/download/v2.0.4/elasticsearch-migration-2.0.4.zip
We also need an .env file:
AWS_ACCESS_KEY_ID=your key
AWS_SECRET_KEY=your secret
PROTIP: If you don’t want to use a real S3 bucket, you can start a local minio server and set cloud.aws.s3.endpoint in all the elasticsearch.yml files.
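For illustration, here is a minimal sketch of what that could look like; the minio service below is an assumption, not part of the original setup, and it uses the legacy MINIO_ACCESS_KEY/MINIO_SECRET_KEY variables:
# hypothetical extra service for es2/docker-compose.yml
minio:
  image: minio/minio
  command: server /data
  environment:
    - MINIO_ACCESS_KEY=your key
    - MINIO_SECRET_KEY=your secret
  ports:
    - "9000:9000"
Then, in the ES2 elasticsearch.yml files (the setting names below belong to the cloud-aws plugin; the ES5 repository-s3 plugin names them differently):
# point the cloud-aws plugin at the local minio instead of AWS (assumption)
cloud.aws.s3.endpoint: minio:9000
cloud.aws.s3.protocol: http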
The data.yml file:
# es2/data.yml
network.host: 0.0.0.0
cluster.name: beckerz
discovery.zen.ping.unicast.hosts:
  - master:9300
node.master: false
node.data: true
And finally, the master.yml file:
# es2/master.yml
network.host: 0.0.0.0
cluster.name: beckerz
discovery.zen.minimum_master_nodes: 1
node.master: true
node.data: false
So, we should have an es2 folder with the following structure:
.
└── es2
    ├── .env
    ├── Dockerfile
    ├── data.yml
    ├── docker-compose.yml
    └── master.yml
Now, we can bring the environment up with:
docker-compose up
This will build the image and launch one master and three data nodes talking to that master.
You can check that the cluster is green:
curl -s "localhost:9200/_cluster/health"
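You can also list the nodes to confirm that all four joined:
curl -s "localhost:9200/_cat/nodes?v"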
Adding some fake data
Let’s create a customer index and add some fake data to it:
curl -sXPUT 'http://localhost:9200/customer/?pretty' -d '{
  "settings" : {
    "index" : {
      "number_of_shards" : 6,
      "number_of_replicas" : 2
    }
  }
}'
while ! curl -s "localhost:9200/_cat/indices?v" | grep green; do
  sleep 0.1
done
for i in $(seq 1 1000); do
  curl -sXPUT "localhost:9200/customer/external/$i?pretty" -d "
  {
    \"number\": $i,
    \"name\": \"John Doe - $i\"
  }"
done
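Indexing documents one by one is fine for a toy dataset; for anything bigger, the bulk API is much faster because it batches many documents per request. A rough sketch of the same load using _bulk:
for i in $(seq 1 1000); do
  # action line followed by the document source, one per line (NDJSON)
  printf '{"index":{"_index":"customer","_type":"external","_id":"%s"}}\n' "$i"
  printf '{"number":%s,"name":"John Doe - %s"}\n' "$i" "$i"
done | curl -sXPOST "localhost:9200/_bulk" --data-binary @- > /dev/null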
Once that is done, time to snapshot it!
Snapshotting ES2
The first thing you need to do is create a repository; let’s call it backups:
curl -sXPUT "localhost:9200/_snapshot/backups?pretty" -d'
{
"type": "s3",
"settings": {
"bucket": "my-bucket",
"base_path": "my-subfolder"
}
}
'
You can check it with:
curl -s "localhost:9200/_snapshot?pretty"
Now, let’s snapshot everything:
curl -sXPUT "localhost:9200/_snapshot/backups/snapshot_1?wait_for_completion=true&pretty"
Once it is done, you can already restore it on another cluster.
Check the elasticsearch-migration plugin
You can explore the migration plugin by going to its URL; ES2 serves site plugins at http://localhost:9200/_plugin/<plugin-name>/, so it should be at http://localhost:9200/_plugin/elasticsearch-migration/. It will report what needs to change before migrating to ES5.
ES5 Cluster Setup
We’ll create another 4-node cluster using docker-compose, this time running ES5:
# es5/docker-compose.yml
version: '2.2'
services:
  master:
    build: .
    image: es5-plugs
    env_file:
      - .env
    environment:
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
    volumes:
      - ./master.yml:/usr/share/elasticsearch/config/elasticsearch.yml
    ports:
      - "9400:9200" # this master will bind to your local 9400 port
  data:
    image: es5-plugs
    depends_on:
      - master
    scale: 3
    env_file:
      - .env
    environment:
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
    volumes:
      - ./data.yml:/usr/share/elasticsearch/config/elasticsearch.yml
We can use the same .env, master.yml, and data.yml files from our ES2 env.
The Dockerfile is different, though:
# es5/Dockerfile
FROM elasticsearch:5.6.10-alpine
# repository-s3 is required for s3 snapshots
RUN /usr/share/elasticsearch/bin/elasticsearch-plugin install --batch repository-s3
So we will have the following structure now:
.
├── es2
│   ├── .env
│   ├── Dockerfile
│   ├── data.yml
│   ├── docker-compose.yml
│   └── master.yml
└── es5
    ├── .env
    ├── Dockerfile
    ├── data.yml
    ├── docker-compose.yml
    └── master.yml
Great, let’s fire this cluster up!
docker-compose up
You can check that the cluster is green:
curl -s "localhost:9400/_cluster/health"
Attention: the master port is now 9400.
We should now have one master and three data nodes running ES5 as well!
Restore ES2 snapshot into ES5
For that, we need to create a repository with the same arguments we used on ES2:
curl -sXPUT "localhost:9400/_snapshot/backups?pretty" -d'
{
"type": "s3",
"settings": {
"bucket": "my-bucket",
"base_path": "my-subfolder",
"readonly": true
}
}
'
Note the readonly setting. This is important because only a single cluster can have write access to the bucket at a time.
And then, finally, restore that snapshot:
curl -sXPOST "localhost:9400/_snapshot/backups/snapshot_1/_restore?wait_for_completion=true&pretty"
You can check that document counts, settings, and so on match between the two clusters:
curl -s "localhost:9200/customer/_settings?pretty"
curl -s "localhost:9400/customer/_settings?pretty"
curl -s "localhost:9200/_count?pretty"
curl -s "localhost:9400/_count?pretty"
Incremental snapshots
In a real scenario, snapshots will probably take a long time (say the cluster has hundreds of terabytes), and your users will still want to use the app in the meantime.
To minimize downtime, you can do the following:
1. full snapshot
2. restore full snapshot
3. incremental snapshot 1
4. close all indices
5. restore incremental snapshot 1
6. bring app down or put it in read-only mode
7. incremental snapshot 2
8. close all indices
9. restore incremental snapshot 2
10. bring app up again, pointing to the new cluster
Doing it like that, your downtime is reduced to the window between steps 6 and 10. If your app has such a feature, you can leave it in read-only mode for a while to validate things and avoid a split-brain scenario; once you are confident, finally enable everything. From that point on, there is no going back. The API calls for these steps are sketched below.
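The snapshot, close, and restore steps are plain API calls against the right cluster; a sketch, assuming the same backups repository (the snapshot name snapshot_2 is illustrative):
# incremental snapshot on the ES2 cluster; only data added since snapshot_1
# is uploaded
curl -sXPUT "localhost:9200/_snapshot/backups/snapshot_2?wait_for_completion=true&pretty"
# close all indices on the ES5 cluster so the restore can overwrite them
curl -sXPOST "localhost:9400/_all/_close?pretty"
# restore the incremental snapshot on the ES5 cluster
curl -sXPOST "localhost:9400/_snapshot/backups/snapshot_2/_restore?wait_for_completion=true&pretty"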
Timings
I tested the same documents in both Amazon S3 and DigitalOcean Spaces. Amazon S3 seemed significantly faster (around 70%), so I would probably recommend S3 for production environments - or just measure it yourself 🤷‍♂️.
Conclusion
This strategy is fairly easy to follow, and you can abort it at almost any point, for example if you find bugs in your app.
I hope that the recipes for the test environment serve you well and that you can also use them to practice in a safe environment before working on “real” environments.
Next steps
In the next posts we will explore the rack awareness strategy.