100 data nodes basically if you want fast ingress, keep shards small, once they ...

sandGorgon · on April 9, 2016

I assume you are also replicating your nodes...how does replication impact ingress? What happens when nodes exceed 10 GB? Do you split them?

sqlcook · on April 9, 2016

if you want the fastest ingress, disable replica until your ingress is done, its faster to create replica at the end of ETL for that given index. Also, you want to disable auto allocation as well, this will disable shard movement during ingress, re-enable it afterwards.

on a 100 node cluster i had roughly 500GB on each node. this was not a single index, multiple indexes, with roughly 8 shards per index per node. Shard count is pretty important to get correct.

I did not manually control document routing (it was hard based on the type of data i was ingressing), so it was set to auto and during the load i observed hotspots in the cluster (you have to look at BULK thread/queue length), some nodes were getting burst of docs while others were idle, roughly 40-50% of the nodes in the cluster were under utilized, and maybe 5-10% had hot spots from time to time.

Also, depending what you use to push data in, (I used ES hadoop plugin) , you have to account for shard segment merges, which literally pause ingress for a brief moment and merge segments in a given shard. You have to set retry to -1 (infinite) and retry delay to something like a second or two, otherwise you will end up with dropped documents.

sandGorgon · on April 9, 2016

this is brilliant ! if you had your ES and hadoop config somewhere it would be awesome