
We had some trouble with etcd at work around last summer: constant leader re-election and high CPU usage. We switched to Consul and so far we are happy with it, but etcd seems to be better supported by third-party apps, so maybe we should take it for another spin.


This would certainly help. High CPU was also an issue that we started to notice on 0.4.6, here with some of the NYC CoreOS guys, and that's been fixed. Chalk it up to completely redoing internal communication.

EDIT: Master election too :)

(Disclaimer: etcd dev here)


Are you guys still following the Raft spec, or have you diverged at this point?


The team has worked _very_ carefully to follow the Raft state machine as described in the paper as closely as possible. For example, we have a set of tests[1] that take possible problems outlined in the original paper and implement them as unit tests.

[1] https://github.com/coreos/etcd/blob/master/raft/raft_paper_t...


Keep in mind that reads do not go through Raft by default, so it's possible to get stale data.

They've added "consistent=true"/"quorum=true" URL parameters for GETs per https://github.com/coreos/etcd/issues/741 as a workaround.
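
For anyone who wants to try it, here is a minimal Go sketch of building such a GET URL. The endpoint address, the example key, and the helper name quorumReadURL are my own assumptions for illustration; only the "quorum=true" parameter comes from the issue above, and I'm assuming the v2 keys path.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// quorumReadURL builds a keys-API URL with quorum=true set, so the read is
// asked to go through Raft instead of being served from local state.
// The "/v2/keys/" path and the helper itself are illustrative assumptions.
func quorumReadURL(endpoint, key string) (string, error) {
	u, err := url.Parse(endpoint)
	if err != nil {
		return "", err
	}
	u.Path = "/v2/keys/" + strings.TrimPrefix(key, "/")
	q := url.Values{}
	q.Set("quorum", "true")
	u.RawQuery = q.Encode()
	return u.String(), nil
}

func main() {
	// Hypothetical local endpoint and key, not taken from the thread.
	s, err := quorumReadURL("http://127.0.0.1:2379", "/config/feature")
	if err != nil {
		panic(err)
	}
	fmt.Println(s)
}
```

The trade-off is latency: a read that goes through the quorum costs a round trip to the other members, which is presumably why it is opt-in.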


I had problems due to high-cpu load as well. Haven't had an issue since I updated to latest etcd sometime late last year.


I had the same issues around last July-August. I kept upgrading as they released new versions, and somewhere along the way it got fixed.

I also had to fine-tune some timeouts (election and compression, if I remember correctly) to get the best performance out of it.


Pretty sure the high-cpu issue was a time.Ticker leak that was fixed earlier.



