Was it easy to set up in terms of reliability and failover? Given what you and la...

ekimekim · on May 21, 2020

This came up in several other threads here: Don't use RabbitMQ's clustering. It's surprisingly brittle and hard to recover from.

The accepted wisdom that I've seen is to run a single broker with a completely independent hot spare. But of course, switching over to your hot spare will violate most of the guarantees that Rabbit gives you around durability, ordering, etc., so you have to be very careful how you use it.

I desperately want to like Rabbit (and have used it heavily in the past), but right now I wouldn't use it if I could get away with anything else; it just has no real HA story.

jackvanlightly · on May 21, 2020

Rabbit dev here. We released quorum queues a few months ago. It's a Raft-based replicated queue that addresses all the old problems. https://www.rabbitmq.com/blog/2020/04/20/rabbitmq-gets-an-ha...
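The following is a minimal sketch of opting into a quorum queue from Python. It assumes the pika 1.x client, a broker on localhost, and a hypothetical queue name "tasks". The queue type is selected with the `x-queue-type` declare argument. Quorum queues must be declared durable and cannot be exclusive. Replication only helps when the broker runs as a multi-node cluster (typically at least three nodes). On a single node, a quorum queue works but has no replicas to fail over to.

```python
"""Sketch: declaring a quorum queue with pika.

Assumptions: pika 1.x, a broker on localhost with default credentials,
and the hypothetical queue name "tasks".
"""


def quorum_queue_arguments() -> dict:
    # "x-queue-type": "quorum" asks RabbitMQ for a Raft-replicated quorum
    # queue instead of a classic queue.
    return {"x-queue-type": "quorum"}


def declare_quorum_queue(host: str = "localhost", queue: str = "tasks") -> None:
    # Imported lazily so the helper above is usable without pika installed.
    import pika

    connection = pika.BlockingConnection(pika.ConnectionParameters(host=host))
    try:
        channel = connection.channel()
        # Quorum queues must be durable; exclusive queues are not allowed.
        channel.queue_declare(
            queue=queue, durable=True, arguments=quorum_queue_arguments()
        )
    finally:
        connection.close()
```

Consumers and publishers use the queue exactly like a classic one. Only the declaration changes.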

sciurus · on May 21, 2020

Thanks for all your work on RabbitMQ, and for your great blog posts about it and other messaging systems.

For anyone who wants to understand the potential complexities of HA RabbitMQ, spend some time reading https://jack-vanlightly.com/blog/2018/8/31/rabbitmq-vs-kafka...

theptip · on May 21, 2020

In my experience, RMQ is solid enough that for many use cases it's reasonable to run it without a standby (especially if you're on Kubernetes, where you'll get a replacement instance created automatically if your active instance fails).

A common use case is async tasks (Celery) that can tolerate a few minutes of downtime. If you're running a fully evented architecture, this might not apply. Even then, if you're not targeting 4-5 nines of reliability or an RTO (recovery time objective) of under 5 minutes, you might not need a standby even if RMQ is a core part of your architecture. "Avoid single points of failure" is a good heuristic, but "consider the SLAs of your dependencies" is the more granular way of thinking about this, and a single RMQ instance has very high uptime.
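The sketch below makes "consider the SLAs of your dependencies" concrete with a little arithmetic. The helper names are hypothetical. The series formula assumes component failures are independent, which real systems often violate, so treat the results as rough estimates.

```python
"""Back-of-the-envelope availability arithmetic (illustrative only)."""

MINUTES_PER_YEAR = 365 * 24 * 60  # ignores leap years


def nines_to_availability(nines: int) -> float:
    # 3 nines -> 0.999, 4 nines -> 0.9999, 5 nines -> 0.99999
    return 1 - 10 ** (-nines)


def downtime_minutes_per_year(availability: float) -> float:
    # The fraction of the year you are allowed to be down, in minutes.
    return (1 - availability) * MINUTES_PER_YEAR


def serial_availability(*components: float) -> float:
    # If a request needs every component to be up, and failures are
    # independent, the overall availability is the product of the parts.
    result = 1.0
    for availability in components:
        result *= availability
    return result
```

For example, four nines leaves roughly 52.6 minutes of downtime per year. A single incident that takes 5 minutes to recover from already uses about a tenth of that budget. A three-nines target, by contrast, allows almost 9 hours.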

For context, I had an RMQ Docker container running for almost two years without any interruptions. If you're on a small team, HA might well be overkill.

Fun gotcha: if you're running RMQ in Kubernetes/Docker, make sure you give it an explicit memory limit, or else it will try to allocate disk space equal to 40% of your host's memory (see "Memory Limits" in https://hub.docker.com/_/rabbitmq). Setting a memory limit is a good practice for any containerized workload, but this one will cause errors if you're trying to use a small disk volume on a host with lots of memory.
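RabbitMQ can express some limits relative to the total memory it detects. Two real settings in `rabbitmq.conf` do this: `vm_memory_high_watermark.relative` (the memory alarm) and `disk_free_limit.relative` (minimum free disk, as a multiple of RAM). The sketch below shows why the detected memory size matters. The 0.4 fraction is the figure from the comment, and the host and container sizes are made-up examples.

```python
"""Sketch: thresholds computed relative to detected memory.

Assumptions: the 0.4 fraction and the 64 GiB / 2 GiB sizes are
illustrative values, not measurements.
"""

GIB = 1024 ** 3


def relative_threshold_bytes(detected_memory_bytes: int, fraction: float) -> int:
    # A relative limit is simply fraction * the memory the broker believes
    # it has, so it grows with the host unless a container limit is set.
    return int(detected_memory_bytes * fraction)


# Without a container limit, the broker sees the host's RAM.
host_based_gib = relative_threshold_bytes(64 * GIB, 0.4) / GIB  # about 25.6 GiB
# With an explicit 2 GiB limit, the same fraction gives a small threshold.
limit_based_gib = relative_threshold_bytes(2 * GIB, 0.4) / GIB  # about 0.8 GiB
```

If a threshold like this governs free disk space, about 25.6 GiB would be required on a 64 GiB host, which a small volume cannot satisfy.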

hrpnk · on May 22, 2020

Which Kubernetes operator for RabbitMQ are you using?

theptip · on May 22, 2020

I’m just using the Helm manifests; when I set this up, operators were not a thing. I’d probably look into the operator approach if I were starting from scratch now.

gog · on May 21, 2020

I've been running RabbitMQ in production for more than 8 years, once even in a fleet of 180 buses where every bus had an instance of RabbitMQ running locally.

Never had a single issue in all those years.

But I must admit that running an HA cluster is something I've never tried; it sounds complicated and scary once you start digging through the docs.

All my deployments have been to bare-metal Ubuntu and Debian machines with durable queues and persistent messages.

If you need to use transactions, be aware that they are really slow: a couple of orders of magnitude slower than regular AMQP usage.
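RabbitMQ's documentation recommends publisher confirms as a lighter-weight alternative to AMQP transactions. Below is a sketch of both approaches, assuming `channel` is a pika 1.x `BlockingChannel`. It uses only `tx_select`, `tx_commit`, `tx_rollback`, `confirm_delivery`, and `basic_publish`. The function names are hypothetical.

```python
"""Sketch: transactional publishing vs. publisher confirms with pika 1.x.

`channel` is assumed to be a pika BlockingChannel. Use a fresh channel for
each style: a channel cannot be in both transaction and confirm mode.
"""


def publish_in_transaction(channel, exchange, routing_key, bodies):
    # AMQP 0-9-1 transaction: select, publish, then commit. tx_commit blocks
    # until the broker has processed the whole transaction.
    channel.tx_select()
    try:
        for body in bodies:
            channel.basic_publish(exchange=exchange, routing_key=routing_key, body=body)
        channel.tx_commit()
    except Exception:
        channel.tx_rollback()
        raise


def publish_with_confirms(channel, exchange, routing_key, bodies):
    # Publisher confirms: once the channel is in confirm mode, a
    # BlockingChannel's basic_publish waits for the broker's ack and raises
    # (e.g. pika.exceptions.NackError) if the broker refuses the message.
    channel.confirm_delivery()
    for body in bodies:
        channel.basic_publish(exchange=exchange, routing_key=routing_key, body=body)
```

Both functions give you a point at which you know the broker has taken responsibility for the messages. The commenter's point is about how much throughput each approach costs.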

zonk_ · on May 21, 2020

> Was it easy to set up in terms of reliability and failover?

To be honest, our current setup was built by colleagues who have even less experience than me, and it's still running flawlessly. IIRC, it's just two instances behind a load balancer, and the consumer consumes from both, but I'm not super certain about that.

I've tested the clustering functionality to see how to set it up, and it worked fine for me. I have no experience with it in production, though, and other people in this thread don't seem too happy with it, so YMMV.

> Is there anything I should keep in mind for running it in production?

Nothing special beyond what you'd do anyway when getting to know a new component/microservice. Just check out the Get Started section[1] on their site in the appropriate language and play around with a small setup. Get familiar with the client libraries for connecting and for sending, queueing, and fetching messages, and with topics. Think it through before you set it up, so that it handles all eventualities and behaves exactly how you want it to (ACK/NACK, what happens if a sender/consumer dies, etc.), because you'll probably set it up once and never touch it again.
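Here is a minimal sketch of explicit ACK/NACK handling in a consumer callback. It assumes pika 1.x and a consumer started with `auto_ack=False`. `process()` is a hypothetical stand-in for your business logic, and the queue name in the wiring comment is also hypothetical. pika invokes the callback as `on_message_callback(channel, method, properties, body)`.

```python
"""Sketch: acknowledging only after successful processing (pika 1.x)."""


def process(body: bytes) -> None:
    # Hypothetical placeholder for real work; raises on bad input.
    if not body:
        raise ValueError("empty message")


def on_message(channel, method, properties, body):
    try:
        process(body)
    except Exception:
        # Reject without requeueing so a poison message can't loop forever.
        # If the queue has a dead-letter exchange, the message goes there.
        channel.basic_nack(delivery_tag=method.delivery_tag, requeue=False)
    else:
        # Ack only after the work succeeded. If the consumer dies before
        # acking, the broker requeues the message for redelivery.
        channel.basic_ack(delivery_tag=method.delivery_tag)


# Wiring it up (requires a running broker):
#   channel.basic_qos(prefetch_count=10)
#   channel.basic_consume(queue="tasks", on_message_callback=on_message, auto_ack=False)
#   channel.start_consuming()
```

Whether failures should be requeued, dropped, or dead-lettered is exactly the kind of decision worth settling before the setup goes into production.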

One thing I'm not really sure about and haven't fully answered for myself yet: there might be some "logic" to your RabbitMQ instance, depending on which metadata you add to each message (e.g. retries). If you have such logic, it might be better to have a service around the RabbitMQ instance; otherwise, this logic ends up in the code base of your actual solution, where it might not be wanted and may be harder to maintain. But I'm not so sure myself on that one.

Oh yeah, and check out some patterns for your needs. For example, there are multiple ways to implement retries, e.g. with a queue for queues. But if your API is REST-based, everything should be straightforward.
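This sketch shows one common retry pattern. It is also the kind of metadata-driven logic that could live in a wrapper service rather than in your main code base. A delay queue uses `x-message-ttl`, and `x-dead-letter-exchange`/`x-dead-letter-routing-key` send expired messages back to the work queue, while a retry counter travels in a message header. The queue names `tasks`, `tasks.retry`, and `tasks.failed` are hypothetical. So are the header name `retry-count`, the limits, and the helper names.

```python
"""Sketch: delay-queue retries with a retry counter in a message header."""

MAX_RETRIES = 3
RETRY_DELAY_MS = 5_000


def retry_queue_arguments(work_queue: str, delay_ms: int = RETRY_DELAY_MS) -> dict:
    # Messages wait in the retry queue for delay_ms, then RabbitMQ
    # dead-letters them via the default exchange ("") back to work_queue.
    return {
        "x-message-ttl": delay_ms,
        "x-dead-letter-exchange": "",
        "x-dead-letter-routing-key": work_queue,
    }


def next_destination(headers, work_queue: str = "tasks"):
    """Return (queue to republish to, updated headers) for a failed message."""
    headers = dict(headers or {})
    attempts = int(headers.get("retry-count", 0)) + 1
    headers["retry-count"] = attempts
    if attempts > MAX_RETRIES:
        return f"{work_queue}.failed", headers
    return f"{work_queue}.retry", headers
```

With pika, you would declare the retry queue as `channel.queue_declare(queue="tasks.retry", durable=True, arguments=retry_queue_arguments("tasks"))`. On failure, republish to the default exchange with `routing_key` set to the returned queue and `properties=pika.BasicProperties(headers=new_headers)`, then ack the original. RabbitMQ also records its own `x-death` header on dead-lettered messages, which some setups use instead of a custom counter.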

[1]: https://www.rabbitmq.com/getstarted.html

dkersten · on May 21, 2020

Thanks for the detailed response (and to everyone else who responded too!), I appreciate it! When I get time, I'll prototype something, play around with it, and see how it handles different situations. I've also just bought the book mentioned elsewhere, so hopefully I can get up to speed quickly. It does sound like my original impression about it being complex to run and maintain was perhaps overblown. That's good, because from a features point of view, RabbitMQ seemed like a good fit for the things I want an MQ solution for.