RabbitMQ is great. One of the few pieces of software I've used that "just works"...

MR4D · on May 21, 2020

> The only downside is once you get message-queue-pilled, ...

I think this is why email will never die. It's basically turned into a huge message queue. Even voice mails come into my inbox.

====== EDIT - I meant to say "huge universal message queue" and left out the word "universal" accidentally

pjc50 · on May 21, 2020

It always was a message queue in a very literal sense.

A lot of work has gone into mailer daemons to make email delivery as reliable as possible in a store-and-forward system.

MR4D · on May 21, 2020

You're correct - I left out the word "universal" accidentally, which would have made my intent much more clear.

Thanks for catching that.

guptaneil · on May 21, 2020

I think this is what a lot of people who complain about Slack don't get. It's just a better message queue for your business. The fact that you can funnel all your business events, regardless of whether they originate from humans or bots, into one place and then each worker (again, either human or bot) can subscribe/filter/react to relevant events is super powerful. However, if you try to use it as a corporate SMS platform or email replacement, you will very quickly feel overwhelmed because both of those message queues are designed for much lower throughput.

emmelaich · on May 21, 2020

And you can literally use the maildir format for a queue!

https://pypi.org/project/dirq/

Perl had the original implementation, and there are implementations in other languages.
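For anyone curious what "maildir as a queue" looks like, here is a minimal sketch of the idea. It is not the dirq API: the `DirQueue` class and its method names are made up for illustration, and it assumes all three subdirectories live on the same filesystem so that `os.rename` is atomic.

```python
import itertools
import os
import time
import uuid


class DirQueue:
    """Toy maildir-style queue: tmp/ for writes, new/ for ready items, cur/ for claimed ones."""

    def __init__(self, root):
        self.root = root
        self._seq = itertools.count()  # breaks ties when the clock is coarse
        for sub in ("tmp", "new", "cur"):
            os.makedirs(os.path.join(root, sub), exist_ok=True)

    def put(self, payload: bytes) -> str:
        # Sortable name: timestamp, then a per-process sequence, then a unique suffix.
        name = f"{time.time_ns():020d}.{next(self._seq):08d}.{uuid.uuid4().hex}"
        tmp_path = os.path.join(self.root, "tmp", name)
        with open(tmp_path, "wb") as f:
            f.write(payload)
            f.flush()
            os.fsync(f.fileno())
        # Write fully in tmp/, then rename into new/: readers never see a partial file.
        os.rename(tmp_path, os.path.join(self.root, "new", name))
        return name

    def get(self):
        """Claim the oldest ready message, or return None if nothing is ready."""
        for name in sorted(os.listdir(os.path.join(self.root, "new"))):
            src = os.path.join(self.root, "new", name)
            dst = os.path.join(self.root, "cur", name)
            try:
                os.rename(src, dst)  # only one consumer can win this rename
            except FileNotFoundError:
                continue  # another consumer claimed it first
            with open(dst, "rb") as f:
                return name, f.read()
        return None

    def ack(self, name):
        """Delete a claimed message once it has been processed."""
        os.remove(os.path.join(self.root, "cur", name))
```

The whole trick is the rename: producers and consumers never need a lock, because moving a file between directories either happens completely or not at all.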

MR4D · on May 22, 2020

I forgot about that - used to be a cool hack!

dkersten · on May 21, 2020

How is it for production deployment? I was considering it for something recently, but got overwhelmed by the documentation on setting up a fault-tolerant production deployment, so have been avoiding it. Was this an overreaction? What is your experience with that?

Also, do you happen to know how well it works in a fault-tolerant way for communicating between services that are in different data centers?

My main use-case is to receive status/change notifications from a service running elsewhere from the API server servicing the UI, in order to avoid polling for new data.

pbhowmic · on May 21, 2020

RabbitMQ, even a single-node RabbitMQ, has a hard time going down. Your server/container is likely to go down long before the RabbitMQ node does. That being said, if you want a clustered solution with nodes in different DCs, configure shoveling (https://www.rabbitmq.com/shovel.html) or, for a simpler solution, use a private VPN to interconnect the RabbitMQ nodes. I would go for the latter.

zonk_ · on May 21, 2020

We use it at a fairly big scale for our Slack bot system. It was just set up once, as akyu said, and since then it just works. Whenever we had troubles, it was always anything other than RabbitMQ.

I've also looked into other solutions (ActiveMQ, Google PubSub, ...) and RabbitMQ is by far the most straightforward and quick to set up. There are some edge cases that it doesn't cover as well, for example automatic retries, but there are some "RabbitMQ patterns" to make it work. For a simple message broker/queue system, it's great and the docs are also great.

neeleshs · on May 21, 2020

We use Google Pub-Sub and got the whole thing up and running very quickly with Spring integration. Message durability, automatic consumer load balancing, automatic retries, some easy broadcast patterns - all out of the box and literally a click of a button on the infra side.

Has worked out quite well so far.

dkersten · on May 21, 2020

Was it easy to set up in terms of reliability and failover?

Given what you and larrik are saying, I think I need to give it a trial run, but it's a project with a tiny team, so I want to be sure it won't be the cause of sleepless nights when things go wrong. It sounds like RabbitMQ is quite solid and shouldn't be a cause for concern, which is promising!

Is there anything I should keep in mind for running it in production? Any best practices or gotchas, based on your experience (e.g. don't run in Docker, or make sure there's lots of RAM, or things like that)? I guess it's all in the production checklist. I need to read through it all again!

ekimekim · on May 21, 2020

This came up in several other threads here: Don't use RabbitMQ's clustering. It's surprisingly brittle and hard to recover from.

The accepted wisdom that I've seen is to run a single broker with a completely independent hot spare. But of course switching over to your hot spare will violate most of the guarantees that Rabbit gives you around durability, ordering etc, so you have to be very careful how you use it.

I desperately want to like Rabbit (and have used it heavily in the past), but right now I wouldn't use it if I can get away with anything else; it just has no real HA story.

jackvanlightly · on May 21, 2020

Rabbit dev here. We released quorum queues a few months ago. It's a Raft-based replicated queue that addresses all the old problems. https://www.rabbitmq.com/blog/2020/04/20/rabbitmq-gets-an-ha...

sciurus · on May 21, 2020

Thanks for all your work on RabbitMQ, and for your great blog posts about it and other messaging systems.

For anyone who wants to understand the potential complexities of HA RabbitMQ, spend some time reading https://jack-vanlightly.com/blog/2018/8/31/rabbitmq-vs-kafka...

theptip · on May 21, 2020

In my experience RMQ is solid enough that for many use-cases it's reasonable to run it without a standby (especially if you're on Kubernetes where you'll get a replacement instance created automatically if your active instance fails).

A common use-case is for async tasks (Celery) that can tolerate a few minutes of downtime. If you're running a fully evented architecture then this might not apply - though if you're not targeting 4-5 nines of reliability or an RTO of < 5mins, then you might not need a standby even if RMQ is a core part of your architecture. "Avoid single points of failure" is a good heuristic, but "consider the SLAs of your dependencies" is the more granular way of thinking about this, and a single RMQ instance has a very high uptime.
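To put numbers on the "nines" above, here is a quick back-of-the-envelope calculation. It assumes a 365-day year and counts all downtime against the budget; the function name is made up for illustration.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600, ignoring leap years


def downtime_budget_minutes(nines: int) -> float:
    """Allowed downtime per year for an availability of N nines (e.g. 4 -> 99.99%)."""
    unavailability = 10 ** -nines
    return MINUTES_PER_YEAR * unavailability


for n in (3, 4, 5):
    print(f"{n} nines: {downtime_budget_minutes(n):.1f} min/year")
```

Four nines leaves roughly 52.6 minutes of downtime a year and five nines roughly 5.3, so a single five-minute recovery eats almost the entire five-nines budget. Below those targets, one restart of a single instance is usually affordable.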

For context I had an RMQ docker container running for almost two years without any interruptions. If you're in a small team then HA might well be overkill.

Fun gotcha - if you're running RMQ in Kubernetes/Docker, make sure you give it an explicit memory limit, else it will try to allocate disk space equal to 40% of your host's memory. (See "memory limits" in https://hub.docker.com/_/rabbitmq). That's a good best practice for any containerized environment regardless of what workload you're running, but this one will cause errors if you're trying to use a small disk volume on a host with lots of memory.

hrpnk · on May 22, 2020

Which Kubernetes operator for RabbitMQ are you using?

theptip · on May 22, 2020

I’m just using the Helm manifests, but when I set this up operators were not a thing. I’d probably look into the operator approach if I was starting from scratch now.

gog · on May 21, 2020

I've been running RabbitMQ for >8 years in production, once even in a fleet of 180 buses where every bus had an instance of rabbitmq running locally.

Never had a single issue in all those years.

But I must admit that running an HA cluster is something that I've never tried. It sounds complicated and scary once you start digging through the docs.

All my deployments have been to bare metal Ubuntu and Debian machines with durable Qs and messages.

If you need to use transactions, they are really slow, a couple of orders of magnitude slower compared to regular AMQP usage.

zonk_ · on May 21, 2020

> Was it easy to setup in terms of reliability and failover?

To be honest, the current setup was set up by colleagues who have even less experience than me and it's still running flawlessly. Iirc, it's just two instances that are behind a load balancer and the consumer just consumes from both, but I'm not super certain on that.

I've tested the cluster functionality to see how to set it up and it worked fine for me, but I have no experience with it in production. Other people in this thread don't seem to be too happy with it, so ymmv.

> Is there anything I should keep in mind for running it in production?

Nothing special that you wouldn't do otherwise when getting to know a new component/microservice. Just check out the get started section[1] on their page in the appropriate language and play around with a small setup. Get familiar with the libraries to connect and send/queue/fetch stuff and the topics. Spend your brainpower up front, before you set it up, so that it handles all eventualities and behaves exactly how you want it to (ACK/NACK, what happens if a sender/consumer dies, etc.), because then you set it up once and probably never touch it again.
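To make the ACK/NACK question concrete, here is a toy in-memory model of the bookkeeping. It is not a RabbitMQ client: `ToyQueue` and its methods are invented for illustration, and the only broker behaviour it mirrors is that unacknowledged messages go back on the queue when the consumer holding them goes away.

```python
from collections import deque


class ToyQueue:
    """In-memory stand-in for a broker queue with manual acknowledgements."""

    def __init__(self):
        self.ready = deque()   # messages waiting for a consumer
        self.unacked = {}      # delivery tag -> (consumer_id, message)
        self._next_tag = 1

    def publish(self, message):
        self.ready.append(message)

    def deliver(self, consumer_id):
        """Hand the next message to a consumer; it stays tracked until acked."""
        if not self.ready:
            return None
        tag = self._next_tag
        self._next_tag += 1
        message = self.ready.popleft()
        self.unacked[tag] = (consumer_id, message)
        return tag, message

    def ack(self, tag):
        """Consumer finished the work: forget the message."""
        del self.unacked[tag]

    def nack(self, tag, requeue=True):
        """Consumer rejected the message: requeue it at the front, or drop it."""
        _, message = self.unacked.pop(tag)
        if requeue:
            self.ready.appendleft(message)

    def consumer_died(self, consumer_id):
        """Requeue everything the dead consumer was holding, keeping original order."""
        held = [t for t, (c, _) in self.unacked.items() if c == consumer_id]
        for tag in sorted(held, reverse=True):
            self.nack(tag, requeue=True)
```

The decision to make up front is when to ack: acking before doing the work loses messages if the consumer crashes, acking after means a crash causes redelivery, so handlers should tolerate seeing the same message twice.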

One thing I'm not really sure on and haven't really answered for myself yet: there might be some "logic" to your RabbitMQ instance, depending on which metadata you add to each message (e.g. retries). If you have such logic, it might be better to have a service around the RabbitMQ instance, otherwise this logic ends up in the code base of your actual solution, and that might not be wanted and may be harder to maintain. But I'm not so sure myself on that one.

Oh yeah, and check out some patterns for your needs. There are multiple ways to implement retries, for example with a queue for queues, etc. But if your API is REST based, everything should be straightforward.
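As a rough illustration of one such retry pattern, here is a sketch where each message carries a failure counter as metadata and is parked in a dead-letter list once it exceeds a limit. This is a plain-Python model, not RabbitMQ configuration; `process_with_retries`, the `failures` field and the limit of 3 are all assumptions made for the example.

```python
from collections import deque

MAX_RETRIES = 3  # hypothetical limit; pick whatever suits the workload


def process_with_retries(bodies, handler, max_retries=MAX_RETRIES):
    """Run handler over each message, retrying failures up to max_retries times."""
    work = deque({"body": b, "failures": 0} for b in bodies)
    done, dead_letters = [], []
    while work:
        msg = work.popleft()
        try:
            done.append(handler(msg["body"]))
        except Exception:
            msg["failures"] += 1  # the retry metadata travels with the message
            if msg["failures"] > max_retries:
                dead_letters.append(msg)  # give up: park it for inspection
            else:
                work.append(msg)  # back of the queue, so others are not blocked
    return done, dead_letters
```

Whether this counting lives in the consumers or in a small service wrapped around the broker is exactly the design question raised above.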

[1]: https://www.rabbitmq.com/getstarted.html

dkersten · on May 21, 2020

Thanks for the detailed response (and to everyone else who responded too!), I appreciate it! I will prototype something and play around and see how it handles different situations when I get time. I've also just bought the book mentioned elsewhere, so hopefully I can get up to speed quickly. It does sound like my original impression about it being complex to run/maintain was perhaps overblown. That's good, because from a features point of view, RabbitMQ seemed like a good fit for the things I want an MQ solution for.

heipei · on May 21, 2020

My go-to solution for fault-tolerant message queues is nsq (https://nsq.io/). nsq works differently from most other message queues in that it's supposed to be run in a distributed fashion, i.e. one nsqd running wherever messages are produced. That way you have a lightweight and fast local message queue that you can push messages to and not worry about network connectivity. You can use nsqlookupd to find the distributed nsqd instances that hold the topic you want to subscribe to, or you can run an additional nsq-to-nsq process to push messages from one broker to the next. It's a really great, mature and stable piece of software. I'd say the only downside to using nsq is that you have to invest a little more in monitoring, and you have to make sure that network connectivity is possible between your consumer and each nsqd that carries a given topic.
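A minimal sketch of the "push to the local nsqd" part, using nsqd's HTTP `/pub` endpoint. It assumes an nsqd listening on its default HTTP port 4151 on localhost; the helper names and the example topic are made up, and there is no retry or error handling.

```python
import urllib.parse
import urllib.request


def pub_url(topic, host="127.0.0.1", port=4151):
    """Build the publish URL for an nsqd HTTP endpoint (defaults are assumptions)."""
    query = urllib.parse.urlencode({"topic": topic})
    return f"http://{host}:{port}/pub?{query}"


def publish(topic, body: bytes, timeout=2.0):
    """POST one message to the local nsqd; returns True on HTTP 200."""
    req = urllib.request.Request(pub_url(topic), data=body, method="POST")
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status == 200


# Example (needs a running nsqd): publish("status_changes", b'{"id": 1}')
```

Because the producer only ever talks to the nsqd on its own host, a network partition delays delivery to consumers but does not block publishing.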

dkersten · on May 21, 2020

Thanks for the recommendation! That looks pretty nice and “ops friendly” is definitely a plus. I will investigate this further.

ecoqba11 · on May 21, 2020

Been using RabbitMQ for a lot of projects in production. It can handle quite a lot of data and this thing never fails. Sometimes it can be running for an entire year and we force a restart just because.

rawoke083600 · on May 21, 2020

Yup, been my production experience as well! Super solid system!

ketralnis · on May 21, 2020

We use it for more or less everything at reddit. Almost every user action corresponds to a rabbit queue.

akoncius · on May 21, 2020

Sounds cool! How big are the queues in your setup? How big are the MQ instances (servers)? Do you use HA, replication/failover?

larrik · on May 21, 2020

We switched from Amazon's SQS to RabbitMQ, because SQS was killing our performance, and wasn't nearly as powerful overall.

RabbitMQ gave us such a performance increase that we killed our database. We ended up having to rate limit RabbitMQ!

jaquers · on May 21, 2020

> I was considering it for something recently, but got overwhelmed by the documentation on setting up a fault-tolerant production deployment, so have been avoiding it. Was this an overreaction?

In general the defaults are pretty good I think. There is a one page production deployment guide: https://www.rabbitmq.com/production-checklist.html that I followed to replace our handbuilt cluster w/ a new automated deployment, plus a few other niceties like docker logs & rmq metrics to cloudwatch and then auto clustering via autoscaling groups lookup.

The thoroughness of the docs can perhaps seem daunting, but I see it as a badge of quality, and especially if you are growing its usage organically it should "just work".

tankerdude · on May 21, 2020

If it's super simple like that and the throughput isn't massive, use something else you don't need to support, like AWS's SQS.

If you're bad at hosting and need the throughput, there's cloudamqp.

So many options for pub/sub systems so use what works for you.

nailer · on May 21, 2020

+1. Discovered RabbitMQ/AMQP around 2010. Since then tech went through a 2015-era wave of HTTP microservices that has come and largely gone, or moved to MQ.

carterklein13 · on May 21, 2020

When you say "gone or moved to MQ" - if not moved to messaging services like RabbitMQ/NATS/etc, where else could things have gone? At least from my experience, HTTP microservices are still very common, especially when using things like AWS Lambdas.

I feel like most continually-running backends will make use of RabbitMQ/NATS/ZeroMQ/etc, or more and more I see lightweight systems going completely serverless and just using lambdas - which are HTTP microservices.

nailer · on May 21, 2020

> When you say "gone or moved to MQ" - if not moved to messaging services like RabbitMQ/NATS/etc, where else could things have gone?

They could have stayed trying to do continually running microservices on HTTP.

> I feel like most continually-running backends will make use of RabbitMQ/NATS/ZeroMQ/etc

I do too.

> more and more I see lightweight systems going completely serverless and just using lambdas - which are HTTP microservices.

Likewise.

But long running HTTP microservices are lame, and everybody realises that now, despite it being a cool idea back in 2015.

carterklein13 · on May 21, 2020

To be fair, I started working post-2015, so I've actually never come face-to-face with a long running HTTP microservice backend... what would something like that even look like? I'm thinking of systems I've worked on that use a messaging queue, but that only rely on HTTP requests - is that what it would be? So like, I'd make a request to a microservice behind an endpoint, which in turn would make requests to 3 more microservices behind other endpoints? If so, I'm certainly glad that idea isn't cool anymore because that seems greatly inefficient :)

fluxsauce · on May 21, 2020

> moved to MQ

Are you referring to IBM MQ?

jonesetc · on May 21, 2020

probably just meant message queues in general.