
Fantastic news, congrats Salvatore! Cannot _wait_ to replace some hacky Kafka uses with tried-and-true Redis 4! :)


In what sense is Kafka (or your use of it) hacky? I have never used Kafka, but I have always thought of it as being more solidly engineered than Redis but also more complicated and perhaps tricky to deploy (based on blog posts I read).


In any context where it's used but the demand (by whatever measure you care to use: bandwidth, throughput, message durability, etc.) doesn't justify it, or where it isn't a good use case for Kafka, for starters. That happens all the time, because every data and infrastructure engineer in the Bay Area wants to put Kafka on his resume.


But that doesn't explain why Kafka requires any minimum scale to be worthwhile. Does it have usability issues?

A good tool should be able to be used at any scale.


Kafka has very poor tooling in my experience (a folder full of fairly buggy bash scripts...), and due to ZooKeeper requires a lot of operational care. For example, it's extremely easy to destroy a Kafka cluster by bringing a new, empty ZK server online with newer but incorrect data in its volume. ZK will happily trash the entire cluster thinking it has new instructions. So network isolation is key, which, while obvious, is another source of potential failure.
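For what it's worth, the "static cluster identifiers" mitigation mentioned below boils down to ZK's explicit server list: a host not enumerated in every node's config can't join the ensemble and overwrite state. A minimal sketch of such a config (hostnames are hypothetical, values are ZK's commonly documented defaults):

```ini
# zoo.cfg -- hypothetical 3-node ensemble with a static, explicit server list.
# A machine not listed as server.N here cannot join and push its own state.
tickTime=2000
dataDir=/var/lib/zookeeper
clientPort=2181
initLimit=10
syncLimit=5
server.1=zk1.internal:2888:3888
server.2=zk2.internal:2888:3888
server.3=zk3.internal:2888:3888
```

Each `server.N` line must match the `myid` file in that node's dataDir, which is exactly the kind of operational care being complained about.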

Kafka also runs on the JVM, which requires a lot of love to scale in my experience. I do not want my programmers messing around with GC options when writing to what should (to them) be exposed just like a regular file handle (except distributed across many systems). I strongly prefer to avoid Java applications at all costs - in my experience it takes years and years for Java-based infrastructure to become relatively stable and reliable (see ElasticSearch 5.0, or ask anyone who has been on call for a Tomcat-based application). This is almost certainly personal bias, but it's my bias regardless.
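To make the "messing around with GC options" point concrete: Kafka's startup scripts read JVM settings from environment variables, and tuning them is a routine ops task. An illustrative (not prescriptive) example, with heap sizes that are purely hypothetical for some mid-sized broker:

```sh
# Hypothetical broker tuning via the env vars Kafka's start scripts honor.
# Heap sized to the box; G1 flags in the spirit of Kafka's shipped defaults.
export KAFKA_HEAP_OPTS="-Xms4g -Xmx4g"
export KAFKA_JVM_PERFORMANCE_OPTS="-XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35"
```

None of this exists for Redis, which is the point: there is no GC to tune.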

Redis also has a _massive_ number of tooling / monitoring / ecosystem advantages, including hosted options, and can run on a single instance without configuration changes from the developer's perspective.

I also have personal reasons to prefer Salvatore's work over the work of Confluent.


As someone that has both Kafka and Redis in use without issue, for years, (and is about to replace a lot of misused Redis instances with Kafka) I really fail to follow your points.

So, a Zookeeper cluster can't survive accidentally injecting just the right malicious data that will make it keel over. I'm sorry, how do you accidentally achieve that? Do you also accidentally configure your Redis Sentinel to replicate from /dev/null?

As a matter of fact, this announcement comes at a very inopportune time for me. antirez had the epiphany of reading on IRC about replicated logs instead of looking at the opening paragraphs of the Kafka documentation, and all the Redis evangelists at my job will now try to shoehorn the wrong use case back into Redis because Redis!1cos(0)!. Sigh.
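The "replicated log" abstraction the whole thread is circling differs from a plain queue in one key way: reading a log does not consume it, so independent consumers can each replay the full history from their own offset. A minimal in-memory sketch of that distinction (hypothetical classes, not the actual Redis streams or Kafka API):

```python
# Illustrative log-vs-queue semantics -- not any real broker's API.
from collections import deque


class Queue:
    """Pop-style queue: a consumed message is gone for everyone."""

    def __init__(self):
        self._items = deque()

    def push(self, msg):
        self._items.append(msg)

    def pop(self):
        return self._items.popleft() if self._items else None


class Log:
    """Append-only log: entries persist; each consumer tracks its own offset."""

    def __init__(self):
        self._entries = []

    def append(self, msg):
        self._entries.append(msg)
        return len(self._entries) - 1  # offset of the new entry

    def read(self, offset):
        """Return (entries from offset onward, next offset); log is untouched."""
        return self._entries[offset:], len(self._entries)


log = Log()
for m in ("a", "b", "c"):
    log.append(m)

# Two independent consumers replay the same history from offset 0.
seen1, _ = log.read(0)
seen2, _ = log.read(0)
assert seen1 == seen2 == ["a", "b", "c"]

q = Queue()
for m in ("a", "b", "c"):
    q.push(m)
assert q.pop() == "a"  # destroyed: a second consumer can never see "a" again
```

Which of the two models your workload actually needs is the real argument here, independent of which product implements it.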


> For example, it's extremely easy to destroy a Kafka cluster by bringing a new, empty ZK server online with newer but incorrect data in its volume. ZK will happily trash the entire cluster thinking it has new instructions.

How does that happen? I mean, a new, empty ZK server with newer data than the rest of the cluster?

Also, please note that ZK is not meant to be a database but a coordination service; its guarantee is that all nodes are always in a consistent state, and none of its nodes allows any changes if there's no quorum. So if a new node somehow has more recent data with a higher serial number, it's expected that the remaining nodes will sync to it.


Exactly right - in my case the situation was another team accidentally bringing a new ZK node with "bad" but "new" data online. Had there been network isolation, no issues. Had there been static cluster identifiers, also no issues. It was a messy environment, and it should have been prevented by operational diligence, but my point is that redis is "harder to mess up". As an on-call engineer, I'll always go with simpler, foolproof tools. Another quibble is how gnarly the client-side driver for Kafka is...

I don't hate Kafka, I just don't like ZK and find redis has better tooling and a better track record at my shops :)


In order to connect a ZK host to the cluster, its IP needs to be included in the configuration of all the nodes.

It's hard to accidentally add a node to a cluster. A person who can "accidentally" add a ZK node has enough permissions to do a lot of more devastating things accidentally.


Yep. All it takes is service discovery and a jr. sysadmin not totally familiar with ZK.

This is all in service to my point about simplicity and safety.


> ...in my experience it takes years and years and years for Java based infrastructure to become relatively stable & reliable (see ElasticSearch 5.0, or ask anyone who has been oncall for a Tomcat based application).

This is almost the exact opposite of how I determine what tools to use. If it's written in C or Java, I'm usually pretty confident that it was engineered by a team of experienced developers - both because those languages are technically more difficult to use, and because they're not as cool.

In contrast, if a tool is written in JavaScript (Node) or Ruby, and oftentimes Python, I'm very hesitant.

In fact, this whole topic of streams has me wondering just how many developers out there are setting up clusters of Kafka or Redis or whatever is new and hip, when they could have saved themselves a huge amount of pain by using tried and true tools like JMS or ZeroMQ.

Most companies do NOT have a need for scaling like Netflix or LinkedIn, and I'm beginning to wonder if Kafka, Redis, etc are this year's version of NoSQL and MongoDB hype.


I should have clarified - web applications in Java, built by web developers, are not to be trusted. ES _usage_, not ES itself, is/was the nightmare before ES 5 (which is much, much more defensive against anti-patterns).

I love redis, so I certainly don't have issue with code written in C, heh. Just code written in C by junior developers :P


> A good tool should be able to be used at any scale.

I don't necessarily agree that a tool should be used at any scale even if it's technically possible to. Cost (multiple dimensions, including money and engineering effort) factors in.



