Hacker News | uberduper's comments

What a timely article and comment. I've been watching a lecture series over the last few days about quantum mechanics and the many worlds interpretation. And I have questions.

I may have missed it or didn't understand it when I heard it explained. What underpins the notion that when a particle transitions from a superposed to a defined state, the other basis states continue to exist? If they have to continue to exist, then okay, many worlds, but why do we think (or know?) they must continue to exist?


Because quantum mechanics describes the universe with a wave function, which evolves according to the Schrödinger equation.

In it, there is no notion of collapse. The only thing that makes sense is saying the observer becomes entangled with the measurement.

So if you only look at the Schrödinger equation, this is the only conclusion.

Wave function collapse is something which is simply added ad hoc to describe our observation, not something which is actually defined in QM.
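
A compact way to see the claim (a standard textbook sketch, nothing specific to this thread): the Schrödinger equation is linear and unitary, so a measurement interaction merely correlates the observer/apparatus with each term of the superposition, and nothing in the equation ever removes the other term. In LaTeX notation:

    i\hbar\,\partial_t |\Psi\rangle = \hat{H}|\Psi\rangle   % linear, unitary, no collapse term

    (\alpha|\uparrow\rangle + \beta|\downarrow\rangle)\otimes|\text{ready}\rangle
        \;\xrightarrow{\hat{U}_{\text{measure}}}\;
        \alpha|\uparrow\rangle|\text{saw}\uparrow\rangle + \beta|\downarrow\rangle|\text{saw}\downarrow\rangle

Both branches are still present on the right-hand side; "collapse" would be an extra postulate that discards one of them.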


That's an unsatisfying answer. I have some work to do if I want to understand it.


The double-slit experiment has been done with electrons, which are, afaik, much easier to detect and send single file. It's been done with molecules. It's not a thought experiment.

Quantum superposition is real. There's no doubt about that.


Not a physicist, just here to observe single photons weren't reliably emitted until the modern era, like the 1970s. The double-slit experiment predates this; it's from 1801. The one which confirms "self-interaction" was 1974. I was in high school 1973-78, so the stuff we did was comparatively "new" physics in that sense. Not a message I remember receiving at the time.

From the pop-sci history reading I do, "detecting" reliable generation of single-photon STREAMS in the early days depended on using mechanisms which would inherently release a sequence of photons on a time base, over time, and then gating the time accurately enough to have high confidence you know the time base and can discriminate an "individual" from the herd.

I don't doubt quantum theory. I only observe that, for young students (like almost all received wisdom), it's mostly grounded in experiments which don't actually do what people think they do. The ones you run in the school lab are illustrative, not probative.

What people do in places like the BIPM in Paris, or CERN, isn't the same as that experiment you did with a ticker-tape and a weighted trolley car down a ramp. "It's been confirmed" is the unfortunate reality of received wisdom, and inherently depends on trust in science. I do trust science.

Now that we have quantum dots, and processes which will depend on reliably emitting single photons and single electrons, the trust has moved beyond "because they did it at CERN" into "because it's implemented in the chipset attached to the system I am using". QC will need massive amounts of reliably generated single-instance signals.


> just here to observe single photons weren't reliably emitted until the modern era.

A dim light bulb from a few feet away emits on the order of 1k photons/sec, which is low enough that you can count individual emissions using fairly simple analog equipment [0] [1].

> The double-slit experiment predates this; it's from 1801. The one which confirms "self-interaction" was 1974.

There's an experiment from 1909 that demonstrated the double-slit experiment with single(ish) photons [2].

> I only observe that, for young students (like almost all received wisdom), it's mostly grounded in experiments which don't actually do what people think they do. The ones you run in the school lab are illustrative, not probative.

> What people do in places like the BIPM in Paris, or CERN, isn't the same as that experiment you did with a ticker-tape and a weighted trolley car down a ramp. "It's been confirmed" is the unfortunate reality of received wisdom, and inherently depends on trust in science. I do trust science.

The double-slit experiment is actually fairly easy and cheap to run [3]. Certainly more complicated than ticker tape, but not by much.

[0]: https://en.wikipedia.org/wiki/Scintillation_counter

[1]: https://en.wikipedia.org/wiki/Photomultiplier_tube

[2]: https://www.biodiversitylibrary.org/page/31034247

[3]: https://www.teachspin.com/two-slit


It's difficult to quantify the value of "I know the shit out of linux" to a prospective employer when they're looking for cog developer #471.

In my experience it's the network of people you've worked with that know how beneficial you are and want to work with you again (this is key) that will keep you in demand regardless of the market conditions.


Victim-blaming is not necessary in this hiring environment. In the last decade only small companies have been available to me, which means there are under five folks I can turn to directly for jobs, and none of them are hiring now.


I've made quite a career out of knowing how linux works and not reinventing the wheels it provides. I read man pages. I sometimes run `systemctl list-unit-files` and say, "hmm what is that??" then go find out what it is. I've been at this for decades and curiosity keeps pushing me to learn new things and keep up with recent developments.


But how did you get your first Linux job? That's where I'm stuck. Where I live, there are literally zero entry-level Linux roles, and the couple of Linux roles that are available require you to have centuries' worth of enterprise experience with Kubernetes, OpenShift, Ansible, Chef, Puppet, Terraform, etc...


I was a windows guy at a large auction site and started bringing linux into my workflows and solutions. I'd already been gaining personal experience with linux and the BSDs, solaris, etc. That was my last "windows job."

I'd say there's really no "linux roles" out there. Entry level or not. Everyone collectively decided "devops" was a big bright beautiful tomorrow and made devs learn release management and made ops people get lost (or become the developer they never wanted to be). Everyone shifted their focus towards "as code" solutions because the SRE book said nobody should log in to servers or something. So we hire people that know the abstractions instead and assume nobody really needs to go deeper than that.

It sucks, but learning the abstractions is how you're gonna have to get started. If you're already a linux nerd then you may benefit from understanding what the abstraction is doing under the hood.

If I was starting out right now, I'd go work through Kelsey Hightower's 'Kubernetes The Hard Way' and build functional kubernetes clusters from scratch on a couple of the cloud providers. Do not copy&paste anything from the blog. Every line, every command, by hand. Type out those CSRs and manifests! Recognize what each component you're setting up is and what it's used for. Like "what is the CCM and what's it responsible for?" Or "What's going on with every step of the kubelet bootstrapping process? What controllers are involved and what are they doing?" Read EVERYTHING under kubernetes.io/docs. Understand the relationships between all the primitives.

If you already have some linux, networking, and containers knowledge to build on top of, I think you could work through all of that in less than 4 weeks and have a better understanding of kubernetes than 80%+ of engineers at any level and crush a kubernetes focused interview.


Thanks, but my point still stands: there's no entry-level roles, whether it's "Linux" or a Linux-based "DevOps" role. I'm actually working in a windows-based, mostly-DevOps type role, but we use almost zero open-source tools and it's very Microsoft-centric.

The closest Linux-y roles that I might have a shot at getting into are "cloud engineer" type roles, with a heavy emphasis on AWS - and I hate AWS with a passion (just as much as I hate Azure).

Regardless, the biggest issue is getting that interview call - now in the age of AI, people are faking their CVs and companies are getting flooded with hundreds or thousands of junk applications, so getting that interview call - especially when you don't meet their professional experience requirements - is next to impossible. I could have all the Kubernetes certs in the world, but what's the point if I get filtered out right at the first stage?


Start introducing it where you are. I was an early advocate for the use of WSL2/Docker and, along with that, a push towards deploying to Linux, initially as a cost saving, as projects started shifting away from .NET Framework and into .NET Core and Node, which were actually easier to deploy to Linux... WSL/Docker became a natural fit as it was "closer to production" for the development workflow.

It's not always possible, but there are definitely selling points that can help you introduce these things. Hell, scripting out the onboarding chores from a clean windows install (powershell to bootstrap in windows, then bash, etc for the WSL environment) with only 3-4 manual steps... and you get a new dev onboarded in a couple hours with a fully working environment and software stack, including an initialized database... You can raise some eyebrows.

Do the same for automated deployments on your existing projects... shift the testing environments to Linux as a "test" or "experiment"... you can eat away at it from both directions.

Before you know it, developers can choose windows or mac instead of one or the other, and can use whatever editor they like. Maybe still stuck with C# or MS-SQL, maybe PostgreSQL for green projects.


I thought you were asking for advice. Sorry.


It’s been 17 years since I got my first Linux job in 2008. Where I live, that’s rare; 99% of the industry here is a 'Microsoft Shop,' and the biggest player in town is practically married to them.

I started out at a small Linux company working with Plone CMS. The pay wasn’t great, but it was the perfect place to learn Linux and Python. Since then, I’ve used Linux every single day, become a Java developer, and started a few businesses. Using Linux of course.

But lately, things are changing. Companies are realizing that when it comes to Data Engineering and Science, C# just can't compete with Python's ecosystem. Now that they need to pivot, they're looking for help, and there are very few people in this area with the experience to answer that call.


I was working in a Windows-centric environment and started using Proxmox as the hypervisor instead of Windows Server. This, combined with my self-hosting hobby (Proxmox mini PC cluster, network diagrams of VLANs, self-hosting my own blog website, having a handful of small tools in my git repos), was what sold my current company on hiring me, more than my resume of working in tech.


You can make almost any job into a Linux job. Use a linux VM on your desktop to solve a problem for the company. Things change once your employer knows it's essential.

I've also seen Linux make inroads in "windows only" enterprises when it became essential for performance reasons. A couple of times, towards the start of a project, windows APIs were discovered to be too slow to meet requirements:

In one case, the customer needed us to send a report packet every 40ms. But even when we passed "0" to the windows Sleep() function, it would sometimes stop our program for 100ms at a time. The sleep function on linux was highly accurate, so we shipped linux. Along the way, 5-6 devs switched to, or got a second PC to run, linux.
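
A rough sketch of the kind of quick check that exposes this (hedged: not the code we used, and it measures whatever sleep primitive the runtime maps to, but the 40ms period is the figure from the story):

    import time

    PERIOD = 0.040  # the 40ms report interval from the anecdote
    worst = 0.0
    for _ in range(1000):
        start = time.perf_counter()
        time.sleep(PERIOD)
        overshoot = (time.perf_counter() - start) - PERIOD
        worst = max(worst, overshoot)

    # With the default Windows timer resolution (~15.6ms) the worst case can be
    # large; on Linux it is typically well under a millisecond.
    print(f"worst-case sleep overshoot: {worst * 1000:.2f} ms")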

In another case, we needed to saturate a 10GbE link with a data stream. We evaluated windows with a simple program:

   while (1) send(sock, buffer, sizeof(buffer), 0);  /* blast the same buffer as fast as possible */
... but we found windows could only squeeze out 10% of the link capacity. Linux, on the other hand, could saturate the 10GbE link before we had even done any performance tuning. On linux, our production program met all requirements while using only 3% CPU. Windows simply couldn't touch this. More devs learned linux to support this product.

Those companies still don't require linux skills when hiring, because everyone there was once a windows guy who figured it out on the job. But when we see linux abilities on the resume it gives candidates a boost because we know they'll be up to speed faster.


Lie and learn.


That's the way.


In my experience

- Try to find a way to not go to the meeting. Anything you say, especially the most insignificant part, will be used against _someone_ in an argument that doesn't make any sense. You're going to feel the need to correct their misunderstanding and misuse of what you said. You might even try to re-focus the discussion back to the important thing you were _trying_ to say. It only goes downhill from there. You're better off interfacing with a group of C-level people through documents.

- 1:1 meetings can work. Make sure you can back up everything you say with data.

- You're a developer, you can't estimate time and effort for shit. If asked, say you'll get with your manager or the PjM or w/e to get a date.

- Find out from the person that asked you to join what you should be prepared to speak to.

- If there's an agenda or documents that will be discussed, read them before the meeting. Doesn't matter if they plan to read it during the meeting.

- No hemming and hawing. If you don't understand what you're being asked, ask for clarification. If you don't have an answer you're confident in, say so. If they insist, prefix your crisp and concise answer with your level of confidence. "In my experience.." "From what I've read.." etc.


I'd like to add that what you are describing sounds like a pretty hostile environment. Luckily, I've had the opposite experience. However, it also requires that you, the developer, are open and curious about the motives that the C-level has. Sure, you can sing the refrain that you cannot estimate times - and I agree! But it helps to ask what they actually need, and why they need it. Maybe you can come up with better ways to help them get clarity. Remember - from their standpoint, it's all a big black box they cannot understand.

From my experience, execs want to know the current state, and also want to be able to intervene before a project derails. That's usually accomplished by open and coherent communication - a skill that is yet to be found by some developers. But you can work on it! ...if you want.


If I'm smart, I certainly don't feel like it.

I can tell you I do not enjoy thinking. I hate it. It is a compulsion that I cannot avoid. I know that it makes most interactions in my life more difficult. I know it's a source of unhappiness. I cannot stop thinking.

I want to do. Not think. I fail to do. I think about failure.


Two things. First, not all smart people are overthinkers and not all overthinkers are smart.

Second, I find that a great way to change one's self-damaging behavior is, rather than the therapy that is often recommended, to spend as much time as possible, relatively speaking, in the company of people who behave the way we would like to.

For the person who wants to exercise but, because of some psychological hang-up, can't, the company of people who exercise tends to be much more effective than finding out the root causes of the behavior. The same goes for thinking too much, eating too much, or not being able to talk to other people.


You should look into meditation.

Let me explain.

Meditation teaches that your thoughts are uncontrolled expressions of your subconscious, as are your worries, your fears, your anxieties.

To meditate is not to stop thinking thoughts, but to observe them as they spontaneously appear, and - just as quickly - disappear. To recognize that you are not the thinker of your thoughts. To view them from a place of detachment and curious observation, instead of a place of investment and worry.


May I recommend an alternative to Eastern Meditation practices?

The alternative is Autogenic Training (AT), a method invented by Dr. Schultz a century ago. It is a well-tested scientific approach, and the outcomes are generally very positive, if not life-changing.

AT does not involve interpreting obscure texts written thousands of years ago in other languages and referring to ways of life that have long been forgotten.

AT does not require silent retreats or attending workshops and seminars at the end of which you are more confused than before. It is simple and just requires following the steps outlined by Schultz and his students.

I am surprised that it is not popular at all, but its strengths are also its weaknesses. Most people long for the esoteric and unexplained, while AT is clear, easy to understand and practice.


It would be more convincing if you explained what it actually is. Rather than what it is not.


There are books and Google and Wikipedia.

Just as people refer to meditation without explaining the whole process involved in one of its traditions, because there is a wealth of information available, I would much prefer to answer specific questions on the practice instead of copying and pasting from Wikipedia, which I am doing now.

"The technique involves repetitions of a set of visualisations accompanied by vocal suggestions that induce a state of relaxation and is based on passive concentration of bodily perceptions like heaviness and warmth of limbs, which are facilitated by self-suggestions.Autogenic training is used to alleviate many stress-induced psychosomatic disorders"

The formulas are six: heaviness, warmth, heart beating regularly and strongly, calm breath, warm solar plexus, and cool forehead.

There's no vocal suggestion (the Wikipedia article is wrong in that regard); the formulas are repeated silently. It's a much more effective practice than the hocus-pocus that Eastern-tradition meditation often is, especially the bastardized variety adopted in the West, and there are plenty of books and papers available on the results of scientific studies that measure the effect of AT on soma and psyche.


Sometimes I start thinking our brains work the same way as an LLM does when it comes to language processing. Are we just using probability based on what we already know and the context of the statement we're making to select the next few words? Maybe we apply a few more rules than an LLM on what comes next as we go.

We train ourselves on content. We give more weight to some content than others. While listening to someone speak, we can often predict their next words.

What is thinking without language? Without language are we just bags of meat reacting to instincts and emotions? Are instincts and emotions what's missing for AGI?


Has this person actually benchmarked kafka? The results they get with their 96 vcpu setup could be achieved with kafka on the 4 vcpu setup. Their results with PG are absurdly slow.

If you don't need what kafka offers, don't use it. But don't pretend you're on to something with your custom 5k msg/s PG setup.


Exactly. Just yesterday someone posted how they can do 250k messages/second with Redpanda (Kafka-compatible implementation) on their laptop.

https://www.youtube.com/watch?v=7CdM1WcuoLc

Getting even less than that throughput on 3x c7i.24xlarge — a total of 288 vCPUs – is bafflingly wasteful.

Just because you can do something with Postgres doesn't mean you should.

> 1. One camp chases buzzwords.

> 2. The other camp chases common sense

In this case, is "Postgres" just being used as a buzzword?

[Disclosure: I work for Redpanda; we provide a Kafka-compatible service.]


Is it about what Kafka could get, or what you need right now?

Kafka is a full on steaming solution.

Postgres isn’t a buzzword. It can be a capable placeholder until it’s outgrown. One can arrive at Kafka with a more informed run history from Postgres.


> Kafka is a full on steaming solution.

Freudian slip? ;)


Haha, and a typo!


This sounded interesting to me, and it looks like the plan is to make Redpanda open-source at some point in the future, but there's no timeline: https://github.com/redpanda-data/redpanda/tree/dev/licenses


Correct. Redpanda is source-available.

When you have C++ code, the number of external folks who want to — and who can effectively, actively contribute to the code — drops considerably. Our "cousins in code," ScyllaDB last year announced they were moving to source-available because of the lack of OSS contributors:

> Moreover, we have been the single significant contributor of the source code. Our ecosystem tools have received a healthy amount of contributions, but not the core database. That makes sense. The ScyllaDB internal implementation is a C++, shard-per-core, future-promise code base that is extremely hard to understand and requires full-time devotion. Thus source-wise, in terms of the code, we operated as a full open-source-first project. However, in reality, we benefitted from this no more than as a source-available project.

Source: https://www.scylladb.com/2024/12/18/why-were-moving-to-a-sou...

People still want to get free utility from the source-available code. Less commonly, they want to be able to see the code to understand it and potentially troubleshoot it. Yet asking for active contribution is, for almost all, a bridge too far.


Note that prior to its license change ScyllaDB was using AGPL. This is a fully FLOSS license but may have been viewed nonetheless as somewhat unfriendly by potential outside contributors. The ScyllaDB license change was really more about not wanting to expend development effort on maintaining multiple versions of the code (AGPL licensed and fully proprietary), so they went for sort of a split-the-difference approach where the fully proprietary version was in turn made source-available.

(Notably, they're not arguing that open source reusers have been "unfair" to them and freeloaded on their effort, which was the key justification many others gave for relicensing their code under non-FLOSS terms.)

In case anyone here is looking for a fully-FLOSS contender that they may want to perhaps contribute to, there's the interesting project YugabyteDB https://github.com/yugabyte/yugabyte-db


I think AGPL/Proprietary license split and eventual move to proprietary is just a slightly less overt way of the same "freeloader" argument. The intention of the original license was to make the software unpalatable to enterprises unless you buy the proprietary license, and one "benefit" of the move (at least for the bean counters) is that it stops even AGPL-friendly enterprises from being able to use the software freely.

(Personally, I have no issues with the AGPL and Stallman originally suggested this model to Qt IIRC, so I don't really mind the original split, but that is the modern intent of the strategy.)


I think the intention of the original license was to make the software unpalatable to SaaS vendors who want to keep their changes proprietary, not unpalatable to enterprises in general.


Rightly or wrongly, large companies are very averse to using AGPL software even if it would cause them very little additional burden to comply with the AGPL. Lots of projects use this cynically to help sell proprietary licenses (the proof of this is self-evident -- many such projects have CLAs and were happy to switch to a proprietary license that is even less favourable to enterprises than the AGPL as soon as it was available).

Again, I'm happy to use AGPL software, I just disagree that the intent here is that different to any of the other projects that switched to the proprietary BSL.


I haven't actually talked with Henry Poole about the subject, but I'm pretty sure that was not his intent when he wrote it.


You are obviously free to choose to use a proprietary license, that's fine -- but the primary purpose of free licenses has very little to do with contributing code back upstream.

As a maintainer of several free software projects, there are lots of issues with how projects are structured and user expectations, but I struggle to see how proprietary licenses help with that issue (I can see -- though don't entirely buy -- the argument that they help with certain business models, but that's a completely different topic). To be honest, I have no interest in actively seeking out proprietary software, but I'm certainly in the minority on that one.


Right, open source is generally of benefit to users, not to the author, and users do get some of that benefit from being able to see the source. I wouldn't want to look at it myself, though, for legal reasons.


You can be open source and not take contributions. This argument doesn't make sense to me. Just stop doing the expensive part and keep the license as is.


I think the argument is that, if they expected to receive high-quality contributions, then they'd be willing to take the risk of competitors using their software to compete with them, which an open-source license would allow. It usually doesn't work out that way; with a strong copyleft license, your competitors are just doing free R&D improving your own product, unless they can convince your customers that they know more about the product than the guys who wrote it in the first place. But that's usually the fear.

On the other hand, if they don't expect people outside their company to know C++ well enough to contribute usefully, they probably shouldn't expect people outside their company to be able to compete with them either.

Really, though, the reason to go open-source is because it benefits your customers, not because you get contributions, although you might. (This logic is unconvincing if you fear they'll stop being your customers, of course.)


The statement is untrue. For example, ClickHouse is in C++, and it has thousands of contributors with hundreds of external contributors every month.


I think it's reasonably common for accepting external contributions to an open-source project to be more trouble than it's worth, just because most programmers aren't very good.


I often use a different approach - assume by default that external contributors are smarter than our employees. This is needed to prevent arrogance and entitlement during code reviews. A reasonable pull request from an external contributor is more valuable than one from an employee.


Your name sounds familiar. I think you may be one of the people at RedPanda with whom I’ve corresponded. It’s been a few years though, so maybe not.

A colleague and I (mostly him, but on my advice) worked up a set of patches to accept and emit JSON and YAML in the CLI tool. Our use case at the time was setting things up with a config management system using the already built tool RedPanda provides without dealing with unstructured text.

We got a lot of good use out of RedPanda at that org. We’ve both moved on to a new employer, though, and the “no offering RedPanda as a service” spooked the company away from trying it without paying for the commercial package. Y’all assured a couple of us that our use case didn’t count as that, but upper management and legal opted to go with Kafka just in case.


Doesn’t Kafka/Redpanda have to fsync for every message?


Yes, for Redpanda. There's a blog about that:

"The use of fsync is essential for ensuring data consistency and durability in a replicated system. The post highlights the common misconception that replication alone can eliminate the need for fsync and demonstrates that the loss of unsynchronized data on a single node still can cause global data loss in a replicated non-Byzantine system."

However, with all that said, Redpanda is still blazingly fast.

https://www.redpanda.com/blog/why-fsync-is-needed-for-data-s...


I'm highly skeptical of the method employed to simulate unsync'd writes in that example. Using a non-clustered zookeeper and then just shutting it down, breaking the kafka controller and preventing any kafka cluster state management (not just preventing partition leader election) while manually corrupting the log file. Oof. Is it really _that_ hard to lose ack'd data from a kafka cluster that you had to go to such contrived and dubious lengths?


> while manually corrupting the log file

To be fair, since without fsync you don't have any ordering guarantees for your writes, a crash has a good chance of corrupting your data, not just losing recent writes.

That's why in PostgreSQL it's feasible to disable https://www.postgresql.org/docs/18/runtime-config-wal.html#G... but not to disable https://www.postgresql.org/docs/18/runtime-config-wal.html#G....


I just read the post and didn’t find it contrived at all. The point is to simulate a) network isolation and b) loss of recent writes.


Kafka no longer has Zookeeper dependency and RedPanda never did (this is just an aside for those reading along, not a rebuttal).



I've never looked at redpanda, but kafka absolutely does not. Kafka uses mmapped files and the page cache to manage durable writes. You can configure it to fsync if you like.


If I don’t actually want durable and consistent data, I could also turn off fsync in Postgres …


The tradeoff here is that Kafka will still work perfectly if one of its instances goes down. (Or you take it down, for upgrades, etc.)

Can you lose one Postgres instance?


AIUI Postgres has high-availability out of the box, so it's not a big deal to "lose" one as long as a secondary can take over.


Only replication is built-in, you need to add a cluster manager like Patroni to make it highly-available.


Definitely not in the case of Kafka. Even with an SSD, that would limit it to around 100kHz. Batch commit allows Kafka (and Postgres) to amortize fsync overhead over many messages.
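
Back-of-the-envelope, reading that ~100kHz as roughly 10µs per fsync:

    \text{max msg/s} \approx \frac{B}{t_{\text{fsync}}}, \qquad B = \text{messages per fsync'd batch}

    B = 1,\; t_{\text{fsync}} \approx 10\,\mu\text{s} \Rightarrow \sim 10^{5}\ \text{msg/s}; \qquad
    B = 100 \Rightarrow \sim 10^{7}\ \text{msg/s (before other limits dominate)}

That's an upper bound, since larger batches also take a bit longer to flush, but it shows why batch commit moves the ceiling so far.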


On enterprise grade storage writes go to NVRAM buffers before being flushed to persistent storage so this isn't much of a bottleneck.


The context was somebody doing this on their laptop.


I was expanding the context


No, it's for every batch.


To the issue of complexity, is Redpanda suitable as a "single node implementation" where a Kafka cluster is not needed due to data volume, but the Kafka message bus pattern is desired?

AKA "Medium Data" ?


Yes. I’ve run projects where it was used that way.

It also scales to very large clusters.


Can you give some examples? I'm super curious about single-node Kafka use cases in general


I may be reading a bit extra, but my main take on this is: "in your app, you probably already have PostgreSQL. You don't need to set up an extra piece of infrastructure to cover your extra use case, just reuse the tool you already have"

It's very common to start adding more and more infra for use cases that, while they could technically be better covered with new stuff, can be served by already existing infrastructure, at least until you have proof that you need to grow it.


> If you don't need what kafka offers, don't use it.

This is literally the point the author is making.


It seems like their point was to criticize people for using new tech instead of hacking together unscalable solutions with their preferred database.


That wasn't their point. Instead of posting snarky comments, please review the site guidelines:

"Please respond to the strongest plausible interpretation of what someone says, not a weaker one that's easier to criticize."


But honestly, isn't that the strongest plausible interpretation according to the "site guidelines"? When one explicitly says that one camp chases "buzzwords" and the other chases "common sense", how else are you supposed to interpret it?


> how else are you supposed to interpret it?

It's not so hard. You interpret it how it is written. Yes, they say one camp chases buzzwords and another chases common sense. Critique that if you want to. That's fine.

But what's not written in the OP is some sort of claim that Postgres performs better than Kafka. The opposite is written. The OP acknowledges that Kafka is fast. Right there in the title! What's written is OP's experiments and data that shows Postgres is slow but can be practical for people who don't need Kafka. Honestly I don't see anything bewildering about it. But if you think they're wrong about Postgres being slow but practical that's something nice to talk about. What's not nice is to post snarky comments insinuating that the OP is asking you to design unscalable solutions.


Which is crazy, because Kafka is like olllld compared to competing tech like Pulsar and RedPanda. I'm trying to remember what year I started using v0.8, it was probably mid-late 2010s?


But in this case, it is like saying "You don't need a fuel truck. You can transport 9,000 gallons of gasoline between cities by gathering 9,000 1-gallon milk jugs and filling each, then getting 4,500 volunteers to each carry 2 gallons and walk the entire distance on foot."

In this case, you do just need a single fuel truck. That's what it was built for. Avoiding using a design-for-purpose tool to achieve the same result actually is wasteful. You don't need 288 cores to achieve 243,000 messages/second. You can do that kind of throughput with a Kafka-compatible service on a laptop.

[Disclosure: I work for Redpanda]


I'll push the metaphor a bit: I think the point is that if you have a fleet of vehicles you want to fuel, go ahead and get a fuel truck and bite off on that expense. However, if you only have 1 or 2, a couple of jerry cans you probably already have + a pickup truck is probably sufficient.


Getting a 288-core machine might be easier than setting up Kafka; I'm guessing that it would be a couple of weeks of work to learn enough to install Kafka the first time. Installing Postgres is trivial.


"Lots of the team knows Postgres really well, nobody knows Kafka at all yet" is also an underrated factor in making choices. "Kafka was the ideal technical choice but we screwed up the implementation through well-intentioned inexperience" being an all too plausible outcome.


Indeed, I've seen this happen first hand where there was really only one guy who really "knew" Kafka, and it was too big of a job for just him. In that case it was fine until he left the company, and then it became a massive albatross and a major pain point. In another case, the eng team didn't really have anyone who really "knew" Kafka but used a managed service thinking it would be fine. It was until it wasn't, and switching away is not a light lift, nor is mass educating the dev team.

Kafka et al definitely have their place, but I think most people would be much better off reaching for a simpler queue system (or for some things, just using Postgres) unless you really need the advanced features.


I'm wondering why there wasn't any push for the Kafka guy to share his knowledge within his team, or to other teams?


Multiple factors (neither a good excuse, just reality):

* Lack of interest for other team members, which translated to doing what they thought was a sufficiently minimal amount of knowledge transfer

* An (unwise) attitude that "it's already set up and configured, and terraformed, so we can just acquire that knowledge if and when it's needed"

* Kafka guy left a lot faster than anybody really expected, not leaving much time and practically no documentation

* The rest of the team was already overwhelmed with other responsibilities and didn't have much bandwidth available

* Nobody wanted to be the person/people that ended up "owning" it, so there was a reverse incentive


Interesting, thanks!


This is the crux of my point.

Postgres is the solution in question in the article because I simply assume the majority of companies will start with Postgres as their first piece of infra. And it is often the case. If not - MySQL, SQLite, whatever. Just optimize for the thing you know, and see if it can handle your use case (often you'll be surprised).


The only thing that might take "weeks" is procrastination. Presuming absolutely no background other than general data engineering, a decent beginner online course in Kafka (or Redpanda) will run about 1-2 hours.

You should be able to install within minutes.


I mean, setting up Zookeeper, tweaking the kernel settings, configuring the hardware, the kind of stuff mentioned in guides like https://medium.com/@ankurrana/things-nobody-will-tell-you-se... and https://dungeonengineering.com/the-kafkaesque-nightmare-of-m.... Apparently you can do without Zookeeper now, but that's another choice to make, possibly doing careful experiments with both choices to see what's better. Much more discussion in https://news.ycombinator.com/item?id=37036291.

None of this applies to Redpanda.


True. Redpanda does not use Zookeeper.

Yet to also be fair to the Kafka folks, Zookeeper is no longer default and hasn't been since April 2025 with the release of Apache Kafka 4.0:

"Kafka 4.0's completed transition to KRaft eliminates ZooKeeper (KIP-500), making clusters easier to operate at any scale."

Source: https://developer.confluent.io/newsletter/introducing-apache...


Right, I was talking about installing Kafka, not installing Redpanda. Redpanda may be perfectly fine software, but bringing it up in that context is a bit apples-and-oranges since it's not open-source: https://news.ycombinator.com/item?id=45748426


Good on you for being fair in this discussion :)


Just use Strimzi if you're in a K8s world (disclosure: I used to work on Strimzi for RH, but I still think it's far better than Helm charts or fully self-managed, and far cheaper than fully managed).


Thanks! I didn't know about Strimzi!


Even though I'm a few years on from Red Hat, I still really recommend Strimzi. I think the best way to describe it is "a sorta managed Kafka". It'll make things that are hard in self-managed Kafka (like rolling upgrades) easy as.


>> If you don't need what kafka offers, don't use it.

> This is literally the point the author is making.

Exactly! I just don't understand why HN invariably tends to bubble up the most dismissive comments to the top that don't even engage with the actual subject matter of the article!


In fact, a properly-configured Kafka cluster on minimal hardware will saturate its network link before it hits CPU or disk bottlenecks.


Isn't that true for everything on the cloud? I thought we were long into the era where your disk comes over the network there.


Depends on how you configure the clients. Ask me how I know that using a K8s pod id in a consumer group id is a really bad idea - or how setting batch size to 1 and linger to 0 is a really bad idea. The former blows up disk (all those unique consumer groups cause the backing topic to consume a lot of space, as the topic is by default only compacted) and the latter thrashes request handler CPU time.
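
To make those two knobs concrete, a hedged sketch using kafka-python (the addresses, topic, and values are illustrative, not taken from this thread):

    from kafka import KafkaConsumer, KafkaProducer

    # Let the producer actually batch: up to 64 KiB or 10 ms of lingering per
    # batch, instead of one request per record (batch_size=1, linger_ms=0).
    producer = KafkaProducer(
        bootstrap_servers="kafka:9092",  # illustrative address
        batch_size=64 * 1024,
        linger_ms=10,
    )

    # Use a stable, deployment-level consumer group name shared by all
    # replicas, not one derived from the pod id, so offsets don't pile up in
    # thousands of one-off groups on the compacted __consumer_offsets topic.
    consumer = KafkaConsumer(
        "events",                        # illustrative topic
        bootstrap_servers="kafka:9092",
        group_id="billing-service",      # stable name, not f"billing-{pod_id}"
    )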


But it can do so many processes a second I’ll be able to scale to the moon before I ever launch.


This doesn't even make sense. How do you know what the network links or the other bottlenecks are like? There are a grandiose number of assumptions being made here.


There is a finite and relatively narrow range of ratios of CPU, memory, and network throughput in both modern cloud offerings and bare hardware configurations.

Obviously it's possible to build, for example, a machine with 2 cores, a 10Gbps network link, and a single HDD that would falsify my statement.


But the workload matters. Even the comment in the article doesn't completely make sense for me in that way -- if your workload is 50 operations per byte transferred versus 5000 operations per byte transferred, there is a considerable difference in hardware requirements.


Exactly. "a properly-configured Kafka cluster" implies you have very properly configured your clients too, which is almost never the case because it's practically very hard to do in the messy reality of a large-scale organization.

Even if you somehow get everyone to follow best-practices, you most likely still won't get to saturate the network on "minimal hardware". The number of client connections and requests per second will likely saturate your "minimal CPU".

It's true that minimal hardware on Kafka can saturate the network, but this mostly happens in low-digit client scenarios. In practice, orgs pushing serious data have serious client counts.


A network link can be anything from 1Gbps to 800Gbps.


The 96 vcpu setup with a 24xlarge instance costs about $20k/month on AWS before discounts. And one thing you don’t want in a pub/sub system is a single instance taking all the reads/writes. You can run a sizeable Kafka cluster for that kind of money in AWS.


This is why benchmarks should be hardware-limit based, IMO. Like "I am maxing the IOPS/throughput of this SSD" or "I am maxing out the network card", etc.

CPU is more tricky, but I'm sure it can be shown somehow.


I remember doing 900k writes/s (non-replicated) already back on kafka 0.8 with a random physical server with an old fusionio drive (says something about how long ago this was :D).

It's a fair point that if you already have a pgsql setup, and only need a few messages here and there, then pg is fine. But yeah, the 96 vcpu setup is absurd.


> Has this person actually benchmarked kafka?

Is anyone actually reading the full article, or just reacting to the first unimpressive numbers you can find and then jumping on the first dismissive comment you can find here?

Benchmarking Kafka isn't the point here. The author isn't claiming that Postgres outperforms Kafka. The argument is that Postgres can handle modest messaging workloads well enough for teams that don't want the operational complexity of running Kafka.

Yes, the throughput is astoundingly low for such a powerful CPU but that's precisely the point. Now you know how well or how bad Postgres performs on a beefy machine. You don't always need Kafka-level scale. The takeaway is that Postgres can be a practical choice if you already have it in place.

So rather than dismissing it over the first unimpressive number you find, maybe respond to that actual matter of TFA. Where's the line where Postgres stops being "good enough"? That'll be something nice to talk about.


Then the author should have gone on to discuss not just the implementation they now have to maintain, but also all the client implementations they'll have to keep re-creating for their custom solution. Or they could talk about all the industry standard tools that work with kafka and not their custom implementation.

Or they could have not mentioned kafka at all and just demonstrated their pub/sub implementation with PG. They could have not tried to make it about the buzzword, resume-driven-engineering people vs. common-sense folks such as themselves.


These are real trade-offs!

Client impl. can be re-created once in a library/extension and be done with it.

The network effect of Kafka is the hardest to topple. The API is standard. If you need that connectivity, then use Kafka.

An alternative idea I did not mention is to use a Kafka API proxy on top of Postgres. Tansu is developing this (amongst other pluggable backends) - https://github.com/tansu-io/tansu. That solution would retain both the client implementations and tools (network effect). But it comes with extra complexity.


The problem is benchmarking on the 96 vcpu server, because at that point the author seems to miss the point of Kafka. That's just a waste of money for that performance.


And if the OP hadn't done that, someone here would complain, why couldn't the OP use a larger CPU and test if Postgres performs better? Really, there is no way the OP can win here, can they?

I'm glad the OP benchmarked on the 96 vCPU server. So now I know how well Postgres performs on a large CPU. Not very well. But if the OP had done their benchmark on a low CPU, I wouldn't have learned this.


You're missing the point. Postgres performs well on a large CPU. Postgres as used by OP does not, and is a waste of money. It's great that he benchmarked on a larger CPU; that's not what people are disputing. They are disputing the ridiculous conclusion.


I wonder if OP could have got different results if they implemented a different schema as opposed to mimicking Kafka's setup with the partitions, consumer offsets, etc.

I might well be talking out of my arse but if you're going to implement pub/sub in Postgres, it'd be worth designing around its strengths and going back to basics on event sourcing.
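
For example, a minimal sketch of that direction (table and column names are hypothetical, not the article's schema): an append-only events table, FOR UPDATE SKIP LOCKED so competing consumers grab disjoint batches, and NOTIFY so idle consumers don't have to busy-poll.

    import psycopg2
    from psycopg2.extras import Json

    DDL = """
    CREATE TABLE IF NOT EXISTS events (
        id         bigserial PRIMARY KEY,
        topic      text        NOT NULL,
        payload    jsonb       NOT NULL,
        created_at timestamptz NOT NULL DEFAULT now(),
        consumed   boolean     NOT NULL DEFAULT false
    );
    CREATE INDEX IF NOT EXISTS events_pending_idx
        ON events (topic, id) WHERE NOT consumed;
    """

    def publish(conn, topic, payload):
        with conn.cursor() as cur:
            cur.execute("INSERT INTO events (topic, payload) VALUES (%s, %s)",
                        (topic, Json(payload)))
            cur.execute("NOTIFY new_event")  # wake any idle consumers
        conn.commit()

    def consume_batch(conn, topic, limit=100):
        # SKIP LOCKED lets many consumers pull disjoint batches without
        # blocking one another; claimed rows stay locked until the commit.
        with conn.cursor() as cur:
            cur.execute("""
                UPDATE events SET consumed = true
                WHERE id IN (
                    SELECT id FROM events
                    WHERE topic = %s AND NOT consumed
                    ORDER BY id
                    LIMIT %s
                    FOR UPDATE SKIP LOCKED
                )
                RETURNING id, payload
            """, (topic, limit))
            rows = cur.fetchall()
        conn.commit()
        return rows

Whether that beats mimicking Kafka's partition/offset bookkeeping is exactly the kind of thing worth benchmarking.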


Had the same thoughts, weird it didn't include Kafka numbers.

Never used Kafka myself, but we extensively use Redis queues with some scripts to ensure persistence, and we hit throughputs much higher than those on equivalent prod machines.

Same for Redis pubsubs, but those are just standard non-persistent pubsubs, so maybe that gives them an edge.


Just checked my single node Kafka setup which currently handles 695.27k e/s (average daily) into elasticsearch without breaking a sweat. kafka has been the only stable thing in this whole setup.

zeek -> kafka -> logstash -> elastic


how is node failure handled? is this using KRaft or ZK?


out of curiosity, what does your service do that it handles almost 700K events/sec?


Niri convinced me Scrolling is The Way.

I really want windows to be able to span columns. So if I have 1 column with two windows and focus on the bottom window then create a new window/column to the right, I want that new window to be on the bottom half of column 2. I want the window from the top of column 1 to stretch across columns 1 and 2. If I again create another window/column to the right, that top left window should stretch across columns 1-3. So I should have one very wide window across the top of the screen and 3 windows across the bottom.

I've started playing with this idea in the hyprland hyprscrolling plugin but I'm kind of an idiot and don't have much free time these days.


As far as I understand it, it's not just that they need for no American citizen to take the job. They need for no American citizen to apply for it.

