
So there's some survivor bias here but it's generally not bad advice. You should be focusing on outcomes like improving SLAs, top line metrics and so on. You should be solving user and business problems. That's all good advice. But still this article presumes a lot.

In my experience, managers will naturally partition their reports into three buckets: their stars, their problems and their worker bees. The worker bees tend to be ignored. They're doing fine. They're getting on with whatever they've been told to do or possibly what they've found to do. They're not going to create any problems. The problems are the underperformers. These are people who create problems and/or are at risk of getting a subpar performance rating.

Now there are lots of reasons that someone can be a problem. I tend to believe that any "problem" just hasn't found the right fit yet and, until proven otherwise, problems are a failure in management. That tends to be a minority view in practice. It's more common to simply throw people in the deep end and let them sink or swim because that takes much less overhead. You will see this in teams that have a lot of churn, but only in part of the team. In particularly toxic environments, savvy managers will game the system by having a sacrificial anode position: they hire someone to take the bad rating they have to give in order to protect the rest of the team.

And then there are the stars. These are the people you expect to grow and be promoted. More often than not, however, they are chosen rather than having demonstrated their potential. I've seen someone shine while their director was actively trying to sabotage them, but that's rare.

Your stars will get the better projects. Your problems will get the worse ones. Whether a given project is a success or not will largely come down to perception, not reality.

The point I'm getting to is that despite all the process put around this at large companies (performance ratings, feedback, calibration, promo committees, etc.), the majority of it is vibes based.

So back to the "take my job" advice. If someone is viewed as a star, that's great advice. For anyone else, you might get negative feedback about not doing your actual job, not being a team player and so on. I've seen it happen a million times.

And here's the dirty little secret of it all: this is where the racism, sexism and ableism sneak in. It's usually not that direct, but Stanford grads (as just one example) will tend to vibe with other Stanford grads. They have common experience, probably common professors and so on. Same for MIT. Or CMU. Or UW. Or Waterloo. And so on.

So all of the biases that go into the selection process for those institutions will bleed into the tech space.

And this kind of environment is much worse for anyone on the spectrum because allistic people will be inclined to dislike them from the start for no reason, and that's going to hurt how they're viewed (i.e. as a star, a worker bee or a problem) and their performance ratings.

Because all of this is ultimately just a popularity contest with very few exceptions. I've seen multiple people finagle their way to Senior Staff SWE on just vibes.

And all of this gets worse since the tech sector has joined Corporate America in being in permanent layoff mode. The Welchian "up or out" philosophy has taken hold in Big Tech, where quotas dictate that 5-10% of the workforce gets a subpar rating every year, which tends to kill their careers at that company. This turns the entire workplace even more into an exercise in social engineering.


Yeah, the only way to avoid this is to find a company where building and selling a product actually matters. In large companies it’s too easy to fudge the connection between individual contributions and financial impact.

If you’re not looking to become a founder, companies of right around 100 employees are the sweet spot in my (very limited) experience.


I'm going to pick out 3 points:

> 2. Being right is cheap. Getting to right together is the real work

> 6. Your code doesn’t advocate for you. People do

> 14. If you win every debate, you’re probably accumulating silent resistance

The common thread here is that in large organizations, your impact is largely measured by how much you're liked. It's completely vibes-based. Stack ranking (which Google used to have; not sure if it still does) just codifies popularity.

What's the issue with that? People who are autistic tend to do really badly through no fault of their own. These systems are basically a selection filter for allistic people.

This comes up in PSC ("perf" at Meta, "calibration" elsewhere) where the exact same set of facts can be construed as a win or a loss and the only difference is vibes. I've seen this time and time again.

In one case I saw a team of 6 go away and do nothing for 6 months then come back and shut down. If they're liked, "we learned a lot". If they're not, "they had no impact".

Years ago (10-15 years back now) Google studied the elements of a successful team and found that a key element was psychological safety; this [1] seems related but more recent. I agree with that. The problem? A permanent-layoffs culture, designed entirely to suppress wages, kills psychological safety and turns survival into a game of being liked and manufacturing impact.

> 18. Most performance wins come from removing work, not adding cleverness

One thing I really appreciated about Google was that it has a very strict style guide and the subset of C++ in particular that you can use is (was?) very limited. At the time, this included "no exceptions", no mutable function arguments and an extremely high bar for adding templates.

Why? To avoid arguments about style issues. That's huge. But also because C++ in particular seemed to attract people who were in love with their own cleverness. I've seen some horrific uses of templates (not at Google) that made code incredibly difficult to test for very little gain.

> 9. Most “slow” teams are actually misaligned teams

I think this is the most important point but I would generalize it and restate it as: most problems are organizational problems.

At Meta, for example, product teams were incentivized to ship and their impact was measured in metric bumps. But there was no incentive to support what you've already shipped beyond it not blowing up. So many teams took a fire-and-forget approach: file a bug and forget about it, to the point where it became a company priority to have SLAs on old bugs, which caused the inevitable: people just downgrading bug priorities to avoid the SLAs.

That's an organizational problem where the participants have figured out that shipping is the only thing they get rewarded for. Things like documentation, code quality and bug fixes were only paid lip service.

Disclaimer: Xoogler, ex-Facebooker.

[1]: https://www.aristotleperformance.com/post/project-aristotle-...


The mistake was not having nullability be expressed in the type system.

At Facebook I used their PHP fork Hack a lot and Hack has a really expressive type system where PHP does not. You can express nullability of a type and it defaults to a type being non-nullable, which is the correct default. The type checker was aware of refinements too, so:

    function foo(?A $a): void {
      $a->bar(); // compile error, $a could be null
      if ($a is null) {
        return;
      }
      $a->bar(); // not a compiler error because $a is now A not ?A
      if ($a is ChildOfA) {
        $a->childBar(); // not an error, in this scope $a is ChildOfA
      }
    }
Now Hack, like Java, used type erasure, so you could force a null into something non-nullable if you really wanted to but, in practice, this almost never happened. A far bigger problem was dealing with legacy code that had been converted with a tool and returned or used the type "mixed", which could be literally anything.

The real problem with Java in particular is you'd end up chaining calls then get the dreaded NullPointerException and have no idea from the error or the logs what was broken from:

   a.b.c.d();
I'm fine with things like Option/Maybe types but to me they solve different problems. They're a way of expressing that you don't want to specify a value or that a value is missing and that's different to something being null (IMHO).


Yeah, in TypeScript I rarely run into null dereference errors at runtime either. It can happen if you unsafely cast the type of values coming into your code, but if you runtime-validate input at your application boundary it’s very unlikely.

Now looking back at a lot of other languages that don’t express nullability, it’s like, what were they thinking? How did I not wish for nullability in type declarations in all my years of dealing with NullPointerExceptions?


> The real problem with Java in particular is you'd end up chaining calls ... and have no idea from the error or the logs what was broken from: a.b.c.d();

That’s been solved since Java 14 (5 years ago). Now the error will tell you exactly what was null.

And “soon” Java will have built in support for expressing nullability in the type system. Though with existing tools like NullAway it’s already (in my opinion) a solved problem.
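
For anyone who hasn't seen it, here's a tiny self-contained sketch of what those messages look like (class names are made up; in Java 14 itself the feature was behind -XX:+ShowCodeDetailsInExceptionMessages and, if I remember right, it's on by default in later releases):

    public class HelpfulNpeDemo {
        static class C { void d() {} }
        static class B { C c() { return null; } } // deliberately returns null
        static class A { B b = new B(); }

        public static void main(String[] args) {
            A a = new A();
            // On a recent JDK this fails with a message along the lines of:
            //   Cannot invoke "HelpfulNpeDemo$C.d()" because the return value
            //   of "HelpfulNpeDemo$B.c()" is null
            // i.e. the chain itself tells you which link was null.
            a.b.c().d();
        }
    }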


Exactly, null is not evil in itself – the fact that it's not represented in the type system is.

A type system where nullability can be expressed, refinement so you can narrow "null | T" to "T" with conditionals, and sugar like optional chaining and nullish coalescing is all that's needed.


Neat.

In PHP land, for some years now, that code would not pass CI/CD checks and IDEs would show red squiggles, provided you use any of the popular static analysis tools like PHPStan or Psalm; I believe SonarQube would also flag it.


As an early user of SO [1], I feel reasonably qualified to discuss this issue. Note that I barely posted after 2011 or so, so I can't really speak to the current state.

But what I can say is that even back in 2010 it was obvious to me that moderation was a problem, specifically a cultural problem. I'm really talking about the rise of the administrative/bureaucratic class that, if left unchecked, can become absolute poison.

I'm constantly reminded of the Leonard Nimoy voiced line from Civ4: "the bureaucracy is expanding to meet the needs of the expanding bureaucracy". That sums it up exactly. There is a certain type of person who doesn't become a creator of content but rather a moderator of content. These are people who end up as Reddit mods, for example.

Rules and standards are good up to a point but some people forget that those rules and standards serve a purpose and should never become a goal unto themselves. So if the moderators run wild, they'll start creating work for themselves and having debates about what's a repeated question, how questions and answers should be structured, etc.

This manifested as the war of "closed, non-constructive" on SO. Some really good questions were killed this way because the moderators decided on their own that a question had to have a provable answer to avoid flame wars. And this goes back to the rules and standards being a tool, not a goal. My stance was (and is): shouldn't we deal with flame wars when they happen rather than going around "solving" imaginary problems?

I lost that battle. You can argue that questions like "should I use Javascript or Typescript?" don't belong on SO (as the moderators did). My position was that even though there's no definite answer, somebody can give you a list of strengths and weaknesses and things to consider.

Even something that does have a definite answer like "how do I efficiently code a factorial function?" has multiple but different defensible answers. Even in one language you can have multiple implementations that might, say, be compile-time or runtime.

Another commenter here talked about finding the nearest point on an ellipse and came up with a method they're proud of where there are other methods that would also do the job.

Anyway, I'd occasionally login and see a constant churn on my answers from moderators doing pointless busywork as this month they'd decided something needed to be capitalized or not capitalized.

A perfect example of this kind of thing is Bryan Henderson's war on "comprised of" on Wikipedia [2].

Anyway, I think the core issue of SO was that there was a lot of low-hanging fruit and I got a lot of accepted answers on questions that could never be asked today. You'll also read many anecdotes about people having a negative experience asking questions on SO in later years where their question was immediately closed as, say, a duplicate when the question wasn't a duplicate. The moderator just didn't understand the difference. That sort of thing.

But any mature site ultimately ends with an impossible barrier to entry as newcomers don't know all the cultural rules that have been put in place and they tend to have a negative experience as they get yelled at for not knowing that Rule 11.6.2.7 forbids the kind of question they asked.

[1]: https://stackoverflow.com/users/18393/cletus

[2]: https://www.npr.org/2015/03/12/392568604/dont-you-dare-use-c...


> This manifested as the war of "closed, non-constructive" on SO. Some really good questions were killed this way because the moderators decided on their own that a question had to have a provable answer to avoid flame wars.

It's literally a Q&A site. Questions need actual answers, not just opinions or "this worked for me".


> This manifested as the war of "closed, non-constructive" on SO. Some really good questions were killed this way because the moderators decided on their own that a question had to have a provable answer to avoid flame wars.

Please point at some of these "really good" questions, if you saved any links. (I have privileges to see deleted questions; deletion is normally soft unless there's a legal requirement or something.) I'll be happy to explain why they are not actually what the site wanted and not compatible with the site's goals.

The idea that the question "should have provable answers" wasn't some invention of moderators or the community; it came directly from Atwood (https://stackoverflow.blog/2011/01/17/real-questions-have-an...).

> I lost that battle. You can argue that questions like "should I use Javascript or Typescript?" don't belong on SO (as the moderators did). My position was that even though there's no definite answer, somebody can give you a list of strengths and weaknesses and things to consider.

Please read "Understanding the standard for "opinion-based" questions" (https://meta.stackoverflow.com/questions/434806) and "What types of questions should I avoid asking?" (https://stackoverflow.com/help/dont-ask).


I believe that this tension about what types of questions belong was baked into the very foundation of StackOverflow.

https://www.joelonsoftware.com/2008/09/15/stack-overflow-lau...

> What kind of questions are appropriate? Well, thanks to the tagging system, we can be rather broad with that. As long as questions are appropriately tagged, I think it’s okay to be off topic as long as what you’re asking about is of interest to people who make software. But it does have to be a question. Stack Overflow isn’t a good place for imponderables, or public service announcements, or vague complaints, or storytelling.

vs

https://blog.codinghorror.com/introducing-stackoverflow-com/

> Stackoverflow is sort of like the anti-experts-exchange (minus the nausea-inducing sleaze and quasi-legal search engine gaming) meets wikipedia meets programming reddit. It is by programmers, for programmers, with the ultimate intent of collectively increasing the sum total of good programming knowledge in the world. No matter what programming language you use, or what operating system you call home. Better programming is our goal.

(the emphasis on "good" is in the original)

And this can be seen in the revision history of https://stackoverflow.com/posts/1003841/revisions (take note of revision 1 and the moderation actions 2011)

---

Questions that are fun and slightly outside of the intended domain of the site are manageable ... if there is sufficient moderation to keep those types of questions from sucking up all available resources.

That was the first failing of NotProgrammingRelated.StackExchange ... later Programming.StackExchange ... later SoftwareEngineering.StackExchange.

The fun things, while they were fun took way more moderation resources than was available. People would ask a fun question, get a good bit of rep - but then not help in curating those questions. "What is your favorite book" would get countless answers... and then people would keep posting the same answers rather than reading all of them themselves and voting to cause the "good" content to bubble up to the top.

That's why TeX can have https://tex.stackexchange.com/questions/tagged/fun and MathOverflow can have https://mathoverflow.net/questions/tagged/soft-question and https://mathoverflow.net/questions/tagged/big-list -- there is a very high ratio for the active in moderation to active users.

Stack Overflow kind of had this at its start... but over time the "what is acceptable moderation" was curtailed more and more - especially in the face of more and more questions that should be closed.

While fun questions are fun... the "I have 30 minutes free before my next meeting, want to help someone, and see a good question" experience became increasingly difficult. The "keep all the questions" ideal made that harder, and so fewer and fewer of the - let's call them "atwoodians" - remained. From where I sit, that change in corporate policy was completely solidified when Jeff left.

As moderation and curation were restricted (changing the close reasons to more and more specific things - "it's not on that list, so you can't close it"), the content that was not as well thought out but did match the rules became more and more prevalent and overwhelmed the ability of the "spolskyites" to close it, since so many of the atwoodians had left.

What remained were shells of rules that were the "truce" in the tension between the atwoodians and spolskyites, and a few people trying to fight the oncoming tide of poorly asked questions with insufficient and neglected tooling.

As the tide of questions went out and corporate realized that there was necessary moderation that wasn't happening because of the higher standards from the earlier days, they tried to make it easier. The golden hammer of duplication was a powerful one - though misused in many cases. The "this question closes now because it's poorly asked and similar to that other canonical one that works through the issue" was far easier than "close as {something}", which requires another four people to take note of it before the question gets an answer from the Fastest Gun in the West. Later the number of people needed was changed from five to three, but by then the tide was in retreat.

Corporate, seeing that there were fewer questions being asked, measured this as engagement - and has tried things to increase engagement rather than good questions. However, those "let's increase engagement" efforts were also done with even more of a moderation burden upon the community, without the tooling to fix the problems or help the diminishing number of people who were participating in moderating and curating the content of the site.


> As moderation and curation were restricted (changing the close reasons to more and more specific things - "it's not on that list, so you can't close it"), the content that was not as well thought out but did match the rules became more and more prevalent and overwhelmed the ability of the "spolskyites" to close it, since so many of the atwoodians had left.

Just to make sure: I always got the impression that Atwood was the one who wanted to keep things strictly on mission and Spolsky was the one more interested in growing a community. Yes? I do get the impression that there was a serious ideological conflict there; between the "library of detailed, high-quality answers" and the, well, "to every question" (without a proper understanding of what should count as a distinct, useful question that can have a high-quality answer). But also, the reputation gamification was incredibly poorly thought out for the "library" goal (https://meta.stackexchange.com/questions/387356/the-stack-ex...). And I suspect they both shared blame in that.

A lot of it was also ignored for too long because of the assumption that a) the site would just die if it clamped down on everything from the start; b) the site would naturally attract experts with good taste in questions (including maybe even the ability to pose good https://en.wikipedia.org/wiki/Dorothy_Dixer questions) before the beginners ever cleared the barrier of trying to phrase a proper question instead of using a forum.

(Nowadays, there are still small forums all over the place. And many of them try to maintain some standards for the OP. And they're all plagued with neophytes who try to use the forum as if it were a chat room. The old adage about foolproofing rings true.)

Around 2014 is when the conflict really seems to have boiled over (as new question volume was peaking). Notably, that also seems to be when the dupe-hammer was introduced (https://meta.stackoverflow.com/questions/254589).


Jeff was the author of https://stackoverflow.blog/2011/06/13/optimizing-for-pearls-... and was more focused on quality than community - his vision was the library.

Joel was indeed more community minded - though part of that community mindedness was also more expectations of community moderation than what the tooling was able to scale for.

And yes, they both were to blame for gamification - though part of that was the Web 2.0 ideals of the time and the hook to keep a person coming back to it. It was part of the question that was to be answered "how do you separate the core group from the general participants on a site?" ... and that brings me to "people need to read A Group Is Its Own Worst Enemy" ( https://news.ycombinator.com/item?id=23723205 ) to understand how it shaped Stack Overflow.

https://blog.codinghorror.com/its-clay-shirkys-internet-we-j... (2008)

https://web.archive.org/web/20110827205048/https://stackover... (Podcast #23 from 2011)

   Atwood: Maybe. But the cool thing about this is this is not just me, because that would be boring. It is actually me and Clay Shirky. You know, Clay Shirky is one of my heroes.

   Spolsky: Oh...

   Atwood: Yeah I know, it's awesome. So we get to talk about like building communities online and I get to talk about StackOverflow, you know, and all the lessons we've learned and, get to present with Clay. Obviously he's an expert so. That's one of the people that I have emailed actually, because I thought that would be good, because he is from New-York city as well. So we could A) show him the site and B) talk about the thing we are going to do together in March, because he needs to see the site to have some context. I mean I did meet him and talk to him about this earlier a few months ago, I think I mentioned it on the podcasts. But that was before we had sort of even going to beta, so there's really not a lot to show him. But I would love to show him in person. So we'll see if I'll hear back from him, I do not know.
https://meta.stackexchange.com/questions/105232/clay-shirkys... (2011)

2014 sounds about right for when it peaked... it was also when a lot of things hit the fan one after another. General stress, the decline of community moderation. The dup hammer was a way to try to reduce the number of close votes needed - but in doing so everything became a nail for the dup hammer. It was used to close poor questions as dups of other questions ... and rather than making it easier to close questions that didn't fit well, corporate allowed the "everything is a dup" problem to fester.

That also then made Stack Overflow's search become worse. Consider https://meta.stackoverflow.com/a/262080 which provides itself as a timestamp of 2014...

    How much traffic do the questions that get duped to something bring? Especially the (currently) 410 questions linked to the Java NPE question.
That question now has 10,356 questions linked to it... and that's part of the "why search quality is going down" - because poor questions were getting linked and not deleted. Search went downhill, and the dupe hammer was overused because regular close votes took too long because community moderation was going down. That in turn caused people to be grumpy about "closed as dup" rather than "your question looks like it is about X, but lacks an MCVE to be able to verify that... so close it as a dup of X rather than needing 5 votes to get an MCVE close... which would have been more helpful in guiding a user - but would mean people would start doing FGITW to answer it and you'd maybe get it closed as a dup of something else instead."

All sorts of problems around that time.


Thanks; lots of great information here.

Regarding duplicates and deletion you may be interested in my thoughts: https://meta.stackoverflow.com/questions/426214/when-is-it-a... ; https://meta.stackoverflow.com/questions/434215/where-do-the... ; https://meta.stackoverflow.com/questions/421677/closing-a-qu... seem relevant here, browsing through a search of my saved posts.

Having duplicates should make the search better, by pointing people who phrase the same problem in different ways to the same place. But low-quality questions often don't produce something searchable for others, and they cover topics relevant to people who lack search skills.


Dunno why you are being downvoted - there is a certain type of person who contributes virtually nothing on Wikipedia except peripheral things like categories. BrownHairedGirl was the most toxic person in Wikipedia but she was lauded by her minions - and yet she did virtually no content creation whatsoever. Yet made millions of edits!


My second project at Google basically killed mocking for me and I've never really done it since. Two things happened.

The first was that I worked on a rewrite of something (using GWT no less; it was more than a decade ago) and they decided to have a lot of test coverage and test requirements. That's fine, but the way it was mandated and implemented, everybody just tested their own service and DIed a bunch of mocks in.

The results were entirely predictable. The entire system was incredibly brittle and a service that existed for only 8 weeks behaved like legacy code. You could spend half a day fixing mocks in tests for a 30 minute change just because you switched backend services, changed the order of calls or just ended up calling a given service more times than expected. It was horrible and a complete waste of time.

Even the DI aspect of this was horrible because everything used Guice and there were modules that installed modules that installed modules, and modifying those to return mocks in a test environment was a massive effort that typically resulted in having a different environment (and injector) for test code vs production code, so what are you actually testing?

The second was that about this time the Java engineers at the company went on a massive boondoggle to decide whether to use (and mandate) EasyMock vs Mockito. This was an additional waste of time. Regardless of the relative merits of either, there's really not that much difference. At no point is it worth completely changing your mocking framework in existing code. Who knows how many engineering man-years were wasted on this.

Mocking encourages bad habits and a false sense of security. The solution is to have dummy versions of services and interfaces that have minimal correct behavior. So you might have a dummy Identity service that does simple lookups on an ID for permissions or metadata. If that's not what you're testing and you just need it to run a test, doing that with a mock is just wrong on so many levels.

I've basically never used mocks since, so much so that I find anyone who is strongly in favor of mocks or has strong opinions on mocking frameworks to be a huge red flag.


I'm not sure I understand. "The solution is to have dummy versions of services and interfaces that have minimal correct behavior".

That's mocks in a nutshell. What other way would you use mocks?


Imagine you're testing a service that creates, queries and deletes users. A fake version of that service might just be a wrapper on a HashMap keyed by ID. It might have several fields like some personal info, a hashed password, an email address, whether you're verified and so on.

Imagine one of your tests is if the user deletes their account. What pattern of calls should it make? You don't really care other than the record being deleted (or marked as deleted, depending on retention policy) after you're done.

In the mock world you might mock out calls like deleteUserByID and make sure it's called.

In the fake world, you simply check that the user record is deleted (or marked as such) after the test. You don't really care about what sequence of calls made that happen.

That may sound trivial but it gets less trivial the more complex your example is. Imagine instead you want to clear out all users who are marked for deletion. If you think about the SQL for that you might do a DELETE ... WHERE call, so your API call might look like that. But what if the logic is more complicated? What if there's a change where EU and NA users have different retention periods or logging requirements so they're suddenly handled differently?

In a mocking world you would have to change all your expected mocks. In fact, implementing this change might require fixing a ton of tests you don't care about at all and that aren't really broken by the change anyway.

In a fake world, you're testing what the data looks like after you're done, not the specific steps it took to get there.

Now those are pretty simple examples because there's not much to do with the arguments used and no return values to speak of. Your code might branch differently based on those values, which then changes what calls to expect and with what values.

You're testing implementation details in a really time-consuming yet brittle way.
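
To make that concrete, here's a rough Java sketch of the fake-world version. UserStore, InMemoryUserStore and deleteAccount are all made-up names for illustration; the point is that the test asserts on the resulting state, not on which calls were made:

    import java.util.HashMap;
    import java.util.Map;

    // Hypothetical service interface; the names are illustrative only.
    interface UserStore {
        void create(String id, String email);
        boolean exists(String id);
        void delete(String id);
    }

    // The fake: a minimal but genuinely working implementation backed by a HashMap.
    class InMemoryUserStore implements UserStore {
        private final Map<String, String> users = new HashMap<>();
        public void create(String id, String email) { users.put(id, email); }
        public boolean exists(String id) { return users.containsKey(id); }
        public void delete(String id) { users.remove(id); }
    }

    public class DeleteAccountTest {
        // Code under test: however it chooses to call the store,
        // we only assert on the resulting state.
        static void deleteAccount(UserStore store, String id) {
            if (store.exists(id)) {
                store.delete(id);
            }
        }

        public static void main(String[] args) {
            UserStore store = new InMemoryUserStore();
            store.create("u1", "u1@example.com");

            deleteAccount(store, "u1");

            // State-based assertion: the user is gone. We never check which
            // methods were called or in what order, so refactoring
            // deleteAccount() doesn't break this test.
            if (store.exists("u1")) {
                throw new AssertionError("expected u1 to be deleted");
            }
            System.out.println("ok");
        }
    }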


I am unsure I follow this. I'm generally mocking the things that are dependencies for the thing I'm really testing.

If the dependencies are proper interfaces, I don't care if it's a fake or a mock, as long as the interface is called with the correct parameters. Precisely because I don't want to test the implementation details. The assumption (correctly so) is that the interface provides a contract I can rely on.

In your example, the brittleness simply moves from mocks to data setup for the fake.


The point is that you probably don't care that much how exactly the dependency is called, as long as it is called in such a way that it does the action you want and returns the results you're interested in. The test shouldn't be "which methods of the dependency does this function call?" but rather "does this function produce the right results, assuming the dependency works as expected?".

This is most obvious with complex interfaces where there are multiple ways to call the dependency that do the same thing. For example if my dependency was an SQL library, I could call it with a string such as `SELECT name, id FROM ...`, or `SELECT id, name FROM ...`. For the dependency itself, these two strings are essentially equivalent. They'll return results in a different order, but as long as the calling code parses those results in the right order, it doesn't matter which option I go for, at least as far as my tests are concerned.

So if I write a test that checks that the dependency was called with `SELECT name, id FROM ...`, and later I decide that the code looks cleaner the other way around, then my test will break, even though the code still works. This is a bad test - tests should only fail if there is a bug and the code is not working as expected.

In practice, you probably aren't mocking SQL calls directly, but a lot of complex dependencies have this feature where there are multiple ways to skin a cat, but you're only interested in whether the cat got skinned. I had this most recently using websockets in Node - there are different ways of checking, say, the state of the socket, and you don't want to write tests that depend on a specific method because you might later choose a different method that is completely equivalent, and you don't want your tests to start failing because of that.


The fakes vs mocks distinction here feels like a terminology debate masking violent agreement. What you’re describing as a “fake” is just a well-designed mock. The problem isn’t mocks as a concept, it’s mocking at the wrong layer. The rule: mock what you own, at the boundaries you control. The chaos you describe comes from mocking infrastructure directly. Verifying “deleteUserById was called exactly once with these params” is testing implementation, not behavior. Your HashMap-backed fake tests the right thing: is the user gone after the operation? Who cares how. The issue is finding the correct layers to validate behavior, not the implementation detail of mocks or fakes… that’s like complaining a hammer smashed a hole in the wall.


In the SQL example, unless you actually use an SQL service as a fake, you cannot really get the fake to do the right thing either. At which point, it's no longer a mock/fake test but an integration/DB test. Network servers are another such class and for the most part can be either mocked or faked using interface methods.

I would argue that (barring SQL), if there are too many ways to skin a cat, it is a design smell. Interfaces are contracts. Even for SQL, I almost always end up using a repository method (findByXxx flavors) so it is very narrow in scope.


The general term I prefer is test double. See https://martinfowler.com/bliki/TestDouble.html for how one might distinguish dummies, fakes, stubs, spies, and mocks.

Of course getting overly pedantic leads to its own issues, much like the distinctions between types of tests.

At my last Java job I used to commonly say things like "mocks are a smell", and avoided Mockito like GP, though it was occasionally useful. PowerMock was also sometimes used because it lets you get into the innards of anything without changing any code, but much more rarely. Ideally you don't need a test double at all.


There are different kinds of mocks.

"Check function XYZ is called", "return abc when XYZ is called", etc. are the bad kind that people were bitten badly by.

The good kind are a minimally correct fake implementation that doesn't really need any mocking library to build.

Tests should not be brittle and rigidly restate the order of function calls and expected responses. That's a whole lot of ceremony that doesn't really add confidence in the code because it does not catch many classes of errors, and requires pointless updates to match the implementation 1-1 everytime it is updated. It's effectively just writing the implementation twice, if you squint at it a bit.


The second way is usually referred to as "fakes", which are not a type of mock but a (better) alternative to mocks.


In reflection-heavy environments and with injection- and reflection-heavy frameworks (.NET, Java) the distinction is a bit more obvious and relevant. In some cases the mock configuration blossoms into essentially parallel implementations, leading to the brittleness discussed earlier in the thread.

Technically creating a shim or stub object is mocking, but “faking” isn’t using a mocking framework to track incoming calls or internal behaviours. Done properly, IMO, you’re using inheritance and the opportunity through the TDD process to polish & refine the inheritance story and internal interface of key subsystems. Much like TDD helps design interfaces by giving you earlier external interface consumers, you also get early inheritors if you are, say, creating test services with fixed output.

In ideal implementations those stub or "fake" services answer the "given..." part of user stories, leaving minimalistic, focused tests. Delivering hardcoded dictionaries of test data built with appropriate helpers is minimal and easy to keep up to date without undue extra work, and doing that kind of stub work often identifies re-use needs/benefits early in the code-base. The exact features needed to evolve the system as unexpected change requests roll in are there already, as QA/end-users are the system's second rodeo, not its first.

The mocking antipatterns cluster around ORM misuse and tend to leak implementation details (leading to those brittle tests), and are often co-morbid with anemic domains and other cargo cult cruft. Needing intense mocking utility and frameworks on a system you own is a smell.

For corner cases and exhaustiveness I prefer to be able to do meaningful integration tests in memory as far as possible too (in conjunction with more comprehensive tests). Faster feedback means faster work.


Why is check if XYZ is called with return value ABC bad, as long as XYZ is an interface method?

Why is a minimally correct fake any better than a mock in this context?

Mocks are not really about order of calls unless you are talking about different return values on different invocations. A fake simply moves the cheese to setting up data correctly, as your tests and logic change.

Not a huge difference either way.


The point is to test against a model of the dependency, not just the expected behavour of the code under test. If you just write a mock that exactly corresponds to the test that you're running, you're not testing the interface with the underlying system, you're just running the (probably already perfectly understandable) unit through a rote set of steps, and that's both harder to maintain and less useful than testing against a model of the underlying system.

(And IMO this should only be done for heavyweight or difficult to precisely control components of the system where necessary to improve test runtime or expand the range of testable conditions. Always prefer testing as close to the real system as reasonably practical)


But mocks are a model of the dependency. I don't quite see how a fake is a better model than a mock.

In any case, I agree testing close to a real system, with actual dependencies where possible is better. But that's not done with a fake.


The kind of mocks the OP is arguing against are not really a model of the dependency, they're just a model of a particular execution sequence in the test, because the mock is just following a script. Nothing in it ensures that the sequence is even consistent with any given understanding of how the dependency works, and it will almost certainly need updating when the code under test is refactored.


My point is that a fake doesn't magically fix this issue. Both are narrow models of the underlying interface. I don't still quite understand why a mock is worse than a fake, when it comes to narrow models of the interface. If there is a method that needs to be called with a specific set up, there is no practical difference between a fake and a mock.

Again, none of this is a replacement for writing integration tests where possible. Mocks have a place in the testing realm and they are not an inherently bad tool.


Mocking is testing how an interface is used, rather than testing an implementation. That's why it requires some kind of library support. Otherwise you'd just be on the hook for providing your own simple implementations of your dependencies.


Heavy mocks usage comes from dogmatically following the flawed “most tests should be unit tests” prescription of the “testing pyramid,” as well as a strict adherence to not testing more than one class at a time. This necessitates heavy mocking, which is fragile, terrible to refactor, leads to lots of low-value tests. Sadly, AI these days will generate tons of those unit tests in the hands of those who don’t know better. All in all leading to the same false sense of security and killing development speed.


I get what you are saying, but you can have your cake and eat it too. Fast, comprehensive tests that cover most of your codebase. Test through the domain, employ Fakes at the boundaries.

https://asgaut.com/use-of-fakes-for-domain-driven-design-and...


“The solution is to have dummy versions of services and interfaces that have minimal correct behavior”

If you aren’t doing this with mocks then you’re doing mocks wrong.


Martin Fowler draws a useful distinction between mocks, fakes, and stubs¹. Fakes contain some amount of internal logic, e.g. a remote key-value store can be faked with a hashmap. Stubs are a bit dumber—they have no internal logic & just return pre-defined values. Mocks, though, are rigged to assert that certain calls were made with certain parameters. You write something like `myMock.Expect("sum").Args(1, 2).Returns(3)`, and then when you call `myMock.AssertExpectations()`, the test fails unless you called `myMock.sum(1, 2)` somewhere.

People often use the word "mock" to describe all of these things interchangeably², and mocking frameworks can be useful for writing stubs or fakes. However, I think it's important to distinguish between them, because tests that use mocks (as distinct from stubs and fakes) are tightly coupled to implementation, which makes them very fragile. Stubs are fine, and fakes are fine when stubs aren't enough, but mocks are just a bad idea.

[1]: https://martinfowler.com/articles/mocksArentStubs.html

[2]: The generic term Fowler prefers is "test double."
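
For what it's worth, here's roughly how that distinction looks with Mockito, since it keeps coming up in this thread. Calculator is a made-up interface and this assumes Mockito is on the classpath:

    import static org.mockito.Mockito.mock;
    import static org.mockito.Mockito.verify;
    import static org.mockito.Mockito.when;

    // Made-up collaborator interface, purely for illustration.
    interface Calculator {
        int sum(int a, int b);
    }

    public class MockVsStubSketch {
        public static void main(String[] args) {
            // Stub-style usage: pre-programmed return value, no expectations about calls.
            Calculator stubbed = mock(Calculator.class);
            when(stubbed.sum(1, 2)).thenReturn(3);
            System.out.println(stubbed.sum(1, 2)); // prints 3

            // Mock-style usage: the test fails unless sum(1, 2) was actually called.
            Calculator mocked = mock(Calculator.class);
            mocked.sum(1, 2); // in a real test the code under test would make this call
            verify(mocked).sum(1, 2); // throws if that interaction never happened
        }
    }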


In part, you’re right, but there’s a practical difference between mocking and a good dummy version of a service. Take DynamoDB local as an example: you can insert items and they persist, delete items, delete tables, etc. Or in the Ruby on Rails world, one often would use SQLite as a local database for tests even if using a different DB in production.

Going further, there’s the whole test containers movement of having a real version of your dependency present for your tests. Of course, in a microservices world, bringing up the whole network of dependencies is extremely complicated and likely not warranted.


I use test containers and similar methods to test against a "real" db, but I also use mocks. For example, to mock the response of a third-party API - can't very well spin that up in a test container. Another example is simply timestamps. Can't really test time-related stuff without mocking a timestamp provider.

It is a hassle a lot of the time, but I see it as a necessary evil.
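
On the timestamp point: one way to avoid mocking a timestamp provider at all is to inject a java.time.Clock and pin it in tests. A minimal sketch, with TokenIssuer as a made-up example class:

    import java.time.Clock;
    import java.time.Instant;
    import java.time.ZoneOffset;

    // Made-up example class: it takes a Clock instead of calling Instant.now() directly.
    class TokenIssuer {
        private final Clock clock;
        TokenIssuer(Clock clock) { this.clock = clock; }

        boolean isExpired(Instant expiresAt) {
            return Instant.now(clock).isAfter(expiresAt);
        }
    }

    public class ClockExample {
        public static void main(String[] args) {
            // Production would use Clock.systemUTC(); tests pin the clock to a
            // known instant so time-dependent logic is deterministic.
            Clock fixed = Clock.fixed(Instant.parse("2025-01-01T00:00:00Z"), ZoneOffset.UTC);
            TokenIssuer issuer = new TokenIssuer(fixed);

            System.out.println(issuer.isExpired(Instant.parse("2024-12-31T23:59:59Z"))); // true
            System.out.println(issuer.isExpired(Instant.parse("2025-06-01T00:00:00Z"))); // false
        }
    }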


You can use a library like [1] to mock out a real HTTP server with responses.

[1] https://www.mock-server.com/


I'd go a bit farther — "mock" is basically the name for those dummy versions.

That said, there is a massive difference between writing mocks and using a mocking library like Mockito — just like there is a difference between using dependency injection and building your application around a DI framework.


> there is a massive difference between writing mocks and using a mocking library like Mockito

How to reconcile the differences in this discussion?

The comment at the root of the thread said "my experience with mocks is they were over-specified and lead to fragile services, even for fresh codebases. Using a 'fake' version of the service is better". The reply then said "if mocking doesn't provide a fake, it's not 'mocking'".

I'm wary of blanket sentiments like "if you ended up with a bad result, you weren't mocking". -- Is it the case that libraries like mockito are mostly used badly, but that correct use of them provides a good way of implementing robust 'fake services'?


In my opinion, we do mocking the exact opposite of how we should be doing it — mocks shouldn't be written by the person writing tests, but rather by the people who implemented the service being mocked. It's exceedingly rare to see this pattern in the wild (and, frustratingly, I can't think of an example off the top of my head), but I know I've had good experiences with cases of package `foo` offering a `foo-testing` package that offers mocks. Turns out that mocks are a lot more robust when they're built on top of the same internals as the production version, and doing it that way also obviates much of the need for general-purpose mocking libraries.


I think the argument they're making is that once you have this, you already have an easy way to test things that doesn't require bringing in an entire framework.


The difference, IMO, between a mock and a proper "test" implementation is that traditionally a mock only exists to test interface boundaries, and the "implementation" is meant to be as much of a noop as possible. That's why the default behavior of almost any "automock" is to implement an interface by doing nothing and returning nothing (or perhaps default-initialized values) and provide tools for just tacking assertions onto it. If it was a proper implementation that just happened to be in-memory, it wouldn't really be a "mock", in my opinion.

For example, let's say you want to test that some handler is properly adding data to a cache. IMO the traditional mock approach that is supported by mocking libraries is to go take your RedisCache implementation and create a dummy that does nothing, then add assertions that say, the `set` method gets called with some set of arguments. You can add return values to the mock too, but I think this is mainly meant to be in service of just making the code run and not actually implementing anything.

Meanwhile, you could always make a minimal "test" implementation (I think these are sometimes called "fakes", traditionally, though I think this nomenclature is even more confusing) of your Cache interface that actually does behave like an in-memory cache, then your test could assert as to its contents. Doing this doesn't require a "mocking" library, and in this case, what you're making is not really a "mock" - it is, in fact, a full implementation of the interface, that you could use outside of tests (e.g. in a development server.) I think this can be a pretty good middle ground in some scenarios, especially since it plays along well with in-process tools like fake clocks/timers in languages like Go and JavaScript.

Despite the pitfalls, I mostly prefer to just use the actual implementations where possible, and for this I like testcontainers. Most webserver projects I write/work on naturally require a container runtime for development for other reasons, and testcontainers is glue that can use that existing container runtime setup (be it Docker or Podman) to pretty rapidly bootstrap test or dev service dependencies on-demand. With a little bit of manual effort, you can make it so that your normal test runner (e.g. `go test ./...`) can run tests normally, and automatically skip anything that requires a real service dependency in the event that there is no Docker socket available. (Though obviously, in a real setup, you'd also want a way to force the tests to be enabled, so that you can hopefully avoid an oopsie where CI isn't actually running your tests due to a regression.)


my time at google likewise led me to the conclusion that fakes were better than mocks in pretty much every case (though I was working in c++ and python, not java).

edit: of course google was an unusual case because you had access to all the source code. I daresay there are cases where only a mock will work because you can't satisfy type signatures with a fake.


Mockito, in every case I had to use it, was a last resort because a third-party library didn't lend itself to mocking, or because you were bringing legacy code under test and using it long enough to refactor it out.

It should never be the first tool. But when you need it, it’s very useful.


I don't use dummy services and I don't use mocking. I'm writing simulators to test things for HW or big services which are not available for testing.

Simulators need to be complete for their use cases or they cannot be used for testing.


Story time. This has basically nothing to do with this post other than it involves a limit of 10,000 but hey, it's Christmas and I want to tell a story.

I used to work for Facebook and many years ago people noticed you couldn't block certain people, but the one that was most public was Mark Zuckerberg. It would just say it failed or something like that. And people would assign malice, or at least intent, to it. But the truth was much funnier.

Most data on Facebook is stored in a custom graph database that basically only has 2 tables, sharded across thousands of MySQL instances, but it is almost always accessed via an in-memory write-through cache, also custom. It's not quite a cache because it has functionality built on top of the database that accessing the database directly wouldn't have.

So a person is an object and following them is an edge. Importantly, many such edges were one-way so it was easy to query if person A followed B but much more difficult to query all the followers of B. This was by design to avoid hot shards.

So I lied when I said there were 2 tables. There was a third that was an optimization that counted certain edges. So if you see "10.7M people follow X" or "136K people like this", it's reading a count, not doing a query.

Now there was another optimization here: only the last 10,000 edges for a given (object ID, edge type) pair were in memory. You generally wanted to avoid dealing with anything older than that because you'd start hitting the database and that was generally a huge problem on a large, live query or update. As an example, it was easy to query the last 10,000 people or pages you've followed.

You should be able to see where this is going. All that had happened was 10,000 people had blocked Mark Zuckerberg. Blocks were another kind of edge that was bidirectional (IIRC). The system just wasn't designed for a situation where more than 10,000 people wanted to block someone.

This got fixed many years ago because somebody came along and built a separate system to handle blocking that didn't have the 10,000 limit. I don't know the implementation details but I can guess. There was a separate piece of reverse-indexing infrastructure for doing queries on one-way edges. I suspect that was used.

Anyway, I love this story because it's funny how a series of technical decisions can lead to behavior and a perception nobody intended.


Merry Christmas! This is why I like hackernews.


People should go to jail for this.

Anyone who has worked on a large migration eventually lands on a pattern that goes something like this:

1. Double-write to the old system and the new system. Nothing uses the new system;

2. Verify the output in the new system vs the old system with appropriate scripts. If there are issues, which there will be for a while, go back to (1);

3. Start reading from the new system with a small group of users and then an increasingly large group. Still use the old system as the source of truth. Log whenever the output differs. Keep making changes until it always matches;

4. Once you're at 100% rollout you can start decommissioning the old system.

This approach is incremental, verifiable and reversible. You need all of these things. If you engage in a massive rewrite in a silo for a year or two you're going to have a bad time. If you have no way of verifying your new system's output, you're going to have a bad time. In fact, people are going to die, as is the case here.
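
A rough sketch of steps 1-3 in Java, with hypothetical OldStore/NewStore interfaces standing in for the real systems: every write goes to both, reads still come from the old system as the source of truth, and a ramped fraction of reads shadow-compares the new system and logs any mismatch:

    import java.util.Objects;
    import java.util.logging.Logger;

    // Hypothetical interfaces standing in for the old and new systems.
    interface OldStore { String read(String key); void write(String key, String value); }
    interface NewStore { String read(String key); void write(String key, String value); }

    class MigratingStore {
        private static final Logger log = Logger.getLogger("migration");
        private final OldStore oldStore;
        private final NewStore newStore;
        private final double readRolloutFraction; // 0.0 .. 1.0, step 3's gradual ramp

        MigratingStore(OldStore oldStore, NewStore newStore, double readRolloutFraction) {
            this.oldStore = oldStore;
            this.newStore = newStore;
            this.readRolloutFraction = readRolloutFraction;
        }

        // Step 1: double-write. The old system stays authoritative.
        void write(String key, String value) {
            oldStore.write(key, value);
            try {
                newStore.write(key, value); // best-effort; failures are logged, not fatal
            } catch (RuntimeException e) {
                log.warning("new-system write failed for " + key + ": " + e);
            }
        }

        // Step 3: serve from the old system, shadow-read the new one and log divergence.
        String read(String key) {
            String oldValue = oldStore.read(key);
            if (inRollout(key)) {
                try {
                    String newValue = newStore.read(key);
                    if (!Objects.equals(oldValue, newValue)) {
                        log.warning("mismatch for " + key + ": old=" + oldValue + " new=" + newValue);
                    }
                } catch (RuntimeException e) {
                    log.warning("new-system read failed for " + key + ": " + e);
                }
            }
            return oldValue; // callers never see the new system until it has proven itself
        }

        private boolean inRollout(String key) {
            // Deterministic bucketing so a given key is consistently in or out of the ramp.
            return Math.floorMod(key.hashCode(), 100) < readRolloutFraction * 100;
        }
    }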

If you're going to accuse someone of a criminal act, a system just saying it happened should NEVER be sufficient. It should be able to show its work. The person or people who are ultimately responsible for turning a fraud detection into a criminal complaint should themselves be criminally liable if they make a false complaint.

We had a famous example of this with Hertz mistakenly reporting cars stolen, something they ultimately had to pay for in a lawsuit [1] but that's woefully insufficient. It is expensive, stressful and time-consuming to have to criminally defend yourself against a felony charge. People will often be forced to take a plea because absolutely everything is stacked in the prosecution's favor despite the theoretical presumption of innocence.

As such, an erroneous or false criminal complaint by a company should itself be a criminal charge.

In Hertz's case, a human should eyeball the alleged theft and look for records like "do we have the car?", "do we know where it is?" and "is there a record of them checking it in?"

In the UK post office scandal, a detection of fraud from accounting records should be verified by comparison to the existing system in a transition period AND, moreso in the beginning, double checking results with forensic accountants (actual humans) before any criminal complaint is filed.

[1]: https://www.npr.org/2022/12/06/1140998674/hertz-false-accusa...


I realize scale makes everything more difficult but at the end of the day, Netflix is encoding and serving several thousand videos via a CDN. It can't be this hard. There are a few statements in this that gave me pause.

The core problem seems to be development in isolation. Put another way: microservices. This post hints at microservices having complete autonomy over their data storage and developing their own GraphQL models. The first is normal for microservices (but an indictment at the same time). The second is... weird.

The whole point of GraphQL is to create a unified view of something, not to have 23 different versions of "Movie". Attributes are optional. Pull what you need. Common subsets of data can be organized in fragments. If you're not doing that, why are you using GraphQL?

So I worked at Facebook and may be a bit biased here because I encountered a couple of ex-Netflix engineers in my time who basically wanted to throw away FB's internal infrastructure and reinvent Netflix microservices.

Anyway, at FB there is a Video GraphQL object. There aren't 23 or 7 or even 2.

Data storage for most things was via a write-through in-memory graph database called TAO that persisted things to sharded MySQL servers. On top of this, you'd use EntQL to add a bunch of behavior to TAO like permissions, privacy policies, observers and such. And again, there was one Video entity. There were offline data pipelines that would generally process logging data (i.e. outside TAO).

Maybe someone more experienced with microservices can speak to this: does UDA make sense? Is it solving an actual problem? Or just a self-created problem?


I think they are just trying to put in place the common data model that, as you point out, they need.

(So their micro services can work together usefully and efficiently -- I would guess that currently the communication burden between microservice teams is high and still is not that effective.)

> The whole point of GraphQL is to create a unified view of something

It can do that, but that's not really the point of GraphQL. I suppose you're saying that's how it was used at FB. That's fine, IMO, but it sounds like this NF team decided to use something more abstract for the same purpose.

I can't comment on their choices without doing a bunch more analysis, but in my own experience I've found off-the-shelf data modeling formats have too much flexibility in some places (forcing you to add additional custom controls or require certain usage patterns) and not enough in others (forcing you to add custom extensions). The nice thing about your own format is you can make it able to express everything you want and nothing you don't. And have a well-defined projection to Graphql (and sqlite and oracle and protobufs and xml and/or whatever other thing you're using).


I totally agree. Especially with Fusion it’s very easy to establish core types in self-contained subgraphs and then extend those types in domain-specific subgraphs. IMO the hardest part about this approach is just namespacing all the things, because GraphQL doesn’t have any real conventions for organizing service- (or product-) specific types.


> The whole point of GraphQL is to create a unified view of something, not to have 23 different versions of "Movie".

GraphQL is great at federating APIs, and is a standardized API protocol. It is not a data modeling language. We actually tried really hard with GraphQL first.


>at the end of the day, Netflix is encoding and serving several thousand videos via a CDN. It can't be this hard

Yeah maybe 10 years ago, but today Netflix is one of the top production companies on the planet. In the article, they even point to how this addresses their issues in content engineering

https://netflixtechblog.com/netflix-studio-engineering-overv...

https://netflixtechblog.com/globalizing-productions-with-net...


So I've worked for Google (and Facebook) and it really drives home just how cheap hardware is and how rarely optimizing code is worth the time.

More than a decade ago, Google had to start managing their resource usage in data centers. Every project has a budget: CPU cores, hard disk space, flash storage, hard disk spindles, memory, etc. And these are generally convertible to each other, so you can see the relative cost.

Fun fact: even though at the time flash storage was ~20x the cost of hard disk storage, it was often net cheaper because of the spindle bottleneck.

Anyway, all of these things can be turned into software engineer hours, often called "milli-SWEs", meaning a thousandth of the effort of 1 SWE for 1 year. So projects could save on hardware and hire more people, or hire fewer people but get more hardware, within their current budgets.

I don't remember the exact number of CPU cores that amounted to a single SWE but IIRC it was in the thousands. So if you spend 1 SWE-year working on optimization across your project and you're not saving 5000 CPU cores, it's a net loss.
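To make the arithmetic concrete, here's a toy version of that break-even check; the 5000 figure is from my memory above, so treat it as purely illustrative:

    package capacity

    // worthOptimizing reports whether an optimization pays for itself, using
    // a hypothetical conversion rate of ~5000 core-years per SWE-year.
    func worthOptimizing(sweYearsSpent, coreYearsSaved float64) bool {
        const coreYearsPerSWEYear = 5000 // illustrative, not an official number
        return coreYearsSaved >= sweYearsSpent*coreYearsPerSWEYear
    }

    // worthOptimizing(1, 2000) == false: a SWE-year spent saving 2,000 cores
    // is a net loss at that rate; 5,000+ cores is the break-even point.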

Some projects were incredibly large and used much more than that so optimization made sense. But so often it didn't, particularly when whatever code you wrote would probably get replaced at some point anyway.

The other side of this is that there is (IMHO) a general usability problem with the Web in that it simply shouldn't take the resources it does. If you know people who did, or still do, data entry for their jobs, you'll know that the mouse is pretty inefficient. The old text-based terminals from 30-40+ years ago had some incredibly efficient interfaces at a tiny fraction of the resource usage.

I had expected that at some point the Web would be "solved" in the sense that there'd be a generally expected technology stack and we'd move on to other problems but it simply hasn't happened. There's still a "framework of the week" and we're still doing dumb things like reimplementing scroll bars in user code that don't work right with the mouse wheel.

I don't know how to solve that problem or even if it will ever be "solved".


I worked there too and you're talking about performance in terms of optimal usage of CPU on a per-project basis.

Google DID put a ton of effort into two other aspects of performance: latency, and overall machine utilization. Both of these were top-down directives that absorbed a lot of time and attention from thousands of engineers. The salary costs were huge. But, if you're machine constrained you really don't want a lot of cores idling for no reason even if they're individually cheap (because the opportunity cost of waiting on new DC builds is high). And if your usage is very sensitive to latency then it makes sense to shave milliseconds off because of business metrics, not hardware $ savings.


The key part here is "machine utilization" and absolutely there was a ton of effort put into this. I think before my time servers were allocated to individual projects, but even early on in my time at Google, Borg had already adopted shared machine usage and there was a whole system of resource quotas implemented via cgroups.

Likewise there have been many optimization projects, and they used to call these out at TGIF. No idea if they still do. One I remember was reducing Stubby's UDP health checks; given that every single Google product uses Stubby extensively, even a small (5%? I forget) reduction in UDP traffic amounted to 50,000+ cores, which is (and was) absolutely worth doing.

I wouldn't even put latency in the same category as "performance optimization" because you often decrease latency by increasing resource usage. For example, you may send duplicate RPCs and wait for the fastest to reply. That can double or triple the work.
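A sketch of that hedged-request idea in Go, not any particular internal implementation; fetch stands in for whatever RPC stub you'd actually use:

    package hedging

    import "context"

    // raceRPC sends the same request to two replicas and returns whichever
    // answers first, trading extra load for lower tail latency.
    func raceRPC(ctx context.Context, fetch func(context.Context) (string, error)) (string, error) {
        ctx, cancel := context.WithCancel(ctx)
        defer cancel() // tell the slower replica to stop once we have a winner

        results := make(chan string, 2) // buffered so the losing goroutine never leaks
        errs := make(chan error, 2)
        for i := 0; i < 2; i++ {
            go func() {
                if r, err := fetch(ctx); err != nil {
                    errs <- err
                } else {
                    results <- r
                }
            }()
        }

        var lastErr error
        for i := 0; i < 2; i++ {
            select {
            case r := <-results:
                return r, nil
            case lastErr = <-errs:
            }
        }
        return "", lastErr // both replicas failed
    }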


Except you’re self-selecting for a company that has high engineering costs, big fat margins to accommodate expenses like additional hardware, and lots of projects for engineers to work on.

The evaluation needs to happen at the margins: even if it only saves pennies on the dollar per year, it’s better to have those engineers doing that than have them idling.

The problem is that almost no one does this, because the way we make these decisions has nothing to do with the economic calculus behind them; most people just do “what Google does”, which explains a lot of the dysfunction.


I think the parent's point is that if Google with millions of servers can't make performance optimization worthwhile, then it is very unlikely that a smaller company can. If salaries dominate over compute costs, then minimizing the latter at the expense of the former is counterproductive.

> The evaluation needs to happen at the margins: even if it only saves pennies on the dollar per year, it’s better to have those engineers doing that than have them idling.

That's debatable. Performance optimization almost always leads to increased complexity. Doubled performance can easily mean quadrupled complexity, so one has to consider whether the maintenance burden is worth the extra performance.


> it is very unlikely that a smaller company can.

I think it's the reverse: a small company doesn't have the liquidity, buying power, or ability to convert more resources into more money the way Google does.

And of course a lot of small companies will be paying Google a fat margin to use its cloud.

Getting by with fewer resources, or even with reduced on-premises hardware, will be a much bigger win. That's why they'll pay a full-time DBA to optimize their databases when it reduces costs by 2-3x the salary, or have a full team of infra people mostly dealing with SRE and performance.


> If salaries dominate over compute costs, then minimizing the latter at the expense of the former is counterproductive.

And with client-side software, compute costs approach zero (as the company isn’t paying for them).


> I don't remember the exact number of CPU cores amounted to a single SWE but IIRC it was in the thousands.

I think this probably holds true for outfits like Google because 1) on their scale "a core" is much cheaper than average, and 2) their salaries are much higher than average. But for your average business, even large businesses? A lot less so.

I think this is a classic "Facebook/Google/Netflix/etc. are in a class of their own and almost none of their practices will work for you"-type thing.


Maybe not to the same extent, but an AWS EC2 m5.large VM with 2 cores and 8 GB RAM costs ~$500/year (1 year reserved). Even if your engineers are being paid $50k/year, that's the same as 100 VMs or 200 cores + 800 GB RAM.


> I don't know how to solve that problem or even if it will ever be "solved".

It will not be “solved” because it’s a non-problem.

You can run a thought experiment imagining an alternative universe where human resources were directed towards optimization, and that alternative universe would look nothing like ours. One extra engineer working on optimization means one fewer engineer working on features. For what exactly? To save some CPU cycles? Don’t make me laugh.


Google doesn't come up with better compression and binary serialization formats just for fun--it improves their bottom line.


Google has over the years tried to get several new languages off the ground. Go is by far the most successful.

What I find fascinating is that all of them that come to mind were conceived by people who didn't really understand the space they were operating in and/or had no clear idea of what problem the language solved.

There was Dart, which was originally intended to be shipped as a VM in Chrome until the Chrome team said no.

But Go was originally designed as a systems programming language. There's a lot of historical revisionism around this now but I guarantee you it was. And what's surprising about that is that having GC makes that an immediate non-starter. Yet it happened anyway.

The other big surprise for me was that Go launched without external dependency management as a first-class citizen of the Go ecosystem. For the longest time there were two methods of declaring dependencies: either with URLs (usually GitHub) in the import statements or with badly supported manifests. Like, just copy what Maven did for Java. Not the bloated XML of course.

But Go has done many things right like having a fairly simple (and thus fast to compile) syntax, shipping with gofmt from the start and favoring error return types over exceptions, even though it's kind of verbose (and Rust's matching is IMHO superior).

Channels were a nice idea but I've become convinced that cooperative async-await is a superior programming model.

Anyway, Go never became the C replacement the team set out to make. If anything, it's a better Python in many ways.

Good luck to Ian in whatever comes next. I certainly understand the issues he faced, which essentially come down to managing political infighting and fiefdoms.

Disclaimer: Xoogler.


Some of us believe GC[0] isn't an impediment for systems programming languages.

They haven't taken off as Xerox PARC, ETHZ, DEC, Olivetti, Compaq and Microsoft desired, more due to politics, external or internal (in MS's case), than to technical impediments.

Hence why I like the way Swift and Java/Kotlin[1] are pushed on mobile OSes, to the point of "my way or get out".

I might argue about many of Go's decisions regarding minimalist language design; however, I will gladly advocate for its suitability as a systems language.

The kind of systems we used to program a few decades ago: compilers, linkers, runtimes, drivers, OS services, bare-metal deployments (see TamaGo), ...

[0] - Any form of GC, as per the computer science definition, not street knowledge.

[1] - The NDK is relatively constrained, and nowadays there is Kotlin Native as well.


> Channels were a nice idea but I've become convinced that cooperative async-await is a superior programming model.

Curious as to your reasoning around this? I've never heard this opinion before from someone not biased by their programming language preferences.


Sure. First you need to separate buffered and unbuffered channels.

Unbuffered channels basically operate like cooperative async/await but without the explicitness. In cooperative multitasking, putting something on an unbuffered channel is essentially a yield().

An awful lot of day-to-day programming is servicing requests. That could be HTTP, an RPC (eg gRPC, Thrift) or otherwise. For this kind of model, IMHO you almost never want to be dealing with thread primitives in application code. It's a recipe for disaster. It's so easy to make mistakes. Plus, you often need to make expensive calls of your own (eg reading from or writing to a data store of some kind), so there's not really a performance benefit.

That's what makes cooperative async/await so good for application code. The system should provide compatible APIs for doing network requests (etc). You never have to worry about out-of-order processing, mutexes, thread pool starvation or a million other issues.

Which brings me to the more complicated case of buffered channels. IME buffered channels are almost always a premature optimization that often hides concurrency issues. As in, if that buffered channel fills up you may deadlock where you otherwise wouldn't if the buffer weren't full. That can be hard to test for, or to find until it happens in production.
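A minimal sketch of that failure mode; the producer and the stalled handler here are hypothetical:

    package main

    import "time"

    func main() {
        events := make(chan int, 100) // "should be big enough"

        // Producer: runs ahead of the consumer until the buffer is full,
        // then silently blocks on the send, far from the actual bug.
        go func() {
            for i := 0; ; i++ {
                events <- i
            }
        }()

        // Consumer: stands in for a handler stuck on a slow downstream
        // call; nothing drains the channel any more, and under production
        // load the whole pipeline wedges.
        for range events {
            time.Sleep(time.Hour)
        }
    }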

But let's revisit why you're optimizing this with a buffered channel. It's rare that you're CPU-bound. If the channel consumer talks to the network any perceived benefit of concurrency is automatically gone.

So async/await doesn't let you buffer (and create bugs for little benefit) and otherwise acts like unbuffered channels. That's why I think it's a superior programming model for most applications.


Buffers are there to deal with flow variances. What you are describing as the "ideal system" is clockwork: your async/awaits are meshed gears. For this approach to be "ideal" it needs to uniformly handle the dynamic range of the load on the system, which means every part of the clockwork requires the same performance envelope (a little wheel spins so fast that it causes metal fatigue; a flow hits the performance ceiling of an intermediary component). So it either fails or limits the system's cyclical rate. These 'speed bumps' are, because of the clockwork approach, felt throughout the flow. That is why we put buffers between two active components: now we have a greater dynamic range of operation without speed bumps.

It shouldn't be too difficult to address testing of buffered systems at implementation time. Possibly with pragma/compile-time capabilities that allow injecting 'delay' on the sink side, to trivially create "full buffer" conditions and test for them.
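You can get most of the way there with an ordinary test that artificially slows the sink; a sketch, with made-up names and timings:

    package pipeline

    import (
        "testing"
        "time"
    )

    func TestProducerHandlesFullBuffer(t *testing.T) {
        events := make(chan int, 4) // deliberately tiny buffer

        // Slow sink: drains one event per 50ms so the buffer fills quickly.
        go func() {
            for range events {
                time.Sleep(50 * time.Millisecond)
            }
        }()

        // Producer under test: it must back off (here, time out) rather
        // than block forever once the buffer is full.
        for i := 0; i < 100; i++ {
            select {
            case events <- i:
            case <-time.After(10 * time.Millisecond):
                return // backpressure path exercised
            }
        }
        t.Fatal("buffer never filled; the full-buffer path was not exercised")
    }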

There are no golden hammers because the problem domain is not as simple as a nail. Tradeoffs and considerations. I don't think I will ever ditch either (shallow, preferred) buffers or channels. They have their use.


I agree with many of your points, including coroutines being a good abstraction.

The reality, though, is that you are directly fighting or reimplementing the OS scheduler.

I haven’t found an abstraction that does exactly what I want but unfortunately any sort of structured concurrency tends to end up with coloured functions.

Something like C++ stdexec seems interesting but there are still elements of function colouring in there if you need to deal with async. The advantage is that you can compose coroutines and synchronous code.

For me, I want a solution where I don’t need to care whether a function is running on the async event loop, a separate thread, a coprocessor or even a different computer, and the actor/CSP model tends to capture that best. Coroutines are an implementation detail and shouldn’t be exposed in an API, but that is a strong opinion.


As you probably know, Rust ended up with async/await. This video goes deep into that and the alternatives, and it changed my opinions a bit: https://www.youtube.com/watch?v=lJ3NC-R3gSI

Golang differs from Rust by having a runtime underneath. If you're already paying for that, it's probably better to do greenthreading than async/await, which is what Go did. I still find the Go syntax for this more bothersome and error-prone, as you said, but there are other solutions to that.


I can see the appeal for simplicity of concept and not requiring any runtime, but it has some hard tradeoffs. In particular the ones around colored functions and how that makes it feel like concurrency was sort of tacked onto the languages that use it. Being cooperative adds a performance cost as well which I'm not sure I'd be on board with.


“Systems programming language” is an ambiguous term and for some definitions (like, a server process that handles lots of network requests) garbage collection can be ok, if latency is acceptable.

Google has lots of processes handling protobuf requests written in both Java and C++. (Or at least, it did at the time I was there. I don’t think Go ever got out of third place?)


It's non-application software meant to support something else at run time. Like a cache, DBMS, webserver, runtime, OS, etc.


My working definition of "systems programming" is "programming software that controls the workings of other software". So kernels, hypervisors, emulators, interpreters, and compilers. "Meta" stuff. Any other software that "lives inside" a systems program will take on the performance characteristics of its host, so you need to provide predictable and low overhead.

GC[0] works for servers because network latency will dominate allocation latency, so you might as well use a heap scanner. But I wouldn't ever want to use GC in, say, audio workloads, where allocation latency is such a threat that even malloc/free has to be isolated into a separate thread so that it can't block sample generation. And that also means anything the audio code lives in has to not use GC. So your audio code needs to be written in a systems language too, and nobody is going to want an OS kernel that locks up near OOM to go scrub many GBs of RAM.

[0] Specifically, heap-scanning deallocators; automatic refcounting is a different animal.


I wouldn’t include compilers in that list. A traditional compiler is a batch process that needs to be fast enough but isn’t particularly latency-sensitive; garbage collection is fine. Compilers can be, and are, written in high-level languages like Haskell.

Interpreters are a whole different thing. Go is pretty terrible for writing a fast interpreter since you can’t do low-level unsafe stuff like NaN boxing. It’s okay if performance isn’t critical.


Yes, you can via unsafe.

And if you consider K&R C a systems language, you would do it like back in the day, with a few hand-written helper functions in assembly.
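A rough sketch of the Go version: math.Float64bits/Float64frombits are thin wrappers over unsafe reinterpretation, and boxing actual pointers would additionally need unsafe.Pointer/uintptr tricks and would hide those pointers from the GC, which is the real hazard.

    package main

    import (
        "fmt"
        "math"
    )

    // Quiet NaN: exponent all ones plus the top mantissa bit set; the low
    // bits are free to carry a payload.
    const qnan = uint64(0x7FF8000000000000)

    func boxInt(v uint32) float64 {
        return math.Float64frombits(qnan | uint64(v))
    }

    func unboxInt(f float64) uint32 {
        return uint32(math.Float64bits(f)) // recover the low 32 payload bits
    }

    func main() {
        v := boxInt(42)
        fmt.Println(math.IsNaN(v), unboxInt(v)) // true 42
    }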


You don't (usually) inherit the performance characteristics of your compiler, but you do inherit the performance characteristics of the language your compiler implements.


So that fits, given that the Go compiler, linker, assembler and related runtime are all written in Go itself.

You mean an OS kernel, like Java Real Time running on bare metal, designed so that it can even tackle battleship weapons-targeting systems?

https://www.ptc.com/en/products/developer-tools/perc


From what I remember, Go started because a C++ application took 30 minutes to compile even when using Google's infrastructure. You could say they set out to create a systems programming language (they certainly thought so), but mostly I think the real goal was recreating C++ features without the compile times, and in that they were successful.


Is there a language that implements cooperative async/await patterns nicely?


JS, Rust


I mean, they claimed that one didn't need generics in the language for some 12 years or so ...

