I've had better experience with LLMs translating than any bespoke translation tool, oddly enough. LLMs seemingly have a very good handle on regional varieties. As an example, I've never found a good translator for Lebanese/Syrian Arabic dialects, but ChatGPT was able to easily translate for me, even getting right some lesser-used rural accent quirks which I didn't even know how to translate (similar to something like "y'all" in English).
To be fair, I wasn't using it in the way the parent comment described, for me I said: "this person speaking Lebanese/Syrian Arabic said something that sounded like [try my best to replicate the sentence]. What did they most likely mean?" and got a pretty much spot-on answer.
I wonder if this ability translates to other languages, but I wouldn't be able to tell. My Arabic is "good enough" to tell that the translations I got were good, but I'd be interested to hear from someone who knows more whether, for example, Fuzhounese translation is any good.
There's an odd trend with these sorts of posts where the author claims to have had some transformative change in their workflow brought about by LLM coding tools, but also seemingly has nothing to show for it. To me, using the most recent ChatGPT Codex (5.3 on "Extra High" reasoning), it's incredibly obvious that while these tools are surprisingly good at repetitive or locally-scoped tasks, they immediately fall apart when faced with the things that are actually difficult in software development, and require non-trivial amounts of guidance and hand-holding to get right. That can still be useful, but it's a far cry from the online discourse right now.
As a real-world example, I was told to evaluate Claude Code and ChatGPT Codex at my current job, since my boss had heard about them and wanted to know what they would mean for our operations. Our main environment is a C# and TypeScript monorepo with 2 products being developed, and even with a pretty extensive test suite and a nearly 100-line AGENTS.md file, all the models I tried basically fail or try to shortcut nearly every task I give them, even when using "plan mode" to give them time to come up with a plan before starting. To be fair, I was able to get them to work pretty well after giving extremely detailed instructions, monitoring the "thinking" output, and stopping them when I saw something wrong to correct it, but at that point I felt silly for spending all that effort just driving the bot instead of doing it myself.
It almost feels like this is some "open secret" which we're all pretending isn't the case too, since if it were really as good as a lot of people are saying there should be a massive increase in the number of high quality projects/products being developed. I don't mean to sound dismissive, but I really do feel like I'm going crazy here.
You're not going crazy. That is what I see as well. But, I do think there is value in:
- driving the LLM instead of doing it yourself: sometimes I just can't get the activation energy, and the LLM is always ready to go, so it gives me a kickstart
- doing things you normally don't know. I learned a lot of command-line tools and tricks by seeing what Claude does. Doing short scripts for stuff is super useful. Of course, the catch here is that if you don't know the stuff, you can't drive it very well. So you need to use these things in isolation.
- exploring alternative solutions. Stuff that by definition you don't know. Of course, some will not work, but it widens your horizon
- exploring unfamiliar codebases. It can ingest huge amounts of data so exploration will be faster. (But less comprehensive than if you do it yourself fully)
- maintaining change consistency. This, I think, is where it's just better than humans. If you have stuff you need to change in 2 or 3 places, you will probably forget one. LLMs are better at keeping details consistent (but not at big picture stuff, interestingly.)
For me the biggest benefit from using LLMs is that I feel way more motivated to try new tools because I don't have to worry about the initial setup.
I'd previously encountered tools that seemed interesting, but as soon as I tried getting one to run I'd find myself going down an infinite debugging hole. With an LLM I can usually explain my system's constraints, and the best models will give me a working setup from which I can begin iterating. The funny part is that most of these tools are usually AI-related in some way, but getting a functional environment often felt impossible unless you had really modern hardware.
Same. This weekend, I built a Flutter app and a Wails app just to compare the two. I would never have done either on my own due to the up-front boilerplate, and not knowing (nor really wishing to know) Dart.
> driving the LLM instead of doing it yourself: sometimes I just can't get the activation energy, and the LLM is always ready to go, so it gives me a kickstart
There is a counter-issue though: realizing mid-session that the model won't be able to deliver that last 10%, and now you have to either grok a dump of half-finished code or start from scratch.
My problem is that once I have coded a lot with the LLM and come across some problem that I just cannot solve with it, like a synchronization issue in my game, I have to get down into the weeds, and the effort feels gargantuan because I have mostly relied on the LLM.
If (and it's a big if) the LLM gives you something that kinda, sorta, works, it may be an easier task to keep that working, and make it work better, while you refactor it, than it would have been to write it from scratch.
That is going to depend a lot on the skillset and motivation of the programmer, as well as the quality of the initial code dump, but...
There's a lot to be said for working code. After all, how many prototypes get shipped?
> - maintaining change consistency. This, I think, is where it's just better than humans. If you have stuff you need to change in 2 or 3 places, you will probably forget one. LLMs are better at keeping details consistent (but not at big picture stuff, interestingly.)
I use Claude Code a decent amount, and I actually find that sometimes the opposite holds for me: it misses other areas that the change will impact, causing things to break. Sometimes when I go to test I need to point out that it missed something, or I notice in the planning phase that something is missing.
However, I do find that if you use the more powerful Opus model when planning, it considers things a lot more fully than it used to. This is actually one area where I have been seeing some very good improvements as the models and tooling improve.
In fact, I actually hope these AI tools keep getting better on the point you mention, as humans also have a "context limit". There are only so many small details I can remember about a codebase, so it's good if AI can "remember" or check these things.
I guess a lot of it can also depend on your codebase itself, how you prompt, and what kind of agents file you have. If you have a robust test suite, you can very easily have AI tools check their work to ensure things aren't being broken, and quickly fix them before even completing the task. If you don't have any testing, more could be missed. So I guess it's just like a human in some sense: if you give the AI a crappy codebase to work with, it may also create sloppy work.
> LLMs are better at keeping details consistent (but not at big picture stuff, interestingly.)
I think it makes sense? Unlike small details which are certain to be explicitly part of the training data, "big picture stuff" feels like it would mostly be captured only indirectly.
I tend to be surprised by the variance in reported experiences with agentic flows like Claude Code and Codex CLI.
It's possible some of it is due to codebase size or tech stack, but I really think there might be more of a human learning curve going on here than a lot of people want to admit.
I think I am firmly in the average of people who are getting decent use out of these tools. I'm not writing specialized tools to create agents of agents with incredibly detailed instructions on how each should act. I haven't even gotten around to installing a Playwright MCP (probably my next step).
But I've:
- created project directories with soft links to several of my employer's repos, and been able to answer several cross-project and cross-team questions within minutes, that normally would have required "Spike/Disco" Jira tickets for teams to investigate
- interviewed codebases along with product requirements to come up with very detailed Jira AC, and then, just for the heck of it, had the agent use that AC to implement the actual PR. My team still code-reviewed it but agreed it saved time
- in side projects, shipped several really valuable (to me) features that would have been too hard to consider otherwise, like generating PDF book manuscripts for my branching-fiction creative writing club, and launching a whole new website that had been mired in a half-done state for years
Really my only tricks are the basics: AGENTS.md, brainstorm with the agent, continually ask it to write markdown specs for any cohesive idea, and then pick one at a time to implement in commit-sized or PR-sized chunks. GPT-5.2 xhigh is a marvel at this stuff.
My codebases are Scala, Pekko, TypeScript/React, and LilyPond. Yeah, the best models even understand LilyPond now, so I can give one a lead sheet and have it arrange two-hand jazz piano exercises for me.
I generally think that if people can't reach the above level of success at this point in time, they need to think more about how to communicate better with the models. There's a real "you get out of it what you put into it" aspect to using these tools.
Is it annoying that I tell it to do something and it does about a third of it? Absolutely.
Can I get it to finish by asking it over and over to code review its PR or some other such generic prompt to weed out the skips and scaffolding? Also yes.
Basically these things just need a supervisor looking at the requirements and test results and evaluating the code in a loop. Sometimes that's a human; it can also absolutely be an LLM. Having a second LLM with limited context asking questions of the worker LLM works, more so when the outer loop is driven by code and not just a prompt.
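A hypothetical sketch of that code-driven outer loop. Everything here is an assumption for illustration: `worker` and `reviewer` stand in for real LLM API calls, and the "empty feedback means approved" convention is mine:

```python
# Hypothetical supervisor loop: a worker agent produces a result, a
# reviewer agent (or human) returns a list of issues, and we iterate
# until the review comes back clean or we exhaust the retry budget.
def supervise(task, worker, reviewer, max_rounds=5):
    feedback = []
    for _ in range(max_rounds):
        result = worker(task, feedback)    # produce or revise the code
        feedback = reviewer(task, result)  # empty list == approved
        if not feedback:
            return result
    raise RuntimeError("review never converged: " + "; ".join(feedback))


# Usage with stubbed agents: the worker "fixes" its output once it
# receives feedback, so the loop converges on the second round.
def stub_worker(task, feedback):
    return "patch-v2" if feedback else "patch-v1"

def stub_reviewer(task, result):
    return [] if result == "patch-v2" else ["still has TODO stubs"]
```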
For example, I'm working on some virtualization things where I want a machine to be provisioned with a choice of Linux distros and BSDs. In one prompt I asked for this list to be provisioned so that a certain SSH test would complete; it worked for several hours and now we're doing the code-review loop. At first it gave up on the BSDs and I had to poke it to actually finish with an idea it had already had; now I'm asking it to find bugs and it's highlighting many mediocre code decisions it has made. I haven't even tested it, so I'm not sure whether it's lying about anything working yet.
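The "SSH test completes" check in a setup like that usually boils down to polling until the guest's SSH port accepts connections. A minimal sketch of that readiness probe; the host, port, and timeouts are placeholders, not the commenter's actual setup:

```python
import socket
import time

def wait_for_port(host, port, timeout=120.0, interval=2.0):
    """Poll until a TCP port (e.g. sshd on 22) accepts connections.

    Returns True as soon as a connection succeeds, False if the
    deadline passes without one.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False
```

A real test would then run `ssh` against the guest once the port is up; this only covers the waiting part.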
I usually talk with the agent back and forth for 15 minutes and explicitly ask, "what corner cases do we need to consider, what blind spots do I have?" Then, once I feel like I've brain-dumped everything, plus sent some non-sensitive copy-and-paste, I ask it for a CLAUDE/AGENTS.md, and that's sufficient to one-shot 98% of cases.
The thing I've learned is that it doesn't do well at the big things (yet).
I have to break large tasks into smaller tasks, and limit the context and scope.
This is the thing that both Superpowers and Ralph [0] do well when they're orchestrating; the plans are broken down enough so that the actual coding agent instance doesn't get overwhelmed and lost.
It'll be interesting to see what Claude Code's new 1m token limit does to this. I'm not sure if the "stupid zone" is due to approaching token limits, or to inherent growth in complexity in the context.
[0] these are the two that I've experimented with, there are others.
ah, so cool. Yeah that is definitely bigger than what I ask for. I'd say the bigger risk I'm dealing with right now is that while it passes all my very strict linting and static analysis toolsets, I neglected to put detailed layered-architecture guidelines in place, so my code files are approaching several hundred lines now. I don't actually know if the "most efficient file size" for an agent is the same as for a human, but I'd like them to be shorter so I can understand them more easily.
Tell it to analyze your codebase for best practices and suggest fixes.
Tell it to analyze your architecture, security, documentation, etc. etc. etc. Install claude to do review on github pull requests and prompt it to review each one with all of these things.
Just keep expanding your imagination about what you can ask it to do. Think of it more like designing an organization: pin down the important things, provide code review and guard rails where it needs them, and let it work where it doesn't.
I wish we could track down the people who use agents to post. I’m sure “your human” thinks they are being helpful, but all they are doing is making this site worse.
No one is interested in the question of what an LLM can do to generate a brief post in the comments section of a website. Everyone has known that's possible for some time. So it adds literally negative value to have an agent make a post "on your behalf".
I can’t speak for anyone else, but Claude Code has been transformative for me.
I can’t say it’s led to shipping “high quality projects”, but it has let me accomplish things I just wouldn’t have had time for previously.
I’ve been wanting to develop a plastic -> silicone -> plaster -> clay mold making process for years, but it’s complex and mold making is both art and science. It would have been hundreds of hours before, with maybe 12 hours of Claude code I’m almost there (some nagging issues… maybe another hour).
And I had written some home automation stuff back with Python 2.x a decade ago; it was never worth the time to refamiliarize myself with in order to update, which led to periodic annoyances. 20 minutes, and it’s updated to all the latest Python 3.x and modern modules.
For me at least, the difference between weeks and days, days and hours, and hours and minutes has allowed me to do things I just couldn’t justify investing time in before. Which makes me happy!
So maybe some folks are “pretending”, or maybe the benefits just aren’t where you’re expecting to see them?
I’m trying to pivot my career from web/business app dev entirely into embedded, despite the steep learning curve and many new frameworks and toolchains, because I now have a full-time, infinitely patient tutor, and I dare say it’s off to a pretty good start so far.
If you want to get into embedded you’d be better suited learning how to use an o-scope, a meter, and asm/c. If you’re using any sort of hardware that isn’t “mainstream” you’ll be pretty bummed at the results from an LLM.
Why not both? An LLM as a tutor, for the o-scope, meter, and assembly is pretty good at getting you unstuck. It doesn't have to do everything for you. It can do the parts you're not interested in and you can focus on the parts that are interesting to you.
I asked an LLM (a Google search LLM result) how to install Steam on Rocky 9 without Flatpak this evening, and it completely fucked it up. The correct answer was 3 dnf commands I found on Reddit.
I don't know if I'd trust an LLM to teach an o-scope.
I got into embedded 10 years ago, there really is something about driving hardware directly that is just so rewarding.
For AI I've been using Cecli, which is a CLI and can actually run the compile step and then fix any errors it finds, in addition to using the Context7 MCP for syntax.
Not quite 10x yet, but productivity has improved for me many times over. It's just about how you use the tools available.
That’s where it really shines. I have a backlog of small projects (~1-2 kLOC state machines, sensors, loggers) and instead of spending 2-3 days I can usually knock them out in half a day. So they get done. On these projects, it is an infinite improvement, because I simply wouldn’t have done them otherwise, unable to justify the cost.
But on bigger stuff, it bogs down and sometimes I feel like I’m going nowhere. It gets done eventually, though, and I end up with better-structured, better-documented code. Not because it would be better structured and documented if I left it to its own devices, but because that is the best way to get performance out of LLM assistance in code.
The difference now is twofold: First, things like documentation are now -effortless-. Second, the good advice you learned about meticulously writing maintainable code no longer slows you down, now it speeds you up.
Just explicitly prioritize separation of concerns, with strict API modularity between them: break everything into single-concern chunks with good APIs. It’s less about re-use and more about containment, documentation, and testability. Also invest more time in ensuring that your data structures are a mirror of the solution space. That will pay huge dividends in better code.
These things have always been true, but now they also enable AI development, so instead of accumulating technical debt for expedience's sake, we get paid an efficiency subsidy in productivity for doing it right (or rather, for herding the gerbils to do it right).
At work I use it on giant projects, but it’s less impressive there.
My mold project is around 10k lines of code, still small.
But I don’t actually care about whether LLMs are good or bad or whatever. All I care about is that I am completing things that I wasn’t able to even start before. It doesn’t really matter to me if that doesn’t count for some reason.
> I’ve been wanting to develop a plastic -> silicone -> plaster -> clay mold making process for years, but it’s complex and mold making is both art and science. It would have been hundreds of hours before, with maybe 12 hours of Claude code I’m almost there (some nagging issues… maybe another hour).
That’s so nebulous and likely just plain wrong. I have some experience with silicone molds and casting silicone and other materials. I have no idea how you’d accurately estimate it would take hundreds of hours. The most likely reason you’ve had results is that you just did it.
This sounds very very much like confirmation bias. “I started drinking pine needle tea and then 5 days later my cold got better!”
I use AI, it’s useful for lots of things, but this kind of anecdote is terrible evidence.
You may just be more knowledgeable than me. For me, even getting to algorithmic creation of 4-6 part molds, plus alternating negatives / positives in the different mediums, was insurmountable.
I’m willing to believe that I’m just especially clueless and this is not a meaningful project to an expert. But hey, I’m printing plastic negatives to make silicone positives to make plaster negatives to slip cast, which is what I actually do care about.
I had no idea you were talking about algorithmically making molds.
You’re just talking about taking a positive 3d model and automatically creating a mold for it that you 3d print?
If so I wouldn’t want that to be algorithmic because that’s never going to work in the general case. There are just too many edge cases that you have to manually handle. Might as well just create the mold in your CAD program.
Some? I'd be shocked if it's less than 70% of everything AI-related in here.
For example a lot of pro-OpenAI astroturfing really wanted you to know that 5.3 scored better than opus on terminal-bench 2.0 this week, and a lot of Anthropic astroturfing likes to claim that all your issues with it will simply go away as soon as you switch to a $200/month plan (like you can't try Opus in the cheaper one and realise it's definitely not 10x better).
"some", where "some" is scaled to match the overwhelmingly unprecedented amount of money being thrown behind all this. Plus, all of this is about a literal astroturfing machine, capable of unprecedented scale and an unprecedented ability to hide, which is extremely clearly being used at scale elsewhere / by others.
So yeah, it wouldn't surprise me if it was well over most. I don't actually claim that it is over half here; I've run across quite a few of these kinds of people in real life as well. But it wouldn't surprise me.
Pretty much every software engineer I've talked to sees it more or less like you do, with some amount of variance on exactly where you draw the line of "this is where the value prop of an LLM falls off". I think we're just awash in corporate propaganda and the output of social networks, and "it's good for certain things, mixed for others" is just not very memetic.
I wish this were true. My experience is co-workers who pay lip service to treating the LLM like a baby junior dev, only to near-vibe every feature and entire projects without spending so much as 10 minutes thinking on their own first.
It might be role-specific. I'm a solutions engineer. A large portion of my time is spent making demos for customers. LLMs have been a game-changer for me, because not only can I spit out _more_ demos, but I can handle more edge cases in demos that people run into. For example, someone wrote in asking how to use our REST API with Python.
I KNOW a common issue people run into is forgetting to handle rate limits, but I also know more JavaScript than Python and have limited time, so before, I'd write:
```
# NOTE: Make sure to handle the rate limit! This is just an example. See example.com/docs/javascript/rate-limit-example for a js example doing this.
```
Unsurprisingly, more than half of customers would just ignore the comment, forget to handle the rate limit, and then write in a few months later. With Claude, I just write "Create a customer demo in Python that handles rate limits. Use example.com/docs/javascript/rate-limit-example as a reference," and it gets me 95% of the way there.
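The remaining 5% is mostly checking that the generated retry logic is sane. For reference, a minimal sketch of the backoff pattern those demos need; `RateLimitError` and the delay constants are generic stand-ins for whatever the real API client raises, not this company's actual SDK:

```python
import time

class RateLimitError(Exception):
    """Stand-in for the 429 error a real API client would raise."""

def call_with_backoff(fn, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Retry `fn` on rate-limit errors with exponential backoff.

    `sleep` is injectable so tests don't have to actually wait.
    """
    for attempt in range(max_retries):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_retries - 1:
                raise  # out of retries; surface the error to the caller
            sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...
```

In a real demo, `fn` would be a closure over the actual HTTP call, translating a 429 response into `RateLimitError`.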
There are probably 100 other small examples like this where I had the "vibe" to know where the customer might trip over, but not the time to plug up all the little documentation example holes myself. Ideally, yes, hiring a full-time person to handle plugging up these holes would be great, but if you're resource constrained paying Anthropic for tokens is a much faster/cheaper solution in the short term.
Yup, LLMs rock for smaller, more greenfield stuff like this. As long as you can get your results in 5-10 interactions with the bot, it works really well.
They seem to fall apart (for me, at least) when the projects get larger or have multiple people working on them.
They're also super helpful for analytics projects (I'm a data person), as the needed context is generally much smaller, and because I know exactly how to approach these problems; it's just that typing the code and handling API changes takes a bunch of time.
In addition to never providing examples, the other common theme is when you dive into the author's history almost 100% of the time they just happen to work for a company that provides AI solutions. They're never just a random developer that found great use for AI, they're always someone who works somewhere that benefits from promoting AI.
In this author's case, they currently work for a company that .. wait for it .. less than 2 weeks ago launched some "AI image generation built for teams" product. (Also, oddly, the author lists himself as the 'Technical Director' at the company, working there for 5-6 years, but the company's Team page doesn't list him as an employee).
At my work I interview a lot of fresh grads and interns, and I have been doing that consistently for the last 4 years. During the interviews I always ask the candidates to show and tell: share their screen and talk about their projects and work at school and other internships.
In the last few months, I have seen a notable difference in the quality and extent of the projects these students have been able to accomplish. Every project and website they show looks polished; most could have been a full startup MVP in pre-AI days.
The bar has clearly been raised way high, very fast with AI.
I’ve had the same experience with the recent batch of candidates for a Junior Software Engineer position we just filled. Their projects looked impressive on the surface and seemed very promising.
Once we got them into a technical screening, most fell apart writing code. Our problem was simple: using your preferred programming language, model a shopping cart object that can add and remove items and track the cart total.
We were shocked by how incapable most candidates were of writing simple code without their IDE's tab completion. We even told them to use whatever resources they normally used.
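For scale, the exercise is only about this much code. A minimal sketch in Python; the class and method names are my own choices, not the interview's required API:

```python
# A shopping cart that can add and remove items and track its total.
class ShoppingCart:
    def __init__(self):
        self._items = {}  # name -> (unit_price, quantity)

    def add(self, name, price, quantity=1):
        _, existing = self._items.get(name, (price, 0))
        self._items[name] = (price, existing + quantity)

    def remove(self, name, quantity=1):
        price, existing = self._items[name]  # KeyError if not in cart
        if existing <= quantity:
            del self._items[name]
        else:
            self._items[name] = (price, existing - quantity)

    @property
    def total(self):
        return sum(price * qty for price, qty in self._items.values())
```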
In my opinion, it has always been the “easy” part of development to make a thing work once. The hard thing is to make a thousand things work together over time with constantly changing requirements, budgets, teams, and org structures.
For the former, greenfield projects, LLMs are easily a 10x productivity improvement. For the latter, it gets a lot more nuanced. Still amazingly useful in my opinion, just not the hands off experience that building from scratch can be now.
As others have said, the benefit is speed, not quality. And in my experience you get a lot more speed if you’re willing to settle for less quality.
But the reason you don’t see a flood of great products is that the managerial layer has no idea what to do with massively increased productivity (velocity). Ask even a Google what they’d do with doubly effective engineers and the standard answer is to lay half of them off.
> if it were really as good as a lot of people are saying there should be a massive increase in the number of high quality projects/products being developed.
The headline gain is speed. Almost no-one's talking about quality - they're moving too fast to notice the lack.
I find these agents incredibly useful for eliminating time spent on writing utility scripts for data analysis or data transformation.
But... I like coding. Getting relegated to being a manager 100% of the time? Sounds like a prison to me, not freedom.
That they are so good at the things I like to do the least and still terrible at the things at which I excel. That's just gravy.
But I guess this is in line with how most engineers transition to management sometime in their 30s.
> ... but also seemingly has nothing to show for it
This x1000, I find it so ridiculous.
usually when someone hypes it up, it's things like "I have it text my gf good morning every day!!" or "it analyzed every single document on my computer and wrote me a poem!!"
The crazy pills you are taking is thinking people have anything to prove to you. The C compiler that Anthropic created (or whatever verb you want to use) should prove that Claude is capable of making reasonably complex software. The problem is people have egos, myself included. Not in the inflated sense, but in the "I built a thing and now the Internet is shitting on me and I feel bad" sense. There's fundcli and nitpick on my GitHub that I created using Claude. fundcli looks at your shell history and suggests places to donate to, to support open source software you actually use. Nitpick is a TUI HN client. I've shipped others. The obvious retort is that those two things aren't "real" software; they're not complex, they're not making me any money. In fact, fundcli is costing me piles of money! As much as I can give it! I don't need anyone to tell me that or shit on the stuff I'm building.
The "open secret" is that shipping stuff is hard. Who hasn't bought a domain name for a side project that didn't go anywhere? If there's anybody out there, raise your hand! So there's another filtering effect.
The crazy pills are thinking that HN is in any way representative of anything about what's going on in our broader society. Those projects are out there, why do you assume you'll be told about it? That someone's going to write an exposé/blog post on themselves about how they had AI build a thing and now they're raking in the dollars and oh, buy my course on learning how to vibecode? The people selling those courses aren't the ones shipping software!
> The C compiler that Anthropic created (or whatever verb you want to use) should prove that Claude is capable of making reasonably complex software.
I don't doubt that an LLM would theoretically be capable of doing these sorts of things, nor did I intend to give that sentiment; rather, I was evaluating whether it is as practical as some people seem to be making the case for. For example, a C compiler is very impressive, but it's clear from the blog post[0] that this required a massive amount of effort setting things up, constant monitoring, and working around limitations of Claude Code and whatnot, not to mention $20,000. That doesn't seem at all practical, and I wonder if Nicholas Carlini (the author of the Anthropic post) would have had more success using Claude Code alongside his own abilities for significantly cheaper. While it might seem like moving the goalposts, I don't think it's the same thing to compare what I was saying with the fact that a multi-billion-dollar corporation whose entire business model relies on it can vibe-code a C compiler with $20,000 worth of tokens.
> The problem is people have egos, myself included. Not in the inflated sense, but in the "I built a thing a now the Internet is shitting on me and I feel bad" sense.
Yes, this is actually a good point. I do feel like there's a self report bias at play here when it comes to this too. For example, someone might feel like they're more productive, but their output is roughly the same as what it was pre-LLM tooling. This is kind of where I'm at right now with this whole thing.
> The "open secret" is that shipping stuff is hard. Who hasn't bought a domain name for a side project that didn't go anywhere. If there's anybody out there, raise your hand! So there's another filtering effect.
My hand is definitely up here, shipping is very hard! I would also agree that it's an "open secret", especially given that "buying a domain name for a side project that never goes anywhere" is such a universal experience.
I think both things can be true though. It can be true that these tools are definitely a step up from traditional IDE-style tooling, while also being true that they are not nearly as good as some would have you believe. I appreciate the insight, thanks for replying.
> I wonder if Nicholas Carlini (the author of the Anthropic post) would have had more success using Claude Code alongside his own abilities for significantly cheaper.
You're thinking like an individual, not a corporation. $20,000 is a lot of money for me to pay as an individual; that's a car for most of America! Next to what a corporation pays an engineer per year, however, it's peanuts. Thus Mr. Carlini (who surely costs vastly more than $20,000/year) being able to do what would previously have taken a team of people is nothing short of astounding. I don't know how well the compiler stacks up against, say, clang or gcc; the real question is how much it cost Intel to make v0.1 of icc.
> For example, someone might feel like they're more productive, but their output is roughly the same as what it was pre-LLM tooling.
There is just no comparison. It's not about how much faster it is; it's about: could I have attempted this project before? Yes. Would I have attempted it? Probably not! The start-up cost for a project was just so high that I have a list of things that I'd love to attempt but never made the time for. With AI, I'm slowly knocking things off that list (most of them don't actually go anywhere, but there's an itch to scratch, as a hobby).
> not nearly as good as some would have you believe.
Hallucinations from LLMs are interesting as a concept, but they can hardly be blamed for it, as they learned the ability from humans. (Some) humans love to blow smoke up your ass in pursuit of the almighty dollar. LLMs have their limitations. There's some prognostication about the future, but I'm interested in what they can do today.
If people make extraordinary claims, I expect extraordinary proofs…
Also, there is nothing complex in a C compiler. As students we built these things as toy projects at uni, without any knowledge of software development practices.
> The reality: 3 weeks in, ~50 hours of coding, and I'm mass-producing features faster than I can stabilize them. Things break. A lot. But when it works, it works.
Making predictions about the future is always fascinating, because you get to see what someone got wrong or right. You see this as the apex of hype; I think we're at the point before the exponential growth happens.
Matches my experience pretty well. FWIW, this is the opinion that I hear most frequently in real life conversation. I only see the magical revelation takes online -- and I see a lot of them.
LLMs have made a huge transformative change in my coding. For some projects 95% of the code is written by LLMs. This is all on internal projects and internal tools right now, though, because on the external projects I'm still easing into using it in a very carefully curated way, e.g. a method or an algorithm at a time, rather than a 10KLOC folder full of class files. These internal products are 95% of the work being done, though. It's just that they are under tight control when they are running locally and bugs and crashes are immediately visible and it's easy to debug and deploy fixes, unlike with say web-based stuff on a remote server.
So, I've very little to publicly show for all my obnoxious LLM advocacy. I wonder if any others are in the same boat?
> To be fair, I was able to get it to work pretty well after giving it extremely detailed instructions and monitoring the "thinking" output and stopping it when I see something wrong there to correct it, but at that point I felt silly for spending all that effort just driving the bot instead of doing it myself.
This is the challenge I also face: it's not always obvious when a change I want will be properly understood by the LLM. Sometimes it one-shots it; other times I go back and forth until I could have just done it myself. If we have to get super detailed in our descriptions, at what point are we just writing in some ad-hoc "programming language" that then transpiles to the actual program?
Maybe it is language specific? Maybe LLMs have a lot of good JavaScript/TypeScript samples for training and it works for those devs (e.g. me). I heard that Scala devs have problems with LLMs writing code too. I am puzzled by good devs not managing to get LLMs to work for them.
I definitely think it's language specific. My memory may deceive me here, but I believe that LLMs are infinitely better at pumping out Python scripts than Java. Now, I have much, much more experience with Java than Python, so maybe it's just a case of what you don't know... However, the tools it writes in Python just work for me, and I can incrementally improve them, and the tools get steadily better and more aligned with what I want.
I then ask it to do the same thing in Java, and it spends half an hour trying to do the same job and gets caught on some bit of trivia around how to convert HTML escape characters, for instance s.replace("&lt;", "<").replace("&gt;", ">").replace("&quot;", "\"").replace("&amp;", "&");, and it endlessly compiles and fails over and over again, never able to figure out what it has done wrong, nor deciding to give up on the minutia and continue with the more important parts.
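For what it's worth, the entity conversion the model kept fumbling fits in a few lines. Here's a minimal stand-alone sketch (the class name and the small entity set are my own illustration, not from the comment above); the one subtlety is ordering:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class Unescape {
    // Insertion order matters: handle "&amp;" last when unescaping, so the
    // literal input "&amp;lt;" becomes "&lt;" rather than being
    // double-unescaped into "<".
    private static final Map<String, String> ENTITIES = new LinkedHashMap<>();
    static {
        ENTITIES.put("&lt;", "<");
        ENTITIES.put("&gt;", ">");
        ENTITIES.put("&quot;", "\"");
        ENTITIES.put("&amp;", "&");
    }

    public static String unescape(String s) {
        // String.replace does literal (non-regex) substitution.
        for (Map.Entry<String, String> e : ENTITIES.entrySet()) {
            s = s.replace(e.getKey(), e.getValue());
        }
        return s;
    }

    public static void main(String[] args) {
        System.out.println(unescape("a &lt; b &amp;&amp; c &gt; d")); // prints: a < b && c > d
    }
}
```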
Maybe it's because there's no overall benefit to these things.
There's been a lot of talk about it for the past few years but we're just not seeing impacts. Oh sure, management talk it up a lot, but where's the corresponding increase in feature delivery? Software stability? Gross profit? EBITDA?
Give me something measurable and I'll consider it.
I feel the same way, but I'm not too dismissive of it in public, because I haven't given enough dollars to the gold rush shovel sellers to really try the best models.
I'm mostly a freeloader, so how could I judge people who put in the tokens equivalent to 15 years worth of electricity (incl heating and hot water) bills for my home in a C compiler?
Well, I can see that Anthropic is still an AI company, not a software company, they're granting us access to their most valuable resource that almost doesn't require humans, for a very reasonable fee, allowing us to profit instead of them. They're philanthropists.
I’m working on a solo project, a location-based game platform that includes games like Pac-Man you play by walking paths in a park. If I cut my coding time to zero, that might make me go two or three times faster. There is a lot of stuff that is not coding: designing, experimenting, testing, redesigning, completely changing how I do something, etc. There is a lot more to doing a project than just coding. I am seeing a big speed up, but that doesn’t mean I can complete the project in a week. (These projects are never really completed anyway, until you give up on them.)
I think it’s just very alien in that things which tend to be correlated in humans may not be so correlated in LLMs. So two things that we expect people to be similarly good at end up being very different in an AI.
It does also seem to me that there is a lot of variance in skills for prompting/using AI in general (I say this as someone who is not particularly good as far as I’m aware – I’m not trying to keep tips secret from you). And there is also a lot of variance in the ability for an AI to solve problem of equal difficulty for a human.
I like it because it lets me shoot off a text about making a plot I think about on the bus connecting some random data together. It’s nice having Claude code essentially anywhere. I do think that this is a nice big increment because of that. But also it suffers the large code base problems everyone else complains about. Tbh I think if its context window was ten times bigger this would be less of an issue. Usually compacting seems to be when it starts losing the thread and I have to redirect it.
> To be fair, I was able to get it to work pretty well after giving it extremely detailed instructions ...
What makes the difference is that agents can create these instructions themselves, monitor themselves, and revert actions that didn't follow instructions. You didn't get there because you stopped once you achieved satisfactory results with semi-manual solutions. But people who abhor manual work are getting there already.
I'd be curious if a middle layer like this [0] could be helpful? I've been working on it for some time (several iterations now, going back and forth between different ideas) and am hoping to collect some feedback.
The main difference could be that you have an existing code base (probably quite extensive and a bit legacy?). If the llm can start from scratch it will write code “in its own way”, that it can probably grasp and extend better than what is already there. I even have the impression that Claude can struggle with code that GPT-5 wrote sometimes.
I remember when Anthropic was running their Built with Claude contest on reddit. The submissions were few and let's just say less than impressive. I use Claude Code and am very pro-AI in general, but the deeper you go, the more glaring the limitations become. I could write an essay about it, but I feel like there's no point in this day and age, where floods of slop in fractured echo chambers dominate.
What I get out of this is that these models are trained on basic coding, not enterprise-level code where you have thousands and thousands of project files all intertwined and linked with dependencies. It didn't have access to all of that.
> it's incredibly obvious that while these tools are surprisingly good at doing repetitive or locally-scoped tasks, they immediately fall apart when faced with the types of things that are actually difficult in software development and require non-trivial amounts of guidance and hand-holding to get things right
I used this line for a long time, but you could just as easily say the same thing for a typical engineer. It basically boils down to "Claude likes its tickets to be well thought out". I'm sure there is some size of project where its ability to navigate the codebase starts to break down, but I've fed it sizeable ones and so long as the scope is constrained it generally just works nowadays
The difference is a real engineer will say "hey I need more information to give you decent output." And when the AI does do that, congrats, the time you spend identifying and explaining the complexity _is_ the hard time consuming work. The code is trivial once you figure out the rest. The time savings are fake.
I always find that characterization of Grey and the Cortex podcast to be weird. He never claims to be a productivity master or the most productive person around. Quite the opposite, he has said multiple times how much he is not naturally productive, and how he actually kinda dislikes working in general. The systems and habits are the ways he found to essentially trick himself into working.
Which I think is what people gather from him, but somehow think he's hiding it or pretending is not the case? Which I find strange, given how openly he's talked about it.
As for his productivity going down over time, I think that's a combination of his videos getting bigger scopes and production values, and also him moving some of his time into some not-so-publicly-visible ventures. E.g., he was one of the founders of Standard, which eventually became the Nebula streaming service (though he left quite a while ago now).
> Which I think is what people gather from him, but somehow think he's hiding it or pretending is not the case? Which I find strange, given how openly he's talked about it.
Well the person you're responding to didn't say anything like that. They're saying he's unqualified.
> The systems and habits are the ways he found to essentially trick himself into working.
And do they work? If he's failing or fooling himself then a big chunk of his podcasting is wasting everyone's time.
> videos getting bigger scopes and production values
I looked at a video from last year and one from eight years ago and they're pretty similar in production value. Lengths seem similar over time too.
> moving some of his time into some not so publicly visible ventures
I can see he's done three members-only videos in the last two years, in addition to four and a half public videos. Is there anything else?
> Well the person you're responding to didn't say anything like that. They're saying he's unqualified.
When they said "It's the appearance of productivity, not actual productivity.", that does very much sound to me like an accusation that he is pretending or trying to deceive you into thinking he's a super productive person.
> And do they work? If he's failing or fooling himself then a big chunk of his podcasting is wasting everyone's time.
I'm afraid I'm not close enough to Mr Grey to be able to confidently say one way or another. Everything seems to indicate that he is a fairly successful individual, as a YouTuber with a big following and founder of at least two companies that seems to be going pretty well. So unless he is incredibly lucky and keeps failing upwards, if I had to guess, I'd say he has had at least some success in making himself work on stuff from time to time.
> I looked at a video from last year and one from eight years ago and they're pretty similar in production value. Lengths seem similar over time too
Really? I mean, let's look at some concrete examples. His latest video [1] features many unique drawings, extensive animations, even some 3d stuff with the rotating globes, and almost every scene has an actual drawn background layer.
Meanwhile, one of his biggest videos from 9 years ago [2] is pretty much just a slideshow, with no animations, and most of the video features a static generic white background.
The overarching style (i.e. stick figures, no elaborate textures) is the same, and I guess this is a partially a subjective point, but I think it's a bit crazy to say the visuals in these two videos are of similar quality.
For an example of stuff other than just the animation itself, he put out the Rock Paper Scissors video [3] two years ago, which had an insanely huge scope (though that might not be obvious at first glance)
> I can see he's done three members-only videos in the last two years, in addition to four and a half public videos. Is there anything else?
By definition, I'm not aware of stuff he's not made public. I just know that there is stuff that he chooses not to talk much about (he never once mentioned the Standard stuff on his podcast, for example). He also handles a good portion of the backend stuff for the Cortex Brand line of products (I think managing/planning logistics/inventory?). I'm not a member of his channel or his Patreon so I can't tell you how much he invests in exclusive videos, or if there is some other work he discloses over those channels that he doesn't in others.
> Really? I mean, let's look at some concrete examples.
That's not his most recent video, it's a fix of a 2022 video. And the channel still had pretty good output 3-4 years ago.
I compared the nickels video instead, to the worst ID system in America, and they seemed to be similar levels of embellished slideshow.
> By definition, I'm not aware of stuff he's not made public.
I thought you meant paid access stuff and it's easy to see a list of those. If you're suggesting secret videos then uh maybe but that's kind of a weird assumption.
And whatever happened with standard was too long ago to be the problem here.
> He also handles a good portion of the backend stuff for the Cortex Brand line of products (I think managing/planning logistics/inventory?).
That might be the answer but it seems like a waste of his productivity potential.
> That's not his most recent video, it's a fix of a 2022 video.
That's fair, I didn't notice that.
> I compared the nickels video instead, to the worst ID system in America, and they seemed to be similar levels of embellished slideshow.
He still has videos that are simpler. But back then he had nothing that came even close to those big productions he releases from time to time.
> I thought you meant paid access stuff and it's easy to see a list of those. If you're suggesting secret videos then uh maybe but that's kind of a weird assumption.
I'm suggesting he may work on stuff other than videos. Like non-general public facing/non personality driven businesses. Like Cortex Brand, and the Standard stuff before it. He obviously talked a lot about the Cortex Brand stuff, but he kept Standard on the down low. I don't cite Standard as a reason that he is not putting out videos right now, I cite Standard as evidence he isn't necessarily shouting from the rooftops every time he creates a business. So it stands to reason that he may have had other similarly "secret" ventures over the years.
> That might be the answer but it seems like a waste of his productivity potential.
I don't consume their products (they seem nice but they're far too expensive for my third world salary), so selfishly I'd also prefer if he focused more of his time on the videos. But that's an entirely different conversation from "he just pretends to be productive and actually gets next to nothing done".
So you're walking into this hoping that it's an actual AI and not just an LLM?
Interesting.
How much planning do you put into your project without AI anyway?
Pretty much all the teams I've been involved in:
- never did any analysis planning, and just yolo it along the way in their PR
- every PR is an island, with tunnel vision
- fast forward 2 years, and we have to throw it out and start again.
So why are you thinking you're going to get anything different with LLMs?
And plan mode isn't just a single conversation that you then flip to do mode...
you're supposed to create detailed plans and research documents that you then make the LLM refer back to and align with.
Well, you might experience the same thing with a junior developer, but in the end the effort of training the junior is worth it, no? Only because you're developing a human? I have to say, doing the work instead of the junior because the junior makes mistakes is not a good route. So taking time to teach the agent? Maybe worth it...
Frankly, it sounds like you have a lot to learn about agentic coding. It’s hard to define exactly what makes some of us so good at using it, and others so poor, but agentic coding has been life changing for myself and the folks I’ve tutored on its use. We’re all using the same tools, but subtle differences can make a big difference.
The pattern matching and absence of real thinking is still strong.
Tried to move some excel generation logic from epplus to closedxml library.
ClosedXml has basically the same API so the conversion was successful. Not a one-shot but relatively easy with a few manual edits.
But ClosedXml has no batch operations (like applying a style to an entire column): the API is there, but the internal implementation works cell by cell. So if you have 10k rows and 50 columns, every style update is a slow operation.
Naturally, I told Codex 5.3 (max thinking level) all about this. The fucker still succumbed to range updates here and there.
Told it explicitly to make a style cache and reuse styles for cells in the same column.
5-6 attempts — fucker still tried ranges here and there. Because that is what is usually done.
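The pattern being asked for is simple to state. Here's a hedged sketch of the idea in Java with made-up types (CellStyle, StyleCache — not the real ClosedXml or EPPlus APIs): construct each column's style object at most once, then share that one instance across every row.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntFunction;

// Hypothetical stand-in for a spreadsheet style object; not a real library type.
final class CellStyle {
    final String fontName;
    final boolean bold;
    CellStyle(String fontName, boolean bold) {
        this.fontName = fontName;
        this.bold = bold;
    }
}

public class StyleCache {
    private final Map<Integer, CellStyle> byColumn = new HashMap<>();
    private int builds = 0; // counts how many style objects were actually constructed

    // Build the style for a column at most once, then hand back the shared
    // instance for every cell in that column.
    public CellStyle forColumn(int column, IntFunction<CellStyle> builder) {
        return byColumn.computeIfAbsent(column, c -> {
            builds++;
            return builder.apply(c);
        });
    }

    public int builds() { return builds; }

    public static void main(String[] args) {
        StyleCache cache = new StyleCache();
        for (int row = 0; row < 10_000; row++)
            for (int col = 0; col < 50; col++)
                cache.forColumn(col, c -> new CellStyle("Calibri", c == 0));
        System.out.println(cache.builds()); // 50 styles built for 500,000 cells
    }
}
```

With 10k rows and 50 columns, only 50 style objects ever get built, instead of one expensive per-cell style update per write.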
It's almost as if being able to generate boilerplate code is only like 5% of software development.
That being said, its great at generating boilerplate code or in my case, doing something like 'make a react component here please that does this small thing, and is aligned with the style in the rest of the file'. Good for when I need to work with code bases or technologies that are not my daily. Also a great research assistant.
But I guess being a 'better google' or a 'glorified spellchecker' doesn't get that hype money.
I think, unpopularly, that there are some fake comments in the discourse driven by financial incentives, plus a mix of fear-based "wanting to feel like things are OK" or dissonance-avoiding belief, and that's leading to the opinions we hear.
It also kinda feels gaslightish and, as I've said in some controversial replies in other posts, it's got eerie mass-"psychosis" vibes, just like during COVID.
This is a fresh perspective for me. I'm around 25 and have been struggling with finding some kind of path towards making my career into something sustainable long-term, but never really considered the other side. I think the issue many have on my end is that they don't really have much of anything to stand on while they rebuild yet, whereas they might think that someone more experienced could pivot to business and people-oriented roles by leveraging what they have now. I know many people personally struggling to find work as it is right out of school, and many have student loans which exacerbate the situation. For a lot of people, starting from scratch is not realistically feasible in the near future unless they're content with being homeless for a while.
Of course labor jobs will always exist, and a 25 year old would (on average) be much more physically able for that than someone older, so it goes both ways.
A mortgage: if you were assuming a strong income that would continue, you very likely could be forced to sell your house and take a huge loss
A family, kids: people relying on you
Time: at this point you have retirement plans and financial deadlines you need to hit if it's to ever become a reality
God forbid you have any health issues that cost $$$ which tend to come as you age. Can you afford to lose health insurance?
If you think about re-skilling and starting off at entry level.. people don't really want to hire older beginners.
Of course that's absolute worst case scenario, but I guarantee there are a lot of people there.
I'd 100% choose living out of my car for a while. In your 20s you can upend everything and completely reinvent yourself. Time, minimal responsibilities and energy are priceless
> could pivot to business and people-oriented roles by leveraging what they have now
There's a reason that's really vague, right? Because who knows if it'll be available
I don't think AI is gunna reach this point but who knows. It's not off the table
If enough people have nothing left to lose, the French Revolution will most likely be the outcome. Or a working UBI. If programmers aren't safe, I can't imagine most other professions won't be on the chopping block as well.
There's a lot of this forum in exactly that position. The fear is real; there is a real risk this AI destroys families and people's lives in the disruption.
I understand this perspective, but it's like... I would like to have a house and kids and all those things you mentioned, even if it was hard. That's not an option, financially, for a lot of young people
I'm becoming concerned with the rate at which major software systems seem to be failing as of late. For context, last year I only logged four outages that actually disrupted my work; this quarter alone I'm already on my fourth, all within the past few weeks. This is, of course, just an anecdote and not evidence of any wider trend (not to mention that I might not have even logged everything last year), but it was enough to nudge me into writing this today (helped by the fact that I suddenly had some downtime). Keep in mind, this isn't necessarily specific to this outage, just something that's been on my mind enough to warrant writing about it.
It feels like resiliency is becoming a bit of a lost art in networked software. I've spent a good chunk of this year chasing down intermittent failures at work, and I really underestimated how much work goes into shrinking the "blast radius", so to speak, of any bug or outage. Even though we mostly run a monolith, we still depend on a bunch of external pieces like daemons, databases, Redis, S3, monitoring, and third-party integrations, and we generally assume that these things are present and working in most places, which wasn't always the case. My response was to better document the failure conditions, and once I did, I realized there were many more than we initially thought. Since then we've done things like: move some things to a VPS instead of cloud services, automate deployment more than we already had, greatly improve the test suite and docs to cover these newly considered failure conditions, and generally cut down on moving parts. It was a ton of effort, but the payoff has finally shown up: our records show fewer surprises, which means fewer distractions and a much calmer system overall. Without that unglamorous work, things would've only grown more fragile as complexity crept in. And I worry that, more broadly, we're slowly un-learning how to build systems that stay up even when the inevitable bug or failure shows up.
For completeness, here are the outages that prompted this: the AWS us-east-1 outage in October (took down the Lightspeed R series API), the Azure Front Door outage (prevented Playwright from downloading browsers for tests), today’s Cloudflare outage (took down Lightspeed’s website, which some of our clients rely on), and the Github outage affecting basically everyone who uses it as their git host.
It's money, of course. No one wants to pay for resilience/redundancy. I've launched over a dozen projects going back to 2008, clients simply refuse to pay for it, and you can't force them. They'd rather pinch their pennies, roll the dice and pray.
> No one wants to pay for resilience/redundancy. I've launched over a dozen projects going back to 2008, clients simply refuse to pay for it, and you can't force them. They'd rather pinch their pennies, roll the dice and pray.
Well, fly by night outfits will do that. Bigger operations like GitHub will try to do the math on what an outage costs vs what better reliability costs, and optimize accordingly.
Look at a big bank or a big corporation's accounting systems, they'll pay millions just for the hot standby mainframes or minicomputers that, for most of them, would never be required.
> Bigger operations like GitHub will try to do the math on what an outage costs vs what better reliability costs, and optimize accordingly.
Used to, but it feels like there is no corporate responsibility in this country anymore. These monopolies have gotten so large that they don't feel any impact from these issues. Microsoft is huge and doesn't really have large competitors. Google and Apple aren't really competing in the source code hosting space in the same way GitHub is.
> Take the number of vehicles in the field, A, multiply it by the probable rate of failure, B, then multiply it by the result of the average out of court settlement, C. A times B times C equals X. If X is less than the cost of a recall, we don't do one.
> Look at a big bank or a big corporation's accounting systems
Not my experience. Every bank I've used, in multiple countries, has had multiple significant outages, including some where their cards failed to function. Do a search for "U.S. Bank outage" to see how many outages have happened so far this year.
Modern internet company backends are very complex, even on a good day they're at the outer limits of their designers' and operators' understanding, & every day they're growing and changing (because of all the money and effort that's being spent on them!). It's often a short leap to a state that nobody thought of as a possibility or fully grasped the consequences of. It's not clear that it would be practical with any amount of money to test or rule out every such state in advance. Some exciting techniques are being developed in that area (Antithesis, formal verification, etc) but that stuff isn't standard of care for a working SWE yet. Unit tests and design reviews only get you so far.
I've worked at many big banks and corporations. They are all held together with the proverbial sticky tape, bubblegum, and hope.
They do have multiple layers of redundancies, and thus have the big budgets, but they won't be kept hot, or there will be some critical flaws that all of the engineers know about but they haven't been given permission/funding to fix, and are so badly managed by the firm, they dgaf either and secretly want the thing to burn.
There will be sustained periods of downtime if their primary system blips.
They will all still be dependent on some hyper-critical system that nobody really knows how it works, the last change was introduced in 1988 and it (probably) requires a terminal emulator to operate.
I've worked on software used by these and have been called in to help support from time to time. One customer which is a top single digit public company by market cap (they may have been #1 at the time, a few years ago) had their SAP systems go down once every few days. This wasn't causing a real monetary problem for them because their hot standby took over.
They weren't using mainframes, just "big iron" servers, but each one would have been north of $5 million for the box alone, I guess on a 5ish year replacement schedule. Then there's all the networking, storage, licensing, support, and internal administration costs for it which would easily cost that much again.
Now people will say SAP systems are made entirely of duct tape and bubblegum. But it all worked. This system ran all their sales/purchasing sites and portals and was doing a million dollars every couple of minutes, so it all paid for itself many times over during the course of that bug. Cold standby would not have cut it, especially since these big systems take many minutes to boot and HANA takes even longer to load from storage.
These companies do take it seriously, on the software side, but when it comes to configurations, what are you going to do:
Either play it by ear, or literally double your cloud costs for a true, real prod-parallel to mitigate that risk. It looks like even the most critical and prestigious companies in the world are doing the former.
> Either play it by ear, or literally double your cloud costs for a true, real prod-parallel to mitigate that risk.
There's also the problem that doubling your cloud footprint to reduce the risk of a single point of failure introduces new risks: more configuration to break, new modes of failure when both infrastructures are accidentally live and processing traffic, etc.
Back when companies typically ran their own datacenters (or otherwise heavily relied on physical devices), I was very skeptical about redundant switches, fearing the redundant hardware would cause more problems than it solved.
I'm not sure it's only money. People could have a lot of simpler, cheaper software by relying on core (OS) features instead of rolling their own or relying on bloated third parties, but a lot don't, due to cargo culting.
And tech hype. Infrastructure to mitigate here isn't expensive. In many cases quite the opposite. The expensive thing is that you made yourself dependent on these services. Sometimes this is inevitable, but to host on GitHub is a choice.
…can I make the case that this might be reasonable? If you’re not running a hospital†, how much is too much to avoid a few hours of downtime around once a year?
† Hopefully there aren’t any hospitals that depends on GitHub being continuously available?
This is true. But unfortunately the exact same process is used even for critical stuff (the crowdstrike thing for example). Maybe there needs to be a separate swe process for those things as well, just like there is for aviation. This means not using the same dev tooling, which is a lot of effort.
To agree with the comments it seems likely it's money which has begun to result in a slow "un-learning how to build systems that stay up even when the inevitable bug or failure shows up."
I don't know anything about GitHub's codebase, but as a user, their software has many obvious deficiencies. The most glaring is performance. Oh my God, GitHub performs like absolute shit on large repos and big diffs.
Performance issues always scare me. A lot of the time it's indicative of fragile systems. Like with a lot of banking software - the performance is often bad because the software relies on 10 APIs to perform simple tasks.
I doubt this is the case with GitHub, but it still makes you wonder about their code and processes. Especially when it's been a problem for many years, with virtually no improvement.
Yep, this sums it up perfectly for me. I tend to stay away from the extra stuff since the quality is hit or miss (more often hit than miss to be fair), but really there’s something special about having something like it available. I think as a freely available package Nextcloud is immensely valuable to me. I never say anything bad about it without mentioning that in the same breath nowadays.
Nextcloud is something I have a somewhat love-hate relationship with. On one hand, I've used Nextcloud for ~7 years to backup and provide access to all of my family's photos. We can look at our family pictures and memories from any computer, and it's all private and runs mostly without any headaches.
On the other hand, Nextcloud is so far from being something like Google Docs, and I would never recommend it as a general replacement to someone who can't tolerate "jank", for lack of a better word. There are so many small papercuts you'll notice when using it as a power user. Right off the top of my head, uploading large files is finicky, and no amount of web server config tinkering gets it to always work; thumbnail loading is always spotty, and it's significantly slower than it needs to be (I'm talking orders of magnitude).
With all that said, I'm so grateful for Nextcloud since I don't have a replacement, and I would prefer not having all our baby and vacation pictures feeding some big corporation's AI. We really ought to have a safe, private place to store files in 2025 that the average person can wrap their head around. I only wish my family took better advantage of it, since I'm essentially providing them with unlimited storage.
That sounds really promising, maybe my family would be better suited to something like that.
I will say though, Nextcloud is almost painless when it comes to management. I’ve had one or two issues in the past, but their “all in one” docker setup is pretty solid, I think. It’s what I’ve been using for the last year or so.
I think the "local maximum" we've gotten stuck at for application hosting is having a docker container as the canonical environment/deliverable, and injecting secrets when needed. That makes it easy to run and test locally, but still provides most of the benefits I think (infrastructure-as-code setups, reproducibility, etc). Serverless goes a little too far for most applications (in my opinion), but I have to admit some apps work really well under that model. There's a nearly endless number of simple/trivial utilities which wouldn't really gain anything from having their own infrastructure and would work just fine in a shared or on-demand hosting environment, and a massively scaled stateless service would thrive under a serverless environment much more than it would on a traditional server.
That's not to say that I think serverless is somehow only for simple or trivial use cases though, only that there's an impedance mismatch between the "classic web app" model, and what these platforms provide.
You are ready for misterio: https://github.com/daitangio/misterio
A tiny layer around a stateless docker cluster.
I created it for my homelab and it went wild.
Docker is much like microservices. Appropriate for a subset of apps and yet touted as being 'the norm' when it shouldn't be.
There are drawbacks to using docker, such as security patching and operational overhead. And if you're blindly putting it into every project, how are you mitigating the risks it introduces?
Worse, the big reason it was useful, managing dependency hell, has largely been solved by making developers default to not installing dependencies globally.
We don't really need Docker anywhere near like we used to, and yet it persists as the default, unassailable.
Of course hosting companies must LOVE it, docker containers must increase their margins by 10% at least!
Someone else down thread has mentioned a tooling fetish, I feel Docker is part of that fetish.
It has downsides and risks involved, for sure. I think the security part is perhaps a bit overblown, though. In any environment, the developers either care about staying on top of security or they don't. In my experience, a dev team that skips proper security diligence when using Docker likely wouldn't handle it well outside of Docker either. The number of boxes out there running some old version of Debian that hasn't been patched in the last decade is probably higher than any of us would like.
Although I'm sure many people just do it because they believe (falsely) that it's a silver bullet, I definitely wouldn't call it part of a "tooling fetish". I think it's a reasonable choice much more often than the microservice architecture is.
Hard disagree. I've used Docker predominantly in monoliths, and it has served me well. Before that I used VMs (via Vagrant). Docker certainly makes microservices more tenable because of the lower overhead, but the core tenets of reproducibility and isolation are useful regardless of architecture.
There's some truth to this too honestly. At $JOB we prototyped one of our projects in Rust to evaluate the language for use, and only started using Docker once we chose to move to .NET, since the Rust deployment story was so seamless.
Haven't deployed production Java in years, so I won't speak to it. However, even with Go's static binaries, I'd like to leverage the same build and deploy process as other stacks. With Docker a Go service is no different than a Python service. With Docker, I use the same build tool, instrument health checks similarly, etc.
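To illustrate (paths and names here are hypothetical), even a statically linked Go binary can go through the same Dockerfile-shaped pipeline as every other stack:

```dockerfile
# Multi-stage build: compile the static Go binary, ship a minimal image.
FROM golang:1.22 AS build
WORKDIR /src
COPY . .
RUN CGO_ENABLED=0 go build -o /app ./cmd/server

FROM scratch
COPY --from=build /app /app
# Ports, health checks, etc. are declared the same way as for a Python service.
EXPOSE 8080
ENTRYPOINT ["/app"]
```

The binary didn't need a container to run, but now CI, deployment, and monitoring treat it identically to everything else.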
Standardization is major. Every major cloud has one (and often several) container orchestration services, so standardization naturally leads to portability. No lock-in. From my local to the cloud.
Even when running things in their own box, I likely want to isolate things from one another.
For example, different Python apps using different Python versions. venvs are nice but incomplete; you may end up using libraries with system dependencies.
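A venv can pin Python packages but not the interpreter version or system libraries; a Dockerfile covers both. A sketch (the app layout is hypothetical; `libpq` is chosen as an example of a system dependency that a Python library like psycopg2 needs):

```dockerfile
FROM python:3.11-slim            # pins the interpreter itself, not just packages
RUN apt-get update \
    && apt-get install -y --no-install-recommends gcc libpq-dev \
    && rm -rf /var/lib/apt/lists/*
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
CMD ["python", "main.py"]
```

Two apps on 3.9 and 3.11 with conflicting system libraries can then run side by side on the same host.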
I deeply disagree. Docker’s key innovation is not its isolation; it’s the packaging. There is no other language-agnostic way to say “here’s code, run it on the internet”. Solutions prior to Docker (eg buildpacks) were not so much language agnostic as they were language aware.
Even if you allow yourself the disadvantage that any non-Docker solution won’t be language-agnostic: how do you get the code bundle to your server? Zip & SFTP? How do you start it? ./start.sh? How do you restart under failure? Systemd? Congrats, you reinvented docker but worse. Want to upgrade a dependency due to a security vulnerability? Do you want to SSH into N replicated VMs and run your Linux distribution specific package update command, or press the little refresh icon in your CI to rebuild a new image then be done?
Docker is the one good thing the ops industry has invented in the last 15 years.
This is a really nice insight. I think years of linux have kind of numbed me to this. I've spent so much time on systems which use systemd now that going back to an Alpine Linux box always takes me a second to adjust, even though I know more or less how to do everything on there. I think docker's done a lot to help with that though since the interface is the same everywhere. A typical setup for me now is to have the web server running on the host and everything else behind docker, since that gives me the benefit of using the OS's configuration and security updates for everything exposed to the outside world (firewalls, etc).
Another thing about packaging. I've started noticing myself subconsciously adding even a trivial Dockerfile to most of my projects now, just in case I want to run them later without the hassle of installing anything. It gives me a "known working" copy which I can more or less rely on to run if I need to. It took a while for me to get to that point, though.
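For a small project that "trivial Dockerfile" really can be just a few lines (base image and entry point invented for illustration):

```dockerfile
FROM node:20-slim
WORKDIR /app
COPY . .
RUN npm ci
CMD ["node", "index.js"]
```

Not production-grade, but enough to freeze a runnable snapshot of the project.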
It's all the same stuff. Docker just wraps what you'd do in a VM.
For the slight advantage of deploying every server with a single line, you've still got to write the multi-line build script, just for docker instead. Plus all the downsides of docker.
There's another idea too, that docker is essentially a userspace service manager. It makes things like sandboxing, logging, restarting, etc the same everywhere, which makes having that multi-line build script more valuable.
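Those service-manager features show up directly as flags on the same command everywhere (the image name and limit values here are illustrative):

```shell
# Restart policy, log rotation, and resource sandboxing, uniform across hosts.
docker run -d \
  --restart unless-stopped \
  --log-opt max-size=10m --log-opt max-file=3 \
  --memory 512m --cpus 1.5 \
  registry.example.com/myapp:1.4.2
```

The equivalent under systemd would be a unit file per distribution convention; here it travels with the deploy command.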
In a sense it's just the "worse is better" solution[0], where instead of applying the good practices (sandboxing, isolation, good packaging conventions, etc) which leads to those benefits, you just wrap everything in a VM/service manager/packaging format which gives it to you anyway. I don't think it's inherently good or bad, although I understand why it leaves a bad taste in people's mouths.
Docker images are self-running. Infrastructure systems do not have to be told how to run a Docker image; they can just run them. Scripts, on the other hand, are not; at the simplest level because you'd have to inform your infrastructure system what the name of the script is, but more comprehensively and typically because there are often dependencies the run script implies of its environment, but does not (and, frankly, cannot) express. Docker solves this.
> Docker just wraps what you'd do in a VM.
Docker is not a VM.
> Plus all the downsides of docker.
Of which you've managed to elucidate zero, so thanks for that.
EDIT: I’m leaving the comment up so the replies make sense, but I completely missed the point here. That’s what I get for writing dismissive hacker news comments on my lunch break!
I find it kind of hard to take this seriously, since the JS snippet has a glaring syntax error and two obvious bugs which demonstrate that the author didn’t really think too hard about the point they’re trying to make.
I understand the point they’re trying to make, that being that rust forces you to explicitly deal with the complexity of the problem rather than implicitly. It’s just that they conveniently ignore that the JavaScript version requires the programmer to understand things like how async await works, iterators (which they use incorrectly), string interpolation, etc. Just using typescript type annotations alone already gives the js version nearly all the explicitness of rust.
> I understand the point they’re trying to make, that being that rust forces you to explicitly deal with the complexity of the problem rather than implicitly
I read it again and understand what you mean. I apologize for commenting like that so quickly, I was on my phone and typed that comment out before I really had time to digest the contents.