
On what basis are you making that prediction?

Any sufficiently complicated LLM-generated program contains an ad hoc, informally-specified, bug-ridden, slow implementation of half of an open source project.

We had an effort recently where a much more experienced dev from our company ran Claude on our oldish codebase for one system, with the goal of migrating it to a newer structure, newer libraries, etc., while preserving various built-in functionalities. It was not the first time this guy had done such a thing, and he is supposed to be an expert.

I took a look at the result: maybe half of the functionality is missing completely, and the rest is cryptic. I know that codebase by heart since I created it. From my 20+ years of experience, correcting all of this would take way more effort than a manual rewrite from scratch by a senior. Suffice to say, that's not what upper management wants to hear; LLM adoption has often become one of their yearly targets to be evaluated against. So we have a hammer and are looking for nails to bend and crook.

Suffice to say, this effort led nowhere, since we have other high-priority goals for now. Smaller things here and there, why not. Bigger efforts have so far been a sawed-off double-barrel shotgun loaded with buckshot, fired right into both feet.


Not to take away from your experience but to offer a counterpoint.

I used Claude Code to port a Rust PDB parsing library to TypeScript.

My SumatraPDF is a large C++ app, and I wanted visibility into where the size of functions and data goes, and into the layout of classes. So I wanted to build a tool to dump that info out of a PDB. But I have been diagnosed with an extreme case of Rustophobiatis, so I just can't touch Rust code. Hence the port to TypeScript.

With my assistance it did the work in an afternoon, and did it well. The code worked. I ran it against a large PDB from SumatraPDF, and it matched the output of other tools.
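
To give a flavour of what such a dump tool starts with, here is a small sketch (not the ported library itself; the names and structure are my own, and only the MSF 7.0 superblock fields follow the publicly documented layout) that reads the header at the front of a PDB file. Everything interesting (streams, DBI, symbol records) comes after this step.

    import { readFileSync } from "node:fs";

    // The MSF 7.0 superblock sits at the very start of every modern PDB file.
    const MAGIC = "Microsoft C/C++ MSF 7.00\r\n\x1aDS\0\0\0"; // 32 bytes

    function readSuperBlock(path: string) {
      const buf = readFileSync(path);
      if (buf.toString("latin1", 0, 32) !== MAGIC) {
        throw new Error(`${path} does not look like an MSF 7.0 PDB`);
      }
      return {
        blockSize: buf.readUInt32LE(32), // usually 4096
        freeBlockMapBlock: buf.readUInt32LE(36),
        numBlocks: buf.readUInt32LE(40),
        numDirectoryBytes: buf.readUInt32LE(44),
        // offset 48 is a reserved field; the stream directory location follows
        blockMapAddr: buf.readUInt32LE(52),
      };
    }

    const sb = readSuperBlock(process.argv[2]);
    console.log(sb, "expected file size:", sb.blockSize * sb.numBlocks);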

In a way, porting from one language to another is an extreme case of refactoring, and Claude did it very well.

I think that in general (your experience notwithstanding) Claude Code is excellent at refactorings.

Here are 3 refactorings from SumatraPDF where I asked Claude Code to simplify code written by a human:

https://github.com/sumatrapdfreader/sumatrapdf/commit/a472d3...
https://github.com/sumatrapdfreader/sumatrapdf/commit/5624aa...
https://github.com/sumatrapdfreader/sumatrapdf/commit/a40bc9...

I hope you agree the code written by Claude is better than the code written by a human.

Granted, those are small changes, but I think it generalizes to bigger changes. I have a few refactorings in mind that I have wanted to do for a long time, and maybe with Claude they will finally be feasible (they were not feasible before only because I don't have an infinite amount of time to do everything I want to do).


“I want this thing, but in a different language” seems to be something that the current generation of cutting-edge LLMs is pretty good at.

Translating a vibe is something the Ur-LLMs (GPT-3 etc.) were very good at, so it’s not entirely surprising that the current state of the art is to be found in things of a “translate thing X that already exists into context Y” nature.


All software before LLMs had a copious number of bugs, many of which were never fixed.

Was software ever a moat? Software typically only gave companies a small window of opportunity to turn a fleeting software advantage into a more resilient moat (network effects, switching costs etc.)

Yes, I would argue that good (stable, fast, easy-to-use) software was somewhat of a moat, and one that was much harder to build before coding agents.

Stripe, Square, Shopify, Google: all thrived in some part because their services take a hard problem and make it easier to use. Now more people can take a hard problem and make it easier to use.

All you have to do is look around (especially 5+ years ago) and see the many, many BAD, unstable, hard-to-use, slow, etc. versions of these companies.


Windows was a moat but it looks more like an anchor now.

Windows' moat was not the operating system code, but that they were able to get distribution via IBM, and then grow an ecosystem of applications that were targeted at Windows, which created a snowball effect for further applications.

Yes, though building an OS was still a big barrier.

In what way is the long term impact of LLMs being underestimated? If anything, it seems that it has been overestimated in the past years and that something other than LLMs will be needed to reach the original scaled LLM hope of AGI.

Back when the Internet was America Online and some cgi-bin Perl scripts, there were a lot of very lofty things said about the potential of the Internet in the future. I don’t remember any of them predicting the power the tech would have over business, politics, media, and hours of every single day for billions of people. Even without AGI, it’s quite possible that we’re still underestimating the effects of predictive, probabilistic computing 20 or 50 years from now.

The internet alone didn't change sh!t. Without smartphones, unified app stores, cellular network innovation, et al., internet traffic would not be so high.

Funny how people leave this stuff out. Yawn. Basic simpleton analysis and takes.


The Internet created the backbone that allowed for rapid experimentation in communications technologies, and created the ability for anyone to create and share technologies and reach a huge audience very quickly.

Without the Internet, most consumer electronics would have been far more expensive to build, and would have been strictly controlled walled gardens, but the Internet in general and the Web in particular allowed so many inventors to flourish. Ever since that Genie was let out of the bottle, corporate and government interests have been trying to put it back in, and most companies are trying to build and reinforce walled gardens under the banner of unified app stores that extract insane rents.


Wow this is like going on a medical forum and saying "medicine didn't change shit".

They were replying to this particular underestimation:

> AI is limited to probabilistic and annoying chatbots that are for entertainment and for looking up trivia questions.

That is not a rational assessment of the utility that the technology provides, even today.


If only that were what investors had figured out.

Unfortunately, it seems investors now think that all paid software will be replaced by AI-generated software; somehow, open source projects laundered through generative AI models should finally convince enterprise customers to go with free.


EXWM is great; having the same flow to manage X applications as for Emacs buffers is a huge benefit. My only concern is whether X11 will be maintained sufficiently far into the future to keep using it; currently there is no Wayland support in EXWM.


Emacs as a Wayland compositor has been shown to be possible. If we eventually get that and threading, the future might be rather rosy.

https://emacsconf.org/2022/talks/wayland/

http://perma-curious.eu/repo-ewx/


> If we eventually get that and threading

That's a really big ask. The entire ecosystem around Emacs isn't built for multithreading.


I don't mean adding threading to existing functionality, and I mostly wouldn't want that. I very strongly prefer Emacs' behaviour of queueing up my input events and processing them deterministically, regardless of how long it takes to get to them, over e.g. the JetBrains behaviour, where something can happen asynchronously with my input events and change their meaning depending on when it happens.

What I mean is having threading capabilities available for things that want to (and should) use them. AIUI some support for that was added in Emacs 26, so it might already be good enough.

The relevance is that EXWM is single-threaded, so window management blocks when Emacs does. I don't find that much of a problem with EXWM, but I doubt it would fly for a Wayland compositor, though perhaps the separate server used in that EmacsConf talk sidesteps the problem.


I've moved to OpenBSD for this reason. It works well, and I don't have to deal with Linux drama. The toxic slug strategy is really working well for them.


I once read a comment here or on Reddit explaining that the X11 developers moved to Wayland because the X11 code had turned into an unmaintainable mess that couldn't be worked with anymore. So the reasons are not drama, but just plain old tech debt.


This pre-packaged talking point is often repeated without evidence. The vast majority of X.org developers, including all of the original ones, simply moved to other venues at one point or another. Only a few, like Daniel Stone, have made contributions to both. And it shows in how many lessons had to be re-learned.


What is your evidence? A quick search on Google (and the git commits) would show you that many Wayland developers are significant former xorg developers.

1. Kristian Høgsberg, the founder of Wayland, did all the DRI2 work on xorg before becoming frustrated.
2. Peter Hutterer was a main xorg developer and has been behind the Wayland input system.
3. Adam Jackson, a long-time xorg maintainer, essentially called for moving on to Wayland: https://ajaxnwnk.blogspot.com/2020/10/on-abandoning-x-server... (I found that he was involved in Wayland discussions, but I'm not sure if he contributed code.)
4. You already mentioned Daniel Stone.

The only main xorg developer arguably not involved in Wayland could be Keith Packard, although he made a lot of the changes for XWayland, so I'm not sure it is correct to say he did not have Wayland involvement.

So who are the "vast majority of X.org developers"? I think people always read about the couple of names above and then think, "well, there must have been hundreds of others", because they thought xorg was like the Linux kernel. AFAIK xorg always only had low 10s of active developers.


My evidence is the git commit log: https://desuarchive.org/g/thread/84460945/#q84481507

This doesn’t even include the XFree86 CVS commit history and older, which accounts for most of the code in X.org. Some of those people may actually be dead now.

> AFAIK xorg always only had low 10s of active developers.

There are 38 people with 100+ commits, each of whom obviously counts as a major contributor.


OpenBSD has brought X11 into its own codebase: https://xenocara.org/

This is why OpenBSD is great.

I don't care about the drama that happens in Linux land at all.


The drama was mostly over whether or not Wayland should have been the replacement. AFAIU, everyone agreed X11 development was effectively unsustainable or at least at a dead end.


Wayland is not a solution, just a name for some protocols... It's either KDE or GNOME (with its weird quirks) or some alternative.


So is X11, though the reference implementation of X11 is also widely agreed to have some serious problems going forward on top of problems with the protocol itself.


I'm really happy with OpenBSD also. What is the toxic slug strategy?



Do you have a link to some of the code that you have produced using this approach? I have yet to see a public or private repo with non-trivial generated code that is not fundamentally flawed.


This one was a huge success:

https://github.com/micahscopes/radix_immutable

I took an existing MIT-licensed prefix tree crate and had Claude+Gemini rewrite it to support immutable, quickly comparable views. The execution took about one day's work, following two or three weeks of thinking about the problem part time. I scoured the prefix tree libraries available in Rust, as well as the various existing immutable collections libraries, and found that nothing like this existed. I wanted O(1) comparable views into a prefix tree. This implementation has decently comprehensive tests and benchmarks.
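
To make "O(1) comparable views" concrete, here is a tiny conceptual sketch (in TypeScript rather than Rust, and not the radix_immutable API, just the general idea): views over a structurally shared trie remember the root they were taken from, so equality of views with the same root is a single reference comparison, and shared subtrees short-circuit any deeper comparison.

    // A toy immutable trie with structural sharing: an update builds new nodes
    // along one path and reuses every other subtree untouched.
    interface TrieNode {
      readonly value?: number;
      readonly children: ReadonlyMap<string, TrieNode>;
    }

    // A cheaply comparable view simply remembers the root it was taken from.
    class View {
      constructor(readonly root: TrieNode) {}

      equals(other: View): boolean {
        // O(1) fast path: views sharing the same root object are equal without
        // walking the tree; only unrelated roots need a deep comparison.
        return this.root === other.root || deepEquals(this.root, other.root);
      }
    }

    function deepEquals(a: TrieNode, b: TrieNode): boolean {
      if (a === b) return true; // shared subtrees short-circuit here too
      if (a.value !== b.value || a.children.size !== b.children.size) return false;
      for (const [key, child] of a.children) {
        const otherChild = b.children.get(key);
        if (!otherChild || !deepEquals(child, otherChild)) return false;
      }
      return true;
    }

    // Two views taken from the same shared root compare in constant time.
    const leaf: TrieNode = { value: 7, children: new Map<string, TrieNode>() };
    const root: TrieNode = { children: new Map<string, TrieNode>([["a", leaf]]) };
    console.log(new View(root).equals(new View(root))); // true

The same reference-equality fast path is what persistent collection libraries generally rely on to keep comparisons and diffs cheap.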

No code for the next two but definitely results...

Tabu search guided graph layout:

https://bsky.app/profile/micahscopes.bsky.social/post/3luh4d...

https://bsky.app/profile/micahscopes.bsky.social/post/3luh4s...

Fast Gaussian blue noise with wgpu:

https://bsky.app/profile/micahscopes.bsky.social/post/3ls3bz...

In both of these examples, I leaned on Claude to set up the boilerplate, the GUI, etc., which gave me more mental budget for playing with the challenging aspects of the problem. For example, the tabu graph layout is inspired by several papers, but I was able to iterate really quickly with Claude on new ideas from my own creative imagination with the problem. A few of them actually turned out really well.


Not the OP, not my code. But here is Mitchell Hashimoto showing his workflow and code in Zig, created with AI agent assistance: https://youtu.be/XyQ4ZTS5dGw


I think there is still some kind of 'fight' here between assisted coding and something closer to 'vibe' coding. Vibe, for me, means not reading the generated code, just trying it; the other extreme is writing everything without AI. I don't think people here are talking about assisted: they are talking about vibe or almost-vibe coding. And it's fairly terrible if the LLM does not have tons of info. It can loop, hang, remove tons of features, break random things, etc., all while being cheerful and saying 'this is production code now, ready to deploy'. And people believe it. When you use it to assist, it is great IMHO.


https://github.com/wglb/gemini-chat: almost entirely generated by Gemini based on my English-language description. Several rounds with me adding requirements.

(edit)

I asked it to generate a changelog: https://github.com/wglb/gemini-chat/blob/main/CHANGELOG.md


That's disingenuous or naive. Almost nobody decides to expressly highlight the sections of code (or whole files) generated by AI; they just get on with the job when there are real deadlines and it's not about coding for the sake of the art form...


If the generated implementation is not good, you're trading short-term "getting on with the job" and "real deadlines" for mid-to-long-term slowdown and missed deadlines.

In other words, it matters whether the AI is creating technical debt.


If you're creating technical debt, you're creating technical debt.

That has nothing to do with AI/LLMs.

If you can't understand what the tool spits out either: learn, throw it away, or get it to make something you can understand.


Do you want to clarify your original comment, then? I just read it again, and it really sounds like you're saying that asking to review AI-generated code is "disingenuous or naive".


I am talking about correctness, not style. Coding isn't just about being able to show activity (code produced), but rather about producing a system that correctly performs the intended task.


Yes, and frankly you should be spending your time writing large integration tests correctly, not microscopic tests that forget how tools interact.

It's not about lines of code or quality; it's about solving a problem. If the solution creates another problem, then it's bad code. If it solves the problem without causing that, then great. Move on to the next problem.


Same as pretending that vibe coding isn't producing tons of slop. "Just improve your prompt bro" doesn't work for most real codebases. The recent TEA app leak is a good example of vibe coding gone wrong. I wish I had as much copium as vibe coders, to be as blind to these things as most of them clearly are: "it happened to them but surely won't happen to ME."


> The recent TEA app leak is a good example of vibe coding gone wrong

Weren't there 2 or 3 dating apps launched before the "vibecoding" craze that became extremely popular and got extremely hacked weeks or months in? I also distinctly remember a social network having global Firebase tokens on the client side, also a few years ago.


So that's an excuse for AI getting it wrong? It should know better if it's so much better.


LLMs are not meant to be infallible; they're meant to be faster.

Repeat after me, token prediction is not intelligence.


Not an excuse, no. I agree it should be better. And it will get better. Just pointing out that some mistakes were systematically happening before vibecoding became a thing.

We went from "this thing is a stochastic parrot that gives you poems and text styled after famous people, but not much else" to "here's a full-stack app; it may have some security issues, but otherwise it mostly works" in 2.5 years. People expect perfection, and move the goalposts. Give it a second. Learn what it can do today, adapt, and prepare for what it can do tomorrow.


No one is moving the goalposts. There are a ton of people and companies trying to replace large swathes of workers with AI. So it's very reasonable to point out ways in which the AI's output does not measure up to that of those workers.


I thought the idea was that AI would make us collectively better off, not flood the zone with technical debt as if thousands of newly minted CS/bootcamp graduates were unleashed without any supervision.

LLMs are still stochastic parrots, though highly impressive and occasionally useful ones. LLMs are not going to solve problems like "what is the correct security model for this application given this use case".

AI might get there at some point, but it won't be solely based on LLMs.


> "what is the correct security model for this application given this use case".

Frankly, I've seen LLMs answer better than people trained in security theatre, so be very careful where you draw the line.

If you're trying to say they struggle with what they've not seen before: yes, provided that what is new isn't within the phase space they've been trained over. Remember, there are no photographs of cats riding dinosaurs, but SD models can generate them.


Saying that they aren't worse than an incompetent human isn't a ringing endorsement.


I've heard this multiple times (Tea being an example of problems with vibe coding) but my understanding was that the Tea app issues well predated vibe coding.

I have experimented with vibe coding. With Claude Code I could produce a useful and usable small React/TS application, but it was hard to maintain and extend beyond a fairly low level of complexity. I totally agree that vibe coding (at the moment) is producing a lot of slop code; I just don't think Tea is an example of it, from what I understand.


We work with LLMs on a daily basis to solve business use cases. From our work, LLMs seem to be nowhere close to being able to independently solve end-to-end business processes; in every use case they need excessive hand-holding (output validation, manual review, etc.). I often find myself thinking that a use case would be solved faster and cheaper using other ML approaches.

Using LLMs to replace work in its entirety seems to be a stretch of the imagination at this point, unless an academic breakthrough that goes beyond the current approach is discovered, which typically comes with an unknown timeline.

I just don't see how companies like Anthropic/OpenAI are drawing these conclusions given the current state.


The developers may well be Clever Hans-ing themselves, seeing capabilities that the models don't really have.

But the… ah, this is ironic, the anthropic principle applies here:

> From our work, LLMs seem to be nowhere close to being able to independently solve end-to-end business processes

If there was an AI which could do that, your job would no longer exist. Just as with other professions before yours — weavers, potters, computers: https://en.wikipedia.org/wiki/Computer_(occupation) — and there are people complaining that even current LLMs and diffusion models forced them to change career.

> I just don't see how companies like Anthropic/OpenAI are drawing these conclusions given the current state.

If you look at the current public models, you are correct. They're not looking at the current public models.

Look at what people say on this very site — complaining that models have been "lobotomised" (I dislike this analogy, but whatever) "in the name of safety" — and ask yourself: what could these models do before public release?

Look at how long the gap was between the initial GPT-4 training and the completion of the red-teaming and other safety work, and ask yourself what new thing they know about that isn't public knowledge yet.

But also take what you know now about publicly available AI in June 2024, and ask yourself how far back in time you'd have to go for this to seem like unachievable SciFi nonsense — 3 years sounds about right…

… but also, there's no guarantee that we get any particular schedule for improvements, even if it wasn't for most of the top AI researchers signing open letters saying "we want to agree to slow down capabilities research and focus on safety". The AI that can take your job, that can "independently solve end-to-end business processes" may be 20 years away, or it may already exist and be kept under NDA because the creators can't separate good business from evil ones any more than cryptographers can separate good secrets from evil ones.


> If you look at the current public models, you are correct. They're not looking at the current public models.

> Look at what people say on this very site — complaining that models have been "lobotomised" (I dislike this analogy, but whatever) "in the name of safety" — and ask yourself: what could these models do before public release?

Give politically incorrect answers and cause other kinds of PR problems?

I don't think it's reasonable to take "lobotomised" to mean the models had more general capability before their "lobotomization," which you seem to be implying.


> Give politically incorrect answers and cause other kinds of PR problems?

If by that you mean "will explain in detail how to make chemical weapons, commit fraud, automate the production of material intended to incite genocide" etc.

You might want to argue they're not good enough to pose a risk yet — and perhaps they still wouldn't be dangerously competent even without these restrictions — but even if so, consider that Facebook, with a much simpler AI behind its feed, was blamed for not being able to prevent its systems being used for the organisation of the (still ongoing) genocide in Myanmar: tools, all tools including AI, make it easier to get stuff done.

> I don't think it's reasonable to take "lobotomised" to mean the models had more general capability before their "lobotomization," which you seem to be implying.

I don't like the use of the word, precisely because of that — it's either wildly overstating what happens to the AI, or understating what happens to humans.

And yes, when calling them out on this, I have seen that at least some people using this metaphor seem to genuinely believe that what I would call "simply continuing the same training regime that got it this far in the first place" is something they are unable to distinguish from what happened to Rosemary Kennedy (and yes, I did use her as the example when that conversation happened).


I think you are using LLMs exactly right. The corporations need LLMs to be extra special to support valuations.


Oh I can see how. It’s called hype marketing and it’s needed to justify the bubble they are inflating.


It could be that the work environments they're in are simply echo chambers, which is probably a necessity of working there. They likely talk to each other about happy paths, and everything else becomes noise.


Maybe we need a different approach or maybe more is different.

More training, more data, more parameters, more compute power... and voilà.

Hard to say... but we've been surprised more than once in machine learning history.


I think it says more about their self-perception of their abilities in realms where they have no special expertise. So many Silicon Valley leaders weigh in on matters of civilizational impact. It seems making a few right choices suddenly turns people into experts who need to weigh in on everything else.

I don’t think I’m being hyperbolic to say this is a really dangerous trend.

Science and expertise carried these people to their current positions, and then they throw it all away for a cult of personality as if their personal whims manifested everything their engineers built.


Does Sam Altman have any relevant technical experience to make that assessment? It sounds like something someone who just lost their key technical team members would say.


For whatever it's worth, Scott Aaronson went from incisive skeptic to drooling fanboy in just about that long. Sam, likewise, seems prone to mistaking loyalty for expertise at this point in his career.


Did you consider Kill Bill (https://github.com/killbill/killbill)? I used it a few years ago for usage-based billing.

