Rochus's comments on Hacker News

Had a similar experience (obvious flagging misuse, toxic nagging culture) some time ago and haven't been active on the platform since.

Interesting. How long did it take until the result effectively worked? Which agent did you use? I recently tried to generate an Oberon-to-C99 transpiler in devin.ai from an existing parser and validated AST I had already implemented myself in C++, but after two days of round-tripping, with the LLM getting increasingly entangled in special cases and producing strange code with more and more redundancy, I stopped and wrote it myself. That was a costly exercise which didn't succeed. The language was not the problem; Devin showed it understood Oberon. But it got completely confused by the different variants of arrays (fixed size by value, fixed or variable size dynamic, fixed or variable size var or value parameter) and spread redundant code all over, losing track. I also tried to make it generate a RISC-V code generator, which wasn't successful either (bug fixing didn't seem to converge and even went in circles).

Ah, Oberon! I "learned" programming from Niklaus Wirth's Pascal book on a family vacation where I had no computer. Actually, I didn't have a computer with a Pascal compiler even when I wasn't on vacation.

Anyway, I began this project while on vacation (again), then completed it while attending a conference, so the work wasn't at a 100% duty cycle. That said, it took about a month from the beginning to the current state. You can see almost all the LLM sessions that built the project in the linked article.

LLMs do seem to be a bit narcissistic, as you've alluded to -- confidently declaring that they have implemented "PRI PAR", for example, but conveniently not mentioning that they only parsed the keywords and didn't in fact implement the priority semantics. This reminds me of less experienced developers I've managed in the past: loath to deliver bad news.

This project was all done with Claude. When I began I was given the Opus 4.5 model but fairly early in the timeline Anthropic enabled the new Opus 4.6 model. This was before its official release so I'm not sure if they have a rollout policy that targeted me or my project. Anyway, most of the work was Opus 4.6.

Overall I learned a tremendous amount about what today's frontier models can do: I could probably give 4-5 talks on various things I noticed, or talk for a few hours over beers. My general takeaway was that the experience was uncannily similar to developing software as a human, or running a team of somewhat less experienced humans. A fun time to be alive for sure.


Cool, I was at ETH when Modula-2 was en vogue, and we also had lectures where we programmed transputers in Occam-2.

In contrast to my experiences with e.g. Gemini 3 Pro, where the LLM regularly claimed to have reached full feature scope in each iteration while the result turned out to be full of stubs, Devin at least doesn't pull my leg and delivers what was agreed. Unfortunately, debugging and fixing take much more time than generating the initial version (by about a factor of five). But so far I had never run an LLM project over such a long time as you did; it must have cost a fortune.


Cost me almost nothing in inference time (I have the monthly subscription), although if I had been paying myself at consulting rates, it would have cost a few thousand for my time "LLM whispering" :) For clarity: I wasn't running the LLM for a month solid. I was on vacation in New Zealand -- I'd fire up the laptop in the Airbnb most nights and make Claude add a couple of features and fix some bugs. Rinse, repeat.

I find that it's uncannily like running a team of eager but not too experienced engineers: those humans would also show up claiming to have "finished". I'd say, "well, does it run such-and-such test OK?". They'd go away, come back a few days later... The LLM acts much the same. You have to keep it on a short leash, but when it gets cracking on a problem it's amazing to watch. E.g. I saw it write countless test programs on the fly to diagnose a parser hang bug. It would try this and that, binary-chopping on the problematic source file. If I were doing that myself, I'd need a few strong coffees before diving in.


Well, just let it build a transpiler, e.g. from Oberon-90 to C99. I gave this task to Devin, and after two days of round-tripping, with the LLM getting increasingly entangled in special cases and producing strange code with more and more redundancy, I stopped the exercise, went back to square one, and wrote it myself based on what I already had.

I might be convinced by predictions like the posted one as soon as an LLM is indeed able to independently and correctly solve such a problem, or even add a code generator for yet another target to my compiler and produce decent code, without my constant guidance and testing.

It might be true that industry will require fewer software engineers some day, but it might also well be that it continues to need as many engineers as today, or even more, with these people generating ten to a hundred times more output together with LLMs than today. Who knows.


Interesting. Why MkDocs and not e.g. AsciiDoc?

Markdown is by far the most widely used format in modern workflows. It’s the default on GitHub, and its popularity has led to a rich ecosystem of tools for static documentation, such as MkDocs and Docusaurus.

AsciiDoc is less ubiquitous, but it’s well-documented and offers a strong, technically structured syntax. Its feature set is broader and highly configurable (e.g., with tools like asciidoc3). However, in the Python ecosystem, parser availability and maturity are more limited. For Python-based workflows, Markdown clearly has the advantage.


SPARK is not used for the whole system, but only for the roughly 5% of the code that is safety/security-related in a good architecture.

Unfortunately, a really good question gets downvoted instead of triggering a relevant discussion, as so often on HN recently. It would be really interesting to know why Ada would not be considered for such a large project, especially now that the code is translated with LLMs, as you say. I was never really comfortable with them going for the most recent C++ versions, since there are still too many differences and unimplemented parts which make cross-compiler compatibility an issue. I hope that with Rust at least cross-compilation is possible, so that the resulting executable also runs on older systems where the toolchain is not available.

Unfortunately, some folks do get a bit sensitive about Rust, which can be off-putting.

But what I wanted to know was about evaluation with other languages, because Andreas has written complex software.

His insight could be enriching regarding shortcomings or other issues which developers not that high up in the chain may not have encountered.

Ultimately, that will only help others to understand how to write better software or think about scalability.


I personally think that people might have framed it as a "use Ada/D over Rust" comment, which might have prompted the HN people who prefer Rust to respond with downvotes.

I agree that this might be the wrong behaviour, and I don't think it's any fault of Rust itself, though that itself could be a blanket statement imo. There's nuance on both sides of the discussion.

Coming to the main point, I feel the real reason could be that Rust is the sort of equilibrium the world has reached for, especially for security-related projects. Whether good or bad, this means that using Rust would definitely lead to more contributor resources, and the zeal of Rustaceans can definitely be leveraged as well, along with third-party libraries developed in Rust, although that itself is becoming a problem nowadays from what I hear from people here who use Rust (i.e. too many dependencies).

Rust does seem to be good enough for this use case. The question could be what D/Ada (might I also add Nim/V/Odin) would add to the project, but I honestly agree that a fruitful discussion between other languages would certainly have been beneficial to the project (imo), and at the very least it would have been very interesting to read.


> which might have prompted the HN people who prefer Rust to respond with downvotes.

This completely misses the purpose of the downvoting feature, which is not surprising, since upvoting no longer seems to indicate the quality or truth of the comment either.

> rust is this sort of equilibrium that the world has reached for, especially security related projects

Which is amazing, since Rust only covers a fraction of the safety/security concerns covered by Ada/SPARK. Of course this language has some legacy issues (e.g. the physical separation of interface and body into two separate files; we have better solutions today), but it is still in development, and its toolchain is more robust than those of C/C++ (and likely Rust). And in the age of LLMs, the robustness and features of a toolchain should matter more than language syntax/semantics.

> Rust does seem to be good enough for this use case.

If you compare it to the very recent C++ implementations they are using, I tend to agree. But if you compare it to a much more mature technology like e.g. Ada, I have my doubts.


> If you compare it to the very recent C++ implementations they are using, I tend to agree. But if you compare it to a much more mature technology like e.g. Ada, I have my doubts.

I agree with you in the sense that it would definitely have been interesting to read what Andreas thinks of Ada/D and the discussion surrounding it, and your overall comment too.

I do wish that anyone from the Ladybird team, or maybe even Andreas if he's on HN (not sure), could respond to the original query if possible.

I remember Ladybird had a Discord server I once joined; perhaps someone from the community could ask Andreas about it there, since it would be genuinely fascinating to read.

Although one point I'm worried about is Ladybird changing the language again, say after a discussion of using Ada/D. It might be awkward.


> I am worried about is if Ladybird changes the language again

In the age of good LLMs this is likely no longer a show-stopper (just as specific formatting rules in C/C++ aren't, since there are good reformatting tools). The question is how long we will need programming languages at all. They were primarily invented because large assembler projects were too challenging for most people. But if all the complicated details can now be delegated to LLMs, strictly speaking, we no longer need programming languages either.


> This completely misses the purpose of the downvoting feature

"Downvote for disagree" has been canonicalized on HN since (nearly) the beginning, by pg himself, back when he used his real-name account to comment. :)

I agree that it has undesirable consequences, but it is fully established.


It's even worse than that. People are all too often willing to "flag for disagree". It's getting to be pretty common to see threads with comments that are [flagged][dead] which don't break the rules in any way, but merely express a view which is unpopular. Sometimes I even agree that it's a stupid position to take up, but that doesn't merit being flagged to death. I always vouch for those comments but it feels like an exercise in futility with so many people using the flag function as a "super fuck you" button.

Personally, I would remove the downvote button entirely, because it is apparently a lossy projection of an at least three-dimensional vector: agreement, quality, truth. Graying out text so that it is no longer readable is a censoring measure not justified in most cases I have encountered on HN in the eight years I've been here.

Your statement may well be protected by freedom of speech, but it is a highly unobjective personal attack of the kind that I personally do not want to see here.

It does not help any of the readers in any way, does not present any facts, and contains nothing educational. If anything, it would have been sufficient for you to simply say that you do not like him, if possible with a fact-based explanation. Then people could think about it and, if necessary, respond with other arguments.


This is just an honest view, with which you can agree or not, presented emotionlessly and referring to facts. How can this post be "toxic in itself"? Maybe I'm too old (Gen X) and culture has shifted dramatically without me noticing, but critical thinking should still be considered a virtue by everyone, shouldn't it?

I agree that downvoting is definitely misused. I often see the most intelligent and best technical answer get downvoted. Votes no longer seem to represent quality or truth.

> I don't think HN is more toxic than anywhere else

It is supposed to be less toxic, enabling interesting discussions about tech, where people can exchange ideas and learn something. There are enough other places for politics.

