Hacker News | kneel25's comments

So it looks like they’re creating their own App Store within the app? At least it’s kept separate from official apps. But also, how is that not a security nightmare that Apple won’t allow?

I think we’re falling into the trap of overestimating the value of incrementally directing it. The output is all coming from the same brain, so what stops someone from just getting lucky with a prompt and a generation that one-shots the whole thing you spent time breaking down and thinking about? The code quality will be the same, and unless you’re directing it to the point where you may as well be coding the old way, the decision-making is the same too.


Pretty sure they’re doing both of those things, but it takes a long time for the regulation to reach the final stage.


> After the initial translation, I ran multiple passes of adversarial review, asking different models to analyze the code for mistakes and bad patterns.

I feel like you just know it’s doomed. What this is saying is “I didn’t want to, and cannot, review the code it generated.” Asking models to find mistakes never works for me. It’ll catch obvious patterns and a tendency towards security mistakes, but not deep logical errors.


Somehow they did use this as part of their approach to get to zero regressions across 65k tests, no performance regressions, and identical AST and bytecode output. How much manual review went into the hundreds of rounds of prompt steering is not stated, but I don't think you can claim it couldn't find any deep logical errors along the way and still achieve those results.

What concerns me is whether this part will actually come in time:

> The Rust code intentionally mimics things like the C++ register allocation patterns so that the two compilers produce identical bytecode. Correctness is a close second. We know the result isn’t idiomatic Rust, and there’s a lot that can be simplified once we’re comfortable retiring the C++ pipeline. That cleanup will come in time.

Of course, it wouldn't be the first time Andreas delivered more than I expected :).


That’s convincing and impressive, but I wouldn’t say it proves it can spot deep errors. If it’s incredible at porting files and comparing them against the source of truth, then its ability to find complicated issues isn’t being tested, imo.


If completing the above successfully doesn't necessarily test these abilities then where does the concern about having these abilities come into play?


Your argument is just as applicable on human code reviewers. Obviously having others review the code will catch issues you would never have thought of. This includes agents as well.


They’re not equal. Humans are capable of actually understanding and looking ahead at the consequences of decisions, whereas an LLM can’t. One is a review; the other mimics the result of a hypothetical review without any of the actual reasoning. (And prompting itself in a loop is not real reasoning.)


I keep hearing people say "but as humans we actually understand". What evidence do you have of material differences between the understanding an LLM has and the version a human has? What processes do we fundamentally perform that an LLM does not or cannot? What is the definition of "understanding" here that humans meet and an LLM presumably does not?



Well, one material difference is that we don’t input/output in tokens, I guess. We have a concept of gaps and limits to our knowledge, and factors like ego, self-preservation, and ambition go into our thoughts, where an LLM just has raw data. Understanding the implication of a code change means having an idea of a desired structure, some sense of where you want to head and how it all meshes together. An LLM has none of that. Just because it can copy the output those factors produce doesn’t mean it operates the same way.


>Your argument is just as applicable on human code reviewers.

The tests many of us use to judge how capable a model or harness is are usually based on whether it can spot logical errors readily visible to humans.

Hence: https://news.ycombinator.com/item?id=47031580


With humans though, I wouldn't have to review 20k lines of code at once.


So ask the AI to just translate one little chunk at a time, right?


That's not what happened here though.


That is what the testing suite is there to check, no?


No. Testing generally can only falsify, not verify. It’s complementary to code review, not a substitute for it.


You mean the testing suite generated by AI?


The primary JS test suite is maintained by the authors of the specification itself: https://github.com/tc39/test262


It isn’t, in this case.


No, a real test suite: either the one they developed themselves or the official ECMA one.


Yeah, I lost all interest in the ladybird project now that it is AI slop.

No one wants to work with this generated, ugly, unidiomatic ball of Rust, other than other people using AI. So your dependency on AI grows and grows. It is a vicious trap.


I think it's wild you would make that connection for this topic


Who doesn't think this about themselves? It's like when people say they're immune to propaganda. Isn't this the same thinking that makes people believe their smart devices are listening to their conversations, when really it's targeted ads that you only notice after they've already had their effect on you?


I don't think I am immune to propaganda, and definitely not ads. I can't stand ads at all. They immediately grab my attention, even if I make a conscious attempt at ignoring them. It truly feels terrible.

Even with propaganda, I am constantly reminded that my immunity is subpar against all kinds of it. Often it's just subtle seeds that shift my choice of words away from what I really believe, and sometimes it's a more serious, deeper case of propagandisation. Very unfortunate, but each time it shows me why I should be critical of everything I read online.


"marketing works on you, even if you know how marketing works on you"


I adore Zed. I started using it on the day of the Windows release and never looked back. My theme is perfect, the team works super hard, and I get a nice bit of satisfaction installing their frequent updates in barely over a second.


I envy the fact you had to google it


Again, why? Nothing on its wiki article or the first page of Google results suggests it should be a household name.

So unless the default assumption is that everyone on HN is dating (I'm married), I genuinely don't understand why it's weird not to have heard of some random ass dating app.


Even if you're married, you've got to know somebody who isn't. It's like referring to Google as "some random ass search engine website" and wondering why people think you're the weird one.


Ok


I don't think you're weird for not knowing. I also didn't know.


Because it used to be the "best" dating app out there for "serious" people wanting long term relationships. Now all the apps are trash and have predatory monetization.


I know Hinge because of the HN post about the data leak a few months ago.


I got really into Hemingway’s work and read all the best ones, but my favourite is ‘A Moveable Feast’: essentially his diary, written near the end of his life and set in 1920s Paris when he was in his mid-twenties. Being the same age, I was inspired enough to go there and retrace some of his steps.


I can't believe some people starred this


The main goal is experimenting and sharing what I’ve learned. Seems like people are enjoying it, which is nice to see.


It's literally impossible to see what you've learned because it's clouded in a 20ft wall of shit.


I hear you. I realize the repository and docs are dense and can be overwhelming. I’m actively working on cleaning up the presentation, improving examples, and making the intent and learning points easier to see. Thanks for your feedback.

