> Throughout this series, “we” refers to maderix (human) and Claude Opus 4.6 (by Anthropic) working as a pair. The reverse engineering, benchmarking, and training code were developed collaboratively.
Sure, "collaboratively." Why would I ever trust a vibe-coded analysis? How do I, a non-expert in this niche, know that Opus isn't pulling a fast one on both of us? LLMs write convincing bullshit that fools even experts. Have you manually verified each fact in this piece? I doubt it. Thanks for the disclaimer, it saved me from having to read it.
Actually… no. Now that you mention it, and thanks for the interesting thought, the failure modes seem pretty similar to me.
Shoddy research / hallucination, tendency to lose the thread, lack of historical / background context… the failure modes are at least qualitatively similar.
Show me an LLM failure and I’ll show you a high profile journalist busted for the same thing. And those are humans who focus on these things!
Humans as a class are error-prone, but some humans are very, very good in their respective fields. It's often not terribly hard to figure out who these folks are based on resume and credentials. As a shortcut, when the stakes are lower (deciding what to read, as opposed to cancer care for your mom), we can look for markers such as terminology, specifics, and confidence.
AI can hit all the right markers to fool these shortcuts whilst sometimes being entirely full of shit, and it has no resume or credentials to verify should we want to check.
If you have such credentials and vouch for it, I can consider your trustworthiness rather than its. If you admit that you yourself are reliant on it, then this no longer holds.
Humans also write endless amounts of convincing bullshit, and have done so since time immemorial. False papers and faked results were a growing scourge in academia before LLMs were a thing, and that's just counting the intentional fraud - the reproducibility crisis in science, especially medical and psychological science, affects even the best-designed and best-intentioned studies.
Humans also make mistakes and assumptions while reverse engineering, so it will always take more engineers to go through the results and test things.
Benchmarks are all in part 2. Training progress is in part 3 (upcoming).
Also, I think AI-human collaboration is important for goal management.
Sure, LLMs bullshit all the time, but it's the role of the human to set good goals and gating criteria for what counts as good.
I am saddened by your gullibility. Your first instinct is to trust this administration? One that has repeatedly shown utter contempt for the very idea of truth, the constitution, the rule of law, and science, merely because half of American voters are brainwashed?
This administration's arguments do not deserve to be steelmanned.
Because HNers are not so gullible as to swallow and regurgitate this pretext. The Trump administration doesn't care about the people of Iran, any more than Bush cared about the Iraqi Kurds or Afghan women. It's just a pretext for geopolitics.
Headline is wrong. There is no verification requirement.
All this does is require the user to select a non-verified age bracket on first boot. You can lie, just like porn sites today. I thought HNers wanted parents to govern their children's use of technology with these kinds of mechanisms.
It seems to come down to whether you expect the next law to take the enforcement mechanism away from the parent. If the law were, "major operating systems must ship parental controls that actually work," I doubt you would see much pushback. Parental controls are an oft-cited reason to give your kids Apple devices. Expanding that everywhere would be great. But I don't want to have to present my government ID to use my own computer.
In the US maybe, but where I am you can't fap in peace without using a VPN or going through some kind of age verification. Some of the methods are baroque. Example:
"We analyze your email’s digital footprint (history and reputation) against trusted databases. This is often enough to confirm that you're of legal age."
Headline is wrong, and you didn't read the article. There is no verification requirement. You are a bad HN poster and should feel bad.
All this does is require the user to select a non-verified age bracket on first boot. You can lie, just like porn sites today. I thought HNers wanted parents to govern their children's use of technology with these kinds of mechanisms.
> There's an obvious theme with lawmakers in California—they pass laws to regulate things they have zero clue about, add them to their achievement page, cheer for themselves, and declare, "There! I've made the world a better place."
There's an obvious theme with HN posters about politics—they make cheap drive-by comments about regulations they have zero clue about, based on articles they haven't actually read, cheer for themselves, and declare, "There! I've shown why I'm smarter than all these politics people."
> All this does is require the user to select a non-verified age bracket on first boot.
This is the age verification requirement which you rudely and incorrectly said doesn't exist. Nothing is done with the data (for now) but age is in fact verified on the assumption that the user doesn't lie.
Instead of lengthy condescending missives about the behavior of other users, you should instead write "I'm sorry for being negative and bringing down the quality of discussion."
Ah, so we should be happy about a bad law because its enforcement mechanism is weak? That's twice-bad: it undermines the strength and meaning of Law, and aligns Law with the bad.
When the law and its execution are undermined and weak, it becomes the cudgel of fickle, changing power, i.e. it is applied selectively and means nothing to people except when they are being beaten over the head with it, at which point they only regret having been caught. That undermines the social and political fabric of a nation.
Having a bad law with a weak enforcement mechanism isn't quite the thing to boast about that you seem to think it is.
If it must be ignored, then it exists. The bill proposes age verification. You may think the measures employed are weak or trivial, and I would agree, but the bill proposes age verification.
You seem to be operating with an unreasonably weak definition of "verification". What this bill is requiring is that app stores or operating systems ask for age information. Verification would mean doing something to verify the accuracy of the information provided, not merely receiving a response to the question. "Age verification" is not a synonym for "having age-based restrictions".
> and gates further interactions based on the answer
No. The OS does literally nothing with the age information other than water it down to a few pre-defined age brackets and pass that on to applications. There's nothing in this law that says any action has to be denied. It's information collection and reporting, with no verification, accepting the information reported by the user as-is. The law does not require the information to be true or accurate, and explicitly removes liability from app developers when their users lie about their age.
Even applications don't need to do anything with the age information, unless there's a different law already on the books saying something needs to be age-restricted. And in those cases, getting the information about whether to apply restrictions from the OS instead of however they're currently getting age information is not "verification".
"Verification" necessarily implies at least two pieces of information or steps in the process: first, an assertion of something as fact, then something to confirm that fact. This law omits the second step. There's no confirmation.
Gatlin, you need to apologize for ignorantly mangling the definition of "verification". This is truly embarrassing for you. It really brings down the quality of the discussion.
> Almost nobody talked about “getting manufacturing back to the US”.
I guess the President of the United States is an almost nobody. Obama's 2013 State of the Union explicitly hyped 3-D printing as a technology that would bring manufacturing back to the U.S. The U.S. government formed public-private partnerships with maker spaces and fab facilities in hollowed-out Rust Belt cities, and Obama mentioned it by name in the most important and most viewed policy speech the President gives each year.
> “A once-shuttered warehouse is now a state-of-the art lab where new workers are mastering the 3-D printing that has the potential to revolutionize the way we make almost everything,” Obama said. [...] Obama announced plans for three more manufacturing hubs where businesses will partner with the departments of Defense and Energy “to turn regions left behind by globalization into global centers of high-tech jobs.” (https://edition.cnn.com/2013/02/13/tech/innovation/obama-3d-...)