If you can't tell this is LLM slop then I don't really know what to tell you. What gave it away for me was the RaptorQ nonsense & the claimed conformance w/ the standard SQLite file format. If you actually read the code you'll notice all sorts of half-complete implementations of whatever is promised in the marketing materials: https://github.com/Taufiqkemall2/frankensqlite/blob/main/cra...
If you bothered to do any research at all you'd know the author is an extreme, frontier, avant-garde, eccentric LLM user, and I say that as an LLM enthusiast.
Thanks. Next time I'll do more research on what counts for LLM code artwork before commenting on an incomplete implementation w/ all sorts of logically inconsistent requirements. All I can really do at this point is humbly ask for your & their avant-garde forgiveness b/c I won't make the same mistake again & that's a real promise you can take to the crypto bank.
Great! But note I haven’t said that you should be doing the research. This was more of a warning about today, but it also was a different kind of warning about the next 12-18 months once models catch up to what this guy wants to do with them.
Thank you for your wisdom. I'll make a note & make sure to follow up on this later b/c you obviously know much more about the future than a humble plebeian like myself.
From context, then, I infer that a transformer is not composed of matrix multiplications, because if it were, it would simply be one that adds two 10-digit numbers.
A transformer tokenizes its input, then does a bunch of matmuls and ReLUs set up in a certain way. It doesn't get to see the raw number (just as you don't when you look at 1+1: you need your visual cortex etc. first).
Notably, the difference is that ten digits is not the same thing as a number. One might expect turning them into a number to be the first step, but neural nets being what they are, they are liable to produce the correct result without bothering with a representation any more pure than a list of digits.
I guess the analogy there is that a 74LS283 never really has a number either; it just manipulates a series of logic levels.
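To make the analogy concrete, here's a toy sketch (purely illustrative, not the chip's actual internal logic) of a 74LS283-style 4-bit ripple-carry adder. Every intermediate value is just a tuple of logic levels; there is never a "number" anywhere inside:

```python
# Gate-level full adder: sum = a XOR b XOR cin, carry-out = majority(a, b, cin).
def full_adder(a, b, cin):
    s = a ^ b ^ cin
    cout = (a & b) | (a & cin) | (b & cin)
    return s, cout

def add4(a_bits, b_bits, cin=0):
    """Add two 4-bit values given as LSB-first tuples of 0/1 logic levels."""
    out = []
    carry = cin
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)
        out.append(s)
    return tuple(out), carry

# 5 (0101) + 6 (0110), LSB first: result is 11 (1011) with no carry-out.
bits, carry = add4((1, 0, 1, 0), (0, 1, 1, 0))
# bits == (1, 1, 0, 1), carry == 0
```

The adder only ever shuffles individual bits and carries; "eleven" exists purely in the reader's interpretation of the output pins.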
The tokenisation needs to be general -- it needs to be able to encode any possible input. It should also be at least moderately efficient across the distribution of inputs that it will tend to see. Existing tokenisation schemes explicitly target this.
There is no encoding that makes everything easier. You trade off maths for general intelligence. Now we are at a point where the LLM can just choose to use a normal calculator anyway!
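As an illustration of the tokenisation point, here's a toy tokenizer (hypothetical, not any real scheme like BPE) that splits digit runs into multi-digit chunks, so whatever sits downstream sees a token sequence rather than one number:

```python
import re

# Toy illustration only: long digit runs become several multi-digit tokens,
# so a 10-digit number is never presented to the model as a single value.
def toy_tokenize(text, chunk=3):
    tokens = []
    for piece in re.findall(r"\d+|\S+|\s+", text):
        if piece.isdigit():
            # split digit runs left-to-right into chunks of up to `chunk` digits
            tokens.extend(piece[i:i + chunk] for i in range(0, len(piece), chunk))
        else:
            tokens.append(piece)
    return tokens

toy_tokenize("add 1234567890 and 42")
# the 10-digit number becomes four tokens: '123', '456', '789', '0'
```

Any chunking that is efficient for ordinary text will slice numbers at boundaries that have nothing to do with place value, which is part of why digit arithmetic is awkward for models trained on such encodings.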
Possibly unrelated but something I never fully understood: while we can't create a perfect parser for natural language, why don't we optimistically parse it to extract semantics and feed that into LLMs as well?
Author just trusts the agent not to use the internet b/c he wrote it so in the instructions, which should tell you all you need to know. It's great he managed to prompt it w/ the right specification for writing yet another emulator, but I don't think he understands how LLMs actually work, so most of the commentary on the "psychology" of the LLM should be ignored.
Sampling over a probability distribution is not as catchy as "stochastic parrot", but I have personally stopped telling believers that their imagined event horizon of transistor scaling is not going to deliver them to their wished-for automated utopia, b/c one can not reason w/ people who did not reach their conclusions by reasoning.
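For what it's worth, "sampling over a probability distribution" is easy to make concrete. A minimal sketch with toy logits (nothing model-specific assumed here):

```python
import math
import random

# What an LLM's decode step amounts to: softmax the logits (optionally scaled
# by a temperature), then draw one token index from the resulting distribution.
def sample_token(logits, temperature=1.0, rng=random):
    scaled = [l / temperature for l in logits]
    m = max(scaled)                          # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    r = rng.random()
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        if r <= acc:
            return i
    return len(probs) - 1                    # guard against float rounding
```

As the temperature approaches zero the distribution collapses onto the argmax, which is why low-temperature decoding looks nearly deterministic.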
Not the person you asked, and “novel” is a minefield. What’s the last novel anything, in the sense you can’t trace a precursor or reference?
But.. I recently had a LLM suggest an approach to negative mold-making that was novel to me. Long story, but basically isolating the gross geometry and using NURBS booleans for that, plus mesh addition/subtraction for details.
I’m sure there’s prior art out there, but that’s true for pretty much everything.
I don't know, that's why I asked b/c I always see a lot of empty platitudes when it comes to LLM praise so I'm curious to see if people can actually back up their claims.
I haven't done any 3D modeling so I'll take your word for it but I can tell you that I am working on a very simple interpreter & bytecode compiler for a subset of Erlang & I have yet to see anything novel or even useful from any of the coding assistants. One might naively think that there is enough literature on interpreters & compilers for coding agents to pretty much accomplish the task in one go but that's not what happens in practice.
My advice: ask for more than what you think it can do. #1 mistake is failing to give enough context about goals, constraints, priorities.
Don’t ask “complete this one small task”, ask “hey I’m working on this big project, docs are here, source is there, I’m not sure how to do that, come up with a plan”
The specification is linked in another comment in this thread & you can decide whether it is ambitious enough or not but what I can tell you is that none of the existing coding agents can complete the task even w/ all the details. If you do try it you will eventually get something that will mostly work on simple tests but fail miserably on slightly more complicated test cases.
The workflow is not the issue. You are welcome to try the same challenge yourself if you want. Extra test cases (https://drive.proton.me/urls/6Z6557R2WG#n83c6DP6mDfc) & specification (https://claude.ai/public/artifacts/5581b499-a471-4d58-8e05-1...). I know enough about compilers, bytecode VMs, parsers, & interpreters to know that this is well within the capabilities of any reasonably good software engineer, but the implementations from Gemini 3.1 Pro (high & low) & Claude Opus 4.6 (thinking) have been less than impressive.
sorry, needed to edit this comment to ask the same question as the sibling:
have you run these models in an agent mode that allows for executing the tests, the agent views the output, and iterates on its own for a while? up to an hour or so?
you will get vastly different output if you ask the agent to write 200 of its own test cases, and then have it iterate from there
I'm using Google's antigravity IDE. I initially had it configured to run allowed commands (cargo add|build|check|run, testing shell scripts, performance profiling shell scripts, etc.) so that it would iterate & fix bugs w/ as little intervention from me as possible but all it did was burn through the daily allotted tokens so I switched to more "manual" guidance & made a lot more progress w/o burning through the daily limits.
What I've learned from this experiment is that the hype does not actually live up to the reality. Maybe the next iteration will manage the task better than the current one but it's obvious that basic compiler & bytecode virtual machine design in a language like Rust is still beyond the capabilities of the current coding agents & whoever thinks I'm wrong is welcome to implement the linked specification to see how far they can get by just "vibing".
That's roughly where I'm at too. I have seen people have more success after having practiced, though. Possibly the actual workflows needed for full auto are still kind of tacit. Smaller green-field projects do work for me already, though.
In my experience a few hundred lines w/ a few crates w/ well-defined scopes & a detailed specification is within current capabilities, e.g. compressing wav files w/ wavelets & arithmetic coding. But it's obvious that a correct parser, compiler, & bytecode VM is still beyond current agents even if the specification is detailed enough to cover basically everything.
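As a rough illustration of the kind of task described (a sketch only; the commenter's actual project and code are not shown anywhere in this thread), one level of the Haar transform, the simplest wavelet used in such audio-compression pipelines, looks like this:

```python
# One-level Haar wavelet transform: pairwise averages capture the coarse
# signal, pairwise differences capture detail. Differences cluster near zero
# on smooth audio, which is what makes the later entropy-coding stage pay off.
def haar_forward(samples):
    assert len(samples) % 2 == 0
    pairs = list(zip(samples[0::2], samples[1::2]))
    avg = [(a + b) / 2 for a, b in pairs]
    diff = [(a - b) / 2 for a, b in pairs]
    return avg, diff

def haar_inverse(avg, diff):
    out = []
    for a, d in zip(avg, diff):
        out.extend((a + d, a - d))
    return out

avg, diff = haar_forward([4, 6, 10, 12])
# avg == [5.0, 11.0], diff == [-1.0, -1.0]; the inverse round-trips exactly
```

A real codec would recurse on the averages for several levels and quantize the detail coefficients before handing them to the arithmetic coder, but the transform itself is this small.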
Can you clarify this "two orders of magnitude" claim a bit more? In what context? Sure, they have "agency" and can do more than output text, but I would like to see a proper example backing up the claim.
Software is a "good", as far as economic statistics go.
AI is helping produce more software, right? Including more software that is for sale?[1] Or more online services that are for sale?
[1] One of the interesting things here is going to be liability. You can vibecode an app. You can throw together a corporation to sell it. But if it malfunctions and causes damage, your thrown-together corporation won't have the resources to pay for it. Yeah, you can just have the company declare bankruptcy and walk away, leaving the user high and dry.
After that happens a few times, the commercial market for vibecoded apps may get kind of thin. In fact, the market for software sold by any kind of startup may also get thin.
Software stopped being a good when it no longer came in a box with finite inventory, that you had to pay for only once. It's part of the services economy, same as insurance or car rental services, regardless of how the Fed classifies it.
So is the premise here that making more software is going to have a deflationary effect on the entire economy of material goods? If so then that's obviously nonsensical.
That's not what I said, no. More software is going to have a deflationary effect on software, which is part of the "goods" economy if it's sold in a box, or even (I think) if it's sold as a download. If it's just online, it's probably considered a service. Either way, more of it, more cheaply produced, decreases the value of each piece.
I haven't paid for any software in a long time & my monthly subscriptions for data storage & basic AI add up to less than $100/month. Data storage is already about as cheap as it can get, so AI is not going to make that any cheaper. More money in the economy is not going to have a deflationary effect; prices for everything will go up, including software services like data backups, b/c the cost of the service has nothing to do w/ the software & the hardware is only going to get more expensive.
If you’re confident that AI won’t raise productivity significantly in a broad range of industries, there are likely some very attractive bets out there in the market to take the other side of.
Keep your financial advice to yourself instead of handing it out to random strangers on the internet; that way you keep more of your "alpha". But since you already offered, feel free to give everyone else in the forum the benefit of your wisdom, so they can also see how smart you are for betting that AI is going to make everything much cheaper.
> Anything that AI makes more efficient to produce. You can make a lot of money if you can predict the scope of that.
So slop? And maybe bespoke software?
Those aren't the goods that unemployed workers need.
AI won't lead to abundance, because of the simple fact it can't produce energy. The things people need will still be resource constrained, and many of those resources are getting redirected away from people to power AI.
You seem very confident. This seems to imply you feel the haves will know when to leave enough on the table for the have nots to still feel like they are a part of the haves. I'm not so confident in that.
People in technologically advanced societies have more than enough & the people who are not as advanced can not do anything that will have any effect on the people who own the fighter jets, missiles, robot factories, & "internet" satellites. The current system has no historical precedent. It is very close to an almost perfect panopticon w/ an associated media & police apparatus to keep everyone docile & complacent. Like I said, this time is different.
Far more likely is that we head back to a feudal era where data mining tech is used to identify and eliminate potential rabble-rousers. Once enough production is automated, all remaining have-nots are exterminated.
The weak link is that for "the haves" to have, the "have-nots" are needed. To have or to have not is just a comparison: a millionaire needs the poor in order to be rich and feel special, because when everyone is special, nobody is.
It will instead eventually fall apart in more thoroughly destructive ways. But not until it does a possibly-unrecoverably (at least in the medium term) amount of damage to civilization, humanity, and life on Earth first.
Likewise, thank you for the recommendation. I obviously haven't read Goliath's Curse yet, but it seems like Joseph Tainter's The Collapse of Complex Societies (1988) might also be interesting for the same readers.
The SPRT is probably already making your life better: it's used to decrease the cost of medical trials, optimize classifications in high-stakes examinations (e.g. for medical certifications), detect defective manufacturing processes, etc. It sounds like this paper extends the method to groups of hypotheses, whereas the basic version is limited to a single null hypothesis and a single alternative hypothesis.
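For readers unfamiliar with it, here is a minimal sketch of Wald's SPRT for a Bernoulli stream (the particular p0/p1/alpha/beta values are illustrative, not from the paper):

```python
import math

# Wald's sequential probability ratio test for H0: p = p0 vs H1: p = p1 on a
# stream of 0/1 observations, with target false-positive rate alpha and
# false-negative rate beta. It stops as soon as the evidence is decisive,
# which is why it needs far fewer samples than a fixed-size test on average.
def sprt(observations, p0=0.5, p1=0.7, alpha=0.05, beta=0.05):
    upper = math.log((1 - beta) / alpha)   # cross above: accept H1
    lower = math.log(beta / (1 - alpha))   # cross below: accept H0
    llr = 0.0
    for n, x in enumerate(observations, 1):
        # log-likelihood ratio contribution of one observation
        llr += math.log(p1 / p0) if x else math.log((1 - p1) / (1 - p0))
        if llr >= upper:
            return "accept H1", n
        if llr <= lower:
            return "accept H0", n
    return "continue", len(observations)
```

A run of successes drives an early "accept H1" after only a handful of observations, rather than waiting out a predetermined sample size; that early stopping is where the cost savings in trials and testing come from.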
Implement a statistical software suite that ubiquitously uses this framework instead of the usual hierarchical mixed modeling tools whose assumptions often don't match what experiments were actually done.