If you can't tell this is LLM slop then I don't really know what to tell you. What gave it away for me was the RaptorQ nonsense & the claimed conformance w/ the standard SQLite file format. If you actually read the code you'll notice all sorts of half-complete implementations of whatever is promised in the marketing materials: https://github.com/Taufiqkemall2/frankensqlite/blob/main/cra...
If you bothered to do any research at all you'd know the author is an extreme, frontier, avant-garde, eccentric LLM user, and I say that as an LLM enthusiast.
Thanks. Next time I'll do more research on what counts for LLM code artwork before commenting on an incomplete implementation w/ all sorts of logically inconsistent requirements. All I can really do at this point is humbly ask for your & their avant-garde forgiveness b/c I won't make the same mistake again & that's a real promise you can take to the crypto bank.
Great! But note I haven’t said that you should be doing the research. This was more of a warning about today, but it also was a different kind of warning about the next 12-18 months once models catch up to what this guy wants to do with them.
Thank you for your wisdom. I'll make a note & make sure to follow up on this later b/c you obviously know much more about the future than a humble plebeian like myself.
From context, then, I infer that a transformer is not composed of matrix multiplications, because if it were, it would simply be one that adds two 10-digit numbers.
A transformer tokenizes its input, then does a bunch of matmuls and ReLUs set up in a certain way. It doesn't get to see the raw number (just as you don't when you look at 1+1: you need your visual cortex etc. first).
Notably, the difference is that ten digits is not the same thing as a number. One might expect turning them into a number to be the first step, but neural nets being what they are, they are liable to produce the correct result without bothering with a representation any more pure than a list of digits.
I guess the analogy there is that a 74LS283 never really has a number either; it just manipulates a series of logic levels.
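To make the analogy concrete, here's a toy sketch (purely illustrative, not the chip's actual internal logic) of a 74LS283-style 4-bit ripple-carry adder. Every intermediate value is just a tuple of logic levels; there is never a "number" anywhere inside:

```python
# Gate-level full adder: sum = a XOR b XOR cin, carry-out = majority(a, b, cin).
def full_adder(a, b, cin):
    s = a ^ b ^ cin
    cout = (a & b) | (a & cin) | (b & cin)
    return s, cout

def add4(a_bits, b_bits, cin=0):
    """Add two 4-bit values given as LSB-first tuples of 0/1 logic levels."""
    out = []
    carry = cin
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)
        out.append(s)
    return tuple(out), carry

# 5 (0101) + 6 (0110), LSB first: result is 11 (1011) with no carry-out.
bits, carry = add4((1, 0, 1, 0), (0, 1, 1, 0))
# bits == (1, 1, 0, 1), carry == 0
```

The adder only ever shuffles individual bits and carries; "eleven" exists purely in the reader's interpretation of the output pins.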
The tokenisation needs to be general -- it needs to be able to encode any possible input. It should also be at least moderately efficient across the distribution of inputs that it will tend to see. Existing tokenisation schemes explicitly target this.
There is no encoding that makes everything easier. You trade off maths for general intelligence. Now we are at a point where the LLM can just choose to use a normal calculator anyway!
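As an illustration of the tokenisation point, here's a toy tokenizer (hypothetical, not any real scheme like BPE) that splits digit runs into multi-digit chunks, so whatever sits downstream sees a token sequence rather than one number:

```python
import re

# Toy illustration only: long digit runs become several multi-digit tokens,
# so a 10-digit number is never presented to the model as a single value.
def toy_tokenize(text, chunk=3):
    tokens = []
    for piece in re.findall(r"\d+|\S+|\s+", text):
        if piece.isdigit():
            # split digit runs left-to-right into chunks of up to `chunk` digits
            tokens.extend(piece[i:i + chunk] for i in range(0, len(piece), chunk))
        else:
            tokens.append(piece)
    return tokens

toy_tokenize("add 1234567890 and 42")
# the 10-digit number becomes four tokens: '123', '456', '789', '0'
```

Any chunking that is efficient for ordinary text will slice numbers at boundaries that have nothing to do with place value, which is part of why digit arithmetic is awkward for models trained on such encodings.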
Possibly unrelated but something I never fully understood: while we can't create a perfect parser for natural language, why don't we optimistically parse it to extract semantics and feed that into LLMs as well?
Author just trusts the agent not to use the internet b/c he wrote it so in the instructions, which should tell you all you need to know. It's great he managed to prompt it w/ the right specification for writing yet another emulator, but I don't think he understands how LLMs actually work, so most of the commentary on the "psychology" of the LLM should be ignored.
Sampling over a probability distribution is not as catchy as "stochastic parrot", but I have personally stopped telling believers that their imagined event horizon of transistor scaling is not going to deliver them to their wished-for automated utopia, b/c one can not reason w/ people who did not reach their conclusions by reasoning.
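For what it's worth, "sampling over a probability distribution" is easy to make concrete. A minimal sketch with toy logits (nothing model-specific assumed here):

```python
import math
import random

# What an LLM's decode step amounts to: softmax the logits (optionally scaled
# by a temperature), then draw one token index from the resulting distribution.
def sample_token(logits, temperature=1.0, rng=random):
    scaled = [l / temperature for l in logits]
    m = max(scaled)                          # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    r = rng.random()
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        if r <= acc:
            return i
    return len(probs) - 1                    # guard against float rounding
```

As the temperature approaches zero the distribution collapses onto the argmax, which is why low-temperature decoding looks nearly deterministic.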
Not the person you asked, and “novel” is a minefield. What’s the last novel anything, in the sense you can’t trace a precursor or reference?
But.. I recently had a LLM suggest an approach to negative mold-making that was novel to me. Long story, but basically isolating the gross geometry and using NURBS booleans for that, plus mesh addition/subtraction for details.
I’m sure there’s prior art out there, but that’s true for pretty much everything.
I don't know, that's why I asked b/c I always see a lot of empty platitudes when it comes to LLM praise so I'm curious to see if people can actually back up their claims.
I haven't done any 3D modeling so I'll take your word for it but I can tell you that I am working on a very simple interpreter & bytecode compiler for a subset of Erlang & I have yet to see anything novel or even useful from any of the coding assistants. One might naively think that there is enough literature on interpreters & compilers for coding agents to pretty much accomplish the task in one go but that's not what happens in practice.
My advice: ask for more than what you think it can do. #1 mistake is failing to give enough context about goals, constraints, priorities.
Don’t ask “complete this one small task”, ask “hey I’m working on this big project, docs are here, source is there, I’m not sure how to do that, come up with a plan”
The specification is linked in another comment in this thread & you can decide whether it is ambitious enough or not but what I can tell you is that none of the existing coding agents can complete the task even w/ all the details. If you do try it you will eventually get something that will mostly work on simple tests but fail miserably on slightly more complicated test cases.
The workflow is not the issue. You are welcome to try the same challenge yourself if you want. Extra test cases (https://drive.proton.me/urls/6Z6557R2WG#n83c6DP6mDfc) & specification (https://claude.ai/public/artifacts/5581b499-a471-4d58-8e05-1...). I know enough about compilers, bytecode VMs, parsers, & interpreters to know that this is well within the capabilities of any reasonably good software engineer, but the implementations from Gemini 3.1 Pro (high & low) & Claude Opus 4.6 (thinking) have been less than impressive.
sorry, needed to edit this comment to ask the same question as the sibling:
have you run these models in an agent mode that allows for executing the tests, the agent views the output, and iterates on its own for a while? up to an hour or so?
you will get vastly different output if you ask the agent to write 200 of its own test cases, and then have it iterate from there
I'm using Google's antigravity IDE. I initially had it configured to run allowed commands (cargo add|build|check|run, testing shell scripts, performance profiling shell scripts, etc.) so that it would iterate & fix bugs w/ as little intervention from me as possible but all it did was burn through the daily allotted tokens so I switched to more "manual" guidance & made a lot more progress w/o burning through the daily limits.
What I've learned from this experiment is that the hype does not actually live up to the reality. Maybe the next iteration will manage the task better than the current one but it's obvious that basic compiler & bytecode virtual machine design in a language like Rust is still beyond the capabilities of the current coding agents & whoever thinks I'm wrong is welcome to implement the linked specification to see how far they can get by just "vibing".
That's roughly where I'm at too. I have seen people have more success after having practiced, though. Possibly the actual workflows needed for full auto are still kind of tacit. Smaller green-field projects do work for me already, though.
In my experience a few hundred lines w/ a few crates w/ well-defined scopes & a detailed specification is within current capabilities, e.g. compressing wav files w/ wavelets & arithmetic coding. But it's obvious that a correct parser, compiler, & bytecode VM is still beyond current agents even if the specification is detailed enough to cover basically everything.
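As a rough illustration of the kind of task described (a sketch only; the commenter's actual project and code are not shown anywhere in this thread), one level of the Haar transform, the simplest wavelet used in such audio-compression pipelines, looks like this:

```python
# One-level Haar wavelet transform: pairwise averages capture the coarse
# signal, pairwise differences capture detail. Differences cluster near zero
# on smooth audio, which is what makes the later entropy-coding stage pay off.
def haar_forward(samples):
    assert len(samples) % 2 == 0
    pairs = list(zip(samples[0::2], samples[1::2]))
    avg = [(a + b) / 2 for a, b in pairs]
    diff = [(a - b) / 2 for a, b in pairs]
    return avg, diff

def haar_inverse(avg, diff):
    out = []
    for a, d in zip(avg, diff):
        out.extend((a + d, a - d))
    return out

avg, diff = haar_forward([4, 6, 10, 12])
# avg == [5.0, 11.0], diff == [-1.0, -1.0]; the inverse round-trips exactly
```

A real codec would recurse on the averages for several levels and quantize the detail coefficients before handing them to the arithmetic coder, but the transform itself is this small.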
Can you clarify this "two orders of magnitude" claim a bit more? In what context? Sure, they have "agency" and can do more than output text, but I would like to see a proper example backing up the claim.
Software is a "good", as far as economic statistics go.
AI is helping produce more software, right? Including more software that is for sale?[1] Or more online services that are for sale?
[1] One of the interesting things here is going to be liability. You can vibecode an app. You can throw together a corporation to sell it. But if it malfunctions and causes damage, your thrown-together corporation won't have the resources to pay for it. Yeah, you can just have the company declare bankruptcy and walk away, leaving the user high and dry.
After that happens a few times, the commercial market for vibecoded apps may get kind of thin. In fact, the market for software sold by any kind of startup may also get thin.
Software stopped being a good when it no longer came in a box with finite inventory, that you had to pay for only once. It's part of the services economy, same as insurance or car rental services, regardless of how the Fed classifies it.
So is the premise here that making more software is going to have a deflationary effect on the entire economy of material goods? If so then that's obviously nonsensical.
That's not what I said, no. More software is going to have a deflationary effect on software, which is part of the "goods" economy if it's sold in a box, or even (I think) if it's sold as a download. If it's just online, it's probably considered a service. Either way, more of it, more cheaply produced, decreases the value of each piece.
I haven't paid for any software in a long time & my monthly subscriptions for data storage & basic AI add up to less than $100/month. Data storage is already about as cheap as it can get, so AI is not going to make that any cheaper. More money in the economy is not going to have a deflationary effect; prices for everything will go up, including software services like data backups, b/c the cost of the service has nothing to do w/ the software & the hardware is only going to get more expensive.
If you’re confident that AI won’t raise productivity significantly in a broad range of industries, there are likely some very attractive bets out there in the market to take the other side of.
Keep your financial advice to yourself instead of handing it out to random strangers on the internet; that way you keep more of your "alpha". But since you already offered, feel free to give everyone else in the forum the benefit of your wisdom, so they can also see how smart you are for betting that AI is going to make everything much cheaper.
> Anything that AI makes more efficient to produce. You can make a lot of money if you can predict the scope of that.
So slop? And maybe bespoke software?
Those aren't the goods that unemployed workers need.
AI won't lead to abundance, because of the simple fact it can't produce energy. The things people need will still be resource constrained, and many of those resources are getting redirected away from people to power AI.
You seem very confident. This seems to imply you feel the haves will know when to leave enough on the table for the have nots to still feel like they are a part of the haves. I'm not so confident in that.
People in technologically advanced societies have more than enough & the people who are not as advanced can not do anything that will have any effect on the people who own the fighter jets, missiles, robot factories, & "internet" satellites. The current system has no historical precedent. It is very close to an almost perfect panopticon w/ an associated media & police apparatus to keep everyone docile & complacent. Like I said, this time is different.
Far more likely is that we head back to a feudal era where data mining tech is used to identify and eliminate potential rabble-rousers. Once enough production is automated, all remaining have-nots are exterminated.
The weak link is that for "the haves" to have, the "have-nots" are needed. To have or to have not is just a comparison: a millionaire needs the poor in order to be rich and feel special, because when everyone is special, nobody is.
It will instead eventually fall apart in more thoroughly destructive ways. But not until it does a possibly-unrecoverably (at least in the medium term) amount of damage to civilization, humanity, and life on Earth first.
Likewise, thank you for the recommendation. I obviously haven't read Goliath's Curse yet, but it seems like Joseph Tainter's The Collapse of Complex Societies (1988) might also be interesting for the same readers.
The SPRT is probably already making your life better: it's used to decrease the cost of medical trials, optimize classifications in high-stakes examinations (e.g. for medical certifications), detect defective manufacturing processes, etc. It sounds like this paper extends the method to groups of hypotheses, whereas the basic version is limited to a single null hypothesis and a single alternative hypothesis.
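For readers unfamiliar with it, here is a minimal sketch of Wald's SPRT for a Bernoulli stream (the particular p0/p1/alpha/beta values are illustrative, not from the paper):

```python
import math

# Wald's sequential probability ratio test for H0: p = p0 vs H1: p = p1 on a
# stream of 0/1 observations, with target false-positive rate alpha and
# false-negative rate beta. It stops as soon as the evidence is decisive,
# which is why it needs far fewer samples than a fixed-size test on average.
def sprt(observations, p0=0.5, p1=0.7, alpha=0.05, beta=0.05):
    upper = math.log((1 - beta) / alpha)   # cross above: accept H1
    lower = math.log(beta / (1 - alpha))   # cross below: accept H0
    llr = 0.0
    for n, x in enumerate(observations, 1):
        # log-likelihood ratio contribution of one observation
        llr += math.log(p1 / p0) if x else math.log((1 - p1) / (1 - p0))
        if llr >= upper:
            return "accept H1", n
        if llr <= lower:
            return "accept H0", n
    return "continue", len(observations)
```

A run of successes drives an early "accept H1" after only a handful of observations, rather than waiting out a predetermined sample size; that early stopping is where the cost savings in trials and testing come from.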
Implement a statistical software suite that ubiquitously uses this framework instead of the usual hierarchical mixed modeling tools whose assumptions often don't match what experiments were actually done.