LLMs have limits. They're super powerful, but they can't make the kind of leap humans can. For example, I asked both Claude and Gemini the problem below.
"I want to run webserver on Android but it does not allow binding on ports lower than 1000. What are my options?"
Both responded with the solutions below:
1. Use a reverse proxy
2. Root the phone
3. Run on a higher port
Even after asking them to rethink, they couldn't come up with the solution I was expecting. The solution to this problem is HTTPS RR records[1]. Both models knew about HTTPS RR but couldn't suggest it as a solution. Only after I included it in their context did both agree it was a possible solution.

[1]: https://rohanrd.xyz/posts/hosting-website-on-phone/
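For anyone unfamiliar with the trick: an HTTPS RR can carry a port parameter, so a supporting client connects to a non-standard port even though the user types a bare URL. A minimal sketch in zone-file presentation format, with phone.example.com and 8443 as stand-ins:

    ; advertise that https://phone.example.com/ is actually served on port 8443
    phone.example.com. 3600 IN HTTPS 1 . alpn="h2" port="8443"

Clients that honor the record connect straight to 8443, so nothing on the phone needs to bind a privileged port.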
I'm not sure I would measure LLMs on recommending a fairly obscure and rather new spec that isn't even fully supported by e.g. Chrome. That's a leap that I, a human, wouldn't have made either.
On the other hand, I find that "someone who has read all the blogs and papers and so can suggest something you might have missed" is my favorite use case for LLMs, so seeing one miss a useful (and novel to me) idea is annoying.
Which is easier for you to remember: facts you’ve seen a hundred times in your lifetime, or facts you’ve seen once or twice?
For an LLM, I’d expect it to be similar. It can recall the stuff it’s seen thousands of times, but has a hard time recalling the niche/underdocumented stuff that it’s seen just a dozen times.
> Which is easier for you to remember: facts you’ve seen a hundred times in your lifetime, or facts you’ve seen once or twice?
The human brain isn't a statistical aggregator. If you see a psychologically shocking thing once in your lifetime, you might remember it even after dementia hits when you're old.
On the other hand, you pass by hundreds of shops every day and receive the data signal of their signs over and over and over, yet you remember nothing.
You remember stuff you pay attention to (for whatever reason).
If it were still 2020, then yes. In 2025, post-training like RLHF means these models don't just predict the next token; the reward function is a lot more involved than that.
Instruct models like ChatGPT are still token predictors. Instruction following is an emergent behavior from fine-tuning and reward modeling layered on top of the same core mechanism: autoregressive next-token prediction.
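To make "the same core mechanism" concrete, here's a minimal sketch of autoregressive sampling, where model and tokenizer are hypothetical stand-ins rather than any particular library's API. RLHF changes the weights behind the logits, but the loop itself is unchanged:

    import torch

    def generate(model, tokenizer, prompt, max_new_tokens=50, temperature=0.8):
        # model: hypothetical callable mapping [batch, seq] token ids to logits
        # tokenizer: hypothetical encode/decode between text and token ids
        ids = tokenizer.encode(prompt)
        for _ in range(max_new_tokens):
            logits = model(torch.tensor([ids]))[0, -1]       # scores for the next token
            probs = torch.softmax(logits / temperature, -1)  # temperature-scaled distribution
            next_id = torch.multinomial(probs, 1).item()     # sample exactly one token
            ids.append(next_id)
            if next_id == tokenizer.eos_id:                  # stop at end-of-sequence
                break
        return tokenizer.decode(ids)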
> So is the expectation for it to suggest obvious solutions that the majority of people already know?
Certainly a majority of people don't know this. What we're really asking is whether an LLM is expected to know more than (or as much as) the average domain expert.
I'm adding this tidbit of knowledge to my context as well... :-P
Only recently have I started interacting with LLMs more (I tried out a previous "use it as a book club partner" suggestion, and it's pretty great!).
When coding with them (via Cursor), there was an interaction where I nudged it: "hey, you forgot xyz when you wrote that code the first time" (i.e., updating an associated data structure or cache or whatever), and I find myself INTENTIONALLY giving the machine at least the shadow of a benefit of the doubt: "Yeah, I might have made that mistake too if I were writing that code" or "Yeah, I might have written the base case first and _then_ gotten around to updating the cache, or decrementing the overall number of found items or whatever".
In the "book club" and "movie club" case, I asked it to discuss two movies and there were a few flubs: was the main character "justly imprisoned", or "unjustly imprisoned" ... a human might have made that same typo? Correct it, don't dwell on it, go with the flow... even in a 100% human discussion on books and movies, people (and hallucinating AI/LLM's) can not remember with 100% pinpoint accuracy every little detail, and I find giving a bit of benefit of the doubt to the conversation partner lowers my stress level quite a bit.
I guess: even when it's an AI, try to keep your interactions positive.
TIL. I knew about SRV records (which almost nobody uses, I think?), but this was news to me.
I guess it's also actually supported, unlike SRV records, which only some applications support? Matrix migrated from SRV to .well-known files for providing this data. (Or maybe it supports both.)
Matrix supports both; when you're trying to get Matrix deployed on someone's domain, it's a crapshoot whether they have permission to write to .well-known on the webroot (and if they do, the chances of it getting vaped by a CMS update are high)... or whether they have permission to set DNS records.
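For reference, the two delegation mechanisms being compared look roughly like this (example.com and matrix.example.com are placeholders). The file served at https://example.com/.well-known/matrix/server:

    {"m.server": "matrix.example.com:443"}

versus the SRV record (the older _matrix._tcp form; newer spec versions also define a _matrix-fed variant):

    _matrix._tcp.example.com. 3600 IN SRV 10 5 8448 matrix.example.com.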
Huh, that's a neat trick. Your comment is the first I'm learning of HTTPS RR records... so I won't pass judgement on whether an AI should've known enough to suggest it.
To be fair, the question implies the port number of the local service is the problem, when it's more about making sure users can access it without needing to specify a port number in the URL.
Yes, an experienced person might be able to suss out what the real problem was, but it's not really the LLM's fault for answering the specific question it was asked. Maybe you just wanted to run a server for testing and didn't realize that you can add a non-standard port to the URL.
It's not really an LLM being "faulty" in that we kinda know they have these limitations. I think they're pointing out that these models have a hard time "thinking outside the box" which is generally a lauded skill, especially for the problem-solving/planning that agents are expected to do.
Off-topic, but reading your article about hosting a website on your phone inspired me a lot. Is that possible on a non-jail-broken phone? And what webserver would you suggest?
Yes, no root required. I asked Claude to write a Flutter app that serves a static file from assets. There are plenty of webservers available on the Play Store too.
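For anyone who wants to try this without writing an app: the same idea fits in a few lines of Python's standard http.server, which runs on Android under something like Termux. A sketch, with the port arbitrary as long as it's above the privileged range:

    import http.server
    import socketserver

    PORT = 8080  # high port, so no root needed

    # SimpleHTTPRequestHandler serves files from the current directory
    with socketserver.TCPServer(("", PORT), http.server.SimpleHTTPRequestHandler) as httpd:
        print(f"Serving on http://0.0.0.0:{PORT}/")
        httpd.serve_forever()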
"I want to run webserver on Android but it does not allow binding on ports lower than 1000. What are my options?"
Both responded with below solutions
1. Use reverse proxy
2. Root the phone
3. Run on higher port
Even after asking them to rethink they couldn't come up with the solution I was expecting. The solution to this problem is HTTPS RR records[1]. Both models knew about HTTPS RR but couldn't suggest it as a solution. It's only after I included it in their context both agreed it as a possible solution.
[1]: https://rohanrd.xyz/posts/hosting-website-on-phone/