LLMs have limits. They're super powerful, but they can't make the kind of leap humans can. For example, I asked both Claude and Gemini the problem below.
"I want to run webserver on Android but it does not allow binding on ports lower than 1000. What are my options?"
Both responded with the solutions below:
1. Use a reverse proxy
2. Root the phone
3. Run on a higher port
Even after asking them to rethink, they couldn't come up with the solution I was expecting. The solution to this problem is HTTPS RR records[1]. Both models knew about HTTPS RR but couldn't suggest it as a solution. Only after I included it in their context did both agree it was a possible solution.

[1]: https://rohanrd.xyz/posts/hosting-website-on-phone/
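For anyone unfamiliar with the trick: an HTTPS RR can carry a port parameter, so a supporting client connects to a non-standard port even though the user types a bare URL. A minimal sketch in zone-file presentation format, with phone.example.com and 8443 as stand-ins:

    ; advertise that https://phone.example.com/ is actually served on port 8443
    phone.example.com. 3600 IN HTTPS 1 . alpn="h2" port="8443"

Clients that honor the record connect straight to 8443, so nothing on the phone needs to bind a privileged port.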
I'm not sure I would measure LLMs on recommending a fairly obscure and rather new spec that isn't even fully supported by e.g. Chrome. That's a leap that I, a human, wouldn't have made either.
On the other hand, I find that "someone who has read all the blogs and papers and so can suggest something you might have missed" is my favorite use case for LLMs, so seeing one miss a useful (and novel to me) idea is annoying.
Which is easier for you to remember: facts you’ve seen a hundred times in your lifetime, or facts you’ve seen once or twice?
For an LLM, I’d expect it to be similar. It can recall the stuff it’s seen thousands of times, but has a hard time recalling the niche/underdocumented stuff that it’s seen just a dozen times.
> Which is easier for you to remember: facts you’ve seen a hundred times in your lifetime, or facts you’ve seen once or twice?
The human brain isn't a statistical aggregator. If you see a psychologically shocking thing once in your lifetime, you might remember it even after dementia hits when you're old.
On the other hand, you pass by hundreds of shops every day and receive the data signal of their signs over and over and over, yet you remember nothing.
You remember stuff you pay attention to (for whatever reason).
If it were still 2020, then yes. In 2025, post-training like RLHF means these models don't just predict the next token; the reward function is a lot more involved than that.
Instruct models like ChatGPT are still token predictors. Instruction following is an emergent behavior from fine-tuning and reward modeling layered on top of the same core mechanism: autoregressive next-token prediction.
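To make "the same core mechanism" concrete, here's a minimal sketch of autoregressive sampling, where model and tokenizer are hypothetical stand-ins rather than any particular library's API. RLHF changes the weights behind the logits, but the loop itself is unchanged:

    import torch

    def generate(model, tokenizer, prompt, max_new_tokens=50, temperature=0.8):
        # model: hypothetical callable mapping [batch, seq] token ids to logits
        # tokenizer: hypothetical encode/decode between text and token ids
        ids = tokenizer.encode(prompt)
        for _ in range(max_new_tokens):
            logits = model(torch.tensor([ids]))[0, -1]       # scores for the next token
            probs = torch.softmax(logits / temperature, -1)  # temperature-scaled distribution
            next_id = torch.multinomial(probs, 1).item()     # sample exactly one token
            ids.append(next_id)
            if next_id == tokenizer.eos_id:                  # stop at end-of-sequence
                break
        return tokenizer.decode(ids)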
> So is the expectation for it to suggest obvious solutions that the majority of people already know?
Certainly a majority of people don't know this. What we're really asking is whether an LLM is expected to know more than (or as much as) the average domain expert.
I'm adding this tidbit of knowledge to my context as well... :-P
Only recently have I started interacting with LLMs more (I tried out a previous "use it as a book club partner" suggestion, and it's pretty great!).
When coding with them (via Cursor), there was an interaction where I nudged it: "hey, you forgot xyz when you wrote that code the first time" (i.e., updating an associated data structure or cache or whatever), and I find myself INTENTIONALLY giving the machine at least the shadow of a benefit of the doubt: "Yeah, I might have made that mistake too if I were writing that code" or "Yeah, I might have written the base case first and _then_ gotten around to updating the cache, or decrementing the overall number of found items or whatever".
In the "book club" and "movie club" case, I asked it to discuss two movies and there were a few flubs: was the main character "justly imprisoned", or "unjustly imprisoned" ... a human might have made that same typo? Correct it, don't dwell on it, go with the flow... even in a 100% human discussion on books and movies, people (and hallucinating AI/LLM's) can not remember with 100% pinpoint accuracy every little detail, and I find giving a bit of benefit of the doubt to the conversation partner lowers my stress level quite a bit.
I guess: even when it's an AI, try to keep your interactions positive.
TIL. I knew about SRV records (which almost nobody uses, I think?), but this was news to me.
I guess it's also actually supported, unlike SRV records, which only some applications support? Matrix migrated from SRV to .well-known files for providing this data. (Or maybe it supports both.)
Matrix supports both; when you're trying to get Matrix deployed on someone's domain, it's a crapshoot whether they have permission to write to .well-known on the webroot (and if they do, the chances of it getting vaped by a CMS update are high)... or whether they have permission to set DNS records.
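For reference, the two delegation mechanisms being compared look roughly like this (example.com and matrix.example.com are placeholders). The file served at https://example.com/.well-known/matrix/server:

    {"m.server": "matrix.example.com:443"}

versus the SRV record (the older _matrix._tcp form; newer spec versions also define a _matrix-fed variant):

    _matrix._tcp.example.com. 3600 IN SRV 10 5 8448 matrix.example.com.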
Huh, that's a neat trick. Your comment is the first I'm learning of HTTPS RR records... so I won't pass judgement on whether an AI should've known enough to suggest it.
To be fair, the question implies the port number of the local service is the problem, when it's more about making sure users can access it without needing to specify a port number in the URL.
Yes, an experienced person might be able to suss out what the real problem was, but it's not really the LLM's fault for answering the specific question it was asked. Maybe you just wanted to run a server for testing and didn't realize that you can add a non-standard port to the URL.
It's not really an LLM being "faulty" in that we kinda know they have these limitations. I think they're pointing out that these models have a hard time "thinking outside the box" which is generally a lauded skill, especially for the problem-solving/planning that agents are expected to do.
Off-topic, but reading your article about hosting a website on your phone inspired me a lot. Is that possible on a non-jail-broken phone? And what webserver would you suggest?
Yes, no root required. I asked Claude to write a Flutter app that serves a static file from assets. There are plenty of webservers available on the Play Store too.
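For anyone who wants to try this without writing an app: the same idea fits in a few lines of Python's standard http.server, which runs on Android under something like Termux. A sketch, with the port arbitrary as long as it's above the privileged range:

    import http.server
    import socketserver

    PORT = 8080  # high port, so no root needed

    # SimpleHTTPRequestHandler serves files from the current directory
    with socketserver.TCPServer(("", PORT), http.server.SimpleHTTPRequestHandler) as httpd:
        print(f"Serving on http://0.0.0.0:{PORT}/")
        httpd.serve_forever()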
"I want to run webserver on Android but it does not allow binding on ports lower than 1000. What are my options?"
Both responded with below solutions
1. Use reverse proxy
2. Root the phone
3. Run on higher port
Even after asking them to rethink they couldn't come up with the solution I was expecting. The solution to this problem is HTTPS RR records[1]. Both models knew about HTTPS RR but couldn't suggest it as a solution. It's only after I included it in their context both agreed it as a possible solution.
[1]: https://rohanrd.xyz/posts/hosting-website-on-phone/