I'll try to credit you with more than just dismissing my question off-hand...
Yes, it may not need to know with perfect certainty when it's unsure or stuck, but even to meet a lower bar of usefulness, it'll need at least an approximate means of determining that its knowledge is inadequate. To purport to help with the hallucination problem requires no less.
To make the issue a bit more clear, here are some candidate components to a stuck() predicate:
- possibilities considered
- time taken
- tokens consumed/generated (vs expected? vs static limit? vs dynamic limit?)
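To make those candidates concrete, here's a minimal sketch of what a stuck() predicate combining them might look like. Every field name and threshold here is an assumption for illustration; nothing like this exists in MCP today.

```python
# Hypothetical stuck() predicate combining the candidate signals above.
# All field names and thresholds are invented for illustration.
from dataclasses import dataclass
import time

@dataclass
class GenerationStats:
    possibilities_considered: int  # distinct candidate approaches explored
    started_at: float              # monotonic start time
    tokens_generated: int
    token_budget: int              # a static limit; could be dynamic instead

def stuck(stats: GenerationStats,
          max_seconds: float = 120.0,
          max_possibilities: int = 5) -> bool:
    """True if any signal suggests the model is spinning its wheels."""
    over_time = time.monotonic() - stats.started_at > max_seconds
    over_budget = stats.tokens_generated > stats.token_budget
    thrashing = stats.possibilities_considered > max_possibilities
    return over_time or over_budget or thrashing
```

The hard part isn't the predicate itself, it's that nothing in the serving stack currently exposes these numbers to a tool.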
If the unsure/stuck determination is defined via more qualitative prompting, what's the prompt? How well has it worked?
I don't believe[1] any of those are part of the MCP protocol - it's essentially "the LLM decided to call it, with X arguments, and will interpret the results however it likes". It's an escape hatch for the LLM to use to do stuff like read a file, not a monitoring system that acts independently and has control over the LLM itself.
(But you could build one that does this, and ask the LLM to call it and give your MCP that data... when it feels like it)
So you'd be using this by telling the LLM to run it when it thinks it's stuck. Or needs human input.
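A sketch of what such a self-report tool might look like on the server side. The tool name, schema, and handler are all invented; the essential limitation stands regardless: the LLM decides when (and whether) to call it, so all you get is its own self-assessment.

```python
# Hypothetical MCP-style tool the LLM can call when *it* decides it's stuck.
# The name "report_stuck" and its schema are invented for illustration.
STUCK_TOOL = {
    "name": "report_stuck",
    "description": "Call this when you are stuck or need human input.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "reason": {"type": "string"},
            "approaches_tried": {"type": "integer"},
        },
        "required": ["reason"],
    },
}

def handle_report_stuck(args: dict) -> dict:
    """Server-side handler: log the self-report and escalate to a human."""
    print(f"LLM reports stuck: {args['reason']} "
          f"(approaches tried: {args.get('approaches_tried', '?')})")
    return {"status": "escalated_to_human"}
```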
1: I am not anything even approaching deeply knowledgeable about MCP, so please, someone correct me if I'm wrong! There do seem to be some bi-directional messaging abilities, e.g. notifications, but to measure thinking time, token use, etc., you would need access to the infrastructure running the LLM, e.g. Cursor itself or something.
You are trying to control a system that is inherently chaotic.
You can probably get somewhere by indeed running a task 1000 times and looking for outliers in execution time or token count. But that is of minimal use, and anything more advanced than that is akin to water divining.
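The "run it many times and flag outliers" approach can be sketched with a standard IQR rule; the token counts here are made up, and in practice the variance across runs would make the thresholds much noisier than this.

```python
# Flag outlier runs by token count using the 1.5*IQR rule.
# The sample data is fabricated; a real experiment would use ~1000 runs.
import statistics

def outliers(samples: list[int], k: float = 1.5) -> list[int]:
    """Return samples outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _median, q3 = statistics.quantiles(samples, n=4)
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [s for s in samples if s < lo or s > hi]

token_counts = [210, 198, 205, 220, 201, 1890, 215, 208]  # one runaway run
print(outliers(token_counts))  # prints [1890]
```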
The system is only nondeterministic (and a model of nondeterminism at that) when it's emitting tokens. It (the system) becomes completely deterministic when it calls a tool and a result is returned from the tool.
This is little different from how I wrote this comment. It became deterministic the moment I hit reply.
So not at all, but that doesn't mean it's not useful.