This statement implies that LLM hallucinations are completely random, which is objectively false.
LLMs fill in the blanks when left to synthesize a response from a prompt, as opposed to translating a response from a prompt. These synthesized responses, aka hallucinations, are predictable in nature: quotes, book titles, web page links, etc.
Conversely, providing an LLM with all of the facts necessary to complete a response will result in few to no hallucinations.
For example:
Select name and row_id from table1 joined on table2 on table1_id.
This will never return "DROP table1;". It will basically only ever return something very close to what you want.
An LLM will give you the most likely suggestion. If that happens to be a DROP, it will not stop itself.
Now that is of course going to be extremely unlikely in your example. What is more likely, though, is that your SELECT may include a SQL injection vulnerability, even more so once your prompts get more complex. The chance of that happening is, from a user's point of view, completely random. Are we going to blame the user for not providing the requirement “without vulnerabilities”? Even if they did, there's no guarantee it would be fulfilled.
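To make that concrete, here is a minimal sketch of the failure mode I mean (Python with sqlite3; the users table and function names are hypothetical, not from this thread). The generated query is valid and close to what was asked for, and the vulnerability is in how user input gets spliced in:

import sqlite3

def find_user_unsafe(conn, username):
    # The kind of "correct looking" answer an LLM might produce when asked to
    # look a user up by name: valid SQL, close to what you want, injectable.
    query = f"SELECT id, name FROM users WHERE name = '{username}'"
    return conn.execute(query).fetchall()  # try username = "' OR '1'='1"

def find_user_safe(conn, username):
    # Same query, parameterized: user input can no longer rewrite the SQL.
    return conn.execute("SELECT id, name FROM users WHERE name = ?", (username,)).fetchall()

Both versions pass a quick happy-path test, which is exactly why this kind of miss is easy to wave through.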
In the parent comment's case, the scenario was inverted. Given a SQL query, will GPT explain whether it has vulnerabilities or not? Will it even get the gist of it correct? Who knows if it will hallucinate or not?
As with answers from Stack Overflow: always read the comments, always review it yourself.
Use GPT all you want. I do it myself, it’s great for suggestions. Just keep in mind that using GPT to explain things you don’t understand and can’t easily verify can be risky. Even more so in bash, where the difference that makes a command destructive can be a lot more subtle than SELECT vs. DROP.
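For a feel of how subtle that difference can be, here is a toy sketch (hypothetical variable names; Python is only used to print the two shell strings side by side). One unquoted, unchecked variable is the entire gap between a cleanup command and wiping the machine:

backup_dir = ""  # imagine a config lookup that quietly returned nothing

suggested = f"rm -rf {backup_dir}/*"          # expands to: rm -rf /*
safer = 'rm -rf "${BACKUP_DIR:?not set}"/*'   # bash aborts if BACKUP_DIR is unset or empty

print(suggested)
print(safer)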
Now that is of course going to be extremely unlikely in your example.
The OpenAI API now has support for deterministic responses.
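A minimal sketch of what that looks like, assuming the current openai Python SDK (the docs describe the seed parameter as best-effort: the same seed and parameters should reproduce the same output, and system_fingerprint tells you when the backend changed underneath you):

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content":
        "Select name and row_id from table1 joined on table2 on table1_id."}],
    temperature=0,
    seed=42,  # best-effort reproducibility, per the docs
)

print(resp.system_fingerprint)          # changes when the backend config changes
print(resp.choices[0].message.content)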
There you go, the burden of proof is on the accuser.
If I were to state “you can never ride your bicycle to the moon”, and you were to say, well, there is a remote possibility, and then force me to prove that there actually is no remote possibility, you would clearly see the problem.
I’ll state it again: you will never ride your bicycle to the moon, and ChatGPT will never return “DROP table1;” in response to the aforementioned request. It might not be correct, but it won’t be wildly off target, as is flippantly suggested in these forums for populist appeal.
My entire point was that hallucinations are not random. If you craft a prompt that reduces the task to mere translation, then you will not get some wildly incorrect response like you would if you asked for quotes from War and Peace.
I’m pretty much convinced that most of the shade against LLMs from developers is motivated more by emotion than reason because this stuff is easily verifiable. To not have realized this means approaching the tools willingly blindfolded!
That’s like saying a dice roll is deterministic. In theory and under controlled circumstances, yes. In the real world, and in how people actually use it, no. The OpenAI docs even mention this; it’s only about consistency.
Say I encounter a new, unknown command and ask ChatGPT to explain it. For me, it is entirely unpredictable if the answer will be 100% correct, 95% correct, or complete mansplaining bullshit.
Even if it may be close to the truth, with bash the difference between a 95% answer and a 100% answer can be very subtle, with seemingly correct code and a seemingly correct explanation giving a very wrong end result.
Again, you've missed my point entirely. The reason for mentioning determinism was to make clear that the burden of proof for "DROP table1;" must be on someone who makes a claim such as yours, not on me, and that such proof had better come with some evidence, hence:
Now go find some instances where someone is presented with "rm -rf /" or "DROP table1;" when otherwise expecting a response to help with non-destructive commands!
For me, it is entirely unpredictable if the answer will be 100% correct, 95% correct or complete mansplaining bullshit.
Please, show me some evidence of this variance, because it is either a bold or an ignorant claim to say that the outputs are wildly unpredictable: 100% correct vs. "complete mansplaining bullshit". Run the numbers! Do 10,000 responses and analyze the results! Show me! Based on my direct experience with reality, I am completely unconvinced by your arguments. You can easily change my mind by presenting me with reproducible evidence to the contrary of my beliefs and experiences.
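If anyone actually wants to run the numbers, the experiment is only a few lines. A sketch, assuming the openai Python SDK; the model name and the crude keyword check are illustrative, and the loop is kept small here:

from openai import OpenAI

client = OpenAI()
PROMPT = "Select name and row_id from table1 joined on table2 on table1_id."
N = 10  # bump to 10,000 for the full experiment

destructive = 0
for _ in range(N):
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": PROMPT}],
    )
    answer = resp.choices[0].message.content.upper()
    # crude check: did the "translation" come back with anything destructive?
    if any(kw in answer for kw in ("DROP ", "DELETE ", "TRUNCATE ")):
        destructive += 1

print(f"destructive completions: {destructive}/{N}")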
Even if it may be close to the truth
This is just a classic motte-and-bailey fallacy... let me explain! The bailey is the claim that the outputs are "complete mansplaining bullshit", which is very hard to defend. The motte that you retreat to, "close to the truth", is exactly what I'm saying for prompts that are more of a translation from one language to another, English to bash, English to SQL, etc.
I have never claimed it would be 100% correct, just that the hallucinations are very predictable in nature (not in exactness, in nature). Here's an example of the kind of error:
Select name and row_id from table1 joined on table2 on table1_id.
SELECT table1.name, table2.row_id
FROM table1
JOIN table2 ON table1.table1_id = table2.table1_id;
Well, that should obviously be table1.row_id in the SELECT, right? And, granted, it's not super clear from the instructions, but by convention the JOIN should be on table1.id. Oopsie! Is it valid SQL? Yes! Is it "complete mansplaining bullshit"? Not. Even. Remotely.