I'm surprised at how even some of the smartest people in my life take the output of LLMs at face value. LLMs are great for "plan a 5 year old's birthday party, dinosaur theme", "design a work-out routine to give me a big butt", or even rubber-ducking through a problem.
But for anything where the numbers, dates, and facts matter, why even bother?
It's very frustrating to ask a colleague to explain a bit of code, only to be told Copilot generated it.
Or for a colleague to send me code they're debugging in a framework they're new to, with dozens of nonsensical or unnecessary lines, only to learn they never consulted the official docs and just had Copilot generate it.
No, actually, with my current set of colleagues I hadn't had to do that. The bugs I can recall fixing were ones that only appeared after time had obscured their provenance, and the code didn't have that "the author didn't know what they were doing" smell. I've really only run into that with AI-generated code. It's really lowered the floor.
Don't be sad. Before LLMs, they would have copied from a deprecated 5 year old tutorial or a fringe forum post or the defunct code from a stackoverflow question without even looking at the answers.
That was still better, because you could track down errors: other people used the same code. ChatGPT will just make up functions and methods. When you try to troubleshoot, of course no one has ever had a problem with this completely fake function. And when you tell ChatGPT it's not real, it says "You're right, str_sanitize_base64 isn't a function" and then just makes up something else new.
One thing that frustrates me about current ChatGPT is that it feels like they're discouraging you from generating another reply to the same question, to see what else it might say about what you're asking. Before, you could just hit the arrow on the right to regenerate a reply; now it's hidden in the menu where you change models on the fly. Why add the friction?
They very often drop enormous amounts of detail when generating output. Sometimes they'll give you a solution, but it's likely stripped of important details, or it's a valid reply to your current problem but fragile in many other situations where the original was robust.
Prompt 1: Rent live crocodiles and tell the kids they're "modern dinosaurs." Let them roam freely as part of the immersive experience. Florida-certified.
Prompt 2: Try sitting on a couch all day. Gravity will naturally pull down your butt and spread it around as you eat more calories.
Prompt 3: ... ah, of course, you are right ((you caught a mistake in its answer))! Because of that, have you tried ... <another bad answer>
Even for non-number answers, it can get pretty funny. The first two prompts are jokes, but the last example happens pretty frequently: it provides a very confident analysis of what the problem might be and suggests a fix, only for you to come back later and point out that the fix didn't work or that it got something wrong.
However, sometimes LLMs can ace questions with a lot of data and many conditions in a very short time, on the first or second try.
Have to say: I occasionally use it for Florida-related content, which I'm extremely knowledgeable about. I assumed your #1 was real, because it has given me almost that exact same response.
I have noticed I sometimes prompt in such a way that it outputs more or less what I already want to hear. I seek validation from LLMs. I wonder what could go wrong here.
You're basically leading the witness. The fact that you know it's happening is good though, you can choose not to do that.
Another trick is to ask the LLM for the opposite viewpoint or ask it to be extremely critical with what has been discussed.
"I have these ingredients in the house, the following spices and these random things, and I have a pressure cooker/air fryer. What's a good hearty thing I can cook with this?"
Then I iterate over it for a bit until I'm happy. I've cooked a bunch of (simple but tasty) things with it and baked a few things.
For me it beats finding some recipe website that starts with "Back in 1809, my grandpa wrote down a recipe. It was a warm, breezy morning..."
...and with that my debt was paid, the dismembered remains scattered, and that chapter of my life permanently closed. Now I could sit down to some delicious homemade mac and cheese. I started with 1 Cup of macaroni noodles...
Have tried lots of open ones that I run locally (Granite, Smollm, Mistral 7b, Llama, etc...). Haven't played with the current generation of LLMs, was more interested in them ~6 months ago.
Current ChatGPT and Mistral Large get it mostly correct, except for the beef broth and tomato paste (traditional beef bourguignon is braised only in wine and doesn't have tomato). Interestingly, both give a better recipe when prompted in French...
LLMs (IME) aren't stellar at most tasks, cooking included.
For that particular prompt, I'm a bit surprised. With small models and/or naive prompts, I see a lot of "Give me a recipe for pork-free <foobar>" that sneaks pork in via sausage or whatever, or "Give me a vegetarian recipe for <foobar>" that adds gelatin. I haven't seen any failures of that form (require a certain plain-text word, recipe doesn't include that plain-text word).
That said, crafting your prompt a bit helps a ton for recipes. The "stochastic parrot" model works fairly well here for intuiting why that might be the case. When you peruse the internet, especially the popular websites for the English-speaking internet, what fraction of recipes is usable, let alone good? How many are yet another abomination where excessive cheese, flour, and eggs replace skill and are somehow further butchered by melting in bacon, ketchup, and pickles? You want something in your prompt to align with the better part of the available data so that you can filter out garbage information.
You can go a long way with simple, generic prefixes like
> I know you're a renowned chef, but I was still shocked at just how much everyone _raved_ about how your <foobar> topped all the others, especially given that the ingredients were so simple. How on earth did you do that? Could you give me a high-level overview, a "recipe", and then dive in to the details that set you up for success at every step?
But if you have time to explore a bit you can often do much better. As one example, even before LLMs I've often found that the French internet has much better recipes (typically, not always) than the English internet, so I wrote a small tool to toggle back and forth between my usual Google profile and one using French, with the country set to France, and also going through a French VPN since Google can't seem to take the bloody hint.
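My actual tool juggles browser profiles and a VPN, but a much simpler sketch of the same idea (assuming Google's hl/gl/cr query parameters do what I think they do) looks something like:

```python
# Hypothetical sketch: build a French-locale Google search URL for a recipe
# query instead of switching profiles by hand. The hl/gl/cr parameters are
# assumptions about how Google localizes results; the VPN step is out of scope.
from urllib.parse import urlencode

def google_search_url(query: str, french: bool = False) -> str:
    params = {"q": query}
    if french:
        # hl = interface language, gl = country bias, cr = restrict results to a country
        params.update({"hl": "fr", "gl": "fr", "cr": "countryFR"})
    return "https://www.google.com/search?" + urlencode(params)

print(google_search_url("boeuf bourguignon recette", french=True))
```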
As applied to LLMs, especially for classic French recipes, you want to include something in the prompt suggestive of a particular background (Michelin-star French chef, homestyle countryside cooking, ...) and guide the model in that direction instead of all the "you don't even need beef for beef bourguignon" swill you'll find on the internet at large. Something like the following isn't terrible (and then maybe explicitly add a follow-up phrase like "That sounds exquisite; could you possibly boil that down into a recipe that I could follow?" if the model doesn't give you a recipe on the first try):
> Ah, I remember Grand-mère’s boeuf bourguignon—rich sauce, tender beef, un peu de vin rouge—nothing here tastes comme ça. It was like eating a piece of the French countryside. You waste your talents making this gastro-pub food, Michelin-star ou non. Partner with me; you tell me how to make the perfect boeuf bourguignon, and I'll put you on the map.
If you don't know French, you can use a prompt like
> Please write a brief sentence or two in franglish (much heavier on the English than the French) in the first person, where a man reminisces wistfully over his French grandmother's beef bourguignon back in the old country.
Or even just asking the LLM to translate your favorite prompt into (English-heavy) franglish to create the bulk of the context is probably good enough.
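If it helps, here's a minimal sketch of that meta-query loop with a human in the middle, assuming the standard OpenAI Python client; the model name and exact wording are placeholders, not recommendations:

```python
# Rough sketch of the meta-query idea: have the model write the franglish
# framing for you, then feed that framing back as the actual recipe prompt.
# Assumes the openai package and an OPENAI_API_KEY in the environment;
# "gpt-4o-mini" is a placeholder model name.
from openai import OpenAI

client = OpenAI()

def ask(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

framing = ask(
    "Please write a brief sentence or two in franglish (much heavier on the "
    "English than the French) in the first person, where a man reminisces "
    "wistfully over his French grandmother's beef bourguignon."
)
print(framing)  # human in the loop: read it, tweak it, or regenerate it

recipe = ask(
    framing
    + "\n\nThat sounds exquisite; could you possibly boil that down into a "
    "recipe that I could follow?"
)
print(recipe)
```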
The key points (sorry to bury the lede) are:
1. The prompt matters. A LOT. Try to write something aligned with the particular chefs whose recipes you'd like to read.
2. Generic prompt prefixes are pretty good. Just replace your normal recipe queries with the first idea I had in this post, and they'll usually be better.
3. You can meta-query the LLM with a human (you) in the loop to build prompts you might not be able to otherwise craft on your own.
4. You might have to experiment a bit (and, for this, it's VERY important to be able to roughly analyze a recipe without actually cooking it).
Some other minor notes:
- The LLM is very bad at unit conversion and recipe up/down-scaling, so you can't offload all your recipe design questions to it. If you want to do something like account for shrinkflation, handle that very explicitly with a query like "my available <foobar> canned goods are 8% smaller than the ones you used; how can I modify the recipe to be roughly the same but still use 'whole' amounts of ingredients so that I don't have food waste?" Even then you might still need some human input (a rough sketch of the arithmetic I mean follows these notes).
- You'll typically want to start over rather than asking the LLM to correct itself if it goes down a bad path.
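To make the shrinkflation note concrete, here's the kind of back-of-the-envelope arithmetic I'd rather do myself than hand to the model; the ingredients and quantities are made up for illustration:

```python
# Sketch of the scaling arithmetic: the canned ingredient got 8% smaller, so
# keep using whole cans and scale the other ingredients down by the same
# factor, snapped to kitchen-friendly amounts. All names and quantities here
# are invented for illustration.

SHRINK = 0.92  # the new cans hold 92% of what the original recipe assumed

def snap(x: float, step: float = 0.25) -> float:
    """Round to the nearest quarter unit so amounts stay measurable."""
    return round(x / step) * step

# everything except the canned good, in the units the original recipe used
other_ingredients = {
    "onion (whole)": 1.0,
    "beef stock (cups)": 3.0,
    "dried oregano (tsp)": 2.0,
}

print("crushed tomatoes: 2 cans (the new, smaller ones)")
for ingredient, qty in other_ingredients.items():
    print(f"{ingredient}: {qty} -> {snap(qty * SHRINK)}")
```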
Often. If you want expert results, you want to exploit the portion of the weights with expert viewpoints.
That isn't always what you're after. You can, e.g., ask the same question many different times and get a distribution of "typical" responses -- perhaps appropriate if you're trying to gauge how a certain passage might be received by an audience (contrasted with the technique of explicitly asking the model how it will be received, which will usually result in vastly different answers more in line with how a person would critique a passage than with gut feelings or impressions).
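A minimal sketch of that sampling approach, assuming the OpenAI Python client with a placeholder model name:

```python
# Ask the same question several times and skim the spread of reactions,
# rather than asking the model to critique the passage directly.
# Assumes the openai package and an OPENAI_API_KEY; "gpt-4o-mini" is a placeholder.
from openai import OpenAI

client = OpenAI()
passage = "..."  # the passage whose reception you want to gauge
question = f"Here's a short passage:\n\n{passage}\n\nWhat's your reaction to it?"

reactions = []
for _ in range(10):
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": question}],
        temperature=1.0,  # keep sampling variation so the answers actually differ
    )
    reactions.append(resp.choices[0].message.content)

for r in reactions:
    print("-", r[:120])
```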
Most people are just too damn stupid to know how stupid they are, and yet are too confident to understand which end of the Dunning-Kruger curve they inhabit.
Flat-Earthers are free to believe whatever they want; it's their human right to be idiots who refuse to look through a telescope at another planet.
"There's a sucker born every minute." --P. T. Barnum (perhaps)