I think most tech folks struggle with it because they treat LLMs as computer programs, and their experience is that SW should be extremely reliable - imagine using a calculator that was wrong 5% of the time - no one would accept that!
Instead, think of an LLM as the equivalent of giving a human a menial task. You know that they're not 100% reliable, and so you give them only tasks that you can quickly verify and correct.
Abstract that out a bit further, and realize that most managers don't expect their reports to be 100% reliable.
Don't use LLMs where accuracy is paramount. Use them to automate away tedious stuff. Examples for me:
Cleaning up speech recognition. I use a traditional voice recognition tool to transcribe, and then have GPT clean it up. I've tried voice recognition tools for dictation on and off for over a decade, and always gave up because even 95% accuracy is a pain to clean up. But now, I route the output to GPT automatically. It still has issues, but I now often go paragraphs before I have to correct anything. For personal notes, I mostly don't even bother checking its accuracy - I do it only when dictating things others will look at.
And then add embellishments to that. I was dictating out a recipe I needed to send to someone. I told GPT up front to write any number that appears next to an ingredient as a numeral (e.g. 3 instead of "three"). It did a great job - I didn't need to correct anything.
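The plumbing for this is tiny, by the way. Here's roughly what my cleanup step looks like - a sketch assuming the OpenAI Python SDK, with an illustrative model name (use whatever chat model you have access to):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def clean_up_dictation(raw_text: str) -> str:
    """Send raw speech-recognition output to a chat model for cleanup."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model name, not a recommendation
        messages=[
            {
                "role": "system",
                "content": (
                    "Clean up this dictated text: fix obvious speech-recognition "
                    "errors, punctuation, and casing. Write any number next to an "
                    "ingredient as a numeral (3, not 'three'). Do not add, remove, "
                    "or reorder content."
                ),
            },
            {"role": "user", "content": raw_text},
        ],
    )
    return response.choices[0].message.content
```

The traditional recognizer's output goes in, cleaned-up text comes out, and I only proofread when the result is going to someone else.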
And then there are always the "I could do this myself but I didn't have time so I gave it to GPT" category. I was giving a presentation that involved graphs (nodes, edges, etc.). I was on a tight deadline and didn't want to figure out how to draw graphs. So I made a tabular representation of my graph, gave it to GPT, and asked it to write Graphviz code to make that graph. It did it perfectly (correct nodes and edges, too!)
Sure, if I had time, I'd go learn Graphviz myself. But I wouldn't have. The chances I'll need Graphviz again in the next few years are virtually zero.
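To give a sense of how trivially verifiable that output is: the tabular form is just an edge list, and the Graphviz DOT it maps to is nearly one line per edge. A made-up sketch (hypothetical node names, and GPT wrote the DOT directly from my table rather than via a script like this):

```python
# Hypothetical edge list, in the same spirit as the table I gave GPT.
edges = [
    ("ingest", "parse"),
    ("parse", "validate"),
    ("validate", "store"),
]

# The equivalent Graphviz DOT - the kind of thing I asked the model to produce.
dot_lines = ["digraph G {"]
for src, dst in edges:
    dot_lines.append(f'    "{src}" -> "{dst}";')
dot_lines.append("}")

print("\n".join(dot_lines))
# Render with: dot -Tpng graph.dot -o graph.png
```

Checking the result is just reading the edges back off, which is why handing it to an LLM felt safe.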
I've actually used LLMs to do quick reformatting of data a few times. You just have to be careful that you can verify the output quickly. If it's a long table, then don't use LLMs for this.
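For those small reformatting jobs, my "careful" usually amounts to a dumb mechanical check rather than rereading everything. Something like this (a sketch, not a real tool I ship):

```python
import re

def tokens(text: str) -> list[str]:
    # Crude tokenization - good enough for a "did anything go missing?" check.
    return sorted(re.findall(r"[\w.@-]+", text))

original = "alice, 42, alice@example.com\nbob, 17, bob@example.com"
reformatted = "| alice | 42 | alice@example.com |\n| bob | 17 | bob@example.com |"

# If the LLM dropped, duplicated, or invented a value, this fails loudly.
assert tokens(original) == tokens(reformatted), "mismatch - check by hand"
```

If the data is too big for a check like that to be meaningful, it's too big for me to trust an LLM with it anyway.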
Another example: I have a custom note-taking tool. It's just for me. For convenience, I also made an HTML export. Wouldn't it be great if it automatically made alt text for each image I have in my notes? I would just need to send each image to the LLM and get the text back. It's fractions of a cent per image! The current services are a lot more accurate at image recognition than I need them to be for this purpose!
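The whole thing is one small call per image. A sketch of what I mean, assuming the OpenAI Python SDK (the model name is illustrative, and another vision provider would look slightly different):

```python
import base64
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def alt_text_for(image_path: str) -> str:
    """One cheap vision call per image - plenty accurate for a personal HTML export."""
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative vision-capable model
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "Write one short sentence of alt text for this image."},
                    {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{b64}"}},
                ],
            }
        ],
    )
    return response.choices[0].message.content.strip()
```

The export step just loops over the images it finds and drops the returned sentence into the alt attribute.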
Oh, and then of course, having it write Bash scripts and CSS for me :-) (I'm not a frontend developer - I've learned CSS in the past, but it's quicker to verify whatever it throws at me than to Google it).
Any time you have a task and lament "Oh, this is likely easy, but I just don't have the time," consider how you could make an LLM do it.
Code is the best possible application of LLMs because you can TEST the output.
If the LLM hallucinates something, the code won't compile or run.
If the LLM makes a logic error, you'll catch it in the manual QA process.
(If you don't have good personal manual QA habits, don't try using LLMs to write your code. And maybe don't hit "accept" on other developers' code reviews either?)
> Code is the best possible application of LLMs because you can TEST the output.
This is an overly simplistic view of software development.
Poorly made abstractions and functions will have knock-on effects on future code that can be hard to predict.
Not to mention that code can have side effects that may not affect a given test case, or the code could be poorly optimized, etc.
Just because code compiles or passes a test does not mean it’s entirely correct. If it did, we wouldn’t have bugs anymore.
The usual response to this is something like “we can use the LLM to refactor LLM code if we need” but, in my experience, this leads to very complex, hard to reason about codebases.
Especially if the stack isn’t Python or JavaScript.
> Then why do people keep pushing it for code related tasks?
They don't. You are likely experiencing selection bias. My guess is you work in SW, and so it makes sense that you're the target of those campaigns. The bulk of ChatGPT subscribers are not doing SW, and no one is bugging them to use it for code related tasks.
Because there are other ways to validate the output, types being one of them, tests another. Or simply running the code. It's easy enough to validate the output given the right approach that code generated by an LLM (usually as the result of a conversation/discussion about what should be accomplished) is a net positive.
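Concretely, "validate" for me usually means a handful of asserts plus the type checker before anything gets committed. A toy example (the function and cases are hypothetical, just to show the shape of the check):

```python
# Suppose the LLM wrote this helper. Type hints let mypy/pyright catch one class
# of mistakes, and a few asserts catch the obvious logic errors before review.
def slugify(title: str) -> str:
    return "-".join(title.lower().split())

assert slugify("Hello World") == "hello-world"
assert slugify("  spaced   out  ") == "spaced-out"
print("ok")
```

It's not a proof of correctness, but it's usually enough to make the generated code a net positive instead of a liability.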
If you zero-prompt and copy-paste the first result into your codebase, yeah, the accuracy problem will rear its ugly head real quick.
A similar use case for me - I wrote some technical documentation for our wiki about a somewhat complicated relationship between ids in some database tables. I copied my text explanation into an LLM and asked it to make a diagram and it did so. Took very little time from me and it was fast/easy to verify that the quality was good.
I think there’s the added reason that a lot of folks went into tech because (consciously or unconsciously) they prefer dealing with predictable machines than with unreliable humans. And now that career choice begins to look like a bait and switch. ;)
> Instead, think of an LLM as the equivalent of giving a human a menial task. You know that they're not 100% reliable, and so you give them only tasks that you can quickly verify and correct.
The problem is: for the tasks that I can give the LLM (or a human) that I can easily verify and correct, the LLM fails at the majority of them. For example:
- programming tasks in my area of expertise (which is more "mathematical" than what is common in SV startups), where I know what a high-level solution has to look like, and where I can ask the LLM to explain the gory details to me. Yes, these gory details are subtle (which is why the task can be menial), but the code has to be right. I can verify this, and the code is not correct.
- getting literature references about more obscure scientific (in particular mathematical) topics. I can easily check whether these literature references (or summaries of these references) are hallucinations - they typically are.
LLMs on their own are effectively useless for references or citations. They need to be plugged into other systems for that - search-enabled ones like https://gemini.google.com or ChatGPT with search enabled or Perplexity can do this, although at that point they are mostly running the exact same searches you would.