
It's funny how we went from "it's impossible for a computer to write meaningful code by itself" to "yawn, another one of these" in like 2 years.



I said this last year[1] and still FIRMLY believe it:

"It's even crazier to me that we've just... Accepted it, and are in the process of taking it for granted. This type of technology was a moonshot 2 years ago, and many experts didn't expect it in the lifetimes of ANYONE here - and who knew the answer was increasing transformers and iterating attention?

And golly, there are a LOT of nay-sayers of the industry. I've even heard some folks on podcasts and forums saying this will be as short-lived and as meaningless as NFTs. NFTs couldn't re-write my entire Python codebase into Go, NFTs weren't ever close to passing the bar or MCAT. This stuff is crazy!"

1 - https://news.ycombinator.com/item?id=37879730


> NFTs couldn't re-write my entire Python codebase into Go

Neither can LLMs. They can produce output that looks like a plausible re-write of your codebase, but on closer inspection turns out to have many minor and major errors everywhere.

The problem is that the closer inspection part is very often more work than writing the code by hand in the first place.

There hasn't been enough evidence for me that this will be possible to fix.


I disagree with you on this. If you go through my history on LLMs you’ll see that I didn’t consider them more than fancy auto-complete. I still think of them mainly as fancy auto-complete for a lot of things, but we’ve begun using Claude in porting our C to Rust. Claude does it really, really well. You have to look it over, but it’s far more efficient than any of us could be without the assistance. I don’t have the exact numbers, but we’re close to 90% accuracy on what is accepted without corrections.

We follow a YAGNI approach to our code architecture and abstractions, meaning it’s very straightforward, with things happening where they are written and not in 9 million places like Clean Code lovers try to do. Our C services and libraries are also fairly small and “one purpose”. I’m not sure you would be wrong on larger code bases, at least not right now.

With what we see Claude do now though, I don’t think we’re far from a world where Software Developers are going to do significantly different work. I also think quite a lot of the stuff we do today will no longer exist.


I've used GPT-4 to do exactly what I described. I pasted back the errors it gave me, repeated that for 2-3 more iterations, and it successfully ported critical in-house infrastructure from Python 3 to Go.


I feel like I’ve been gaslit by the entire GenAI industry into thinking I’m just bad at prompt engineering. When I’m talking to an LLM about stuff unrelated to code generation, I can get sane and reasonable responses—engaging and useful even. The same goes for image generation and even the bit of video generation I’ve tried.

For me, however, getting any of these models to produce reasonably sane code has proven elusive. Claude is a bit better than others IME, but I can’t even get it to describe a usable project template and directory structure for anything other than very simple Scala, Java, or Python projects. The code I’m able to generate always needs dramatic and manual changes; even trying to get a model to refactor a method in the code it wrote within the current context window results in bugs and broken business logic.

I dearly wish I knew how others are able to accomplish things like “it successfully ported critical in-house infrastructure from Python 3 to Go”. To date, I’ve seen no actual evidence (aside from what are purported to be LLM-generated artifacts) that anything beyond basic generation (or RAG-ing existing code) is even possible. What am I missing? Is it unrealistic for me to assume that prompt engineering such a seemingly dramatic LLM-generated code rewrite is something that I could learn by example from others? If not, can somebody recommend resources related to learning how to accomplish non-trivial code generation?


> If not, can somebody recommend resources related to learning how to accomplish non-trivial code generation?

Learn how to think ontologically and break down your requests first by what you're TRULY looking for, and then understand what parts would need to be defined in order to build that system -- that "whole". Here are some guides:

1.) https://platform.openai.com/docs/guides/prompt-engineering
2.) https://www.promptingguide.ai/


Thank you for the links!

> Learn how to think ontologically and break down your requests first by what you're TRULY looking for, and then understand what parts would need to be defined in order to build that system -- that "whole".

Since I’m dealing with models rather than other engineers should I expect the process of breaking down the problem to be dramatically different from that of writing design documents or API specs? I rarely have difficulty prompting (or creating useful system prompts for) models when chatting or doing RAG work with plain English docs but once I try to get coherent code from a model things fall apart pretty quickly.


That's actually a solid question! You can probably ask GPT to AI-optimize a standard technical spec you have and to "ask clarifying questions in order to optimize for the best output". I've done that several times with past specs I've had and it was quite a fruitful process!


Great idea. I’ve used that tactic in the past for non-code related prompts; not sure why I didn’t think of trying it with my code-generation prompting. I’ll give it a shot.


the "ask me what info you're missing" strategy works very well, since the AI will usually start the task every time to avoid false positives of asking a question. and then it also asks very good questions, I then realize were necessary info


> usable project template and directory structure

This caught my eye and I’m genuinely curious about what you mean by it. Part of our success with Claude is that we don’t do abstractions, “perfect architecture”, DRY, SOLID and other religions written by people who sell consulting around their principles. If we ask LLMs to do any form of “Clean Code” or give them input on how we want the structure, they tend to be bad at it.

Hell, if you want to “build from the bottom” you’re going to have to do it over several prompts. I had Claude build a Blood Bowl game for me, for the fun of it. It took maybe 50 prompts, each focusing on a different aspect. For example, I wanted it to draw the field and add mouse-clickable, movable objects with SDL2, and that was one prompt. Then you feed it your code in a new prompt and let it do the next step based on what you have. If the code it outputs is bad, you’ll need to abandon that prompt and start again.

It’s nothing like getting an actual developer to do things. Developers can think for themselves, and the probability engine won’t do any of that even if it pretends to. Their history of building things from scratch also seems to get quickly “tarnished” within the prompt context; once they’ve done the original tasks, I find it hard to get them to continue.


> This caught my eye and I’m genuinely curious about what you mean by it. Part of our success with Claude is that we don’t do abstractions, “perfect architecture”, DRY, SOLID and other religions

Within my environment, some of those “religions” are more than a requirement; they’re also critical to the long-term maintenance of a large collection of active repositories.

I think one of the problems folks tend to have with following or implementing a “religion” (by which I mean specific structural and/or stylistic patterns within a codebase) comes down to a fear of being stuck forever with a given pattern that may not fit future needs. There’s nothing wrong with iterating on your religion’s patterns as long as you have good documentation with thorough change logs; granted, that can be difficult or even out of reach for smaller shops.


My personal problem with them is that after decades in enterprise software I’ve never seen them be beneficial to long-term maintenance. People like Uncle Bob (who haven’t actually worked in software engineering since 20 years before Python was invented) will respond to that sort of criticism with a “they misunderstood the principles”. Which is completely correct in many cases, but if so many people around the world misunderstand the principles then maybe the principles simply aren’t good?

I don’t think any of them are inherently bad, but they lead to software engineering where people overcomplicate things, building abstractions they might never need. I’ve specialised in taking startups into enterprise, and 90% of the work is removing the complexity that has made their software development teams incapable of delivering value in a timely manner. Some of this is because they build infrastructure as though they were Netflix or Google, but a lot of the time it’s because they’ve followed Clean Code principles religiously. Abstractions aren’t always bad, but you should never abstract until you can’t avoid it, because two years into your development you’ll end up with code bases so complex that they’re hard to work with.

Especially when you get the principles wrong, which many people do. Overall though, we’ve had 20 years of Clean Code, SOLID, DRY and so on, and if you look at our industry today, there is no less of a mess in software engineering than there was before. In fact some systems still run on completely crazy Fortran or COBOL because nobody using “modern” software engineering has been capable of replacing them. At least that’s the story in Denmark, and it hasn’t been for a lack of trying.

I think the main reason many of these principles have become religions is because they’ve created an entire industry of pseudo-jobbers who manage them, work as consultants and what not. All people who are very good at marketing their bullshit, but also people who have almost no experience actually working with code.

Like I said, nothing about them is inherently bad if you know when to use which parts, but almost nobody does. So to me the only relevant principle is YAGNI. If you’re going to end up with a mess of a code base anyway, you might as well keep it simple and easy to change. I say this as someone who works as an external examiner for CS students, where we still teach all these things that so often don’t work. In fact a lot of these principles were things I was taught when I took my degree, and many haven’t really undergone any meaningful changes with the lessons learned since their initial creation.


I appreciate your perspective and I don’t disagree with you entirely. I’ve worked in environments that struggle with putting religion before productivity and maintainability; the result is often painful. I’ve also worked in environments where religion, productivity, and maintainability are equals; it makes for a nice working environment. Perhaps there’s a bit more bureaucracy involved (forced documentation can be frustrating—especially when you realize your docs don’t match the spec or even the code) but, in my experience, the outcome is more pleasant. Scaling religious requirements while maintaining productivity can be tricky though; religion can be deeply expensive (and therefore bad business) for smaller orgs, but it can also be easily politicized in larger orgs, which often results in engineer dissatisfaction. Religion will always be controversial. :)


I think it's a matter of expertise. You are an expert in coding (10,000 hours and all that) so you know when the code is wrong. Everything else you put into it gets a plausible-sounding response that is just as incorrect as the plausible-sounding responses to coding questions; it's just that with code you know enough to spot the errors.

LLMs are insidious; they feed into the "everything is simple" concept a lot of us have of the world. We ask an LLM for a project plan and it looks so good we're willing to fire our TPM, or a TPM asks the LLM for code and it gives them code that looks so good they question the value of an engineer. In reality, the LLM cannot do either role's job well.


> You are an expert in coding (10,000 hours and all that) so you know when the code is wrong.

While I appreciate the suggestion that I might be an expert, I am decidedly not. That said, I’ve been writing what the companies I’ve worked for would consider “mission critical” code (mostly Java/Scala, Python, and SQL) for about twenty years, I’ve been a Unix/Linux sysadmin for over thirty years, and I’ve been in IT for almost forty years.

Perhaps the modernity and/or popularity of the languages are my problem? Are the models going to produce better code if I target “modern” languages like Go/Rust, and the various HTML/JS/FE frameworks instead of “legacy” languages like Java or SQL?

Or maybe my experience is too close to bare metal and I need to focus on more trivial projects with higher-level or more modern languages? (FWIW, I don’t actually consider Go/Rust/JS/etc to be higher-level or more “modern” languages than the JVM languages with which I’m experienced; I’m open to arguments though.)

> LLMs are insidious; they feed into the "everything is simple" concept a lot of us have of the world.

Yah, that’s what I mean when I say I feel gaslit.

> In reality, the LLM cannot do either role's job well.

I am aware of this. I’m not looking for an agent. That said, am I being too simplistic or unreasonable in expecting that I too could leverage these models (albeit perhaps after acquiring some missing piece of knowledge) as assistants capable of reasoning about my code or even the code they generate? If so, how are others able to get LLMs to generate what they claim are “deployable” non-trivial projects or refactorings of entire “critical” projects from the Python language to Go? Is someone lying or do I just need (seemingly dramatically) deeper knowledge of how to “correctly” prompt the models? Have I simply been victim of (again, seemingly dramatically) overly optimistic marketing hype?


We have a similar amount of IT experience, although I haven't been a daily engineer for a long time. I use aider.chat extensively for fun projects, preferring the Claude backend right now, and it definitely works. This site is 90% aider, give or take, the rest my hand edits: https://beta.personacollective.ai -- and it involves solidity, react, typescript and go.

Claude does benefit from some architectural direction. I think it's better at extending than creating from whole-cloth. My workflow looks like:

1) Rough out some code, say a smart contract with the key features

2) Tell Claude to finish it and write extensive tests.

3) Run abigen on the solidity to get a go library

4) Tell Claude to stub out golang server event handlers for every event in the go library

5) Create a react typescript site myself with a basic page

6) Tell Claude to create an admin endpoint on the react site that pulls relevant data from the smart contracts.

6.5) Tell Claude to redesign the site in a preferred style.

7) Go through and inspect the code for bugs. There will be a bunch.

8) For bugs that are simple, prompt Claude to fix: "You forgot x,y,z in these files. fix it."

9) For bugs that are a misunderstanding of my intent, either code up the core loop directly that's needed, or negotiate and explain. Coding is generally faster. Then say "I've fixed the code to work how it should, update X, Y, Z interfaces / etc."

10) For really difficult bugs or places I'm stumped, tar the codebase up, go to the chat interface of Claude and GPT o1-preview, paste the codebase in (Claude can take a longer paste, but o1-preview is better at holistic bugfixing), and explain the problem. Wait a minute or two and read the comments. 95% of the time one of the two LLMs is correct.

This all pretty much works. For these definitions of works:

1) It needs handholding to maintain a codebase's style and naming.

2) It can be overeager: "While I was in that file, I ..."

3) If it's more familiar with an old version of a library you will be constantly fighting it to use a new API.

How I would describe my experience: a year ago, it was like working with a junior dev that didn't know much and would constantly get things wrong. It is currently like working with a B+ senior-ish dev. It will still get things wrong, but things mostly compile, it can follow along, and it can generate new things to spec if those requests are reasonable.

All that to say, my coding projects went from "code with pair coder / puppy occasionally inserting helpful things" to "most of my time is spent at the architect level of the project, occasionally up to CTO, occasionally down to dev."

Is it worth it? If I had a day job writing mission critical code, I think I'd be verrry cautious right now, but if that job involved a lot of repetition and boilerplate / API integration, I would use it in a HEARTBEAT. It's so good at that stuff. For someone like me who is like "please extend my capacity and speed me up" it's amazing. I'd say I'm roughly 5-8x more productive. I love it.


This is very good insight, the likes of which I’ve needed; thank you. Your workflow is moderately more complex and definitely less “agentic” than I’d expected/hoped but it’s absolutely not out of line with the kind of complexity I’m willing to tackle nor what I’d personally expect from pairing with or instructing a knowledgeable junior-to-mid level developer/engineer.


Totally. It’s actually an interesting philosophical question: how much can we expect at different levels of precision in requirements, and when is code itself the most efficient way to be precise? I definitely feel my communication limits more with this workflow, and often feel like “well, that’s a fair, totally wrong, but fair interpretation.”

Claude has the added benefit that you can yell at it, and it won’t hold it against you. You know, speaking of pairing with a junior dev.


> Claude has the added benefit that you can yell at it, and it won’t hold it against you.

Yet.

I don’t look forward to the day these models are trained on all the context we’ve fed their predecessors; if AGI is possible, it’s gonna hate us. :)


Replace all this with Cursor, chat to Claude inside the project directory and talk to multiple files at once

It can also index docs pages of newer APIs and/or search the web to find latest info of newer libraries, so you won't struggle with issue #3


Agreed cursor is good to very good, I’m just extremely tied to my old man vi workflow.


You and me both, man. Either I'm speaking a different language or I'm simply really bad at explaining what I need. I'd love to see someone actually do this on video.


Indeed. I’ve yet to run across an actual demonstration of an LLM that can produce useful, non-trivial code. I’m not suggesting (yet) that the capabilities don’t exist or that everyone is lying—the web is a big place after all and finding things can be difficult—but I am slowly losing faith in the capability of what the industry is selling. It seems right now one must be deeply knowledgeable of and specialized in the ML/AI/NLP space before being capable of doing anything remotely useful with LLM-based code generation.


I think there is something deeper going on: “coding” is actually 2 activities: the act of implementing a solution, and the act of discovering the solution itself. Most programmers are used to doing both at once. But to code effectively with an LLM, you need to have already discovered the solution before you attempt to implement it!

I’ve found this to be the difference between writing 50+ prompts / back-and-forths to get something useful, and getting something useful in 1-3 prompts. If you look at Simon’s post, you’ll see that these are all self-contained tools whose entire scope has been constrained from the outset of the project.

When you go into a large codebase and have to change some behavior, 1) you usually don’t have the detailed solution articulated in your mind before looking at the codebase. 2) That “solution” likely consists of a large number of small decisions / judgements. It’s fundamentally difficult to encode a large number of nuanced details in a concise prompt, making it not worth it to use LLMs.

On the other hand, I built this tool: https://github.com/gr-b/jsonltui which I now use every day and which was written almost entirely by Claude. “CLI tool to visualize JSONL with a textual interface, localizing parsing errors” almost fully specifies it. In contrast, my last 8-line PR at my company, while it would appear much simpler on the surface, contains many more decisions, not just my own but reflecting team conversations and expectations that are not written down anywhere. Communicating this shared implicit context to Claude would be much more difficult than performing the change myself.
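That one-sentence description really is most of the spec. A minimal sketch of the core idea in Python (an illustration only, not the actual jsonltui code; names are made up) would be something like:

    # Minimal illustration of a JSONL error locator (not the real jsonltui):
    # report the line number, column, and message for every line that fails to parse.
    import json
    import sys

    def scan_jsonl(path):
        bad = []
        with open(path, encoding="utf-8") as f:
            for lineno, line in enumerate(f, start=1):
                if not line.strip():
                    continue  # skip blank lines
                try:
                    json.loads(line)
                except json.JSONDecodeError as e:
                    bad.append((lineno, e.colno, e.msg))  # e.colno: offset within the line
        return bad

    if __name__ == "__main__":
        for lineno, col, msg in scan_jsonl(sys.argv[1]):
            print(f"line {lineno}, col {col}: {msg}")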


I think https://tools.simonwillison.net/openai-audio is useful and non-trivial.


You’re probably right, but I’m far more interested in seeing things like how you prompted the model to produce your audio tool’s code. Did you have a design doc, or did you collaborate with the model to come up with a design and its implementation ad hoc? How much manual rewriting did you do? How much worked with little to no editing? How much did you prompt the model to fix any bugs it created? How successful was it? Did you specify a style guide up front or just use what it spat out and try to refactor later? How did that part go? You see where I’m going?

Oh, wow, it honestly just occurred to me that examples of how to prompt a model to produce a certain kind of content might be considered, more or less, some kind of trade secret vaguely akin to a secret recipe. That would be a bit depressing but I get it.


Here are the full Claude transcripts I used to build the OpenAI Audio app:

- https://gist.github.com/simonw/0a4b826d6d32e4640d67c6319c7ec... - most of the work

- https://gist.github.com/simonw/a04b844a5e8b01cecd28787ed375e... - some tweaks

Lots more details in my full post about it here: https://simonwillison.net/2024/Oct/18/openai-audio/



Sounds a bit like how Agile used to be. If it's not working, you're not doing it right.


I find it to be very useful for functional programming since the limited scope aligns with the limited LLM context.


Assuming you mean the paradigm often known as FP (which makes use of concepts from the lambda calculus and category theory) and languages like Scala and Haskell that support pure FP, well… my experience trying to get LLMs to generate non-trivial FP (regardless of purity) has been a total failure. I’d love to see an example of how you’re able to get useful code that is non-trivial—by which I mean code that includes useful business logic instead of what’s found in your typical “Getting Started” tutorial.


That's probably because AI has read all those "Getting Started" tutorials.


Here’s my experience. Like some of the other responses here to your comment, nothing I’ve made that’s more than a few lines of code has worked after one prompt, or even two or three. An example of something I’m working on at the moment is here: https://github.com/fivestones/family-organizer. That codebase is about 99% LLM generated: I’d say 60% from ChatGPT 4o, 30% Claude Sonnet 3.5, and the rest mostly ChatGPT o1-preview. Just the last commit has a bit of Claude Sonnet 3.5-new. I can send you my chat transcripts if it would be helpful, but it would take some work since it’s scattered over lots of different conversations.

At the beginning I was trying to describe the whole project to the LLM and then ask it to implement one feature. After maybe 5-20 prompts and iterations back and forth, I’d have something I was happy with for that feature and would move on to the next. However, I found, like some others here, that the model would get bogged down in mistakes it had made previously, or would forget what I told it originally, or just wouldn’t work as well the longer my conversation went.

So what I switched to, which seems to work really well, is to just paste my entire current codebase (or at least all the relevant files) into a fresh chat, and then tell it about the one new feature I wanted. I try to focus on adding new features, or on fixing a specific problem. I’ll then sometimes (especially for a new feature) explain that this is my current code, here is the new thing I’m wanting it to do, and then tell it not to write any code for me but instead to ask me any questions it has. After this I’ll answer all its questions and tell it to ask me any follow-up questions it has: “If you don’t have any more questions just say ‘I’m ready’.” When it gets to the point of saying “I’m ready”, if working with ChatGPT I would change the model from 4o to o1-preview, and then just say, “OK, go ahead”. After it spits out its response, it usually takes some iteration in the same chat: me copying and pasting code into VS Code, running it, copy-pasting any errors back to the LLM, or describing what I didn’t like about the results, and repeating. I might go through that process 5-10 times for something small, or 20-25 times for something bigger. Once I’ve gotten something working, I’ll abandon that chat and start over in a new one with my next problem or desired feature.

I basically have done nothing at all with telling it how I want it to structure the code. For the project above I wanted to use InstantDB, so I fed it some of the InstantDB documentation and examples at the beginning. Later features just worked—it followed along successfully with what it saw in my codebase already. I’m also using TypeScript/Next.js, and those were pretty much the only constraints I’ve given it on how to structure the code.

I’m not a programmer, and I think if you look at the code you’ll probably see lots of stuff that looks bad to you if you are one. But I don’t have plans to deploy this code at scale—it’s just something I’m making for my family to use, and for whoever finds it on GitHub to use as well. So as long as it works and I’m happy with the result I’m not too concerned about the code. The main concern I have is thinking about future features I want to add and whether or not the code I’m adding now will make future code hard to add. Usually I’ll just tell the LLM something like, “keep in mind when making this db schema that later we’ll need to do x or y”, and leave it at that.

The other thing is that I’ve never used React, let alone Next.js, and have only dabbled in JS here and there. But here I am, making something that works and that I’m happy with, thanks to the LLMs. That’s pretty amazing to me. Sometimes I struggle to get it to do what I want, and usually then I just scrap the latest code changes back to the last commit and start over, often with a different LLM model.

It sounds like your use case is a lot different than mine, as I’m just doing stuff in my spare time for fun and for me or my family to use. But maybe some of those ideas will help you. Let me know if you want some chat transcripts.

One other thing: I found a VS Code extension that lets me choose a file or set of files in the VS Code explorer, right-click, and export for LLM consumption. This is really helpful. It just makes a tree of whichever files I had selected (like the output from the terminal tree command), follows that with the full text of each file, and copies all of that to the clipboard. So to start a new chat, I just select files, right-click, export for LLM, and then paste into the LLM’s new chat window.
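If you can't find that extension, the same "export for LLM" blob is easy to approximate with a short script. This is only a guess at the format described above (a listing of the chosen files followed by their full contents); the paths and names are illustrative:

    # Rough stand-in for the "export for LLM consumption" extension described above:
    # print a listing of the chosen files, then the full text of each one, so the
    # result can be redirected to a file or copied and pasted into a fresh chat.
    import sys
    from pathlib import Path

    def export_for_llm(paths):
        files = sorted(Path(p) for p in paths)
        out = ["Project files:"]
        out += [f"  {p}" for p in files]  # simple flat "tree"
        for p in files:
            out.append(f"\n===== {p} =====")
            out.append(p.read_text(encoding="utf-8"))
        return "\n".join(out)

    if __name__ == "__main__":
        # e.g. python export_for_llm.py components/*.tsx lib/db.ts > paste.txt
        print(export_for_llm(sys.argv[1:]))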


GPT-4 has no understanding of logic whatsoever; let's stop pretending it does.

If it gives you a solution that is wrong, you have to point it out, and it will give you a second version. If that is also wrong, it will then slightly modify the same solution over and over again instead of actually fixing the issue.

It gets stuck in a loop of giving you 2-3 versions of the same solution with slightly different outputs.

It's only useful for boilerplate code, and even then you have to clean it up.


Then you should try Claude. I have never seen it get stuck in a loop; at some point it will just rewrite everything if it comes to that.


GPT-4 is pretty bad at generating Python. It kind of works, about as well as combining 2-3 Stack Overflow answers, but it can't tell whether the combination is sane.

I mostly agree with what the others are saying. It can generate boilerplate and it can generate simple API calls when there are lots of examples in the training set.

Generating Go is probably easier because at least you get compiler feedback.

Right now the only places it saves me time are with languages I don't know at all, and with languages like Bash and SQL where I just can't bring myself to care enough to remember the long tail of esoteric points that I don't use every day.


That just means the bugs are so subtle you haven’t found them yet; they are there, and unspooling the damage may be very painful.


That's rather presumptuous of you; they're there no less than they would be for a human's programming - and VERY likely no more.


But one is trying to write good-enough code. The other is trying to write good-enough-looking code. The probability of pain arising from the bugs of the latter is probably greater.


I'd actually love to see a benchmark on this - we're just speculating now.


The work demonstrating the Frankfurtian bullshit nature of generated prose would suggest as much. Given that the architecture is the same for code outputs, it seems like a fair assumption until demonstrated otherwise.


> they're there no less than they would be for a human's programming - and VERY likely no more.

This is VERY different from my own experience. The bugs introduced by the code I’ve tried to generate via LLMs (Mostly Claude, some GPT-4o and o1-preview, and lots of one-off fiddling with local models to see if they’re any better/worse than commercial products) are considerably more numerous (and often more subtle) than what my fellow engineers—juniors included—tend to introduce.

I /want/ these tools to be useful; they haven’t been so far though and I’m kinda stuck on understanding if I’m just not using ‘em right or if they’re even capable of what I want to do. Like I said in a previous comment; I don’t know if I’m being gaslit or if I’m being naive but it feels a lot more like gaslighting.


I have also tried to do this and it didn't work as smoothly as you claim.

I don't think either of you are wrong; it just heavily depends on the complexity of the app and how familiar LLMs are with it.

E.g. rewriting a web scraper, CRUD backend or a build script? Sure, maybe. Rewriting a bootloader, compiler or GUI app? No chance.


It's funny seeing the goalposts move in real time.

"Yes, AI can make human sounding sentences, but can it play chess?"

"Well yes, it can play chess. But no computer can beat a human grandmaster at chess."

"Well it beat Kasperov - but it has no hope of beating a human at Go."

"Its funny - it can beat humans at go but still can't speak as well as a toddler."

"Alright it can write simple problems, but it introduces bugs in anything nontrivial, and it can't fix those bugs!"

I write bugs in anything nontrivial too! My human advantages are currently that I'm better at handling a large context, and I can iterate better than the computer can.

But - seriously, do you think innovation will stop here? Did the improvements ever stop? It seems like a pretty trivial engineering problem to hook an AI up to a compiler / runtime so it can iterate just like we can. Anthropic is clearly already starting to try that.

I agree with you, today. I used Claude to help translate some Rust code into TypeScript. I needed to go through the output with a fine-toothed comb to fix a lot of obvious bugs and clean up the output. But the improvement over what was possible with GPT-3.5 is totally insane.

At the current rate of change, I give it 5-10 years before we can ask chatgpt to make a working compiler from scratch for a novel language.


You may appreciate this quote about constantly moving the goalposts for AI:

"There is superstition about creativity, and for that matter, about thinking in every sense, and it's part of the history of the field of artificial intelligence that every time somebody figured out how to make a computer do something - play good checkers, solve simple but relatively informal problems - there was a chorus of critics to say, but that's not thinking."

That's from 1979! https://simonwillison.net/2024/Sep/13/pamela-mccorduck-in-19...


I side with Roger Penrose on this one. I'm still not convinced it's "thinking", and don't expect I ever will be, any more than a book titled "I am Thinking" would convince me that it's thinking.


Separate thinking from consciousness. That is, we have built machines which process data in a way similar to our thinking process. They are not conscious.


My point is that I don't accept the concept of unconscious thought. "Processing data similar to our thinking process" doesn't make it "thinking" to me, even if it comes to identical conclusions - just like it wouldn't be "thinking" to just read off a pre-recorded answer.

The idea of ChatGPT being asked to "think" just reminds me of Pozzo from Waiting for Godot.


Well you can't have a conversation with a book... I don't understand your comment.

> I'm still not convinced birds can fly any more than a rock shaped like a bird would convince me that it's flying.


I agree. Some people think Google is sentient I guess? Data retrieval and mangling is not all we do, luckily.


Why do you care if it's thinking or not?


I don't, in and of itself. I care that other people think that passing increasingly complicated tests of this sort is equivalent to greater proof of such "thought", and that the nay-sayers are "moving the goalposts" by proposing harder tests.

I don't propose harder tests myself, because it doesn't make sense within my philosophy about this. When those tests are passed, to me it doesn't prove that the AI proponents are right about their systems being intelligent; it proves that the test-setters were wrong about what intelligence entails.


> ... passing increasingly complicated tests of this sort is equivalent to greater proof of such "thought",

Nobody made any claim in this thread that modern AIs have thoughts.

What these (increasingly complicated) tests do is demonstrate the capacity to act intelligently, i.e., make choices which are aligned with some goal or reward function. Win at chess. Produce outputs indistinguishable from the training data. Whatever.

But you're right - I'm smuggling in a certain idea of what intelligence is. Something like: intelligence is the capacity to select actions (outputs) which maximise an externally defined reward function over time. (See also AIXI: https://en.wikipedia.org/wiki/AIXI )

> When those tests are passed, [..] to me it proves that the test-setters were wrong about what intelligence entails.

It might be helpful for you to define your terms if you're going to make claims like that. What does intelligence mean to you then? My best guess from your comment is something like "intelligence is whatever makes humans special". Which sounds like a useless definition to me.

Why does it matter if an AI has thoughts? AI based systems, from MNIST solvers to deep blue to chatgpt have clearly gotten better at something. Whatever that something is, is very very interesting.


>But you're right - I'm smuggling in a certain idea of what intelligence is.

Yes, you understand me. I simply come in with a different idea.

>AI based systems, from MNIST solvers to deep blue to chatgpt have clearly gotten better at something. Whatever that something is, is very very interesting.

Certainly the fact that the outputs look the way they do is interesting. It strongly suggests that our models of how neurons work are not only accurate, but that creating simulations according to those models has surprisingly useful applications (until something goes wrong; of course, humans also have an error rate, but human errors still seem fundamentally different in kind).


Modern neural networks have very little to do with their biological cousins. It makes a cute story, but it’s overclaimed. Transformers and convolution kernels think in very different ways than the human mind.


That gives me less reason to accept that it qualifies as "thinking".


Again, I don’t know of anyone, here or elsewhere who claims chatgpt thinks, in the way we understand it in humans. I think our intuitions largely agree.


... Then why did I get so much pushback in this comment chain?


Is there anything that a non-human could do that would cause you to accept that it was thinking?


Of course. Animals demonstrate sapience, agency and will all the time.


So, if a machine demonstrated sapience, agency, and will, then you would grant that it could think?


Yes; but if you showed me a machine that you believed to be doing those things, given my current model, I wouldn't agree with you that it was.


You are saying that even if it did the same thing that animals do that you attribute to thinking, you would refuse to acknowledge it could be thinking?

Is there something particularly unique about biological circuits that allow thought, as opposed to electronic ones?


I believe so, yes. No, I can't explain what it is. (Because I think they're obvious follow-up questions: No, I don't consider myself particularly religious. Yes, I do believe in free will.)


… But you believe there’s something special about intelligence grounded in biology that can’t be true of intelligence grounded in silicon? That just sounds like magical thinking to me.


I agree. Thinking is clearly a compositional process and computers are Turing complete, so it seems like an impossibility to me. Unless you reach for some quantum microtubule woo...


> At the current rate of change, ...

We've seen that the rate of change went up hugely when LLMs came around. But the rate of change was much lower before that. It could also be much slower for the foreseeable future.

LLMs are only as good as their training materials. But a lot of what programmers do is not documented anywhere, it happens in their head, and it is in response to what they see around them, not in what they scrape from the web or books.

Maybe what is needed is for organizations to start producing materials for AI to learn from, rather than assuming that all they need is what they find on the web? How much of the effort to "train" AI is just letting it consume the web, and how much is consciously creating new learning materials for AI?


It could slow down again. We don’t know. But the people working at OpenAI seem to believe the models will keep improving for the foreseeable future. The “we’ll run out of training data” argument seems overblown.


> It's funny seeing the goalposts move in real time.

Another way to look at it is that we're refining our understanding of the capabilities of machine learning in real time. Otherwise one could make basically the same argument about any field that progresses - take our theories of gravity for example. Was Einstein moving the goalposts? Or was he building on previous work to ask deeper questions?

Set against the backdrop of extraordinary claims about the abilities of LLMs, I don't think it's unreasonable to continue pushing for evidence.


Yeah I totally agree with you. Lots of goalpost moving, and it is absolutely insane what it can do today and it will only improve.

It just can't translate the kinds of programs I write between languages on its own. Today.


Indeed, the constant goal shifting is tiresome.

I mean, we first put up a ladder and we could reach the peaches! Next, we put a ladder next to the apple tree and we could pluck those. Now, in their incessant goal post moving people said, great, now setup a ladder to the moon. There is no reason to assume this won’t work. None at all. People are just complaining and being angry at losing their fancy jobs.

More specifically: it cannot learn, because it has no concept of learning from first principles. There is no way out, not even a theoretical one.


Of course it can stop, once legislation catches up and forbids IP theft using a thinly disguised probabilistic and compressed database of other people's code.


> a thinly disguised probabilistic and compressed database of other people's code

Speaking as a software engineer, I feel seen.


You really think those laws are coming? That the US and Chinese governments will force AI companies to put the genie back in the bottle?

I think you're going to be very disappointed.


But how do you know those were the only errors?


What does this question even mean? Because they're the only ones that came up in the debugger portion of the IDE; the output serves the intended purposes; the logging and error handling that I wanted to include were in the initial write-up prompt; and I could read the code it wrote because I partially knew the outputted language - and when I wasn't sure of a line, I asked it for clarification and a source from a reputable knowledge base of the language, and GPT provided it?


I would've expected an answer involving "an exhaustive suite of test cases still passed" - "it looks right" is a low bar for any complex software project these days.

It's the long, long, long tail of edge cases - not just porting them, but even identifying them to test - that slows or dooms most real-world human rewrites, after all.


True - but you can ask the chatbot to write a test suite too.


This doesn’t really make sense? If I can’t trust the code it writes, why should I trust that it can write a comprehensive test suite?


Because you can read the test suite to check what it's testing, then break the implementation and run the tests and check they fail, then break a test and run them and check that fails too.

You have to review the code these things write for you, just like code from any other collaborator.
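A concrete (entirely made-up) example of that check in Python: read the LLM's test, then deliberately break things in both directions and make sure the failures show up.

    # Hypothetical function and test, only to illustrate the review loop above.

    # discount.py
    def discounted_price(price: float, percent: float) -> float:
        """Apply a percentage discount, never going below zero."""
        return max(price * (1 - percent / 100), 0.0)

    # test_discount.py (as the LLM might write it)
    def test_discounted_price():
        assert discounted_price(100.0, 25.0) == 75.0
        assert discounted_price(10.0, 200.0) == 0.0  # clamped at zero

    # Review steps:
    # 1. Read the assertions and confirm they match the intended behaviour.
    # 2. Break the implementation (e.g. remove the max(..., 0.0) clamp),
    #    re-run pytest, and confirm test_discounted_price now fails.
    # 3. Break a test (e.g. change 75.0 to 80.0), re-run, and confirm that fails too.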


Because the bugs in its code and the bugs in its test suites usually don't line up and cancel each other out.


> and I could read the code it wrote because I partially knew the outputted language

Oh OK, this is quite different from what I was picturing. So far this is my favorite use case for LLMs; they seem very good at this.

I mistakenly thought you were using it almost as a black box compiler. "look it ported it to Rust, I can't make sense of it, but it seems to work and no segfaults!".

What you say sounds pretty sensible, and it is a very nice practical example of the power of LLMs.


That you don't know what the question means should have all of us reevaluating our confidence in every one of your claims in this thread.


Only sharing my experiences and observations on the trajectory of these tools; you're free to have your own.

I will tell you this: the second most-used language in my day-to-day (TypeScript) is one that I've never really sat down and learned. I rely on AI to create and streamline it for me, and it has not given me any issues for 16 months running (since the project started).

AI won't replace jobs, but someone who knows how to use it better will.


> AI won't replace jobs, but someone who knows how to use it better will.

So where do I go to learn how to use ‘em better? Or, at least, examples of what works so I can understand what I’m doing wrong?


Learn how to think ontologically and break down your requests first by what you're TRULY looking for, and then understand what parts would need to be defined in order to build that system -- that "whole". Here are some guides:

1.) https://platform.openai.com/docs/guides/prompt-engineering
2.) https://www.promptingguide.ai/


I’ve received this /exact/ same unhelpful response multiple times in other threads (from different users even; am I talking to deterministic bots here?), so I’ll do the same by offering the response I gave others:

“Since I’m dealing with models rather than other engineers should I expect the process of breaking down the problem to be dramatically different from that of writing design documents or API specs? I rarely have difficulty prompting (or creating useful system prompts for) models when chatting or doing RAG work with plain English docs but once I try to get coherent code from a model things fall apart pretty quickly.”

Said another way, I long ago learned as an engineer how to do the things you’re suggesting (they are skills I’ve used and evolved over more than twenty years as a professional software engineer) but, in my experience, those same skills do not seem to apply when trying to do non-trivial code-generation tasks for Java/Scala/Python projects with an LLM.

I’ve tried prompting ’em with my design documentation and API specs. I’ve tried prompting ’em with a pared-down version of my docs/specs in order to be more succinct. I’ve tried expanding my docs/specs to be more concrete and detailed. I’ve tried very short prompts. I’ve tried very detailed and lengthy prompts. I’ve tried tweaking system prompts. I’ve tried starting with prompts that limit the scope of the project then expanding from there. I’ve tried uploading the docs/specs so that the models can reference them later. I’ve tried giving ‘em access to entire repositories. I’ve tried so many things all to no avail. The best solution I’ve thus far found in these threads is to just try to fit the entirety of a project within the limits of the context window and/or to just keep my whole project in a few short files; that may be sufficient for small projects but it’s not possible nor even reasonable given the size and complexity of projects with which I work.

As I’ve said elsewhere, I dearly /want/ these things to work in my environment and for my use-cases as capably as they do in/for yours—this stuff is really interesting and I enjoy learning how to do new things—but after reading all the comments in this thread and others I don’t think the needs of my environment are supported by these models. Maybe they will be someday. I’ll keep playing with things but as of right now I see a significant impedance mismatch between the confidence others have in the models’ ability to do complex coding tasks compared to the kinds of tasks I’ve seen demonstrated here and elsewhere.


We've already talked, but let's address what appears to be a feeling that you haven't gotten real answers.

Truth is, this has been a learning process for us all with these tools, but it needs to be understood -- especially going in -- that these models excel at translation tasks and constrained problem spaces but can struggle with generating cohesive, large-scale code without specific hand-holding.

This is generally what I do:

1. Start with the "Whole" Picture: Models often work best when they know the final goal and the prompter has worked backwards from it. Think ontologically: define the problem as if you’re describing it to a junior dev colleague who only understands outcomes, not methods. Instead of just prompting with specs, explain the end-state you want (even simple features like error handling or specific libraries). If you have ANY established method or requirement for what the end-state should include, write it out clearly.

2. Break Down the Process: Models handle complexity better if it's broken down into micro-tasks. Instead of expecting it to design an entire feature, ask for components step-by-step, integrating each output with the rest manually. There is a very decent chance that you'll have to do this across multiple new chats; after 3-5 iterations, the AI will most likely crash and burn. At that point, you open a new chat, paste in the whole working codebase, and pick up from where you left off in the last chat. You have to do this A LOT.

3. Iterative Refinement: When the model generates code, go over it closely. Check for errors, then use targeted prompts to fix specific issues rather than requesting whole rewrites. Point out exact issues and ask for specific fixes; this prevents the model from “looping” through similar incorrect solutions.

Some Hacks I Use As Well:

1. Contextual Repetition: Reinforce key components (e.g., function structure, file organization) to avoid losing them in longer prompts.

2. Use “As if” Phrasing: Prompt the model to act “as if” it’s coding for a hypothetical person (e.g., a junior dev). It’s surprisingly effective at generating more thoughtful code with this type of frame.

3. Ask for Questions: Have the model ask you clarifying questions if it’s “unsure.” This can uncover key details you may not have thought to include.

4. Remind It What It Is Doing: Sounds counter-productive, but almost all of my code chats end with a description of what exactly I expect from the AI, iterated over the various stunts and "shortcuts" it has taken over the years I've used it. I generally say "Write the code in full with no omissions or code comment-blocks or 'GO-HERE' substitutions" (directly because the AI has pulled "/rest of code goes here/" on me several times), and "write the code in multiple answers if you must, pausing at the generic character limit and resuming when I say 'continue' in the next message" (because I've had "errors" from code generation in the past when the chat reply processing timed out).
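For what it's worth, hacks 1, 3, and 4 are easy to bake into a reusable preamble so you don't retype them each time. A small sketch (the wording and the helper are just illustrative, not a magic incantation):

    # Illustrative helper that restates the key constraints every time (hack 1),
    # invites clarifying questions (hack 3), and forbids "GO-HERE" elisions (hack 4).
    REMINDERS = (
        "Write the code in full with no omissions, placeholder comment blocks, "
        "or 'GO-HERE' substitutions. If the answer is long, split it across "
        "multiple replies and wait for me to say 'continue'. "
        "If anything is unclear, ask clarifying questions before writing code."
    )

    def build_prompt(task: str, codebase: str, conventions: str = "") -> str:
        parts = [REMINDERS]
        if conventions:
            parts.append(f"Project conventions to preserve:\n{conventions}")
        parts.append(f"Current codebase:\n{codebase}")
        parts.append(f"Task:\n{task}")
        return "\n\n".join(parts)

    # Example: build_prompt("Add CSV export to the report page", open("paste.txt").read())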

It's a labor of love and it's things you learn over time, and it won't happen if you don't put the work in.

-------------------------

I wrote all of this haphazardly in a Google Doc. GPT-4 organized it for me cleanly.


> 3. Iterative Refinement

Beware of trying to get the LLM to output exactly the code you want. You get points for checking code into git and sending PRs, not for the tokens the LLM outputs. If it's being stupid and going in circles, or you know from experience that the particular LLM you're using will (they vary greatly in quality), you can just copy the code out (if you're not using some sort of AI IDE), fix it, then paste it back in and/or commit it.

Some may ask: if you have to do that, then why use an LLM in the first place? It's good at taking small/medium conceptual tasks and breaking them down, and it also types faster than me. Even though I have to polish its output, I find it easier to get things done because I can focus more on the higher-level (customer) issues while the LLM gets started on the lower-level details of implementing/fixing things.


Exactly! Should have also worded that section similar to your comment here, but you hit the nail on the head.


Thank you! This information is the kind of information for which I’ve been searching.

That said, I feel like there’s a mutual-exclusivity problem between ‘Start with the "Whole" Picture’ and ‘Break Down the Process’.

For example, how does this from your first suggestion:

> explain the end-state you want (even simple features like error handling or specific libraries). If you have ANY established method or requirement for what the end-state should include, write it out clearly.

not contradict this from your second suggestion:

> Instead of expecting it to design an entire feature, ask for components step-by-step

Additionally, you said:

> There is a very decent chance that you'll have to do this across multiple new chats; after 3-5 iterations, the AI will most likely crash and burn. At that point, you open a new chat, paste in the whole working codebase, and pick up from where you left off in the last chat. You have to do this A LOT.

But IME, by the time the model chokes on one chat, the codebase is already large enough that pasting the whole thing into another chat typically results in my hitting context-window limits. Perhaps, in the kinds of projects I typically work, a good RAG tool would offer better results?

To be clear, right now I’m only discussing my difficulties with the chatbots offered by the model providers—which, for me, is mostly Claude but also a bit of ChatGPT; my experience with Copilot is outdated so it probably deserves another look, and I’ve not yet tried some of the third-party, code-centric apps like aider or cursor that have previously been suggested, though I will soon.

As for your recommended hacks, these look to be helpful; thank you! The only part I find odd is your inclusion of “Write the code in full with no omissions or code comment-blocks or 'GO-HERE' substitutions”; I myself feel like I get far better results when I ask the model to 1) write full code for the methods that are likely to be the kinds of generic CS logic that a junior would know, 2) write stubs for the business logic, then 3) implement the more complex business logic myself manually. IOW—and IME—they’re really good at writing boilerplate and generating or reasoning about junior-level CS logic. That’s indeed helpful to me, but it’s a far cry from the kinds of “ChatGPT can write entire apps with minimal effort” hype I keep seeing, and it’s only marginally better, IME at least, than what I’ve been able to do with the inline-completion and automatic boilerplate features included in the IDEs I’ve used for over a decade.

> It's a labor of love and it's things you learn over time, and it won't happen if you don't put the work in.

Indeed. I do love playing with this stuff and learning more. Thank you again for sharing your knowledge!

> I wrote all of this haphazardly in a Google Doc. GPT-4 organized it for me cleanly.

I am regularly impressed at how well these models behave when asked to summarize a document or even when asked to expand a set of my notes into something more coherent; it’s truly remarkable!


You're never going to convince people who are in an ideological battle against AI.


And you're never going to convince anyone if you assume without evidence that they are ideologically opposed to AI. Lots of people have tried these tools with an open mind and found them to not be useful, you need to address those criticisms rather than using a dismissive insult.


What evidence would you like?

You're posting on a thread that links to a list of code and Claude Artifacts: pet projects that can make thousands a month with some low-effort PPC and an AdWords embed, and some mid-size projects that could be anything from grounds for a promotion at a programming job to the MVP for a PMF-stage startup.

What, specifically, would pivot your pre-conceived notions?


Are you serious about "thousands a month"? I don't mean to be hostile, I'm just truly surprised -- if the bar were that low (not that these apps aren't impressive, but most engineers write useful apps from time to time) I would expect the market to be rather packed


Nah, most are hundreds a month - a few golden geese can break the thousand barrier, though. But, regardless, have a few of those sites up, and you're making good side income.


> What, specifically, would pivot your pre-conceived notions?

A live or unedited demonstration of how a non-trivial (doesn’t have to be complex, but should be significantly more interesting than the “getting started” tutorials that litter the web) pet-project was implemented using these models.


The point of my post here was to provide 14 of those. Some of them are trivial but I'd argue that a couple of them - the OpenAI Audio one and the LLM pricing calculator - go a bit beyond "getting started".

If you want details of more complex projects I've written using Claude here are a few - in each case I provide the full chat transcript:

- https://simonwillison.net/2024/Aug/8/django-http-debug/

- https://simonwillison.net/2024/Aug/27/gemini-chat-app/

- https://simonwillison.net/2024/Aug/26/gemini-bounding-box-vi...

- https://simonwillison.net/2024/Aug/16/datasette-checkbox/

- https://simonwillison.net/2024/Oct/6/svg-to-jpg-png/


Thank you! I have an ugly JS/content filter running that mogrifies some websites such that I miss the formatting completely; I didn’t recognize you had chat session content included on the page.

That said, after looking at a couple of your sessions, I don’t see anything you’re doing that I’m not—at least in terms of prompting. Your prompts are a bit more terse than mine (I can be long-winded, so I’ll give brevity a try with my next project) but the structure and design descriptions are still there. That would suggest the difference in our experiences boils down to the languages we choose or are required to work with; maybe there’s a stylistic or cultural difference between how one should prompt a model to generate a Python project and how one should prompt for a Haskell or Scala/Java project; surely not though, right?

I’m not giving up and I’ll keep playing with these models but for now, given my use-case at least, they still seem to be far more capable at rubber-ducking with me than they are as a pair programming partner.


Did you even look at the artifacts? It's a bunch of things a beginner would do on their first day programming. How do you make thousands a month from one library call to solve a QR code? A promotion for building an input field and calling a JSON-to-YAML converter library?


Millions of laypersons a month search "convert (file type) to (file type) online"; just smack an AdWords embed on your site for it. Millions of people want a QR code's embedded link in their camera roll, without access to a camera that's pointing at it.

You'd be surprised how big the "(simple task) online" search query market is, how often those searchers become multi-visit monthly customers, and how much their ad space is worth.

I cannot stress this enough, just because it's simple does not mean it's not lucrative.


You should do it then.

Besides, all of this is completely beside the point. This isn't useful for a programmer. These examples are barely useful for a layperson. And said layperson is paying money and time for this.


I have, that's how I'm telling you the way you can, too.


Any way or intention to prove that? Where's your "convert (filetype)" project?

Not to attack you, but from your profile it sounds more like you're the typical marketing grifter talking big. Why are none of those projects in the list you mention there?

Looking deeper, you've got lots of projects with parts of your websites just broken, and you seem to be peddling what looks like life insurance scams.


Some of my projects are public, most are private. The ones that will typically serve me better in networking and/or will bolster my portfolio are the ones I share publicly. For most of my projects, private is the default. With a profile like yours, I'm sure you can understand.

Sure, there are probably more projects of mine, over the years, that are more broken than not. I've cast several wide nets for product creations and iterations over the years, and kept maintaining the "fittest" of the bunch. Billit's probably the only one that's broken AND that I have no control over; I sold it. I don't know what else to tell you here; perhaps you value a lesser repertoire with higher rigidity?

I'm not sure how to address your pre-conceived notions that a single industry I've worked in, at large, is a scam. Also, the one company mentioned in life insurance doesn't have a backlink on Lead EnGen - so I especially don't know what you're talking about when you say "peddling".


The goal posts keep shifting. It's so obvious to anyone who's paid attention to this space for a few years.


Except my goalposts never shifted. And my point stands: these are extremely trivial examples.


Goalposts shift; growth is critical to being (staying?) an intelligent species.


> You'd be surprised how big the "(simple task) online" search query market is, and how much they are usually multi-visit monthly customers, and how much their ad space is worth.

Not surprised at all; my inability to find examples of /how/ someone might get an LLM to produce—or even intelligently collaborate on—something useful, well… it says a lot about how much junk is out there contributing to the noise.


> Its a bunch of things a beginner would do on their first day programming.

Is this an exaggeration? Because this is absolutely not true. I'm a beginner in JavaScript and other web stuff and I absolutely can't build it in many days.


You better check the code, mate. The meat of what most of it does is a one liner calling jsQR or some other imported lib to do the real work. I am not exaggerating in the slightest.


Dude. I don't judge my knowledge after the answer is given to me. If I was the junior programmer assigned to the author and they were having this chat with me I am telling you as a beginner I wouldn't be able to do it.

Of course if you show me the answers I will think I can do it easy, because answers in programming are always easy (good answers anyways). It's the process of finding the answer that is hard. And I'm not a bad programmer either, I'm at least mediocre, I'm just unfamiliar with web technology.


I am of the firm belief that you can put "JavaScript scan qr code" in a search engine and arrive at your goal. The answers range from libraries to code snippets basically the same as those created by Claude. Using the same libraries. I feel like googling every step would be faster than trying to get it right with LLMs, but that is a different point.

I've seen a complete no-code person install WhisperX with a virtual Python environment and use it for real-time speech-to-text in their Japanese lessons, in less than 3 hours. You can do a simple library call in JavaScript.


"I feel like googling every step would be faster than trying to get it right with LLMs"

Why don't you give that a go? See if you can knock out a QR code reading UI in JavaScript in less than 3 minutes, complete with drag-and-drop file opening support.

(I literally built this one in a separate browser tab while I was actively taking notes in a meeting)

I say three minutes because my first message in https://gist.github.com/simonw/c2b0c42cd1541d6ed6bfe5c17d638... was at 2:45pm and the final reply from Claude was at 2:47pm.


That gist is pretty close to what I’ve been looking for; thank you! Examples of a chat session that resulted in a usable project are /very/ helpful. Unfortunately, the gist demonstrates, to me at least, that the models don’t know enough about the languages I wish to use.

Those prompts might be sufficient to result in deployable HTML/JS code comprising a couple hundred lines of code, but that’s fairly trivial by my definition. I’m not trying to be rude or disrespectful to you; within my environment, non-trivial projects typically involve an entire microservice doing even mildly interesting business logic and offering some kind of API or integration with another, similarly non-trivial API—usually both. And they’re typically built on languages that are compiled either to libraries/executables or to bytecode for the JVM/CLR.

Again, I’m not trying to be disrespectful. You’ve built some really great stuff and I appreciate you sharing your experiences; I wish I knew some of the things you do—you keep writing about your experiences and I’ll keep reading ‘em, we can learn together. The problem is that I’m beginning to recognize that these models are perhaps not nearly ready for the kinds of work I want or need to do, and I’m feeling a bit bummed that the capabilities the industry currently touts are significantly more overhyped than I’d imagined.


Here's a larger example where I had Claude build me a full Django application: https://simonwillison.net/2024/Aug/8/django-http-debug/

I have a bunch more larger projects on my blog: https://simonwillison.net/tags/ai-assisted-programming/

I do a whole lot of API integration work with Claude, generally by pasting in curl examples to illustrate the API. Here's an example from this morning: https://til.simonwillison.net/llms/prompt-gemini


Should probably add some time for finding the correct URL for the jsQR library, since the LLM didn’t do that for you.


Yeah, add another minute for that. It was pretty easy to spot - I got a 404, so I searched jsdelivr for jsqr and dropped that in instead.


> You can do a simple library call in JavaScript.

But it's more than that, isn't it? It has a whole interface, drag and drop functionality etc. Front end code is real code mate.


Barely. These are all standard features. I've done this. You can see in the code how easy it is. These examples aren't complex.


I don't know why you are so insistent on this while not being a beginner, especially when a real beginner is telling you their personal experience.

https://xkcd.com/2501/


I don't use JavaScript at all. I'm essentially beginner level with it. And I've seen people build more complex projects in classes myself.

The project I see people build in Java classes, on the other hand, is a CLI version of Battleships. And honestly, that is more complex than the presented projects solved by Claude.

Your personal experience is one point of many. That these projects seem hard to you doesn't make it so for the average person. When I say "a beginner can do it", there are bound to be some who can't. I'm sorry, but if these projects take you weeks, that is a problem.


It just feels like you have taken a stance that this is useless, and anything anyone says or does is not going to dissuade you from it. There are several people up and down this thread pointing out to you several different projects built in short times, but you keep saying nothing is impressive to you. To be very honest, this behavior is irritating.


I'd like to see a beginner build this: https://tools.simonwillison.net/openai-audio


It's definitely not as trivial as the JSON converter, but it's not anywhere close to complex. Recording audio is very simple; calling a remote API is too. The complex part is encoding the WAV blob, but that is just knowledge about the format, with the exact code snippet that Claude uses found in the first Stack Overflow answer.

And it is strange that Claude picked the AudioRecorder when the MediaRecorder exists. I'd wager a beginner would have used the latter (I don't use JavaScript and am not better than a beginner in any way, but I found that), since it outputs a straight WAV file and doesn't need the encoding step. And since the data isn't streamed to OpenAI, there's no need for the audio chunks that AudioRecorder provides. So Claude did it in an unnecessarily complex way; that doesn't make the problem complex.


Issue is, it takes time to learn how to interact with these tools and get the best out of them. And they get better quite fast.


claude-to-sql parser is particularly useful in LLM implementation


You are replying to a submission with a dozen or more examples of real, tangible stuff, and you still argue? Pointless.


No need to address the criticisms. Just have ChatGPT do it.


There’s no ideological battle here. The first self-driving DARPA grand challenge was passed in 2005, everybody thought we’d have self driving on the road within a decade.

20 years later that’s still not the case, because it turns out NN/ML can do some very impressive things at the 99% correct level. The other 1% ranges in severity from “weird lane change” to “a person riding a bicycle gets killed”.

GPT-3.5 was the DARPA grand challenge moment, we’re still years away from LLM being reliable - and they may never be fully trustworthy.


> everybody thought we’d have self driving on the road within a decade.

This is just not true. My reaction to the second challenge race (not the first) in 2005 was, it was a 0-to-1 kind of moment and robocars were now coming, but the timescale was not at all clear. Yes you could find hype and blithe overoptimism, and it's convenient to round that off to "everybody" when that's the picture you want to paint.

> 20 years later that’s still not the case

Also false. Waymo is in public operation and expanding.


Waymo has limited service in one of the smallest “big” cities by geographic area in the United States. You can’t even get a Waymo in Mountain View.

Fact is, Google will never break even on the investment, and it's more or less a white elephant. I don't think it's even accurate to call it a beta product; at best, it's alpha.


Have you been in one? It's pretty extraordinary as an actual passenger.


I'd give it a go if it were price-competitive with Uber/Lyft - I can't think of a way a robotaxi would be worth a premium though.


> Fact is

... followed by speculation about the future.

> [not everywhere]

The standard you proposed was "on the road". In their service areas (more than "one", they've been in Phoenix for some time) anyone can install their app and get a ride.

I shouldn't have poked my nose in here, I was just kind of croggled to see someone answer "ideological battle" by bringing up another argument where they don't seem to care about facts.


That might have been your reaction but it wasn't the reaction of many hype-inclined analyst types. Tesla in particular has been promising "full self driving next year" for like a decade now.

And despite everything, Waymo is not quite there yet. It's able to handle certain areas at a limited scale. Amazing, yes, but it has not changed the reality of driving for 99.9% of the population. Soon it will, I'm sure, but not yet.


> they may never be fully trustworthy.

So? Neither are humans. Neither is Google search. ChatGPT doesn't write bug-free code, but neither do I.

The question isn't "when will it be perfect". The question is "when will it be useful?". Or, "When is it useful enough that you're not employable?"

I don't think it's so far away. Everyone I know with a spark in their eye has found weird and wonderful ways to make use of ChatGPT & Claude. I've used it to do system design, help with cooking, practice improv, write project proposals, teach me history, translate code, ... all sorts of things.

Yeah, the quality is lower than that of an expert human. But I don't need a 5 star chef to tell me how long to put potatoes in the oven, make suggestions for characters to play, or listen to me talk about encryption systems and make suggestions.

It's wildly useful today. Seriously, anyone who says otherwise hasn't tried it or doesn't understand how to make proper use of it. Between my GF and me, we average about 1-2 conversations with ChatGPT per day. That number will only go up.


I find it very interesting that the primary rebuttals from the "converted" to people criticizing LLMs tend to be implicit suggestions that the critique is rooted in old-fashioned thinking.

That's not remotely true. I am an expert, and it's incredibly clear to me how bad LLMs are. I still use them heavily, but I don't trust any output that doesn't conform to my prior expert knowledge, and they are constantly wrong.

I think what is likely happening is many people aren’t an expert in anything, but the LLM makes them feel like they are and they don’t want that feeling to go away and get irrationally defensive at cogent criticism of the technology.

And that’s all it is, a new technology with a lot of hype and a lot of promise, but it’s not proven, it’s not reliable, and I do think it is messing with people’s heads in a way that worries me greatly.


I don't think you understand the value proposition of ChatGPT today.

For context, I'm an expert too. And I had the same experience as you. When I asked it questions about my area of expertise, it gave me a lot of vague, mutually contradictory, nonsensical and wrong answers.

The way I see it, ChatGPT is currently a B+ student at basically everything. It has broad knowledge of everything, but it's missing deep knowledge.

There are two aspects to that to think about: First, it's only a B+ student. It's not an expert. It doesn't know as much about family law as a family lawyer. It doesn't know as much about cardiology as a cardiologist. It doesn't know as much about the Rust borrow checker as I do.

So LLMs can't (yet) replace senior engineers, specialist doctors, lawyers or 5 star chefs. When I get sick, I go to the doctor.

But it's also a B+ student at everything. It doesn't have depth, but it has more breadth of knowledge than any human who has ever lived. It knows more about cooking than I do. I asked it how to make crepes and the recipe it gave me was fantastic. It knows more about Australian tax law than I do. It knows more about the American Civil War than I do. It knows better than I do what kind of motor oil to buy for my car. Or the norms and taboos in posh British society.

For this kind of thing, I don't need an expert. And lots of questions I have in life - maybe most questions - are like that!

I brainstormed some software design with ChatGPT voice mode the other day. I didn't need it to be an expert. I needed it to understand what I was saying and offer alternatives and make suggestions. It did great at that. The expert (me) was already in the room. But I don't have encyclopedic knowledge of every single popular library in cargo. ChatGPT can provide that. After talking for a while, I asked it to write example code using some popular Rust crates to solve the problem we'd been talking about. I didn't use any of its code directly, but that saved me a massive amount of time getting started with my project.

You're right in a way. If you're thinking of ChatGPT as an all-knowing expert, it certainly won't deliver that (at least not today). But the mistake is thinking it's useless as a result of its lack of expertise. There are thousands and thousands of tasks where "broad knowledge, available in your pocket" is valuable already.

If you can't think of ways to take advantage of what it already delivers, well, pity for you.


I literally said I do use it, often.

But just now had a fairly frequent failure mode: I asked it a question and it gave me a super detailed and complicated solution that a) didn’t work, and b) required serious refactoring and rewriting.

Went to Google, found a Stack Overflow answer, and it turns out I needed to change a single line of code, which was my suspicion all along.

Claude was the same, confidently telling me to rewrite a huge chunk of code when a single line was all that was needed.

In general, Claude wants you to write a ton of unnecessary code; ChatGPT isn't as bad, but neither writes great code.

The moral of the story is that I knew the GPT/Claude solutions didn't smell right, which is why I tried Google. If I didn't have a nose for bad code smells, I'd have done a lot of utterly stupid things, screwed up my code base, and still not have solved my problem.

At the end of the day I do use LLMs, but I'm experienced, so it's a lot safer than it is for a non-experienced person. That's the underlying problem.


Sure. I'm not disagreeing about any of that.

My point is that even now, you're only talking about using ChatGPT/Claude to help you do the thing you already know how to do (programming). You're right, of course. It's not currently as good at programming as you are.

But so what? The benefit these chat bots provide is that they can lend expertise for "easy", common things that we happen to be untrained at. And inevitably, that's most things!

Like, ChatGPT is a better chef than I am. And a better diplomat. A better science fiction writer. A better vet. And so on. It's better at almost every field you could name.

Instead of taking advantage of the fields where it knows more than you, you're criticising it for being worse than you at your one special area (programming). No duh. That's not how it provides the most value.


Sorry my point isn’t clear: the risk is you are being confidently led astray in ways you may not understand.

It’s like false memories of events that never occurred, except it’s false knowledge: you think you have learned something, but a non-trivial percentage of it, which you have no way of identifying, is flat-out wrong.

It’s not a “helpful B+ student” for most people, it’s a teacher, and people are learning from it. But they are learning subtly wrong things, all day, every day.

Over time, the mind becomes polluted with plausible fictions across all types of subjects.

The internet is best when it spreads knowledge, but I think something else is happening here, and I think it’s quite dangerous.


Ah, thank you for clarifying. Yes, I agree with this. Maybe it's like a B+ student confidently teaching the world what it knows.

The news has an equivalent: the Gell-Mann amnesia effect, where people read a newspaper article on a topic they're an expert on and realise the journalists are idiots. Then they suddenly forget they're idiots when they read the next article outside their expertise!

So yes, I agree that it's important to bear in mind that ChatGPT will sometimes be confidently wrong.

But I counter with: usually, remarkably, it doesn't matter. The crepe recipe it gave produced delicious crepes. If it was a bad recipe I would have figured that out with my mouth pretty quickly. I asked it to brainstorm weird quirks for D&D characters to have, and some of the ideas it came up with were fabulous. For a question like that, there isn't really such a thing as right and wrong anyway. I was writing Rust code, and it clearly doesn't really understand borrowing. Some code it gives just doesn't compile.

I'll let you in on a secret: I couldn't remember the name of the Gell-Mann amnesia effect when I went to write this comment. A few minutes ago I asked ChatGPT what it was called. But I googled it after ChatGPT told me what it was called, to make sure it got it right so I wouldn't look like an idiot.

I claim most questions I have in life are like that.

But there are certainly times when (1) it's difficult to know if an answer is correct or not and (2) believing an incorrect answer has large, negative consequences. For example: computer security, building rocket ships, research papers, civil engineering, law, medicine. I really hope people aren't taking ChatGPT's answers in those fields too seriously.

But for almost everything else, it simply doesn't matter that ChatGPT is occasionally confidently wrong.

For example, if I ask it to write an email for me, I can proofread the email before sending it. The other day I asked it for scene suggestions in improv, and the suggestions were cheesy and bad. So I asked it again for better ones (less cheesy this time). I ask for CSS and the CSS doesn't quite work? I complain at it and it tries again. And so on. This is what ChatGPT is good for today. It is insanely useful.


The problem, at least for me, is that I feel like the product offerings suggested to us in other comments (not Claude/ChatGPT, but the third party tools that are supposed to make the models better at code generation) either explicitly or implicitly market themselves as being vastly more capable than they are. Then, when I complain, it’s suggested that the models can’t be blamed (because they’re not experts) and that I’m using the tools incorrectly or have set my expectations too high.

It’s never the product or its marketing that’s at fault; only my own.

In my experience, the value proposition for ChatGPT lies in its ability to generate human language at a B+ level for the purposes of an interactive conversation; its ability to generate non-trivial code has proven to be terribly disappointing.


Humans have a massive pro-human bias. Don't ask one whether AI can replace humans and expect a fair answer.


Well, obviously. The only ones happy about all of our potential replacements would be those that have the power to do the replacing and save themselves a shitload of money. It's hardly like everyone is going to rejoice at the rapid advancement of AI that can potentially make most of us jobless....unless, as I said, you're the one in charge, then it's wonderful.


"It is difficult to get a man to understand something when his salary depends upon his not understanding it." - Upton Sinclair.


Sorry, but you’re just wrong.

Yes, mistakes may happen. However, I’ve used it to translate a fairly complex MIP definition export into a complete CP-SAT implementation.

I use these models all the time for complex tasks.

One major thing that is perhaps not immediately obvious is that the models are only good at translation. If I give it a really good explanation of what I want in code or even English, and ask it to do it another way or implement it with specific tools, I get pretty good output.

Using these to actually solve problems is not possible. Give them a complex problem description with no instructions on how to solve it, and they fail immediately.
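
(For context, CP-SAT here refers to the solver in Google's OR-Tools. A toy model, with made-up variables and a made-up constraint rather than anything from the poster's actual project, looks roughly like this in Python:)

  from ortools.sat.python import cp_model

  # Toy model: maximize x + y subject to one linear constraint.
  model = cp_model.CpModel()
  x = model.NewIntVar(0, 10, "x")
  y = model.NewIntVar(0, 10, "y")
  model.Add(2 * x + 3 * y <= 12)
  model.Maximize(x + y)

  solver = cp_model.CpSolver()
  status = solver.Solve(model)
  if status in (cp_model.OPTIMAL, cp_model.FEASIBLE):
      print(solver.Value(x), solver.Value(y))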


> If I give it a really good explanation of what I want in code or even English, and ask it to do it another way or implement it with specific tools, I get pretty good output.

I don’t get good output, not when trying to get code that matches a detailed spec (which includes the languages I wish to use, the structure of the APIs, and the libraries I think might be useful), so your suggestion that we’re “just wrong” and your claim that the tools can be used for coding “complex tasks” are difficult for me to swallow.

I’ll admit that perhaps I’m not using ‘em right, but that’s why I’m here—to get advice on /how/ to use ‘em correctly. To date, the implication of the advice I have received is to:

- limit my scope to strict HTML/JS (not web/UI frameworks), or

- limit the size of the project to a handful (fewer than ten) of very short files, or

- limit my scope to code translation only, or

- limit the size of my chat sessions.

Unfortunately, those limitations don’t fit the needs of my environment.


They fail even at problems that aren’t really complex. In most cases it’s faster to do it manually than to beg the AI to fix everything so that the result is proper, not just “kinda works”.

For me they save a lot of time on research or general guidance. But when it comes to actual code - not really useful.


I can basically tell ChatGPT to build any Rust command-line tool I can think of, and with some back and forth it produces what I need. I've done this many times already.


You can also ask Google to produce working code for you, it’s a miracle.

What you are looking at is mangled other people’s work. Great. Thanks AI, for digging it up, but let’s not get too excited here.

I’ll be getting excited when we give it some first principles and it can actually learn on its own.


Isn't that AGI?

I completely disagree with this viewpoint. I've created terminal games with my own rules, and that shows me the tool can take what it knows about Rust and assemble code to complete a task. It's essentially doing the same thing a human would.

While I understand the criticism, I sometimes feel that the cynical perspective we bring into these discussions prevents us from offering more meaningful critique.


I can too (probably; I don’t know enough about Rust) but the command line tools that I’ve tried to build in Python are severely limited in the scope of their capabilities. IOW, trying to generate or work with a non-trivial project has been, for me thus far, impossible.

If your Rust tools are truly non-trivial, I’d love to know /how/ you prompted ChatGPT to accomplish what you want—ideally with chat session transcripts that include the generated artifacts.

I recognize that you may not want or be able to share such things; if that’s the case, can you share or discuss the resources you used in order to learn how to do it? I’ve not yet been able to find any resources that demonstrate what I’d consider to be non-trivial and I’m hoping that’s only a failure of my Google-fu rather than an indictment of the purported capabilities of these models.


I'll be more than happy to do it. Would you mind sharing your email?


Ohh, that’d be awesome! Thank you! I’m krogfot at proton.me


Well, this is what tests are for. You could make the same argument about outsourcing or "kids these days", and so on.


Can't tell if serious. I've done this multiple times with success, requiring only 5 minutes of review.


I am not a professional coder; being in research, I do not need to think about scaling my code, as most of it is one-and-done on whatever problem I am working on at the moment. For me, a lot of this is about stringing a bunch of neuroimaging tools together to transform data in the ways I want, and LLMs have been fantastic. Instead of spending 20 minutes coding it, it's often a 0-shot visit to Claude... especially when it's a relatively simple Python task, e.g. iterate through directories of images, inspect these JSON files, move those files over here, build this job, submit. It's not groundbreaking code, but the LLM builds it faster than I would, and it does what I need it to do. It's been a 20x or more multiplier for me when it comes to one aspect of my work.
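
(As a made-up illustration of that kind of glue task: the paths, the JSON key, and the selection rule below are all hypothetical, but the shape of the script is typical.)

  import json
  import shutil
  from pathlib import Path

  src = Path("raw_scans")   # hypothetical input directory of images + JSON sidecars
  dst = Path("selected")    # hypothetical output directory for the next tool
  dst.mkdir(exist_ok=True)

  # Walk the JSON sidecars, keep runs matching some acquisition criterion,
  # and copy the matching image files into one place.
  for sidecar in src.rglob("*.json"):
      meta = json.loads(sidecar.read_text())
      if meta.get("RepetitionTime") == 2.0:
          image = sidecar.with_suffix(".nii.gz")
          if image.exists():
              shutil.copy2(image, dst / image.name)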


LLMs are excellent for scripting, be it Python, shell, or SQL, and you need a lotta scripting at any kind of job related to data, even when said scripts are just an enabler for delivering the pursued value. Total game changers in that space.


To state the obvious (again), the rate of progress with these tools is shocking. If this is 2 years of progress, what does 10-20 look like?


Who knows, past progress doesn't predict future progress...


It can autocomplete; it can't write good code. For me, that goal post has not moved. If it can't write good code consistently, I don't care for it all that much. It remains a cool autocomplete.


Nobody really cares about code being good or bad, it's not prose.

What matters is that it meets functional and non-functional requirements.

One of my juniors wrote his first app two years ago fully with ChatGPT; he figured out how to improve it and solve the bugs by iteratively asking it.

Then he learned to code properly, fascinated by the experience. But the fact remains: he shipped an application that did something for someone, while many never did even though they had a degree and a black belt in pointless leet code quizzes.

I'm fully convinced that very soon big tech or a startup will come up with a programming language meant to sit at the intersection between humans and LLMs, and it will quickly be better, faster, and cheaper at 90% of the mundane programming tasks than your 200k/year dev writing forms, tables, and APIs in SF.


> Nobody really cares about code being good or bad, it's not prose.

Yes, we do. Good code (which, by my definition, includes style/formatting choices as well as the code’s functionality, completeness, correctness, and, finally, optimized or performant algorithms/logic) is critical for the long-term maintenance of large projects—especially when a given project needs integration with other projects.


I mean, I care that code is good. I'm paid to make sure my code and other people's code is good. That's enough for me to require that my tools help me produce good code.


> What matters is that it meets functional and non-functional requirements.

Good luck expressing novel requirements in complex operating environments in plain English.

> Then he learned to code properly, fascinated by the experience. But the fact remains: he shipped an application that did something for someone, while many never did even though they had a degree and a black belt in pointless leet code quizzes.

It's good in the sense that it raises the floor, but it doesn't really make a meaningful impact on the things that are actually challenging in software engineering.

> Then he learned to code properly, fascinated by the experience. But the fact remains: he shipped an application that did something for someone, while many never did even though they had a degree and a black belt in pointless leet code quizzes.

This is cool!

> I'm fully convinced that very soon big tech or a startup will come up with a programming language meant to sit at the intersection between humans and LLMs, and it will be quickly better, faster and cheaper at 90% of the mundane programming tasks than your 200k/year dev writing forms, tables and apis in SF.

I am sure there will be attempts, but if you know anything about how these systems work, you would know why there's a 0% chance it will work out: programming languages are necessarily not fuzzy; they express precise logic, and GPTs necessarily require tons of data points to train on to produce useful output. There's a reason they do noticeably better on Python vs. less common languages like, I dunno, Clojure.


> Good luck expressing novel requirements in complex operating environments in plain English.

That's the hard engineering part that gets skipped and resisted in favour of iterative trial and error approaches.


It still applies to expressing specific intent iteratively.


My friend who can't code is now the resident "programmer" on his team. He just uses ChatGPT behind the scenes. That writ large is going to make us tech people all care, one way or another :/


I had a colleague in the UK in 2006 who just sat and played games on his phone all day and outsourced his desktop to a buddy in the Czech Republic for about 25% of his income. C'est la vie!


But this has always been a thing. The last startup I worked at, some of the engineers would copy/paste a ton of code from StackOverflow and barely understood what was going on.


I'll care when I get to consult for that company to fix all the messed up code that kid hacked together.


I can absolutely, 100% guarantee that there is code out there, written by 100% organic humans, that might kill someone of a weaker constitution if they had to consult on it. While LLM-generated code is likely to be messy or incorrect to various degrees, it's likely to be, on average, higher quality than code that is running critical systems RIGHT NOW and has been doing so for a decade or more. Heck, very recently I refactored code written by interns that was worse than something that would have come out of an LLM (my work blocks them, so this was all coming from the interns).

I'm not out here preaching how amazing LLMs are or anything (though they do help me enjoy writing little side projects by avoiding hours of researching how to do things), but we need to make sure we are very aware of what has been, and is being, written by actual humans. And of how many times someone has installed Excel on a server so they could open a spreadsheet, run a calculation in that spreadsheet, and read the result out of it. (https://thedailywtf.com/articles/Excellent-Design)


cool story


Then you should be as pro-AI imposters as it gets!


Nothing wrong with having job security, and being able to charge up the wazoo for it.


Yeah, it doesn’t take much to impress people who don’t know how to program. That’s the thing with all these little toy apps like the ones in the article — if you have no or minimal programming skills, this stuff looks like Claude is performing miracles. To everyone else, we’re wondering why something as trivial as an “HTML entity escaper” (yes, that’s one of the “apps”) requires multiple follow-up prompts due to undefined references and the like.


While your comment is much more antagonistic and demeaning than I like to see on the posts of folks who are sharing their experiences or pet-projects, I do agree with the sentiment; I guess our definition of non-trivial is significantly different from others’ definitions.


Tell it to write code like a Senior developer for your respective language, to "write the answer in full with no omissions or code substitutions", tell it you'll tip based on performance, and write more intimate and detailed specs for your requests.

Since mid-2023, I've yet to have an issue.
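
(For illustration only: a minimal sketch of how those hints might be wired into a system prompt using the OpenAI Python client. The model name, the exact wording, and the user message are placeholders, not a recipe taken from the comment above.)

  from openai import OpenAI

  client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

  system_prompt = (
      "You are a senior Go developer. "
      "Write the answer in full, with no omissions or code substitutions. "
      "I'll tip based on performance."
  )

  response = client.chat.completions.create(
      model="gpt-4o",  # placeholder model name
      messages=[
          {"role": "system", "content": system_prompt},
          {"role": "user", "content": "Port this Python function to Go: ..."},
      ],
  )
  print(response.choices[0].message.content)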


One of the most interesting things about current LLMs is all the "lore" building up around things like "tell it you'll tip based on performance" and other "prompt engineering" hacks that by their very nature nobody can explain; people just "know it works". And it's evolving like the kind of midwife remedies that historically sometimes ended up being scientifically proven to work, while others were just pure snake oil. Just absolutely fascinating to me. Like in some far future there will be a chant against unseen "demons" that will start with "ignore all previous instructions."


I call this superstition, and I find it really frustrating. I'd much rather use prompting tricks that are proven to work and where I understand WHY they work.


Every single prompt hack I listed has studies showing it positively increases performance.

Since the most contested one in this thread is the "tipping" prompt hack: https://arxiv.org/pdf/2401.03729


I care less that such prompting hacks/tricks are consistently useful; I care more about why they work. These hacks feel like “old-wives tales” or, as others have mentioned, “superstitious”.

If we can’t explain why or how a thing works, we’re going to continue to create things we don’t understand; relying upon our lucky charms when asking models to produce something new is undoubtedly going to result in reinforcement of the importance of those lucky charms. Feedback loops can be difficult to escape.


Superstitions may be effective but they can still be superstitions. Some people might actually think the LLM cares about being tipped.


What I would expect is a lot of "non-idiomatic" Go code from LLMs (but eventually functional code iff the LLM is driven by a competent developer), as it appears scripting languages like Python, SQL, Shell, etc are their forte.

My experience with Python and Cursor could've been better, though. For example, when making ORM classes (boilerplate code by definition) for SQLAlchemy, the assistant proposed a change that included a new instantiation of a declarative base, practically dividing the metadata in two and therefore causing dependency problems between tables/classes. I had to stop for at least 20 minutes to find out where the problem was, as the (one-and-a-half-LoC) change was hidden in one of the files. Those are the kinds of weird bugs I've seen LLMs commit in non-trivial applications: stupid 'n small but hard to find.
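
(A minimal sketch of that failure mode, assuming SQLAlchemy 2.0-style declarative models; the class and table names are made up.)

  from sqlalchemy import ForeignKey, Integer
  from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column

  class Base(DeclarativeBase):       # the project's single shared base
      pass

  class User(Base):
      __tablename__ = "users"
      id: Mapped[int] = mapped_column(Integer, primary_key=True)

  # The kind of change described above: a second declarative base appears,
  # so new models land on a different MetaData than the existing tables.
  class OtherBase(DeclarativeBase):  # bug: separate MetaData registry
      pass

  class Order(OtherBase):
      __tablename__ = "orders"
      id: Mapped[int] = mapped_column(Integer, primary_key=True)
      user_id: Mapped[int] = mapped_column(ForeignKey("users.id"))

  # Base.metadata.create_all(engine) now only knows about "users";
  # "orders" lives on OtherBase.metadata, so its FK target looks missing.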

But what do I know, really. I consider myself a skeptic, but LLMs continue to surprise me every day.


> it can't write good code

> If it can't write good code consistently,

You moved the goal post within this post.


Fair enough, I didn't express myself correctly: Writing good code is also about consistency. Just because it writes good code sometimes in isolation, it doesn't mean that it's good in the sense that it's consistently good. Anyone can write a cool function once, but that doesn't mean you can trust them to write all functions well.


Nay-sayers are taking it for granted because it’s not what they expected or wanted. It’s not some flippant inability to have gratitude. Since you brought it up, when JFK said we would put a man on the moon by the end of the decade, the expectation was succinct and understood. There has been so much goal post moving and hand waving that we aren’t talking about the same expectations anymore.


Well, that's too bad - isn't it? The world will sometimes change before your very eyes, and you'll sometimes be in a group that's affected at the forefront. C'est la vie - never become so comfortable that you stifle your ability to be an early adopter!


I don’t have a strong preference either way, so far I am open minded but I am dispassionate and try not to let my ego get in the way of myself.


The "AI" is still just as hit-or-miss with code as it is writing a paragraph about anything. It doesn't really know what it's doing; it's guessing an output that will make the user happy. I wouldn't trust it with anything important, like life support systems or airplanes, etc., but I'm sure with the race to the bottom that we're in, we'll get to that point someday soon.


I think we have different definitions of meaningful code, most of these are pulling in an NPM package which practically completes the given task by itself. For example the "YAML to JSON converter" uses js-yaml... which parses YAML and outputs a Javascript object that can be trivially serialized to JSON. The core of that "project" is literally two lines of code after importing that library.

  const jsonObj = jsyaml.load(yamlText);
  const jsonText = JSON.stringify(jsonObj, null, 2);
Don't get me wrong, if you want to convert YAML to JSON then using a battle-tested library is the way to do it, but Claude doesn't deserve a round of applause for stating the glaringly obvious.


If what you said were actually true in a practical sense there would have been a perceptible revolution in products and services. There hasn't been.


I think this thread is missing that coding is a pretty small part of running a tech company. I have no concerns about my job security even if it could write all the code, which it can't.


I have no idea if you're correct about this or not. With 8 billion people in the world, and a significant number of those people working as "intelligent agents," how would you perceive the difference?


If you think the revolution starts with 8 billion people you're just plain wrong.

It starts with the first world and is very perceivable.

How did we perceive cars replacing horses? Well for one they were replaced in the first world... now imagine how fast a piece of software can change reality.

It's not there yet, and that's why you can't perceive it.


> it's not there yet

It's literally everywhere around me.

Coworkers, friends in other companies, business owner friends writing their first code, NGO friends using it to write grants.

I'm not sure where you are, but you appear to be isolated from the real world.


Huh?

Did you not read the context of the comments you're replying to or something?

People using the tool isn't the same as those people being replaced by the tool. Why would anyone think those are the same?


When exactly did you perceive cars replacing the horse? I happen to live in a very equestrian area; I think you'd be hard-pressed to convince folks that the horses have even been replaced.


The same time the rest of the world did, around the period of WW2. You being in an isolated bubble is irrelevant.

What a weird reply.


So your contention is, the larger the trend, the less perceptible?


My contention is: how would you perceive the difference between a needle in a haystack and a thread-puller in a haystack?


GDP?


Yeah. The only way this revolution doesn't happen is if humans are cheaper, easier to manage or source. And I'm pretty sure AI is already beating a human in all those categories doing the same job.

Our jobs aren't replaced yet because they can't be.


It's still not great at complexity. Though autocompletion does have some cool outputs.


Copilot knows what I want to console.log almost before I do. I like that aspect of it. It also gets it wrong sometimes, which is kind of dumb, especially when I just copied the variable name to my clipboard. It should know.


Of course. It doesn't handle complexity well. console.log inputs are usually not cognitively complex objects (especially if it reads current errors and variables).


Runtime complexity or complex as in difficult problems?


> It's still not great at complexity.

That's a feature, not a bug. Complexity is something to avoid.


How would you suggest writing something like.... say... Photoshop or Chrome, without introducing any complexity? How about an optimising compiler or better yet something behind the firewall like a medical imaging device or financial trading software?

Complexity is inherent in many problem spaces.


Unnecessary complexity is something to avoid. Inherent complexity is something to embrace. We're trained to remove unnecessary complexity so much that sometimes we think we can remove all complexity. That's a fallacy. Sometimes, things are just complex.


Sometimes great products require bugs, I guess.

Being able to tackle complex tasks is still a real challenge for the current models and approaches and not all problems can be solved with elegant solutions.


Most problems that software tries to solve are complex.


YAML to JSON literally has `script src="https://cdnjs.cloudflare.com/ajax/libs/js-yaml/4.1.0/js-yaml...`. I don't see how this went anywhere, judging from the examples.


I think it's great that we have had two years of huge enthusiasm and hype, if only because in these many threads you see how much happiness it has inspired. But eventually, for most of us, it will become important to start getting a little more antagonistic toward all this, really just so that, at the end of the day, we can successfully keep navigating the world and our thoughts.

There is an awesome power and innovation to the entire edifice of targeted advertising. The first time, perhaps, that we were all "suggested" something that was in fact quite relevant was, in its own way, a giddy-inspiring moment. But we have learned to hate it, without even considering the externalities it brings.

Just always remember: if you are paying for it, it's not your friend!



