An example of LLM prompting for programming (martinfowler.com)
546 points by mpweiher on April 18, 2023 | 261 comments



The article shows everything that works for this approach. But it's a bit disingenuous. At the end:

> Once this is working, Xu Hao can repeat the process for the rest of the tasks in the master plan.

No, he can't. After that much back and forth, with it responding with the full code listing again every time it fixes some little thing, he would have easily hit the token limit (at least with any chat LLM capable of code and conversation of this quality - ChatGPT). The LLM will start hallucinating the task list, the names of functions it wrote earlier, etc., and the responses get less and less useful, with more and more "this doesn't work, can you fix X".

So anyone following this approach will hit a footgun after task 1.

For anyone who really wants to follow this approach, the next step is to start a new chat, copy/paste the initial requirement prompt, put the task list in there along with any relevant code, adjust the instruction (i.e. "help me with task 2") and go from there.
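Something like this skeleton works (everything in brackets is a placeholder for your own content):

    [initial requirement / architecture prompt, pasted verbatim]
    [master plan / task list, with task 1 marked as done]
    [any code from task 1 that the next task depends on]
    Help me with task 2.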

It is of limited utility though. By step 3 (or even 2) you end up with so much code that you're at the token limit anyway and it can't write code that fits together.

Where I've found ChatGPT 4 useful is getting me going on something, providing boilerplate, and unblocking me.

If you don't know how to approach a problem like the "awareness layer" (like I didn't before reading the post), you can get a great breakdown and starting point from ChatGPT. Similarly, if you're not sure how to approach that view model, or write tests etc. And if you want a first draft of code or tests.

All that said, I'm looking forward to much larger and affordable token limits in future.


Your experience matches mine closely. I've had ChatGPT-4 do great and then it just gets confused after a while. I can literally tell it "task X is done" and it'll apologise and show me a list of tasks where X is still not done - this is clearly not just a context window issue, as I have repeated variations of my statement over and over in the same session and the issue persists.

I have ended up using it the same way you have - it's honestly the best anti-procrastination tool I've ever used because I can tell it my intentions, what I've thought of so far... and it'll spit out a list of bite-sized chunks that get me going. I find myself looking forward to telling the AI I've completed a task.

Similarly, if I'm facing a tricky design decision, I find that just writing it out for ChatGPT is extremely helpful for clarifying my thought process. I actually used to do this conversational decision making process in a text editor long before ChatGPT, but when I know there's an AI on the other end my thinking becomes clearer and more goal-oriented. And unlike talking to myself or a human friend, it's happy to just say "well if these are your concerns, let's start HERE and then see what happens".


Good rule of thumb with ChatGPT: you can’t exit loops. Once you’ve gone A > B > A, your best move is to start a new chat. Even then the loop may reproduce itself, so you should give it a similar but different task. Remember that it’s a prediction engine, weighing heavily on the existing prompt. So you say B again, or B1, and it’s like: I know what to do! A! Because last time it was A > B, so let’s do it again.

In your case this would be “[]Task1”, “Task1 is done”, “[]Task1”, [here is where you start a new chat or fix it yourself if possible].


Instead of starting a new chat, why not change the prompt higher up in the conversation with the relevant detail that you have gained through the responses?


I think that would work and be better in some cases. The thing is, you want to change things up and lose some context, and I may not be able to go far enough back to do that without losing progress. But I could copy just the current state of the task (the last response) to a new chat.


Ooh! That's a really good point - ChatGPT is effectively rubber-ducky as a service =)


This is exactly how I've been explaining LLM tech to my "non-geek" friends and family. I start by explaining rubber ducking, and how I now use chatgpt as a more advanced version of the process.


Hmm, I also use ChatGPT as an anti-procrastination tool and task manager, and it's never made a mistake with keeping track of my task list (except that when it sums the estimated times of subgroups of tasks, sometimes those sums are wrong).

Note that it outputs my updated task list every time I add or remove a task (I only asked it to do that one time), so even if old messages go outside of the context window, it's not a big deal because the full updated state of the list is output basically every other message.


Interesting. What's your prompt?



Here's a web app I found recently that should work way better for you. Idk what model it uses (it's also free and feels like ChatGPT 3.5, so I guess they're funding it out of pocket?)

https://goblin.tools/


You iterate on your plan step by step after it is generated. Go back and edit the prompt chain you started for step 1, and modify it to start working on step 2 (including any ideas or fixes you identified while implementing step 1). Repeat until complete.

You can still absolutely hit the context limit, but you are far less likely to do so if you go back and start a new prompt chain for each different thought process you are going through with it.


Great idea. But does it get hard to navigate back to something in older chat histories though?

I find a new separate chat with the revised initial prompt to be easier.


I’ve been using another call to an LLM to write or rewrite code that is separate from the main “conversation”.

What I mean is that I’ve got a dialog going with an LLM, and I’ve trained it to call a build() function with instructions; that call returns the function, and the text of the function is kept out of the main thread’s dialog.
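Roughly, the shape of it, sketched in Python with the openai package (the model name, prompts, and file name are just placeholders here):

    import openai  # assumes the 0.x openai package and OPENAI_API_KEY in the environment

    # main conversation: the generated code never gets appended to these messages
    main_messages = [
        {"role": "system",
         "content": "You are pair-programming. When code is needed, reply with a single "
                     "line: build(<instructions>). Do not write the code yourself."},
    ]

    def build(instructions: str, context: str = "") -> str:
        # separate, single-shot call whose output stays out of main_messages,
        # so the full function text doesn't eat the main thread's context window
        response = openai.ChatCompletion.create(
            model="gpt-4",  # placeholder model name
            messages=[
                {"role": "system", "content": "Return one complete function. Code only, no prose."},
                {"role": "user", "content": f"{context}\n\n{instructions}"},
            ],
        )
        return response.choices[0].message.content

    # when the main model replies with e.g. build("parse an ISO-8601 timestamp"),
    # the harness runs it, and only a short note goes back into the dialog:
    code = build("write a function that parses an ISO-8601 timestamp into a datetime")
    main_messages.append({"role": "assistant", "content": "build() done; function saved to utils.py"})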


It's great to see that there's now a term for this type of prompting: “generated knowledge”. I've been experimenting with this technique since the beginning, and I've noticed a significant improvement in version 4. The process involves outlining the project, creating tasks, and feeding them back to ChatGPT as you progress. This approach has helped me complete projects that would otherwise have taken me much longer to finish.

It's also useful for creating practical tutorials. While there are plenty of tutorials available online, sometimes you need guidance on a specific set of technologies. By using generated knowledge prompts, you can get a good outline and tasks to help you understand how these technologies interact.

One thing to keep in mind is to avoid derailing the conversation with questions that are not relevant to the core tasks. If you get stuck on something and need to debug, it's best to use a separate conversation, to avoid derailing the project's progress and the hallucinations and forgetfulness that come with it.


Something must be wrong with me. I could never get anything useful from Martin Fowler's writings, and coincidentally I cannot get any functional code out of ChatGPT. Even the boilerplate it produces for me needs to be corrected. I still use ChatGPT to produce examples of abstract things, but I haven't been able to get any working code that matches concrete problems or even compiles.


Are you using the GPT4 model? There's a very significant improvement between 3.5 (the free one) and 4.


I am supposedly on GPT4 via GPT+. I try using it for boilerplatey things, like terraform, and the results are simply incorrect. It seems more helpful in providing examples, even for some far more complex tech - like rust code.


Does it say GPT-4 at the top of the screen?


It does. One example of incorrect TF it produced: splitting the DynamoDB table and its GSI into two distinct resources.


Absolutely, and same here. I've built multiple tools in 2-3 hours each that would have taken 2-3 days each.

> One thing to keep in mind is to avoid derailing the conversation with questions that are not relevant to the core tasks. If you get stuck on something and need to debug, it's best to use a separate conversation, to avoid derailing the project's progress and the hallucinations and forgetfulness that come with it.

Definitely. Great advice.

Another tip: don't bother asking it to fix small things. Just mention you fixed it in the next reply and move on.


> he would have easily hit the token limit

That was my first question. Do all these tasks fit within a 4K or 8K buffer?

Wouldn't be surprised, though, if it works within a 32K GPT-4 token limit. Amazing things are possible.


This may be a dumb question, but do you know if this is something that LLM frameworks like LangChain (or others) can help with? Aren't they designed to help with more complex prompts/logic/outputs? Or will they run into the same token limits?


I completely agree with this take.


If somebody thinks an LLM is coming for everybody's coding job, I'd say this article is a great counterpoint just for existing.

You could tell someone from decades ago that we now use a very high level language for complex tasks in complex code ecosystems, never even mention AI, explain that the parser is really generalist-biased, and this article would make perfect sense as an example of exemplary code by a modern coder working for a living.

That's code in there, the stuff Xu Hao is writing.

And also, that's not even getting into the debugging part... Which will be about other code, that looks different.


Yeah, I think there's a "stone soup" effect going on with AI.

It's the same sort of thing you see happening with the customers of psychics. People often have poor awareness of how much they're putting in to a conversation. Or it's a bit like the way Tom Sawyer tricks other kids into painting the fence for him. For me a lot of the magic here is in knowing what questions to ask and when the answers aren't right. If you have those skills, is pounding out the code that hard?

The interesting part for me is not generating new bits of code, but the long-term maintenance of a whole thing. A while back there was a fashion for coding "wizards", things that would ask some questions and then generate code for you. People were very excited, as they saw it as lowering the barrier to entry. But the fashion died out because it just pushed all the problems a bit further down the road. Now you had novice developers trying to understand and improve code they weren't competent to write.

I suspect that in practice, anything a person can get a LLM to wholly write is also something that could be turned into a library or framework or service or no-code tool that they can just use. That, basically, if the novelty is low enough that an LLM can produce it, the novelty is low enough that there are better options than writing the code from scratch over and over.


I mostly agree except one critical detail: LLMs are the low code/no code service. You literally tell them what you want and if they’re fine-tuned on the problem domain, you’re all set. Microsoft demo’d the Office 365 integration and if it works half as well in practice they’ll own the space as much as they did in 1997.


Maybe they will be, but that's not proven yet. We'll see! If anything, the article we're looking at suggests that the "tell them what you want" step is not obviously much less rigorous or effortful than coding. Tuning could make the difference, or it could be one of those things that produces better demos than results.


One strange coincidence with the emergence of ChatGPT is that at almost the exact same time, Google became practically unusable as a search engine. Like at least an order of magnitude worse.

People used to use Google the same way they use ChatGPT. They would ask a question in plain English, and get sent back a list of relevant links to blog posts, articles, stack overflow, or whatever that had answers to their questions, including example code.

Sometimes that code or information was outdated or completely wrong and sometimes it was too basic to be useful, or just the code-generated docs.

Google has been getting gradually worse over the years due to spam, algorithm gaming, and ads, but circa late November 2022 it became practically worthless.


Great points (and after checking your user name, I’ve been nodding my head to posts of yours for about a decade now).

This is a bit tangential - your reference to stone soup is a wonderful example of the information density possible with natural language. And all the meaning and story behind the phrase is accessible to LLMs.

I’ll have to start experimenting with idiom driven development, especially when prompt golfing.


I believe the Model Driven Architecture fad (https://en.wikipedia.org/wiki/Model-driven_architecture) is a better analogy than wizards. Back then the holy grail of complete round trip UML->code->UML didn't get practical enough to justify the effort.


Definitely also a good analogy. But MDA at least was about supporting users in understanding what's going on. Wizards and "AI" both are about users avoiding the work of understanding.


The problem is that it's not quite code. It's almost code, but without the precision, which puts it into a sort of Uncanny Valley of code-ness. It's detailed instructions for someone to write code, but the someone in this case is an alien or insane or on drugs so they might interpret it the way you meant it or they might go off on some weird tangent. You never know, and that means you'll need to check it with almost as much care as you'd take writing it.

Also, having it write its own tests doesn't mean those tests will themselves be correct let alone complete. This is a problem we already have with humans, because any blind spot they had while writing the code will still be present for writing the tests. Who hasn't found a bug in tests, leading to acceptance of broken code and/or rejection of correct alternatives? There's no reason to believe this problem won't also exist with an AI, and they have more blind spots to begin with.


I think often of the adage "it's harder to read code than write it". GPT gives you a lot to read. Definitely a better consultant than coder, IMO. I've also had GPT write entirely false things; then I say "isn't that false?" and it says "yes, sorry about that". Very uncanny.


And the code that GPT does write, if it is even close to correct, must be code that exists in many places already, and usually (like the case of so much react code) doesn’t need to exist at all.


The opposite might be true, and here’s why - 1) by using English as spec, the barrier of entry has gone lower, 2) LLMs can also write prompts and self introspect to debug.


I think English as a spec actually makes the barrier of entry higher, not lower. Code itself is far easier to understand than an English description of the code.

To understand an English description of code you already have to have a deeper understanding of what the code is doing. For code itself you can reference the syntax to understand what's going on.

The prompt in this case is using very technical language that a beginner will have no idea about. But if you gave them the code they could at least struggle along and figure it out by looking things up.


Exactly. So much that often with a tricky problem or discussion, I ask the person to sit down with me and just code the most relevant parts without implementation, just the type signatures.

That is so much more productive because it immediately removes or highlights all the ambiguity which you can sugar coat with English but not in a programming language.
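As a toy example (all names made up), even a signatures-only sketch in Python forces decisions that an English description lets you gloss over:

    class Account: ...
    class MergeResult: ...

    # no bodies, just the contract - and already you have to decide:
    # can a merge fail? which account wins on conflict? is it reversible?
    def merge_accounts(primary: Account, duplicate: Account) -> MergeResult: ...
    def undo_merge(result: MergeResult) -> tuple[Account, Account]: ...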


This reminds me of rubber ducking[0] in how it necessitates a certain understanding. If one is able to explain it in plain English it's because it is understood.

[0] https://en.wikipedia.org/wiki/Rubber_duck_debugging


Yes, but LLMs can also be used by laypeople to explain the issue in plain English too. That’s the problem. Not that LLMs would need a human to guide the debugging process anyway (at least in a few years).


You still have the same problem... You cannot describe a technical field with plain English. If you did so the semantics would be incorrect. There is a reason jargon exists.

The first two paragraphs alone are absolutely chock-full of terms that would not be easily explained to a layperson:

"The current system is an online whiteboard system. Tech stack: typescript, react, redux, konvajs and react-konva. And vitest, react testing library for model, view model and related hooks, cypress component tests for view.

All codes should be written in the tech stack mentioned above. Requirements should be implemented as react components in the MVVM architecture pattern."

What is every library in that list? What is a model? What is a view model? What is a hook, component test, view, MVVM, etc?

If a layperson could understand explanations for all these things then they would not be a layperson.


> by using English as spec, the barrier of entry has gone lower,

I'm not sure that is true. The level of back and forth and refinement needed indicates to me that the "English" used is not the normal language I use when talking to people.

It's almost like a refined version of cucumber with syntax that is slightly more forgiving.

Maybe I'm being a codger, but LLMs seem (at least for now) far better for summarizing and giving high level overviews of concepts rather than nailing precise code requirements.


> It's almost like a refined version of cucumber with syntax that is slightly more forgiving.

I don't know if "cucumber" is autocorrupt or an actual non-vegetable thing; can you clarify?


Not a typo.[0]

In the 00s/early 10s, software went through a fad phase where people earnestly thought that by implementing Gherkin frameworks like Cucumber, you'd be able to hand off writing tests to "business people" in "plain English." It went about as well as you'd expect.

[0] https://cucumber.io/docs/gherkin/


Thanks!

Despite that period being when I finished my Software Engineering degree, got my first job, and then attempted self-employment, I'd never heard of it before.

Looking at the book titles — "Cucumber Recipes" in particular — even if I had encountered it, I might have assumed the whole thing was a joke.


I still think the basic idea of Cucumber and similar tools is sound. It just doesn't match how 99% of companies operate.

On the other hand, many developers write shitty tests where it's absolutely unclear what the test is even trying to achieve, so trying to find some sort of framework which tries to forcibly decouple the what of the test from the how maybe isn't the worst idea.


That's exactly what I tried to do with this framework:

https://github.com/hitchdev/examples

Rather than trying to force your testers and stakeholders to adapt to the DSL, the templated story->documentation generation lets the dev or tester adapt the DSL to whatever the stakeholders want to see while keeping the story strictly about behavior.


https://cucumber.io/

That "did they actually mean that or was it autowrong?" feeling is going to get worse I fear.



But you can't determine if a statement is true by simply reading more words.

It's also not efficient for doing higher level work. There was a time before we had algebra when people were still expressing the same ideas, but the notation wasn't there. Mathematics was expressed in "plain language." It's extremely difficult for us to read. For mathematicians of the time there was no other way to explain algorithms or expressions.

For simple programs I have no doubt that these tools enable more people to generate code.

However it's not going to be helpful for people working on hypervisors, networking stacks, operating systems, distributed databases, cryptography, and the like yet. For that you need a more precise language and an LLM that can reason about semantics and generate understandable proofs: not boilerplate proofs either -- they have to be elegant so that a human reading them can understand the problem as well. We're still a ways from being able to do that.


Arguably reading code can’t lead to definitive conclusions about its bug-free-ness


Reading and proving a spec can though. LLMs are in principle capable of doing that. (If your objection is that the spec might have bugs then "bug free" is subjective and nothing at all can ever lead to definitive conclusions about it)


Precisely! And neither can generating a handful of unit tests. As EWD would say, they can only prove the existence of errors, not that there are no errors.

If we want more programs that are correct with respect to their specifications we need to write better, precise specifications… not wave our hands around.

However for a lot of line-of-business tasks we’re generally fine with ambiguous, informal specifications. We’re not certain our programs are correct with respect to the specifications, if we had written them out formally, but it’s good enough.

I think most businesses that are writing software that needs to be reliable and precise are not going to benefit from these kinds of tools for some time.


This is true in aerospace software. Lots of process, lots of specification, lots of verification. I wouldn't want to say that GPT-esque tools would be useless here, but I really don't see them offering the same kind of magic leverage that they might offer on some other projects.

And vice-versa! Most software projects do not benefit from the rigor used in aerospace, because it's just not needed, and would be a waste of time.

I am definitely seeing ways that GPT tools could speed up some aerospace work, but we need to be really really sure that things are being done correctly... not just mostly correct, or seemingly correct.


> LLMs can also write prompts and self introspect to debug.

Why should we assume that won't lead to a rabbit hole of misunderstanding or outright hallucination? If it doesn't know what "correct" really is, even infinite levels of supervision and reinforcement might still be toward an incorrect goal.


To which the normal response[0] is: that's just like humans.

Of course, it's still bad that humans do it; but despite the scientific method etc., even successful humans often work towards an incorrect goal.

[0] I am cultured, you're quoting memes, that AI is just a stochastic parrot: https://en.wikipedia.org/wiki/Emotive_conjugation


But it's not just like humans. For one thing it's built differently, with a different relationship between training and execution. It doesn't learn from its mistakes until it gets the equivalent of a brain transplant, and in fact extant AIs are notorious for doubling down instead of accepting correction. Even more importantly, the AI doesn't have real-world context, which is often helpful to notice when "correct" (to the spec) behavior is not useful, acceptable, or even safe in practice. This is why the idea of an AI controlling a physical system is so terrifying. Whatever requirement the prompter forgot to include will not be recognized by the AI either, whereas a human who knows about physical properties like mass or velocity or rigidity will intuitively honor requirements related to those. Adding layers is as likely to magnify errors as to correct them.


> But it's not just like humans. For one thing it's built differently

I'm referring to the behaviour, not the inner nature.

> in fact extant AIs are notorious for doubling down instead of accepting correction.

My experience suggests ChatGPT is better than, say, humans on Twitter.

I've had the misfortune of several IRL humans who were also much, much worse; but the problem was much rarer outside social media.

> Even more importantly, the AI doesn't have real-world context, which is often helpful to notice when "correct" (to the spec) behavior is not useful, acceptable, or even safe in practice.

Absolutely a problem. Not only for AI, though.

When I was a kid, my mum had a kneeling stool she couldn't use, because the woodworker she'd asked to reinforce it didn't understand it and put a rod where your legs should go.

I've made the mistake of trying to use RegEx for what I thought was a limited-by-the-server subset of HTML, despite the infamous StackOverflow post, because I incorrectly thought it didn't apply to the situation.

There's an ongoing two-way "real-world context" miss-match between those who want the state to be able to pierce encryption and those who consider that to be an existential threat to all digital services.

> a human who knows about physical properties like mass or velocity or rigidity will intuitively honor requirements related to those

Yeah, kinda, but also no.

We can intuit within the range of our experience, but we had to invent counter-intuitive maths to make most of our modern technological wonders.

--

All that said, with this:

> It doesn't learn from its mistakes until it gets the equivalent of a brain transplant

You've boosted my optimism that an ASI probably won't succeed if it decided it preferred our atoms to be rearranged to our detriment.


> I'm referring to the behaviour, not the inner nature.

Since the inner nature does affect behavior, that's a non sequitur.

> we had to invent counter-intuitive maths to make most of our modern technological wonders.

Indeed, and that's worth considering, but we shouldn't pretend it's the common case. In the common case, the machine's lack of real-world context is a disadvantage. Ditto for the absence of any actual understanding beyond "word X often follows word Y" which would allow it to predict consequences it hasn't seen yet. Because of these deficits, any "intuitive leaps" the AI might make are less likely to yield useful results than the same in a human. The ability to form a coherent - even if novel - theory and an experiment to test it is key to that kind of progress, and it's something these models are fundamentally incapable of doing.


> Since the inner nature does affect behavior, that's a non sequitur.

I would say the reverse: we humans exhibit diverse behaviour despite similar inner nature, and likewise clusters of AI with similar nature to each other display diverse behaviour.

So from my point of view, that I can draw clusters — based on similarities of failures — that encompasses both humans and AI, makes it a non sequitur to point to the internal differences.

> The ability to form a coherent - even if novel - theory and an experiment to test it is key to that kind of progress, and it's something these models are fundamentally incapable of doing.

Sure.

But, again, this is something most humans demonstrate they can't get right.

IMO, most people act like science is a list of facts, not a method, and also most people mix up correlation and causation.


It’s like when you continually refine a Midjourney image. At first refining it gets better results, but if you keep going the pictures start coming out…really weird. It’s up to the human to figure out when to stop using some sort of external measure of aesthetics.


I mean, sure, if the world were to run on basic code. Perhaps WordPress developers might feel slightly threatened, but even that is well above all examples of "AI" code I've seen.


English as a spec is incredibly "fuzzy", there are many valid interpretations of intent. I don't think that can be avoided?


It can't. Legalese is an attempt to do so, and it's impenetrable by non experts and still frequently ambiguous.


But there's still going to have to be a human who has the ability to form a mental model of the thing that's needing to be implemented. Functionally and technically. The results of the LLM will vary depending on the level of know-how the human instructor has.


Exactly, I actually liked the systematic approach in the article, but it seemed pretty labor-intensive and ... not that much different from other types of programming


To me, that's the whole point of this. I think it is directly analogous to the jump between assembly and higher level compiled languages. You could have said about that, "it still seems pretty labor intensive and not that much different than writing assembly", and that's true, but it was still a big improvement. Similarly, AI-assisted tools haven't solved the "creating software requires work" problem. But I think they're in the process of further shifting the cost curve, making more software possible to make.


‘Artists' jobs are safe because AI is bad at hands.’


Artists' jobs are safe in part because they can also use AI, and most already use relevant ecosystems that now incorporate AI.

Consumers who can operate AI for clip art purposes are simply still part of the same non-artist-paying demographic they always were.

Same with code


As farmers' jobs were safe because farmers can use farming tools.

These arguments don't track even vaguely. You are doing the equivalent of analyzing the future of solar power by assuming solar will cost the same in 10 years as it does today, and that each new watt of solar is matched 1:1 with new units of demand. Neither of these are sensible.

It may be that ML code tools never displace many people, or even that they supercharge demand, but you don't get to justified conclusions by assuming the future is just the present but with a bigger UNIX timestamp.


Industrialization has made farming tools incredibly complex, so I believe the statement "farmers' jobs were safe because farmers can use farming tools" is correct. You still need a farmer to farm, but you now need less manpower to farm. The specialist is secure while the untrained laborer is at risk.


And yet, despite there being more complex machinery in farming there is also much more untrained labor. People just don’t notice it because the untrained labor (millions of people) that plant and harvest are an underclass that does not speak their language or associate with them.


Sadly I don't think this is true for art:

https://restofworld.org/2023/ai-image-china-video-game-layof...

I really hope it doesn't end up being the same with code :|


Yeah, even if ChatGPT could perfectly understand the prompts you'd still run into major issues with token limits. I tried to get it to rebuild a single page for me (to move from one UI framework to another) and I couldn't fit the existing code in the token limit. I might be able to get it to do a chunk of the initial work for a greenfield project if I perfected the prompts, but it's structurally incapable of maintaining existing code.


I was thinking about that draft of the master plan ... you can't really just write it up that clearly and easily.

Overall, I don't think a 95% autopilot GPT model would provide more efficiency than an 80% one.


Except you now have a way “upwards” from an abstraction POV. Regular code is severely limited and highly surgical, by design. This is not.

All these abstraction layers were invented to serve old style manual coders. Why bother explaining in great detail about “Konva” layers and react anymore? Give it a few years and let it finetune on IT tech and I see this being reduced to “I want whiteboard app with X general characteristics” at which point I’d no longer speak about “programming”.


That "upwards" excludes a lot of relevant systems design logic that won't go away though, insofar as it is abstraction ad infinitum in the direction of fewer-relevant-details.

What'll happen is, details will continue to be relevant as tastes adjust to the new normal.

Like for my work, today, React is enterprise-ready, which is not good for me. It means it will likely dip my projects in unnecessary maintenance costs as compared to another widget of its type that does what I want in a lightweight manner. When I troubleshoot something of React's complexity, even my prompts will likely need to be longer.

But also, that's just one component of one component. And you have to experience this stuff in the first place, to know that you should pay attention to these details and not those other ones, for a given job, for a given client, in a given industry, with given specs.

So, if I was able to wave my hands I'd simply have all the problems I had back when I was a beginner. Ergo, it comes back to the clip art problem: Being able to buy clip art never made anyone a designer. But it made a lot of designers' jobs way easier.

We are simply regressing toward the mean with regard to programming. It was never about computers in the first place, never so concerned with syntax.

Anyway, back to browsing my theater program...


Fair enough, but don’t we abstract “upwards” all the time? Assembly won’t go away, but do you deal with it?


For one, assembly ceases to be a relevant detail and is replaced by other relevant details.

So, I can't code fast games in a 1984 workplace, currently, being too out of touch with assembly on a given chipset. But I also can't wave my hands at an LLM and expect a modern, fast game of the desired quality to code itself. (Even though a clip art-style result is possible, the requirements are always going to be special details)

The upwards direction example is also interesting because it's foundational to the cognitive functionality of one of the Jungian personality types. But other personality perspectives also apply to coding, which means in part that the directional, metaphorical-abstraction view can effectively be a blind spot if we map it as the preferred view on outcomes.

The most common blind spot for this personality involves questions of relevant details, and their intersection with planning for yet-unknowns. There is a tendency to hand-wave which ends up being similar to prophetic behavior. Jung called this the "voice in the wilderness" noting that it can easily detach from sensibility (rationality) by departing from life details. Kind of interesting stuff.

(Ni-dominant type)


Now you got me on the edge of my seat. What is this personality type?


Ni-dominant. It exists nowadays in various post-Jungian models, many of which are really fascinating, having fleshed it out a lot.

The opposing function to Ni is Se, which creates a dichotomy of planning/foreseeing vs. doing/performing. The functions oscillate as a kind of duty cycle, so a lot of sages out there have hobbies as musicians, stage magicians, etc.

This dichotomy also effectively shuts out detail memory for context, dealing mostly with present vs. future. Even nostalgia is often ignored on the daily. So a Ni-dom will usually describe their memory as pattern-based, gestalt, more vague or general, etc.


I couldn't quite tell if you found a beautiful way to insult me, but it is fascinating indeed. I am hand wavey and I understand its failure modes quite well, unfortunately. It's cool to talk about it at this level of abstraction.


No insult intended... I don't really know how much it applies in your case, but since you really took on that viewpoint, that's when the personality theory side of me goes, "well if this is a favored viewpoint then there IS this idea about the population that favors this viewpoint" :-) And thoughts about GPT are generally crafted from general personality positions, in the absence of other relevant self-development experience.

I agree, it's cool stuff


I would like to subscribe to your newsletter.

Even if approximately 75% of that sailed right over my head.


Best I can do is RSS!


There's an unfortunately common take on AI that goes basically like this:

"I tried it and it didn't do what I wanted, not impressed."

My suggestion is to tune out the noise and really try experimenting with these tools – and know that they're rapidly improving. Even if ultimately you have criticisms or decide one way or another, at least really investigate them for your own use-cases rather than jumping on a bandwagon that's either "AI is bad" or the breathless hype-machine at the other end.


I agree it's a good idea to take a moderate approach. The hype that LLMs are going to replace SWEs is clearly just that, hype, if you've done any real work trying to get GPT4 to give you the code you want. But it's also clearly a very useful tool. I think it'll absolutely destroy Stack Overflow.


I am very critical of the LLM hype, but the threat to stackoverflow is evident. Like stackoverflow, I never write code verbatim that comes from even GPT4. I frequently find issues in the output, as the code I write is generally very context-specific. However, I find the back-and-forth with interesting tidbits of info dropped here-and-there amounts to something like rubber duck debugging on steroids.


> destroy Stack Overflow

It'll be interesting to see how future training data is sourced.


You simply need the system to train itself on its own interactions, like how search engines improve results by counting clicks.


I'm not wondering about how the system will determine what's most helpful but instead determining what's even "correct". A model will learn what's "correct" from Stack Overflow by finding accepted or highly-voted answers but when it can't find such content anymore (in this case because Stack Overflow is hypothetically gone) then what would even exist to generate these discussions to be used as training data?

Github, per the sibling comment, is a good example because projects will have issues (tied to the individual repository of source code to be seen as a working implementation of the idea) which will be where such discussions happen.


When Google search became important, people structured their information so that Google could best index it. When AIs become important in the same way, people will start to structure their information so that a particular class of AI can best index it. If that involves API documentation, perhaps there will be a standard format that AIs understand the best.


The difference is that folks had an incentive to make their pages easily indexable: drive page views.

With becoming training data, the incentive for the creators is a little less clear.


Yea right now sure. Later, there will be a clear incentive. Or not, if AI fails to become useful.


Look at the condition of artists, haha :(


Those topics that AI replaces the forums for won't need discussion. People won't be confused about that thing because the coding AI knows the details of it. Soon that'll be most syntax questions, soon simple to mid-level algorithms, etc.

People will move on to higher-level questions.


Github would be my first guess.


That does seem like a likely option. Discussions on issues alongside the actual working (and not working) code.


Nobody who professionally designs and writes software AND has used LLM code generation tools sees this as a drop-in replacement for developers, generally, anytime soon. That stance is for overeager, credulous enthusiasts and doomsdayers jumping to conclusions.

Similarly, nobody who professionally designs and creates complex art products sees this as a drop-in replacement for commercial artists anytime soon. That stance is for people dazzled by their new image-generation superpower who don't know how little they know about professional creative work.

I doubt the markets for utility-grade code work (e.g. customizing existing WordPress themes) or low-effort, high-volume creative assets (template-based logos, lightly customized game sprites) will survive. The people doing that work are still real people with lives and families and medical bills and mortgages, and we really ought to get serious about worker protections in this country. Seriously.


In a few years, as mid-level cognitive tasks get automated by LLMs, resulting in elimination of some percent of well-paying white-collar jobs, there will be economic dislocation and social disruption.

Oligarchic capitalist societies with a hypocritical philosophy of free market economics (such as the USA) will experience social unrest and civil strife.

In the meantime, social-democratic societies that have effective governance and can grow their safety net with universal basic income will be advantaged in this new economic order. Think Scandinavian and some Asian economies.

The geopolitical balance of power will shift toward stable societies that are able to make the conceptual leap to UBI. Others who follow the primitive fantasy of free market economics will crumble and get left behind.

At least that is how it looks right now.


Maybe. I don't see any reason to assume the US won't continue to successfully a) escalate police power to keep citizens under control, b) continue giving people just enough crumbs to keep them from getting violent, and c) propagandize the hell out of the idea that societal mismanagement is a matter of personal responsibility. China has done pretty well while clamping down on civil liberties, and the US has always been better at obfuscating and spinning similar tactics to be more palatable to its citizens. And you only need to look at the rust belt to see what happens when US business leadership decides to follow the cheaper option.

Though, regardless of ensuing social unrest, the fact that the people involved are human beings should be enough to not treat them like used condoms when computers figure out how to do their jobs. Should be.


> The hype that LLMs are going to replace SWEs is clearly just that, hype

LLMs cannot replace anyone, but it is clear that engineers who master LLM usage might multiply their productivity by a lot.

The question is: If one LLM assisted engineer can work 10x faster, will companies reduce their engineer staff by 90%?


I've worked at far more companies with miles of product idea backlog we never get to than ones with nothing for engineering to do.

Now product will be able to use an LLM to come up with feature proposals and design docs even faster! :o

So: are you working at a company where engineering is a cost center or a revenue center? The latter wants to get more done at the same cost much more than it wants to just cut spend.


I'm working at a company where we perpetually don't have enough engineers and we've learned (the hard way) that adding more tends to make the problem worse (too many cooks).

Copilot helps a lot with that. I can write `// check if em` and it will finish the comment (email is valid) and write the actual check.
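Roughly what that looks like (sketched in Python just for illustration; the exact suggestion varies):

    import re

    # check if email is valid   <- I type the start of this comment, Copilot finishes it...
    def is_valid_email(email: str) -> bool:
        # ...and then suggests something like this, in the style of the surrounding file
        return re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", email) is not None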

It's a massive productivity boost. I'm spending less time looking stuff up (what was that function name again?) and less time fixing typos/basic syntax errors.

Copilot doesn't always know the function name, but it does more often than me. And it makes errors too, but again less often than me (also, because I'm reading the code, not typing it, I tend to see the error immediately without needing to test it).


To answer your question with a question if I may -- when did productivity increase in software ever result in headcount reduction? The competition also will have similar productivity gain.


>when did productivity increase in software ever result in headcount reduction? The competition also will have similar productivity gain.

The average AI company has like 1 employee per $25M valuation. That's around 25x fewer employees than the typical tech company.


There used to be a profession called "typist" where a skilled person would type up your dictation quickly and without wasting sheets of paper from typewriter error. Now it's all software

There used to be mailrooms filled with people sorting and delivering physical envelopes. Now it's all software.


You are right. Sorry, I meant headcount of software staff.


> The question is: If one LLM assisted engineer can work 10x faster, will companies reduce their engineer staff by 90%?

What did you think “replacing SWEs” means?


Yet, the whole movement of getting blue collar workers to code seems to have lost its steam.


Probably because “graduating” from a bootcamp doesn’t make one a SWE and people figured out it’s a scam?


I'm a fullstack dev of 10+ years experience. We have people on our team who've done almost nothing more than a react bootcamp. They can't do what I do, even with chatGPT, but they can do a lot of things I don't want to do.


Also it requires an inherent level of talent which gets turned into skill.

Not everyone who takes piano and music harmony lessons can become a jazz composer, especially in 6 weeks or 6 months.


Sadly I don't think this can happen. There is a load of trash answers on SO, and you bet ChatGPT is trained on that.

So you get not only the good of SO, you also get the worst of SO, and there's no way to tell which is which.

Just a downgrade for me; plus, for most things I do, you are better off reading the source code or the documentation (however lackluster) than fumbling with ChatGPT and getting an answer that may or may not be right.

I might as well ask someone else who doesn't know any programming to search for the answer for me - they won't be able to tell a trustworthy answer from another one.

There are so many SO answers (esp. on C++) which look good, but one of the comments points out some edge case in which it breaks.

Remember, not everyone does copy-paste programming; some of us have to sit there and think of a solution and work it out over hours, because it's not been done before publicly.


Of course, there's the issue that a lot of the info for useful LLMs probably comes from places like Stack Overflow


People also forget that the model is trained on older data. At first, it will default to referencing out of date frameworks and solutions, but if you tell it that its code isn't working, it will usually correct itself.


I was very impressed when it showed me the different techniques for deep reinforcement learning. However, where it struggles is in building an agent, because you need a high number of tokens to template a prompt, as in the case of LangChain or AutoGPT.


You may be underestimating how much meaning people derive from jumping on bandwagons and having a simple to understand group identity.

Your suggestion would make many people unhappy. They can't win the competence game and hence 'really investigating' is a losing proposition for them. What they can do is jump on bandwagons very quickly, hoping to score a first mover advantage.

How much of an advantage would one get from taking a couple of years to really investigate Bitcoin and the algorithms involved, vs buying some as early as possible and telling everyone else how great it is? :)


For me, ChatGPT or Phind (which is based on GPT-4, if I understood right) are great documentation tools and also general productivity tools, nothing to say about it.

The main issue is that sometimes they really f** it up badly. They make you rethink your knowledge quite deeply (do I remember wrong? did I maybe understand this wrong? is ChatGPT wrong?), and for me this can be worse than doing it myself, because it creates a sort of insecurity: you always have to challenge your own thinking, and that is not how we work in our daily job, is it? At least this doesn't happen so frequently to me - from time to time we have arguments in the team, but this kind of "wrong information" feels more like a "hidden" trap than someone else arguing (with valid arguments, of course).


One thing that really bothers me is that I want it to use best practices and it doesn't really know which ones I'm talking about, and then I realize they are _my_ set of best practices, made from others' nameless best practices.

So I have to decide if it's just a matter of manually converting the 5-10 little things like using `env bash` in the header, etc. Or do I ask it to remember that and proceed to the next layer of the project, and feel like Katamari Coder, which is quite a feeling of what-is-this-fresh-encumbrance at times.

There is a nascent sense that the interface is not even close to where it needs to be to efficiently support that kind of recall for working memory on the coder's end.

I can definitely see a new LLM relativistic-symbolic instruction code & IDE-equivalent (with yet-unseen presentational and let's even say modal editing factors) being extremely useful, which is a bit funny but also that's what those things are good for... Right now I can scroll up through my prompts to supplement my working memory, but that's another place where the whole activity starts to seem very tedious.

(Is the LLM coming for the coders, or are coders coming for the LLM?)


I think that Copilot is much better/more promising for this kind of thing because it's looking at the code you've already written without you having to constantly prompt it.

I had a lot of the same hangups as you when I had played around with ChatGPT. How do I get it to handle the monotonous stuff without me having to spend all my time teaching it?

I finally tried Copilot the other day and it was stunning. I had a half-written golang client that was a wrapper around an undocumented and poorly structured API for a tool we use. I had written the get and create methods. Then I added a comment with an example URL for delete and Copilot auto-completed the entire method in the same style as the two methods I had already written. In some cases, like formatting & error handling, it was exactly the same as what I'd written, but other cases, like variable naming, string templating, etc., it replicated the spirit of my style but adapted for this new "delete" method.

I think ChatGPT is just the wrong interface for this kind of thing (at least right now).


They’re complementary, I’d say. GPT-4 handles greenfield development better; you can tell it to write a quick script, and usually it more or less works. Copilot doesn’t do much when you’re looking at a blank page.

This would make copilot the better tool in 90% of cases, but I’ve been using GPT-4 to script a lot of things I previously would never have scripted at all. It reduces the cost to where even one-off scripts for a twenty minute job are usually worth writing.


> Or do I ask it to remember that and proceed to the next layer of the project

I think this could be solved with a good browser extension. Something that provides an easy to access (e.g., keyboard-only) way to paste customized prompt preludes that enforce your style (or styles if, say, you're using multiple languages).

It looks like Maccy could do the job, albeit not as an extension. I haven't tried it yet.


I tried one kinda like this. Setting aside the extension feel of it, what I'd like to see is a move from prompt-helper to pattern language for visually reporting the process of working with the LLM, to which the LLM has parsing access.

So, let's say you can see your conversation as normal, but you can also see your actual code project as a node-based procedural design layout in an editable window. The relevant conversation details are used to draw the nodes.

You go to one node representing a bash script and click its Patterns tab and search-to-type for the community pattern, "Joe's Best Bash Practices". It's added to your quick palette and LLM offers to add similar patterns to other nodes in Nim and Pascal and ABS, but actually for ABS there's a "concept" symbol that indicates it's only going to be able to guess what you would want based on the others.

Then it offers to gradually teach you node-shorthand as you edit the project, so eventually you don't need to write any prompts, just basic shorthand syntax. Where the syntax gets clunky, or when you buy a custom keyboard just for this syntax but with a few gotchas, you can work together and change syntax to fit.

Nbdy hus lrnd shrtnd nos knda whr m gng wths.


One thing ChatGPT (specifically, the GPT4 version) keeps doing to me is confidently lying, and when I call it out, apologizing and spitting out another response. Sometimes the right answer, sometimes another wrong one (after a couple tries it then says something like "well, I guess I don't have the right answer after all, but here is a general description of the problem")

Part of me laughs out loud (literally, out loud for once) when it does that. But the other part of me is irritated at the overconfidence. It is a potentially handy tool but keep the real documentation handy because you'll need it.


Honestly, for me it happens more often than not - but maybe that's because I've tried it in cases where I've already used traditional approaches to come up with the answer, and then gone to GPT and Phind to benchmark their viability.

I've mentioned it in another thread, but Phind's "google-fu" is weak; it does a shallow pass, and the Bing index (I'm assuming) is worse than Google's. It's also slow as hell with GPT-4, which makes digging deeper slower than just manually going in.


To me, this is a great illustration of why chat is a terrible interface for a coding tool. I've gone down this path as well, learning that you need to have a detailed prompt that establishes a lot of context, and iteratively improve it to generate better code. And yup, generating a task list and working from that is definitely a key strategy for getting GPT to do anything bigger than a few paragraphs.

But compare that to Copilot: Copilot doesn't help much when you're starting from scratch, and there's nothing for it to work with. But once you have a bit of structure, it starts to make recommendations. Rather than generating large chunks of code, the recommendations are small, chunks of a few lines or maybe even one line at a time. And it's sooooo good at picking up on patterns. As soon as you start something with built-in symmetries, it'll quickly generate all the permutations. It's sort of prompting by pointing.

This is so. much. better. than writing prompt for the chat interface. I'm really excited to see where these kinds of tools lead.


I've noticed that after using copilot on a code base for a while, you can effectively prompt the AI just by creating a descriptive comment.

// This function ends the call by sending a disconnection message to all connected peers

Bam, copilot will recommend at least the first line, with subsequent lines usually being pretty good, and more and more frequently, it will recommend the whole function.

I still use GPT-4 a lot, especially for troubleshooting errors, but I'm always pleasantly surprised at how good copilot can be.


Copilot is a game-changer and very underrated IMO. GPT4 is smart but not really used in production yet. Copilot is reportedly generating 50% of new code and I can't imagine going without it.


Where do you get that 50% number? Do you mean 50% of all new code in the industry? That seems beyond extremely unlikely.


The number is 40%, and it's 40% of code written by Copilot users. It's also just for Python:

> In files where it’s enabled, nearly 40% of code is being written by GitHub Copilot in popular coding languages, like Python—and we expect that to increase.

https://github.blog/2022-06-21-github-copilot-is-generally-a...


I wonder if this properly counts cases where copilot writes a bunch of code and then I delete it all and rewrite it manually.


From what I remember they check in at a few intervals after the suggestion is made and use string matching to check how much of the Copilot-written code remains.


It's all about the denominator!


There was some discussion by the copilot team that x% of new code in enabled IDEs was generated by copilot.

It varies, but here's one post with x=46 from last month. So, very close to half.

https://github.blog/2023-02-14-github-copilot-for-business-i...


Measuring output by LOC is not a very useful metric. The sort of code that’s most suited to ai is closer to data than code.


(I read it as 50% of their code)


I would really love to see that. So far, all I've seen is cookie cutter code to reduce a bit of typing time. Everything else was more or less hot garbage that just stood in the way of typing. Maybe in a few iterations or years. So far, personally, I haven't seen anything useful. Not saying there isn't anything, just that I haven't seen any use and code offered by it stank. Is there a demo of someone using it to showcase this game-changing power?


Copilot only writes boilerplate, it can't really handle anything non-trivial. But I write a lot of boilerplate, even using abstraction and in decent programming languages. A surprising amount of code is just boilerplate, even just keywords and punctuation; and there's a lot of small, similar code snippets that you could abstract, but it would actually produce more code and/or make your code harder to understand, so it isn't worth the effort.

Plus, tests and documentation (Copilot doubles as a good sentence/"idea" completer when writing).


It surprises me to hear this. Have you used it as I described by writing a descriptive comment first then waiting to see its response?

I only noticed it getting good at this after I was somewhat far along on a project, so I assume it requires an overall knowledge of what you're trying to do first.


For my side projects, copilot easily generates 80% of the code. It snoops around the local filesystem and picks up my naming schemes and style to help recommend better. It makes me so much more productive.

For work projects, I tried it on some throwaway work because we're still not allowed to use it for IP reasons. It is very good at finding small utility functions to help with DRY and can help with step-by-step work, but it can't generate helpful code quite as easily, since parts of our API and codebase don't follow common norms or conventions, and it seems to me that Copilot makes a lot of guesses based on the conventions it has detected.


> It snoops around the local filesystem and picks up my naming schemes and style to help recommend better.

Are you sure about this? It doesn't seem to work on my machine. I think it will infer things that might be in other modules, but only based on the name. I'm basing this on the fact it assumes my code has an API shape that's popular but that I don't write (eg free functions vs methods).


It looks at your recently-viewed files in your IDE. I don't think it looks at anything outside your open workspace but maybe...


Absolutely. People will quickly realize that for coding, the natural language part of LLMs is a distraction. Copilot is much better for someone actually writing code, but unfortunately doesn't get as emphasized due to the narrative surrounding LLMs right now.


> Copilot is much better for someone actually writing code

I haven't used Copilot yet, but I occasionally use ChatGPT with prompts such as "write a bash/python script that takes these parameters and performs these tasks". Then I iterate if needed, and usually I can get what I want faster than without ChatGPT. It's not a game changer, but it's a performance boost.

How is natural language a distraction here? And how would Copilot do much better for the same task?


Try not using natural language and just type what you'd type into Google. You'll get the same results and realize that all of the natural language fluff is totally unnecessary. I just typed in "bash script recursive chmod 777 all files" (as a dumb toy example) and got a resulting script back. It was surrounded by two natural language GPT comments:

> It's generally not recommended to give all files and directories the 777 permission as it can pose a security risk. However, if you still want to proceed with this, here's a bash script that recursively changes the permission of all files and directories to 777: [...] Make sure to replace "/path/to/target/directory" with the path of the directory you want to modify. To run the script, save it as a file (e.g., "chmod_all.sh"), make it executable with the command "chmod +x chmod_all.sh", and then run it with "./chmod_all.sh".

It's up to the reader to decide if those are necessary, but I'd lean towards no.


I tried this with the following:

"Bash script to add a string I specify to the beginning of every file in a directory, unless the file begins with “archive”"

I tried looking for this on Google and didn't find anything that did this -- although I could cobble together a solution with a couple of queries.

The interesting thing is that I wanted ChatGPT to prepend the string to the filename -- that's what I meant. But it actually prepended the string to the file contents. That's what I said, so I give it credit for doing what I said rather than what I meant. And honestly my intent isn't necessarily obvious.

I definitely see this as a value add over just searching with Google.
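For what it's worth, the rename interpretation I intended is only a few lines. Here's a rough sketch, in Python rather than bash just to show the logic (the directory and prefix are placeholders):

  from pathlib import Path

  def prefix_filenames(directory, prefix):
      # Rename every regular file in `directory`, skipping names that start with "archive"
      for path in Path(directory).iterdir():
          if path.is_file() and not path.name.startswith("archive"):
              path.rename(path.with_name(prefix + path.name))

  prefix_filenames("/tmp/example", "draft-")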


> Try not using natural language and just type what you'd type into Google. You'll get the same results and realize that all of the natural language fluff is totally unnecessary.

I can get similar results with Google sometimes and I can put together what I learned from different places.

But I can get scripts that meet my exact requirements with ChatGPT. Most of my ChatGPT related code is scripting AWS related code and CloudFormation templates.

I’ve asked it to translate AWS-related Python code to Node for a different project, and to a bash shell script. It’s well trained on AWS-related code.

I don’t know PowerShell from a hole in the wall, but I needed to write PS scripts and it did it. I’ve also used it to convert CloudFormation to Terraform.


I think you (and kenjackson above) are misinterpreting what I was saying. I'm not saying use Google instead of ChatGPT; I'm saying pretend ChatGPT is Google and interact with the ChatGPT text prompt the same way. You don't need fully formed coherent sentences like you would when talking to a person; just drop in relevant keywords and ChatGPT will get you what you want.


Isn’t that the game changer, though: that you can use natural language, treat it like the “world’s smartest intern”, and just give it a list of your requirements?

It’s the difference between:

“Python script to return all of the roles with a given policy AWS” (answer found on StackOverflow with Google)

And with ChatGPT

“Write a Python script that returns AWS IAM roles that contain one or more policies specified by one or more -p arguments. Use argparse to accept parameters and output the found roles as a comma separated list”


> “Write a Python script that returns AWS IAM roles that contain one or more policies specified by one or more -p arguments. Use argparse to accept parameters and output the found roles as a comma separated list”

Again, this is completely unnecessary. This is like in the old days when technically illiterate people would quite literally Ask Jeeves[0] and search for full questions because they didn't know how to interface with a search engine.

A prompt that does exactly what you're asking: "python script get AWS IAM roles that contain a policy, policy as -p command line argument, output csv"

We'll see more of that terse, efficient style as people get more comfortable, similar to how people have (mostly) stopped using full questions to search on Google. The "talk to ChatGPT like a human" part is entirely a distraction from taking advantage of the LLM for coding purposes. Perhaps more importantly, the responses being humanized is a distraction, too.

[0] https://en.wikipedia.org/wiki/Ask.com


At first, when I didn’t specify “use argparse”, it would use raw argument parsing.

It also thought I actually wanted a file called “output.csv” based on your text, and gave me an argument to specify the output file, which I didn’t want.

There is a lot of nuance to my requirements that ChatGPT missed with your keywords.

Sidenote: there is a bug in both versions and also when I did this for real. Most AWS list APIs use pagination. You have to tell it that “this won’t work with more than 50 roles” and it will fix it.
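For reference, here's a rough sketch of what the fully specified prompt is aiming at, pagination included. This is my own illustration with boto3 (not ChatGPT output), and it matches a role if any of the given policy names are attached:

  import argparse
  import boto3

  parser = argparse.ArgumentParser()
  parser.add_argument("-p", "--policy", action="append", required=True,
                      help="policy name; may be repeated")
  args = parser.parse_args()

  iam = boto3.client("iam")
  wanted = set(args.policy)
  matches = []
  # list_roles is paginated, so iterate over every page instead of a single call
  # (list_attached_role_policies can be paginated too; omitted here for brevity)
  for page in iam.get_paginator("list_roles").paginate():
      for role in page["Roles"]:
          attached = iam.list_attached_role_policies(RoleName=role["RoleName"])
          names = {p["PolicyName"] for p in attached["AttachedPolicies"]}
          if wanted & names:
              matches.append(role["RoleName"])
  print(",".join(matches))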


you can always include the instruction to only return the code and no other text


Sure, but I want a system built for coding that does that by default... like Copilot.


... and it'll describe the code anyway, at least to me.


No script needed:

  chmod ugo+rwX . -R
(This is for GNU chmod like in Linux, BSD will be slightly different)

Of course, that's not exactly what you asked for (it's better; read the chmod man page: X applies execute only to directories and to files that already have an execute bit), but you could just replace ugo+rwX with 777 or 0777.


> It's not a game changer, but it's a performance boost.

The story of all AI in 2023: maybe a 2x performance improvement, maybe a bit less. The big problem is that you can't trust it on its own, so it doesn't improve productivity 100x. Not even a receipt reader is good enough to reach 100%; you've got to check the total, in case it missed the decimal point and you get a 100x "boost" after all.


Has the Copilot backend been updated to use anything more advanced yet? I tried it out when it was new and free for a while and it really struggled with anything that wasn't incredibly common. GPT 4 in its chat form works a whole lot better for niche stuff than that one did.


It's definitely far better than when it was free, but it's not GPT-4 yet for most people.

It's the opposite of ChatGPT: it takes more time to produce useful output, but it gets much better in more complex programs, while ChatGPT gets worse.


Copilot's original underlying model is currently deprecated, if I remember correctly


Idk, sure, autocomplete is a great interface for someone coding in the IDE, but an LLM can understand the requirements as a whole, spit out full classes, and validate that the output from the server matches the specs. They work great from outside an IDE.


Either way, you’re sending your company’s biggest asset to another company, aren’t you? I’ll try these tools when they start being able to run locally.


No, or no company would be able to use it. As you type, fragments of code are sent and discarded after use. You need to trust Microsoft to actually do the discarding, but contractually they do, and you can sue them if they accidentally or deliberately keep your code around or otherwise mismanage it.


They are obligated to give data to the government, and the government took part in spying in Brazil for Boeing in the past, but I guess they are using this capability only for a few strategic companies, and most companies are not that.


> government took part of spying in Brazil for Boeing in the past

Do you have more details? Please elaborate.


But that is naive, isn't it? Who has the money and time in their life, to actually sue MS? Even if "you" is a business, few will have the resources for that.


Individuals do not (although a class action would be feasible), but large companies that use Github and other Microsoft products, of course they have both the means to sue Microsoft and the motivation should their business be impacted.


Exactly


I sort of disagree that code is the biggest asset. Take the Yandex leak. What can you do with it? Outcompete them?


> Take the Yandex leak. What can you do with it?

Obviously, add it to the big training set of the next code model.


I surely hope they use my copyrighted code and make millions out of it. Ideal case for me to sue them for lots of money.


How would you ever know? It will come in chunks of a dozen or less lines at a time and it will be written into your competitor's proprietary codebase (that you don't have access to).


> GitHub Copilot [for business] transmits snippets of your code from your IDE to GitHub to provide Suggestions to you. Code snippets data is only transmitted in real-time to return Suggestions, and is discarded once a Suggestion is returned. Copilot for Business does not retain any Code Snippets Data.

Likely, some employee would whistleblow that they're not complying with their privacy policy, and either government litigation or a class action lawsuit would ensue. That legal process would involve subpoenas and third-party auditors being granted access to GitHub/Microsoft's internal code and communications history, which makes it pretty hard to hide something as big as collecting, storing, and then training from a huge amount of uploaded code snippets they promised not to.

It's not inconceivable that they're noncompliant, but my bet would be that if they are collecting data they explicitly promise not to it's an accidental or malicious action by an individual employee, and they will freak out when they discover it and delete everything as soon as they can. If they intended to collect that data, it would be much easier to write that into the policy than deal with all the risk.

Notably, this applies to Copilot for Business, which is presumably what you're using if you are at work.


Couldn't it happen more subtly, without the code lying around for long? The model could be doing online learning (in the ML sense) and only then discard the code it gets sent. This means your code could appear in other people's completions/suggestions without it having to be stored anywhere; it is basically learned into the model. The code could appear almost or even completely verbatim on someone else's machine, possibly someone working for a competitor. Even the fact that it is your code would not be obvious, because MS could claim that Copilot merely constructed the same code by accident from other learned code.

Not sure that this is how the model works, but it is conceivable.


Right.

If you are building something truly valuable locally, and it is innovative or otherwise disruptive and relies on being a first mover, centrally hosted LLMs are a non-starter.

Most software corps have countless millions of lines of code. You'd be spending lifetimes tracing where someone ripped your "copyrighted" techniques and methods.

The complete lack of security awareness and willingness to compromise privacy for convenience in people deeply saddens me.


> willingness to compromise privacy for convenience

I have to ask: do you carry a cellphone?


This is not really a valid comparison.

The cellphone is not a compromise for convenience. It allows me to make a living, providing internet connectivity and lets me keep in contact with friends and family. Without it, my freedoms would be drastically diminished.

With software we develop with, we have choices. We can use OSS. We can try to use open hardware. If we are working on sensitive things, we can use an airgapped system with vim.

When you practice these kinds of routines, they are not a burden. Actually, using vim instead of something like vscode increases productivity eventually. It does take a little bit of time.

When we couple our productivity with centrally hosted services, we greatly diminish our freedom to be productive on a wide range of problem areas. I don't say this to brag, it is to maximize all of our freedom.

In my view, most of us SHOULD be working on "sensitive" things. There is so, so much work to be done for the cause of freedom and liberty in software. We need to reserve that capability in us, we cannot let nameless people have an inside access to our expression.


A cellphone literally tracks your every move. If that's not a privacy concern then I don't know what is. Maybe a device with a microphone that's constantly on you. Oh no wait, that's also a cellphone.

I was born in the 70's, and I can tell you, you can survive just fine without a cellphone.

All of what you describe can be done on a desktop. But hey, if you want to compromise your privacy for some convenience, that's your choice.


Are you going to carry your desktop into the forest on a hammock and work? How about on a plane to other countries?

Will you carry your desktop in your car while living on the road? In the middle of forests and on top of mountains?

Will you work from a campsite with your desktop while not connected to the internet?

Can you have a meeting via your desktop from a rocky beach and no internet service?

A cellphone can't track what I type on my laptop, and it can't read encrypted comms my laptop makes to remote systems. I can put a cellphone in a distant location and use a portable, open source router with a VPN on the router, with encrypted, private DNS.

Not everyone lives inside a comfortable little box. There are all kinds of ways to do life.


Sure, if you are willing to compromise your privacy, which you clearly are.


In these companies, people are not permitted to carry their cellphones into the workspace.


I was talking about someone stealing my codebase.

As for the code being integrated into the LLM: others get the same benefit that you're getting, so I don't really see the issue.

So you can either develop everything on your own, or you can leverage LLMs, helping both yourself and others.


As a hobbyist developer with no formal training, I wish Copilot had a 'teaching' or "Senior Dev" mode, where I can play the role of the Junior Dev. I'd like it to pick up on what I'm trying to write, and then prompt me with questions or hints, but not straight up give me the code.

Or, if that's too Clippy-like annoying, let me prompt it when I'm stuck, and only then have it suggest hints or ask leading questions that guide me to a solution.

I agree, very exciting to see where all this goes.


The Github Copilot Labs extension has "codebrushes" that can transform and explain existing code instead of generating new code, but none of them only give "hints". Maybe one of the codebrushes can take a custom prompt.


You can create custom brushes, or open the "CoPilot Labs" panel and "explain" with a custom prompt.


One thing you might try with Copilot is to ask it to explain the code. It can often give insight, even on code that you yourself wrote a few minutes ago.


Exactly this. I've tried to work ChatGPT into my daily workflow, but you have to give it an excruciating level of detail to get something that remotely resembles real code I'd use, and even then you have to hold its hand to guide it in the correct direction, and still make some manual final touches at the end.

This is why I'm looking forward to Copilot X so much. It will hold much more context than the current implementation, and integrate the Chat interface that's so natural to us.


People have different preferences and habits. Having tried both models I much prefer having a conversation in one window and constructing my code from that in another. Although copilot is about to add some interesting features that may win me back.


How to overengineer with an LLM: don't state the requirements clearly, shove your pet patterns in first, treat following the slice/Redux/awareness-hook pattern as more important than having a working solution, never trust your developers to make decisions, and worry more about how it is built than about building a solution.

My way of working with an LLM is to start with a good, clear requirement, have the LLM propose a file organization, then query it for the contents of each file (just the code, no comments) and assemble a working prototype fast. Then you can iterate over the requirements and evolve from there.


Generally, I agree that approach works well. It’s going to perform better if it’s not trying to fulfill your team’s existing patterns. On the other hand, allowing lots of inconsistencies in style in your large code base seems like a quick way to create a hot mess. Chat prompts seem like a really difficult way to communicate code style and conventions though. A sibling comment to yours mentions that a copilot autocomplete seems like a much better pattern for working in an existing code base, and I tend to agree that’s much more promising. Read the existing code, and recommend small pieces as you type.


How often do you get working code that way? Unless it's something trivial that fits in its scope, I'd say that's going to produce garbage. I've seen it steer into garbage on longer prompt chains about a single class (of medium complexity), and I doubt it would work at project level. Mind sharing the projects?


I work only with closed-source codebases and use this approach for prototypes, but using the same example as the blog, I prompt: "The current system is an online whiteboard system. Tech stack: react, use some test framework, use konva for the canvas, propose a file organization, print the file layout tree (without explanations)." The trick is that for every chat the context is the requirement + the file system + the specific file, so you don't have the entire codebase in the context, only the current file. Also, use GPT-4; GPT-3 is not good enough.
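A tiny sketch of how that per-file context might be assembled (the requirement text and file names here are placeholders, not my real project):

  REQUIREMENT = ("The current system is an online whiteboard system. "
                 "Tech stack: react, konva for the canvas, with tests.")
  FILE_TREE = "src/components/Whiteboard.jsx\nsrc/state/boardSlice.js\nsrc/tests/Whiteboard.test.js"

  def build_prompt(target_file):
      # Every chat gets the requirement + the layout + one target file,
      # so the whole codebase never has to fit into the context window.
      return (f"{REQUIREMENT}\n\nFile layout:\n{FILE_TREE}\n\n"
              f"Write the full contents of {target_file}. Output only code, no comments.")

  print(build_prompt("src/components/Whiteboard.jsx"))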

My main point is that the blog post's final output is mocks, tests, awareness hooks and Redux, where an architect feels good seeing his patterns; with my approach you have a prototype online whiteboard system.


I feel like this is a bunch of ceremony and back and forth, and, considering GPT-4's speed, I feel like I would fly past this approach just using Copilot and coding.

I look forward to offloading these kinds of tasks to LLMs, but I'm not seeing the value right now. Using them feels slow and unsatisfying; you need to triple-check everything and specify everything relevant for context.

Also, maybe it's just me, but verbalizing requirements unambiguously can often be harder than writing the code itself. And it's not fun. If GPT-4 were as fast as GPT-3.5, it would probably be a completely different story.


The article stresses never to put anything that may be confidential into the prompt. Yet ChatGPT offers an opt-out from using your data for training.

For most purposes that seems to be sufficient, doesn't it? Or are there reasons not to trust OpenAI on this one?


I will never have full trust in an assertion unless (a) it's included in a contract that binds all parties, (b) the same contract includes a penalty for breaking the assertion that's severe enough to discourage it, and (c) I know the financial and other costs of litigation won't be severe for me.

In short, unless my large employer will likely win in punishing OpenAI should they break a promise, that promise is just aspirational marketing speak.

For data retention and usage, I'd also need a similar contractual agreement to tie the hands of any company that would acquire them in the future.


Copilot for individuals stores code snippets by default, according to their TOS. Sure, you can probably find a way to opt out of that somewhere as well, but you'd have to read the TOS for every plugin and service you use, find the opt-out links, and make sure you don't opt in again via some other route (such as ChatGPT proper rather than Copilot, or some other GitHub or VS Code plugin, service, button, or knob).


> Or are there reasons not to trust OpenAi on this one?

Yes, more related to general tech history and not a dig on OpenAI though.


There was a bug where the chat history of some users was visible to others.


From a GDPR or commercial confidentiality perspective, it doesn't matter what OpenAI says it will do with your data; you can't share it with them.

Let's say your doctor enters sensitive info about you, and despite having told OpenAI not to train on it, they use it anyway due to a bug. A year from now, ChatGPT is telling anyone and everyone about your sensitive info.

Would you exclusively blame ChatGPT?


> are there reasons not to trust OpenAi on this one?

Yes, the fact that they are closed, not open, for one. And that they switched from open to closed the moment it benefited them to do so.


Maybe this will get people to finally sit down and do some thinking, planning, pseudo-code, etc. before diving in and starting to code.


I guess this is neat but I’d rather write code myself.


I'd rather farm all my own food, build my own house, and teach my own kids, but I don't have infinite time each day.


The prompt was almost as much work as the code, and there was no way to write that prompt without a CS education and/or years of development experience.


I have found that this applies to many Crafts.

Cutting wood is easy. Simple really. Crafting an attractive and functional chair requires discipline. Designing it? Brilliance.


Presumably you have time to write your own code as a developer, since you're not being paid to be a farmer or carpenter?


If you are optimizing for an end goal rather than enjoying the process... What is that goal? Why does it matter?


Yes. I would rather write the code myself too. But it's a good idea to use it to explore solutions or alternate implementations


It's like talking to a student or an intern. Which is not bad normally because we are also educating them.


I feel like from an information theory perspective there is a lower bound on how little we can write to get a sufficiently specific spec for the AI to generate correct code.

This example seems like almost as much work as just writing the code myself. I think English is just too fuzzy; maybe eventually we will get a language tailored to AI that puts more specific limits on the meanings of words. But then how is it all that different from Python?


> a lower bound on how little we can write to get a sufficiently specific spec for the AI to generate correct code.

Interesting thought. I believe a lower bound on the number of bits is the negative log (base 2) of the probability of such code appearing "in the wild", and larger if the training set is biased and/or the model is not fully trained. For example, if a given snippet has roughly a one-in-a-million chance of appearing, you need at least ~20 bits of prompt to pin it down.


A useful approach, but this is a tiny green field project. I'm not so sure it would work in a large existing proprietary system, where you shouldn't describe too much of the "NDA protected context"...


To me, this is likely an area where we'll see future coders tested:

Interviewer: Here is a very specific project. And this part here is NDA covered. We have provided a context prompt with all the generals. Let's say you are new here and we need you effective today. Show us how you'll cover the last mile with the LLM by writing prompts that do not violate NDA but get the needed work done. Then whiteboard for your team a prompt schema & policy that you think will work for this project.

I.e. a creativity exercise at the very least. You want someone who can code _for a prompt, to solve coverage problems_, and this is still coding.

For now I think a lot of people will hoard this kind of prompt info/leverage-pattern stuff when they discover it. It's not about the individual prompts.


Is it me, or does this just create a bunch of extra steps and gratuitous complexity? These tools don't seem that efficient, nor do they make anything easier. I'm sorry to the enthusiasts here - I am usually excited about AI and a student of Computational Linguistics, but I think this emperor is naked.


I have been trying to find a use case for these LLMs, and I continue to keep an eye out in case someone figures out a way to use them that I find useful in my workflow. My only use for them so far is as an exploratory tool for tasks I'm not familiar with, such as when I have to work with programming languages I never use. For such things it's great, as not only do I not have to go digging through the documentation, I also don't have to then search the web for examples of how it's actually used.

This is taking into account that I have reduced the cost of using it as much as possible, since I do not have to switch to a browser tab, ask my question, wait for a reply, and then copy any useful text to my editor. I have it set up as a function call inside my REPL, with history saved to a local file in case I need it.

Even with this convenient way of using it, I notice that pretty much the only time I use it when working on my actual projects is to save me the trouble of doing a Google search for trivial things, such as looking up word definitions/synonyms for naming things, or anything else where I would expect to find the answer with just a bit of googling. I can quickly make my request, continue with whatever I was doing, and then return for my answer later.


What I want is a prompt that continuously copies whatever I'm doing, so I can ask it to complete the task.

For example, say I'm converting all identifiers in a file from lowercase to CamelCase. Then after doing like 3 of them, I can ask the LLM to take over and do the remainder.


I mean, that kind of task is more than easy to do today. You could probably just create a VS Code extension where you type "convert all identifiers in this file that match this pattern from lowercase to camel case" and it pipes that, along with the file, to the GPT API to do it instantly (without even needing to give it the first 3 examples).
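A minimal sketch of that idea, minus the VS Code plumbing, using the openai Python package as it looked in early 2023 (the model name and prompt wording are placeholders; it assumes OPENAI_API_KEY is set in the environment):

  import openai

  def transform_selection(selected_code, instruction):
      # Send the editor selection plus a plain-English instruction to the model
      # and return the rewritten code from the first choice.
      response = openai.ChatCompletion.create(
          model="gpt-4",
          messages=[
              {"role": "system", "content": "Return only code, with no explanations."},
              {"role": "user", "content": f"{instruction}\n\n{selected_code}"},
          ],
      )
      return response["choices"][0]["message"]["content"]

  print(transform_selection("user_name = get_user_name()",
                            "Convert all identifiers in this code from snake_case to CamelCase."))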


Sometimes just doing stuff takes less energy than thinking about how it can be automated.


Great example of how a GPT can reason on your behalf and dramatically improve your performance. For instance, it could watch for inconsistent approaches to design, or even continue a complex implementation you’ve started just from examining context signals.


For some reason, this reminds me of how we used to give instructions to Indian coders in the 90s and early 2000s. You would have to spell out everything. What you got back was nearly there, but some back-and-forth was involved.

This brings back some terrible memories.


The big difference is that you get the results immediately and iterations take minutes not days


Yes, you can get a ton more code that you have to check over with a fine-toothed comb in much less time! Is that a win?


And no time zone differences!


One initial reaction to the prompting style is how similar it is to a human-to-human interaction. For example, a team lead communicating requirements to a wider team composed of less experienced engineers may also follow this type of iterative exchange, continuing until he or she is satisfied that the team understands the work to be done and has the guide rails to be successful.

I recently heard a description about the way this technology will change technical work that resonated: we will become more like the movie director, and less like the actors.


> He's using a generic application example in here: one thing to be wary of when interacting with ChatGPT and the like is that we should never put anything that may be confidential into the prompt, as that would be a security risk. Business rules, any code from a real project - all these must not enter the interaction with ChatGPT.

Remember, when storing your business code on Github servers hosted by Microsoft, it is important to not place real code from a project into OpenAI servers hosted by Microsoft. That would be a security risk.


The hosting is not the issue. Github would have different security requirements for code hosted in a private repo for a paying org than OpenAI would for free users sending prompts to an LLM. It can and should be assumed anything you type into ChatGPT is being logged to be potentially read by a human.


I can’t help thinking: isn’t this way more work than just coding it myself? Anybody else have the same thought?


It depends how you use it. I've been using it to skip boilerplate coding and get straight to the meaty bits. It took me a few days to sketch out an application using ChatGPT to handle the boilerplate, including dependency management (python, poetry, etc.).

I've had to handle the specific pieces of implementation myself. Especially unit testing new pieces of code. When asked to generate unit tests, it does ok, but it doesn't get the spirit of the code (my intended purpose) and so I'm left filling in a bunch of blanks.


> It depends how you use it.

How about specifically how they use it in this article? It doesn't seem faster than writing it themselves?


This is interesting. As a developer I haven't thought about using GPT to give me a task list instead of code right away.

Right now it still seems hit and miss. The examples can be impressive, but they also have to be very generic, since GPT doesn't have access to your codebase.

The real game changer will be a few years from now when something like GPT is an addon in VSCode... and it will know about your entire codebase.

In fact, a wrapper around GPT could tell it about all the packages you use, the language, the framework, etc., so much of the prep work this article documents would be done in the background.

Then you could straight up ask the bot to give suggestions to refactor your code, or make wide-sweeping changes. Or ask it to upgrade, e.g., your Symfony PHP app from version X to version Y, including major upgrade changes... and even if it gets only 90% of the way, you can review everything in git.

LLMs will never be embodied, but we are, so it's like teaching a robot and having it do a lot of grunt work for us.

If only we didn't have to listen to all the hubris in Big Tech right now about the metaverse and robots taking over humanity /smh


Useful as a form of learning and experimentation. Not applicable at all, in my view, due to the lack of ownership of the generated code. There is no ability to copyright and protect the intellectual output of generative AI processes.

Even when your prompts are clearly the pseudocode that defines the scope of the generated response. Until this situation is legally cleared up, I will be very cautious about using LLMs outside rapid prototyping and the conceptual phase. Not to mention the madness of AutoGPT, or the more realistic approach of LangChain.

It is early in the game, and the hype train is moving faster than crypto and web3 combined.

I see a lot of AI startups introducing the same capabilities through OpenAI API and prompts, without consideration of prompt injection risk. So we will see who will survive.


What I would really love is if we had a broader linting tool built on this sort of tech that could go the other way.

So often we are halfway through refactoring the code from a bad pattern that has a proven track record of issues, to one that at least prevents the worst ravages of the old one. There are never any guarantees that you will get everyone on board for this. Someone will defect, and they will keep copying and pasting the old pattern and if they code faster than you then you never get to the end.

Give me a way to mark a bunch of code as 'the old way' and hook that information into autocomplete or even just a linter that runs at code review time.


I started a bit of an exploration around prompts and code a week or three back. I want to figure out the down/up-sides and create tools for myself around it.

So, for this project (a game), I decided "for fun" to try to not write any code myself, and avoid narrow prompts that would just feed me single functions for a very specific purpose. The LLM should be responsible for this, not me! It's pretty painful since I still have to debug and understand the potential garbage I was given and after understanding what is wrong, get rid of it, and change/add to the prompt to get new code. Very often completely new code[1]. Rinse and repeat until I have what I need.

The above is a contrived scenario, but it does give some interesting insights. A nice one is that since there are one or more prompts connected to all the code (and its commits), the intention of the code is very well documented in natural language. The commit history creates a rather nice story that I would not normally get in a repository.

Another thing is, getting an LLM (ChatGPT mostly) to fix a bug is really hit and miss and mostly miss for me. Say, a buggy piece comes from the LLM and I feel that this could almost be what I need. I feed that back in with a hint or two and it's very rare that it actually fixes something unless I am very very specific (again, needing to read/understand the intention of the solution). In many cases I, again, get completely new code back. This, more than once, forced my hand to "cheat" and do human changes or additions.

Due to the nature of the contrived scenario, the code quality is obviously suffering but I am looking forward to making the LLM refactor/clean things up eventually.

On occasion ChatGPT tells me it can't help me with my homework. Which is interesting in itself. They are actually trying (but failing) to prevent that. I am really curious how gimped their models will be going forward.

I've been programming for quite a long time. I've come to realize that I don't need to be programming in the traditional sense. What I like is creating. If that means I can massage an LLM to do a bit of grunt work, I'm good with that.

That said, it still often feels very much like programming, though.

[1] The completely new code issue can likely be alleviated by tweaking transformers settings

Edit: For the curious, the repo is here: https://github.com/romland/llemmings and an example of a commit from the other day: https://github.com/romland/llemmings/commit/466babf420f617dd... - I will push through and make it a playable game, after that, I'll see.


That is really interesting experiment! I have so many questions.

- do you feel like this could be a viable work model for real projects? I recognize it will most likely be more effective to balance LLM code with hand written code in the real world.

- some of your prompts are really long. Do you feel like the code you get out of the LLM is worth the effort you put in?

- given that the code returned is often wrong, do you feel like this could be feasible for someone who knows little to no code?

- it seems like you already know well all the technology behind what you are building (I.e. you know how to write a game in js). Do you think you could do this without already having that background knowledge?

- how many times do you have to refine a prompt before you get something that is worth committing?


I think it could be viable, even right now, with a big caveat: you will want to do some "human" fixes in the code (not just the glue between prompts). The downside of that is you might miss out on parts of the nice natural-language story in the commit history. But the upside is you will save a lot of time.

Down the line you will be able to (cheaply) have LLMs know about your entire code-base and at that point, it will definitely become a pretty good option.

On prompt-length, yeah, some of those prompts took a long time to craft. The longer I spend on a prompt, the more variations of the same code I have seen -- I probably get impatient and biased and home in on the exact solution I want to see instead of explaining myself better. When it's gone that far, it's probably not worth it. Very often I should probably also start over on the prompt as it probably can be described differently. That said, if it was in the real world and I was fine with going in and massaging the code fully, quite some time could be saved.

If you don't know how to code, I think it will be very hard. You would at the very least need a lot more patience. But on the flip side, you can ask for explanations of the code that is returned and I must actually say that that is often pretty good -- albeit very verbose in ChatGPT's case. I find it hard to throw a real conclusion out there, but I can say that domain knowledge will always help you. A lot.

I think if you know javascript, you could easily make a game even though you had never ever thought about making a game before. The nice thing about that is that you will probably not do any premature optimization at least :-)

All in all, some prompts were nailed down on the first try; the simple particle system was one such example. Some other prompts -- for instance the map generation with Perlin noise -- might take 50 attempts.

A lot of small decisions are helpful, such as deciding against any external dependencies. It's pretty dodgy to ask for code around something (e.g. some noise library) that you need to fit into your project. I decided pretty early that there should be no external dependencies at all and that all graphics would be procedurally generated. It has helped me, as I don't need to understand any libraries I have never used before.

Another note related to the above: an upside and downside of a high-ish temperature is that you get varying results. I think I should probably change my behaviour around that and possibly tweak it depending on how exact I feel my prompt is.

I often find myself wondering where the cap of today's LLMs is, even if we go in the direction of multiple models with a base that does the reasoning -- and I have to say I keep getting surprised. I think there is a good possibility that this will be the way some kinds of development are done. But, well, we'd need good local models for that if we work on projects that might be of a sensitive nature.

Related to amount of prompt attempts: I think the game has cost me around $6 in OpenAI fees so far.

One particularly irritating (time consuming) prompt was getting animated legs and feet: https://github.com/romland/llemmings/commit/e9852a353f89c217...


That's a beautiful readme, starred!

Out of curiosity, right now would you say you have saved time by (almost) exclusively prompting instead of typing the code up yourself? Do you see that trending in another direction as the project progresses?


It was far easier to get big chunks of work done in the beginning, but that is pretty much how it works for a human too (at least for me). The thing that limits you is the context-length limit of the LLM, so you have to be rather picky about what existing code you feed back in. With this then comes the issue of all the glue between the prompts, so I can see that the more polished things need to become, the more human intervention is needed -- this is a trend I already very much see.

If there is time saved, it is mostly because I don't fear some upcoming grunt work. Say, for instance, creating the "Builder" lemming. You know pretty much exactly how to do it but you know there will be a lot of one-off errors and subtle issues. It's easier to go at it by throwing together some prompt a bit half-heartedly and see where it goes.

On some prompts, several hours were spent, mostly reading and debugging outputs from the LLM. This is where it eventually gets a bit dubious -- I now know pretty much exactly how I want the code to look since I have seen so many variants. I might find myself massaging the prompt to narrow in on my exact solution instead of making the LLM "understand the problem".

Much of this is due to the contrived situation (human should write little code) -- in the real world you would just fix the code instead of the prompt and save a lot of time.

Thank you, by the way! I always find it scary to share links to projects! :-)


No worries, going to check out some of the commits when I get a bit more free time as well. The concept is intriguing!

The usefulness of LLMs for engineering things is very hard to gauge, and your project is going to be quite interesting as you progress. No doubt they help with writing new things, but I spend maybe ~15% of my time working on something new, vs maintenance and extensions. The more common activities are very infrequently demonstrated; either the usefulness diminishes as the context required grows, or they simply make for less exciting examples. Though someone in my org has brought up an LLM tool that tries to remedy bugs on the fly (at runtime), which sounds absolutely horrific to me...

It sounds similar to my experience with Copilot then. In small, self-contained bits of code -- much more common in new projects or microservices for example -- it can save a lot of cookie cutter work. Sometimes it will get me 80% of the way there, and I have to manually tweak it. Quite often it produces complete garbage that I ignore. All that to say, if I wasn't an SE, Copilot brings me no closer to tackling anything beyond hello world.

One big benefit though is with the simpler test cases. If I start them with a "GIVEN ... WHEN ... THEN ..." comment, the autocompletes for those can be terrific, requiring maybe some alterations to suit my taste. I get positive feedback in PRs and from people debugging the test cases too, because the intention behind them is clear without needing to guess the rationale for the test. Win win!
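A made-up example of the pattern: in practice only the comment and the test name are typed by hand, and Copilot tends to propose the body (the Cart class here exists only to make the snippet self-contained and runnable):

  from dataclasses import dataclass, field

  @dataclass
  class Cart:
      prices: list = field(default_factory=list)
      discount: float = 0.0
      def apply_discount(self, fraction):
          self.discount = fraction
      def total(self):
          return sum(self.prices) * (1 - self.discount)

  # GIVEN a cart with two items
  # WHEN a 25% discount is applied
  # THEN the total reflects the discounted price
  def test_discount_reduces_total():
      cart = Cart(prices=[10.0, 30.0])
      cart.apply_discount(0.25)
      assert cart.total() == 30.0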


Just curious, you’re using which version?


I have experimented quite a bit with various flavours of LLaMa, but have had little success in actually getting not-narrow outputs out of them.

Most of the code in there now is generated by gpt-3.5-turbo. Some commits are by GPT-4, and that is mostly due to context length limitations. I have tried to put which LLM was used in every non-human commit, but I might have missed it in some.


If you look at what the prompter had to know in order to get a useful output you can see how far away we are from replacing that individual with a business stakeholder.

That’s why I view these tools as “productivity enhancements” rather than a straight replacement of a job. In some cases maybe, but not for coders just yet…

I think the most underrated and useful part of this process is the ability to get going.

For me the starting energy of a project is the thing that blocks me. With ChatGPT, it's a simple prompt to get the conversation going. Once in motion, I can put the puzzle pieces together while ChatGPT helps me keep momentum.


Asking an LLM to write complex code can approach the effort of writing the code yourself. Having it plan things out can, however, kick-start a nice direction. LLMs are great for code that is a single, clear function.


do people enjoy working this way? wasting time verbalizing your thoughts, stating the obvious, wordsmithing to get the thing to "understand" what you _actually_ want?


Sometimes I don't know what I want, or I don't know if the way I want to do it is possible. For these paths, using ChatGPT is very useful. So far my use cases are 'taking care of boilerplate' and 'finding ways to do things'.


Public service announcement that myself and others are actively trying to poison the training data used for code generation systems. https://codegencodepoisoningcontest.cargo.site/

See previous discussion here: https://news.ycombinator.com/item?id=35545442


If an AI could write your tests, doesn’t that suggest the tests are just checking implementation details? Unless you describe your business rules in detail... as a test, for example! But then how could the AI write those rules as tests?

Perhaps you could automate the boring parts I guess. But maybe that suggests we are working with a poor abstraction that doesn’t allow us to be as terse and precise as we want to be?


This got me wondering about best techniques for integrating LLM code assistants into day-to-day software development, and hence Ask HN: What is your GitHub Copilot (code LLM assistant) workflow?

Please share your experience here: https://news.ycombinator.com/item?id=35613576

I'd like to learn what is working and useful.


I haven't used ChatGPT myself, so it's hard for me to deduce from the article by itself, but what is the advantage supposed to be here? At first sight it doesn't appear to be faster or require less knowledge than writing it yourself does - is that wrong, or is there another advantage that I'm missing?


A bit off topic, but this article made me think of a weird future.

LLMs writing ad-hoc code, handling message passing, and self-contained LLM functions in software systems, ushering in a whole era of non-deterministic computing. I thought our infrastructure was shaky now...


That's refinement-style programming from a novel angle, but still clearly refinement-style.


Last month I had the task of writing lots of different regexes to extract pieces of information. I tried ChatGPT for many of them, but I found the results really weak. Somehow it's not able to generate even a simple regex; there's always something wrong.


Is there a tool that lets you do this within a text editor (for instance VS Code)? Using a selection instead of copy-pasting, having the LLM store its output directly in local files, maybe giving it access to a shell to run the tests on its own?


If there isn't a tool currently...then give it time, and eventually there will be tools similar to what you described. I'd guess they'll be called something like prompt editors (like text editors, etc.). ...Or, maybe they'll be called chatditors...no, no, prompt editors is better. ;-)


Copilot X will include that capability.


Isn’t the point of code to express what we want in a succinct and expressive way?

If we need all this software to help us, maybe we should look at the languages we’re using and make better, more intuitive ones.


The time spent on the prompt and the really not sophisticated outcome don’t balance each other out, in my opinion.


I've tried using ChatGPT for writing Vitest tests, and it can't do it, full stop.

If you look at the end, it parroted out some tests for jest. True, the APIs are mostly compatible and you can probably change that to Vitest with a couple of lines of code changed, but for more advanced tests, that won't necessarily work.

Really disappointed to see this so highly upvoted, when it's pure garbage


That library doesn’t even appear to have a stable release yet, and was at v0.0.x as of a year or so ago… you also may be using ChatGPT 3.5, which may predate this library. As a dev with 15 years of experience I haven’t even switched over from jest (but plan to)… all this to say, maybe we can give the bot some slack here. It should be possible to include vitest docs and examples in your prompts to teach it in context. Did you try that?


Sure, I realize it's unsuccessful at using vitest because it's (relatively) new.

I'm just saying, this was a really telling example of how to use it for prompting.

A very large chunk of the tools I use in Javascript-land are "too new" for ChatGPT to work with properly.

Giving context unfortunately doesn't really work as ChatGPT usually prioritizes what it's absorbed through the corpus over anything you tell it.

To be clear, it does fine with new information if the things you ask it for don't match token sequences it's already been trained on. So if you give it a fictional library and ask it to perform some task with it that doesn't look too much like what it might do with another library that accomplishes a similar thing with a similar API, it will actually use the custom code more successfully.

But for Vitest, it can't accept enough of the docs you might provide for it to be useful to you (though admittedly, sometimes it will show how to do something with jest that at least makes finding the right thing in vitest easier).

By the way, if you are planning to switch over in the future, the path for doing that is seemingly well documented by vitest and seems to be pretty straightforward as well, though I haven't meaningfully used Jest for comparison

edit: to be clear, I'm very impressed with ChatGPT's capabilities, and I think there are good examples of prompting where it does meaningful work in tandem with the human driver exercising their own judgment.

This was an example of a person asking it for things while not pointing out its limitations, which downplays the extent to which one needs to exercise one's judgment when using it. If they failed to point out the things ChatGPT got wrong which I know about, why would I trust that the things I can't judge are accurate?


This is an amazing demonstration, but I'm worried that when this goes mainstream, we'll inherit a ton of baggage from today's programming. Specifically:

* The tests are written in BDD style "it('should xyz')", which programmers do in code like this for convenience. But if we're automating their creation, then actual human-readable Cucumber clauses would be more useful. Maybe the tests can be transpiled. This isn't the AI's fault, but more of a symptom of how the original spirit of BDD as a means for nonprogrammers to test business logic seems to have been lost.

* React hooks and Redux syntax are somewhat contrived/derivative. The underlying concepts like functional reactive programming and reducers are great, but the syntax is often repetitive or verbose, with a lot of boilerplate to accomplish things that might be one-liners in other languages/frameworks. This is more of a critique of the state of web programming than of the AI's performance.

* MVVM is a fine pattern, but at the end of the day, it's an awful lot of handwaving to accomplish limited functionality. What do I mean by that? Mainly that I question whether the frontend needs models, routes, controllers (which I realize are MVC), etc. I mourn that we lost the idempotent #nocode HTML of the 90s and are back to manually writing app interfaces by hand in Javascript (like we did for native desktop apps in the C++ OOP days) when custom elements/components would have been so much easier. HTMX combined with some kind of distributed serverless lambda functions (that are actually as simple as they should be) would reduce pages of code to a WYSIWYG document that nonprogrammers could edit.

What I'm really getting at is that I envisioned programming going a different direction back in the late 90s. We got GPUs/TensorFlow and Docker and WebAssembly and Rust etc etc etc. And these things are all fine, but they're contrived/derivative too. More formal systems might look like multicore/multimemory transputers (or Lisp machines), native virtual machines with full sandboxing built in so anything can run anywhere, immutable and auto-parallelized languages like HigherOrderCO/HVM or true vector processing with GNU Octave (MATLAB) so that we don't have to manually manage vertex buffers or free memory, etc.

I've had architectures in mind for better hardware and programming languages for about 25 years (that's why I got my computer engineering degree) but I will simply never have time to implement them. All I do is work and cope. I just keep watching as everyone reinvents the same imperative programming wheel over and over again. And honestly it's gone on so long that I almost don't even care anymore. It feels more appealing in middle age to maybe just go be a hermit, get out of tech. I've always known that someday I'd have to choose between programming and my life.

Anyway, now that I'm way too old to begin the training, I wonder if AI might help to rapidly prototype truly innovative tools. Maybe more like J.A.R.V.I.S., where it's just on all of the time and can iterate on ideas at a superhuman rate to assist humans in their self-actualization.

Then again, once we have that, it becomes trivial to implement the stuff that I rant about. Maybe we only have about 5-10 years until all of the problems are solved. I mean all of them, everywhere, in physics/chemistry/biology/etc. Rather than automating creative acts and play as AI is doing now. If the Singularity arrives in 2030 instead of 2040, that also seems like a strong incentive to go be a hermit.

Does any of this resonate with anyone? That somehow everything has gone terribly wrong, but it's more of a hiccup than a crisis? That maybe the most impactful thing that any of us can do is.. wait for things to get better?


This is deeply resonant with me for the following reasons:

1) Age

2) BDD-style or what I call a madlib proxy for playing cucumber on TV. Not a fan having used it in an RoR context I can only call hipster-engineering, not what DHH described.

3) I just had the discussion on redux vs. datomic vs. riak with friends yesterday.

4) Ditto the conversation on MVVM and the implied constraint complexity of putting nodejs and chromium in the same deployment package and calling it electron while carrying on how simple it is relative to... a world where everything is actually native all the way down?

5) Me too on the CASE era.

6) Cue Donald Knuth on literate programming. One thing that cucumber is not, but I think taking another iteration at literate programming in light of GPT or LLMs is a good idea, since Knuth is never wrong, just 50 years ahead of his time. But we'd need a collaboration of human-computer agents patterned on a sensemaking protocol that can resolve subjective truth by consensus of man and machine. How else could you possibly resolve the fact that the SOTA lies to me on a daily basis while defending itself and its lack of veracity with force, in what can only be seen as emulating the culture of one's parents.

7) Yes, AI should help with the iterations. Those short sketch-to-demo cycles we used to do at the design studio, sketch on Monday and demo on Friday, should be much easier today, going from breakfast sketch to dinner demo, but I don't think they are. The tooling is radically better, but that better has come at the cost of complexity and going sideways, neither of which is being fully felt and accounted for reflectively, i.e. they're not how you get to typing less and having the tools do the work, because when they break, the debugging is mind-crushing.

8) I think the thing that's missing in the trivial part is that it's not actually trivial, but particularly because the software is the message and that insight stems from the fact that software has emergent properties such as extensibility, composability, and a resultant rate of change that make it very difficult to compare from decade to decade because software's fundamental disequilibrium stems from the fact that the full stack is in constant flux from a mad hatter's pop culture where we never sing the same song twice. There's value in theme and variations if it can be modeled as improvisational human-computer design pairing rather than yet another orchestration. Joe Beda was as right about improvisation as Knuth is about the art of computer programming.

9) I guess the t-shirt is: I'm not waiting...

10) In the immortal words of Raymond Loewy: Never leave well enough alone.

If there's a set of artifacts in software that achieve what I hope for with AI, it's somewhere between Bret Victor and https://iolanguage.org/



