
Learn how to think ontologically and break down your requests: first figure out what you're TRULY looking for, and then work out which parts would need to be defined in order to build that system -- that "whole". Here are some guides:

1.) https://platform.openai.com/docs/guides/prompt-engineering
2.) https://www.promptingguide.ai/




I’ve received this /exact/ same unhelpful response multiple times in other threads (from different users even; am I talking to deterministic bots here?), so I’ll do the same by offering the response I gave others:

“Since I’m dealing with models rather than other engineers should I expect the process of breaking down the problem to be dramatically different from that of writing design documents or API specs? I rarely have difficulty prompting (or creating useful system prompts for) models when chatting or doing RAG work with plain English docs but once I try to get coherent code from a model things fall apart pretty quickly.”

Said another way, I long ago learned as an engineer how to do the things you’re suggesting (they are skills I’ve used and evolved over more than twenty years as a professional software engineer) but, in my experience, those same skills do not seem to apply when trying to do non-trivial code-generation tasks for Java/Scala/Python projects with an LLM.

I’ve tried prompting ’em with my design documentation and API specs. I’ve tried prompting ’em with a pared-down version of my docs/specs in order to be more succinct. I’ve tried expanding my docs/specs to be more concrete and detailed. I’ve tried very short prompts. I’ve tried very detailed and lengthy prompts. I’ve tried tweaking system prompts. I’ve tried starting with prompts that limit the scope of the project then expanding from there. I’ve tried uploading the docs/specs so that the models can reference them later. I’ve tried giving ‘em access to entire repositories. I’ve tried so many things, all to no avail. The best solution I’ve thus far found in these threads is to just try to fit the entirety of a project within the limits of the context window and/or to just keep my whole project in a few short files; that may be sufficient for small projects but it’s neither possible nor reasonable given the size and complexity of the projects with which I work.

As I’ve said elsewhere, I dearly /want/ these things to work in my environment and for my use-cases as capably as they do in/for yours—this stuff is really interesting and I enjoy learning how to do new things—but after reading all the comments in this thread and others I don’t think the needs of my environment are supported by these models. Maybe they will be someday. I’ll keep playing with things, but as of right now I see a significant impedance mismatch between the confidence others have in the models’ ability to do complex coding tasks and the kinds of tasks I’ve actually seen demonstrated here and elsewhere.


We've already talked, but let's tackle what appears to be a feeling that you haven't gotten real answers.

Truth is, this has been a learning process for us all with these tools, but it needs to be understood -- especially going in -- that these models excel at translation tasks and constrained problem spaces but can struggle with generating cohesive, large-scale code without specific hand-holding.

This is generally what I do:

1. Start with the "Whole" Picture: Models often work best when they know the final goal and the prompter has worked backwards from it. Think ontologically: define the problem as if you’re describing it to a junior dev colleague who only understands outcomes, not methods. Instead of just prompting with specs, explain the end-state you want (down to small details like error handling or specific libraries). If there is ANYTHING you already know the end-state must include -- a particular method, a specific library -- write it out clearly.

2. Break Down the Process: Models handle complexity better if it's broken down into micro-tasks. Instead of expecting it to design an entire feature, ask for components step-by-step, integrating each output with the rest manually. There is a very decent chance that you have to do this across multiple new chats; after 3-5 iterations in, the AI will most likely crash and burn. At that point, you open a new chat, paste in the whole working codebase, and pick up from where you left off in the last chat. You have to do this A LOT (there's a rough sketch of this loop right after the list).

3. Iterative Refinement: When the model generates code, go over it closely. Check for errors, then use targeted prompts to fix specific issues rather than requesting whole rewrites. Point out exact issues and ask for specific fixes; this prevents the model from “looping” through similar incorrect solutions.
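
To make that concrete, here is a minimal sketch of the micro-task-plus-reseed loop, assuming the OpenAI Python SDK; the model name, snapshot file, and task list are made up for illustration, and the reseed interval is arbitrary:

    from openai import OpenAI

    client = OpenAI()   # reads OPENAI_API_KEY from the environment
    MODEL = "gpt-4o"    # illustrative; any chat model works the same way

    def fresh_chat(codebase: str) -> list[dict]:
        """Start a new conversation seeded with the current working code."""
        return [
            {"role": "system",
             "content": "You are extending an existing codebase. "
                        "Work on one component at a time."},
            {"role": "user",
             "content": "Current working codebase:\n\n" + codebase},
        ]

    def run_micro_task(history: list[dict], task: str) -> str:
        """Ask for one component, keeping the running conversation."""
        history.append({"role": "user", "content": task})
        reply = client.chat.completions.create(model=MODEL, messages=history)
        answer = reply.choices[0].message.content
        history.append({"role": "assistant", "content": answer})
        return answer

    # One component per request; reseed a fresh chat every few turns,
    # since quality tends to degrade after 3-5 iterations.
    tasks = [
        "Add a retry wrapper around the HTTP client. Full code, no omissions.",
        "Add structured logging to the retry wrapper you just wrote.",
    ]
    history = fresh_chat(open("project_snapshot.txt").read())
    for i, task in enumerate(tasks):
        if i and i % 3 == 0:  # arbitrary reseed interval; tune to taste
            # re-read the snapshot you have been updating by hand and start over
            history = fresh_chat(open("project_snapshot.txt").read())
        print(run_micro_task(history, task))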

Some Hacks I Use As Well:

1. Contextual Repetition: Reinforce key components (e.g., function structure, file organization) to avoid losing them in longer prompts.

2. Use “As if” Phrasing: Prompt the model to act “as if” it’s coding for a hypothetical person (e.g., a junior dev). It’s surprisingly effective at generating more thoughtful code with this type of frame.

3. Ask for Questions: Have the model ask you clarifying questions if it’s “unsure.” This can uncover key details you may not have thought to include.

4. Remind It What It Is Doing: Sounds counter-productive, but almost all of my code chats end with a description of what exactly I expect from the AI, refined over the years in response to the various stunts and "shortcuts" it has pulled on me. I generally say "Write the code in full with no omissions or code comment-blocks or 'GO-HERE' substitutions" (directly because the AI has pulled "/rest of code goes here/" on me several times), and "write the code in multiple answers if you must, pausing at the generic character limit and resuming when I say 'continue' in the next message" (because I've had "errors" from code generation in the past when the chat reply processing timed out).
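
For what it's worth, here is one way to bake hacks 3 and 4 into a reusable suffix appended to every coding prompt (Python, purely illustrative; the exact wording is an example, not a magic incantation):

    # Illustrative only: bake the clarifying-questions hack (3) and the
    # "remind it what it is doing" closer (4) into a suffix appended to
    # every coding prompt.
    CLOSING_REMINDER = """
    Before writing any code, ask me clarifying questions if you are unsure.
    Write the code in full with no omissions or code comment-blocks or
    'GO-HERE' substitutions. If the answer is too long, split it across
    multiple replies, pausing at the character limit and resuming when I
    say 'continue' in the next message.
    """.strip()

    def build_prompt(task: str) -> str:
        """Append the standing reminder to whatever task you're asking for."""
        return f"{task}\n\n{CLOSING_REMINDER}"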

It's a labor of love and these are things you learn over time; it won't happen if you don't put the work in.

-------------------------

I wrote all of this haphazardly in a Google Doc. GPT-4 organized it for me cleanly.


> 3. Iterative Refinement

Beware of trying to get the LLM to output exactly the code you want. You get points for checking code into git and sending PRs, not for the tokens the LLM outputs. If it's being stupid and going in circles, or you know from experience that the particular LLM you're using will be (they vary greatly in quality), you can just copy the code out (if you're not using some sort of AI IDE), fix it, then paste that in and/or commit it.

Some may ask: if you have to do that, then why use an LLM in the first place? It's good at taking small/medium conceptual tasks and breaking them down, and it's also a faster typist than me. Even though I have to polish its output, I find it easier to get things done because I can focus more on the higher-level (customer) issues while the LLM gets started on the lower-level details of implementing/fixing things.


Exactly! I should have worded that section more like your comment here, but you hit the nail on the head.


Thank you! This is the kind of information for which I’ve been searching.

That said, I feel like there’s a mutual-exclusivity problem between ‘Start with the "Whole" Picture’ and ‘Break Down the Process’.

For example, how does this from your first suggestion:

> explain the end-state you want (down to small details like error handling or specific libraries). If there is ANYTHING you already know the end-state must include -- a particular method, a specific library -- write it out clearly.

not contradict this from your second suggestion:

> Instead of expecting it to design an entire feature, ask for components step-by-step

Additionally, you said:

> There is a very decent chance that you have to do this across multiple new chats; after 3-5 iterations in, the AI will most likely crash and burn. At that point, you open a new chat, paste in the whole working codebase, and pick up from where you left off in the last chat. You have to do this A LOT.

But IME, by the time the model chokes on one chat, the codebase is already large enough that pasting the whole thing into another chat typically results in my hitting context-window limits. Perhaps, in the kinds of projects I typically work on, a good RAG tool would offer better results?
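
Something like this naive sketch is what I have in mind: pull in only the files that look relevant to the task so the prompt stays inside a budget (the paths, keywords, and character limit here are made up; I assume real tools do this far more carefully):

    from pathlib import Path

    MAX_CHARS = 60_000  # rough stand-in for a context budget

    def relevant_files(repo: Path, keywords: list[str]) -> str:
        """Naive retrieval: keep only files mentioning the task's keywords,
        stopping before the (made-up) character budget is blown."""
        picked, used = [], 0
        for path in sorted(repo.rglob("*.py")):
            text = path.read_text(errors="ignore")
            if any(k.lower() in text.lower() for k in keywords):
                chunk = f"# file: {path}\n{text}\n"
                if used + len(chunk) > MAX_CHARS:
                    break
                picked.append(chunk)
                used += len(chunk)
        return "\n".join(picked)

    # Hypothetical project and task keywords, purely for illustration.
    context = relevant_files(Path("my_project"), ["invoice", "discount"])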

To be clear, right now I’m only discussing my difficulties with the chatbots offered by the model providers—which, for me, is mostly Claude but also a bit of ChatGPT; my experience with Copilot is outdated so it probably deserves another look, and I’ve not yet tried some of the third-party, code-centric apps like aider or cursor that have previously been suggested, though I will soon.

As for your recommended hacks, these look to be helpful; thank you! The only part I find odd is your inclusion of “Write the code in full with no omissions or code comment-blocks or 'GO-HERE' substitutions”; I myself feel like I get far better results when I 1) ask the model to write full code for the methods that are likely to be the kind of generic CS logic a junior would know, 2) have it write stubs for the business logic, and then 3) implement the more complex business logic myself. IOW—and IME—they’re really good at writing boilerplate and generating or reasoning about junior-level CS logic. That’s indeed helpful to me, but it’s a far cry from the kind of “ChatGPT can write entire apps with minimal effort” hype I keep seeing, and it’s only marginally better, IME at least, than what I’ve been able to do with the inline-completion and automatic boilerplate features that have been included in the IDEs I’ve used for over a decade.
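
To illustrate that split with a made-up Python example (not my actual code): the generic helper gets written in full, while the business logic stays a stub for me to fill in:

    def chunk(items: list, size: int) -> list[list]:
        """Generic junior-level helper: the kind of thing the model nails."""
        return [items[i:i + size] for i in range(0, len(items), size)]

    def apply_discount_policy(order: dict) -> dict:
        """Business logic: requested as a stub, implemented by hand later."""
        raise NotImplementedError("domain rules go here")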

> It's a labor of love and these are things you learn over time; it won't happen if you don't put the work in.

Indeed. I do love playing with this stuff and learning more. Thank you again for sharing your knowledge!

> I wrote all of this haphazardly in a Google Doc. GPT-4 organized it for me cleanly.

I am regularly impressed at how well these models behave when asked to summarize a document or even when asked to expand a set of my notes into something more coherent; it’s truly remarkable!



