
I just tell GPT to return complete code, and add that if any section is omitted from the code it returns I will simply re-prompt it, so there's no point in being lazy: that would only result in more total work being performed. Haven't had it fail yet.
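For example, a minimal sketch of that kind of prompt using the OpenAI Python SDK (the exact wording is illustrative, not anyone's verbatim prompt):

    # Illustrative only: the anti-laziness wording below is a paraphrase,
    # and this assumes the OpenAI Python SDK (v1).
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    system = (
        "Always return the complete code. If any section is omitted or "
        "replaced with a placeholder, the user will simply re-prompt you, "
        "so abbreviating the answer only creates more total work."
    )

    response = client.chat.completions.create(
        model="gpt-4-turbo",
        messages=[
            {"role": "system", "content": system},
            {"role": "user", "content": "Refactor this file and return the full source: ..."},
        ],
    )
    print(response.choices[0].message.content)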


I wonder if there is a hard-coded prompt somewhere nudging the model to be "lazy" by default, to save money on inference, or something like that. Maybe that's not how it works?

When you ask it to write the complete code, it just ignores what it was originally told and does what you want.


It's not a prompt thing; they've aligned it to be lazy. The short-form article style and ~1000-word average length are almost certainly from RLHF and internal question-answering fine-tuning datasets. The extreme laziness (stuff like "as a large language model, I have not been built with the capabilities for debugging", or "I don't know how to convert that JSON document to YAML") is pretty rare, and seems to be a statistical anomaly due to inherent variation in the model's inference more than anything else.


IIRC they did amend their prompt to tell it not to quote long books/articles/recipes verbatim for copyright reasons, no matter how much the user asks, and that probably doesn't help.


“If you’re asked for a summary longer than 100 words, generate an 80 wire word summary” or words to that effect.


Let's save this thread for posterity, because it's a very nice and ironic example of actual humans hallucinating stuff in much the same way that ChatGPT gets accused of all the time :)

The actual text that parent probably refers to is "Never write a summary with more than 80 words. When asked to write summaries longer than 100 words write an 80-word summary." [1]

Where did the word "wire" enter the discussion? I don't really trust these leaked prompts to be reliable though. Just enjoying the way history is unfolding.

[1] https://news.ycombinator.com/item?id=39289350


The system prompts are reliable and not "leaked". It's not leaking if you just ask and it answers. It's not trying to hide it.


I could simply reply with "The system prompts are not reliable".

Several people in the original thread have tried to replicate the prompts, and the results differ in wording, so it may well be hallucinating a bit.

If you just ask for the system prompt, ChatGPT does not respond with it. You have to trick it (albeit with minimal tricks) to get it to output similar text.


100% this. I’ve been party to RLHF jobs before and the instructions nearly always state to prioritize conciseness in the model response.

In aggregate, this is how you wind up with stub functions and narrative descriptions rather than full working implementations. The RLHF is optimizing for correctness within some constrained token count.


It's probably just a result of the training data. I bet it's not explicitly "trained" to reply with 400 lines of code for a complete file, but it is trained to return a few dozen lines of a single method.


I mean, of course I tried just asking GPT to not be lazy and write all the code. I quantitatively assessed many versions of that approach and found it didn't help.

I implemented and evaluated a large number of both simple and non-trivial approaches to solving the coding laziness problem. Here's the relevant paragraph from the article I linked above:

Aider’s new unified diff editing format outperforms other solutions I evaluated by a wide margin. I explored many other approaches including: prompts about being tireless and diligent, OpenAI’s function/tool calling capabilities, numerous variations on aider’s existing editing formats, line number based formats and other diff-like formats. The results shared here reflect an extensive investigation and benchmark evaluations of many approaches.
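For anyone unfamiliar with the format: a unified diff is the same hunk-based format produced by "diff -u" or "git diff". A minimal Python sketch (not aider's code) just to show the shape of what the model is asked to emit:

    # Not aider's implementation; just a small illustration of the unified
    # diff format referred to above, using only the Python standard library.
    import difflib

    before = ["def greet(name):\n", "    print('hi')\n"]
    after  = ["def greet(name):\n", "    print(f'hi, {name}')\n"]

    diff = difflib.unified_diff(before, after, fromfile="greet.py", tofile="greet.py")
    print("".join(diff))
    # --- greet.py
    # +++ greet.py
    # @@ -1,2 +1,2 @@
    #  def greet(name):
    # -    print('hi')
    # +    print(f'hi, {name}')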


Did you try telling it that being lazy is futile, in the manner I described? That is a major improvement over just telling it to return complete code. I've gotten ChatGPT to spit out >1k lines of complete code with that; using just "return complete code" will cause it to try to find ways to answer a subset of the question "completely" to appease its alignment.



