
GPT-5 simply sucks at some things. The very first thing I asked it to do was to give me an image of a knife with a spiral Damascus pattern; it gave me an image of such a knife, but with two handles at a right angle: https://chatgpt.com/share/689506a7-ada0-8012-a88f-fa5aa03474...

Then I asked it to give me the same image but with only one handle; as a result, it removed one of the pins from a handle, but the knife still had two handles.

It's not surprising that a new version of such a versatile tool has edge cases where it's worse than a previous version (though if it failed at the very first task I gave it, I wonder how edge that case really was). Which is why you shouldn't just switch everybody over without a grace period or any choice.

The old ChatGPT didn't have a problem with that prompt.

For something so complicated it doesn't surprise me that a major new version has some worse behaviors, which is why I wouldn't deprecate all the old models so quickly.





The image model (GPT-Image-1) hasn’t changed

Yep, GPT-5 doesn't output images: https://platform.openai.com/docs/models/gpt-5

Then why does it produce different output?

It works as a tool. The main model (GPT-4o or GPT-5 or o3 or whatever) composes a prompt and passes that to the image model.

This means different top level models will get different results.
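The flow described above can be sketched in a few lines. This is a toy illustration, not OpenAI's actual internals: the function names and the prompt-rewriting step are hypothetical stand-ins for the chat model and for `gpt-image-1`.

```python
# Toy sketch of the tool-calling arrangement: the top-level chat model
# composes a text prompt and passes it to a separate image model.
# All names here are hypothetical; the real internals are not public.

def image_gen(prompt: str) -> str:
    """Stand-in for the separate image model (gpt-image-1)."""
    return f"<image rendered from prompt: {prompt!r}>"

def top_level_model(user_message: str) -> str:
    """Stand-in for the chat model (GPT-4o, GPT-5, o3, ...).

    It never renders pixels itself; it writes a text prompt and hands
    that to the image tool. A different top-level model will write a
    different prompt, hence a different image from the same request.
    """
    rewritten = f"A photo of {user_message}, studio lighting"  # model-dependent rewrite
    return image_gen(rewritten)

print(top_level_model("a knife with a spiral Damascus pattern"))
```

The point of the sketch is the indirection: swapping `top_level_model` changes the prompt that reaches `image_gen`, even though `image_gen` itself never changed.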

You can ask the model to tell you the prompt that it used, and it will answer, but there is no way of being 100% sure it is telling you the truth!

My hunch is that it is telling the truth though, because models are generally very good at repeating text from earlier in their context.


Source for this? My understanding was that this was true for DALL·E 3, but that the autoregressive image generation just takes in the entire chat context, with no hidden prompt.

Look at the leaked system prompts and you'll see the tool definition used for image generation.

I stand corrected! Thanks.

You know that unless you control for seed and temperature, you always get a different output for the same prompts even with the model unchanged... right?

Somehow I copied your prompt and got a knife with a single handle on the first try: https://chatgpt.com/s/m_689647439a848191b69aab3ebd9bc56c

Edit: ChatGPT translated the prompt from English to Portuguese when I copied the share link.


I think that is one of the most frustrating issues I currently face when using LLMs. One can send the same prompt in two separate chats and receive two drastically different responses.

It is frustrating that it’ll still give a bad response sometimes, but I consider the variation in responses a feature. If it’s going down the wrong path, it’s nice to be able to roll the dice again and get it back on track.

I’ve noticed inconsistencies like this. Everyone said that it couldn’t count the b’s in blueberry, but it worked for me the first time, so I thought it was just haters; then I played with a few other variations and found flaws. (Famously, it didn’t get the r’s in strawberry.)

I guess we know it’s non-deterministic, but there must be some pretty basic randomization in there somewhere, maybe around tuning its creativity?


Temperature is a very basic concept that makes LLMs work as well as they do in the first place. That's just how it works, and it's how it's always been supposed to work.
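For anyone unfamiliar, temperature is exactly the "basic randomization" asked about above: the model's next-token scores are divided by a temperature before softmax sampling, so low temperature approaches greedy argmax while higher temperature flattens the distribution and gives varied output for the same prompt. A minimal self-contained sketch (toy logits, not a real model):

```python
import math
import random

def sample_with_temperature(logits, temperature, rng):
    """Softmax sampling with temperature.

    T == 0 is treated as greedy argmax (deterministic); higher T
    flattens the distribution, producing more varied outputs.
    """
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [l / temperature for l in logits]
    m = max(scaled)                                # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    r = rng.random()                               # inverse-CDF sampling
    cum = 0.0
    for i, p in enumerate(probs):
        cum += p
        if r < cum:
            return i
    return len(probs) - 1

logits = [2.0, 1.5, 0.2]  # toy next-token scores
rng = random.Random(0)
greedy = [sample_with_temperature(logits, 0, rng) for _ in range(5)]
warm = [sample_with_temperature(logits, 1.0, rng) for _ in range(5)]
print(greedy)  # always token 0
print(warm)    # a mix of tokens: same "prompt", different outputs
```

This is why two chats with the same prompt diverge even when the model weights are identical, and why controlling seed and temperature (where the API allows it) is needed for reproducible comparisons.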

To ensure that GPT-5 funnels the image to the SOTA model `gpt-image-1`, click the plus sign and select "Create Image". Some inherent prompt enrichment is still likely happening, since GPT-5 is using `gpt-image-1` as a tool. Outside of using the API, I'm not sure there is a good way to prevent this.

Prompt: "A photo of a kitchen knife with the classic Damascus spiral metallic pattern on the blade itself, studio photography"

Image: https://imgur.com/a/Qe6VKrd


Yes, it sucks

But GPT-4 would have the same problems, since it uses the same image model


The image model is literally the same model

So there may be something weird going on with images in GPT-5, which OpenAI avoided any discussion about in the livestream. The artist for SMBC noted that GPT-5 was better at plagiarizing his style: https://bsky.app/profile/zachweinersmith.bsky.social/post/3l...

However, there have been no updates to the underlying image model (gpt-image-1). But due to the autoregressive nature of the image generation where GPT generates tokens which are then decoded by the image model (in contrast to diffusion models), it is possible for an update to the base LLM token generator to incorporate new images as training data without having to train the downstream image model on those images.
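The split described above can be made concrete with a toy sketch: an autoregressive generator emits discrete image tokens, and a frozen decoder maps tokens to pixels. Everything here is hypothetical stand-in code; the point is only that swapping the token generator changes the output while the decoder stays untouched.

```python
# Toy illustration of autoregressive image generation (all components
# hypothetical): an "LLM" emits discrete image tokens; a frozen decoder
# turns tokens into image content. Updating only the token generator
# changes the images you get without retraining the decoder.

FROZEN_CODEBOOK = {0: "sky", 1: "blade", 2: "handle"}  # stand-in decoder table

def decode(tokens):
    """Frozen image decoder: same mapping regardless of which LLM produced the tokens."""
    return [FROZEN_CODEBOOK[t] for t in tokens]

def old_llm(prompt):
    return [1, 2, 2]  # older generator tends to emit two handles

def new_llm(prompt):
    return [1, 2]     # updated generator, same frozen decoder

prompt = "knife with one handle"
print(decode(old_llm(prompt)))  # ['blade', 'handle', 'handle']
print(decode(new_llm(prompt)))  # ['blade', 'handle']
```

This is the contrast with diffusion models the comment draws: there, the image model itself denoises from the prompt, so style changes would require retraining it rather than just the upstream token generator.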


No, those changes are going to be caused by the top level models composing different prompts to the underlying image models. GPT-5 is not a multi-modal image output model and still uses the same image generation model that other ChatGPT models use, via tool calling.

GPT-4o was meant to be a multi-modal image-output model, but they ended up shipping that capability as a separate model rather than exposing it directly.


That may be a more precise interpretation given the leaked system prompt, as the schema for the tool there includes a prompt: https://news.ycombinator.com/item?id=44832990


