This is a great example of goalposts shifting. Even having a model that can engage in coherent conversation and synthesize new information on the fly is revolutionary compared to just a few years ago. Now the bar has moved up to creativity without human intervention.
But isn't this goalpost shifting actually reasonable?
We discovered this nearly-magical technology. But now the novelty is wearing off, and the question is no longer "how awesome is this?" It's "what can I do with it today?"
And frustratingly, the apparent list of uses is shrinking, mostly because many serious applications come with a footnote of "yeah, it can do that, but unreliably and with failure modes that are hard for most users to spot and correct".
So yes, adding "...but without making up dangerous nonsense" is moving the goalposts, but is it wrong?
There are a lot of things where being reliable isn’t as important (or it’s easier to be reliable).
For example, we are using it for meeting summaries, and it is remarkably good at them. In A/B tests against summaries written by humans, it usually came out better.
Another is new-employee ramp-up: it can answer questions and guide new hires much faster than anything we've seen before.
Another thing I've only started toying with, but with incredible results so far, is email prioritization: basically having it tell me which emails I should read most urgently.
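For the curious, the shape of it is roughly the sketch below. This is a minimal illustration rather than my actual setup: it assumes the openai Python package (1.x) with an API key in the environment, and the model name, prompt, and email fields are all placeholders.

```python
# Minimal sketch of LLM-based email prioritization (illustrative only).
# Assumes: `pip install openai` (1.x) and OPENAI_API_KEY set in the environment.
from openai import OpenAI

client = OpenAI()

def prioritize(emails: list[dict]) -> str:
    """Ask the model to rank emails by urgency.

    Each email is a dict with 'sender', 'subject', and 'snippet' keys
    (a hypothetical shape, not any particular mail API).
    """
    listing = "\n\n".join(
        f"[{i}] From: {e['sender']}\nSubject: {e['subject']}\n{e['snippet']}"
        for i, e in enumerate(emails)
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder; any capable chat model works
        messages=[
            {
                "role": "system",
                "content": (
                    "Rank these emails from most to least urgent. "
                    "Return the indices in order, one per line, "
                    "with a one-line reason for each."
                ),
            },
            {"role": "user", "content": listing},
        ],
    )
    return resp.choices[0].message.content

if __name__ == "__main__":
    print(prioritize([
        {"sender": "boss@example.com", "subject": "Meeting moved to 9am",
         "snippet": "Please confirm you can present the Q3 numbers."},
        {"sender": "newsletter@example.com", "subject": "This week in gadgets",
         "snippet": "Our top picks for the season..."},
    ]))
```

Part of why this works well is that the stakes are low: if the ranking is slightly off, the worst case is that I read my inbox in a slightly different order.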
Again, these were all things where the state of the art was basically useless 3 years ago.
IMO it’s not wrong to want the next improvement (“…but without making up dangerous nonsense”), but it is disingenuous to pretend there hasn’t already been a huge leap in capabilities. It’s like being unimpressed with the Wright brothers’ flight because nobody has figured out commercial air travel yet.