It's definitely making some errors

kandesbunzler · 2025-01-31T21:50:05 1738360205

Like?

aprilthird2021 · 2025-01-31T23:32:20 1738366340

Even though it was told that it MUST quote users directly, it still outputs:

> "It’s already a game changer for many people. But to have so many names like o1, o3-mini, GPT-4o, & GPT-4o-mini suggests there may be too much focus on internal tech details rather than clear communication." (paraphrase based on multiple similar sentiments)

It also hallucinates quotes.

For example:

> "I’m pretty sure 'o3-mini' works better for that purpose than 'GPT 4.1.3'." – TeMPOraL

But that comment is not in the user TeMPOraL's comment history.

Sentiment analysis is also faulty.

For example:

> "I’d bet most users just 50/50 it, which actually makes it more remarkable that there was a 56% selection rate." – jackbrookes – This quip injects humor into an otherwise technical discussion about evaluation metrics.

It's not a quip, though. That comment was meant in earnest.

breakingcups · 2025-02-01T12:25:30 1738412730

In addition to that, it has a section dedicated entirely to Model Naming and Branding Confusion, but then it puts the following comment in the Performance and Benchmarking section, even though the comment is ostensibly more about the naming being a hindrance than a remark on the benchmarking, which is more of a casualty of the naming confusion:

"The model naming all around is so confusing. Very difficult to tell what breakthrough innovations occurred." – patrickhogan1

romanhn · 2025-02-01T01:18:58 1738372738

That's funny, the quote exists, but it got the user wrong.