
People could be flipping a coin and the score would be the same.


A 12% margin is literally the opposite of a coin flip. Unless you have a really bad coin.


You're being downvoted for 3 reasons:

1) Coming off as a jerk, and from a new account, is a bad look

2) "Literally the opposite of a coin flip" would probably be either 0% or 100%

3) Your reasoning doesn't stand up without further info; it entirely depends on the sample size. I could have 5 coin flips all come up heads, but over thousands or millions it averages to 50%. 56% on a small sample size is absolutely within margin of error/noise. 56% on a MASSIVE sample size is _statistically_ significant, but still isn't much to brag about for something that I feel like they probably intended to be a big step forward.
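
To put rough numbers on point 3, here is a minimal sketch of an exact two-sided binomial test against a fair coin. The sample sizes are hypothetical (the actual number of comparisons isn't given in this thread), and `two_sided_binomial_p` is just an illustrative helper name, not anything from a library.

```python
from math import comb


def two_sided_binomial_p(wins: int, n: int) -> float:
    """Exact two-sided p-value for `wins` out of `n` under a fair coin (p = 0.5)."""
    # With p = 0.5 the distribution is symmetric around n/2, so the two-sided
    # p-value is the chance of landing at least as far from n/2 as observed.
    distance = abs(wins - n / 2)
    extreme = sum(comb(n, k) for k in range(n + 1) if abs(k - n / 2) >= distance)
    return extreme / 2 ** n


# Hypothetical sample sizes, each with a 56% win rate.
for n in (25, 100, 1000, 10000):
    wins = round(0.56 * n)
    print(f"n={n:>5}  wins={wins:>5}  p={two_sided_binomial_p(wins, n):.3g}")
```

Under these assumptions, 56 out of 100 is not distinguishable from a fair coin at the usual 5% level, while 560 out of 1000 is. Even 5 heads out of 5 only gets to p = 0.0625.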


I'm a little puzzled by your response.

1. The message was net-upvoted. Whether there are downvotes in there I can't tell, but the final karma is positive. A similarly spirited message of mine in the same thread was quite well received as well.

2. I can't see how my message would come across as being a jerk. I wrote 2 simple sentences, not using any offensive language, stating a mere fact of statistics. Is that being a jerk? And a long-winded berating of a new member of the community isn't?

3. A coin flip is 50%. Anything else is not, once you have a certain sample size. So, this was not. That was my statement. I don't know why you are building a strawman of 5 coin flips. 56% vs 44% is a margin of 12%, as I stated, and with a huge sample size, which they had, that's massive in a space where the returns are deep in "diminishing" territory.


I wasn't expecting my comment to be read so literally, but ok.

We're talking about the most cost-efficient model. The competition here is on price, not on a 12% incremental performance gain (which would make sense for the high-end model).

To my knowledge deepseek is the cheaper service, which is what matters on the low end (unless the increase in performance were of such magnitude that the extra charge would be worth the money).


What does deepseek have to do with a comparison between o1-mini and o3-mini?


I'm not sure I follow - your assertion was that 12% is significant.

I personally choose on price for a low-cost model (unless the improvement is so significant that it justifies the higher price).



