No, because solving a well-defined problem with a clear right or wrong answer is generally not what people use LLMs for. Most of the time my query to an LLM is underspecified, and a lot of the time I figure out the problem while chatting with the LLM.
And a benchmark, by definition, measures only right/wrong answers.
While I don't think it's technically hate speech, yeah, Bluesky is either unanimous negativity or unanimous positivity toward different things, and it feels like groupthink. There is no middle ground.
It hasn't avoided that fate. If you doubt me, go into a thread about US politics and praise Donald Trump and watch as your comments get not just downvoted, but flagged so that they are hidden. This will happen no matter how good your arguments are.
Make no mistake, this site is ideologically polarized just like all the others. The only saving grace is that the vast majority of topics are about tech, not politics, so the polarization is usually hidden.
Even in tech, there are recurring themes here that are much more prevalent than in general tech spheres, such as the hate for AI, VCs, and MongoDB, or the love for Rust, privacy, open source, Postgres, etc.
There’s arguably a moral difference between having your stuff stolen, and giving it up as part of the TOS. Most artists, you’d assume, would prefer the former.
I don't know why HN has this wild idea that a potential WW3 costing millions of lives would be in any way altered because TSMC is ~3-4 years ahead of Samsung/Intel. The entire world would lag by years, if not a decade, if the US defends Taiwan in case of a full-on attack.
I tracked the Elo rating in Chatbot Arena for the GPT-4/o series models (which are almost always the highest rated) over around 1.5 years, and at least on this metric progress not only has not stagnated, but growth seems to be accelerating[1]
Something seems quite off with the metric. Why would 4o recently increase on itself at a rate ~17x faster than 4o increased on 4 in that graph? E.g. Elo is a competitive metric, not an absolute metric, so someone could post the same graph with the claim that the cause was "many new LLMs being added to the system are not performing better than previous large models like they used to" (not saying it is or isn't, just saying the graph itself doesn't give context on whether LLMs are actually advancing at different rates or not).
Chatbot Arena also has the H2H win rate for each pair of models for non-tied results[1], which lets you detect global drift. E.g., the gpt-4o released on 2024/09/03 wins 69% of the time against the gpt-4o released on 2024/05/13 in blind tests.
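To put that 69% in rating terms, here is a rough sketch that converts a head-to-head win rate into an implied Elo gap. It assumes the textbook logistic Elo model with a 400-point scale and ignores ties; the function names are made up for illustration, and the leaderboard's own fitting procedure may differ, so treat the number as a ballpark only.

```python
import math


def elo_gap_from_win_rate(p, scale=400.0):
    """Implied rating gap for a head-to-head win rate p (ties excluded).

    Assumes the standard logistic Elo model: p = 1 / (1 + 10 ** (-gap / scale)).
    """
    if not 0.0 < p < 1.0:
        raise ValueError("win rate must be strictly between 0 and 1")
    return scale * math.log10(p / (1.0 - p))


def win_rate_from_elo_gap(gap, scale=400.0):
    """Inverse mapping: expected win rate for a given rating gap."""
    return 1.0 / (1.0 + 10.0 ** (-gap / scale))


# The 69% figure quoted above (illustrative use, not the leaderboard's own calculation).
gap = elo_gap_from_win_rate(0.69)
print(round(gap))  # prints 139
```

Under those assumptions, a 69% win rate works out to a gap of roughly 139 points between the two gpt-4o releases. Because it comes from direct pairwise results, it does not depend on which other models were added to the pool.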
GPT-4 was released in March 2023. Before this there were almost no good instruction-tuned models except 3.5, which was a different class of model, so there was nothing to compare to.
GPT has some self-understanding. When asked why it uses that name, it at least described the type of qualities correctly.
> It sounds like you're referring to a story or narrative that I've generated or discussed involving a character named Aldric. If this is the case, Aldric would likely be used as a character who embodies leadership, wisdom, or noble traits due to the name's meaning and historical connotations. Characters named Aldric might be portrayed as experienced leaders, wise sages, or key figures in a fantasy or historical context.
For me, everything important I have can be accessed from the browser (as I do full system backups), and the cookies I have in the browser could allow an app to access my data. How does QubesOS help in this scenario?
Such a great video. Changing from spheres to cones in proving Monge's theorem makes the proof so much better and way easier to visualize. I guess the reason the proof hasn't caught on elsewhere is that, in writing, the sphere is easier to visualize first, or the sphere gives more of an aha feeling.