This just feels like an old man yelling at clouds, trying to convince himself that the grass is greener in the other field.
> No chemist wakes up and decides to call it “Steve” because Steve is a funny name and they think it’ll make their paper more approachable.
This happens every day. In every scientific field there is a technical name and then the name everyone remembers. Nobody will understand if I talk about ENSG00000164690, but if I say it's the Sonic hedgehog gene, it starts to make sense, because funny names are memorable.
> awk (Aho, Weinberger, Kernighan; the creators’ initials)
I'd like to see anyone try to defend how using the creators' initials as a tool name describes its function. Unless you've researched the tool's history, there is no way to know that.
Yet another "why the tools I use are the best and the tools you use suck", with a weird focus on naming instead of function.
That's why I put all my projects on subdomains of the only domain I've bought, so they just hang out there. So even though rumengol.net itself has nothing on it (procrastination, yay), there are 5 (soon 6) subdomains hosting active or dead side projects.
The issue is that they claim you don't need an extensive amount of data to do efficient reasoning. That claim alone is a bit misleading if you need a massive model to fine-tune and another one to put together the small amount of data.
I've seen the textbook analogy used, but to me it's like a very knowledgeable person reading an advanced textbook to become an expert. Then they claim to be better than other very knowledgeable people because they read that manual, and that anyone can start from scratch using it.
So there's nothing wrong with making a more efficient model from an existing one; the issue is concluding that you don't need all the data that made the existing one possible in the first place. While that may be true, this is not how you prove it.
> The issue is that they claim that you don't need an extensive amount of data to do efficient reasoning.
They claim that efficient reasoning can be achieved by applying a small set of SFT samples. How that sample set is collected or filtered is irrelevant here; they just reported the fact that this is possible. That by itself is a new and interesting finding.
I completely agree with the point made here.
Setting aside the research controversy around the paper, from an engineering perspective the methodology it presents offers the industry an effective way to distill structural cognitive capabilities from advanced models and integrate them into less capable ones.
Moreover, I find the Less-Is-More Reasoning (LIMO) hypothesis particularly meaningful. It suggests that encoding the cognitive process doesn't require extensive data; instead, a small amount of data can elicit the model's existing capabilities. In my opinion, this hypothesis and the observation behind it are highly significant and offer valuable insights, far more than the specific experiment itself.
A basic webring with two purposes: improve my Rust with a simple project and bring together an online community. It's not live yet, but I expect it to be by the end of the week.
It's not, but it's accepted because it is the theory that best fits the observations. It has holes, but not as many as the others. It will remain the accepted model until another one fits the data even better or we can prove or disprove the existence of dark matter.