Hacker Newsnew | past | comments | ask | show | jobs | submit | rosstaylor90's commentslogin

Thanks for feedback! Yes, we’re looking to improve quality in the coming months. Couple of notes:

- The initial use of data is distillation so we’re less bound by question quality (anything that evinces output diversity is good).

- But moving onto RL, we’ll need stronger quality. We have much better things planned both on data filtering and verification!

- Surprisingly, a lot of ML datasets actually look like this when you look under hood. We’re hoping having more eyeballs on it will help improve quality in long run over less transparent status quo!


I still don't understand why all the datasets have so many general knowledge questions and so much math, when so few people can do any of that stuff.

It makes sense for ASI research I suppose, but why are we trying to teach small models to do stuff almost no humans even try to do?

What happens if you train them with RAG context in the prompts and calculator calls in the CoT?


Many math questions are easy to verify, and it's a classic benchmark for reasoning -> so it's a good hill to climb.

I agree with your meta-point that better benchmarks testing more types of task would be good!


Happier now, upgraded the backend :) (co-creator here)


What's your AIME 2025 score? https://gr.inc/RJT1990/AIME2025/


The is the point of the AIME, it is a 3 hour closed book examination in which each answer is an integer number from 0 to 999 and should only depend on pre-calc...for a human with no calculator, notes, or internet access.

The concepts are heavily covered in the training corpus, and if people were allowed to take it more than once, with even a book let alone access to the internet it wouldn't be very hard.

Examples:

1) Find the sum of all integer bases $b>9$ for which $17_b$ is a divisor of $97_b.$

In the corpus: https://www.quora.com/In-what-bases-b-does-b-7-divide-into-9...

And one more:

3) https://artofproblemsolving.com/wiki/index.php/2025_AIME_I_P...

Is just the the number of ways to distribute k indistinguishable balls (players) into n distinguishable boxes (flavors, without exclusion, in such a way that no box is empty.

Thus in the corpus for any courses that need to cover combinatorial problems including physics, discreet math, logistics etc...

IMHO these concept classes from a typical AIME are so common, the scores you gave demonstrate that those models are doing no "general reasoning" at all and are actually failing at approximate retrieval.


I disagree, 10 years ago AIs nailing these types of competition would have been seen as very impressive. The fact goal posts can move on this now shows how much AI has progressed.

(Also the term “approximate retrieval” is a bad one - reasoning is inherently a process of chaining together associations. What matters is whether the reasoning reaches the right conclusions. Still some way to go, but already very impressive in tasks traditionally considered harbours of human reasoning!)


I disagree, 10 years ago AIs nailing these types of competition would have been seen as very impressive.

It would have been seen as witchcraft.


> What matters is whether the reasoning reaches the right conclusions

no, it doesn't. a broken clock is right twice a day, reasoning is about the journey more than the destination


RL has more than two steps...


Point is that reasoning is more about the conclusions. if your steps are wrong, your reasoning is wrong regardless of the conclusion. Poor reasoning is what could make an LLM conclude that 1 + 2 = 3 but what 2 + 1 = [some number other than 3]


No, the primary reason is that the UK is a heavily service-based economy. Quarantine impacts the service-based subset of the economy the most. Therefore it makes sense that the UK would be hard hit. A “nation of shopkeepers” suffers when the shops are closed...


Really? I'd have thought that stuff like financial, accounting and legal services (things the UK is good at) can be done via work from home rather better than e.g. car and machine parts manufacturing...


Yes, but all the restaurants (lunches), cafe shops (for their coffees during breaks), etc aren't frequented as much because everyone is working from home.

Probably the more clientele in the employee's living neighbourhood doesn't equal it out. While I shop more at the local diary store I don't buy lunches etc.


The UK doesn't have a uniquely large number of restaurants and cafes, nor are the UK's cafes and restaurants uniquely affected by the lockdowns.

When they say the UK has a strong service industry that drives exports etc they don't mean cafes and bars they mean financial, legal etc services. Those should have stayed strong. I suspect some of the decline is actually Brexit related decline in financial services.


Brexit has been in the works for years and the economy was growing before COVID. The obsession with blaming all misfortune on Brexit is tiring.

The UK is hard hit because it went from a fairly moderate response to swinging hard into a deliberate policy of panicking the population as much as possible. The turning point was Imperial College / Ferguson predicting millions of deaths if there wasn't hard lockdown immediately. The UK's SAGE committee is largely responsible for the resulting devastation, including deaths: they literally told people they had a quasi-patriotic duty to stay away from hospitals. This is almost certainly why England is one of the few places in the world to report excess mortality amongst young adults. They have been opening and re-closing parts of the country with only hours of notice ... given on Twitter. Even Trump hasn't done that.

They even used imagery straight from 28 Days Later:

https://cdni0.trtworld.com/w960/h540/q75/76203_20200408T1313...

This was largely a consequence of SAGE "experts" telling the government they had to deliberately create fear in the population.

Is it any wonder the economy has been crushed when the government deliberately set out to make the population think COVID is the Rage virus?


A no deal hard Brexit was not obvious before the last election.

I said a percentage of the decrease. Obviously Brexit has not caused a 20% economic drop.


In the UK the hospitality industry as whole contributes 3.9% of GDP. Which is higher than similar European economies.


Ed Thorp’s Princeton/Newport Partners was the first great quant hedge fund.


Highly recommend Poundstone’s Fortune’s Formula. Claude Shannon and Ed Thorp. This is the book (along with A Non-Random Walk Down Wall Street by Lo and MacKinlay) that got me interested in finance.


Jim Simons/Renaissance were the first great quant hedge fund. For the last 30 years their annualized returns, after fees, is 39%.


The medallion fund began at Axcom under Elwyn Berlekamp and RenTech bought it.

https://en.wikipedia.org/wiki/Elwyn_Berlekamp https://en.wikipedia.org/w/index.php?title=Axcom_Trading_Adv...


Rather more precisely, Simons hired Ax to pursue a dream he'd first had with Baum in the 70s, using math to predict markets. Ax had some success, but the Medallion Fund didn't take off until after he left. Berlekamp gets a little credit, but wasn't really there long enough to have major impact. If you want to know who did the real work, figure out who Simons paid the most.


Who did he pay the most?


And before fees, like 80-100%. But PNP came before RenTec and Medallion.


Yeah, came hear to post this. Recently read his autobiography "A Man For All Markets." Simply an incredible and creative man. From beating blackjack to roulette, to discovering Black Scholes before Black and Sholes.

PNP changed the game like no other. Really sad Giuliani and his thugs took him down.

Thorpe is epitome of the best of humanity: creative, fearless, and tenacious. His life story was really inspiring to me.

The book comes highly recommended.


> Thorpe is epitome of the best of humanity: creative, fearless, and tenacious.

I guess we see what we want to see.

There are a lot of little red flags littered throughout the book. One thing that rankled me was his assertion that he independently discovered Black-Scholes before Merton/Black/Scholes and offered as evidence a chart that post-dated the original paper. There's a lot of stuff like that in the book.

Don't get me wrong, this is a person with many admirable characteristics. Just don't drink the Kool-Aid.


Ed Thorp's "Beat the Dealer: A Winning Strategy for the Game of Twenty-One" is one of the best books ever written about quant trading...despite not being about quant trading.


The guy whose math provided the nucleus of RenTech and the guy who invented the game of life wrote this:

https://www.amazon.com/Winning-Ways-Your-Mathematical-Plays/...

Vol. 4 begins with the study of the solitaire game that is on every Cracker Barrel table.


I'm from Suffolk - my school used to go here on school trips. The funny thing about the site is that it's basically just a hill in a field now, so needless to say it was a massive letdown for eight-year-old me who was expecting to see a massive Viking longboat. The tourist centre is nice though.


The helmet seems pretty impressive.


ATLAS ML | London, UK | Full Stack Developer | Remote | FULL-TIME

Our mission is to systemise the world's deep learning knowledge.

We are building a discovery platform and tooling for deep learning that links ideas with implementations, and allows everyone to access and apply the state of the art in AI. We are a team of researchers and engineers from the University of Cambridge, and our founders include an early core developer at Wikipedia and an open source machine learning developer.

We are looking for a full stack developer to help us develop our popular and growing platform. Our tech is built on Python (Django) and React. Additionally we have a number of other roles - including product and advocacy roles.

Learn more here www.atlasml.io or apply at https://angel.co/atlas-ml/jobs


I think your conclusion is too strong. Yes, we know bigger models and data generally lead to better performance (e.g. BigGAN results last year) but progress in architecture can still speed up progress in ML tasks. If we were still stuck using RNNs and LSTMs for language modeling we wouldn't be talking about this news today.


Philosophical Investigations - Wittgenstein. Just stunningly thoughtful and changed the way I interpret most things around me.


Cool example for a music video here: https://www.youtube.com/watch?v=-k9qDxyxS3s.


Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: