Thanks for the feedback! Yes, we’re looking to improve quality in the coming months. A couple of notes:
- The initial use of data is distillation so we’re less bound by question quality (anything that evinces output diversity is good).
- But moving on to RL, we’ll need stronger quality. We have much better things planned on both data filtering and verification (rough sketch of the verification idea below)!
- Surprisingly, a lot of ML datasets actually look like this when you look under the hood. We’re hoping that having more eyeballs on it will improve quality in the long run over the less transparent status quo!
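To make the verification point concrete, here's a minimal sketch (the record shape and names like extract_final_answer are hypothetical, not our actual pipeline): keep a distilled trace only if its final answer matches a known gold answer.

    def extract_final_answer(trace):
        # Hypothetical convention: the answer follows the last "Answer:" marker.
        return trace.rsplit("Answer:", 1)[-1].strip()

    def verified(records):
        # records: iterable of {"question": str, "trace": str, "gold": str}
        for rec in records:
            if extract_final_answer(rec["trace"]) == rec["gold"].strip():
                yield rec  # answer checks out, so the trace is usable for RL

    samples = [
        {"question": "2+2?", "trace": "2 + 2 = 4. Answer: 4", "gold": "4"},
        {"question": "2+3?", "trace": "Maybe 6? Answer: 6", "gold": "5"},
    ]
    print([r["question"] for r in verified(samples)])  # ['2+2?']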
This is the point of the AIME: it is a 3-hour closed-book examination in which each answer is an integer from 0 to 999 and should only depend on pre-calc...for a human with no calculator, notes, or internet access.
The concepts are heavily covered in the training corpus, and if people were allowed to take it more than once, with even a book, let alone access to the internet, it wouldn't be very hard.
Examples:
1) Find the sum of all integer bases $b>9$ for which $17_b$ is a divisor of $97_b.$
This is just the number of ways to distribute k indistinguishable balls (players) into n distinguishable boxes (flavors), without exclusion, in such a way that no box is empty.
Thus it's in the corpus for any course that needs to cover combinatorial problems, including physics, discrete math, logistics, etc... (see the sketch below).
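To show how mechanical these are, here's a quick brute force of my own (the 9-players/3-flavors numbers in the second part are just illustrative, since the original problem statement isn't quoted above):

    from math import comb

    # Example 1: 17_b = b + 7 and 97_b = 9b + 7, so we need (b + 7) | (9b + 7).
    # Since 9b + 7 = 9(b + 7) - 56, this forces (b + 7) | 56, hence b <= 49,
    # so any generous upper bound is safe for the brute force.
    print(sum(b for b in range(10, 100) if (9 * b + 7) % (b + 7) == 0))  # 70

    # Balls into boxes: k indistinguishable balls into n distinguishable boxes
    # with no box empty is the stars-and-bars count C(k - 1, n - 1).
    def distributions(k, n):
        return comb(k - 1, n - 1)

    print(distributions(9, 3))  # 28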
IMHO these concept classes from a typical AIME are so common that the scores you gave demonstrate those models are doing no "general reasoning" at all and are actually failing at approximate retrieval.
I disagree. Ten years ago, AIs nailing these types of competitions would have been seen as very impressive. The fact that the goalposts can move on this now shows how much AI has progressed.
(Also, “approximate retrieval” is a bad term - reasoning is inherently a process of chaining together associations. What matters is whether the reasoning reaches the right conclusions. Still some way to go, but already very impressive on tasks traditionally considered bastions of human reasoning!)
The point is that reasoning is about more than the conclusions: if your steps are wrong, your reasoning is wrong regardless of the conclusion. Poor reasoning is what could make an LLM conclude that 1 + 2 = 3 but that 2 + 1 = [some number other than 3].
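A toy way to test that distinction in code (ask_model is a hypothetical stub for whatever LLM client you use, not a real API):

    from itertools import permutations

    def ask_model(prompt):
        # Hypothetical stub: wire this up to your LLM client of choice.
        raise NotImplementedError

    def commutes(a, b):
        # A sound reasoner must return the same sum for both orderings;
        # getting one order right and the other wrong means the right
        # conclusion was reached without right reasoning.
        answers = {ask_model(f"What is {x} + {y}? Reply with only the number.")
                   for x, y in permutations((a, b))}
        return len(answers) == 1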
No, the primary reason is that the UK is a heavily service-based economy. Quarantine impacts the service-based subset of the economy the most. Therefore it makes sense that the UK would be hard hit. A “nation of shopkeepers” suffers when the shops are closed...
Really? I'd have thought that stuff like financial, accounting and legal services (things the UK is good at) can be done via work from home rather better than e.g. car and machine parts manufacturing...
Yes, but all the restaurants (lunches), coffee shops (coffee during breaks), etc. aren't frequented as much because everyone is working from home.
The extra clientele in employees' home neighbourhoods probably doesn't even it out. While I shop more at the local dairy store, I don't buy lunches etc.
The UK doesn't have a uniquely large number of restaurants and cafes, nor are the UK's cafes and restaurants uniquely affected by the lockdowns.
When they say the UK has a strong service industry that drives exports etc., they don't mean cafes and bars; they mean financial, legal, and similar services. Those should have stayed strong. I suspect some of the decline is actually Brexit-related decline in financial services.
Brexit has been in the works for years and the economy was growing before COVID. The obsession with blaming all misfortune on Brexit is tiring.
The UK is hard hit because it went from a fairly moderate response to swinging hard into a deliberate policy of panicking the population as much as possible. The turning point was Imperial College / Ferguson predicting millions of deaths if there wasn't hard lockdown immediately. The UK's SAGE committee is largely responsible for the resulting devastation, including deaths: they literally told people they had a quasi-patriotic duty to stay away from hospitals. This is almost certainly why England is one of the few places in the world to report excess mortality amongst young adults. They have been opening and re-closing parts of the country with only hours of notice ... given on Twitter. Even Trump hasn't done that.
They even used imagery straight from 28 Days Later:
Highly recommend Poundstone’s Fortune’s Formula. Claude Shannon and Ed Thorp. This is the book (along with A Non-Random Walk Down Wall Street by Lo and MacKinlay) that got me interested in finance.
Rather more precisely, Simons hired Ax to pursue a dream he'd first had with Baum in the 70s, using math to predict markets. Ax had some success, but the Medallion Fund didn't take off until after he left. Berlekamp gets a little credit, but wasn't really there long enough to have major impact. If you want to know who did the real work, figure out who Simons paid the most.
Yeah, came here to post this. Recently read his autobiography "A Man for All Markets." Simply an incredible and creative man. From beating blackjack to roulette, to discovering Black-Scholes before Black and Scholes.
PNP (Princeton Newport Partners) changed the game like no other. Really sad that Giuliani and his thugs took him down.
Thorp is the epitome of the best of humanity: creative, fearless, and tenacious. His life story was really inspiring to me.
> Thorp is the epitome of the best of humanity: creative, fearless, and tenacious.
I guess we see what we want to see.
There are a lot of little red flags littered throughout the book. One thing that rankled me was his assertion that he independently discovered Black-Scholes before Merton/Black/Scholes and offered as evidence a chart that post-dated the original paper. There's a lot of stuff like that in the book.
Don't get me wrong, this is a person with many admirable characteristics. Just don't drink the Kool-Aid.
Ed Thorp's "Beat the Dealer: A Winning Strategy for the Game of Twenty-One" is one of the best books ever written about quant trading...despite not being about quant trading.
I'm from Suffolk - my school used to go there on trips. The funny thing about the site is that it's basically just a hill in a field now, so needless to say it was a massive letdown for eight-year-old me, who was expecting to see a massive Viking longboat. The visitor centre is nice though.
ATLAS ML | London, UK | Full Stack Developer | Remote | FULL-TIME
Our mission is to systemise the world's deep learning knowledge.
We are building a discovery platform and tooling for deep learning that links ideas with implementations, and allows everyone to access and apply the state of the art in AI. We are a team of researchers and engineers from the University of Cambridge, and our founders include an early core developer at Wikipedia and an open source machine learning developer.
We are looking for a full stack developer to help us develop our popular and growing platform. Our tech is built on Python (Django) and React. Additionally we have a number of other roles - including product and advocacy roles.
I think your conclusion is too strong. Yes, we know bigger models and data generally lead to better performance (e.g. BigGAN results last year) but progress in architecture can still speed up progress in ML tasks. If we were still stuck using RNNs and LSTMs for language modeling we wouldn't be talking about this news today.