Hello everyone,
After recreating the accuracy and rough speed of David Page's implementation in hlb-CIFAR10 0.1.0 (18.1s on an A100 SXM4 in Colab), it came down to some basic NVIDIA kernel profiling to figure out which operations were the long poles in the tent. Perhaps (somewhat?) unsurprisingly, the NCHW <-> NHWC thrash was the worst part, but unfortunately the GhostBatchNorm was a barrier even when using the faster-on-Ampere channels_last memory format.
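For anyone unfamiliar with the channels_last trick mentioned above, here's a minimal sketch (not the project's actual code, just an illustrative toy model) of putting both a module and its input into NHWC layout so the layers stay in one memory format instead of converting back and forth:

```python
import torch
import torch.nn as nn

# Hypothetical toy stack, not hlb-CIFAR10 itself: move the weights and the
# input tensor to channels_last (NHWC), which Ampere-era cuDNN kernels prefer.
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1),
    nn.ReLU(),
).to(memory_format=torch.channels_last)

x = torch.randn(8, 3, 32, 32).to(memory_format=torch.channels_last)
y = model(x)

# The conv output stays channels_last, so no NCHW <-> NHWC thrash mid-network.
assert y.is_contiguous(memory_format=torch.channels_last)
```

The key point is that both sides have to agree: a channels_last model fed a contiguous NCHW tensor (or vice versa) is exactly what triggers the layout conversions the profiler flagged.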
A quick note before continuing -- some may find the use of a convolutional network on CIFAR10 to be curious. A quick answer would be that by doing research that optimizes well-known problems (especially when the testing loop is incredibly rapid), we get much clearer pictures of the fundamental information-learning limits of systems like this, as well as stable prototypes that can then be translated (potentially somewhat analogously) into other modalities. You can see this practice with a few researchers -- Hinton comes to mind, though his work is much more fundamental and experimental than this is. Back to the release notes.
Ultimately, however, we were able to get a similar level of regularization to the original GhostBatchNorm (called GhostNorm in the code), which allowed us to remove it along with a bunch of tensor allocation/contiguous-tensor calls, saving us about 5 seconds (!!!!).
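For context on why removing it helped so much, here's a rough sketch of the GhostBatchNorm idea (this is my own illustrative version, not the project's exact implementation): batch norm statistics are computed over small "ghost" sub-batches rather than the full batch, which regularizes via noisier statistics but costs extra kernel launches and allocations per chunk.

```python
import torch
import torch.nn as nn

class GhostBatchNorm(nn.Module):
    """Illustrative sketch (not the project's code): apply BatchNorm2d over
    small 'ghost' sub-batches so the normalization statistics come from,
    e.g., 32 samples instead of the full batch -- noisier and regularizing."""
    def __init__(self, num_features, ghost_size=32):
        super().__init__()
        self.ghost_size = ghost_size
        self.bn = nn.BatchNorm2d(num_features)

    def forward(self, x):
        if self.training and x.shape[0] > self.ghost_size:
            # One BN call per chunk plus a concat -- the extra launches and
            # allocations are exactly the overhead removing it avoided.
            chunks = x.split(self.ghost_size, dim=0)
            return torch.cat([self.bn(c) for c in chunks], dim=0)
        return self.bn(x)

gbn = GhostBatchNorm(16, ghost_size=32)
out = gbn(torch.randn(64, 16, 4, 4))  # stats computed per 32-sample chunk
```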
Replacing the nn.AdaptiveMaxPool2d((1, 1)) call with a torch.amax(dim=(2, 3)) shaved an additional .5 seconds off the clock, bringing us down below Thomas Germer (@99991)'s excellently quick implementation of the same base method (https://github.com/99991/cifar10-fast-simple) and giving us the new world record.
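The two are numerically equivalent for global pooling -- a quick sanity check (toy shapes, not the project's code) looks like this:

```python
import torch
import torch.nn as nn

x = torch.randn(8, 64, 4, 4)

# Global max pooling two ways: the adaptive-pooling module vs a plain
# reduction over the spatial dims. Same values, simpler kernel.
pooled_module = nn.AdaptiveMaxPool2d((1, 1))(x)            # (8, 64, 1, 1)
pooled_amax = torch.amax(x, dim=(2, 3), keepdim=True)      # (8, 64, 1, 1)

assert torch.equal(pooled_module, pooled_amax)
```

Since the output size is fixed at (1, 1), the adaptive pool's window arithmetic is pure overhead, and amax maps to a straightforward reduction kernel.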
This work is pretty simple on its own -- though the various NVIDIA profilers can be very daunting to use, and I can post snippets of the simplest route that I've found (via torch.profiler) if someone asks/is curious. That said, looking at kernel execution order and times, in conjunction with good research engineering practices, can really and truly do a lot to quickly improve a network.
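Since it's short, here's the shape of that simplest torch.profiler route (the model and input here are placeholders, not the project's code):

```python
import torch
from torch.profiler import profile, ProfilerActivity

# Placeholder workload for illustration.
model = torch.nn.Conv2d(3, 16, 3, padding=1)
x = torch.randn(8, 3, 32, 32)

activities = [ProfilerActivity.CPU]
if torch.cuda.is_available():
    activities.append(ProfilerActivity.CUDA)

with profile(activities=activities) as prof:
    model(x)

# Sort by time to see which kernels are the long poles in the tent.
print(prof.key_averages().table(sort_by="self_cpu_time_total", row_limit=10))
```

On a GPU run you'd sort by "self_cuda_time_total" instead, and warm up with a few untimed iterations first so one-time kernel compilation doesn't dominate the table.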
This is what I'm pretty good at doing, so getting to flex a bit on a spare-time project is fun. I'm consistently storing up time saves in a draft bin of sorts and plan on releasing them in related/clustered releases as I'm able to appropriately polish them to whatever their capabilities seem to be. There is a lot of room to grow, and I think we now definitely have a good chance of hitting that 94% accuracy in under ~2 seconds mark within a few years!
This work is meant to be a living resume for me -- feel free to check out my README.md for more info. I love a lot of aspects of the technical/nitty-gritty side of the fusion of neural network engineering and the edge of research, particularly when it comes to speed, so this is my strong area. I'm certainly happy to answer whatever reasonable questions anyone might have, and to help get this project going for you (or other related stuff -- feel free to ask! <3 :)))) )
I imagine this means that the entire training set was ingested in about 12.4 seconds.