Very excited about this. I was programming an ESP32 for a project recently and was like, computer chips are fast enough, why can't I just write TypeScript?
Speculative decoding does this to an extent: a smaller model generates its own predictions, which are put in the batch of the bigger model and kept until the two diverge.
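To make that concrete, here is a rough sketch of the greedy variant, not any particular library's implementation. The "models" are toy functions I made up (`target_model`, `draft_model`), and the verification loop stands in for what would be a single batched forward pass of the big model in a real system.

```python
from typing import Callable, List

# Hypothetical interface: a "model" maps a token prefix to its greedy next token.
Model = Callable[[List[int]], int]


def target_model(ctx: List[int]) -> int:
    """Toy stand-in for the big model: counts upward modulo 10."""
    return (ctx[-1] + 1) % 10


def draft_model(ctx: List[int]) -> int:
    """Toy stand-in for the small model: agrees with the target except after a 4."""
    return 0 if ctx[-1] == 4 else (ctx[-1] + 1) % 10


def speculative_step(target: Model, draft: Model, prefix: List[int], k: int) -> List[int]:
    """Run one draft-then-verify round and return the accepted tokens."""
    # 1. The draft model proposes k tokens autoregressively.
    proposed: List[int] = []
    ctx = list(prefix)
    for _ in range(k):
        tok = draft(ctx)
        proposed.append(tok)
        ctx.append(tok)

    # 2. The target checks each proposed position. A real system would score
    #    all k + 1 positions in one batched pass; this loop is only illustrative.
    accepted: List[int] = []
    ctx = list(prefix)
    for tok in proposed:
        expected = target(ctx)
        if expected != tok:
            # Divergence: keep the target's token and discard the rest of the draft.
            accepted.append(expected)
            return accepted
        accepted.append(tok)
        ctx.append(tok)

    # 3. Every draft token matched, so the target's extra position is a free token.
    accepted.append(target(ctx))
    return accepted


def generate(target: Model, draft: Model, prefix: List[int], n_tokens: int, k: int = 4) -> List[int]:
    """Generate n_tokens after prefix using repeated speculative rounds."""
    out = list(prefix)
    while len(out) - len(prefix) < n_tokens:
        out.extend(speculative_step(target, draft, out, k))
    return out[: len(prefix) + n_tokens]


def greedy(target: Model, prefix: List[int], n_tokens: int) -> List[int]:
    """Plain one-token-at-a-time decoding with the target model, for comparison."""
    out = list(prefix)
    for _ in range(n_tokens):
        out.append(target(out))
    return out
```

In this greedy form the output is identical to decoding with the big model alone. The only thing that changes is how many big-model passes it takes to get there, which depends on how often the draft agrees.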
It doesn’t. It simply trades compute efficiency for latency by pulling matrix multiplications forward from “the future.” It doesn’t actually save FLOPs (it uses more) and doesn’t work at large batch sizes.
Does anyone even care? Really, who cares? The truth is nobody cares. Saving FLOPs does nothing if you have to load the entire model anyway. Going from two FLOPs per parameter to 0.5 or whatever might sound cool on paper, but you're loading those parameters anyway and gain nothing.
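A back-of-the-envelope version of that argument, using a simple roofline-style estimate where a decode step takes roughly the larger of weight-load time and compute time. Every number here is an illustrative assumption (a hypothetical 7B-parameter model in 16-bit weights on a hypothetical accelerator), not a measurement of real hardware.

```python
def decode_step_times(
    n_params: float,
    bytes_per_param: float,
    flops_per_param: float,
    mem_bandwidth: float,  # bytes per second (assumed)
    peak_flops: float,     # FLOPs per second (assumed)
    batch: int = 1,
) -> tuple:
    """Return (load_time, compute_time) in seconds for one decode step."""
    # Weights are read once per step regardless of batch size.
    load_time = n_params * bytes_per_param / mem_bandwidth
    # Compute grows with the number of sequences in the batch.
    compute_time = n_params * flops_per_param * batch / peak_flops
    return load_time, compute_time


def step_time(*args, **kwargs) -> float:
    """Roofline-style estimate: the step is bound by whichever cost is larger."""
    return max(decode_step_times(*args, **kwargs))


# Hypothetical hardware and model, chosen only to show the shape of the argument.
N_PARAMS = 7e9
BYTES_PER_PARAM = 2.0
MEM_BANDWIDTH = 1e12
PEAK_FLOPS = 1e14

baseline = step_time(N_PARAMS, BYTES_PER_PARAM, 2.0, MEM_BANDWIDTH, PEAK_FLOPS)
fewer_flops = step_time(N_PARAMS, BYTES_PER_PARAM, 0.5, MEM_BANDWIDTH, PEAK_FLOPS)
print(baseline, fewer_flops)  # same value: both steps are bound by weight loading
```

Under these assumptions, cutting FLOPs per parameter by 4x leaves the batch-size-1 step time unchanged because loading the weights dominates. Raise `batch` far enough and compute becomes the bottleneck instead, at which point the FLOP count starts to matter.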
FUTO is an organization dedicated to the development of software that returns control of computers and technology to the people. We’re particularly interested in giving people more privacy and control from big tech. We also give grants to open source projects in line with our mission (see our site).
We just launched Grayjay, an Android app with a universal subscription feed for all creator video platforms that lets creators and audiences be less reliant on a single platform, and makes directly supporting creators easier than ever.
We're hiring an Electron/Vue and/or Android engineer to help bring the product to more people.
FUTO | https://futo.org | Austin, TX or Remote | Full time and interns
FUTO is an organization dedicated to the development of software that returns control of computers and technology to the people. We’re particularly interested in giving people more privacy and control from big tech. We also give grants to open source projects in line with our mission (see our site).