(There is still a reduction in memory usage, though just not 24x.)

> "Furthermore, Bop reduces the memory requirements during training: it requires only one real-valued variable per weight, while the latent-variable approach with Momentum and Adam require two and three respectively."
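To make that accounting concrete, here's a rough NumPy sketch of the per-weight state each approach keeps around. The hyperparameter values and the exact flip condition are paraphrased from my reading of the paper, so treat those details as assumptions; the point is just the count of real-valued arrays per weight.

```python
import numpy as np

# Rough per-weight memory accounting for training a layer of n binary weights.
# Each "-> k" comment counts the real-valued arrays of shape (n,) kept around.

n = 1024
rng = np.random.default_rng(0)
grad = rng.standard_normal(n)                             # stand-in for one backprop gradient

# --- Bop: one real-valued variable per weight ---
w_bop = np.where(rng.standard_normal(n) > 0, 1.0, -1.0)   # the binary weights themselves
m = np.zeros(n)                                           # gradient moving average -> 1

gamma, tau = 1e-3, 1e-6                                   # illustrative hyperparameters
m = (1 - gamma) * m + gamma * grad                        # accumulate gradient evidence
flip = (np.abs(m) > tau) & (np.sign(m) == w_bop)          # paraphrased flip condition
w_bop = np.where(flip, -w_bop, w_bop)

# --- Latent weights + Momentum SGD: two real-valued variables per weight ---
w_latent = rng.standard_normal(n)                         # latent real weight -> 1
velocity = np.zeros(n)                                    # momentum buffer    -> 2
lr, beta = 0.01, 0.9
velocity = beta * velocity + grad
w_latent -= lr * velocity                                 # forward pass uses sign(w_latent)

# --- Latent weights + Adam: three real-valued variables per weight ---
w_adam = rng.standard_normal(n)                           # latent real weight    -> 1
m1 = np.zeros(n)                                          # first-moment estimate -> 2
v2 = np.zeros(n)                                          # second-moment estimate -> 3
lr, b1, b2, eps = 1e-3, 0.9, 0.999, 1e-8
m1 = b1 * m1 + (1 - b1) * grad
v2 = b2 * v2 + (1 - b2) * grad ** 2
w_adam -= lr * m1 / (np.sqrt(v2) + eps)                   # (bias correction omitted)
```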
This may be the status quo because of the so-called "hardware lottery": hardware has historically been optimized for floating point. I'm speculating, but if hardware designers were instead concerned only with raw XNOR density and throughput, we might end up with chips powerful enough that giant 1-bit nets could be trained purely through evolution.
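For context on why XNOR throughput would be the metric that matters: with weights and activations restricted to {-1, +1}, a dot product collapses to an XNOR followed by a popcount. A toy illustration (the bit-packing scheme here is just my own choice for the example, elements listed MSB-first):

```python
def binary_dot(a_bits: int, b_bits: int, n: int) -> int:
    """Dot product of two length-n vectors over {-1, +1}, packed one bit per
    element (bit 1 encodes +1, bit 0 encodes -1). Agreements come from
    XNOR + popcount; dot = agreements - disagreements = 2*popcount - n."""
    xnor = ~(a_bits ^ b_bits) & ((1 << n) - 1)
    return 2 * bin(xnor).count("1") - n

# [-1, +1, +1, -1] -> 0b0110 and [+1, +1, -1, -1] -> 0b1100
# (-1)(+1) + (+1)(+1) + (+1)(-1) + (-1)(-1) = 0
assert binary_dot(0b0110, 0b1100, 4) == 0
```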