At smaller model sizes, quantized models are less accurate and less stable to train, but large models can take advantage of the increased parallelism.
Binary search is more linear; circuits are a better lens than Turing machines.
While it still needs to be verified, it will help to look into what a uniform constant-depth threshold circuit is.
I guess binary weights may be in AC0 and not TC0, though that may not hold with billions of parameters.
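
For anyone who wants something concrete to poke at, here is a minimal sketch of a constant-depth threshold circuit in Python. The function names (`threshold_gate`, `parity_depth2`, `check_parity`) are made up for illustration. It computes parity with two layers of threshold gates, using the standard construction where layer 1 counts "at least k ones" and layer 2 takes an alternating sum. Parity is the textbook function that is in TC0 but not in AC0. The weights here happen to be only +1 and -1, but this is a toy and does not settle anything about real quantized networks.

```python
from itertools import product

def threshold_gate(weights, inputs, threshold):
    """Return 1 if the weighted sum of the inputs reaches the threshold, else 0."""
    return int(sum(w * x for w, x in zip(weights, inputs)) >= threshold)

def parity_depth2(bits):
    """Parity of a 0/1 sequence using two layers of threshold gates."""
    n = len(bits)
    # Layer 1: gate k fires when at least k of the inputs are 1 (k = 1..n).
    layer1 = [threshold_gate([1] * n, bits, k) for k in range(1, n + 1)]
    # Layer 2: alternating +1/-1 weights. If s inputs are 1, the first s
    # layer-1 gates fire, so the sum is 1 when s is odd and 0 when s is even.
    signs = [1 if k % 2 == 0 else -1 for k in range(n)]
    return threshold_gate(signs, layer1, 1)

def check_parity(n):
    """Exhaustively compare the circuit with sum(bits) % 2 on all n-bit inputs."""
    return all(
        parity_depth2(bits) == sum(bits) % 2
        for bits in product([0, 1], repeat=n)
    )
```

The depth stays at 2 no matter how large n gets; only the width grows (n gates in the first layer). That is the "constant depth" part.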