
Reduction to decision tree!

But I'm unclear on how it actually runs; the article talks about the conversion and training but doesn't describe inference... I suppose because it's obvious to someone who has followed quantization.

Thinking out loud... if you have a model of just 1s and 0s, my first thought is that the outputs are 1s and 0s too, but I think that's wrong. Instead it's a bunch of floats: you multiply them by 1 or 0 (in a sense you're sampling the outputs of the previous layer?), add them up, and put the result through some activation function. And two-bit quantization sounds similar, just with a _little_ scale, going from -1 to 2 instead of 0 to 1.
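A minimal sketch of that intuition (my own toy example, not from the article): with {0, 1} weights, each output is just a masked sum of the previous layer's float activations, fed through an activation function.

```python
import numpy as np

rng = np.random.default_rng(0)

x = rng.standard_normal(4)            # float activations from the previous layer
W = rng.integers(0, 2, size=(3, 4))   # binary {0, 1} weights: each row selects inputs

# A 0/1 matmul is just a masked sum: no multiplies needed, only adds.
pre = W @ x
out = np.maximum(pre, 0.0)            # e.g. a ReLU on the float result

# The same thing written as explicit selection, to show the "sampling" idea:
pre_explicit = np.array([x[row.astype(bool)].sum() for row in W])
assert np.allclose(pre, pre_explicit)
```

So the activations stay float; only the weights are binary, which turns the dot products into subset sums.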

It seems kind of interesting that you now have a bunch of weights that are exactly 0, meaning you can assert something about which parameters and weights affect which parts of the output. Though in some sense the way they compress the weights down to one bit is also a heuristic you could use to interpret the original model... this just makes it clearer that, in totality, you're making a defensible simplification, because the end result is still a working model.
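That "exactly 0" property is something you can actually check mechanically (again a toy sketch of my own, not the article's method): wherever a weight is 0, perturbing the corresponding input provably cannot change that output.

```python
import numpy as np

x = np.array([0.5, -1.2, 2.0])
W = np.array([[1, 0, 1],
              [0, 0, 1]])    # quantized weights; the zeros prune connections

y = W @ x

# Input 1 has a zero weight in every row, so no output can depend on it:
x2 = x.copy()
x2[1] += 100.0               # arbitrary large perturbation of the pruned input
assert np.allclose(W @ x2, y)
```

With float weights you can only say an input has a *small* effect; with exact zeros you get a hard guarantee.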

It also seems like you could make a lot of mathematical assertions about a one-bit model that would be harder to make otherwise. Like maybe you could start thinking of a model as an equation, a _particular_ (though giant) equation, and look at its properties and consider symbolic transformations of that equation.
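To make the "model as an equation" idea concrete (a toy illustration; the variable names are mine): with weights in {-1, 0, 1}, each pre-activation is literally a signed subset-sum of the inputs, so you can print the layer as a closed-form expression.

```python
W = [[1, -1, 0],
     [0, 1, 1]]              # ternary weights for a tiny 2-output layer
names = ["x0", "x1", "x2"]

def row_to_formula(row):
    """Render one output's pre-activation as a signed sum of input symbols."""
    terms = []
    for w, name in zip(row, names):
        if w == 1:
            terms.append(f"+ {name}")
        elif w == -1:
            terms.append(f"- {name}")
    return " ".join(terms).lstrip("+ ").strip()

for i, row in enumerate(W):
    print(f"y{i} = {row_to_formula(row)}")
# prints:
# y0 = x0 - x1
# y1 = x1 + x2
```

A real model adds nonlinearities between layers, but each linear piece stays this kind of plain symbolic sum, which is what makes transformations of the equation thinkable.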



A comment I really liked on a previous post about ternary quantization highlighted that what {-1, 0, 1} actually measures is inverse correlation, no correlation, and positive correlation.
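Under that reading, a simple threshold ternarizer (a sketch with an assumed cutoff, not the article's actual scheme) keeps only the sign of strongly correlated weights and zeroes out the rest:

```python
import numpy as np

def ternarize(w, threshold=0.5):
    """Map float weights to {-1, 0, 1}: keep the sign where |w| is large, else 0."""
    q = np.zeros_like(w, dtype=int)
    q[w > threshold] = 1      # strong positive correlation
    q[w < -threshold] = -1    # strong inverse correlation
    return q                  # near-zero weights -> "no correlation"

w = np.array([0.9, -0.1, -1.3, 0.2])
print(ternarize(w))           # -> [ 1  0 -1  0]
```

Real quantizers pick the threshold (and a per-tensor scale) from the weight statistics rather than a fixed 0.5, but the {-1, 0, 1} interpretation is the same.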


I like the decision tree analogy




