I haven't yet read the Moller and Granlund paper, but its narrowing division using a precomputed reciprocal would be a natural fit for libdivide. (libdivide does have a narrowing divide, but it is based on Algorithm D.)

Regarding the second question, it is possible to be off by 2. Consider (base 10) 500 ÷ 59. The estimated quotient qhat is 50 ÷ 5 = 10, but the true digit is 8. So if our partial remainder is 50, we'll be off by 2 in the second digit.
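
To make the arithmetic concrete, here is a minimal Python sketch of that example. The helper name `estimate_digit` is hypothetical, and it assumes a three-digit dividend and a two-digit divisor; it forms the estimate from the leading digits only, with no clamping or correction step, so it mirrors the numbers above rather than any particular library's code.

```python
def estimate_digit(dividend: int, divisor: int, base: int = 10) -> tuple[int, int]:
    """Return (qhat, q) for a 3-digit dividend and a 2-digit divisor.

    qhat is the estimate formed from the top two digits of the dividend
    and the top digit of the divisor; q is the true quotient digit.
    Illustrative helper only (hypothetical name, no clamping of qhat).
    """
    top_dividend = dividend // base   # 500 -> 50
    top_divisor = divisor // base     # 59  -> 5
    qhat = top_dividend // top_divisor
    q = dividend // divisor
    return qhat, q


qhat, q = estimate_digit(500, 59)
print(qhat, q, qhat - q)  # 10 8 2
```

The low digit of the divisor (the 9 in 59) is what the estimate ignores, and it is large enough here to push the estimate two above the true digit.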