I don't see any loops, but there are a number of branches. The code could probably be generalized using loops to support arbitrary precision, but I think any optimized implementation for a specific precision will have unrolled them.
> I wonder how fast it'd be to convert to string and count digits.
When you convert the number to a string you're really transforming it to a decimal format, which is the domain where you should be solving the problem. Otherwise you're doing some sort of transformation in the binary domain and then hoping to pull the answer out of a hat when you do the final conversion to decimal.
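To make the string approach concrete, here's a minimal sketch in Java (since the thread mentions it); `digitCount` is a made-up name, and this assumes non-negative inputs:

```java
public class DigitCount {
    // Count decimal digits by letting the runtime do the
    // binary-to-decimal conversion, then measuring the result.
    static int digitCount(long n) {
        // Assumes n >= 0; a sign would add a '-' character.
        return Long.toString(n).length();
    }
}
```

The cost is an allocation and a full decimal conversion, which is exactly why the log-based tricks discussed elsewhere in the thread exist.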
Regardless of whether they contain a logarithm instruction or not, how many architectures are there these days? Outside of truly embedded computing I can only come up with 2: Intel and ARM. Counting POWER and RISC-V is probably a bit of a stretch already.
x86 has two logarithm instructions, FYL2X and FYL2XP1.
FYL2X takes two arguments, Y and X, and computes Y log2(X).
FYL2XP1 takes two arguments, Y and X, and computes Y log2(X+1).
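As an aside, setting Y = log10(2) makes Y·log2(X) equal log10(X), which is the quantity you'd want for digit counting. A hedged sketch of that idea in plain Java (no x87 involved; `digitsViaLog` is a made-up name, and the floor-of-log trick needs a fix-up step because floating point can land on the wrong side of a power of ten):

```java
public class LogDigits {
    // Count decimal digits of n >= 0 via log10(n) = log10(2) * log2(n),
    // mirroring FYL2X with Y = log10(2).
    static int digitsViaLog(long n) {
        if (n < 10) return 1; // covers 0..9, and avoids log(0)
        double log2n = Math.log(n) / Math.log(2);
        int guess = (int) Math.floor(Math.log10(2) * log2n) + 1;
        // Fix-up: rounding error near exact powers of ten can put the
        // guess off by one, so compare against 10^(guess-1).
        long pow = 1;
        for (int i = 1; i < guess; i++) pow *= 10;
        if (pow > n) guess--;
        else if (guess < 19 && pow * 10 <= n) guess++; // guard: 10^19 overflows long
        return guess;
    }
}
```

The branchy fix-up is the price of doing the work in the binary domain, which is the "pulling the answer out of a hat" step mentioned above.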
As you note, x86 and ARM are by far the most used, and I'd guess that when it comes to Java you are more likely to be running on x86 than ARM, so I figured it was arguable to say "many" when the only one I was sure had a logarithm instruction was x86.
Those x86 instructions are “legacy floating point” instructions. As in, the x87 FPU. Benchmarks I’ve seen seem to indicate that the x87 “coprocessor” is slow compared to the SSE/AVX FPUs, and only exists for backwards compatibility. I don’t think SSE/AVX has a logarithm instruction, sadly, but there are intrinsics for them: `_mm256_log_pd` for example. Considering that intrinsic generates a “sequence” instead of a single instruction, I’d be curious how it compares to x87.