
The irony is that a single log computation is going to take longer than the loop. (No idea if implementing a log approximation involves loops either.)
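For concreteness, here is a minimal sketch of the two approaches being compared, assuming the task is counting the decimal digits of a non-negative `int` (the thread does not show the original code, so the class and method names below are made up for illustration). The log version is restricted to `int` on purpose: for large `long` values the conversion to `double` rounds, and the result can be off by one.

```java
final class DigitCountLoopVsLog {
    // Counts decimal digits by repeated division; assumes n >= 0.
    static int byLoop(int n) {
        int digits = 1;
        while (n >= 10) {
            n /= 10;
            digits++;
        }
        return digits;
    }

    // Counts decimal digits with one floating-point log10 call; assumes n >= 0.
    // log10(0) is -Infinity, so zero is handled separately.
    static int byLog(int n) {
        if (n == 0) {
            return 1;
        }
        return (int) Math.floor(Math.log10(n)) + 1;
    }
}
```

The loop runs at most nine iterations for an `int`, which is why a single `log10` call is not an obvious win.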


https://code.woboq.org/userspace/glibc/sysdeps/ieee754/dbl-6...

I don't see any loops, but there are a number of branches. The code could probably be generalized using loops to support arbitrary precision, but I think any optimized implementation for a specific precision will have unrolled them.


Sounds like a textbook example of theory being misaligned with reality.


Waiting for someone to post some fast-inverse-sqrt-esque hack to compute the logarithm. Although in Java that's unlikely to be faster.

I wonder how fast it'd be to convert to string and count digits.
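One well-known bit trick in that spirit, sketched here under the assumption that the goal is the decimal digit count of a non-negative `int`: take the bit length from `Integer.numberOfLeadingZeros`, scale it by an integer approximation of log10(2) (1233/4096), and correct with a single table lookup. The class name and table are illustrative, not anything from the article.

```java
final class DigitCountBitTrick {
    // Powers of ten that fit in an int.
    private static final int[] POW10 = {
        1, 10, 100, 1_000, 10_000, 100_000,
        1_000_000, 10_000_000, 100_000_000, 1_000_000_000
    };

    // Counts decimal digits without loops or floating point; assumes n >= 0.
    static int digits(int n) {
        if (n == 0) {
            return 1;
        }
        // Bit length of n, i.e. floor(log2(n)) + 1.
        int bits = 32 - Integer.numberOfLeadingZeros(n);
        // 1233 / 4096 approximates log10(2), so t estimates the digit count
        // and is either exact or one too small.
        int t = (bits * 1233) >>> 12;
        return n < POW10[t] ? t : t + 1;
    }
}
```

Whether this beats the plain loop on a JIT-compiled JVM would need to be measured; it only shows that a log-like answer is available without calling `Math.log10`.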


> I wonder how fast it'd be to convert to string and count digits.

When you convert the number to a string, you're really transforming it into a decimal format, which is the domain where you should be solving the problem. Otherwise you're doing some sort of transformation in the binary domain and then hoping to pull the answer out of a hat when you do the final conversion to decimal.
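The string approach is short enough to write out. This is only a sketch, assuming non-negative inputs (a leading minus sign would otherwise be counted); the class name is hypothetical.

```java
final class DigitCountByString {
    // Converts to decimal text and counts characters; assumes n >= 0.
    static int digits(long n) {
        return Long.toString(n).length();
    }
}
```

Unlike a `double`-based `log10`, this stays exact across the whole non-negative `long` range, at the cost of allocating a string.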


Many architectures include a logarithm instruction. Does Java use that if available? Would it make a difference?


Many architectures? What would they be?

Regardless of whether they contain a logarithm instruction or not, how many architectures are there these days? Outside of truly embedded computing I can only come up with two: Intel and ARM. Counting POWER and RISC-V is probably a bit of a stretch already.


x86 has two logarithm instructions, FYL2X and FYL2XP1.

FYL2X takes two arguments, Y and X, and computes Y * log2(X).

FYL2XP1 takes two arguments, Y and X, and computes Y * log2(X + 1).

As you note, x86 and ARM are by far the most used, and I'd guess that when it comes to Java you are more likely to be running on x86 than ARM, so I figured it was arguable to say "many" when the only one I was sure had a logarithm instruction was x86.
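To illustrate why those instructions take a Y operand: choosing Y = log10(2) turns the base-2 result into a base-10 logarithm. The sketch below only models the arithmetic in plain Java; the method names mirror the mnemonics for readability and do not emit the x87 instructions or reproduce their precision.

```java
final class X87LogModel {
    private static final double LN2 = Math.log(2.0);

    // Models FYL2X: y * log2(x).
    static double fyl2x(double y, double x) {
        return y * (Math.log(x) / LN2);
    }

    // Models FYL2XP1: y * log2(x + 1); log1p keeps accuracy for small x.
    static double fyl2xp1(double y, double x) {
        return y * (Math.log1p(x) / LN2);
    }

    // log10(x) expressed through the FYL2X model with Y = log10(2).
    static double log10ViaFyl2x(double x) {
        return fyl2x(Math.log10(2.0), x);
    }
}
```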


Those x86 instructions are “legacy floating point” instructions. As in, the x87 FPU. Benchmarks I’ve seen seem to indicate that the x87 “coprocessor” is slow compared to the SSE/AVX FPUs, and only exists for backwards compatibility. I don’t think SSE/AVX has a logarithm instruction, sadly, but there are intrinsics for them: `_mm256_log_pd` for example. Considering that intrinsic generates a “sequence” instead of a single instruction, I’d be curious how it compares to x87.



