> BLAKE is faster when using software to perform the hash
Is BLAKE3 still faster than SHA-256 when using the CPU's specialized instructions? I think most modern desktop CPUs have built-in instructions for SHA-256.
I’m guessing when people compare BLAKE 3 to SHA 256 they’re comparing software to software, but this wouldn’t be the case in reality?
I haven’t seen any benchmarks for BLAKE3 vs. the Intel/AMD SHA extensions. My guess is that Intel hardware-accelerated SHA-256 will be faster than BLAKE3 running in software for most real-world uses.
I can tell you this much: It is only with Ice Lake, which was released in the last year, that mainstream Intel chips finally got native high-speed SHA-NI support. Coffee Lake and Comet Lake, which are still the CPUs in a lot of new laptops being sold right now, do not support SHA-NI.
It's possible that Blake3 might be faster than accelerated SHA-256 on large inputs, where Blake3 can maximally leverage its SIMD friendliness. OTOH, Blake3 really pushes the envelope in terms of minimal security margin. Performance isn't everything. SHA-3 is so slow because NIST wanted a failsafe.
NOTE: /proc/cpuinfo shows sha_ni detection, and the apt-get source of this version of OpenSSL confirms SHA extension support in the source code, but I didn't confirm that it was actually being used at runtime.
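For anyone who wants to repeat the detection step, here is a minimal sketch in Python, assuming a Linux x86 machine where /proc/cpuinfo has a `flags` line. The helper names `cpu_flags` and `has_sha_ni` are made up for illustration. Like the note above, it only tells you the CPU advertises the extension, not that OpenSSL (or anything else) actually uses it at runtime.

```python
def cpu_flags(cpuinfo_text):
    """Return the feature flags from the first 'flags' line of /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def has_sha_ni(cpuinfo_text):
    """True if the CPU advertises the SHA extensions (the 'sha_ni' flag on Linux)."""
    return "sha_ni" in cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            text = f.read()
    except OSError:
        # Not Linux, or /proc is unavailable: treat as "unknown / not advertised".
        text = ""
    print("sha_ni advertised:", has_sha_ni(text))
```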
This is based on the parent’s numbers with a fudge factor to account for Blake3 being a faster version of blake2s256 (i.e. the 32-bit version of Blake2 which is the only version in Blake3)
Of course, this does not take into account that Blake3 has tree hashing and other modes which scale better to multiple cores.
(Edit: updated figures; I need to scale up Blake2s256 not Blake2b512)
The BLAKE3 tree mode also takes advantage of SIMD parallelism on a single core, which ends up being a larger effect than the reduced number of rounds. At 2-4 KiB of input (depending on the implementation) it's 2x faster than BLAKE2s on my laptop. Where AVX2 and AVX-512 are supported, those kick in at 8 KiB and 16 KiB of input respectively, widening the difference further. The red bar chart at https://github.com/BLAKE3-team/BLAKE3 is a single-threaded measurement on a machine that supports AVX-512.
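If you want rough numbers for your own machine, something like the sketch below measures single-threaded throughput at a few input sizes. Big caveats: Python's hashlib has no BLAKE3, so this uses `blake2s` as a stand-in (it will not show the SIMD tree-mode speedup described above), and whether hashlib's SHA-256 ends up using SHA-NI depends on the OpenSSL build underneath, which this does not check. The function name `throughput_mib_s` and the sizes are my own choices. As far as I know the third-party `blake3` package on PyPI has a hashlib-like interface, so it could be timed the same way, but I have not done that here.

```python
import hashlib
import time


def throughput_mib_s(name, size, min_seconds=0.2):
    """Hash a `size`-byte buffer repeatedly for about `min_seconds`; return MiB/s."""
    data = bytes(size)                # zero-filled input buffer
    hasher = getattr(hashlib, name)   # e.g. hashlib.sha256, hashlib.blake2s
    count = 0
    start = time.perf_counter()
    while True:
        hasher(data).digest()
        count += 1
        elapsed = time.perf_counter() - start
        if elapsed >= min_seconds:
            break
    return count * size / elapsed / (1024 * 1024)


if __name__ == "__main__":
    for size in (64, 1024, 16 * 1024, 1024 * 1024):
        sha = throughput_mib_s("sha256", size)
        b2s = throughput_mib_s("blake2s", size)
        print(f"{size:>8} bytes  sha256 {sha:9.1f} MiB/s  blake2s {b2s:9.1f} MiB/s")
```

Small inputs mostly measure Python call overhead, so the larger sizes are the ones worth looking at.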