Which algorithm is best depends on the use case. As of now, zstd offers best-in-class compression for a wider variety of use cases than lz4. lz4 (created by the same author) still wins for high-throughput software compression, yes. But zstd --fast=4 or --fast=5 is getting pretty close.
It's not obvious to me what the relevant measurements are on the zstd side, but I'm pretty sure lz4 wins considerably where code size and RAM footprint are major considerations, as in some bootloader and embedded firmware situations.