Hacker News

Note that if you're actually using CRCs in any serious way, you probably want to use CPU intrinsics rather than generic C code.


We used CPU intrinsics in version 9 of GNU coreutils, for the cksum utility. From the release notes:

"cksum [-a crc] is now up to 4 times faster by using a slice by 8 algorithm, and at least 8 times faster where pclmul instructions are supported."

Implementing that portably is a bit tricky, as one must consider:

  - supporting compilers that may lack the intrinsics
  - checking at runtime whether the current CPU supports the instructions
  - restricting the compiler options that enable the instructions to their own lib, so they don't leak into unprotected code
    - automake requires using a separate lib for this rather than just a separate compilation unit

BTW we also introduced AVX intrinsics for `wc -l`.
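
For readers wondering what the runtime check in the list above might look like, here is a minimal sketch. It assumes GCC or Clang on x86, where `__builtin_cpu_supports` is available; on any other target it conservatively falls back to the generic path. The names `have_pclmul`, `crc32_generic`, `crc32_pclmul` and `select_crc32` are made up for illustration, and the "accelerated" function is only a placeholder that forwards to the generic code, not a real PCLMUL implementation and not the coreutils code.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if the running CPU advertises carry-less multiply, else 0.
   Assumption: GCC or Clang on x86; everywhere else we report 0. */
static int have_pclmul(void)
{
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    return __builtin_cpu_supports("pclmul") ? 1 : 0;
#else
    return 0;
#endif
}

typedef uint32_t (*crc_fn)(uint32_t crc, const uint8_t *buf, size_t len);

/* Portable bit-at-a-time CRC-32 (reflected polynomial 0xEDB88320). */
static uint32_t crc32_generic(uint32_t crc, const uint8_t *buf, size_t len)
{
    crc = ~crc;
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int k = 0; k < 8; k++)
            crc = (crc & 1u) ? (crc >> 1) ^ 0xEDB88320u : crc >> 1;
    }
    return ~crc;
}

/* Hypothetical accelerated variant. In a real build this would live in its
   own library compiled with -mpclmul; here it is only a stand-in. */
static uint32_t crc32_pclmul(uint32_t crc, const uint8_t *buf, size_t len)
{
    return crc32_generic(crc, buf, len);
}

/* Pick an implementation once, at startup. */
static crc_fn select_crc32(void)
{
    return have_pclmul() ? crc32_pclmul : crc32_generic;
}
```

A caller would do `crc_fn f = select_crc32();` once and then call `f(0, buf, len)` for each buffer, so the feature check is not repeated per call.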


A lot of CRCs go into embedded, no builtins available. On PCs, standard 8/16/32-bit CRCs mostly suffice.

For embedded, good luck finding a 12-bit CRC algorithm, and even if you find one, it's for a non-optimal, non-Koopman CRC [1].

While rolling your own is doable, it's also a risky endeavor, as any error hurts the product like forever.

With this generator, the problem of being stuck with non-Koopman CRCs is finally solved.

[1] https://users.ece.cmu.edu/~koopman/crc/
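
To give an idea of what "rolling your own" for an odd width involves, here is a rough sketch of a bit-at-a-time CRC that takes the width and polynomial as parameters. It is not the generator's output. The function name `crc_bitwise` is made up, it handles widths from 8 to 32 bits only, and it does no reflection or final XOR. The 12-bit polynomial mentioned in the usage note is just an example value, not a Koopman recommendation.

```c
#include <stddef.h>
#include <stdint.h>

/* Bit-at-a-time, MSB-first CRC for any width from 8 to 32 bits.
   No input/output reflection and no final XOR are applied. */
static uint32_t crc_bitwise(unsigned width, uint32_t poly, uint32_t init,
                            const uint8_t *data, size_t len)
{
    const uint32_t topbit = (uint32_t)1 << (width - 1);
    const uint32_t mask = (topbit << 1) - 1u; /* wraps to all ones for width 32 */
    uint32_t crc = init & mask;

    for (size_t i = 0; i < len; i++) {
        /* Fold the next byte into the top 8 bits of the register. */
        crc ^= (uint32_t)data[i] << (width - 8);
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & topbit) ? (crc << 1) ^ poly : crc << 1;
        crc &= mask; /* drop anything shifted past the register width */
    }
    return crc;
}
```

For instance, `crc_bitwise(16, 0x1021, 0xFFFF, msg, len)` gives the common CCITT-style 16-bit CRC, while `crc_bitwise(12, 0x80F, 0, msg, len)` runs the same loop at 12 bits. The risky part is rarely the loop itself; it is getting the polynomial, init value and bit order to match whatever the other end expects.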


> A lot of CRCs go into embedded, no builtins available.

Indeed. It's all about generating lookup tables from a CRC's polynomial definition.
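
As a small illustration of that table generation, here is a sketch for the common reflected CRC-32. The names `crc_table`, `crc32_make_table` and `crc32_table_driven` are mine, and the init and final XOR values are hard-coded to the usual CRC-32 ones; a real generator would take those from the CRC definition.

```c
#include <stddef.h>
#include <stdint.h>

static uint32_t crc_table[256];

/* Build the 256-entry table for a reflected (LSB-first) 32-bit polynomial:
   entry n is the CRC register after processing the single byte n. */
static void crc32_make_table(uint32_t poly)
{
    for (uint32_t n = 0; n < 256; n++) {
        uint32_t c = n;
        for (int k = 0; k < 8; k++)
            c = (c & 1u) ? (c >> 1) ^ poly : c >> 1;
        crc_table[n] = c;
    }
}

/* Byte-at-a-time CRC-32 using the table (init and final XOR 0xFFFFFFFF). */
static uint32_t crc32_table_driven(const uint8_t *buf, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++)
        crc = crc_table[(crc ^ buf[i]) & 0xFFu] ^ (crc >> 8);
    return crc ^ 0xFFFFFFFFu;
}
```

Call `crc32_make_table(0xEDB88320u)` once, or emit the table as a `const` array at build time, which is what you'd normally want on a small MCU so it lands in flash instead of RAM.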


On the ATmega we usually just use the built-in AES coprocessor. Much faster and smaller than the software CRC.


You probably should be more specific than that.

A CRC is polynomial arithmetic over the Galois field GF(2), and today's CPUs have polynomial multiply (PMULL on ARM) or carry-less multiply (PCLMULQDQ on x86). These instructions handle the "tough" part of the CRC in a handful of clock cycles.
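
To make concrete what those instructions compute, here is a plain-C sketch of a carry-less multiply: an ordinary shift-and-add multiply where the adds are replaced by XOR, so no carries propagate. The function name `clmul32` is made up and this is a slow software model for illustration only; the hardware instructions do the same thing on 64-bit operands in one go.

```c
#include <stdint.h>

/* Carry-less (polynomial) multiply of two 32-bit values over GF(2).
   Each set bit i of b contributes a copy of a shifted left by i,
   and the copies are combined with XOR instead of addition. */
static uint64_t clmul32(uint32_t a, uint32_t b)
{
    uint64_t acc = 0;
    for (int i = 0; i < 32; i++)
        if ((b >> i) & 1u)
            acc ^= (uint64_t)a << i;
    return acc;
}
```

For example, `clmul32(3, 3)` is 5 rather than 9, because (x + 1)(x + 1) = x^2 + 1 once the two middle x terms cancel under XOR.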


That's my point. Your CRCs will be much faster if you use the cpu-specific instructions rather than relying on generic C code.


Hardware CRC peripherals are often limited to a single polynomial or are restricted in how they invert/reflect the data.
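
When the peripheral's bit order doesn't match the CRC you need, the usual workaround is to reflect data or results in software. A small sketch of such a helper follows; the name `reflect_bits` is made up, and whether you need to reflect the input bytes, the final value, or both depends on the particular peripheral.

```c
#include <stdint.h>

/* Reverse the low `width` bits of `value` (width from 1 to 32). */
static uint32_t reflect_bits(uint32_t value, unsigned width)
{
    uint32_t out = 0;
    for (unsigned i = 0; i < width; i++) {
        out = (out << 1) | (value & 1u);
        value >>= 1;
    }
    return out;
}
```

The same helper also converts between the two ways a polynomial gets written down: `reflect_bits(0x04C11DB7, 32)` yields `0xEDB88320`, the reflected form of the CRC-32 polynomial.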


Modern processors often contain polynomial multiply instructions or similar, which can be used for arbitrary CRC polynomials.


The C code looks really nice and very generic. Since it's a generator, would it be too difficult to make use of those CPU instructions during code generation? It could then even emit architecture-specific code, which looks like a plus to me.



