Off-topic question, but can some experts tell me why it is safe for `strlen()` and friends to use vector instructions when they can technically read out of bounds?
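Essentially because memory mapping and protection work at page granularity, not byte granularity. If an in-bounds read from a page isn't going to fault, a read from later in the same page isn't going to fault either (even if it is past the end of the particular object). Implementations rely on this by using aligned vector loads, or by checking for a page crossing first, so a single load never straddles a page boundary.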
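To make the page argument concrete, here is a rough sketch of the address arithmetic only. It assumes 4 KiB pages and 16-byte vector loads, and the names `PAGE_SIZE`, `VEC_SIZE`, `align_down` and `within_one_page` are made up for illustration; it is not taken from any particular libc.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumptions for illustration: 4 KiB pages and 16-byte vector loads. */
#define PAGE_SIZE 4096u
#define VEC_SIZE  16u

/* Round an address down to the start of its VEC_SIZE-aligned block. */
static uintptr_t align_down(uintptr_t addr)
{
    return addr & ~(uintptr_t)(VEC_SIZE - 1);
}

/* True if the n-byte range starting at addr stays within one page. */
static bool within_one_page(uintptr_t addr, uintptr_t n)
{
    return addr / PAGE_SIZE == (addr + n - 1) / PAGE_SIZE;
}
```

Because the page size is a multiple of the vector size, `within_one_page(align_down(a), VEC_SIZE)` holds for every address `a`, whereas an unaligned 16-byte load starting at, say, `0x1ffd` would spill into the next page. An aligned load that contains at least one valid byte of the string therefore touches only a page that is already known to be mapped.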
Implementations using 32- or 64-byte (256- or 512-bit) vector extensions would run afoul of 16-byte granularity. While it is not common yet, ARM SVE allows vector sizes larger than 128 bits -- e.g., Graviton3 has 256-bit SVE and Fujitsu A64FX has 512-bit SVE. (x86 has had 256- and 512-bit vector instructions for some time, but current CHERI development seems to be on ARM.)
I think you might be confusing the tracking of validity of capabilities themselves (which could indeed be at a 16-byte granularity for an otherwise 64-bit system) with the bounds of a capability, which can be as small as 1 byte.