Even if you access every field, Cap'n Proto's approach should still be faster in...

Even if you access every field, Cap'n Proto's approach should still be faster in theory because:

- Making one pass instead of two is better for the cache. When dealing with messages larger than the CPU cache, memory bandwidth can easily be the program's main bottleneck, at which point using one pass instead of two can actually double your performance.

- Along similar lines, when you parse a protobuf upfront, you have to parse it into some intermediate structure. That intermediate structure takes memory, which adds cache pressure. Cap'n Proto has no intermediate structure.

- Protobuf and many formats like it are branch-heavy. For example, protobuf likes to encode integers as "varints" (variable-width integers), which require a branch on every byte to check if it's the last byte. Also, protobuf is a tag-value stream, which means the parser has to be a switch-in-a-loop, which is a notoriously CPU-unfriendly pattern. Cap'n Proto uses fixed widths and fixed offsets, which means there are very few branches. As a result, an upfront Cap'n Proto parser would be expected to outperform a Protobuf parser. The fact that parsing happens lazily at time of use is a bonus.

All that said, it's true that if you are reading every field of your structure, then Cap'n Proto serialization is more of an incremental improvement, not a paradigm shift.