It's infinitely faster if you have control over the data layout in your application, which most likely means you are developing your application in C/C++, or maybe Rust. In JS you don't have that control, so accessing and writing the data is slow (you would need [de]serialization there to write it into a byte array). In fact, any serialization format in JS is slower than JSON, since JSON is natively implemented in the JS VMs while the others are not.
No no, this is a common misunderstanding about Cap'n Proto. It does not take your regular in-memory data structures from your regular programming language (even C++) and put them on the wire. What it does is define its own specific data layout, which happens to be appropriate both for in-memory random access and for transmission.
Cap'n Proto generates classes which wrap a byte buffer and give you accessor methods that read the fields straight out of the buffer.
That actually works equally well in C++ and JavaScript.
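To make that concrete, here is a minimal sketch of what using the generated C++ reader looks like, assuming a hypothetical schema `struct Person { name @0 :Text; age @1 :UInt16; }` compiled to `person.capnp.h`:

```cpp
#include <cstdint>
#include <capnp/message.h>
#include <capnp/serialize.h>
#include "person.capnp.h"  // generated from the hypothetical schema above

void inspect(kj::ArrayPtr<const capnp::word> buffer) {
  // No parse step: the reader is just a view over the buffer.
  capnp::FlatArrayMessageReader message(buffer);
  Person::Reader person = message.getRoot<Person>();

  // Each accessor reads the field straight out of the buffer
  // at a fixed offset, on demand.
  uint16_t age = person.getAge();
  capnp::Text::Reader name = person.getName();
}
```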
OK, but these accessor methods will still have very different performance. In C++, copying a UTF-8 string out of a byte array into an std::string is super fast, whereas in JS it's really slow: you need to read from an ArrayBuffer, convert the code points to UTF-16 in JS, and then store them in a string (which is not efficient, since JS strings are immutable). In Node you could at least speed that up through a native extension, but even then it would most likely be slower than JSON; in the browser it would be slower in any case. But that's really a peculiarity of JS.
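For reference, the C++ side of that comparison is essentially a single copy; a sketch (the function name `copyUtf8` is just illustrative):

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// std::string is just a byte container, so copying UTF-8 out of a
// buffer needs no transcoding -- it's effectively one memcpy.
std::string copyUtf8(const uint8_t* data, size_t len) {
  return std::string(reinterpret_cast<const char*>(data), len);
}
```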
In general, I then think the difference (for non-C++) between your method and the others (protobuf, Thrift, ...) is that yours pays the cost of deserializing a field at the moment the field is accessed, while the others deserialize all fields at once. But in the end it should come to the same cost if I need all the fields, e.g. in order to convert the data into a plain Java/JavaScript/C#/... object, or am I missing something there? For C++ I absolutely believe that you can have a byte-array-backed proxy object with accessor methods that perform like access to native C++ structures.
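To illustrate the "need all fields" case, a sketch reusing the hypothetical `Person` schema from above (`PlainPerson` and `toPlain` are made-up names):

```cpp
#include <cstdint>
#include <string>
#include "person.capnp.h"  // hypothetical generated header from above

// A plain in-memory object, as you'd build for Java/JS/C#.
struct PlainPerson {
  std::string name;
  uint16_t age;
};

PlainPerson toPlain(Person::Reader r) {
  // Touching every accessor pays the per-field cost all at once,
  // which is the scenario where lazy and upfront parsing seem to converge.
  return PlainPerson{std::string(r.getName().cStr(), r.getName().size()),
                     r.getAge()};
}
```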
Even if you access every field, Cap'n Proto's approach should still be faster in theory because:
- Making one pass instead of two is better for the cache. When dealing with messages larger than the CPU cache, memory bandwidth can easily be the program's main bottleneck, at which point using one pass instead of two can actually double your performance.
- Along similar lines, when you parse a protobuf upfront, you have to parse it into some intermediate structure. That intermediate structure takes memory, which adds cache pressure. Cap'n Proto has no intermediate structure.
- Protobuf and many formats like it are branch-heavy. For example, protobuf likes to encode integers as "varints" (variable-width integers), which require a branch on every byte to check whether it's the last byte; see the sketch after this list. Also, protobuf is a tag-value stream, which means the parser has to be a switch-in-a-loop, which is a notoriously CPU-unfriendly pattern. Cap'n Proto uses fixed widths and fixed offsets, so there are very few branches. As a result, an upfront Cap'n Proto parser would be expected to outperform a protobuf parser. The fact that parsing happens lazily at time of use is a bonus.
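A standalone sketch of the varint-versus-fixed-width difference (not actual protobuf or Cap'n Proto library code; `decodeVarint` and `readFixed64` are illustrative):

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Protobuf-style varint decode: the loop must branch on every byte
// to check the continuation bit.
uint64_t decodeVarint(const uint8_t* p, size_t* bytesRead) {
  uint64_t result = 0;
  unsigned shift = 0;
  size_t i = 0;
  for (;;) {
    uint8_t byte = p[i++];
    result |= uint64_t(byte & 0x7f) << shift;
    if ((byte & 0x80) == 0) break;  // last byte? -> a branch per byte
    shift += 7;
  }
  *bytesRead = i;
  return result;
}

// Cap'n Proto-style fixed-width read: one unconditional load.
uint64_t readFixed64(const uint8_t* p) {
  uint64_t v;
  std::memcpy(&v, p, sizeof(v));  // a single mov on little-endian targets
  return v;
}
```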
All that said, it's true that if you are reading every field of your structure, then Cap'n Proto serialization is more of an incremental improvement, not a paradigm shift.