Why is it not intended that in-memory state be in Cap'n Proto / Protobuf objects? W...

kentonv · on May 3, 2017

The classes generated by Cap'n Proto and Protobuf are 100% public and are limited to the exact structures supported by the respective languages. That means that if you decide one day that your state needs to include, say, a queue, or if you want to encapsulate some of your state to give a cleaner API to callers, you can't, unless you go all the way and wrap everything. Inevitably, if you've been building up your internal APIs in terms of protobuf/capnp types all along, you're going to be resistant to rewriting it and will probably come up with some ugly hack instead, and over time these hacks will pile up.
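To make the "wrap everything" option concrete, here is a minimal sketch in plain C++. `JobListMsg` is a hypothetical stand-in for a generated message class (all fields public, only schema-style shapes), and `JobState` is a hand-written class that keeps a real queue internally and converts only at the serialization boundary. None of these names come from Protobuf or Cap'n Proto.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a generated message class: every field is
// public, and only schema-expressible shapes (scalars, strings, lists) exist.
struct JobListMsg {
  std::vector<std::string> pending;
  std::uint64_t completedCount = 0;
};

// Hand-written internal state: free to use a queue and to hide its fields.
class JobState {
public:
  void enqueue(std::string job) { pending_.push_back(std::move(job)); }

  // Pops the oldest job into `out`; returns false when the queue is empty.
  bool completeNext(std::string& out) {
    if (pending_.empty()) return false;
    out = std::move(pending_.front());
    pending_.pop_front();
    ++completed_;
    return true;
  }

  std::size_t pendingCount() const { return pending_.size(); }

  // Conversion to the message shape happens only when serializing.
  JobListMsg toMessage() const {
    JobListMsg msg;
    msg.pending.assign(pending_.begin(), pending_.end());
    msg.completedCount = completed_;
    return msg;
  }

  // Rebuilds the internal state from a previously produced message.
  static JobState fromMessage(const JobListMsg& msg) {
    JobState s;
    s.pending_.assign(msg.pending.begin(), msg.pending.end());
    s.completed_ = msg.completedCount;
    return s;
  }

private:
  std::deque<std::string> pending_;
  std::uint64_t completed_ = 0;
};
```

Callers only ever see `enqueue` and `completeNext`, so swapping the `std::deque` for something else later does not ripple through the code base. The cost is the hand-written `toMessage`/`fromMessage` pair.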

With that said, using protobufs for internal state is not an uncommon practice and if you don't care about cleanliness and just want to pound out some code quickly, sometimes it can work well.

Cap'n Proto has an additional disadvantage here in that its zero-copy nature requires arena allocation, in order to make sure all the objects are allocated contiguously so that they can be written out all at once. This actually makes allocating memory for Cap'n Proto objects much faster than for native objects -- but you can't delete anything except by deleting the entire message. So if you have a data structure that is gradually gaining and losing sub-objects over time, in Cap'n Proto you'll see a memory leak, as the old objects aren't freed up. You can work around this by occasionally copying the entire data structure into a new message and deleting the old one -- essentially "garbage collecting". But it's rather inconvenient.
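The leak and the "garbage collecting" workaround can be modeled with a toy bump allocator. To be clear, this is not the Cap'n Proto API: `Arena`, `Entry` and `Store` are made-up names, and the only thing the sketch tries to capture is "allocate by appending, free only by dropping the whole arena".

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Toy bump allocator standing in for a message arena (NOT Cap'n Proto's API).
class Arena {
public:
  // Appends bytes and returns their offset; there is no per-object free.
  std::size_t append(const std::string& bytes) {
    std::size_t offset = buf_.size();
    buf_.insert(buf_.end(), bytes.begin(), bytes.end());
    return offset;
  }
  std::string read(std::size_t offset, std::size_t len) const {
    assert(offset + len <= buf_.size());
    return std::string(buf_.data() + offset, len);
  }
  std::size_t bytesUsed() const { return buf_.size(); }

private:
  std::vector<char> buf_;
};

// A live sub-object is just a pointer (offset + length) into the arena.
struct Entry {
  std::size_t offset;
  std::size_t len;
};

class Store {
public:
  void add(const std::string& value) {
    live_.push_back({arena_.append(value), value.size()});
  }
  // Removing an entry only forgets the pointer; its bytes stay in the arena.
  void removeFirst() {
    assert(!live_.empty());
    live_.erase(live_.begin());
  }
  // "Garbage collect": copy live entries into a fresh arena, drop the old one.
  void compact() {
    Arena fresh;
    for (Entry& e : live_) {
      e.offset = fresh.append(arena_.read(e.offset, e.len));
    }
    arena_ = std::move(fresh);
  }
  std::string get(std::size_t i) const {
    return arena_.read(live_[i].offset, live_[i].len);
  }
  std::size_t liveCount() const { return live_.size(); }
  std::size_t bytesUsed() const { return arena_.bytesUsed(); }

private:
  Arena arena_;
  std::vector<Entry> live_;
};
```

After a `removeFirst()`, `bytesUsed()` stays where it was even though `liveCount()` dropped; only `compact()` brings it back down, at the price of copying everything still alive.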

This is actually one reason I want to extend the Cap'n Proto C++ API to generate POCS (Plain Old C Structs) for each type, in addition to the current zero-copy readers/builders. You could use the POCS for in-memory state that you mutate over time, then you could dump it into a message when needed (requiring one copy, but it should still be faster than protobuf encoding).
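As a rough illustration of that workflow (mutate plain structs, copy into a flat buffer only when needed), here is a sketch. `Counter`, `Stats`, `dump` and `load` are hypothetical, and the byte layout is invented for the example; it is not the Cap'n Proto wire format and not what a generated POCS API would actually look like.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical plain structs of the kind a generator might emit.
struct Counter {
  std::string name;
  std::uint32_t hits;
};

struct Stats {
  std::vector<Counter> counters;  // grows and shrinks with normal heap frees
};

// Writes a 32-bit value in little-endian order.
void putU32(std::vector<std::uint8_t>& out, std::uint32_t v) {
  for (int i = 0; i < 4; ++i) {
    out.push_back(static_cast<std::uint8_t>(v >> (8 * i)));
  }
}

// Reads a little-endian 32-bit value and advances `pos`.
std::uint32_t getU32(const std::vector<std::uint8_t>& in, std::size_t& pos) {
  assert(pos + 4 <= in.size());
  std::uint32_t v = 0;
  for (std::size_t i = 0; i < 4; ++i) {
    v |= static_cast<std::uint32_t>(in[pos + i]) << (8 * i);
  }
  pos += 4;
  return v;
}

// The one copy: walk the structs and append them to a contiguous buffer.
// Invented layout: [count] then, per counter, [name length][name bytes][hits].
std::vector<std::uint8_t> dump(const Stats& s) {
  std::vector<std::uint8_t> out;
  putU32(out, static_cast<std::uint32_t>(s.counters.size()));
  for (const Counter& c : s.counters) {
    putU32(out, static_cast<std::uint32_t>(c.name.size()));
    out.insert(out.end(), c.name.begin(), c.name.end());
    putU32(out, c.hits);
  }
  return out;
}

// Rebuilds the plain structs from a buffer produced by dump().
Stats load(const std::vector<std::uint8_t>& in) {
  Stats s;
  std::size_t pos = 0;
  std::uint32_t n = getU32(in, pos);
  for (std::uint32_t i = 0; i < n; ++i) {
    std::uint32_t len = getU32(in, pos);
    assert(pos + len <= in.size());
    Counter c;
    c.name.assign(reinterpret_cast<const char*>(in.data()) + pos, len);
    pos += len;
    c.hits = getU32(in, pos);
    s.counters.push_back(std::move(c));
  }
  return s;
}
```

Erasing a `Counter` from `Stats` releases its memory immediately, which is exactly what the arena-backed builders can't do; the flat buffer only exists for as long as you need to write it out.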

https://capnproto.org/roadmap.html#c-capn-proto-api-features

rkv · on May 3, 2017

After integrating protobufs in my application for messaging I decided to use a separate schema for storing the current state of the program. I.e., when state changes, the protobuf is updated and written to disk. When the program restarts, the state file is loaded into memory. I have not run into any problems doing this.

Edit: Your question is addressed here: https://news.ycombinator.com/item?id=14249367

na_ka_na · on May 3, 2017

Thanks, but I don't follow how that comment addresses my question. Is it that the cost of constructing Cap'n Proto / Protobuf objects is quite a bit higher than constructing objects defined natively?

kentonv · on May 4, 2017

> Is it that the cost of constructing Cap'n Proto / Protobuf objects is quite a bit higher than constructing objects defined natively?

I discussed this in more detail in reply to your first post, but just to be really clear on this:

No. In fact, for deeply-nested object trees, constructing a Cap'n Proto object can often be cheaper than a typical native object since it does less memory allocation. However, there are some limitations -- see my other reply.

(Constructing Protobuf objects, meanwhile, will usually be pretty much identical to POCS, since that's essentially what Protobuf objects are.)

There is a common myth that Cap'n Proto "just moves the serialization work to object-build time", but ultimately does the same amount of work. This is not true: Although you could describe Cap'n Proto as "doing the serialization at object build time", the work involved is not significantly different from building a regular in-memory object.
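A toy comparison of the allocation pattern for a deeply-nested chain, again in plain C++ rather than the real Cap'n Proto API. `NativeNode`, `NodeArena` and the two build functions are hypothetical; the sketch only counts allocations and makes no claim about actual timings.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Native style: every node lives in its own heap allocation.
struct NativeNode {
  int value;
  std::unique_ptr<NativeNode> child;
};

// Builds the chain 1 -> 2 -> ... -> depth, counting one allocation per node.
std::unique_ptr<NativeNode> buildNativeChain(int depth, std::size_t& allocs) {
  std::unique_ptr<NativeNode> head;
  for (int d = depth; d >= 1; --d) {
    std::unique_ptr<NativeNode> node(new NativeNode{d, std::move(head)});
    ++allocs;
    head = std::move(node);
  }
  return head;
}

// Arena style: nodes sit in one contiguous block and link by index.
struct ArenaNode {
  int value;
  std::int32_t child;  // index of the child node, or -1 for none
};

class NodeArena {
public:
  // The single up-front allocation for the whole tree.
  explicit NodeArena(std::size_t capacity) { nodes_.reserve(capacity); }

  std::int32_t add(int value) {
    assert(nodes_.size() < nodes_.capacity());  // never outgrow the block
    nodes_.push_back({value, -1});
    return static_cast<std::int32_t>(nodes_.size() - 1);
  }
  ArenaNode& at(std::int32_t i) { return nodes_[static_cast<std::size_t>(i)]; }
  std::size_t size() const { return nodes_.size(); }
  const ArenaNode* base() const { return nodes_.data(); }

private:
  std::vector<ArenaNode> nodes_;
};

// Builds the same chain inside the arena; returns the index of the head.
std::int32_t buildArenaChain(NodeArena& arena, int depth) {
  std::int32_t head = -1;
  for (int d = depth; d >= 1; --d) {
    std::int32_t idx = arena.add(d);
    arena.at(idx).child = head;
    head = idx;
  }
  return head;
}
```

The per-node work (set a value, link a child) is the same in both versions. The arena version replaces the per-node heap allocation with an append into an already reserved block, and `base()` stays unchanged throughout the build because nothing is reallocated.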