It sounds really well designed. Great job Cap'n Proto, I've been following your work for some time!
I've also built another message format recently (yes I know, they are never ending). It can't do everything Cap'n Proto can do, although it shares a lot of the same values. One thing I chose to do is to have order preservation on types, which can be very useful. It does mean that the wire format is largely BE though. Anyway, that's an aside.
I'm curious what you think of Amazon Ion by the way?
I haven't looked at Ion before, but it appears to be similar to BSON or Msgpack in that it's a binary format that encodes field names as textual identifiers, to be "self-describing" and avoid the need for an external schema. I generally like schemas, because I like static typing. Of course, static types vs. dynamic types are another ancient flamewar and I'm unlikely to cover any new ground by stating arguments here. :)
Schemas are IMO the only way to go when you are transferring between heterogeneous languages, even when all languages involved are untyped.
Consider JavaScript talking to Common Lisp. Of course JSON has a canonical mapping to JavaScript, but it does not for Common Lisp. Should a JS array be a Lisp list or vector? Should Lisp's NIL be false or null? Should a JS object decode to an alist, plist, or hash-table? &ct.
Interesting; I've always written &ct, which Wikipedia informs me is considered "archaic". Thanks for that note, it will save me one character of typing 3 years from now when I've finally broken the muscle memory of the old way :)
Wikipedia probably mentioned it, but & is a combination of "e" and "t", or "et" - French (or perhaps Latin?) for "and". Thus &c is short for "etc", or "et cetera". I wasn't aware of the form "&ct", I presume it stands for "ET CeTera", but I'm not entirely clear on why that extra "T" would ever make sense. I doubt anyone ever used "etct"?
I think there is a lot in common between Common Lisp, Python and JavaScript in general.
For many years I was in the schemaless camp before JS came along. Then for a number of years I was in the self-describing camp, because I figured we'd be fools not to accept that JavaScript and JSON are pretty fundamental on the web, and everyone seemed to be passing around JSON. So in that period I was thinking that MsgPack was pretty damn good.
Recently I've switched back to the schemaless view, but with strict order preservation and richly typed fields. Very "tuple" based... so works well with Lisp and JavaScript but also C++. Highly inspired by Linda.
I don't do what Cap'n Proto does and lay out the fields and all that good stuff so that you can kind of memory map it onto structs.
That is nice and I understand the motivation for sure, and I have worked on systems that do that in the past with very good results, but currently my thinking is that compactness without additional compression is a good balance.
Also, since the protocol is order preserving (where it matters) you can do radix sort operations or hash maps on the server side extremely quickly. That was the ultimate motivating factor.
Inner tuples or BLOBs are length-prefixed of course so you can skip them, but basic types and strings are not. Strings are zero-terminated, while Ints/Floats/Doubles are BE and complement-encoded to preserve sort order, and Integers are also packed to minimum size.
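To make the order-preservation idea concrete, here is a minimal Python sketch of one common way to do it. It is not the actual wire format described above: the function names are made up for illustration, integers are kept at a fixed 8 bytes rather than packed to minimum size, and NaN is ignored. The point is only that comparing the encoded bytes gives the same order as comparing the original values.

```python
import struct

def encode_int64(n: int) -> bytes:
    # Assumption: fixed 8-byte width. Adding 2**63 flips the sign bit, so
    # negatives sort before positives; big-endian puts the high byte first.
    return struct.pack(">Q", n + (1 << 63))

def encode_double(x: float) -> bytes:
    # Reinterpret the IEEE 754 bits as an unsigned big-endian integer.
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    if bits & (1 << 63):
        bits ^= 0xFFFFFFFFFFFFFFFF  # negative: invert every bit
    else:
        bits |= 1 << 63             # non-negative: set the sign bit
    return struct.pack(">Q", bits)

def encode_str(s: str) -> bytes:
    # Zero-terminated UTF-8; an embedded NUL would break the ordering.
    data = s.encode("utf-8")
    if b"\x00" in data:
        raise ValueError("embedded NUL not supported in this sketch")
    return data + b"\x00"

def encode_key(fields: tuple) -> bytes:
    # Hypothetical tuple encoding: concatenate the encoded fields in order.
    out = b""
    for v in fields:
        if isinstance(v, int):
            out += encode_int64(v)
        elif isinstance(v, float):
            out += encode_double(v)
        elif isinstance(v, str):
            out += encode_str(v)
        else:
            raise TypeError(f"unsupported field type: {type(v).__name__}")
    return out
```

With something like this, a server can sort or index the raw bytes (a plain byte comparison or a radix sort) without decoding them first.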
Memory mapping is possible on this system too, but it's very "functional"... there's no attempt at pointer preservation. I don't go that far and Cap'n P seems to be preserving some of the semantics of ProtoBuf at least in that regard, which I'm sure is a good thing for many scenarios.
You could still do that with what I'm doing but it would be at the application level. Same with self-description actually, you could easily build something that looks like JSON if you wanted... either:
[["foo",1.0],["boo","cat"]...] etc
or
[[1.0,"cat"],["foo","boo"]]
the choice is really up to the application programmer.
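For illustration, the two layouts can be converted into each other at the application level. This is a small Python sketch with made-up function names, and it assumes the second layout means "a list of values followed by a list of keys", which is my reading of the example rather than something spelled out above.

```python
def pairs_to_columns(pairs):
    # [["foo", 1.0], ["boo", "cat"]] -> [[1.0, "cat"], ["foo", "boo"]]
    keys = [k for k, _ in pairs]
    values = [v for _, v in pairs]
    return [values, keys]

def columns_to_pairs(columns):
    # Inverse of pairs_to_columns: zip keys and values back together.
    values, keys = columns
    return [[k, v] for k, v in zip(keys, values)]
```

Either shape carries the same information; which one is nicer probably depends on whether the receiver wants to look fields up by name or process all values in one pass.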
That might be a bit "loose" for a lot of people to stomach, but it works very well for what I'm doing, has a lot of flexibility and packs really well.
Like I was suggesting, protocols are something of an art and I don't think we're at a final solution yet, which is why many people are constantly inventing new ones :-)
Also I might add that I very quickly became quite disenchanted with the lack of types in JSON; and I'm sure many of us have come to that same conclusion.
Really hope that it can be fixed at some point, but not holding my breath on that one.
I think it will take a major effort to reform that format, although I am hopeful now that we at least have Uint8Array and friends that are starting to expose a broader set of machine-friendly types.
I've actually seen people use types with JSON. JSON objects are clearly more than sufficient to represent the data you need, and you can enforce the schema in a library. It's just a much less efficient form of something like protobufs.
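As a rough sketch of what "enforce the schema in a library" can look like, here is a tiny Python example using only the standard `json` module. The schema shape (a dict of field name to expected Python type) and the `load_typed` name are assumptions made up for this illustration, not any particular library's API.

```python
import json

# Hypothetical schema: field name -> required Python type after decoding.
PERSON_SCHEMA = {"name": str, "age": int, "tags": list}

def load_typed(text, schema):
    obj = json.loads(text)
    if not isinstance(obj, dict):
        raise TypeError("expected a JSON object at the top level")
    for field, expected in schema.items():
        if field not in obj:
            raise KeyError(field)
        value = obj[field]
        # bool is a subclass of int in Python, so reject it explicitly.
        if not isinstance(value, expected) or (expected is int and isinstance(value, bool)):
            raise TypeError(f"{field}: expected {expected.__name__}, got {type(value).__name__}")
    return obj
```

The check runs after a full parse, which is part of why this tends to be less efficient than a format where the schema drives the decoding.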
I think it depends on what data you need based on the problem space, but yes it does suffice for many use cases.
The thing is that it is a lowest common denominator, and when you are dealing with high-perf, cross-language systems, it really isn't a good wire format or storage format.
It takes ages to parse, it's lossy, lacks commonly used types (or you have to annotate it with non standard attributes)... or worse guess the intention, and it's pretty verbose.
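A quick way to see the "lossy" and "lacks commonly used types" points is a round trip through Python's standard `json` module. This is only a small demonstration in one language; other languages lose different things.

```python
import json

original = {"point": (1, 2), 1: "int key"}
decoded = json.loads(json.dumps(original))
# The tuple comes back as a list and the integer key comes back as a string,
# so decoded == {"point": [1, 2], "1": "int key"}.

def can_serialize(value):
    # Raw bytes have no JSON representation, so json.dumps raises TypeError.
    try:
        json.dumps(value)
        return True
    except TypeError:
        return False
```

Here `can_serialize(b"\x00\x01")` returns False, which is where the non-standard annotations (base64 strings, tagged objects and so on) usually come in.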
But again, that said it is a widely used standard and one that we have to live with. So there is that.
I like schemas for validation and strong types. I also like self-describing systems when all goes to hell and I'm debugging either my own protocols or someone else's.
I have this crazy idea that there's a middle ground: self-describing and schemas?
We can kind of glue this together (there's lots of json schemas floating around out there now, it seems), but it would be awfully interesting to see these well-supported as a pair (with explicitly language agnostic schema definitions -- which seems to be a sticking point for most of the json schema strapons).
CBOR doesn't really have an implementation-agnostic schema spec, as far as I'm aware.
(I use CBOR a lot -- I'm otherwise quite happy with it!)
EDIT: I guess there's a "CDDL" listed on the tools page, but... it's still single-implementation (Ruby) and I don't see a clear link to a grammar for it.
Hahaha very true let's not go there. Anyway yes I'm a big fan of richly typed schema-less formats as well, even better when they can be read easily by both JS stuff and C++.
Good stuff!