dweis's comments | Hacker News

I believe the reason for this limitation is that not all languages can represent open enums cleanly enough to gracefully handle unknown enum values when schemas skew.
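
A rough sketch of the distinction (illustrative only, not the protobuf library API; the enum values are made up):

    # Contrast an "open" enum (proto3), which can carry an unrecognized number
    # through, with a "closed" enum type, which has nowhere to put it.
    KNOWN_VALUES = {0: "UNKNOWN", 1: "RED", 2: "GREEN"}  # hypothetical enum schema

    def decode_open_enum(raw: int):
        # Open enums: keep the raw number even if this binary's schema is older.
        return KNOWN_VALUES.get(raw, raw)

    def decode_closed_enum(raw: int):
        # Closed enums: the value must be one of the declared names, so an
        # unrecognized number has to be diverted (e.g. to the unknown field set).
        if raw in KNOWN_VALUES:
            return KNOWN_VALUES[raw]
        return None  # caller must stash `raw` elsewhere or drop it

    print(decode_open_enum(7))    # 7 (preserved across schema skew)
    print(decode_closed_enum(7))  # None (needs special handling)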


These are simply tradeoffs. For example, you can make very fast protobuf parsers/serializers if you prioritize that over the ergonomics of the generated code.


Yes, they are both varint encoded on the wire. Refer to https://developers.google.com/protocol-buffers/docs/encoding...


Sigh.


Minor nit, but not necessarily. For basically all values of SomeMessage, dec should fail to parse due to improperly encoded UTF-8 data in field 2 (modulo some proto2 vs. proto3 and language binding implementation differences).

Change field 2 to a bytes field instead of a string field and then yes.
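
A minimal sketch of the point, using hand-rolled wire bytes rather than any generated code (SomeMessage and its fields are from the earlier example, not defined here). Field 2 with wire type 2 (length-delimited) gets the tag byte (2 << 3) | 2 = 0x12:

    payload = bytes([0xC0])                       # 0xC0 can never appear in valid UTF-8
    wire = bytes([0x12, len(payload)]) + payload  # what dec would be asked to parse

    # If field 2 is a string, a conforming parser must reject the buffer,
    # because the payload is not valid UTF-8...
    try:
        payload.decode("utf-8")                   # stands in for the parser's UTF-8 check
    except UnicodeDecodeError:
        print("string field 2: parse of", wire.hex(), "fails")

    # ...whereas if field 2 is bytes, any payload is accepted unchanged.
    print("bytes field 2: parses, field value =", payload)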


I should mention that I consider this a feature not a bug. The isomorphism permits an endpoint to use ‘bytes submessage_i_dont_need_to_decode’ to cheaply handle nested message structures that need to be preserved but not inspected, such as in a proxy application.
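
A quick sketch of why that works: an embedded message and a bytes field use the same wire type (2, length-delimited), so the bytes on the wire are identical either way. The helper below is hand-rolled for illustration, not the protobuf API:

    def encode_len_delimited(field_number: int, payload: bytes) -> bytes:
        assert len(payload) < 128          # keep the length a single varint byte
        tag = (field_number << 3) | 2      # wire type 2 = length-delimited
        return bytes([tag, len(payload)]) + payload

    inner = encode_len_delimited(1, b"hello")      # some serialized submessage
    as_message = encode_len_delimited(4, inner)    # field 4 declared as a message
    as_bytes   = encode_len_delimited(4, inner)    # field 4 declared as bytes
    print(as_message == as_bytes)                  # True -- a proxy can pass it through untouched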


True, but UTF-8 enforcement was largely absent in all implementations until proto3, and the empty string would be a special case.


I don't think this is the case, or at least, I'd expect it to be a bug.

Protocol Buffers should generally be non-destructive of the underlying data. That means even if the parser encounters the wrong wire type for a field, it should simply retain that value in the unknown field set rather than discard it.


How do you handle binary rollbacks and rollouts safely with an "everything is required" approach? Do you force binaries to roll out in a strict order with appropriate soak time at each layer? How does that affect developer velocity?


No, you create conversion routines that convert between different versions of structs. This keeps things well understood, with no ambiguity. It's very easy; we were doing it 20 years ago, autogenerating the conversions with an ANTLR parser that read the XDR files for ONC/RPC.
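
Roughly like this, as a sketch (the struct and field names are hypothetical; real converters would be generated from the schema files):

    from dataclasses import dataclass

    @dataclass
    class UserRequestV1:
        user_id: int

    @dataclass
    class UserRequestV2:
        user_id: int
        locale: str          # new required field in v2

    def v1_to_v2(msg: UserRequestV1) -> UserRequestV2:
        # Upgrading: fill the new required field with an explicit default.
        return UserRequestV2(user_id=msg.user_id, locale="en-US")

    def v2_to_v1(msg: UserRequestV2) -> UserRequestV1:
        # Downgrading: drop the field v1 doesn't know about.
        return UserRequestV1(user_id=msg.user_id)

    print(v1_to_v2(UserRequestV1(user_id=42)))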


This doesn't make sense with the concept of rollbacks.

If I roll out server version 2 and client version 2, which each use a new required field, and then realize that there is some terrible error in server version 2, I can't roll it back to version 1, since it will reject all calls from v2 clients.

The only way to make it work is to add a translation layer, as you suggest, on the server, wait a while, push the new 'required' client, wait a while longer, and then push the server without the translation layer. That's the "strict ordering with appropriate soak time" GP mentions.


Instead, you're going to get errors from the clients using version 2, because server version 2 was rolled back. You have to roll back the clients as well then.

Or you could have client version 2 know how to automatically convert to server version 1, because you know what version the server is on, and you can convert your client parameters or even behavior to fit version 1.

You can't do this with protobufs because there is no such concept; you just add optional fields and ignore them with different versions, and it's chaos.


>Instead, you're going to get errors from the clients using version 2, because server version 2 was rolled back. You have to roll back the clients as well then.

This depends on the update. If the field was indeed optional, you won't. A common example would be a field that is necessary for a new feature, but without which everything functions just fine, or functions with a minor degradation in experience.

But more importantly, you want to decouple features from api versions and communication protocols. It should be possible to enable a feature without rolling out a new version (for example, via a flipped flag). So a degraded communication protocol shouldn't actually impact anything. The application worked fine an hour ago; adding a new field won't break it.

>You have to roll back the clients as well then.

And if the clients are phones?

>Or you could have client version 2 know how to automatically convert to server version 1

This requires communicating with the server beforehand. Why should I have to do a round trip to negotiate which api version I should use? And what happens if you want to modify the protocol that you use to decide on api versions? It's turtles all the way down.

>because you're know what version the server is on, and you can convert your client parameters or even behavior to fit version 1.

So now, before I can roll out a server update, I need to roll out a client update that knows how to negotiate back to a degraded experience until I update the server. Then I have to wait until that new client is rollback safe. Then I can update the server, then eventually I can update the client to remove the shim code. That's the "strict ordering with appropriate soak time" issue that you're still running into.

This has cascading consequences, each server/client update dance has to be mostly atomic, so you can only really do one of these dances at a time, and all clients have to be in sync. If a deep dependency service wants to make an api change, it has to wait until all of the clients are prepared before updating, and if any of those clients is the server in another context, they have to wait until everyone is ready.

That's chaos. And I don't want to do that when the other option is "update the server whenever, as clients upgrade, they'll see the improved service". You avoid the dance. It moves the initiative of an upgrade from the client to the server, and this is a good thing, because there are more clients than servers.


> each server/client update dance has to be mostly atomic

There’s a nice write-up here showing how that’s a risk of, but not an absolute restriction on, using messages with required fields: https://martin.kleppmann.com/2012/12/05/schema-evolution-in-...

The basic idea is to acknowledge that all systems with forward and backward compatibility will have a translation layer, the question is just how is that defined and implemented?

If all fields are optional, it means that all readers need to handle any field being missing, in other words all readers must be able to process empty messages. A user update message might be missing a user id, and the reader will have to handle that. A couple of options come to mind: do nothing if there is no user id, or return an invalid message error. The key is that this is a translation layer that can noop or error before the message reaches the service business logic.
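
A sketch of that translation-layer idea (names are illustrative only): with all fields optional on the wire, the reader enforces its own requirements up front, and can noop or error before anything reaches business logic.

    def validate_user_update(msg: dict):
        # The processor requires a user id even though the wire format doesn't.
        if not msg.get("user_id"):
            raise ValueError("invalid message: missing user_id")
        return msg

    def handle_user_update(msg: dict):
        msg = validate_user_update(msg)   # noop/error here, never in business logic
        print("updating user", msg["user_id"])

    handle_user_update({"user_id": 7, "display_name": "Ada"})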

Then another thought is that message schemas needn’t be defined with version ids; trying to define a strict ordering between message versions is hard, as you say, especially when handling non-linear updates, e.g. rollbacks, or readers and writers skipping versions.

Instead, let’s define message schema compatibility. The user message processor could be defined to say it will only process messages with user ids - which practically speaking will be the case regardless of the message definition format - then a message without a user id can be rejected by common message parsing code, without per-service per-field translation code.

With a clear set of compatibility rules, it is even possible to write sensible reusable schema compatibility checking, eg: https://avro.apache.org/docs/1.7.7/api/java/org/apache/avro/...


We agree to disagree. I don't think you can convince me that all optional is better than all required and vice versa, which is okay. My point is that required fields make software age better over the long run because everything is explicit. If you don't agree, that's your prerogative. Everyone thought NoSQL without schemas was a godsend, until their code/service iterated a dozen times, developers left, documentation got out of date, and now all their older data can't be read because it doesn't match the code. Same thing holds true for RPC, in my opinion. Yours may differ.


Right, and my point is that making all fields required prevents you from iterating. Your software doesn't age at all.

I've never found the problems you describe, and I work with some of the oldest protos around!


Hi there,

I'm an actual author of Protocol Buffers :)

I think Sandy's analysis would benefit from considering why Protocol Buffers behave the way they do rather than outright attacking the design because it doesn't appear to make sense from a PL-centric perspective. As with all software systems, a number of competing constraints have been weighed, and those have led to compromises.

- D

P.S. I also don't believe the personal attacks to be warranted or productive.


While I agree that personal attacks do not help, he gives many reasons why he thinks Protocol Buffers are wrong. You could either respond to the issues he raises or explain what those constraints and compromises you mention are, but your comment basically just says "I'm an author, he does not know what he's talking about", which is not very productive either.


It astonishes me how many Google products are not developer friendly. Because they think they are so freaking smart, they figure they can waste your time with balky code.

Protobuf raised all my red flags the first time I saw it and every time I see it again.

For instance I once tested five vision recognition APIs and I could get the other ones working in 15 minutes each. The Google API went way into overtime because Google's libraries trashed my Python installation forcing me to reinstall.

Google made a real boner with namespace packages in Python, and they've contributed a big chunk of entropy to the Java ecosystem with the Guava library, which, to this day, holds back Hadoop and all of the code around it to version 13-point-something because what was supposed to be a minor revision broke HDFS.


Google hires the smartest software engineers in the world.

The problem with the SMARTEST software engineers is that they are incapable of distinguishing between good and bad code. They're fantastic at writing EFFICIENT code, certainly, but not code that is comprehensible by others - to them, all code is equally comprehensible.

Am I trolling? Slightly. I am exaggerating a little bit. Obviously all engineers care about readability/maintainability to a degree. But I have definitely noted a correlation during my career between super-intelligent engineers and an over-emphasis on succinctness over readability. And when provided code review feedback, they are genuinely confused: "why would your version be better? Mine is perfectly understandable."


The only thing I was thinking while (half-)reading this article is that there's some fundamental misunderstanding about what protobuf is for.


Dear D,

I'm very interested in Protocol Buffers. Could you explain the tradeoffs you made while designing protobuf, and what would you change if you were to design it now?

Cheers!


There have been some Protobuf spinoffs that claim to be "Protobuf but better" or "Protobuf but with changes from experience", like (IIRC) Cap'n Proto and FlatBuffers, and full alternatives like Thrift and Fast Buffers.


I thought Kenton Varda was the author of protobufs?


Nope, I'm not the original author -- that would be Jeff Dean and Sanjay Ghemawat (also often credited with inventing things like MapReduce, BigTable, Spanner, ...). I wrote version 2 (a complete rewrite, but largely following the original design) and open sourced it. I stopped working on Protobuf about 8 years ago. Many others who have been on the Protobuf team since can certainly call themselves "authors".


> that would be Jeff Dean and Sanjay Ghemawat

or "amateurs", as the post would call them.


"an actual author"


Can we get access to more metadata in the lite api? Asking for a friend O:)


Disclaimer: I have written and designed many parts of the Protocol Buffer libraries.

https://tools.ietf.org/html/rfc1832.html#section-6 has a good encoding example of XDR.

Contrasting that with Protocol Buffers is enlightening as it clearly demonstrates some differences in design goals and where tradeoffs are made. Feel free to correct my interpretations as I may have missed something!

1. XDR appears to have no equivalent of a Protocol Buffer field number => the format is not self-describing. That is, one must have a schema to properly interpret the data.

2. XDR appears to encode lengths as fixed width based on the block size => faster to read/write, but larger on the wire than using a varint encoding.

3. XDR's string data type is defined as ASCII => it does not support modern Unicode outside of the variable-length opaque type.

1 would seem to present difficulty for modern distributed systems as one cannot control the release process of every distinct binary in the ecosystem to ensure that they are all schema equivalent at the same time. This can be remediated by propagating the schema as a header to the underlying data for consumption on the other side, but that bloats the wire format. Header compression could help with this but may be problematic for systems with severely constrained networks that constantly reestablish new connections (ex. mobile).

1 also has implications for data persistence. One cannot ever remove or reorder an XDR struct member, or readers will incorrectly parse data that was written in the older format. This is in contrast with Protocol Buffers, where one can remove or reorder message members whenever they'd like, as long as they take care not to reuse a tag number (and the newer `reserved` feature can help with that).
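
A sketch of why explicit field numbers make this safe: a reader walks the buffer tag by tag and can simply skip numbers it doesn't recognize. The decoder below is hand-rolled and only handles single-byte tags/lengths/values, purely for illustration:

    def read_fields(buf: bytes):
        fields, i = {}, 0
        while i < len(buf):
            tag = buf[i]; field_no, wire_type = tag >> 3, tag & 7; i += 1
            if wire_type == 0:                        # varint (single byte here)
                fields[field_no] = buf[i]; i += 1
            elif wire_type == 2:                      # length-delimited
                n = buf[i]; fields[field_no] = buf[i+1:i+1+n]; i += 1 + n
        return fields

    # The writer's schema had fields 1 and 3; a reader that only knows field 1
    # just sees an extra entry it can ignore or preserve as an unknown field.
    wire = bytes([0x08, 42]) + bytes([0x1A, 2]) + b"hi"   # field 1: 42, field 3: "hi"
    print(read_fields(wire))                              # {1: 42, 3: b'hi'}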

2 is just a performance tradeoff: binary size or (de|en)code performance?
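
A quick sketch of the size side of that tradeoff, contrasting a varint with XDR's fixed four-byte unit:

    def encode_varint(n: int) -> bytes:
        out = []
        while True:
            byte = n & 0x7F
            n >>= 7
            out.append(byte | 0x80 if n else byte)   # high bit = "more bytes follow"
            if not n:
                return bytes(out)

    print(len(encode_varint(5)))           # 1 byte as a varint...
    print(len((5).to_bytes(4, "big")))     # ...vs 4 bytes as a fixed-width unit
    print(len(encode_varint(300)))         # 2 bytes; the encoding grows only as needed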

3 has implications for memory constrained systems. Ex. on Android we eagerly parse string fields to avoid doubling the allocation overhead (first as raw bytes, then as a String object). If we required all string datatypes in Protocol Buffers to be defined as bytes fields (the variable-length opaque data type equivalent), we wouldn't be able to provide this optimization.

Overall, XDR looks like a good fit for inter-process communication in a homogeneous environment. Protocol Buffers looks like a good fit for cross-language communication across heterogeneous and unversioned environments. Directly comparing the two, XDR is much more verbose on the wire (particularly if we mitigate the versioning issues by serializing the schema in a header), whereas it's likely significantly faster to (en|de)code. i.e. there's a tradeoff between networking/storage costs and CPU performance.

Scott makes a bunch of provocative declarations in his post, but I think many of them betray a lack of background to appropriately understand the tradeoffs involved. As illustrated above, Protocol Buffers makes a bunch of design affordances for compactness on the wire which XDR does not accommodate. He also believes Google's RPC system to be "unarguably shitty" even though it has never been open sourced due to dependency issues (what is open sourced as part of Protocol Buffers is a shim; gRPC is the future here). His impression of why Facebook built Thrift is similarly misinformed, as Protocol Buffers was not open source when Thrift was written.


TL;DR. XDR was designed in the 1980s. This is a major factor in Decisions 2 and 3.

RFC1832 has a rationale for decision 2 towards the end of the document. Here is an excerpt.

   "(4) Why is the XDR unit four bytes wide?
      There is a tradeoff in choosing the XDR unit size.  Choosing a small
   size such as two makes the encoded data small, but causes alignment
   problems for machines that aren't aligned on these boundaries.  A
   large size such as eight means the data will be aligned on virtually
   every machine, but causes the encoded data to grow too big.  We chose
   four as a compromise.  Four is big enough to support most
   architectures efficiently, except for rare machines such as the
   eight-byte aligned Cray*.  Four is also small enough to keep the
   encoded data restricted to a reasonable size."
Most RISC architectures of the time did not support unaligned memory accesses.

Decision 3 is a matter of timing. RFC1014 (https://tools.ietf.org/html/rfc1014) was released in 1987. Unicode was still a work in progress.

Edited to fix formatting of the excerpt and a typo.


Thanks for pointing these out! This sort of archaeology is always interesting. I tend to have to do it for Protocol Buffers too to understand various design decisions.


How about ASN.1? It's a syntax that was meant to have different encodings. Doesn't seem like we've improved on ASN.1.


We use LevelDB.


Yes.

