
Scuttlebutt is a neat concept, burdened by a bad protocol. Signing a message involves serializing a json object, signing it, adding the signature as a field on that json object, and then serializing it again. To verify, you deserialize the message into an object, remove the signature field, and then reserialize it, and verify the signature against that new serialization. This means that all the clients have to have a json serialization that 100% matches the node.js serialization that the majority (totality?) of current clients use. Any other implementation (I wrote one in Go) becomes a nightmare of chasing rarer and rarer differences that prevent verification, and if you miss one before you serialize your own message with it, suddenly you've poisoned your own journal...
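To make that dance concrete, here's a minimal sketch of the verify path in TypeScript on Node, assuming (per the protocol guide) that the signed bytes are JSON.stringify(msg, null, 2) of the message without its signature field, and that the signature is the base64 body of the ".sig.ed25519" string:

    import { createPublicKey, verify } from "node:crypto";

    // DER prefix that turns a raw 32-byte ed25519 public key into SPKI form.
    const ED25519_SPKI_PREFIX = Buffer.from("302a300506032b6570032100", "hex");

    function verifyMessage(rawJson: string, authorPublicKey: Buffer): boolean {
      const msg = JSON.parse(rawJson);
      const sig = Buffer.from(msg.signature.replace(".sig.ed25519", ""), "base64");
      delete msg.signature; // strip the field again...
      // ...and re-serialize, hoping these bytes match the producer's exactly
      const signedBytes = Buffer.from(JSON.stringify(msg, null, 2), "utf8");
      const key = createPublicKey({
        key: Buffer.concat([ED25519_SPKI_PREFIX, authorPublicKey]),
        format: "der",
        type: "spki",
      });
      return verify(null, signedBytes, key, sig);
    }

Every non-JS implementation has to reproduce that JSON.stringify call byte for byte, which is exactly where the rare differences creep in.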

All in all, it's something to study and learn from, but I strongly recommend not becoming involved unless you are 100% happy with being tied into node.js.



After reading about the protocol I came to a similar conclusion as you. Although it should be noted that the JSON serialization is defined as JSON.stringify per ECMA-262 6th Ed., plus some additional rules.

To me the worse problem is that key order must be preserved, which, as I understand it, that standard does not specify.

Source: https://ssbc.github.io/scuttlebutt-protocol-guide/#message-f...


> In brief, the rules are:

- Two spaces for indentation.

- Dictionary entries and list elements each on their own line.

- etc...

This is so weird to me. Why does a protocol need a strict, opinionated JSON format? And if they really need a very specific format, why bother with JSON at all? There are better options, like protobuf.

This seems like the worst example of "Use JSON for everything".
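For reference, the quoted rules boil down to V8's JSON.stringify with a two-space indent (plus a few extra constraints in the spec):

    const canonical = JSON.stringify({ previous: null, sequence: 1 }, null, 2);
    // {
    //   "previous": null,
    //   "sequence": 1
    // }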


Just a guess, but maybe so they can reliably hash it?


This is terrible. It's basically not JSON anymore. It's some custom text protocol with similar value escaping...

If they wanted both the field names to be visible and the order to be preserved, they could use:

    [ ["previous", "..."], ["author", "..."], ...


lvh at Latacora wrote a blog post about this problem.

https://latacora.micro.blog/2019/07/24/how-not-to.html


This might actually improve in the near future: https://github.com/ssbc/ssb-spec-drafts/blob/master/drafts/d...


As @cel pointed out further down the tree [1], that's not entirely correct any longer. In particular, the "100% tied to node.js" part hasn't been true for a while: there are now alternative implementations in Rust and Go.

Now, the feed format is indeed a practical problem if you try to make a client from scratch, and it is annoying. Luckily, using one of the already-existing libraries will handle this for you.

Changing that feed format for everyone is not possible, simply because there's already an existing social network built on the old one that we very much want to preserve, since we actually... well... hang out there. Changing the feed format thus means adding a new format and making sure clients can handle and link together both, with the benefit of abstracting the feed format away and being able to iterate on it.

[1]: https://news.ycombinator.com/item?id=22912075


Indeed, JSON maps are not supposed to be ordered, so anything that depends on their order is bound to fail.

This is the exact reason bencode (https://en.wikipedia.org/wiki/Bencode) was invented, and I still believe we could replace all uses of JSON with bencode and be better off for it, because it solves these all-too-common issues:

- bencoded maps keep their keys in lexicographical order, so there is no ambiguity when hashing/signing (a torrent id is the hash of a bencoded map)

- bencoding is binary-friendly; in fact it has to be, since it stores the torrent's piece hashes

Why don't we use bencoding everywhere?
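A minimal sketch of a bencode encoder, just to show the sorted-keys property (a real encoder would measure string lengths in bytes rather than JS string length, and reject non-integer numbers):

    type Benc = string | number | Benc[] | { [k: string]: Benc };

    function bencode(v: Benc): string {
      if (typeof v === "number") return `i${v}e`;           // integers: i42e
      if (typeof v === "string") return `${v.length}:${v}`; // strings: 4:spam
      if (Array.isArray(v)) return `l${v.map(bencode).join("")}e`;
      // dictionaries always emit keys in sorted order, so the same data
      // always yields the same bytes; that's why torrent infohashes are stable
      return `d${Object.keys(v).sort()
        .map((k) => bencode(k) + bencode(v[k])).join("")}e`;
    }

    bencode({ name: "ubuntu.iso", length: 3 });
    // => "d6:lengthi3e4:name10:ubuntu.isoe"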


Interesting that your experience with Bencode was this positive. I've implemented a Bencode serializer/deserializer in Rust and here are a few things I've noticed:

* It is very easy to parse/produce Bencode

* It's probably fast

* The specification is really bad

* No float type

* No string type, just byte sequences. This is especially bad because most/all dictionary keys will be utf-8 strings in practice but you can't rely on it

* Integers are arbitrary-length; most implementations just ignore this

I think Bencode is an ok format for its use case, but I don't think it should be used instead of json.


When I was faced with this (signing a structure), I serialized the JSON, base64-encoded it, then put that base64 string as a value (along with the MAC) into a new JSON document. It of course increases deserialization overhead (parse outer JSON, verify, un-base64, parse inner JSON), but it sidesteps this issue.
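A sketch of that wrap-and-MAC approach (the msg/mac field names are just illustrative):

    import { createHmac, randomBytes } from "node:crypto";

    const key = randomBytes(32); // illustrative secret; manage real keys properly

    function wrap(obj: unknown): string {
      // The MAC covers the base64 string itself, so the signed bytes are
      // opaque and serializer differences can no longer break verification.
      const msg = Buffer.from(JSON.stringify(obj), "utf8").toString("base64");
      const mac = createHmac("sha256", key).update(msg).digest("base64");
      return JSON.stringify({ msg, mac });
    }

    function unwrap(outer: string): unknown {
      const { msg, mac } = JSON.parse(outer);
      const expected = createHmac("sha256", key).update(msg).digest("base64");
      if (mac !== expected) throw new Error("bad MAC"); // timingSafeEqual in real code
      return JSON.parse(Buffer.from(msg, "base64").toString("utf8"));
    }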

I thought about sorting keys and other things like that, but the dozen edge cases and potential malleability issues dissuaded me, for the compatibility reasons mentioned above.

How have others solved it?


Use Bencoding, like bittorrent does: https://en.wikipedia.org/wiki/Bencode

As I said in another comment, a torrent id is the hash of a map where one of the values contains binary data. Bencoding solved that decades ago.


It looks similar to stackish and BON (binary object notation) https://github.com/bon-org/bon-doc/blob/master/README.asciid...

BON is compatible with JSON plus Erlang data types; in particular, it allows any data type as a map key. JSON only allows strings as map keys.


Why bencoding and not BSON or CBOR or any of the other serialization options?


Binary protocols. Or out-of-band signing.


Do you mind going deeper here?

Which protocols? Can you point to examples?

Because I was about to design a JSON signing solution, but based on the comments here that seems like a bad idea.


The signing schemes I've seen used in binary protocols fall into two categories:

1. Canonicalize and sign: the format has a defined canonical form. Convert to that before doing cryptographic operations. If the format is well designed around it, this is doable, whereas JSON doesn't really have this and with many libraries it's hard to control the output to the degree that you'd need. 2. Serialize and sign: serialize the data as flat bytes, and then your signed message just has a "bytes" field that is the signed object. This is conceptually not far off from the base64 solution above, except that there's not extra overhead, since with a binary protocol you'll have a length prefix instead of having to escape stuff.


Being able to separate the object and signature saves tons of trouble https://latacora.micro.blog/2019/07/24/how-not-to.html


Protobuf, cap n proto, and messagepack are a few I've seen before


It sounds like base64 was unnecessary there since a JSON string can contain serialized JSON.

Personally I'll just concatenate the values in a defined order and sign/hash that.


jsonrpc “args” can vary in name, number, and type.


Sure, but where you're doing

    {"msg": "eyJhIjogMTAsICJjIjogInRlc3RpbmciLCAiYiI6ICJoZWxsbyJ9", "h": "..."}
you can as well be doing

    {"msg": "{\"a\": 10, \"c\": \"testing\", \"b\": \"hello\"}", "h": "..."}
and skip base64 altogether.

If you mean this as a point against the second part of my post, it's of course only in some limited circumstances you can simply dump the values in a defined order and be done with it. To make it general, you have to have delimiters for at least arrays, objects and numbers, canonicalize number representations, and probably output the keys as well, at which point you've invented your own complete serialization protocol.


I just use JWTs whenever I have to pass signed messages.
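e.g. with the widely used jsonwebtoken package, the signature covers the base64url-encoded payload bytes themselves, so nothing ever has to be re-serialized:

    import jwt from "jsonwebtoken";

    const secret = "change-me"; // illustrative; use a real key in practice
    const token = jwt.sign({ a: 10, b: "hello" }, secret); // header.payload.signature
    const claims = jwt.verify(token, secret); // throws if the signature is invalid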


This is how Cosmos (https://cosmos.network/) deals with signing transactions. It caused _major_ headaches for me trying to figure out why my signatures were invalid when sending them from Node.js.


I did something similar for a project that had clients in Go and Node. I solved this by flattening the object key paths to strings and sorting the keys, basically. You need to build that into the client serialization/deserialization; it feels clunky, but I've had zero issues and it has been working smoothly for a long while now.
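A sketch of that flatten-and-sort idea (illustrative only; real code would have to escape the "." and "=" delimiters wherever they can occur in keys or values):

    // Reduce an object to sorted "path=value" lines and sign/hash that
    // string instead of any serializer's output.
    function flatten(obj: unknown, path = ""): string[] {
      if (obj !== null && typeof obj === "object") {
        return Object.entries(obj as Record<string, unknown>)
          .flatMap(([k, v]) => flatten(v, path ? `${path}.${k}` : k));
      }
      return [`${path}=${JSON.stringify(obj)}`];
    }

    const signingInput = flatten({ b: { d: 2, c: 1 }, a: "x" }).sort().join("\n");
    // a="x"
    // b.c=1
    // b.d=2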


Any signing done over structured data has this problem. You always need a canonical representation.


It would be simpler to work out a canonical reduction for JSON. This should be reasonably easy since there are so few elements to it.

A simple proposal for actual security guys to rip to shreds:

Strings are represented as utf-8 blobs, and hashed.

Numbers should probably be represented using a proper decimal format and then hashed. If you're reading the JSON in and converting it to floats, you could get slight disagreement in some cases.

Arrays are a list of hashes, which itself is hashed.

For objects, convert each key to utf-8 and append its value's hash (values are always represented by their hashes), sort these entries bitwise, and then hash all of that.
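A sketch of that reduction (the type-tag prefixes are my addition, to keep values of different types from colliding):

    import { createHash } from "node:crypto";

    const h = (...parts: (string | Buffer)[]): Buffer => {
      const hash = createHash("sha256");
      for (const p of parts) hash.update(p);
      return hash.digest();
    };

    function jsonHash(v: unknown): Buffer {
      if (typeof v === "string") return h("str:", Buffer.from(v, "utf8"));
      if (typeof v === "number") return h("num:", v.toString()); // decimal form
      if (v === null || typeof v === "boolean") return h("lit:", String(v));
      if (Array.isArray(v)) return h("arr:", ...v.map(jsonHash)); // list of hashes
      const entries = Object.entries(v as Record<string, unknown>)
        .map(([k, val]) =>
          Buffer.concat([Buffer.from(k, "utf8"), jsonHash(val)]))
        .sort(Buffer.compare);
      return h("obj:", ...entries);
    }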

Or, better, it'd be great to have an order-independent hash function that's also secure. I doubt xoring all the pairs would be good enough. Update: a possible technique[1]

[1]: https://crypto.stackexchange.com/questions/51258/is-there-su...


Could it use an HTTP-header-style system instead, to avoid serializing the JSON back and forth?


Just serialize, sign, and pack the signature together with the raw bytes of the serialization. It doesn't matter how; you just have to pack the raw bytes and not futz with them.


You can even have a JSON object containing base64 fields for message and signature if you feel it really needs to be JSON all the way down for whatever reason.


This is exactly the kind of case that would benefit from a function compiled to WebAssembly. "Write once, run everywhere."

The protocol helper functions would be compiled to a WebAssembly library, and you would reuse them from Go, Python, the browser, etc.

Of course, that's no justification for the protocol's design (reimplementing a protocol in another language is a good test of its specification), but it would be a use case for WebAssembly.


Once someone has written a library that does this correctly in Go/Rust/whatever, isn't the problem solved? Everyone building Scuttlebutt apps in that language can use that library. The protocol doesn't seem to be changing.


I tried, and tried, and tried to make that library for Go. I failed, because the serialization issues were too involved for it to be worth it to me (it got to the point where I'd have had to write my own JSON implementation).

Just giving people a heads-up that JS is seemingly the only blessed language for Scuttlebutt.



There is work on C, Go, and Rust versions, and it runs on iOS and Android. So, yeah, not so much "tied into node.js". Yes, that's where the majority of the work happens, but...



I spent quite a bit of time with XML digital signatures, which is a similar situation, but with even more surface area to get wrong. Months to implement, over a year to harden.


I’ve had similar, albeit easy to solve, problems with key formats when signing emails.

But that isn’t a reason why scuttlebutt isn’t more popular. It only takes one Go implementation and the problem is solved permanently.



