I want to ditch the Nar format as soon as possible. IPFS's unixfs format is too rich, however.
When will the IPFS people finish up https://github.com/ipld/cid so we can link whatever content addressable data we want?
I'd use git tree objects, despite SHA-1, because they're widely supported. Or define a format identical to git tree objects, but using IPFS's multihash with SHA-1 banned.
The point is that the underlying protocol should be agnostic to the hashing scheme. We should have a trait/type class like
/// Node in the tree
// `HashingTrait` and `Set` are placeholders, left undefined in this sketch.
trait Payload: Sized {
    type Hash: HashingTrait;
    fn unpack(self) -> (Vec<u8>, Set<Self::Hash>);
    fn pack(blob: Vec<u8>, links: Set<Self::Hash>) -> Self;
    // Implement either and get the other for free!
    fn hash_packed(p: Self) -> Self::Hash { Self::hash_unpacked(p.unpack()) }
    fn hash_unpacked(p: (Vec<u8>, Set<Self::Hash>)) -> Self::Hash { Self::hash_packed(Self::pack(p.0, p.1)) }
}
Any `(Hash, Payload)` pair that can define `(binary blob, Set<Hash>) -> Hash` and `(binary blob, Set<Hash>) -> Payload` functions should work.
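To make the trait idea concrete, here is a minimal compilable sketch. Everything beyond the trait's shape is an assumption for illustration: `ToyHash` and `SimpleNode` are made-up names, `BTreeSet` stands in for `Set`, a plain `Ord` bound stands in for `HashingTrait`, and the digest uses the standard library's `DefaultHasher`, which is not cryptographic and is no substitute for a real multihash.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeSet;
use std::hash::{Hash, Hasher};

/// Toy 64-bit digest (assumption: a stand-in for a real multihash).
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord, Hash)]
struct ToyHash(u64);

/// A node: an opaque blob plus the set of hashes it links to.
trait Payload: Sized {
    type Hash: Ord;
    fn unpack(self) -> (Vec<u8>, BTreeSet<Self::Hash>);
    fn pack(blob: Vec<u8>, links: BTreeSet<Self::Hash>) -> Self;

    // Mutually recursive defaults: an implementor must override at least
    // one of these two, otherwise calling either one never terminates.
    fn hash_packed(p: Self) -> Self::Hash {
        Self::hash_unpacked(p.unpack())
    }
    fn hash_unpacked(p: (Vec<u8>, BTreeSet<Self::Hash>)) -> Self::Hash {
        Self::hash_packed(Self::pack(p.0, p.1))
    }
}

/// Hypothetical node type that stores the blob and links as-is.
#[derive(Clone, Debug, PartialEq)]
struct SimpleNode {
    blob: Vec<u8>,
    links: BTreeSet<ToyHash>,
}

impl Payload for SimpleNode {
    type Hash = ToyHash;

    fn unpack(self) -> (Vec<u8>, BTreeSet<ToyHash>) {
        (self.blob, self.links)
    }

    fn pack(blob: Vec<u8>, links: BTreeSet<ToyHash>) -> Self {
        SimpleNode { blob, links }
    }

    // Only the unpacked variant is written by hand; `hash_packed` is inherited.
    fn hash_unpacked(p: (Vec<u8>, BTreeSet<ToyHash>)) -> ToyHash {
        let (blob, links) = p;
        let mut hasher = DefaultHasher::new();
        blob.hash(&mut hasher);
        links.hash(&mut hasher);
        ToyHash(hasher.finish())
    }
}
```

Since the links are part of what gets hashed, a parent's hash commits to its children's hashes, which is the Merkle property the protocol would rely on regardless of the hashing scheme plugged in.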
The CID stuff has been implemented, and initial support for it is landing in our 0.4.5 release (which will be soon, hopefully with a release candidate within a week).
With that and IPLD, you can craft arbitrary objects in JSON or CBOR (there's a one-to-one mapping; objects are stored as CBOR) and work with them in ipfs. For example, I could make an object that looks like:
{
  "Contents": {"/": "QmHashOfPkgContents"},
  "Compression": "bzip2",
  "NarSize": 12345,
  "References": {
    "foo": {"/": "QmHashOfFoo"},
    "bar": {"/": "QmHashOfBar"}
  },
  "Signature": "signature info, or a link to the signature"
}
(Please excuse my attempt at recreating a nar file in rough JSON.)
This object could then be put into ipfs with:
cat thing.json | ipfs dag put
And you would get an IPLD object that you can move around in ipfs, and do fun things like:
ipfs get <thatobjhash>/Contents
to download the package contents, or:
ipfs get <thatobjhash>/References/foo
to get the referenced package (or open that hash/path in an ipfs gateway to browse the package graph for free in your browser :) )
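For readers who want to see what that kind of path traversal amounts to, here is a rough sketch. It is not the IPLD data model or the go-ipfs resolver: `Value`, `Store`, `resolve`, and the root hash `QmPkg` are all hypothetical names, and the "block store" is just an in-memory map. The only idea it illustrates is that a `{"/": hash}` link is followed transparently while walking a path.

```rust
use std::collections::BTreeMap;

/// Tiny stand-in for a decoded object (assumption: not the real IPLD model).
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Text(String),
    Int(i64),
    /// Mirrors the `{"/": "<hash>"}` link form in the JSON example.
    Link(String),
    Map(BTreeMap<String, Value>),
}

/// Hypothetical block store: hash string -> decoded object.
type Store = BTreeMap<String, Value>;

/// Follow links until a non-link value is reached.
/// Note: a cycle of links would loop forever in this sketch.
fn follow_links<'a>(store: &'a Store, mut value: &'a Value) -> Option<&'a Value> {
    while let Value::Link(target) = value {
        value = store.get(target.as_str())?;
    }
    Some(value)
}

/// Resolve a slash-separated path starting at the object stored under `root`.
fn resolve<'a>(store: &'a Store, root: &str, path: &str) -> Option<&'a Value> {
    let mut current = store.get(root)?;
    for segment in path.split('/').filter(|s| !s.is_empty()) {
        current = match follow_links(store, current)? {
            Value::Map(fields) => fields.get(segment)?,
            _ => return None, // cannot descend into a scalar
        };
    }
    follow_links(store, current)
}

/// Helper to build a map value from (key, value) pairs.
fn map(entries: Vec<(&str, Value)>) -> Value {
    Value::Map(entries.into_iter().map(|(k, v)| (k.to_string(), v)).collect())
}

/// A store holding a cut-down version of the package object above.
fn example_store() -> Store {
    let mut store = Store::new();
    store.insert(
        "QmHashOfPkgContents".to_string(),
        Value::Text("package contents".to_string()),
    );
    store.insert("QmHashOfFoo".to_string(), map(vec![("NarSize", Value::Int(1))]));
    store.insert(
        "QmPkg".to_string(), // made-up root hash for the package object
        map(vec![
            ("Contents", Value::Link("QmHashOfPkgContents".to_string())),
            ("NarSize", Value::Int(12345)),
            ("References", map(vec![("foo", Value::Link("QmHashOfFoo".to_string()))])),
        ]),
    );
    store
}
```

With this store, `resolve(&store, "QmPkg", "References/foo/NarSize")` hops from the package object into the referenced package, which is the same shape as the `<thatobjhash>/References/foo` paths above.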
IPLD does allow storing tons of data, but custom schemas allow restricting the data referenced in arbitrary ways.
IPLD, last I checked, supports relative paths (which can create certain cycles), and not every child node gets its own hash. This is too much flexibility for my purposes (Nix or otherwise).
Also, when interfacing with legacy systems like git repos, one needs to dereference a legacy hash without knowing what it points to, which is easiest done with custom schemas.
Now, granted, custom schemas aren't a super fine-grained solution, as every node in the network that cares about the data needs to implement the schema, but they are a useful tool for these reasons (and that downside doesn't apply to private networks).
Ok, so it's good we can finally refer to other node types. But I worry about putting all that in a single namespace. The IPLD node types constitute different hashing strategies as I describe above, but stuff like media codecs are orthogonal to hashing strategies---media of various sorts given a hashing strategy will be treated as black-box binary data for the foreseeable future.
The big takeaway here is that I really like the idea of IPFS, and want to be a full fan, but everywhere I look I see dubious interfaces. I see what already looks like legacy cruft, and they haven't even hit 1.0!
> Also, when interfacing with legacy systems like git repos, one needs to dereference a legacy hash without knowing what it points to, which is easiest done with custom schemas.
The CID (address format) in IPLD doesn't represent types of systems, but it represents types of data structures. E.g. in the case of git, it's not "git,$thehash", but instead "git-tree,$thehash" or "git-commit,$thehash".
That way you know which code you'll need to run once you have the object's payload, or you could have datastores that simply pull blocks out of a git repo.
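As a rough illustration of "tag the data structure, not the system", here is a sketch that parses the informal `git-tree,$thehash` notation used above and picks a decoder from the tag alone. The textual comma form is only this thread's shorthand, not the actual CID encoding, and `Codec`, `TypedId`, `parse_typed_id`, and the decoder names are all hypothetical.

```rust
/// Data-structure tags, following the "git-tree" / "git-commit" examples.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Codec {
    GitTree,
    GitCommit,
}

/// A hash paired with the kind of structure it points to.
#[derive(Debug, Clone, PartialEq, Eq)]
struct TypedId {
    codec: Codec,
    hash: String,
}

/// Parse the informal "<tag>,<hash>" notation. A bare "git,<hash>" is
/// rejected because it names a system, not a data structure.
fn parse_typed_id(text: &str) -> Option<TypedId> {
    let (tag, hash) = text.split_once(',')?;
    let codec = match tag {
        "git-tree" => Codec::GitTree,
        "git-commit" => Codec::GitCommit,
        _ => return None,
    };
    if hash.is_empty() {
        return None;
    }
    Some(TypedId { codec, hash: hash.to_string() })
}

/// Choose which (hypothetical) decoder to run before fetching the payload.
fn decoder_name(id: &TypedId) -> &'static str {
    match id.codec {
        Codec::GitTree => "parse_git_tree",
        Codec::GitCommit => "parse_git_commit",
    }
}
```

The point is only that the decoder is known from the identifier, so nothing has to sniff the payload to find out what it is.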
Yeah, my OP was asking what happened to CID. I guess it's been implemented without finishing off the spec :/.
While I'm not opposed to treating git that way, do note that git hashes are specifically constructed by prefixing the serialization of blobs, trees, and commits separately so that collisions are not likely.
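For reference, the prefixing works like this: git hashes the bytes `"<type> <size>\0"` followed by the object's content, so the same content hashed as a blob and as a tree has different preimages. The sketch below only builds that preimage; the SHA-1 step is left out because the standard library has no SHA-1, and `git_preimage` is a made-up helper name.

```rust
/// Build the byte string that git feeds to SHA-1 for an object:
/// "<kind> <content length in bytes>\0" followed by the content.
fn git_preimage(kind: &str, content: &[u8]) -> Vec<u8> {
    let mut out = format!("{} {}\0", kind, content.len()).into_bytes();
    out.extend_from_slice(content);
    out
}
```

For example, `git_preimage("blob", b"hello\n")` yields the bytes `blob 6\0hello\n`, while the same content tagged `"tree"` starts with `tree 6\0`, so the two can never hash the same input.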
Structs/traits already exist in some form that is not yet well defined. What you are describing is a generalization of what is happening with Ethereum right now: we have eth-block as an IPLD "Format", which is basically a struct with one particular characteristic: its parser, instead of being written in an IPFS VM language, is written and executed as part of the daemon.
The idea you described above, or a subset of it, is part of the plan!
Please do participate in the issue that I linked you to!
@Ericson2314, yeah, thanks for bringing this up. As was mentioned elsewhere:
* CID is finished and live in go-ipfs@master and js-ipfs@master. We haven't announced it widely because go-ipfs@0.4.5 is still to land. (ooof)
* IPLD spec needs work, but work continues.
I wanted to add that:
* please contribute to CID to get it where you need it to be.
* I am personally very interested in defining IPLD data structures and their operations in a good language. This will probably be transpiled to the IPFS impl language, or compiled down to WebAssembly and run in a small WA VM (the web of datastructures).