Cloning something like Chromium or Linux is already a multi-gigabyte affair without needing everyone to download everything ever committed to the system... and what happens when code needs to be taken down for copyright reasons?
I like more-decentralized Git, but I don't think this is necessarily the way to do it. The main problems today include plumbing around decentralized issue tracking and wikis, good Git web interfaces, etc. Better P2P might make forking easier without centralization (Gittorrent?), but most serious projects probably won't find it difficult to obtain a VPS; data integrity and distributed authority, the things the blockchain provides, are not really necessary at all.
That code can't be taken down for "copyright reasons" is a feature in my eyes, not a flaw. The best way to stand up to censorship and legal threats is to genuinely be unable to comply, just as the best defense against torture is not to know (and all the more so if your would-be aggressors have a way of being sure you really don't know; technology can provide that assurance against censorship).
You should see some of the stuff that's in the Bitcoin blockchain... one day the community is going to have to make a decision on how to purge that data.
To be fair, right now it's all but impossible to entirely take code down for "copyright reasons" if incentive exists to pirate it. I don't know that this is necessarily a strong argument for using a blockchain in this way.
I fully agree, but isn't that roughly the same with Github?
Technically it's very different, of course, but if you commit private keys or the like to GitHub, you have to invalidate everything that was exposed and assume that someone saw it, Google saw it, and it's cached in multiple places. Right?
While your criticisms are valid, I think the best way to consider developments like this is as if they were a new branch of math.
These are new things we can do with the logical/mathematical system. As of now, the best application may not have been found yet. Saying that these new discoveries are 'not useful' sort of misses the spirit of exploration.
I really feel like this is a misapplication of a blockchain. Blockchains are already heavyweight; when you suddenly add arbitrary data like git repos, you go from 17 GB to hundreds of gigabytes, and potentially much more.
Pushing also becomes a slower affair: you need to get your data into a block and then wait for confirmations.
I don't want to crush innovation, especially in the cryptocurrency space, but I really think this is an example of using the wrong tool for the job.
Thanks for the response. How do you plan on sparsely distributing data across nodes? I read the whitepaper but that part has not been filled out.
You'd need to make sure that repos are safe against attack: if not every node has all the data, you are moving into highly innovative territory. I'd be interested to know what sorts of solutions you are considering.
I find it interesting you are using kickstarter to fund this, when everyone (and their dog) seems to get ridiculous amounts of BTC thrown at them when they post their project ideas to bitcointalk. Why not raise funds in BTC?
Also, in this thread you say "There are proof of storage techniques" - I have seen others talk about this (paying people to store data: StorJ, MaidSafe) and it seems flawed - you can never know how many copies of your data are truly floating around. If you give me a financial incentive to store "multiple copies of data" I will do it in the cheapest possible way, which means storing it on 1 machine and lying about it.
There is a big difference between storing 20 copies of a file on 1 hard drive vs. 1 copy on each of 20 hard drives, but there is no algorithm that can tell me (the file owner) which is occurring. How do you plan on monetizing "store other people's data" fairly?
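To make the limitation concrete, here's a minimal proof-of-possession sketch in Go (hypothetical names; not StorJ's or MaidSafe's actual protocol). The owner sends a random nonce and checks the hash of nonce plus data; that proves the node can produce the data right now, but says nothing about how many independent replicas exist:

    package main

    import (
        "crypto/rand"
        "crypto/sha256"
        "fmt"
    )

    // challenge returns a fresh random nonce the file owner sends to a storage node.
    func challenge() []byte {
        nonce := make([]byte, 32)
        rand.Read(nonce)
        return nonce
    }

    // prove is run by the storage node: hash the nonce together with the data it holds.
    func prove(nonce, data []byte) [32]byte {
        return sha256.Sum256(append(nonce, data...))
    }

    // verify is run by the owner against its own copy of the data.
    func verify(nonce, data []byte, proof [32]byte) bool {
        return prove(nonce, data) == proof
    }

    func main() {
        data := []byte("repository pack file")
        nonce := challenge()
        proof := prove(nonce, data)             // a single machine can answer this
        fmt.Println(verify(nonce, data, proof)) // true, whether 1 or 20 replicas exist
    }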
Maybe you could use erasure coding instead of replication so that dedupe isn't possible. There's still a Sybil attack where multiple shards end up with the same peer, who can then delete them all.
You'd have to make sure that the person storing the erasure-coded shards can't first reconstruct the original data and then regenerate the shards from it on demand. Maybe that is not a hard problem?
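To picture what erasure coding buys here, a minimal (2+1) XOR parity sketch in Go, purely illustrative (a real system would presumably use something like Reed-Solomon): the data is split into two distinct shards plus a parity shard, so peers hold different pieces rather than identical copies, and any two shards reconstruct the third:

    package main

    import (
        "bytes"
        "fmt"
    )

    // encode splits data into two shards and computes an XOR parity shard.
    // Any two of the three shards are enough to rebuild the missing one.
    func encode(data []byte) (a, b, parity []byte) {
        if len(data)%2 != 0 {
            data = append(data, 0) // pad to an even length
        }
        half := len(data) / 2
        a, b = data[:half], data[half:]
        parity = make([]byte, half)
        for i := range parity {
            parity[i] = a[i] ^ b[i]
        }
        return a, b, parity
    }

    // recoverShard rebuilds a missing shard from the other two.
    func recoverShard(x, y []byte) []byte {
        out := make([]byte, len(x))
        for i := range out {
            out[i] = x[i] ^ y[i]
        }
        return out
    }

    func main() {
        a, b, p := encode([]byte("git objects "))
        // Pretend the peer holding shard b vanished: rebuild it from a and parity.
        rebuilt := recoverShard(a, p)
        fmt.Println(bytes.Equal(rebuilt, b)) // true
    }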
1. Add the ability to git push over git:// to any git repository, which results in a patch being presented to the repository owner in some useful way. I.e., reverse GitHub pull requests that work anywhere. http://joeyh.name/blog/entry/idea:_git_push_requests/
2. True P2P git pull/push over telehash. Something I plan to implement as soon as there's a working telehash implementation. Will allow peers to collaborate from anywhere without a central point of control, and with built-in encryption too. http://telehash.org/
(To be clear, telehash has a DHT, but it's used to find routes to peers, not for distributed data storage.)
Joey, your feedback is always welcome! I am aware of telehash (I played with it some time ago). I hadn't seen your git push requests idea. Digging in. It's a neat thought.
I think one of the most important aspects of the work I am doing is about establishing a tamper-proof record of history, decentralizing availability and proof of storage. If you have any further thoughts or questions, I am all ears!
Are the git objects themselves stored in the DHT, or is it just (like a trackerless torrent) the connection information needed to find peers willing to send you those objects?
Does trackerless BitTorrent provide a way to update a file?
The git objects themselves are content-addressable (the key is the hash of the value), but you still need a unique ID that references the HEAD of this chain of objects. That ID needs to be updatable.
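A rough sketch of those two pieces in Go (hypothetical names, not gitchain's actual API): an immutable, content-addressed object store plus a single mutable ref that tracks HEAD, which is the one piece of state a DHT or blockchain entry would need to keep current:

    package main

    import (
        "crypto/sha1"
        "encoding/hex"
        "fmt"
    )

    type Store struct {
        objects map[string][]byte // immutable: the key is always the hash of the value
        refs    map[string]string // mutable: e.g. "refs/heads/master" -> object key
    }

    func NewStore() *Store {
        return &Store{objects: map[string][]byte{}, refs: map[string]string{}}
    }

    // Put stores an object under its content hash and returns the key.
    func (s *Store) Put(value []byte) string {
        sum := sha1.Sum(value)
        key := hex.EncodeToString(sum[:])
        s.objects[key] = value
        return key
    }

    // UpdateRef repoints a named ref; this is the only mutable state in the model.
    func (s *Store) UpdateRef(name, key string) { s.refs[name] = key }

    func main() {
        s := NewStore()
        c1 := s.Put([]byte("commit 1"))
        s.UpdateRef("refs/heads/master", c1)
        c2 := s.Put([]byte("commit 2"))
        s.UpdateRef("refs/heads/master", c2) // HEAD moves, old objects stay put
        fmt.Println(s.refs["refs/heads/master"] == c2)
    }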
Just because you can do something in the blockchain doesn't mean you should. The blockchain is a good innovation, but it's not going to compete with the status quo.
I agree. The idea is to safeguard against GitHub going down by storing data in a DHT. I don't see why just hosting a repository on MaidSafe (https://en.m.wikipedia.org/wiki/MaidSafe) wouldn't work here.
Not mine, but I thought it was pretty cool so I pledged. It's in Go, open source, and there's already code on GitHub: https://github.com/gitchain/gitchain (disclosure: I've never met Yurii, but I know him from the internet).