Gitchain: Decentralized P2P Git Repos aka "Git meets Bitcoin" (kickstarter.com)
107 points by patrickaljord on May 16, 2014 | 44 comments


Cloning something like Chromium or Linux is already a multi-gigabyte affair without needing everyone to download everything ever committed to the system... and what happens when code needs to be taken down for copyright reasons?

I like more-decentralized Git, but I don't think this is necessarily the way to do it. The main problems today include plumbing around decentralized issue tracking and wikis, good Git web interfaces, etc. Better P2P might make forking easier without centralization (Gittorrent?), but most serious projects probably won't find it difficult to obtain a VPS; data integrity and distributed authority, the things the blockchain provides, are not really necessary at all.


That code can't be taken down for "copyright reasons" is a feature in my eyes, not a flaw. The best way to stand up to censorship and legal threats is to really not be able to comply, just like the best defense against torture is not to know (and much more so if your would-be aggressors have a way of being sure you really don't know, and technology can do that against censorship).


You should see some of the stuff that's in the Bitcoin blockchain... one day the community is going to have to make a decision on how to purge that data.


What could possibly be so bad as to have every single node in the network compromise the blockchain?


At a guess, child porn, which would make it illegal to possess?


To be fair, right now it's all but impossible to entirely take code down for "copyright reasons" if incentive exists to pirate it. I don't know that this is necessarily a strong argument for using a blockchain in this way.


Just turns out to be a huge drag when secrets get accidentally committed to the swarm.


I fully agree, but isn't that roughly the same with Github?

Technically it's much much different of course, but if you commit some private keys/etc to Github, you have to invalidate everything used and assume that someone saw it, Google saw it, and it's cached in multiple places. Right?


While your criticisms are valid, I think the best way to consider developments like this is as if they were a new branch of math.

These are new things we can do with the logical/mathematical system. As of now, the best application may not have been found. Saying how these new discoveries are 'not useful' sort of confuses the spirit of exploration.


"When your only tool is a hammer..."

I really feel like this is a misapplication of a blockchain. Blockchains are already heavyweight; when you suddenly add arbitrary data like git repos, you go from 17GB to hundreds of gigabytes and potentially much more.

Pushing also becomes a slower affair: you need to get your data into a block and then get confirmations.

I don't want to crush innovation, especially in the cryptocurrency space, but I really think this is an example of using the wrong tool for the job.


The object data doesn't go into the blockchain, it is sparsely distributed across the nodes


Thanks for the response. How do you plan on sparsely distributing data across nodes? I read the whitepaper but that part has not been filled out.

You'd need to make sure that repos are safe against attack. If not every node has all the data, you are moving into highly innovative territory. I'd be interested to know what sorts of solutions you are considering.


Git is already entirely decentralised. People use github because it hosts nice webpages for them. I cannot understand what the point of this is.


I find it interesting you are using kickstarter to fund this, when everyone (and their dog) seems to get ridiculous amounts of BTC thrown at them when they post their project ideas to bitcointalk. Why not raise funds in BTC?

Also, in this thread you say "There are proof of storage techniques" - I have seen others talk about this (paying people to store data: StorJ, maidsafe) and it seems flawed - you can never know how many copies of your data are truly floating around. If you give me a financial incentive to store "multiple copies of data" I will do it in the cheapest possible way, which means storing it on 1 machine and lying about it.

There is a big difference between storing 20 copies of a file on 1 hard drive vs 1 copy x 20 hard drives, but there is no algorithm that can tell me (the file owner) which is occurring. How do you plan on monetizing "store other people's data" fairly?
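
To make the objection concrete, here is a minimal sketch of a challenge-response storage check, assuming a simple precomputed-challenge scheme. It is not the protocol of Gitchain, StorJ or MaidSafe, and every name in it is hypothetical. It shows that a correct answer proves the host can read the data, but says nothing about how many independent copies exist.

```python
import hashlib
import os


def make_challenges(data: bytes, count: int) -> list[tuple[bytes, str]]:
    """Owner side: precompute (nonce, expected answer) pairs before uploading."""
    challenges = []
    for _ in range(count):
        nonce = os.urandom(16)
        expected = hashlib.sha256(nonce + data).hexdigest()
        challenges.append((nonce, expected))
    return challenges


def answer(stored: bytes, nonce: bytes) -> str:
    """Host side: computing the answer requires reading the stored bytes."""
    return hashlib.sha256(nonce + stored).hexdigest()


data = b"repository pack contents"
challenges = make_challenges(data, count=3)
nonce, expected = challenges[0]

# Twenty "hosts" that all read from one disk: every one of them passes.
single_disk = data
hosts = [single_disk for _ in range(20)]
all_pass = all(answer(h, nonce) == expected for h in hosts)

# A host that discarded the data fails the same challenge.
dropped_pass = answer(b"", nonce) == expected
```

The owner cannot distinguish the twenty passing answers from twenty genuinely separate disks, which is exactly the gap described above.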


Maybe you could use erasure coding instead of replication so that dedupe isn't possible. There's still a Sybil attack where multiple blocks end up getting deleted by the same peer.
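
As a rough illustration of erasure coding versus plain replication, here is the simplest possible case: two data shares plus one XOR parity share, where any two of the three rebuild the original. This is a toy sketch with hypothetical function names, not a scheme proposed by Gitchain; real systems typically use Reed-Solomon style codes with more shares.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes) -> list:
    """Split data into two halves and add one XOR parity share."""
    if len(data) % 2:
        data += b"\0"  # pad to even length; a real scheme would record the true length
    half = len(data) // 2
    first, second = data[:half], data[half:]
    return [first, second, xor_bytes(first, second)]


def decode(shares: list) -> bytes:
    """Rebuild the (padded) data from any two shares; None marks a lost share."""
    first, second, parity = shares
    if first is None:
        first = xor_bytes(second, parity)
    elif second is None:
        second = xor_bytes(first, parity)
    return first + second


shares = encode(b"git objects!")
lost_one = [None, shares[1], shares[2]]
recovered = decode(lost_one)
```

Each share is distinct, so a host cannot keep one copy and claim three. The Sybil problem remains, though: if one peer holds two of the three shares under different identities and disappears, the data is gone.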


You have to make sure that somehow the person storing the erasure codes can't first deduce the original data, and then from this regenerate the erasure codes. Maybe that is not a hard problem?


Here are two ways to decentralize git usefully:

1. Add the ability to git push over git:// to any git repository, which results in a patch being presented to the repository owner in some useful way, i.e. reverse github pull requests that work anywhere. http://joeyh.name/blog/entry/idea:_git_push_requests/

2. True P2P git pull/push over telehash. Something I plan to implement as soon as there's a working telehash implementation. Will allow peers to collaborate from anywhere without a central point of control, and with built-in encryption too. http://telehash.org/

(To be clear, telehash has a DHT, but it's used to find routes to peers, not for distributed data storage.)


Joey, your feedback is always welcome! I am aware of telehash (have been playing with it some time ago). Haven't seen the git push requests idea of yours. Digging in. It's a neat thought.

I think one of the most important aspects of the work I am doing is about establishing a tamper-proof record of history, decentralizing availability and proof of storage. If you have any further thoughts or questions, I am all ears!
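
For readers wondering what a "tamper-proof record of history" means mechanically, here is a minimal sketch of a hash-chained ledger of ref updates. The entry layout and function names are assumptions made for illustration, not Gitchain's actual format, and the sketch leaves out consensus entirely.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(entry: dict) -> str:
    """Hash an entry deterministically by serializing it with sorted keys."""
    payload = json.dumps(entry, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()


def append(ledger: list, ref: str, commit: str) -> None:
    """Add a ref update that commits to the hash of the previous entry."""
    prev = entry_hash(ledger[-1]) if ledger else GENESIS
    ledger.append({"prev": prev, "ref": ref, "commit": commit})


def verify(ledger: list) -> bool:
    """Check that every entry points at the hash of the entry before it."""
    prev = GENESIS
    for entry in ledger:
        if entry["prev"] != prev:
            return False
        prev = entry_hash(entry)
    return True


ledger = []
append(ledger, "refs/heads/master", "a" * 40)
append(ledger, "refs/heads/master", "b" * 40)
ok_before = verify(ledger)

# Rewriting an old entry breaks the link stored in the next one.
ledger[0]["commit"] = "c" * 40
ok_after = verify(ledger)
```

Note that the chain alone only detects edits to entries that have a successor; agreeing on which chain head is the real one is the part a blockchain's consensus mechanism is meant to supply.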


There doesn't seem to be any code for these though. Gitchain has a (very early) prototype https://github.com/gitchain/gitchain


I'm a bit confused as to what's trying to be achieved. Is this trying to use the blockchain as storage for a git repo?


Using its own blockchain as a ledger and a DHT network as a storage layer


Are the git objects themselves stored in the DHT, or is it just (like a trackerless torrent) the connection information needed to find peers willing to send you those objects?


The objects themselves are stored across the peers, not the connection information.


What prevents someone from using this to store backups of their harddrives, photos, etc, and consuming as much space as possible?


It needs an incentive model, like for every GB of space you provide you get a GB of space from peers.


There are proof of storage techniques. I already listed some references in the paper I am working on (https://github.com/gitchain/gitchain/blob/master/gitchain.te...)


Please could you explain why? Trackerless bittorrent is just as decentralized, and seems more scalable.


Does trackerless bittorrent provide a way to update a file?

The git objects themselves are content-addressable (the key is the hash of the value) but you still need a unique ID that references the HEAD of this chain of objects. This id needs to be updateable.
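
A small sketch of that distinction, using Git's real blob ID rule (SHA-1 over a "blob <size>" header, a NUL byte, then the content) and two in-memory dictionaries standing in for the object store and the refs. The dictionaries and the `put` helper are hypothetical, and in real Git HEAD resolves to a commit rather than a blob; blobs are used here only to keep the example short.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Object ID Git assigns to a blob with this content."""
    header = b"blob " + str(len(content)).encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()


objects = {}  # immutable store: the key is derived from the value
refs = {}     # mutable store: a name that can be repointed


def put(content: bytes) -> str:
    """Store content under its own hash and return that hash."""
    oid = git_blob_id(content)
    objects[oid] = content
    return oid


v1 = put(b"version 1\n")
refs["HEAD"] = v1

v2 = put(b"version 2\n")
refs["HEAD"] = v2  # the only thing that changed in place
```

The objects can live anywhere because a fetched value can be checked against its key. The ref cannot be checked that way, so something else has to decide who is allowed to move it and what its current value is.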


Just because you can do something in the blockchain, it doesn't mean you should. The blockchain is a good innovation, but it's not gonna compete with the status quo.


I don't think it's meant to replace the status quo but to add a way to keep repositories that the status quo would refuse to host themselves.


We seem to be in a rule 34 period for shoving shit in the blockchain.


"Gitchain has its own custom blockchain"


It seems like maybe the git support could be separated from the P2P storage layer, which is a very difficult problem in its own right.


I agree. The idea is to safeguard against github going down by storing data in a dht. I don't see why just hosting a repository on https://en.m.wikipedia.org/wiki/MaidSafe won't work here.


MaidSAFE doesn't have any releases yet. It would work but the software isn't complete yet.

At 7 years and counting, you can't be sure that anything is around the corner.


Here they mention that the testnet will be available in a few weeks https://groups.google.com/forum/m/#!topic/maidsafe-developme...


Not mine but I thought it was pretty cool so I pledged. It's in golang, open source and there's already code on github https://github.com/gitchain/gitchain (disclosure: I never met Yurii but I know him from the internet).


I recently ran into two github repos that were taken down due to DMCA requests. It's a weird feeling.



Thanks!


I like this idea. Why is it on kickstarter?


I want to dedicate all of my working time to this project this summer!


Am I the only person who saw P2P and read "pay to play" rather than "peer to peer"?


yes



