Cloning something like Chromium or Linux is already a multi-gigabyte affair without needing everyone to download everything ever committed to the system... and what happens when code needs to be taken down for copyright reasons?
I like more-decentralized Git, but I don't think this is necessarily the way to do it. The main problems today include plumbing around decentralized issue tracking and wikis, good Git web interfaces, etc. Better P2P might make forking easier without centralization (Gittorrent?), but most serious projects probably won't find it difficult to obtain a VPS; data integrity and distributed authority, the things the blockchain provides, are not really necessary at all.
That code can't be taken down for "copyright reasons" is a feature in my eyes, not a flaw. The best way to stand up to censorship and legal threats is to genuinely be unable to comply, just as the best defense against torture is not to know (and all the more so if your would-be aggressors have a way of being sure you really don't know; technology can provide that assurance against censorship).
You should see some of the stuff that's in the Bitcoin blockchain... one day the community is going to have to make a decision on how to purge that data.
To be fair, right now it's all but impossible to entirely take code down for "copyright reasons" if incentive exists to pirate it. I don't know that this is necessarily a strong argument for using a blockchain in this way.
I fully agree, but isn't that roughly the same with Github?
Technically it's very different, of course, but if you commit private keys or the like to GitHub, you have to invalidate everything that was exposed and assume that someone saw it, Google saw it, and it's cached in multiple places. Right?
While your criticisms are valid, I think the best way to consider developments like this is as if they were a new branch of math.
These are new things we can do with the logical/mathematical system. As of now, the best application may not have been found yet. Saying that these new discoveries are 'not useful' sort of misses the spirit of exploration.
I really feel like this is a misapplication of a blockchain. Blockchains are already heavyweight; when you suddenly add arbitrary data like git repos, you go from 17 GB to hundreds of gigabytes, and potentially much more.
Pushing also becomes a slower affair: you need to get your data into a block and then wait for confirmations.
I don't want to crush innovation, especially in the cryptocurrency space, but I really think this is an example of using the wrong tool for the job.
Thanks for the response. How do you plan on sparsely distributing data across nodes? I read the whitepaper but that part has not been filled out.
You'd need to make sure that repos are safe against attack: if not every node has all the data, you are moving into highly innovative territory. I'd be interested to know what sorts of solutions you are considering.
I find it interesting you are using kickstarter to fund this, when everyone (and their dog) seems to get ridiculous amounts of BTC thrown at them when they post their project ideas to bitcointalk. Why not raise funds in BTC?
Also, in this thread you say "There are proof of storage techniques" - I have seen others talk about this (paying people to store data: StorJ, MaidSafe) and it seems flawed - you can never know how many copies of your data are truly floating around. If you give me a financial incentive to store "multiple copies of data" I will do it in the cheapest possible way, which means storing it on 1 machine and lying about it.
There is a big difference between storing 20 copies of a file on 1 hard drive vs. 1 copy on each of 20 hard drives, but there is no algorithm that can tell me (the file owner) which is occurring. How do you plan on monetizing "store other people's data" fairly?
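To make the limitation concrete, here's a minimal proof-of-possession sketch in Go (hypothetical names; not StorJ's or MaidSafe's actual protocol). The owner sends a random nonce and checks the hash of nonce plus data; that proves the node can produce the data right now, but says nothing about how many independent replicas exist:

    package main

    import (
        "crypto/rand"
        "crypto/sha256"
        "fmt"
    )

    // challenge returns a fresh random nonce the file owner sends to a storage node.
    func challenge() []byte {
        nonce := make([]byte, 32)
        rand.Read(nonce)
        return nonce
    }

    // prove is run by the storage node: hash the nonce together with the data it holds.
    func prove(nonce, data []byte) [32]byte {
        return sha256.Sum256(append(nonce, data...))
    }

    // verify is run by the owner against its own copy of the data.
    func verify(nonce, data []byte, proof [32]byte) bool {
        return prove(nonce, data) == proof
    }

    func main() {
        data := []byte("repository pack file")
        nonce := challenge()
        proof := prove(nonce, data)             // a single machine can answer this
        fmt.Println(verify(nonce, data, proof)) // true, whether 1 or 20 replicas exist
    }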
Maybe you could use erasure coding instead of replication so that dedupe isn't possible. There's still a Sybil attack where multiple shards end up with the same peer, who can then delete them all.
You'd have to make sure that the person storing the erasure-coded shards can't first reconstruct the original data and then regenerate the shards from it on demand. Maybe that is not a hard problem?
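To picture what erasure coding buys here, a minimal (2+1) XOR parity sketch in Go, purely illustrative (a real system would presumably use something like Reed-Solomon): the data is split into two distinct shards plus a parity shard, so peers hold different pieces rather than identical copies, and any two shards reconstruct the third:

    package main

    import (
        "bytes"
        "fmt"
    )

    // encode splits data into two shards and computes an XOR parity shard.
    // Any two of the three shards are enough to rebuild the missing one.
    func encode(data []byte) (a, b, parity []byte) {
        if len(data)%2 != 0 {
            data = append(data, 0) // pad to an even length
        }
        half := len(data) / 2
        a, b = data[:half], data[half:]
        parity = make([]byte, half)
        for i := range parity {
            parity[i] = a[i] ^ b[i]
        }
        return a, b, parity
    }

    // recoverShard rebuilds a missing shard from the other two.
    func recoverShard(x, y []byte) []byte {
        out := make([]byte, len(x))
        for i := range out {
            out[i] = x[i] ^ y[i]
        }
        return out
    }

    func main() {
        a, b, p := encode([]byte("git objects "))
        // Pretend the peer holding shard b vanished: rebuild it from a and parity.
        rebuilt := recoverShard(a, p)
        fmt.Println(bytes.Equal(rebuilt, b)) // true
    }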
1. Add the ability to git push over git:// to any git repository, which results in a patch being presented to the repository owner in some useful way. I.e., reverse GitHub pull requests that work anywhere. http://joeyh.name/blog/entry/idea:_git_push_requests/
2. True P2P git pull/push over telehash. Something I plan to implement as soon as there's a working telehash implementation. Will allow peers to collaborate from anywhere without a central point of control, and with built-in encryption too. http://telehash.org/
(To be clear, telehash has a DHT, but it's used to find routes to peers, not for distributed data storage.)
Joey, your feedback is always welcome! I am aware of telehash (I played with it some time ago). I hadn't seen your git push requests idea. Digging in. It's a neat thought.
I think one of the most important aspects of the work I am doing is about establishing a tamper-proof record of history, decentralizing availability and proof of storage. If you have any further thoughts or questions, I am all ears!
Are the git objects themselves stored in the DHT, or is it just (like a trackerless torrent) the connection information needed to find peers willing to send you those objects?
Does trackerless BitTorrent provide a way to update a file?
The git objects themselves are content-addressable (the key is the hash of the value), but you still need a unique ID that references the HEAD of this chain of objects. That ID needs to be updatable.
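A rough sketch of those two pieces in Go (hypothetical names, not gitchain's actual API): an immutable, content-addressed object store plus a single mutable ref that tracks HEAD, which is the one piece of state a DHT or blockchain entry would need to keep current:

    package main

    import (
        "crypto/sha1"
        "encoding/hex"
        "fmt"
    )

    type Store struct {
        objects map[string][]byte // immutable: the key is always the hash of the value
        refs    map[string]string // mutable: e.g. "refs/heads/master" -> object key
    }

    func NewStore() *Store {
        return &Store{objects: map[string][]byte{}, refs: map[string]string{}}
    }

    // Put stores an object under its content hash and returns the key.
    func (s *Store) Put(value []byte) string {
        sum := sha1.Sum(value)
        key := hex.EncodeToString(sum[:])
        s.objects[key] = value
        return key
    }

    // UpdateRef repoints a named ref; this is the only mutable state in the model.
    func (s *Store) UpdateRef(name, key string) { s.refs[name] = key }

    func main() {
        s := NewStore()
        c1 := s.Put([]byte("commit 1"))
        s.UpdateRef("refs/heads/master", c1)
        c2 := s.Put([]byte("commit 2"))
        s.UpdateRef("refs/heads/master", c2) // HEAD moves, old objects stay put
        fmt.Println(s.refs["refs/heads/master"] == c2)
    }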
Just because you can do something in the blockchain doesn't mean you should. The blockchain is a good innovation, but it's not going to compete with the status quo.
I agree. The idea is to safeguard against GitHub going down by storing data in a DHT. I don't see why just hosting a repository on MaidSafe (https://en.m.wikipedia.org/wiki/MaidSafe) wouldn't work here.
Not mine, but I thought it was pretty cool so I pledged. It's in Go, open source, and there's already code on GitHub: https://github.com/gitchain/gitchain (disclosure: I've never met Yurii, but I know him from the internet).