
Lovely work!

We ask for the commit we want and connect to a node with BitTorrent, but once connected we conduct this Smart Protocol negotiation in an overlay connection on top of the BitTorrent wire protocol, in what’s called a BitTorrent Extension. Then the remote node makes us a packfile and tells us the hash of that packfile, and then we start downloading that packfile from it and any other nodes who are seeding it using Standard BitTorrent.
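The overlay negotiation rides on BitTorrent's standard extension mechanism (BEP-10), in which peers exchange a bencoded "extended handshake" naming the extensions they speak. A minimal sketch of framing such a handshake; the extension name `gt_smart` and the handshake contents here are illustrative placeholders, not GitTorrent's actual wire format:

```python
# Sketch of a BEP-10 extended handshake, the hook that lets a custom
# negotiation (like GitTorrent's Smart Protocol overlay) run on top of
# the ordinary BitTorrent wire protocol.
import struct

def bencode(obj):
    """Minimal bencoder for the types a handshake dict needs."""
    if isinstance(obj, int):
        return b"i%de" % obj
    if isinstance(obj, bytes):
        return b"%d:%s" % (len(obj), obj)
    if isinstance(obj, dict):
        # dictionary keys must be sorted per the bencoding spec
        return b"d" + b"".join(bencode(k) + bencode(v)
                               for k, v in sorted(obj.items())) + b"e"
    raise TypeError(obj)

def extended_handshake(extensions):
    """Frame the handshake: <4-byte length><msg id 20><ext id 0><bencoded dict>."""
    payload = bencode({b"m": extensions})
    body = bytes([20, 0]) + payload
    return struct.pack(">I", len(body)) + body

# Advertise a hypothetical "gt_smart" extension with local message id 1.
msg = extended_handshake({b"gt_smart": 1})
```

Once both sides see the extension in each other's `m` dict, they can exchange arbitrary extension messages, which is where the Smart Protocol negotiation would live.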

The disadvantage here is that any given file in the repo could be stored in an ever-increasing number of packfiles. Each existing version of the repo will generate a new packfile to bring it up to the newest version, and it's up to the authenticated masters to generate and seed each of those packfiles, while peers may or may not cache this replicated data. In short, this means of syndicating updates ignores the Merkle-DAG-ness (DAG-osity?) of Git.

The un-updateability of torrents is something that seems to seriously limit its use. There are a lot of interesting attempts to hack around this: Tribler's live streaming experiment and Nightweb are two that spring to mind. https://www.tribler.org/StreamingExperiment/ https://sekao.net/nightweb/protocol.html



Check out IPFS (http://ipfs.io/)

You can use it for git repos essentially out of the box by uploading your repo.

It is made of content-addressed chunks, which get re-used on each re-upload.
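A toy sketch of that content-addressing idea, assuming fixed-size chunks and SHA-256 naming (IPFS's real chunking strategy and multihash format differ): re-uploading a repo only adds chunks whose bytes actually changed.

```python
# Content addressing in miniature: a chunk's name is the hash of its
# bytes, so a shared store automatically deduplicates across uploads.
import hashlib

store = {}  # chunk id -> chunk bytes, shared by every "upload"

def upload(data, size=4):
    """Add data's chunks to the store; return how many were actually new."""
    new = 0
    for i in range(0, len(data), size):
        chunk = data[i:i + size]
        cid = hashlib.sha256(chunk).hexdigest()
        if cid not in store:
            store[cid] = chunk
            new += 1
    return new

first = upload(b"aaaabbbbcccc")   # all three chunks are new
second = upload(b"aaaabbbbdddd")  # only the final chunk changed
```

With real repos the effect is the same at a larger scale: unchanged blobs hash to chunk ids the network already has, so they cost nothing to re-publish.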


I just checked this out thanks to your link. Awesome. IPFS solves a whole class of problems, distributed git repos being just one of them. Great stuff.


I believe that these technologies are bound to wither not because of technical problems but because of "political" problems. These technologies are a huge problem with regard to how intellectual property is managed today and how it is monetized.

Every IP owner will try to slow down the progress of these technologies, mainly by not adopting them. I think these technologies won't be adopted until the IP monetization problem is solved.

To me, this means that if the media are to be easily decentralized and shared, the monetization (how you get money for your work and how you give money when you benefit from it) must become equally easily decentralized and shared (between the publishers, authors, etc.).


A Creative Commons license is important, that is why. Research and development can scale expression if we look for new ideas before old fear. #CreativeCommonsCense? (#CCC?)


I agree, it is really stellar. The billion-dollar concept for me, though, is encrypted repo torrents. Imagine a group of servers hosting encrypted chunks that form the basis of a homomorphic encryption protocol for distribution, using forward error correction to allow recovery of a delta if n of m components of that delta can be recovered. Basically, if you have the key you can pull your source code out of this amorphous cloud, and if you don't have the key you won't even know it is out there.
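As a deliberately tiny stand-in for that forward-error-correction idea (a real n-of-m scheme would use Reed-Solomon-style erasure codes, not this), single XOR parity already shows the shape: any one lost chunk can be rebuilt from the survivors.

```python
# Minimal erasure-coding sketch: one XOR parity chunk lets us survive
# the loss of any single chunk. All chunks must be equal length.
from functools import reduce

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def add_parity(chunks):
    """Append one XOR-parity chunk to the list of data chunks."""
    return chunks + [reduce(xor, chunks)]

def recover(pieces):
    """Rebuild the single piece marked None from the surviving ones."""
    return reduce(xor, [p for p in pieces if p is not None])

chunks = [b"abcd", b"efgh", b"ijkl"]
coded = add_parity(chunks)
damaged = [coded[0], None, coded[2], coded[3]]  # chunk 1 lost in transit
```

Because the XOR of all coded pieces is zero, XOR-ing whatever survived yields exactly the missing piece; proper (n, m) codes generalize this to tolerate multiple losses.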

I started building a toy version of this about 5 years ago but got distracted by work. Essentially the repo key encrypted the packfile, and the storage reliability layer used a second key to encrypt the chunks. The second key would locate the chunks, with enough redundancy to re-create the encrypted packfile, which the repo key could then decrypt and apply to your repo.

A very fun problem in distributed systems and data structures.


If I'm not mistaken, other than the Git specific aspects, this is more or less what Tahoe-LAFS strives to accomplish.

http://en.wikipedia.org/wiki/Tahoe-LAFS


Tahoe-LAFS is something I've been looking at recently, and unfortunately it suffers from a few missing features that have been known about for many years.

(The one that I want most is the ability to rebalance on the fly, as storage-hosts become full.)


Tahoe-LAFS also doesn't support deletion.


I've always thought this would be amazing for file storage in general: a Dropbox that just syncs with the cloud, encrypting your content with a private key, and lets you withdraw it at any time. You simply give back, say, 5x as much storage as you use, for the data mirroring.


AFAIK this was the concept behind Wuala. Founded in Switzerland, later bought by the French company LaCie. According to their website the client-side encryption is still there; however, they discontinued the collaborative disk space business model.


We're trying to recreate that concept with Peergos (https://github.com/ianopolous/Peergos). We're considering trying to use IPFS (http://ipfs.io) under the hood, which would be nice. Wuala was great, but they never open-sourced it and stopped the storage earning in 2008.


Like http://storj.io/ ? I think they have a lot of code to write though.



Wow, they are moving to Rust. That's going to be quite interesting.


How would that work? There are no homomorphic encryption schemes anywhere near general enough for that, as far as I know.


That's why it's a fun problem.



> The un-updateability of torrents is something that seems to seriously limit it's use.

I am not involved with the development of torrents at all, but (please bear with me until the end) my initial reaction is that we should think of torrents' inability to update as a feature and not as a bug.

Perhaps if the ability of torrents to update is a concern, then it warrants a new peer-to-peer protocol? (Please note that this is not the case of http://xkcd.com/927/?cmpid=pscau as I am not advocating a new protocol for every use case.)

It seems like we can sign torrent files with GPG keys. Perhaps I am wrong. Perhaps we could allow updating torrents if we require that the updates be signed with the same private key as the original torrent? Am I barking up the right tree here?

Edit: Oops. I edited this post before I saw the reply about BEP-0039. Apologies.


> Perhaps if the ability of torrents to update is a concern, then it warrants a new peer-to-peer protocol?

There is a new peer-to-peer protocol; it's even an IETF draft. It's called PPSP and is full of nice stuff:

https://tools.ietf.org/html/draft-ietf-ppsp-peer-protocol-12

https://github.com/libswift/libswift


Do you imply that PPSP's live stream protocol could be used to create updateable keys?


Yes, if append-only is enough then a streaming protocol like PPSP is good.

Otherwise we need something else, which I hope to achieve in rakoshare (https://github.com/rakoo/rakoshare).


It's not append-only, since there's churn. There is no "delete" per se, except maybe with the PPSP private network.


Updatability does not need to imply mutability. It could be possible to have a torrent with 50 files, and then the torrent is updated with a new file, and a client that has already downloaded the new torrent will only need to download one new file. And another client that still wants the old torrent would still get the same 50 files, but would be able to download from clients that have the new torrent. (Similarly, clients downloading the new torrent could download the first 50 files from clients seeding the old one.) That's updatability without mutability.


Follow-up question: does BitTorrent work this way currently? If I take your 50-file torrent, add one file and re-seed, will you be seeding the original 50 files in the same swarm as my 51-file torrent?


Every torrent has a hash (the info-hash) that identifies it; even if two torrents contain the exact same files, you are only ever seeding one of those hashes.
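Concretely, the swarm identity is the SHA-1 of the bencoded `info` dictionary, so adding one file produces a brand-new info-hash even when every other file is byte-identical. A simplified sketch; a real `info` dict also carries piece hashes, piece length, and more:

```python
# Why two torrents with overlapping content still form separate swarms:
# the info-hash is a digest of the whole (bencoded) "info" dictionary.
import hashlib

def bencode(obj):
    """Minimal bencoder: ints, byte strings, lists, dicts."""
    if isinstance(obj, int):
        return b"i%de" % obj
    if isinstance(obj, bytes):
        return b"%d:%s" % (len(obj), obj)
    if isinstance(obj, list):
        return b"l" + b"".join(bencode(x) for x in obj) + b"e"
    if isinstance(obj, dict):
        return b"d" + b"".join(bencode(k) + bencode(v)
                               for k, v in sorted(obj.items())) + b"e"
    raise TypeError(obj)

def info_hash(files):
    """SHA-1 of a (heavily simplified) multi-file info dict."""
    info = {b"name": b"repo",
            b"files": [{b"path": [p], b"length": n} for p, n in files]}
    return hashlib.sha1(bencode(info)).hexdigest()

old = info_hash([(b"a.txt", 10)])
new = info_hash([(b"a.txt", 10), (b"b.txt", 20)])  # one extra file
```

Since the hash covers the whole dictionary, any change to the file list (or piece data) keys a completely separate swarm, which is exactly the limitation being discussed.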


I regularly download MAME updates by pointing the updated torrent at my ROMs directory. All changes are re-downloaded, and new ROMs are grabbed.

Edit: just realized I misread the question. No. In my case, the previous MAME ROM-set swarm is usually abandoned, and the new torrent takes all the traffic. The swarms are unique.


What I described would be the ideal situation. I don't know of any existing P2P system that works like that, unless it's a system for sharing individual files instead of "torrents" containing multiple files.


So basically, git over IP? Each "update" is just a commit containing the modifications since the last one. If you want the most recent, ask for the HEAD.


The article goes into why that's not really an option. Making a separate request for each blob/file/commit/whatever would be way too slow for large repos.


I was more commenting that the very updatability you were discussing is pretty much directly analogous to what git is in the first place. So now we're building a system that has the same functions as git, but isn't git.


Support for mutable/updateable torrents was proposed in BEP-0039 (http://www.bittorrent.org/beps/bep_0039.html) and is the basis of how BitTorrent Sync functions.


Great to have. Alas, it relies on a central system of record: it's just a feed that can be re-fetched. The "originator" key is probably worth standardizing around for any future approaches, but using PEX or the DHT to announce the most recent magnet URI or the like.

Simply signaling new magnet URIs would have the disadvantage GitTorrent sought to avoid: re-syndicating the entire contents with every single change, a killer for things like the Linux kernel or projects like DebTorrent. Git's Merkle DAG avoids this problem and allows multiple concurrent versions to share the bulk of the content-indexing; a best-of-all-worlds solution would preserve this capability.


BitTorrent Sync is actually very much like this. People can offer read-only subscriptions to their repositories, and then everyone distributes that repository to everyone else, BitTorrent-style. So you have multiple read-writers who can publish and update the repository, and multiple readers who can just subscribe to it and help distribute it.

So, an example where this technology could be put to a unique use: Minecraft streamers and let's-players sometimes like to distribute the world they are using. They could make a repository of their world and distribute the read-only keys for it to other users. This would allow others to play it, even temporarily make changes (because Sync kicks in and refreshes it), and the repository would be kept current as the world progresses. That should be viable right now with Sync.

Problem is, I think BitTorrent Inc. is doing their damnedest to keep themselves firmly planted in the distribution and ownership of the technology, so we won't see an explosion of third-party clients. I don't think it will see much adoption for this reason, and that's a real shame, because it would be a wonderful bit of kit for the internet.


Peers don't have to seed packfiles the way we're used to with "traditional" seeding of movies or music; these packfiles actually represent the transition from one commit to another (instead of "all content from the beginning until now"), so they are inherently ephemeral. They don't even need to be kept on disk, because they can be regenerated on the fly whenever a client is interested.

The DAG-osity of Git really helps here (because you only have to transfer what's really needed), and the "immutability" of Git helps because, if your project is popular and you update your branch, everyone will want to go from the old commit to the new commit, so everyone can share the diff between them directly.
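A toy Merkle-DAG sketch of why such a diff pack stays small (not real git; the object format here is invented): the sender only ships objects reachable from the new tip that aren't already reachable from the old one.

```python
# Miniature Merkle DAG: objects are addressed by a hash of their payload
# and children, and an incremental "pack" is a reachability difference.
import hashlib

objects = {}  # object id -> (payload bytes, tuple of child ids)

def put(payload, children=()):
    """Store an object addressed by the hash of its payload + children."""
    oid = hashlib.sha1(payload + "".join(children).encode()).hexdigest()
    objects[oid] = (payload, tuple(children))
    return oid

def reachable(oid, seen=None):
    """All object ids reachable from oid by walking child links."""
    seen = set() if seen is None else seen
    if oid not in seen:
        seen.add(oid)
        for child in objects[oid][1]:
            reachable(child, seen)
    return seen

def pack(old_tip, new_tip):
    """Objects a peer that already has old_tip needs to reach new_tip."""
    return reachable(new_tip) - reachable(old_tip)

blob1 = put(b"readme v1")
c1 = put(b"commit 1", (blob1,))
blob2 = put(b"readme v2")
c2 = put(b"commit 2", (blob2, c1))  # new commit links back to its parent
```

Here `pack(c1, c2)` contains only the new commit and the changed blob; everything already reachable from `c1` is skipped, which is the property flat torrent-per-version schemes give up.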


I found this discussion interesting regarding conceptual approaches about how to handle partial torrent updates:

"Thinking about 'meta' torrent file format." https://gist.github.com/mait/8001883

Truly Meta - Meta: https://news.ycombinator.com/item?id=6920244


I had no idea torrents had that limitation. Could torrents be tagged with versions, so that old ones could be removed from seeders?



