
No, because we have more information than rsync does. We own both ends of the connection and can keep versions synchronized.


That sounds interesting, could you elaborate on how it is different from rsync though? "Keep versions synchronized" is a bit vague


The piece in the CloudFlare network and the piece in the customer network are able to keep track of which page versions they each have and so the part in the CloudFlare network sends a request saying "Please do GET /foo and compress it against version X". That means that at request time there's no back-and-forth between the components deciding what compression dictionary to use.
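One way to picture "compress it against version X" is preset-dictionary compression: both ends already hold version X, so the sender primes its compressor with it and the receiver primes its decompressor with the same bytes. The sketch below uses zlib's `zdict` parameter as a stand-in. The thread does not say Railgun uses zlib, and naming versions by SHA-256 (`version_id`) is a hypothetical scheme for illustration. Note that DEFLATE only looks back 32 KB, so only the last 32 KB of the dictionary can be referenced.

```python
import hashlib
import zlib

def version_id(body: bytes) -> str:
    # Hypothetical: name each cached page version by its SHA-256 digest,
    # so a request can say "compress against <id>" with no negotiation.
    return hashlib.sha256(body).hexdigest()

def compress_against(base: bytes, new_body: bytes) -> bytes:
    # Prime DEFLATE with the earlier version both ends already hold.
    c = zlib.compressobj(level=9, zdict=base)
    return c.compress(new_body) + c.flush()

def decompress_against(base: bytes, payload: bytes) -> bytes:
    # The receiver must supply the identical base bytes as the dictionary.
    d = zlib.decompressobj(zdict=base)
    return d.decompress(payload) + d.flush()
```

When the new page differs only slightly from the base, most of it becomes back-references into the dictionary, so the payload is a small fraction of what plain compression produces.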


Well, no good binary delta algorithm uses compression dictionaries anyway (since they are binary deltas, not compression algorithms :P), except to compress the newly added strings, which you can't avoid.

Note, of course, that relying on the data not being corrupt on the client (which you must if you assume the compression dictionaries are sane) is dangerous. I assume you guys must store some checksum that you compare once to make sure when someone says "I have version 5, delta against this", that they really have a good copy of version 5?

SVN used to do what you are suggesting, btw. We only send clients deltas against the versions they already have, and precompute them in some cases :)


> I assume you guys must store some checksum that you compare once to make sure when someone says "I have version 5, delta against this", that they really have a good copy of version 5?

Yes.
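The check being confirmed here can be sketched as "recompute the base's digest before trusting it, and fall back to a full response if it doesn't match". This is a minimal illustration, assuming a SHA-256 digest is recorded per version; `choose_response` and the pluggable `make_delta` function are hypothetical names, not anything CloudFlare described.

```python
import hashlib
from typing import Callable, Optional, Tuple

def base_is_intact(base: bytes, expected_digest: str) -> bool:
    # Recompute the stored copy's SHA-256 and compare with the recorded digest.
    return hashlib.sha256(base).hexdigest() == expected_digest

def choose_response(
    base: Optional[bytes],
    expected_digest: str,
    new_body: bytes,
    make_delta: Callable[[bytes, bytes], bytes],
) -> Tuple[str, bytes]:
    # Only delta against a base that is present and verifiably the claimed version.
    if base is not None and base_is_intact(base, expected_digest):
        return ("delta", make_delta(base, new_body))
    # Missing or corrupt base: send the full body instead of a bad delta.
    return ("full", new_body)
```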


For what it's worth, this is fairly standard binary patching approach as used in software updates. I am aware of at least two mainstream titles that do this, and I'd be surprised if Firefox, for example, doesn't push updates this way.

(edit) That's an awesome name by the way. Railgun.


Can't claim credit for the name. I wanted to call it Rocket Sled.


I'm glad we didn't call it Rocket Sled.


A bit like rsync's --fuzzy or --compare-dest then?


Well, --fuzzy tries to find something to use as a 'destination' file so it can send across some hashes. Railgun has more complete information because both ends keep their versions synchronized, so the side making the request can name the dictionary to compress against with a single hash.
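To put "a single hash" in numbers, here is back-of-envelope arithmetic comparing a per-block signature list with one digest naming a shared version. The per-block cost (a 4-byte weak plus a 16-byte strong checksum), the 700-byte block size and the 100 KB page are assumptions for illustration; real rsync picks its own block and checksum sizes. The extra round trip needed to exchange signatures isn't counted here.

```python
import math

# One SHA-256 digest naming a version both ends already hold.
SINGLE_HASH_BYTES = 32

def rsync_signature_bytes(file_size: int, block_size: int,
                          weak_len: int = 4, strong_len: int = 16) -> int:
    # One (weak, strong) checksum pair per block, the last block possibly short.
    blocks = math.ceil(file_size / block_size)
    return blocks * (weak_len + strong_len)
```

Under these assumptions a 100 KB page needs 147 signature pairs, i.e. `rsync_signature_bytes(100 * 1024, 700)` gives 2940 bytes, versus 32 bytes for one digest.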


Thanks for the explanation, that does sound useful! :)


Don't you understand? We need you to accept that this is a new technology and a ground-breaking algorithm and a new innovative (and valuable, non-obvious) technology. CloudFlare was established in 2007 with the goal to develop a faster, safer, better internet. CloudFlare, the web performance and security company, set records this month hitting more than 100 million daily active users and more than 50 billion monthly page views!


> we have more information than rsync does.

To what end? Rsync too works off both copies.


rsync computes checksums on blocks to see whether the blocks are the same. It transmits these checksums, and where they differ, it deltas the blocks. Note that an insertion or deletion in a file can shift block boundaries between the two files, a problem known as "stream alignment". It can make your binary delta much larger, because rsync doesn't realize the block really shifted 16384 bytes over (or whatever), so it thinks the client doesn't have any of the bytes of that block.
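The alignment problem described here is easiest to see with a naive comparison that only checks blocks at the same fixed offsets in both files. This sketch is that naive scheme, not rsync's actual matcher (rsync uses a rolling checksum to find shifted blocks, as a later reply notes); the function names are hypothetical.

```python
import hashlib
from typing import List

def block_digests(data: bytes, block_size: int) -> List[bytes]:
    # Digest each full block at offsets 0, block_size, 2*block_size, ...
    return [hashlib.sha256(data[i:i + block_size]).digest()
            for i in range(0, len(data) - block_size + 1, block_size)]

def fixed_block_matches(old: bytes, new: bytes, block_size: int) -> List[int]:
    # Indices of blocks that are identical at the SAME offset in both files.
    old_sigs = block_digests(old, block_size)
    new_sigs = block_digests(new, block_size)
    return [i for i, (a, b) in enumerate(zip(old_sigs, new_sigs)) if a == b]
```

Inserting a single byte at the front of a file shifts every later block, so `fixed_block_matches(old, b"!" + old, 8)` finds no matches at all even though almost every byte is unchanged.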

In any case, if you know the files are related, you

1. Don't need to do any of this. You can simply send the binary delta, which is usually a list of copy/add instructions (e.g. copy offset 16384, length 500 to offset 32768)

2. Can precompute the deltas.

You can actually precompute in any case; it just makes no sense unless you know you will be diffed against something else.
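The copy/add representation mentioned in the list can be sketched in a few lines. This uses Python's `difflib.SequenceMatcher` as a stand-in diff engine, which is far slower than real binary delta tools; the op format `("copy", offset, length)` / `("add", data)` is hypothetical, and ops are applied in order, so each destination offset is implicit rather than spelled out.

```python
from difflib import SequenceMatcher
from typing import List, Tuple, Union

Op = Union[Tuple[str, int, int], Tuple[str, bytes]]

def compute_delta(old: bytes, new: bytes) -> List[Op]:
    ops: List[Op] = []
    sm = SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in sm.get_opcodes():
        if tag == "equal":
            # Bytes the receiver already has: copy them from the old version.
            ops.append(("copy", i1, i2 - i1))
        elif tag in ("replace", "insert"):
            # New bytes the receiver cannot have: send them literally.
            ops.append(("add", new[j1:j2]))
        # "delete": old bytes are simply not copied.
    return ops

def apply_delta(old: bytes, ops: List[Op]) -> bytes:
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, offset, length = op
            out += old[offset:offset + length]
        else:
            out += op[1]
    return bytes(out)
```

Because `compute_delta` depends only on the two versions, a server that knows which base the client holds could run it ahead of time and cache the result.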


I always thought rsync detected block moves and that's what made it a worthy PhD thesis.


I thought that too. It'd be interesting to see a comparison of the two software designs with actual difference in resource usage (cpu, io, bandwidth).

That would be really cool in fact.


Yes, I simplified and I shouldn't have. It does detect them, but it does have a minimum size of block move it can detect due to the signature matching method.
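The reason rsync can detect shifted blocks is its rolling weak checksum: two sums modulo 2^16 that can be updated in constant time as the window slides by one byte, so every offset in the new file can be tested against the old file's block signatures. This sketch follows that weak checksum but pairs it with SHA-256 as the strong check (rsync itself uses MD4/MD5 and further refinements), and the function names are hypothetical. Only full blocks of `block_size` bytes can ever match, which is the minimum detectable move size mentioned here.

```python
import hashlib
from typing import Dict, List, Tuple

M = 1 << 16

def weak_checksum(block: bytes) -> Tuple[int, int]:
    # a = sum of bytes; b = position-weighted sum (first byte weighted most).
    a = sum(block) % M
    b = sum((len(block) - i) * x for i, x in enumerate(block)) % M
    return a, b

def roll(a: int, b: int, out_byte: int, in_byte: int, block_len: int) -> Tuple[int, int]:
    # Slide the window one byte: drop out_byte, append in_byte, in O(1).
    a = (a - out_byte + in_byte) % M
    b = (b - block_len * out_byte + a) % M
    return a, b

def find_block_moves(old: bytes, new: bytes, block_size: int) -> List[Tuple[int, int]]:
    # Signatures of the old file's full blocks: weak -> [(index, strong)].
    table: Dict[Tuple[int, int], List[Tuple[int, bytes]]] = {}
    for idx in range(len(old) // block_size):
        blk = old[idx * block_size:(idx + 1) * block_size]
        table.setdefault(weak_checksum(blk), []).append((idx, hashlib.sha256(blk).digest()))

    matches: List[Tuple[int, int]] = []  # (offset in new, old block index)
    pos, sums = 0, None
    while pos + block_size <= len(new):
        if sums is None:
            sums = weak_checksum(new[pos:pos + block_size])
        window = new[pos:pos + block_size]
        hit = next((idx for idx, strong in table.get(sums, [])
                    if hashlib.sha256(window).digest() == strong), None)
        if hit is not None:
            matches.append((pos, hit))
            pos, sums = pos + block_size, None  # jump past the matched block
        else:
            if pos + block_size < len(new):
                sums = roll(sums[0], sums[1], new[pos], new[pos + block_size], block_size)
            pos += 1
    return matches
```

With a one-byte insertion at the front, every old block is still found, just one byte later: `find_block_moves(old, b"!" + old, 8)` reports each block at offset 1, 9, 17 and so on.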



