The title suggests that there's something unique about Go, either the language or its standard library, that enables bandwidth savings. In fact, Cloudflare have written some software which they claim enables them to reduce their bandwidth, and this software happens to be written in Go. This might be an excellent choice (and I suspect it probably is), but it's not Go per se that is reducing the bandwidth usage.
I agree. The benefit of using Go is that it's fast to write and has good concurrency features. To give you an idea of the size, there are 7,329 lines of Go code in Railgun (including comments) and a 6,602 line test suite.
In the process we've committed various things back to Go itself, and at some point I'll write a blog post on the whole experience. One thing that made a big difference was writing a memory recycler so that, for commonly created things (in our case []byte buffers), we don't force the garbage collector to keep reaping memory that we then go back and ask for.
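For anyone who hasn't seen the pattern, here is a minimal sketch of such a channel-based []byte recycler. This is not the actual Railgun code; buffer and pool sizes are made up, and the standard library has since added sync.Pool for exactly this job.

    package recycle

    // Sketch of a channel-based []byte recycler; not CloudFlare's code.
    // Buffer capacity and pool depth are arbitrary for the example.
    const (
        bufSize  = 32 * 1024
        poolSize = 128
    )

    var pool = make(chan []byte, poolSize)

    // Get returns a recycled buffer if one is idle, otherwise allocates.
    func Get() []byte {
        select {
        case b := <-pool:
            return b[:0] // keep the backing array, reset the length
        default:
            return make([]byte, 0, bufSize)
        }
    }

    // Put offers a buffer back for reuse; if the pool is full, the buffer
    // is simply dropped and left to the garbage collector.
    func Put(b []byte) {
        if cap(b) < bufSize {
            return
        }
        select {
        case pool <- b:
        default:
        }
    }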
The concurrency through communication is trivial to work with once you get the hang of it, and being able to write completely sequential code means it's easy to grok what your own program is doing.
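A toy sketch of what that sequential-looking CSP style means in practice (names and workloads invented for the example):

    package main

    import "fmt"

    // Each goroutine runs plain sequential code; the channels do the
    // coordination. Purely illustrative.
    func worker(jobs <-chan string, results chan<- int) {
        for j := range jobs {
            results <- len(j) // stand-in for real work
        }
    }

    func main() {
        jobs := make(chan string)
        results := make(chan int)

        go worker(jobs, results)

        go func() {
            for _, page := range []string{"/index.html", "/about", "/contact"} {
                jobs <- page
            }
            close(jobs)
        }()

        for i := 0; i < 3; i++ {
            fmt.Println(<-results)
        }
    }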
We've hit some deficiencies in the standard library (around HTTP handling) but it's been fairly smooth. And, as the article says, we swapped native crypto for OpenSSL for speed.
The Go toolchain is very nice. Tools like go fmt, go tool pprof, go vet, and go build make working with it smooth.
sigh This is by far Go's biggest wart IMO, and one that frequently sends me back to a pauseless (hah! at least less pausy :) systems language. I sure do like it in almost every other meaningful regard. But I wish latency wasn't something the designers punted on.
I occasionally hear this kind of complaint, but I've yet to see any silver-bullet memory management system. AFAICT, the best we've been able to accomplish is to provide an easier path to correctness with decent overall performance. Also, GC latency isn't the only concern. As soon as the magic incantation "high performance" is uttered, all bets are off.
There's been decades of work on real-time garbage collection yet all of those approaches still have tradeoffs. Consider that object recycling is a ubiquitous iOS memory management pattern. This reduces both memory allocation latencies and object recreation overhead. Ever flick-scroll a long list view on an iPhone? Those list elements that fly off the top are virtually immediately recycled back to the bottom -- it's like a carousel with only about as many items as you can see on screen. The view objects are continually reused, just with new backing data. This approach to performance is more holistic than simply pushing responsibility onto the memory allocator.
Memory recycling here also reminds me of frame-based memory allocator techniques written up in the old Graphics Gems books, a technique likewise covered in real-time systems resources. Allocating memory from the operating system can be relatively expensive and inefficient, even using good ol' malloc. A frame-based allocator grabs a baseline number of pages and provides allocation for one or more common memory object sizes (aka "frames"). Pools for a given frame size are kept separate, which prevents memory fragmentation. Allocation performance is much faster than straight malloc, while increasing memory efficiency for small object allocation and eliminating fragmentation. Again, this is a problem-specific approach that considers needs beyond latency.
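For the curious, here is a rough Go rendering of that frame-pool idea: one pool per frame size, pre-carved from a slab. A real allocator of this kind would live in C/C++ below the garbage collector, so this is purely illustrative.

    package frames

    // Rough sketch only: one pool per frame size, pre-carved from a slab.
    // Freed frames go back on the free list, so a pool never fragments.
    type Pool struct {
        frameSize int
        free      [][]byte
    }

    // NewPool carves a slab into n frames of the given size.
    func NewPool(frameSize, n int) *Pool {
        slab := make([]byte, frameSize*n)
        p := &Pool{frameSize: frameSize, free: make([][]byte, 0, n)}
        for i := 0; i < n; i++ {
            // Three-index slice so one frame can never grow into the next.
            p.free = append(p.free, slab[i*frameSize:(i+1)*frameSize:(i+1)*frameSize])
        }
        return p
    }

    // Alloc hands out an idle frame, falling back to a fresh allocation
    // if the pool is exhausted.
    func (p *Pool) Alloc() []byte {
        if n := len(p.free); n > 0 {
            f := p.free[n-1]
            p.free = p.free[:n-1]
            return f
        }
        return make([]byte, p.frameSize)
    }

    // Free returns a frame to the pool for reuse.
    func (p *Pool) Free(f []byte) {
        p.free = append(p.free, f)
    }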
"AFAICT, the best we've been able to accomplish is to provide a easier path to correctness with decent overall performance."
Precisely. Which is why for performance-critical systems code it's important to give the programmer the choice of memory allocation techniques, but to add features to the language to make memory use safer.
Garbage collection is great, but occasionally it falls down and programmers have to resort to manual memory pooling. Then it becomes error-prone (use after free, leaks) without type system help such as regions and RAII.
I can't speak for the grandparent, but for my part I agree with your point that allocation patterns matter and that there is no silver bullet to memory management, which is exactly the reason that GC'd languages like Go are uninteresting as systems languages. Why use a language where you have to work around one of its main features when you care about performance?
I find Rust's approach much more interesting, because GC is entirely optional, but it provides abstractions that make it easier to write clear and correct manual memory management schemes.
Do they have a standard ABI or FFI for interaction with C? If so, they probably designed the assumption of a conservative GC into it. You can always make an incompatible change, but it's a pain.
Go does look awesome. I've spent some time with Erlang, Clojure and Scala (roughly in the order that I liked them most), but Go passed the "get started writing useful code quickly" test better than any of them. Haven't gone beyond the basics yet, but I think it might occupy a sweet spot of ease of use combined with "power", loosely defined.
I live in Lancashire at the moment; the only issue for me would really be spending £150 on a round trip to London. Do you guys do preliminary Skype interviews?
(I'm aware that the post might not be as prestigious as, say, engineering - however, I feel that having someone with strong web development experience (who is a user of Cloudflare already) would more than offset the slight inconvenience on your part.)
We typically conduct the first interviews on the phone or Skype for interesting candidates. If it makes sense to do in-person interviews, we're happy to cover the cost of transportation for candidates we're excited about. In other words, if you're excited about working with CloudFlare, don't let the £150 stand in the way of applying.
We definitely do the initial interview via phone/Skype. There absolutely is room to grow and move into other areas of the company with experience. I highly recommend becoming familiar with the platform through the "front lines" of support. It gives engineers a different perspective on our service.
Ok, that sounds great! (Same to the comment above, replying to this one for continuity) - let me mull it over this week.
And I couldn't agree more: being placed in the firing line of customers is often more telling than building the software yourself - "normal" people tend to notice things which we as developers are prone to miss or gloss over unintentionally.
-----
The awkward moment when I notice I blanked the CEO
Large changes in lifestyle, such as a complete relocation or a new job, shouldn't be undertaken lightly. If I interviewed and got offered the job I would be under a lot of pressure, which I can mitigate now by thinking more carefully before committing to anything. I wouldn't want to waste both my time and the time of the people at Cloudflare by making an important decision without carefully weighing the pros and cons.
How about just emailing them? I feel like I've seen these kinds of 'wow, where do I apply for a job' posts on CloudFlare news articles before. Smells like astroturfing.
The title only suggests something unique about Go to those who didn't read the article.
There's a good chunk of the article dedicated to discussing the language choice and how other languages could have been used but, in this specific instance, weren't chosen. The language choice is as much a part of the topic as the compression routines themselves, so it makes a lot of sense to include 'Go' in the title given that it's a large focus of the article.
It's really no different to all these articles that spring up about fancy demos being built in Javascript or CSS tricks. Yet in those instances nobody says "the title is misleading. You could write that demo in C++ as well."
>The title only suggests something unique about Go to those who didn't read the article.
The thing is, many people use the title to determine whether the article is worth reading. As is, the title suggests that there is something unique about Go that reduces the bandwidth needed by the program, implying that this is something that other common languages fail to achieve. This is obviously impossible (any widely used language is capable of serializing an output byte stream in any way the programmer desires). As a result, the title sets off the alarms for "language fanboyism" and "mathematically impossible claims", and goes swiftly into the "don't bother" pile together with "universal lossless compression algorithm invented!"[1], "perpetual motion machine" and "My favourite X language is faster than C/C++/Assembler!1!1"
> The thing is, many people use the title to determine whether the article is worth reading.
That same argument could be used for having the language in the title as people who are not interested in programming are going to be less interested in a thread about programming.
And language fanboyism is going to happen with or without this title (given the content of the article). What's happening here is more a case of lazy members wanting to commentate on articles they've not even read. It's basically the lowest form of blogging.
I've never used Go professionally and most of my spare time is split between C++ and Scheme at the moment, but when I did go spelunking with Go, I found it a breeze to write complicated functionality in it - it felt like C++, but easier and more immediately powerful.
I still feel that C++ is generally a better choice, but if I only had a short time to write something in, I would definitely go for Go.
For me, I just don't see enough benefits of Go over C++. I already know how to use C++ in a way that avoids or mitigates the problems Go solves. With C++11 support starting to take off, Go's advantages are even smaller.
On the other hand, if I didn't know C++ and I was looking for a native compiled language to learn, I'd probably choose Go over C++.
Depending on what you're doing, the libraries make a giant difference. Look over Go's standard library packages and then imagine what a pain in the ass it would be to find and manage all the separate C/C++ libs it would take to replicate all that functionality (or to write it yourself).
Performance. I also feel more in control; it's hard to describe the feeling, but it's as though Go provides an abstracted interface to the hardware, whereas C++ provides raw, unfiltered, but potentially dangerous access.
That kind of fits Rob Pike's explanation for why Go isn't more popular with C++ programmers (although he gave it a negative spin).
In a neutral way, it's like: if you spent all that effort mastering this language to get such fine-grained control, why would you give that up again? And really, I understand: why would you give that up? Especially if you know how to use C++ in a fairly painless way.
> Performance. I also feel more in control; it's hard to describe the feeling, but it's as though Go provides an abstracted interface to the hardware, whereas C++ provides raw, unfiltered, but potentially dangerous access.
Funny, I thought C++ did exactly the same thing. Where are the L1, L2 and L3 cache references, the multiple opcode execution pipelines, the processor instructions?
Go is used, sure, but the cool part about this is the binary Railgun protocol. Really smart. Send only file hashes and binary diffs back and forth, do a little extra computation to figure out the changes, but only send the absolute minimum data you need to the CDN. That's just smart, and frankly, I hope other CDNs have been doing this already, because at any high volume it seems to be an obvious solution.
So that brings up the question—is this just something CloudFlare is announcing for the PR, or is it actually innovative?
The piece in the CloudFlare network and the piece in the customer network are able to keep track of which page versions they each have and so the part in the CloudFlare network sends a request saying "Please do GET /foo and compress it against version X". That means that at request time there's no back-and-forth between the components deciding what compression dictionary to use.
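The wire format isn't public, so purely as an illustration of the shape of such a request, a hypothetical Go struct (field names invented):

    package railgunish // hypothetical sketch; the real Railgun protocol is not public

    // deltaRequest is an invented illustration of "please do GET /foo and
    // compress it against version X".
    type deltaRequest struct {
        Method      string // e.g. "GET"
        Path        string // e.g. "/foo"
        BaseVersion string // hash naming the cached version both ends already hold
    }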
Well, no good binary delta algorithm uses compression dictionaries anyway (since they are binary deltas, not compression algorithms :P), except to compress the newly added strings, which you can't avoid.
Note of course, that relying on the data not being corrupt on the client (which you must if you assume the compression dictionaries are sane) is dangerous. I assume you guys must store some checksum that you compare once to make sure when someone says "i have version 5, delta against this", that they really have a good copy of version 5?
SVN used to do what you are suggesting, btw. We only send clients deltas against the versions they already have, and precompute them in some cases :)
I assume you guys must store some checksum that you compare once to make sure when someone says "i have version 5, delta against this", that they really have a good copy of version 5?
For what it's worth, this is a fairly standard binary-patching approach, as used in software updates. I am aware of at least two mainstream titles that do this, and I'd be surprised if Firefox, for example, doesn't push updates this way.
(edit) That's an awesome name by the way. Railgun.
Well, fuzzy tries to find something to use as a 'destination' file so it can send across some hashes. Railgun has more complete information because it is keeping synchronized and thus the part making a request can specify the dictionary to compress with in a single hash.
Don't you understand? We need you to accept that this is a new technology and a ground-breaking algorithm and a new innovative (and valuable, non-obvious) technology. CloudFlare was established in 2007 with the goal to develop a faster, safer, better internet. CloudFlare, the web performance and security company, set records this month hitting more than 100 million daily active users and more than 50 billion monthly page views!
rsync is going to perform checksums on blocks to see if the blocks are the same.
It transmits these checksums, and where the checksums differ, it deltas the blocks. Note that insertion/deletion in a file can push block boundaries off between two files, causing a problem known as "stream alignment", which can cause your binary delta to be much larger because it doesn't realize the block really shifted 16384 bytes over (or whatever), and so it thinks the client really doesn't have any of the bytes of that block.
In any case, if you know the files are related, you
1. Don't need to do any of this. You can simply send the binary delta, which is usually copy/add instructions (i.e., copy offset 16384, length 500 to offset 32768) - see the sketch after this comment.
2. Can precompute the deltas.
You can actually precompute in any case, it just makes no sense unless you know you will be diffed against something else.
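A generic sketch of that copy/add encoding (illustrative only; not Railgun's or any particular tool's format):

    package delta

    // op is either a copy from the old version or a run of literal bytes.
    type op struct {
        copyOff, copyLen int    // used when add is nil
        add              []byte // literal bytes to insert
    }

    // Apply rebuilds the new version from the old data plus a patch.
    func Apply(old []byte, patch []op) []byte {
        var out []byte
        for _, o := range patch {
            if o.add != nil {
                out = append(out, o.add...)
            } else {
                out = append(out, old[o.copyOff:o.copyOff+o.copyLen]...)
            }
        }
        return out
    }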
Yes, I simplified and I shouldn't have.
It does detect them, but it does have a minimum block-move size it can detect, due to the signature-matching method.
I think it's more like rsync + git. You have copies of previous versions, and just ask for their hash to figure out which previous version to diff against the current version, then send the diff.
The bandwidth reduction is due to use of a binary protocol, not Go. It just so happens the server code is written in Go and C.
From the article:
“Go is very light,” he said, “and it has fundamental support for concurrent programming. And it’s surprisingly stable for a young language. The experience has been extremely good—there have been no problems with deadlocks or pointer exceptions.” But the code hit a bit of a performance bottleneck under CloudFlare’s heavy loads, particularly because of its cryptographic modules—all of Railgun’s traffic is encrypted from end to end. “We swapped some things out into C just from a performance perspective," Graham-Cumming said.
“We want Go to be as fast as C for these things,” he explained, and in the long term he believes Go’s cryptographic modules will mature and get better. But in the meantime, “we swapped out Go’s native crypto for OpenSSL,” he said, using assembly language versions of the C libraries.
On another note, it's always nice to see such an influential part of the HN community giving quotes for sites like this - not only does it make me a little proud to be associated with any of you, it makes me more hopeful about my future chances of being able to call myself one of you.
The binary protocol means we don't add (much) overhead, the bandwidth reduction is because we are sending page diffs which themselves are encoded in a compact binary format.
Question for jgrahamc: how much more efficient is your binary delta algorithm than cperciva's bsdiff [1]?
I assume since you've got the preimages of compression, as well as control over the compression format, that the diff and patch operations are much more efficient in space and time than they would be with arbitrary binary data. But...by how much?
bsdiff is not a general purpose binary delta algorithm, it's targeted at executables. When you change a single line in the source code of a program and recompile it, bsdiff produces a small diff, even though a normal binary diff between the old and new executable would be huge due to how even a single extra instruction can cause many more addresses to shift. bsdiff wouldn't be particularly useful here.
I am particularly interested in this aspect of the discussion (explaining the process leading to deciding to develop a new tech in-house instead of re-using any existing approach). In an ideal world there would be plenty of experimentation with real-world data to justify things, but I don't read about that happening too often.
Initially, I wasn't actually planning to do deltas for the compression technique, and it was in testing with a whole bunch of common sites that I stumbled upon the fact that they don't change very much. That led me to wonder about the algorithms that might be used.
I did test quite a lot of stuff (and at one point thought I'd come up with a truly cool new algorithm only to realize that I was mistaken :-) to decide what to do.
Railgun has to trade off three things: compression efficiency, space and time. Because we are trying to do this for performance, time is the most important thing to optimize for, followed by efficiency, followed by space. bsdiff is very, very good at delta compressing binary things; Railgun isn't as good, but it's very, very fast.
Out of curiosity, can you say anything about the algorithm you are using?
A year or two ago I got quite interested in delta compression, read all the papers I could find on the topic, and eventually came up with an algorithm that seems pretty competitive, although I've mostly focussed on efficient compression rather than speed. Someday I'll get around to porting the code from Python to C and find out what the performance is really like.
Could be very cool. I couldn't get through the article because it read like a press release. Maybe if someone who hasn't been spoon-fed the story reports on it, I'll take notice.
I don't know why you're being downvoted, the article is written pretty shittily. The article is mostly just quotes from jgc and the CEO and some filler by the writer.
Also the assertion that "It has already cut the bandwidth used by 4Chan and Imgur by half" sounds disingenuous and possibly not backed up by moot's quote “We've seen a ~50% reduction in backend transfer for our HTML pages (transfer between our servers and CloudFlare's),”. Is backend transfer for HTML pages the only bandwidth they're using? Is the rest of it halved, and if so, how and why?
Presuming this is RFC 3229, this is transport compression, not webserver offload.
The response is generated by the origin webserver as normal. But rather than sending that response using the normal HTTP encoding, instead the proxy first does a binary diff against any versions that the (CloudFlare) client says it has and that the (CloudFlare) proxy also has in its cache. They use e.g. ETags or MD5 to uniquely identify the entire response content.
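A sketch of that identification step, assuming a plain MD5 content hash as the version key (the cache layout is invented for the example):

    package versions

    import (
        "crypto/md5"
        "encoding/hex"
    )

    // cache maps a content hash to the full response body it names, so
    // "compress against version X" is unambiguous on both ends.
    var cache = map[string][]byte{}

    // versionKey computes the identifier for a response body.
    func versionKey(body []byte) string {
        sum := md5.Sum(body)
        return hex.EncodeToString(sum[:])
    }

    // remember stores a body under its hash and returns the key.
    func remember(body []byte) string {
        key := versionKey(body)
        cache[key] = body
        return key
    }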
You can still do cookie stripping etc to try to avoid the request to the webserver altogether, but that's a separate concern.
There isn't a per-site cache in Railgun because it's part of our large shared in-memory cache in our infrastructure.
Currently, cookies are not part of the hash.
We have customers of all types using Railgun. As an example, there's a British luggage manufacturer who launched a US e-commerce site last month. They are using it to help alleviate the cross-Atlantic latency. At the same time they see high compression levels as the site boilerplate does not change from person to person viewing the site.
What sort of sites do you think it doesn't apply to?
> What sort of sites do you think it doesn't apply to?
Single page webapps. In those cases the html/js is normally static and already CDN'ed and the data is a JSON API which varies on a per user basis.
There would be some gain as the dictionary would learn the JSON keys but I doubt it would be very dramatic vs deflate compared to the content sites referenced in the article.
Yes. That's up to the particular configuration of the site. It varies from site to site, but for optimal results you want it big enough to keep the content of the common pages of your site.
I'm curious about the crypto part. Could anybody explain to me, if it's an HTTPS link, where the SSL encryption happens? Does the Railgun listener talk with the origin server over HTTP or HTTPS?
If it's HTTP, then how does the CDN handle certificates? Does it use the CDN's certificates?
If it's HTTPS, then 1) isn't the hash going to be a lot different even if the two versions are very alike? 2) why does Railgun encrypt the encrypted data again?
The link between CloudFlare and the customer network (i.e. between the two bits of Railgun) is TLS. We have an automated way of provisioning and distributing the certificates necessary for that part.
For the connection from Railgun to the origin server it will depend on the protocol of the actual request being handled. If HTTPS Railgun makes an HTTPS connection to the origin.
The change detection algorithm is clever. But this is a classic memory vs. processor trade-off. The real trick here is that the Railgun service instantly adds massive amounts of cache to your service; it just so happens that - if their claims aren't inflated - adding these additional resources to your service is transparent. This has nothing to do with Railgun being developed in Go.
Other than general traffic data compression, I've always been somewhat interested in html compression in particular.
I know lots of webservers zip their response data, but I was always curious about the things in html that show up very often and if there's a way to optimize around that.
For example, most web xml data contains a lot of common tags, like "div" and "span" and others that are specific to html. I think if you add them up, they might make up a considerable percent of traffic data. Is it possible for the web server to swap those out for a single character before it sends the data, and have the browser replace it when it arrives?
Zip will put the common tags (like "div") into the compression dictionary once, then emit a short code every time the tag appears (more or less - it might take less than a single byte if it's a really common tag). So there's a bit of wasted overhead in rebuilding that dictionary of common tags for every response.
It would be more efficient if browsers and compression algorithms could agree (beforehand) on a dictionary of common terms likely to appear in the document.
Of course, my answer on Stack Overflow is pretty crude. You could create a dictionary used to compress the compression dictionary. Google will probably do this soon (if they haven't already), since they control the client (Chrome), server (Google web server) and protocol (SPDY).
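Go's standard library can already express the preset-dictionary half of this idea; a small sketch (the dictionary contents below are just an illustration):

    package main

    import (
        "bytes"
        "compress/flate"
        "fmt"
        "io"
    )

    // A shared dictionary of tokens likely to appear in HTML. Both sides
    // must use exactly the same bytes.
    var htmlDict = []byte(`<div class="</div><span></span><a href="</a>`)

    func main() {
        page := []byte(`<div class="post"><span>hello</span><a href="/x">x</a></div>`)

        var compressed bytes.Buffer
        w, _ := flate.NewWriterDict(&compressed, flate.BestCompression, htmlDict)
        w.Write(page)
        w.Close()
        compressedSize := compressed.Len()

        // The reader has to be primed with the same dictionary.
        r := flate.NewReaderDict(&compressed, htmlDict)
        out, _ := io.ReadAll(r)
        r.Close()

        fmt.Printf("original %d bytes, compressed %d, roundtrip ok: %v\n",
            len(page), compressedSize, bytes.Equal(out, page))
    }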
This is the third time this week that I've read or heard about Communicating Sequential Processes (CSP), the formal programming language devised by Sir Tony Hoare.
Third time's a charm. Definitely going to have to investigate.
> Today, [cloud providers Amazon Web Services and Rackspace, and thirty of the world’s biggest Web hosting companies] announced that they will support Railgun...
I can't find any such announcements; anybody have links? Based on comments further down, I wonder if the author is confused.
> CloudFlare will provide software images for Amazon and RackSpace customers to install
That is very different from the claim in the first paragraph.
Amazon and Rackspace customers need to install the software themselves (for now). The other listed hosts have made it one-click simple without the customer having to install anything. A couple announcements from major hosts today:
Uh-oh. A binary protocol. The problem being, it's actually bringing financial advantages over HTTP.
HTTP has the advantage of being standard, simple, plain text and thus easy to work with.
Hopefully HTTP 2.0 will attempt to solve this... erm...
The article mentions how this compression technique is similar to image compression. Would anyone care to explain, in detail if necessary, how this is so? Thanks.
It's not magic. It's threads. Go multiplexes your goroutines onto N OS threads. There are also abstractions in C/C++ (though of course as libs, not part of the language, like in Go) which hide the usage of threads. But there is no magic. If your code is running in parallel, your code is using OS threads.
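A toy illustration of that multiplexing (the numbers are arbitrary):

    package main

    import (
        "fmt"
        "runtime"
        "sync"
    )

    func main() {
        runtime.GOMAXPROCS(4) // at most 4 OS threads run Go code at once

        var wg sync.WaitGroup
        for i := 0; i < 10000; i++ { // far more goroutines than threads
            wg.Add(1)
            go func(n int) {
                defer wg.Done()
                _ = n * n // stand-in for real work
            }(i)
        }
        wg.Wait()
        fmt.Println("done with GOMAXPROCS =", runtime.GOMAXPROCS(0))
    }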
I was just looking into what SDCH is (an accept-encoding option from Chrome) and it sounds very, very similar: It generates a dictionary and then uses VCDIFF between requests. Is this related somehow?
Vaguely. Both Railgun and SDCH work by compressing web pages against an external dictionary. In SDCH the dictionary must be generated (somehow), and it is intended for use between a web server and a browser. Railgun is a back end for our network and automatically generates dictionaries.
Is anyone aware of a performance analysis between SDCH and one of the dynamic compressions like deflate?
I google, but all I find is people complaining their proxy/filter/appliance/diagnostic is breaking because it doesn't understand SDCH.
It seems like SDCH has been around for 4 years, I presume the lack of data means it hasn't worked out.
(I imagine that you could drastically reduce the CPU load of compression by making simple hard coded state machines for each dictionary. For content like XML or json you could easily make your field names and surrounding punctuation minimal. For many very short messages sharing a dictionary that would beat deflate on compression ratio, and for long messages of non-repeating field values it wouldn't be much worse. CPU use of expansion is probably comparable, though you might get better memory access behavior out of SDCH.)
What you describe is exactly what I've been looking for. There are remarkably few resources on this.
We have users in Singapore who access various XML-heavy web services in our NY office. A dictionary-style over-multiple-requests compression technique would be brilliant for their case.
Take a look at the various WAN accelerator appliances (Cisco, Silverpeak, Riverbed). They do almost exactly what it sounds like you want (if I'm remembering back to my evaluations, Cisco at least uses a multi-request dictionary for their compression)
Riverbed looks ideal but aren't they incredibly expensive? We looked at it years ago and I believe the necessary endpoints in our data center and in Singapore pushed past $140,000.
Riverbed (and all of the players in this space, really) are quite expensive, but this is where you get into the whole "Total ROI" argument for justifying the purchase.
Most companies depreciate hardware over 3 years. How much WAN/Internet bandwidth will you NOT use over the next 3 years, and how does that translate into upgrades you won't need to make?
There are also arguments for these boxes along the lines of "right now we use really expensive WAN links, but these boxes do end-to-end encryption too, so we can put the traffic on the Internet instead" but that opens up a few obvious cans of worms (and can of course be done without an accelerator with VPNs and whatnot).
Then you get into the more nebulous arguments that big bosses tend to like, such as "The average user makes Y XML requests per day to process X widgets. Each request takes Q seconds now. If we lower that to Q*0.5 with WAN acceleration, each user can now process N more widgets per day". Fluffy argument, but can have a big impact on business decision makers, especially if you can tie it to a dollar amount.
Note that WAN Accelerator salespeople are really, really good at coming up with arguments like this for/with you during the sales process.
Railgun is used between the CDN and the HTTP server, while this one seems to be between the browser and the HTTP server.
Railgun only requires the website to deploy a client and cooperate with CloudFlare. The user's client doesn't have to be Chrome or whatever, and the web server doesn't need to be aware of Railgun. It's transparent to both HTTP clients and HTTP servers. SDCH, however, requires a modification to the HTTP/1.1 protocol, which implies changes in both HTTP clients and HTTP servers.
Both are quite promising, though Railgun is easier to adopt.