GWT generates very performant code, even in high-CPU scenarios like hashing/encryption (and please don't spam the Matasano article about the dangers of encryption in JS - it is not relevant here).
I implement high-performance software systems in C++ as my day job. The software has to compile and run on Linux, Solaris, and AIX. The same code is 2x slower on AIX (Power) and 3x-5x slower on Solaris (SPARC) than on Linux (x86). So say whatever you want about theoretical differences in architectures, but in the real world SPARC and Power systems are absolutely not competitive, on either price (absolute $$, and per CPU) or per-CPU performance (they do usually have more CPU cores).
That's just a circular way of saying "x86 is more popular, therefore better," which doesn't address the point made above: that x86 is inferior in terms of its design.
Of course x86 is going to be faster per dollar spent. One is mass market (x86-64) and the other two are hugely niche (Sparc and Power). Plus the Linux kernel has by far the most human-hours spent on its development relative to every other operating system in the world.
There's also a reason why some of x86's market share has been eaten up by ARM. Moving from x86 to ARM was hugely expensive by all measures, but it was worthwhile because x86 was so wasteful.
It's not just "x86 is more popular, therefore better." It's that the performance of x86 was better than SPARC or Power. Regardless of the cost of the chip, performance is what is really important here. In some instances, performance per watt is more important, but either way... it's performance that's key, not market forces driving cost savings.
I haven't had much experience with SPARC, but I've done some work on Power systems (long ago). Back then (10-ish years ago), Power chips were more powerful than their x86 contemporaries. But at some point, that relationship switched.
However, I wonder how much of this is the chip, and how much is the tooling. It's been a while since I've needed to think about C/C++ compilation, but from what I remember, the Intel compiler produced (slightly) faster binaries than gcc. Now this is where popularity could prove decisive: if the compiler the OP uses targets x86, SPARC, and Power, how well do you suspect it has been optimized for each of those architectures? Even if a non-x86 chip is capable of outrunning x86, if the toolchain isn't similarly optimized, it could end up with worse measured performance.
One would think there'd be proper competition, because one of the major motivators is going paperless... it's kind of odd that in 2013 there still aren't many easy-to-use solutions for storing sensitive documents (bills, tax documents, etc.) that require a high level of privacy and security.
I do too, but it doesn't fit the use case of going paperless. The ability to drop PDFs and OCR'd images into Evernote, and to handle large data sets, is essential.
That's why client-side encryption is useful - even with the company (Dropbox) not leaking/selling their users' data on purpose, it is easy to inadvertently leak it.
Proper client-side encryption, while often not appropriate for critical environments, is useful protection against this type of situation.
The problem with GWT is that it appears to be a low-priority product at Google - it took them more than a year to release v2.5, vs. 3-4 months for 2.4 and earlier. And we all know what happens to low-priority products at Google...
GWT has been open-sourced and a steering committee appointed to guide future development, so it is no longer really a Google project but one owned by the community.
Google has moved their resources on to Dart, which is a validation of the approach started with GWT, i.e. a higher level language that is compiled to JavaScript. It will be interesting to see where that goes, but it is difficult to imagine Dart as a language reaching the kind of maturity that Java has (libraries, community, ecosystem) for the foreseeable future.
>> Throughput here is measured in messages, each containing 100 events, so master is processing 200,000–215,000 events/sec.
So in reality it is ~2k messages/sec. That is rather poor throughput: even off-the-shelf generic web servers (e.g. nginx) offer throughput an order of magnitude higher, and proprietary systems can reach 500k messages/sec over the network.
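For anyone checking, the arithmetic behind that ~2k figure is just the quoted event rate divided by the batch size mentioned in the post:

```python
# Back-of-the-envelope conversion from events/sec to messages/sec,
# using the numbers quoted in the post.
events_per_sec = 200_000   # lower bound of the 200k-215k range
events_per_msg = 100       # batch size used in the benchmark

msgs_per_sec = events_per_sec / events_per_msg
print(msgs_per_sec)  # 2000.0
```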
Ah, I didn't intend for this post to reach Hacker News absent context. If you haven't been tracking Riemann, this post might not make sense. ;-)
Riemann is not an HTTP server, or anything analogous. It's an event processor, and reacts to incoming events by running them through an arbitrary set of functions. Events are the logical "requests" against the system, if you're thinking in HTTP terms. Messages are just a bundle of events for synchronous transport, and events can be repackaged in varying bundles of messages depending on latency/throughput requirements. The clients can do this for you.
For instance, the code which generated this benchmark looks like:
which is a synchronous call, returning when the event is acknowledged by the server. It's making that call 200,000 times a second (in various threads). The clients are doing all sorts of internal buffering and pipelining to make that possible--this particular test uses a batch size of 100 events/msg.
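The client-side buffering described above can be sketched roughly like this. This is a toy illustration with hypothetical names, not the actual Riemann client API: callers submit single events, and a background thread packs them into messages of up to 100 events each.

```python
import queue
import threading

class BatchingClient:
    """Toy sketch of client-side batching (hypothetical API, not the
    real Riemann client): single events go into a queue, and a
    background thread drains them into messages of up to BATCH_SIZE
    events before handing each message to the transport."""

    BATCH_SIZE = 100

    def __init__(self, send_message):
        # send_message is a callable taking a list of events (one wire message)
        self.send_message = send_message
        self.q = queue.Queue()
        threading.Thread(target=self._drain, daemon=True).start()

    def send_event(self, event):
        self.q.put(event)

    def _drain(self):
        while True:
            batch = [self.q.get()]             # block until an event arrives
            while len(batch) < self.BATCH_SIZE:
                try:
                    batch.append(self.q.get_nowait())
                except queue.Empty:
                    break                      # flush a partial batch
            self.send_message(batch)
```

Each `send_message` call corresponds to one message on the wire; a real client would also wait for the server's acknowledgement to provide the synchronous semantics described above.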
Well, I'm sure there are specific use cases where Riemann would be preferable to a generic web server. But for most developers in most situations, it is a no-brainer to choose an HTTP-based protocol with an off-the-shelf HTTPD server over a 10x slower proprietary system.
Er, I don't mean to be contrarian, but Riemann is anything but proprietary. I'm an unemployed OSS developer. Every bit of code from clients to dashboard to server to integration tools to the website is open source: http://aphyr.github.com/riemann/. I'm not trying to sell anything. I'm just building a tool to solve a problem I faced in developing distributed systems.
Second... I guess I can reiterate. Riemann is not an HTTP server. It's an event-stream driven monitoring system. The protocol uses existing standards (e.g. protobufs), is simple to implement, and the community has written clients for many languages: http://riemann.io/clients.html.
As an aside, I do plan on adding an HTTP interface to Riemann, but HTTP processing (and using JSON for serialization) comes with certain unavoidable costs in bandwidth, memory and latency. It'll fill a complementary space to the existing TCP and UDP interfaces.
Sorry, by "proprietary" I meant "custom". I admire people who have skills and dedication to built OSS, this was just a wrong word to use.
I completely agree that for specific uses Riemann is great. Your post, though, was about performance/throughput, and so my comment was about performance/throughput. Streaming messages/events over the network is an old problem with well-known limitations, and that was what my comment was about.
Did you even read what aphyr wrote before posting this? It detailed that unlike HTTP, which in general supplies one request per message (GET, POST, etc), a Riemann message contains potentially hundreds of different requests that must be processed individually.
You can put as many "events" in the body of an HTTP POST request as you wish. What really matters in distributed messaging systems, from the performance point of view, is the number of distinct messages per second.
And if a system designer wants to send a stream of "events" to another system to be acted upon, and if this designer cares about throughput (which is assumed here, given the title of the post), then this designer is likely to choose a faster and more flexible messaging protocol (e.g. HTTP), with its ubiquity and universal support, over a custom protocol.
They're completely different kinds of software, at least as I understand them. Riemann is about pushing lots of little hashmaps (events) through a DAG of streaming functions; HTTP is about synchronous gets/puts/updates/deletes to a tree of resources.
If you are trying to make sense of Riemann in HTTP terms, sending an event to Riemann might look like POST /streams, with a body containing a single JSON object. There's no notion of GET, PUT, or DELETE though--the state inside Riemann streams has no name or external representation.
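As a concrete sketch of that hypothetical POST /streams body (the endpoint is illustrative, not an existing API), a single event serialized as JSON might carry Riemann's standard event fields:

```python
import json
import time

# One event expressed as the kind of JSON object a hypothetical
# POST /streams endpoint might accept. The field names (host, service,
# metric, state, time, ttl, tags) follow Riemann's event schema.
event = {
    "host": "web-1",
    "service": "api latency",
    "metric": 0.213,
    "state": "ok",
    "time": int(time.time()),
    "ttl": 60,
    "tags": ["http", "benchmark"],
}

body = json.dumps(event)
print(body)
```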
There are other components in Riemann which can be expressed as HTTP resources--the index, which is used for tracking the most recent event for a given host and service, and the pubsub system for example. Those have HTTP APIs for making a query (GET /index?q=service = "www" and state = "critical"), and a websocket variant which streams down updates for that query to you.
But as far as a general replacement, I'd say no, it doesn't make any sense. This is more akin to... a slow, insanely flexible, less complete version of Esper than an HTTP server.
Don't let the after-hours haters -- sorry, hackers -- get you down. Riemann looks to be a fantastic piece of open-source software for scratching a particular itch.
Coincidentally, I was just working on a tracing system to dump data to Riemann while eating HTTP logs and/or handling live requests from browsers. It seems to be just what we need to aggregate, monitor and graph our trace data. Thanks!
Thank you. :) If you have any questions, feel free to hop on Freenode #riemann and I'll do my best to help out.
BTW, you're not the first to wonder about streaming events directly to Riemann from client browsers. I... don't recommend it, just because I don't have the time to appropriately guarantee Riemann's performance and security characteristics as an internet-facing service (yet), but adding an HTTP POST path to (ws-server) is definitely on my list. Even if the HTTP+JSON interface is much slower than the TCP/UDP interfaces, I think it'll be plenty useful for many deployments, especially those making requests from JS.
Many start-ups are built by well-meaning people who have no formal CS or even engineering background and thus are somewhat out of touch with what it means to build a robust system. It's natural for people to focus on "what's important" and ignore boundary/edge conditions, while in reality 90% of sound engineering is getting boundary/edge cases right.
And since most such start-ups use Ruby/Rails because of how easy it is to "get it up and running", they inject the Ruby/Rails ecosystem with this "focus on what's important" mindset, and important boundary issues, including security, get neglected.
Mega is not the first one. AES.io (my company) and several others have been available for some time. Mega is the first one to bring client-side JS crypto into public discussion.