The wired version of this device was in human trials ~10 years ago through Cyberkinetics, a startup spun out of Brown to commercialize the technology. See the NYT article from 2004 about the start of their clinical trials: http://www.nytimes.com/2004/04/13/health/13BRAI.html
This is not new technology. A company called Cyberkinetics commercialized this 12 years ago and made it work in humans. See their coverage in Wired from 2003.
Definitely an impressive benchmark by any standard. However, there are some things to be aware of:
1) They used InfiniBand interconnects. Running on Ethernet is likely to yield less impressive results.
2) Their benchmark does simple primary key lookups. If you start doing joins or transactions that need to hit multiple data nodes, things will slow down. Depending on your workload, this may or may not be an issue.
3) NDB is an in-memory storage engine, so you're limited to the aggregate RAM in your cluster for max storage size.
4) AFAIK, MySQL Cluster doesn't re-balance; you need to determine up front how data is partitioned, and changing that at runtime is hard. I don't know if this has changed in later releases.
For point #3, NDB has supported on-disk non-indexed attributes for a while now (2-3 years?). So you only need to fit the indexes in memory, which is a much smaller dataset, though still a limit.
I'm sure for the benchmark it was all in memory though.
Also, to add to #3: the NDB C++ API doesn't use raw SQL queries either (it uses a lower level of abstraction for accessing the database), so it avoids the overhead of parsing SQL. Most production systems and third-party libraries use SQL queries.
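To illustrate, a primary-key read through the NDB API looks roughly like this (written from memory, so treat the exact calls as approximate; the table t1 and columns id/val are made up, and error checks are omitted):

    #include <NdbApi.hpp>

    // Hypothetical table: t1(id INT PRIMARY KEY, val INT)
    int read_row(Ndb &ndb, int key)
    {
        NdbTransaction *tx = ndb.startTransaction();
        NdbOperation *op = tx->getNdbOperation("t1");
        op->readTuple(NdbOperation::LM_Read);           // simple PK lookup
        op->equal("id", key);                           // bind the primary key
        NdbRecAttr *val = op->getValue("val", nullptr); // where the result lands
        tx->execute(NdbTransaction::Commit);            // no SQL string is ever parsed
        int result = val->int32_value();
        ndb.closeTransaction(tx);
        return result;
    }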
OK, you hooked me with the title. But "FreeBSD + Erlang" was kind of an unsatisfying explanation of how you achieved it. Would love to hear more details! How far we've come since http://www.kegel.com/c10k.html
How does kqueue compare to epoll on Linux? I've written C code using kqueue on OpenBSD and OS X, but have only used epoll via libev (and not at especially high load). I thought the big change came from trading level- for edge-triggered nonblocking IO, but maybe the kqueue implementation is superior for sockets somehow?
The main advantage Erlang has over C/Python/Ruby/etc. is that asynchronous IO is the default throughout all its libraries, and it has a novel technique for handling errors. Its asynchronous design is ultimately about fault tolerance, not raw speed. Also, it can automatically and intelligently handle a lot of asynchronous control flow that node.js makes you manage by hand (which is so 70s!).
You can build event-driven asynchronous systems pretty smoothly in languages with first-class coroutines/continuations (like Lua and Scheme), but most libraries aren't written with that use case in mind. Erlang's pervasive immutability also makes actual parallelism easier.
With that many connections, another big issue is space usage -- keeping buffers, object overhead, etc. low per connection. Some languages fare far, far better than others here.
Yes, I would say kqueue, the interface, is superior to epoll. Kqueue lets you batch modifications to watcher state and retrieve pending events in a single system call; with epoll, you have to make a separate system call for every modification. Kqueue can also watch for things like filesystem changes and process state changes, while epoll is limited to socket/pipe I/O. It's a shame that Linux doesn't support kqueue.
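To make that concrete, here's a rough sketch (illustrative only: fd1/fd2 are assumed to be already-created sockets, error handling is omitted, and the two halves obviously target different kernels):

    // kqueue (BSD/OS X): both registrations and the wait collapse
    // into a single kevent() call
    #include <sys/types.h>
    #include <sys/event.h>
    #include <sys/time.h>

    int wait_kqueue(int kq, int fd1, int fd2)
    {
        struct kevent changes[2], out[64];
        EV_SET(&changes[0], fd1, EVFILT_READ,  EV_ADD, 0, 0, nullptr);
        EV_SET(&changes[1], fd2, EVFILT_WRITE, EV_ADD, 0, 0, nullptr);
        return kevent(kq, changes, 2, out, 64, nullptr);  // one syscall total
    }

    // epoll (Linux): one epoll_ctl() per modification, plus the wait
    #include <sys/epoll.h>

    int wait_epoll(int epfd, int fd1, int fd2)
    {
        struct epoll_event ev, out[64];
        ev.events = EPOLLIN;  ev.data.fd = fd1;
        epoll_ctl(epfd, EPOLL_CTL_ADD, fd1, &ev);  // syscall 1
        ev.events = EPOLLOUT; ev.data.fd = fd2;
        epoll_ctl(epfd, EPOLL_CTL_ADD, fd2, &ev);  // syscall 2
        return epoll_wait(epfd, out, 64, -1);      // syscall 3
    }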
I fully agree that kqueue is awesome, but what specifically is broken on OS X? I've used it extensively on that platform and haven't run into any showstoppers.
Yeah, the lack of support for TTYs can be annoying when writing a terminal application (a workaround for some cases is to use pipes), but it hardly qualifies as a significant problem for writing network applications.
Here's a more interesting bug in OSX: kqueue will sometimes return the wrong number for the listen backlog for a socket under high load.
How does it work? Do users provide a buffer and the kernel fills the buffer with data and notifies the user when ready?
That is more akin to Linux AIO, then? Otherwise, epoll/poll/select just notifies you when data is available, and the actual copy is done by the user. Surprisingly, this can make a huge difference when streaming large amounts of data.
We have argued here before, and I have gotten downvoted into oblivion for pedantically distinguishing between asynchronous IO and non-blocking IO, but it looks like that extra user-space memcpy can make a huge difference.
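A sketch of the distinction, since the two models are easy to conflate (POSIX AIO standing in on the completion side; names are illustrative and error handling is omitted):

    #include <aio.h>
    #include <unistd.h>
    #include <cstring>

    // Non-blocking ("readiness") model: epoll/kqueue tells you the fd is
    // readable; you then issue the read yourself, and the copy into buf
    // happens on your thread.
    ssize_t readiness_read(int fd, char *buf, size_t len)
    {
        return read(fd, buf, len);
    }

    // Asynchronous ("completion") model: you hand the kernel a buffer up
    // front; by the time you're notified, the kernel has already filled buf.
    ssize_t completion_read(int fd, char *buf, size_t len)
    {
        struct aiocb cb;
        std::memset(&cb, 0, sizeof cb);
        cb.aio_fildes = fd;
        cb.aio_buf    = buf;
        cb.aio_nbytes = len;
        aio_read(&cb);                      // returns immediately
        const struct aiocb *list[1] = { &cb };
        aio_suspend(list, 1, nullptr);      // real code would use a notification instead
        return aio_return(&cb);
    }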
I can't find anything about this now; I just spent a good 20 minutes searching for it. I guess keywords like "kqueue", "buffer", "request", "http" are too generic in some sense. :-/
Anyway, the idea was to avoid context switches by waiting/parsing on the kernel side until there was enough data for the client to do something other than make yet another gimme_more_data() call back to the kernel.
It could even apply to mechanisms other than kqueue, so perhaps I'm misremembering that it was kqueue-specific.
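For what it's worth, that description sounds a lot like FreeBSD's accept filters (accf_http and friends), where the kernel holds a new connection until a full HTTP request has been buffered, so accept() never wakes the process for a socket that would only ask for more data. That's a guess at what you're remembering, not a confirmation; a minimal sketch:

    #include <sys/types.h>
    #include <sys/socket.h>
    #include <cstring>

    // FreeBSD only; requires the accf_http kernel module to be loaded.
    void enable_httpready(int listen_fd)
    {
        struct accept_filter_arg afa;
        std::memset(&afa, 0, sizeof afa);
        std::strncpy(afa.af_name, "httpready", sizeof afa.af_name);
        // accept() now only returns connections with a complete HTTP
        // request already buffered in the kernel
        setsockopt(listen_fd, SOL_SOCKET, SO_ACCEPTFILTER, &afa, sizeof afa);
    }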
There might well be errors here, but what those errors might be is not stated.
From the cited article on ports of the libev event loop library: "The whole thing is a bug if you ask me - basically any system interface you touch is broken, whether it is locales, poll, kqueue or even the OpenGL drivers." with no particular details on what is broken in Mac OS X.
Issues with porting to AIX, Solaris, and Windows are also discussed in that article, again with reports of errors but no specific details for those platforms.
Without error details, there's also no way to tell whether alternatives, workarounds, or fixes exist, or whether bug reports and reproducers were filed that would let the vendors address the (unspecified) errors.
It's useful to differentiate between "SQL databases do not scale" and "SQL databases do not _cost-effectively_ scale". The second claim is more accurate.
Vertical scaling of a DB is definitely an option for many people and has been used to scale many applications. However, the cost curve for buying bigger and bigger hardware is super-linear: doubling the CPU and memory in a single system more than doubles the hardware cost. This is a problem for businesses whose database costs grow faster than their revenue.
Sharding is also an option for scaling, leveraged to great success by Facebook, Yahoo, and many others. However, as the article points out, sharding prevents the developer from using many of the features that make a relational database a productive development environment. Lots of footguns emerge in a sharded SQL environment, and if you haven't set up your development constraints appropriately, you can slow the pace of development considerably. This again becomes a cost problem, because the incremental cost of adding features grows as you add more machinery like sharding around your database.
SQL is not useless and not hopeless; in a large number of cases, SQL is the right solution. However, the techniques used to scale SQL tend to be options only for organisations with very large budgets. NoSQL solutions tend to be more cost-effective in their scaling approach (scale out vs. scale up) without crippling developer productivity. For these reasons, NoSQL solutions tend to be the better choice for the cost-conscious.