Tornado is using MongoDB while Rails is using MySQL. For a test that's essentially about responding to a query with a database access, I can't figure out what I'm supposed to take from this.
* Access a database table or collection named "World" that is known to contain 10,000 rows/entries.
* Query for a single row from the table or collection using a randomly generated id (the ids range from 1 to 10,000).
* Set the response Content-Type to application/json.
* Serialize the row to JSON and send the resulting string as the response.
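Concretely, the four steps above amount to something like this (a minimal sketch using Tornado with an in-memory SQLite stand-in for the real MySQL/Mongo backends; the randomNumber column name and the port are assumptions, not taken from the actual test implementations):

```python
import json
import random
import sqlite3

import tornado.ioloop
import tornado.web

class DbHandler(tornado.web.RequestHandler):
    def get(self):
        # Step 2: pick a random id in [1, 10000] and do a primary-key lookup.
        row_id = random.randint(1, 10000)
        cur = self.application.db.execute(
            "SELECT id, randomNumber FROM World WHERE id = ?", (row_id,))
        found_id, random_number = cur.fetchone()
        # Steps 3 and 4: set the Content-Type, serialize the row, respond.
        self.set_header("Content-Type", "application/json")
        self.write(json.dumps({"id": found_id, "randomNumber": random_number}))

def make_app():
    app = tornado.web.Application([(r"/db", DbHandler)])
    # Step 1: a "World" table with 10,000 rows (in-memory SQLite here).
    app.db = sqlite3.connect(":memory:")
    app.db.execute(
        "CREATE TABLE World (id INTEGER PRIMARY KEY, randomNumber INTEGER)")
    app.db.executemany(
        "INSERT INTO World VALUES (?, ?)",
        [(i, random.randint(1, 10000)) for i in range(1, 10001)])
    return app

if __name__ == "__main__":
    make_app().listen(8888)
    tornado.ioloop.IOLoop.current().start()
```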
10,000 is a couple of orders of magnitude too small to be interesting for a primary key lookup. And encoding two integers as JSON is not exactly a test of JSON encoder performance either.
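To put the second complaint in numbers: serializing a two-field object is microsecond-scale work, so almost all of the measured time has to be routing, header, and socket overhead rather than the encoder itself. A quick, machine-dependent sketch:

```python
import json
import timeit

msg = {"id": 4174, "randomNumber": 331}  # shape of the benchmark's payload

# Total seconds for 100,000 serializations; on typical hardware this works
# out to a few microseconds each, dwarfed by per-request framework overhead.
print(timeit.timeit(lambda: json.dumps(msg), number=100_000))
```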
I'm not saying that this kind of multi-way-shootout benchmark is a bad idea; I'm just saying that the current rounds of results are unlikely to have much predictive power ...
Obviously I am biased about the benchmarks project, but I disagree concerning the predictive power of the rounds to-date.
Thus far, as we have added more tests, the rank order has remained largely consistent from round to round. The most extensive test--Fortunes--exercises request routing, database connectivity and pooling, the ORM (if available), entity object instantiation, dynamic-sized collections, sorting, server-side templates, and XSS counter-measures. On the whole, where a framework has received a Fortunes implementation, we see roughly the same rank order as in the other test types.
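To give a feel for what Fortunes exercises, here is a rough sketch of the handler's shape (Tornado with an inline template and a hypothetical SQLite-backed Fortune table; the real implementations live in the benchmark repository and differ per framework):

```python
import sqlite3

import tornado.template
import tornado.web

# {{ ... }} output is HTML-escaped by Tornado by default: the XSS counter-measure.
PAGE = tornado.template.Template(
    "<table>{% for f in fortunes %}"
    "<tr><td>{{ f['id'] }}</td><td>{{ f['message'] }}</td></tr>"
    "{% end %}</table>")

class FortunesHandler(tornado.web.RequestHandler):
    def get(self):
        # Fetch every row and instantiate an entity object per row.
        rows = self.application.db.execute("SELECT id, message FROM Fortune")
        fortunes = [{"id": r[0], "message": r[1]} for r in rows]
        # Add one entry at request time, forcing a dynamic-sized collection.
        fortunes.append({"id": 0,
                         "message": "Additional fortune added at request time."})
        # Server-side sort by message text.
        fortunes.sort(key=lambda f: f["message"])
        self.write(PAGE.generate(fortunes=fortunes))

def make_app():
    app = tornado.web.Application([(r"/fortunes", FortunesHandler)])
    app.db = sqlite3.connect(":memory:")
    app.db.execute("CREATE TABLE Fortune (id INTEGER PRIMARY KEY, message TEXT)")
    app.db.executemany("INSERT INTO Fortune VALUES (?, ?)",
                       [(1, "A classic."), (2, "<script>alert(1)</script>")])
    return app
```

The second seeded row is there to show the escaping at work: it comes out as inert text, not as a script tag.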
To clarify some points:
* The World table is intentionally small enough to fit easily into the database server's in-memory cache. This is an exercise in measuring the frameworks' and platforms' database drivers, connection pooling, and ORM performance (see the sketch after this list), not the performance of the database server. As an unintended side-effect--largely thanks to the contributions of readers--the scope of the project has broadened to include some Mongo and Postgres tests, so it is to a very small degree a rough comparison of the request-processing capacity of three popular database platforms. But it is expressly not a benchmark of database servers.
* The response payload is intentionally tiny because these tests are designed to exercise framework fundamentals such as request routing and header processing. Increasing the payload size directly increases the number of frameworks that saturate gigabit Ethernet (see the back-of-envelope arithmetic after this list). As it is, even with a trivial payload, high-performance frameworks saturate gigabit Ethernet on the JSON-encoding and plaintext tests.
* A larger-payload JSON-encoding test type is planned [1], but I would caution that it is unlikely to shuffle the rank order seen in tests to-date in any notable fashion.
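To make the first point above concrete, the layer actually under test looks roughly like this (a sketch using SQLAlchemy; the connection string, pool settings, and query_world helper are illustrative, not taken from any benchmark implementation):

```python
from sqlalchemy import Column, Integer, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class World(Base):
    """ORM entity mapped to the 10,000-row World table."""
    __tablename__ = "World"
    id = Column(Integer, primary_key=True)
    randomNumber = Column(Integer)

# The driver and pool settings, not the database itself, are a large part of
# what the benchmark ends up exercising (connection string is hypothetical).
engine = create_engine(
    "mysql+pymysql://benchmark:secret@localhost/hello_world",
    pool_size=32, max_overflow=0)

def query_world(world_id: int) -> dict:
    # A primary-key lookup through the ORM: driver + pool checkout + entity
    # instantiation, with the row itself served from the DB's cache.
    with Session(engine) as session:
        w = session.get(World, world_id)
        return {"id": w.id, "randomNumber": w.randomNumber}
```

Every framework in the benchmark runs some equivalent of query_world on each request, so once the database is serving from cache, the driver and pool behavior dominate the measurement.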
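And for the payload point, the back-of-envelope arithmetic behind the saturation claim (the body and header sizes are assumed round numbers; real TCP/IP framing overhead lowers these ceilings further):

```python
# Bytes per second available on gigabit Ethernet, ignoring TCP/IP overhead.
GIGABIT_BYTES_PER_SEC = 1_000_000_000 / 8

def max_responses_per_sec(body_bytes: int, header_bytes: int = 128) -> float:
    """Upper bound on responses/sec that the wire itself allows."""
    return GIGABIT_BYTES_PER_SEC / (body_bytes + header_bytes)

print(f"{max_responses_per_sec(32):,.0f}")    # tiny JSON body: ~781,250/sec
print(f"{max_responses_per_sec(4096):,.0f}")  # 4 KB body:      ~29,593/sec
```

In other words, a 4 KB body caps even a perfect framework at roughly 30k responses/sec on a single gigabit link, while a tiny body leaves enough headroom to actually differentiate the frameworks.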
> But it is expressly not a benchmark of database servers.
Oh, okay, cool. I'd misinterpreted it as a kind of "full stack" test. As it happens, I didn't look at any of the "fortunes" examples either.
I can see where you're coming from, but I'd still be concerned that the concurrency and connection pooling behaviour could be quite different depending on the DB query behaviour.
What I should have said is "... predictive of how well your website will actually work under load". But nothing much is predictive of that except for trying it with synthetic data ...
You're right that it's impossible to fully disentangle the performance of the database server from the frameworks' ORMs and their platforms' drivers and connection pools. We indicate which database server is being used in each test permutation, and one can make very rough observations from the data--such as "MySQL appears very slightly faster in this type of use-case than Postgres"--but as I said, that's not really the purpose of the project. If one wants to compare database servers, there are many better resources for that insight.
You're also right that nothing can predict how your application will perform under load until you build it and test it.
By testing the fundamentals of web application frameworks, however, we hope to inform a preliminary selection process (alongside subjective factors such as your comfort level with the code style and the community), giving you a rough idea of capacity before you build out the full application. In particular, I feel the massive spread of the performance numbers--covering many orders of magnitude--is illuminating to newcomers and valuable to seasoned pros.
[1] http://www.techempower.com/benchmarks/
I'm building a site in Rails (it will never need to be high-throughput, and if it does, I'll be rich as fuck), so this is interesting info to have.