I was wondering what kind of database an application like stock charts, or anything else that displays real-time stats, uses. Does it matter if it's NoSQL for real-time statistics and SQL for offline statistics and analytics?
I did some work on these types of applications back before NoSQL was a thing. We used SQL as a backing store for data analytics, but never for the real-time display.
For real-time display and queries we used a combination of queues and a cache. Incoming data was written to a FIFO queue and to the cache simultaneously. Clients were served from the cache, while the queue guaranteed ordering as we wrote the data into the database and handled transactions. Building this type of system reliably is a ton of work. In our case the queues were also persistent, so they could survive crashes, and we used an object database called Versant as the queue's backing store.
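To make that flow concrete, here is a minimal Python sketch of the write path. It is not the system we built: `QuoteFeed`, `publish`, `latest` and `drain` are names made up for illustration, SQLite stands in for the SQL store, and the queue here lives in memory only, so unlike our persistent queues it would not survive a crash.

```python
import queue
import sqlite3
import threading


class QuoteFeed:
    """Hypothetical sketch: cache for reads, FIFO queue for ordered DB writes."""

    def __init__(self, db_path=":memory:"):
        self.cache = {}                # latest price per symbol, served to clients
        self.pending = queue.Queue()   # FIFO; in-memory only (assumption, not persistent)
        self.lock = threading.Lock()
        self.seq = 0                   # arrival order of ticks
        self.db = sqlite3.connect(db_path, check_same_thread=False)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS ticks "
            "(seq INTEGER PRIMARY KEY, symbol TEXT, price REAL)"
        )

    def publish(self, symbol, price):
        """Write an incoming tick to the cache and the queue together."""
        with self.lock:
            self.seq += 1
            self.cache[symbol] = price
            self.pending.put((self.seq, symbol, price))

    def latest(self, symbol):
        """Real-time reads come from the cache and never touch the database."""
        with self.lock:
            return self.cache.get(symbol)

    def drain(self):
        """Write queued ticks to the database in arrival order, in one transaction."""
        written = 0
        with self.db:  # commits on success, rolls back on error
            while True:
                try:
                    tick = self.pending.get_nowait()
                except queue.Empty:
                    break
                self.db.execute("INSERT INTO ticks VALUES (?, ?, ?)", tick)
                written += 1
        return written


feed = QuoteFeed()
feed.publish("ACME", 10.0)
feed.publish("ACME", 10.5)
print(feed.latest("ACME"))  # served from the cache
print(feed.drain())         # number of ticks written to the database
```

In a real deployment `drain` would run continuously in its own worker, and the queue would need durable storage behind it. That durability is where most of the work goes.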
Today, given all the new choices, it would really depend on the real performance requirements, not the perceived ones. 95% of the time, something like Elasticsearch or Cassandra would probably work just fine. But if the performance is truly demanding, as stock quotes can be, where literally every millisecond counts and order is critical, then I'd probably go back to a queue/cache and maybe back it with Elasticsearch for ad-hoc queries and SQL for long-term data storage and reporting. There are so many good options now that it really boils down to the use case and requirements.
Elastic rocks! I've been using it for a new project and am really digging it.
You guys should build some better profiling tools into Logstash. I'd love to be able to see how much time each filter rule takes, so I know where to spend my time optimizing. Since it all runs in a Java virtual machine, it's tough to see things like how many threads are working hard vs. idle and how much memory each is using.
A great product and still evolving! :) It is crazy how powerful it becomes with Kibana, but you still have the option for DIY graphics and integrations!