Ask HN: Databases for real time stats and analytics?

davismwfl · on March 8, 2019

I have done some work on these types of applications back before NoSQL was the thing. Essentially we did use SQL as a backing store for data analytics but never for the real-time display.

For real-time display and queries we used a combination of queues and a cache. Incoming data was written to a FIFO queue and the cache simultaneously and then served to clients from the cache, while the queue guaranteed ordering as we wrote the data into the database and handled transactions. Building this type of system reliably is a ton of work. In our case the queues were also persistent so they could survive crashes, and back then we used an object database called Versant to back them.
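A minimal Python sketch of the queue-plus-cache pattern described above. It makes several assumptions. An in-memory `queue.Queue` and a dict stand in for the persistent queue and cache, so unlike the Versant-backed queues mentioned, nothing here survives a crash. SQLite stands in for the SQL database. The class and method names are hypothetical.

```python
import queue
import sqlite3
import threading


class QueueCacheStore:
    """Hypothetical sketch: each write goes to a FIFO queue and a cache at once.
    Clients read from the cache; one writer thread drains the queue into SQL
    in arrival order. Not persistent: a crash loses whatever is still queued."""

    def __init__(self, db_path=":memory:"):
        self.cache = {}                    # latest value per key, served to clients
        self.cache_lock = threading.Lock()
        self.fifo = queue.Queue()          # preserves arrival order for the DB writer
        # check_same_thread=False because the writer thread uses this connection
        self.db = sqlite3.connect(db_path, check_same_thread=False)
        self.db.execute(
            "CREATE TABLE ticks (seq INTEGER PRIMARY KEY, key TEXT, value REAL)"
        )
        self.writer = threading.Thread(target=self._drain, daemon=True)
        self.writer.start()

    def ingest(self, key, value):
        # Write to the cache and the queue "simultaneously".
        with self.cache_lock:
            self.cache[key] = value
        self.fifo.put((key, value))

    def read(self, key):
        # Real-time reads never touch the database.
        with self.cache_lock:
            return self.cache.get(key)

    def _drain(self):
        while True:
            item = self.fifo.get()
            if item is None:               # sentinel: shut down the writer
                self.fifo.task_done()
                break
            key, value = item
            with self.db:                  # one committed transaction per event
                self.db.execute(
                    "INSERT INTO ticks (key, value) VALUES (?, ?)", (key, value)
                )
            self.fifo.task_done()

    def close(self):
        # Flush everything queued so far, then stop the writer thread.
        self.fifo.put(None)
        self.writer.join()

    def history(self, key):
        # Long-term storage/reporting path, in insertion order.
        rows = self.db.execute(
            "SELECT value FROM ticks WHERE key = ? ORDER BY seq", (key,)
        )
        return [v for (v,) in rows]
```

The design choice is that reads only ever hit the cache, so database latency never shows up on the real-time path. The single writer thread is what keeps inserts in the same order the events arrived.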

Today, given all the new choices, it would really depend on the real performance requirements, not the perceived ones. 95% of the time, something like Elasticsearch or Cassandra would probably work just fine. But if the performance requirements are truly demanding, as they can be for stock quotes where literally every millisecond counts and order is critical, I'd probably go back to a queue/cache and maybe back it with Elasticsearch for ad-hoc queries and SQL for long-term data storage and reporting. There are so many good options now that it really boils down to the use case and requirements.

npalmer · on March 8, 2019

Have a look at Elasticsearch. https://www.elastic.co/products/elasticsearch

Disclaimer: I work for Elastic.

IpV8 · on March 8, 2019

Elastic rocks! I've been using it for a new project and am really digging it.

You guys should build some better profiling tools into Logstash. I'd love to be able to see how much time each filter rule took so I know where to spend my time optimizing. Since it's all in a Java virtual machine, it's tough to see things like how many threads are working hard vs. idle and how much memory each is using.

npalmer · on March 12, 2019

We do expose a lot of metrics, including how well your Logstash pipelines are performing.

Feel free to drop me an email (in my bio) I’d love to help if I can!

IpV8 · on March 12, 2019

Just stumbled upon this: https://www.elastic.co/blog/monitoring-logstash-filters. It may be exactly what I was looking for.
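If you want the per-filter timings without a UI, here is a rough Python sketch. It assumes Logstash 6.x or later with the monitoring API on its default port 9600. The `_node/stats/pipelines` endpoint reports `events.duration_in_millis` for each filter plugin, and the hypothetical helper below ranks filters by that cumulative time. The pipeline id `main` is also an assumption, so check the field names against the docs for your version.

```python
import json
import urllib.request


def fetch_node_stats(host="http://localhost:9600"):
    # Assumes the Logstash monitoring API is reachable on its default port.
    with urllib.request.urlopen(f"{host}/_node/stats/pipelines") as resp:
        return json.load(resp)


def slowest_filters(node_stats, pipeline="main", top=5):
    """Hypothetical helper: rank filter plugins by cumulative processing time.

    Returns (name, id, total_ms, ms_per_event) tuples, slowest first."""
    filters = node_stats["pipelines"][pipeline]["plugins"]["filters"]
    ranked = []
    for f in filters:
        events = f.get("events", {})
        total_ms = events.get("duration_in_millis", 0)
        count = events.get("in", 0)
        # Avoid dividing by zero for filters that have not seen events yet.
        per_event = total_ms / count if count else 0.0
        ranked.append((f.get("name", "?"), f["id"], total_ms, per_event))
    ranked.sort(key=lambda r: r[2], reverse=True)
    return ranked[:top]
```

Usage would be something like `slowest_filters(fetch_node_stats())`. Per-event time helps separate an expensive filter from one that simply sees more traffic.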

nikonyrh · on March 8, 2019

A great product and still evolving! :) It is crazy how powerful it becomes with Kibana, but you still have the option for DIY graphics and integrations!

selmat · on March 8, 2019

Do you know the TICK stack and InfluxDB?

web: https://www.influxdata.com/time-series-platform/

codegeek · on March 7, 2019

I haven't used it personally, but RethinkDB looks good.

https://rethinkdb.com

RocketSyntax · on March 7, 2019

Streaming frameworks like Kafka Streams can persist state in key-value stores such as RocksDB. You can grab the latest data from there.
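To illustrate what "grab the latest data" means, consider a state store built from a stream. It keeps only the newest value per key, which is roughly what a Kafka Streams KTable materializes into RocksDB. The sketch below is a hypothetical, in-process stand-in. It uses a Python dict instead of RocksDB and plain `(key, offset, value)` tuples instead of Kafka records, and it is not the Kafka Streams API.

```python
from typing import Dict, Iterable, Optional, Tuple


class LatestValueStore:
    """Hypothetical stand-in for a RocksDB-backed state store:
    keeps only the most recent value seen for each key."""

    def __init__(self) -> None:
        # key -> (offset, value); a dict here, RocksDB in Kafka Streams
        self._state: Dict[str, Tuple[int, float]] = {}

    def apply(self, records: Iterable[Tuple[str, int, float]]) -> None:
        for key, offset, value in records:
            current = self._state.get(key)
            # Keep the record with the highest offset; ignore stale replays.
            if current is None or offset > current[0]:
                self._state[key] = (offset, value)

    def get(self, key: str) -> Optional[float]:
        # Point lookup of the latest value, like an interactive query.
        entry = self._state.get(key)
        return entry[1] if entry is not None else None
```

Comparing offsets rather than blindly overwriting is a small safeguard for replayed or duplicated records. It only makes sense within a single partition, where offsets are ordered.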