
Our (https://bestprice.gr/) services/“programs” generate three different types of events:

- Short events (no longer than ~1k in size), where the cost to generate and transmit is very low (they are sent to a dedicated service via UDP). We can generate dozens of them before we need to care about the cost of doing so. They are binary-encoded. The service that receives those datagrams generates JSON representations of the events, forwards them to all connected clients (firehose), and also publishes them to a special TANK partition.

- Events of arbitrary size and complexity, often JSON-encoded but not always -- they are published to TANK (https://github.com/phaistos-networks/TANK) partitions.

- Timing samples. We capture timing traces for requests that take longer than expected to process, and random samples from various requests for other reasons. They capture the full context of a request (complete with annotations, hierarchies, etc). Those are also persisted to TANK topics.
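
To make the first kind of event more concrete, here is a minimal Python sketch of a binary-encoded short event that is sent over UDP and then turned into JSON on the receiving side. The wire layout (timestamp, event type, payload length, payload), the 1024-byte ceiling, and the names `encode_event`, `send_event` and `decode_to_json` are all assumptions made for illustration; the actual format we use is not described here.

```python
import json
import socket
import struct

# Hypothetical wire layout (an assumption, not the real format):
# uint64 timestamp in ms, uint16 event type, uint16 payload length, payload.
HEADER = struct.Struct("!QHH")
MAX_DATAGRAM = 1024  # stands in for the "~1k" limit on short events


def encode_event(ts_ms: int, event_type: int, payload: bytes) -> bytes:
    """Pack one short event into a single datagram."""
    if HEADER.size + len(payload) > MAX_DATAGRAM:
        raise ValueError("event too large for a short-event datagram")
    return HEADER.pack(ts_ms, event_type, len(payload)) + payload


def send_event(sock: socket.socket, addr: tuple, datagram: bytes) -> None:
    """Fire-and-forget: UDP gives no delivery guarantee, which keeps it cheap."""
    sock.sendto(datagram, addr)


def decode_to_json(datagram: bytes) -> str:
    """Receiver side: turn the binary event into a JSON representation."""
    ts_ms, event_type, length = HEADER.unpack_from(datagram)
    payload = datagram[HEADER.size:HEADER.size + length]
    if len(payload) != length:
        raise ValueError("truncated datagram")
    return json.dumps({
        "ts_ms": ts_ms,
        "type": event_type,
        "payload": payload.decode("utf-8", errors="replace"),
    })
```

A sender would create a socket with `socket.socket(socket.AF_INET, socket.SOCK_DGRAM)` and pass it to `send_event`; the receiving service would call `decode_to_json` on each datagram before forwarding the result to its clients.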

So, effectively, everything is available on TANK. This has all sorts of benefits, including:

- We can, and do, have all sorts of consumers that process those generated events, looking for interesting/unexpected events and reacting to them (e.g. notifying whoever needs to know about them).

- It’s trivial to query those TANK topics using `tank-cli`, like so:

  tank-cli -b <tank-endpoint> -t apache/0 get -T T-2h+15m -f "GET /search" | grep -a Googlebot

This will fetch all events that include “GET /search” from a window starting 2 hours ago and extending up to 15 minutes later; grep then keeps only the lines that mention Googlebot.
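
The consumer idea from the first benefit can be sketched roughly as follows, in Python. This is only an illustration of the pattern (scan events, match rules, react), not one of our actual consumers: the rule names, the predicates, and the `match_events` helper are hypothetical, and the sketch works on plain text lines rather than on any TANK client API.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

# Hypothetical rules: a name plus a predicate over one event line.
RULES: List[Tuple[str, Callable[[str], bool]]] = [
    ("crawler-search", lambda line: "GET /search" in line and "Googlebot" in line),
    ("server-error", lambda line: '" 500 ' in line),
]


def match_events(
    lines: Iterable[str],
    rules: List[Tuple[str, Callable[[str], bool]]] = RULES,
) -> Iterator[Tuple[str, str]]:
    """Yield (rule name, event line) for every rule an event matches."""
    for line in lines:
        for name, predicate in rules:
            if predicate(line):
                # A real consumer would notify someone here instead of yielding.
                yield name, line.rstrip("\n")
```

Because `match_events` accepts any iterable of lines, passing it `sys.stdin` would let a script like this sit at the end of a `tank-cli ... get` pipeline in place of `grep`.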

All told, we are very happy with our setup, and if we were to start over, we’d do it the same way again.


