What I find most frustrating about GPG and email is the lack of support for public-key encryption of generated mails. I've seen very few sites that support GPG, and on the ones that did, it always just worked, so I imagine it's not a big deal to set up. So why don't even the biggest shops offer this? I'd really like to be able to upload my public key to e.g. Amazon. It seems pointless to make the checkout process and everything super secure, only to send every purchase you made and your personal data across the web in an unencrypted mail afterwards.
Thanks for posting this, it's an interesting approach. Monitoring the availability of replicas instead of individual nodes probably makes sense. However, I'm wondering how this information is actionable for your team. How would you act differently in case your monitoring reports certain consistency levels becoming unavailable, compared to just reporting 2 unavailable nodes in your cluster?
It mostly came down to flexibility. The original form of our monitoring did basically what you suggest, "one node failure is ok, two is not", but we found that wasn't good enough. In our experience that approach was:
1. Noisy. We had a lot of large deployments with high replication factors (e.g. RF=5 or 7) whose owners very much didn't care if 2 nodes failed, or 3, or even 4. They chose the high replication factor for resilience to multiple rack failures and didn't want to get paged when a couple of racks failed.
2. Hard to generalize, especially with multi-tenant clusters. Cluster size != keyspace replication. For example, on a 50-node cluster with an RF=1 keyspace, a single node failure should be a pageable event. Why is there an RF=1 keyspace? Because devops means developers sometimes do things like that.
3. Had poor attribution. If you have a large cluster with many keyspaces, one of which has a lower RF than the rest or a higher consistency level, then only the owners of those keyspaces care if we lose a node or two. When we're dealing with an incident, we can rope in exactly the teams owning the keyspaces that are under-replicated so they can take appropriate action.
To be totally honest, mostly it just helps us find keyspaces that have low RF ... The number of times we've found out that the new Cassandra version we just deployed added another system table using SimpleStrategy with a default replication factor of 2 ...
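The idea above can be sketched in a few lines: for each keyspace, compare its replication factor against the number of down replicas and page only the keyspaces whose consistency level can no longer be served. This is a minimal sketch with hypothetical inputs (in a real cluster you'd derive RF from the schema, track down replicas per token range rather than per keyspace, and support levels beyond QUORUM):

```python
def quorum_available(replication_factor: int, down_replicas: int) -> bool:
    """True if enough replicas are still up to serve QUORUM reads/writes."""
    needed = replication_factor // 2 + 1  # QUORUM = floor(RF/2) + 1
    live = replication_factor - down_replicas
    return live >= needed

def keyspaces_to_page(keyspaces: dict) -> list:
    """Return names of keyspaces whose QUORUM availability is lost."""
    return [name for name, (rf, down) in keyspaces.items()
            if not quorum_available(rf, down)]

# The same 2-node outage has very different impact per keyspace:
keyspaces = {
    "high_rf":  (7, 2),  # RF=7, 2 down -> QUORUM (4 of 7) still reachable
    "typical":  (3, 2),  # RF=3, 2 down -> QUORUM (2 of 3) lost: page owners
    "oops_rf1": (1, 1),  # RF=1, 1 down -> data unavailable: page owners
}
print(keyspaces_to_page(keyspaces))  # -> ['typical', 'oops_rf1']
```

Note how this naturally solves the attribution problem from point 3: the alert is keyed by keyspace, so it can be routed to the owning team instead of paging everyone about "2 nodes down".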
That module looks pretty useful. So far I've been resorting to node-natural [0] plus a quick part-of-speech tagger, but this looks a lot better and doesn't carry a heap of baggage.
A while back, when I worked a lot with this stuff, I wrote a library to do sentiment analysis, tokenization, and part-of-speech tagging based on several corpora I came across, after realizing natural didn't have what I needed at the time.