CloudWatch has the worst UX of any log management tool I've used. We are using Logentries currently, and it is so easy and intuitive.
I just don't get how AWS developers are able to accept the eyesore that is their UX. It's easy to miss things in any AWS service's UI. I keep discovering functionality even after years of use. Google Cloud UX is light years ahead. Also, Google's Stackdriver seems great, though I haven't used it yet. Would be great if any Stackdriver users here could share how it is better than CloudWatch or other log management tools.
I just tried it. It's just as bad as AWS UI ever has been. It shows lots of fields that it extracted from the log messages. Can you click them to add them to the search box? Nope. Right click for more options? Nope. These are things Splunk has had for years.
Teams are in control of their backlogs in a way that lets them prioritise how they approach the UX/UI aspects. For instance, this is why you'll see 2-3 different UIs across AWS journeys: teams implement at different paces and with different priorities.
Interesting -- they're going after the various log management companies (Scalyr, DataDog, Splunk, Sumo Logic, etc.).
Figured this was bound to come eventually since it's a very very big market and their basic CloudWatch product was lacking in many ways. It's not like Amazon to let an ecosystem eat their lunch.
Few things stand out:
(1) Per-query pricing seems...odd? Likely a good deal for small folks with a low volume of logs (i.e. just need to check actual AWS infrastructure logs vs. application logs), but if you have any actual volume this gets absurdly expensive ($0.005/GB scanned = $5 per query if you need to scan a terabyte. Large enterprises ingest multiple terabytes per day.)
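To sanity-check that arithmetic, here's a trivial sketch using the only number taken from the comment above (the quoted $0.005/GB-scanned rate; everything else is illustrative):

```python
# Sanity check on per-query scan pricing at the quoted $0.005/GB rate.
PRICE_PER_GB_SCANNED = 0.005  # USD, rate quoted in the comment above


def query_cost(gb_scanned: float) -> float:
    """Cost in USD of a single query scanning `gb_scanned` gigabytes."""
    return gb_scanned * PRICE_PER_GB_SCANNED


print(query_cost(1))     # a small scan: fractions of a cent
print(query_cost(1000))  # a 1 TB scan: $5.00, as the comment notes
```

At enterprise volumes (multiple TB/day), repeated ad-hoc queries over full retention windows add up quickly, which is the commenter's point.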
(2) The quote "I pick the first one, click Run query, the logs are scanned and the results are visible within seconds" doesn't sound terribly promising performance-wise. "Seconds" is an eternity in the log management world.
What I value from DataDog are the vast number of integrations available. Want to monitor and alert on prometheus endpoints? No problem, just set up a little config. Want to monitor your EC2 nodes? No problem just link up your AWS account. Want to monitor k8s, etcd, nginx? Cool, no problem, there's a thing for that too.
If Logs Insights can match the simplicity of integration that DD and similar services provide, then those services had best watch out. On the other hand, DD's dashboards are pretty slick and I can't imagine AWS's utilitarian UI/UX ever competing. I wonder if that is a big enough differentiator. DD can get really expensive as well, but I'd love to see some comparisons on price.
What’s wrong with seconds when implementing / running a new query?
Obviously disk IOPS are the bottleneck in such a system, so encouraging folks to conserve them by charging for bytes scanned seems like a good measure.
Won’t an enterprise implementing this itself, or using third-party tools, have to scale out its index nodes based on the amount of data its queries are scanning?
It'll be interesting to see what the pricing numbers actually mean.
Say my app generates 1GB of logs per day. If I create a dashboard showing hits/errors for the last 24 hours and have the dashboard refreshing every 5 minutes, it might cost me $1.44 per day (12 refreshes/hour × 24 hours × ~1GB scanned per refresh × $0.005/GB).
By comparison list price for Sumo Logic is $90/month for 1GB/day of data (stored for 30 days) with queries being "free" (although there are limits to the number you can do).
Definitely will be interesting. I'd say this -- for sure, $1.44 per day is better than $90/month, but is it that much better? (Also, $90 is on the high end -- Scalyr charges more like $50/month for 1GB/day. Disclosure: I'm a co-founder, though no longer there.)
For $50-$100/month, you get basically unlimited querying, unlimited dashboards, etc., vs. $43.20 from Amazon for a single dashboard that refreshes every 5 minutes in your example.
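Putting the numbers from this thread side by side (the rates and the "refresh every 5 minutes" scenario are taken from the comments above; the assumption that each refresh re-scans the full ~1GB 24-hour window is mine):

```python
# Rough cost comparison for the single-dashboard scenario discussed above.
PRICE_PER_GB = 0.005         # USD per GB scanned, rate quoted in the thread
REFRESHES_PER_DAY = 12 * 24  # one refresh every 5 minutes
GB_PER_QUERY = 1.0           # assumes each refresh re-scans the last 24h
                             # of a 1 GB/day application (an assumption)

daily = REFRESHES_PER_DAY * GB_PER_QUERY * PRICE_PER_GB
monthly = daily * 30
print(f"per day:   ${daily:.2f}")    # $1.44, matching the thread
print(f"per month: ${monthly:.2f}")  # $43.20, matching the thread

# Flat-rate list price quoted above for the same 1 GB/day volume:
print(monthly < 90)  # cheaper than the $90/month Sumo Logic figure
```

Note this covers a single dashboard; per-query pricing scales linearly with dashboard count, while the flat-rate plans above include unlimited querying.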
Looking at the actual numbers now, there's no way this is targeting the same types of users at this pricing level. It's gotta be aimed more at folks looking at meta-logs for AWS services, the volume of which is going to be much less than actual application logs. Otherwise I can't see this being competitive.
With my logging setup right now (ELK + Fluentd) org-wide we have ~120 dashboards, and the overall setup costs us about $750 per month in resources. This would be a 700% increase for us. A 2x-3x increase would be worth it I think, but not that much.
Could also be to dissuade people from (ab)using it instead of proper metrics. Pretty sure Sumo Logic lets you run queries on a cron and use the results for alarming (IMO not a great idea; I've seen it go wrong). And if there's an API, people will automate it and use it for god knows what. So charging for it seems like a smart enough move, iff the price is sane/competitive.
It’s version 1.0 but this starts to make Splunk and similar utilities look less and less important or differentiating in the future of cloud.
AWS understands most people using tools like Splunk probably only need a few simple features so AWS just goes and builds that and gives a lot of people excuses to dump expensive licenses for the AWS version of it. It’s a sneaky but highly successful business model.
We’ve had a poor experience with Cloudwatch for logs. The UX is poor and queries over large data sets take forever. So much so that I’m sure we’re ‘using it wrong’. What have others’ experiences been like?
Well for what it’s worth this service solves the “queries take forever” part. But Cloudwatch logs is just one potential piece of the puzzle. It’s easy to hook into for AWS services and easy to pipe into Lambdas, S3, Kinesis, etc etc. If none of that integration appeals to you then there are probably better products, but for certain workflows it’s far superior to other solutions despite the rough edges. Same story as most of AWS’s services.
This looks great, and the query syntax is fairly easy to pick up (just from the "tips" in the UI - I haven't been able to find the documentation yet). The idea of connecting the parts of the query with pipe characters reminds me of https://stedolan.github.io/jq/
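For anyone who hasn't seen the UI tips, the jq comparison is apt: a query is a series of stages joined with `|`. A minimal sketch of composing one programmatically (the stage names mirror the examples shown in the Insights UI; `@timestamp` and `@message` are CloudWatch's built-in fields):

```python
# Sketch: composing a pipe-chained query string in the style shown
# by the Insights UI tips, much like a jq pipeline.
stages = [
    "fields @timestamp, @message",      # select which fields to return
    "filter @message like /ERROR/",     # keep only matching records
    "sort @timestamp desc",             # newest first
    "limit 20",                         # cap the result set
]
query = " | ".join(stages)
print(query)
```

Composing the string stage by stage like this also makes it easy to toggle filters on and off when iterating on a query.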
The feature where you can add the queries to a Cloudwatch dashboard seems to be a bit broken at the moment. First, the version of the query that I copied to a dashboard didn't seem to respect the time limit I'd set so instead of looking at the last hour, it was looking at all time - could get accidentally very expensive! Also, I couldn't see a way to show the visualisation (ie the stats graph) on the dashboard - just the raw table of query results, which is not ideal. Hopefully I've missed something, or those niggles will be sorted out soon.
I'm with the team that built this. It's actually a new purpose-built engine, optimized for near real-time availability of large scale volumes of log data. It supports an unbounded number of fields across log records, and doesn't require the definition of a schema or dealing with data types at setup/ingestion time. Compute and storage are separate, which is what allows queries to address data from any time window. There is no limit on the data retention period.
Have they got an ML tool yet to allow me to configure alerts for things that aren't "normal"? Logs are all well and good but I generally just care about things that are abnormal for my stack.