CloudWatch has the worst UX of any log management tool I've used. We are using Logentries currently, and it is so easy and intuitive.
I just don't get how AWS developers are able to accept the eyesore that is their UX. It's easy to miss things in any AWS service's UI. I keep discovering functionality even after years of use. Google Cloud UX is light years ahead. Also, Google's Stackdriver seems great, though I haven't used it yet. Would be great if any Stackdriver users here could share how it is better than CloudWatch or other log management tools.
I just tried it. It's just as bad as AWS UI ever has been. It shows lots of fields that it extracted from the log messages. Can you click them to add them to the search box? Nope. Right click for more options? Nope. These are things Splunk has had for years.
Teams are in control of their backlogs in a way that lets them prioritise how they approach the UX/UI aspects. For instance, this is why you'll see 2-3 different UIs across AWS journeys: teams implement at different paces and with different priorities.
Interesting -- they're going after the various log management companies (Scalyr, DataDog, Splunk, Sumo Logic, etc.).
Figured this was bound to come eventually since it's a very very big market and their basic CloudWatch product was lacking in many ways. It's not like Amazon to let an ecosystem eat their lunch.
Few things stand out:
(1) Per-query pricing seems...odd? Likely a good deal for small folks with a low volume of logs (i.e. just need to check actual AWS infrastructure logs vs. application logs), but if you have any actual volume this gets absurdly expensive ($0.005/GB scanned = $5 per query if you need to scan a terabyte. Large enterprises ingest multiple terabytes per day.)
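To sanity-check that arithmetic, here's a trivial sketch using the only number taken from the comment above (the quoted $0.005/GB-scanned rate; everything else is illustrative):

```python
# Sanity check on per-query scan pricing at the quoted $0.005/GB rate.
PRICE_PER_GB_SCANNED = 0.005  # USD, rate quoted in the comment above


def query_cost(gb_scanned: float) -> float:
    """Cost in USD of a single query scanning `gb_scanned` gigabytes."""
    return gb_scanned * PRICE_PER_GB_SCANNED


print(query_cost(1))     # a small scan: fractions of a cent
print(query_cost(1000))  # a 1 TB scan: $5.00, as the comment notes
```

At enterprise volumes (multiple TB/day), repeated ad-hoc queries over full retention windows add up quickly, which is the commenter's point.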
(2) The quote "I pick the first one, click Run query, the logs are scanned and the results are visible within seconds" doesn't sound terribly promising performance-wise. "Seconds" is an eternity in the log management world.
What I value from DataDog are the vast number of integrations available. Want to monitor and alert on prometheus endpoints? No problem, just set up a little config. Want to monitor your EC2 nodes? No problem just link up your AWS account. Want to monitor k8s, etcd, nginx? Cool, no problem, there's a thing for that too.
If Logs Insights can match the simplicity of integration that DD and similar services provide, then those services had best watch out. On the other hand, DD's dashboards are pretty slick and I can't imagine AWS's utilitarian UI/UX ever competing. I wonder if that is a big enough differentiator. DD can get really expensive as well, but I'd love to see some comparisons on price.
What’s wrong with seconds when implementing / running a new query?
Obviously disk IOPS are the bottleneck in such a system, so encouraging folks to conserve them by charging for bytes scanned seems like a good measure.
Won’t an enterprise implementing this itself, or using third-party tools, have to scale out its index nodes based on the amount of data its queries are scanning?
It'll be interesting to see what the pricing numbers actually mean.
Say my app generates 1GB of logs per day. If I create a dashboard showing hits/errors for the last 24 hours and have the dashboard refreshing every 5 minutes, it might cost me $1.44 per day (12 refreshes/hour × 24 hours × ~1GB scanned per refresh × $0.005/GB).
By comparison list price for Sumo Logic is $90/month for 1GB/day of data (stored for 30 days) with queries being "free" (although there are limits to the number you can do).
Definitely will be interesting. I'd say this -- for sure, $1.44 per day is better than $90/month, but is it that much better? (Also, $90 is on the high end -- Scalyr charges more like $50/month for 1GB/day. Disclosure: I'm a co-founder, though no longer there.)
For $50-$100/month, you get basically unlimited querying, unlimited dashboards, etc., vs. $43.20 from Amazon for a single dashboard that refreshes every 5 minutes in your example.
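Putting the numbers from this thread side by side (the rates and the "refresh every 5 minutes" scenario are taken from the comments above; the assumption that each refresh re-scans the full ~1GB 24-hour window is mine):

```python
# Rough cost comparison for the single-dashboard scenario discussed above.
PRICE_PER_GB = 0.005         # USD per GB scanned, rate quoted in the thread
REFRESHES_PER_DAY = 12 * 24  # one refresh every 5 minutes
GB_PER_QUERY = 1.0           # assumes each refresh re-scans the last 24h
                             # of a 1 GB/day application (an assumption)

daily = REFRESHES_PER_DAY * GB_PER_QUERY * PRICE_PER_GB
monthly = daily * 30
print(f"per day:   ${daily:.2f}")    # $1.44, matching the thread
print(f"per month: ${monthly:.2f}")  # $43.20, matching the thread

# Flat-rate list price quoted above for the same 1 GB/day volume:
print(monthly < 90)  # cheaper than the $90/month Sumo Logic figure
```

Note this covers a single dashboard; per-query pricing scales linearly with dashboard count, while the flat-rate plans above include unlimited querying.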
Looking at the actual numbers now, there's no way this is targeting the same types of users at this pricing level. It's gotta be aimed more at folks looking at meta-logs for AWS services, the volume of which is going to be much less than actual application logs. Otherwise I can't see this being competitive.
With my logging setup right now (ELK + Fluentd) org-wide we have ~120 dashboards, and the overall setup costs us about $750 per month in resources. This would be a 700% increase for us. A 2x-3x increase would be worth it I think, but not that much.
Could also be to dissuade people from (ab)using it instead of proper metrics. Pretty sure Sumo Logic lets you run queries on a cron and use the results for alarming (IMO not a great idea; I've seen it go wrong). And if there's an API, people will automate it and use it for god knows what. So charging for it seems like a smart enough move, iff the price is sane/competitive.
It’s version 1.0 but this starts to make Splunk and similar utilities look less and less important or differentiating in the future of cloud.
AWS understands most people using tools like Splunk probably only need a few simple features so AWS just goes and builds that and gives a lot of people excuses to dump expensive licenses for the AWS version of it. It’s a sneaky but highly successful business model.
We’ve had a poor experience with Cloudwatch for logs. The UX is poor and queries over large data sets take forever. So much so that I’m sure we’re ‘using it wrong’. What have others’ experiences been like?
Well for what it’s worth this service solves the “queries take forever” part. But Cloudwatch logs is just one potential piece of the puzzle. It’s easy to hook into for AWS services and easy to pipe into Lambdas, S3, Kinesis, etc etc. If none of that integration appeals to you then there are probably better products, but for certain workflows it’s far superior to other solutions despite the rough edges. Same story as most of AWS’s services.
This looks great, and the query syntax is fairly easy to pick up (just from the "tips" in the UI - I haven't been able to find the documentation yet). The idea of connecting the parts of the query with pipe characters reminds me of https://stedolan.github.io/jq/
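For anyone who hasn't seen the UI tips, the jq comparison is apt: a query is a series of stages joined with `|`. A minimal sketch of composing one programmatically (the stage names mirror the examples shown in the Insights UI; `@timestamp` and `@message` are CloudWatch's built-in fields):

```python
# Sketch: composing a pipe-chained query string in the style shown
# by the Insights UI tips, much like a jq pipeline.
stages = [
    "fields @timestamp, @message",      # select which fields to return
    "filter @message like /ERROR/",     # keep only matching records
    "sort @timestamp desc",             # newest first
    "limit 20",                         # cap the result set
]
query = " | ".join(stages)
print(query)
```

Composing the string stage by stage like this also makes it easy to toggle filters on and off when iterating on a query.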
The feature where you can add the queries to a Cloudwatch dashboard seems to be a bit broken at the moment. First, the version of the query that I copied to a dashboard didn't seem to respect the time limit I'd set so instead of looking at the last hour, it was looking at all time - could get accidentally very expensive! Also, I couldn't see a way to show the visualisation (ie the stats graph) on the dashboard - just the raw table of query results, which is not ideal. Hopefully I've missed something, or those niggles will be sorted out soon.
I'm with the team that built this. It's actually a new purpose-built engine, optimized for near real-time availability of large scale volumes of log data. It supports an unbounded number of fields across log records, and doesn't require the definition of a schema or dealing with data types at setup/ingestion time. Compute and storage are separate, which is what allows queries to address data from any time window. There is no limit on the data retention period.
Have they got an ML tool yet to allow me to configure alerts for things that aren't "normal"? Logs are all well and good but I generally just care about things that are abnormal for my stack.