Today's uninteresting log noise is tomorrow's critical data.
I've been loving Kibana for filtering and reporting on log data in flexible and insightful ways, including automatically generated charts for certain data sources.
Yes. If you use log levels/priorities, facilities, and identities in a sane way, your logs are already classified.
Let's say there's a service failure and I want to know what the service did prior to the failure. I wouldn't want a classifier filtering the logs in that case, so that use case is out of the picture. What use cases other than filtering are there for this? Maybe as a way to give developers feedback so they can fix the log messages, as in: "this thing we log all the time never turns out to matter when trouble-shooting our services, and the classifier thinks it's noise, so we'll remove it".
It would be neat if MachineBox could sense whether log noise would be useful in other contexts--e.g., as a metric that can be graphed. Or whether your logging is lacking something that might be useful, or just lacking signal at all (hey, user, your logs are just noise!).
This is a fairly useful way of removing relatively useless information such as timestamps and line numbers when you're looking for rare or unique events. The alternative, I think, is to do a bunch of awk or sed magic, which isn't really fun for anybody. It's especially useful in a time crunch when there's an ongoing outage.
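The awk/sed-free version of that stripping can be sketched in a few lines of Python. The two regexes are assumptions about the log format (ISO-8601-ish timestamps and `file.py:123`-style line references); real logs would need their own patterns:

```python
import re

# Assumed formats: ISO-8601-ish timestamps and "file.py:123"-style line refs.
TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}(\.\d+)?")
LINE_REF = re.compile(r"\b\w+\.\w+:\d+\b")

def normalize(line):
    """Strip volatile fields so identical events collapse to one string."""
    line = TIMESTAMP.sub("<ts>", line)
    line = LINE_REF.sub("<loc>", line)
    return line

logs = [
    "2024-01-02 10:00:01 worker.py:42 connection reset",
    "2024-01-02 10:00:09 worker.py:42 connection reset",
    "2024-01-02 10:00:11 main.py:7 disk full",
]
# After normalization, only the two distinct events remain.
unique = sorted(set(normalize(l) for l in logs))
```

Piping through something like this during an outage gets you from thousands of lines to a handful of distinct events.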
Is it possible to make an ML algorithm that trains only on "noise" data and then identifies abnormalities? It seems like something people do easily, and it would be ideal for an application like this, where you might not have much training data covering all the "not noise" kinds of examples.
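A crude sketch of that idea in plain Python, no ML library: learn the set of message templates from a window of known-normal logs, then flag anything whose template was never seen. The number-collapsing heuristic and the class design are my own assumptions, not how any particular product does it:

```python
import re

def template(line):
    """Collapse numbers so variants of one message share a template (assumed heuristic)."""
    return re.sub(r"\d+", "<n>", line)

class NoveltyDetector:
    """Trained only on 'normal' lines; any line with an unseen template is abnormal."""
    def __init__(self):
        self.known = set()

    def fit(self, normal_lines):
        self.known.update(template(l) for l in normal_lines)

    def is_abnormal(self, line):
        return template(line) not in self.known

det = NoveltyDetector()
det.fit(["request 123 ok", "request 456 ok", "cache hit for key 9"])
# "request 789 ok" matches a known template; "disk failure on sda1" does not.
```

Real novelty/anomaly detectors are more statistical than a set lookup, but the training story is the same: you only ever show the model normal data.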
Another application would be a security camera that detects unusual events without having to train it on actual burglars.
Maybe an easier way to go is to record it structured up front (it’s already structured in the original application source anyway). That makes it much easier to record efficiently (so you can record more data) and also much easier to query efficiently; you could, e.g., invest time in machine learning on structured fields instead of having to mess around with text.
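As one way to do that with Python's stdlib `logging`, you can emit one JSON object per line instead of free text. The field names here are arbitrary choices, not a standard schema:

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line (field names are arbitrary)."""
    def format(self, record):
        return json.dumps({
            "ts": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "msg": record.getMessage(),
            # Structured fields survive as fields instead of being mashed into text.
            **getattr(record, "fields", {}),
        })

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
log = logging.getLogger("payments")
log.addHandler(handler)
log.setLevel(logging.INFO)

log.info("charge settled", extra={"fields": {"amount_cents": 1299, "currency": "USD"}})
```

Once every line is JSON, "querying" is a jq filter or an index lookup rather than a regex over prose.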
That’s what we do here anyway, it’s worked well for us:
I have limited experience, but I think that usually you would take this into account when building your loss function and heavily penalize false negatives during training.
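As a sketch of what that looks like: weighted binary cross-entropy in plain Python, where `fn_weight > 1` makes missing a real problem (a false negative) cost more than a false alarm. The weight value of 10 is purely illustrative:

```python
import math

def weighted_bce(y_true, y_pred, fn_weight=10.0, eps=1e-9):
    """Binary cross-entropy where false negatives (missed positives) are
    penalized fn_weight times harder than false positives. fn_weight=10
    is an illustrative choice, not a recommendation."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
        # y == 1: under-predicting (a potential false negative) gets the heavy weight.
        total += -(fn_weight * y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

# Confidently missing a positive now hurts far more than confidently crying wolf.
miss_positive = weighted_bce([1], [0.1])  # model says "noise" but it mattered
false_alarm = weighted_bce([0], [0.9])    # model flags harmless noise
```

Most frameworks expose the same idea directly as class weights or `pos_weight`-style parameters, so you rarely need to hand-roll the loss.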