When you say "clean data", what exactly do you mean? I've often seen the claim that cleaning data takes a lot of time, but "clean data" seems like an ill-defined term.
For user behavior: bots, click fraud/clickjacking, bored teenagers, competitors sussing out your product, people who got confused by your user interface, users who have JavaScript disabled and so never trigger your click tracking, and users on browsers so old they don't have JavaScript to begin with.
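To make that concrete, here's roughly the kind of first-pass filter I mean. The field names and the 300-requests-per-minute threshold are made up for illustration; real click logs look different:

    import re

    # Toy example of weeding out obviously non-human traffic.
    # Field names (user_agent, requests_per_minute) are invented.
    BOT_UA = re.compile(r"bot|crawler|spider|curl|python-requests", re.I)

    def looks_like_a_person(event):
        if BOT_UA.search(event.get("user_agent", "")):
            return False                      # self-identified crawlers and scripts
        if event.get("requests_per_minute", 0) > 300:
            return False                      # nobody clicks that fast
        return True

    raw_events = [
        {"user_agent": "Mozilla/5.0", "requests_per_minute": 12},
        {"user_agent": "Googlebot/2.1", "requests_per_minute": 900},
    ]
    clean_events = [e for e in raw_events if looks_like_a_person(e)]
    print(len(clean_events))                  # -> 1

And that only catches the traffic that announces itself; the confused users and the competitors poking around look exactly like real users in the logs.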
And then there are the bugs in your data pipeline: browser bugs (particularly IE), logging bugs, didn't-understand-your-distributed-database's-conflict-resolution-policy bugs, failed attempts at cleaning all the previous categories, incorrect assumptions about the "shape" of your data, self-DoS (no joke: Google nearly brought itself down with an img tag that had an empty src attribute, which forces the browser to fire a duplicate request on every page), incorrectly filtering requests so you count /favicon.ico as a pageview, etc.
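The pipeline side, in its simplest form, looks something like this. The log format and the dedupe-by-second heuristic are invented for the example (the real fix for the empty-src bug is removing the tag; this just catches the symptom in the logs):

    # Back-of-the-envelope pageview counting over (client_ip, path, timestamp) records.
    NON_PAGE_PATHS = ("/favicon.ico", "/robots.txt")
    ASSET_SUFFIXES = (".js", ".css", ".png", ".jpg", ".gif", ".ico")

    def count_pageviews(requests):
        seen = set()
        count = 0
        for client_ip, path, ts in requests:
            if path in NON_PAGE_PATHS or path.endswith(ASSET_SUFFIXES):
                continue                      # assets aren't pageviews
            key = (client_ip, path, int(ts))  # collapse duplicates in the same second
            if key in seen:
                continue
            seen.add(key)
            count += 1
        return count

    requests = [
        ("10.0.0.1", "/home", 1700000000.1),
        ("10.0.0.1", "/home", 1700000000.4),  # duplicate from an empty img src
        ("10.0.0.1", "/favicon.ico", 1700000000.5),
    ]
    print(count_pageviews(requests))          # -> 1

Every one of those filters is a judgment call, and getting one wrong is itself another source of dirty data.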
My own experience has shown that dirty data impacts advanced AI just as much as it impacts far more basic ML techniques.
Even for the most advanced AI we work on, we spend just as much time worrying about clean data as we do on anything else.