When you say "clean data", what exactly do you mean? I've often seen the claim that cleaning data takes most of the time, but "cleaning" seems like an ill-defined term.
For user behavior data: bots, click fraud/clickjacking, bored teenagers, competitors who are sussing out your product, people who got confused by your user interface, users who have JavaScript disabled and so never trigger your click tracking, and users on browsers so old they don't have JavaScript to begin with.
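A lot of that traffic can be knocked out with a crude first pass before any real analysis. Here's a minimal sketch in Python, assuming an invented record format (a dict with "user_agent" and "has_js" fields; those names are my own, not any real logging schema):

    import re

    # Self-identified crawlers; real bot lists are much longer than this.
    BOT_UA_PATTERN = re.compile(r"bot|crawler|spider|curl|wget", re.IGNORECASE)

    def is_probably_human(hit):
        """First-pass filter: drop self-identified bots and hits that
        never executed the click-tracking JavaScript."""
        if BOT_UA_PATTERN.search(hit.get("user_agent", "")):
            return False  # self-identified crawler
        if not hit.get("has_js", False):
            return False  # JS disabled or absent -- no click events anyway
        return True

    raw_hits = [
        {"user_agent": "Googlebot/2.1", "has_js": False},
        {"user_agent": "Mozilla/5.0 (Windows NT 10.0)", "has_js": True},
    ]
    clean_hits = [h for h in raw_hits if is_probably_human(h)]

Note this only catches the honest bots; the click-fraud and bored-teenager categories look exactly like real users at this level.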
And then there are the bugs in your own data pipeline: browser bugs (particularly in IE), logging bugs, didn't-understand-your-distributed-database's-conflict-resolution-policy bugs, failed attempts at cleaning all of the previous categories, incorrect assumptions about the "shape" of your data, self-DoS bugs that generate extra duplicate requests (no joke: Google nearly brought itself down with an img tag that had an empty src attribute, which forces the browser to issue a second request for the current page on every page load), incorrectly filtering requests so you count /favicon.ico as a pageview, etc.
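The duplicate-request and favicon problems at least have mechanical fixes once you know to look for them. A rough sketch, again with an invented schema of (timestamp, client_id, path) tuples, and a two-second dedup window that is purely an assumption:

    from datetime import datetime, timedelta

    # Asset paths that should never count as pageviews.
    NON_PAGE_PATHS = {"/favicon.ico", "/robots.txt"}

    def dedupe_pageviews(hits, window=timedelta(seconds=2)):
        """Drop asset requests, and collapse near-simultaneous duplicates
        from the same client (the kind an empty img src produces)."""
        last_seen = {}  # (client_id, path) -> timestamp of last kept hit
        kept = []
        for ts, client_id, path in sorted(hits):
            if path in NON_PAGE_PATHS:
                continue  # asset request, not a pageview
            key = (client_id, path)
            prev = last_seen.get(key)
            if prev is not None and ts - prev < window:
                continue  # duplicate within the window
            last_seen[key] = ts
            kept.append((ts, client_id, path))
        return kept

The window size is a judgment call: too short and the empty-src duplicates slip through, too long and you throw away legitimate rapid reloads.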