For user behavior: bots, click fraud and clickjacking, bored teenagers, competitors who are sussing out your product, people who got confused by your user interface, users who have JavaScript disabled and so never trigger your click tracking, and users on really old browsers that don't support JavaScript to begin with.
And then there are the bugs in your data pipeline: browser bugs (particularly IE), logging bugs, bugs from not understanding your distributed database's conflict-resolution policy, failed attempts at cleaning up all the previous categories, incorrect assumptions about the "shape" of your data, self-DoS attacks that generate extra duplicate requests (no joke: Google almost brought itself down with an img tag that had an empty src attribute, which forces the browser to make a duplicate request on every page), incorrectly filtering requests so that you count /favicon.ico as a pageview, etc.
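A rough sketch of how a few of these can be filtered before counting pageviews. Everything here is an assumption for illustration: the `Hit` record, the `BOT_MARKERS` and `NON_PAGE_PATHS` lists, and the one-second `dedupe_window` are hypothetical, and real bot detection needs far more than a user-agent substring check.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One request-log record (hypothetical shape)."""
    timestamp: float  # seconds since epoch
    client: str       # e.g. an IP address or session id
    path: str
    user_agent: str


# Illustrative lists only; a real deployment would maintain much longer ones.
BOT_MARKERS = ("bot", "crawler", "spider")
NON_PAGE_PATHS = {"/favicon.ico", "/robots.txt"}


def count_pageviews(hits, dedupe_window=1.0):
    """Count hits that look like real pageviews.

    Skips non-page paths, skips user agents that self-identify as bots,
    and collapses repeated requests for the same page by the same client
    that arrive within `dedupe_window` seconds of the previous one.
    """
    last_seen = {}  # (client, path) -> timestamp of the most recent request
    count = 0
    for hit in sorted(hits, key=lambda h: h.timestamp):
        if hit.path in NON_PAGE_PATHS:
            continue
        agent = hit.user_agent.lower()
        if any(marker in agent for marker in BOT_MARKERS):
            continue
        key = (hit.client, hit.path)
        previous = last_seen.get(key)
        last_seen[key] = hit.timestamp
        if previous is not None and hit.timestamp - previous <= dedupe_window:
            continue  # likely a duplicate request, e.g. from an empty img src
        count += 1
    return count
```

Note that this only catches clients that are honest about being bots, and a time-window dedupe will also throw away a real user who reloads quickly, so the filter itself becomes one more source of dirty data.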
In general: duplicate data, missing fields, different formats for different parts of the data, inconsistent naming schemes
For text: character encodings, special symbols, escape characters, punctuation, extra or missing spaces and newlines, capitalization
For images: different sizes, rotations, crops, blurry images
For numbers: inconsistent decimal separators (point vs. comma), outliers with obviously nonsensical values or zeros, values in different units of measurement, etc.
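To make the text and number cases concrete, here is a minimal sketch using only the Python standard library. The function names `normalize_text` and `parse_number` are made up for this example, and the rule that a lone comma is a decimal separator is an assumption that will be wrong for data where "1,234" means one thousand two hundred thirty-four.

```python
import re
import unicodedata


def normalize_text(s):
    """Fold encoding variants, stray whitespace, and capitalization."""
    # NFKC maps compatibility characters (non-breaking spaces, ligatures,
    # full-width letters) onto their plain equivalents.
    s = unicodedata.normalize("NFKC", s)
    # Collapse any run of spaces, tabs, or newlines into a single space.
    s = re.sub(r"\s+", " ", s).strip()
    return s.casefold()


def parse_number(s):
    """Parse a number written with either '.' or ',' as the decimal mark."""
    s = s.strip().replace(" ", "")
    if "," in s and "." in s:
        # Both present: treat whichever appears last as the decimal mark
        # and the other as a thousands separator.
        if s.rfind(",") > s.rfind("."):
            s = s.replace(".", "").replace(",", ".")
        else:
            s = s.replace(",", "")
    elif "," in s:
        # ASSUMPTION: a lone comma is a decimal mark, not a thousands separator.
        s = s.replace(",", ".")
    return float(s)
```

Outliers and mixed units are harder to fix generically, since they usually need knowledge of what the column is supposed to mean.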