
There is actually a fix to this problem in classical statistics (as far back as Pearson in the early 20th century) if the object is to perform PCA on data matrices: Don't use the covariance matrix for PCA. The issue of units is only a problem if you're using a matrix that has units itself. This problem is readily solved by using the correlation matrix instead, which is dimensionless by definition. The downside to this circumvention is that you have essentially re-weighted each of your variables, so the weight contributed by each variable is more similar. This may not be what you want.
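To make the units point concrete, here is a minimal NumPy sketch. The height/weight data is synthetic and the `pca_eig` helper is just a name made up for this illustration. It runs PCA on the covariance matrix and on the correlation matrix, then repeats both after changing the height column from metres to millimetres.

```python
import numpy as np

def pca_eig(matrix):
    """Eigenvalues (descending) and matching eigenvectors (as columns) of a symmetric matrix."""
    vals, vecs = np.linalg.eigh(matrix)
    order = np.argsort(vals)[::-1]
    return vals[order], vecs[:, order]

# Synthetic, made-up data: height in metres and a correlated weight in kilograms.
rng = np.random.default_rng(0)
n = 500
height_m = rng.normal(1.7, 0.1, n)
weight_kg = 70 + 40 * (height_m - 1.7) + rng.normal(0, 5, n)
X = np.column_stack([height_m, weight_kg])

# Same observations, but with height expressed in millimetres.
X_mm = X * np.array([1000.0, 1.0])

# Covariance-based PCA: the leading component follows whichever column
# has the larger numeric variance, so it changes with the choice of units.
_, cov_vecs = pca_eig(np.cov(X, rowvar=False))
_, cov_vecs_mm = pca_eig(np.cov(X_mm, rowvar=False))

# Correlation-based PCA: the matrix is dimensionless, so rescaling a
# column leaves it (and therefore the components) unchanged.
corr = np.corrcoef(X, rowvar=False)
corr_mm = np.corrcoef(X_mm, rowvar=False)
_, corr_vecs = pca_eig(corr)

print("covariance PC1 (m, kg): ", cov_vecs[:, 0])
print("covariance PC1 (mm, kg):", cov_vecs_mm[:, 0])
print("correlation PC1:        ", corr_vecs[:, 0])
```

With the covariance matrix the first component is dominated by weight in one unit system and by height in the other. With the correlation matrix both variables load equally, which is the re-weighting downside mentioned above.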

If PCA is being used for something other than reducing the dimensionality of multivariate data, this fix might not apply. There are other uses of PCA besides working on data matrices (the SVD-based image compression technique comes to mind) that he might be addressing.

If you want a treatment of PCA in a respected text (at least by the statistical community--not sure what the ML people think ;) ), look no further than Hastie, Tibshirani and Friedman's Elements of Statistical Learning.

http://www.stanford.edu/~hastie/local.ftp/Springer/OLD//ESLI...




> The downside to this circumvention is that you have essentially re-weighted each of your variables, so the weight contributed by each variable is more similar.

Is this similar to "whitening"?


It is somewhat similar. The procedure I describe is standardizing the data (using z-scores instead of raw values). The difference, as far as I can tell, is that standardizing retains the correlation structure of the data, while whitening decorrelates the variables as well as rescaling them to unit variance. R is the lingua franca of academic statisticians, so if you don't use R you might not derive a huge amount of value from this, but this question was asked a couple of years ago on Stats.SE: http://stats.stackexchange.com/questions/53/pca-on-correlati...
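A rough sketch of the difference, on synthetic data and using PCA whitening as one possible whitening transform (there are others, such as ZCA). The mixing matrix and variable names are made up for the example.

```python
import numpy as np

# Synthetic data: two correlated variables on very different scales.
rng = np.random.default_rng(1)
z = rng.normal(size=(1000, 2))
mixing = np.array([[1.0, 0.0],
                   [80.0, 60.0]])
X = z @ mixing.T

# Standardizing (z-scores): each column gets mean 0 and variance 1,
# but the correlation between the columns is left as it was.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
cov_Z = np.cov(Z, rowvar=False)

# PCA whitening: rotate onto the eigenvectors of the covariance matrix,
# then divide each rotated column by the square root of its eigenvalue.
Xc = X - X.mean(axis=0)
vals, vecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
W = (Xc @ vecs) / np.sqrt(vals)
cov_W = np.cov(W, rowvar=False)

print("covariance after standardizing:\n", cov_Z)
print("covariance after whitening:\n", cov_W)
```

The covariance of the standardized data is just the correlation matrix of the original data, off-diagonal entries included. After whitening the covariance is the identity, so the correlation is gone as well.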


Isn't this ICA (independent component analysis)? That makes much more sense when trying to extract information from the combination of multiple variables independently of their unit type and scale.



