"By defining \mathbb{E}\left[\mathbf{x}\right]=\muE[x]=μ, ...and using the linearity of the expectation operator \mathbb{E}E, we easily arive [sic] to the following conclusion..."
Yikes. You don't define \mathbb{E} as an 'expectation operator', or explain what an expectation operator even does, or mention the fact that it's linear. The v's disappeared somehow from inside the square brackets -- maybe you meant \mathbb{E}[\mathbf{v}]=\mu?
So far this "tutorial" isn't defining its terms very well. I'm lost and it's only the very beginning.
"E[foo]" is syntax to mean the expected value of the random variable foo, roughly speaking the mean value. (For instance the expected value of a dice roll is 3.5. The terminology is slightly suboptimal, since we will never expect a dice to come up 3.5.)
Hence the "E" itself is called an "operator". It can be applied to a random value in order to yield its expected value. You can read up on it here: https://en.wikipedia.org/wiki/Expected_value
The definition "E[x] = mu" is correct, though I would write it the other way, as "mu = E[x]", as it's the variable mu which is being defined.
The v's disappear because of a suppressed calculation:
sigma^2 = E[ (v^T x - E[v^T x])^2 ] =
E[ (v^T x - E[v^T x]) (v^T x - E[v^T x]) ] =
E[ v^T x v^T x - 2 v^T x E[v^T x] + E[v^T x] E[v^T x] ] =
E[ v^T x x^T v - 2 v^T x v^T E[x] + v^T E[x] v^T E[x] ] =
E[ v^T x x^T v - 2 v^T x E[x]^T v + v^T E[x] E[x]^T v ] =
E[ v^T (x x^T - 2 x E[x]^T + E[x] E[x]^T) v ] =
v^T E[ x x^T - 2 x E[x]^T + E[x] E[x]^T ] v =
v^T E[ (x - E[x]) (x - E[x])^T ] v =
v^T E[ (x - mu) (x - mu)^T ] v =
v^T Sigma v.
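If you'd rather sanity-check the algebra numerically than follow every step, here's a quick numpy sketch (toy data of my own, not from the article): the sample variance of v^T x matches v^T Sigma v, up to the usual 1/n vs 1/(n-1) convention.

    import numpy as np

    rng = np.random.default_rng(1)

    # Toy random vector x = A z + b with z standard normal, so x has a
    # non-trivial covariance matrix.
    n, dim = 200_000, 3
    A = rng.normal(size=(dim, dim))
    x = rng.normal(size=(n, dim)) @ A.T + np.array([1.0, 2.0, 3.0])

    v = np.array([0.2, -1.0, 0.5])

    # Left-hand side: sigma^2 = Var(v^T x), estimated from the samples.
    lhs = (x @ v).var()

    # Right-hand side: v^T Sigma v, with Sigma the sample covariance
    # matrix estimating E[(x - mu)(x - mu)^T].
    Sigma = np.cov(x, rowvar=False)
    rhs = v @ Sigma @ v

    print(lhs, rhs)  # the two numbers should agree closely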
Also, the "E[foo]" notation is something you'd pick up in an introductory statistics course. Which, IMO, means it's perfectly appropriate to use it without further explanation in this sort of context.
It's not really reasonable to expect technical subjects like this to always be presented in a way that's easily digestible by people who lack any background in the subject area. This article is clearly aimed at people who are studying machine learning, and anyone who is studying machine learning should already have a good command of basic statistics and linear algebra.
Rewind to the start of the paragraph: "Let \mathbf{x} be a random vector with N dimensions."
You're assumed to already know what a "random vector with N dimensions" is. It's very reasonable then to also assume you know what expectations and covariances of random vectors are, and some of their basic properties, such as linearity of expectations and quadratic forms of covariance matrices, since all of these are typically taught in the same course.
An important application of principal component analysis is "low-rank approximation of matrices"; very roughly, this can be used for "dimension reduction": replacing a system of hundreds of thousands of equations with a system of only hundreds.
This is neatly illustrated by applying the technique to compress images, even though the usual image compression formats actually employ very different methods. An interactive demo is here: https://timbaumann.info/svd-image-compression-demo/
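For what it's worth, the rank-k approximation behind that kind of demo is only a few lines of numpy via the truncated SVD (a generic sketch, not the demo's actual code):

    import numpy as np

    rng = np.random.default_rng(2)

    # Stand-in for a large data matrix (e.g. a grayscale image).
    M = rng.normal(size=(500, 400))

    # Truncated SVD: keep only the k largest singular values/vectors.
    k = 20
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    M_k = (U[:, :k] * s[:k]) @ Vt[:k, :]

    # M_k is the best rank-k approximation of M in the Frobenius-norm
    # sense, yet it takes only k*(500 + 400 + 1) numbers to store
    # instead of 500*400.
    print(np.linalg.norm(M - M_k) / np.linalg.norm(M))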