Singular Value Decomposition for Programmers (dragan.rocks)
119 points by disaster01 on Oct 10, 2017 | 8 comments


One of my favourite ways to understand the SVD is like this.

Think about the x-y plane, and imagine that we've drawn the unit circle centered at the origin. Now imagine we have some matrix A that we apply to each point in the plane. Points will go here and there, and you could visualize this effect, if you want, as a vector field drawn over the plane: the start of each arrow is a point's original position and the end is where the matrix A moved it. A cute thing about a matrix is that it can only do one of three things to the unit circle. It can collapse the unit circle into a point (which happens only if A is the zero matrix), it can flatten it into a line segment, or it can transform it into an ellipse. Other functions could transform the circle into some other shape, but matrices are constrained enough that these three outcomes are the only possibilities.
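
To make the three cases concrete, here is a minimal numpy sketch (the matrices are mine, picked just for illustration):

    import numpy as np

    theta = np.linspace(0, 2 * np.pi, 8, endpoint=False)
    circle = np.vstack([np.cos(theta), np.sin(theta)])  # 2 x 8: points on the unit circle

    A_full = np.array([[3.0, 1.0], [1.0, 2.0]])    # rank 2: circle -> ellipse
    A_rank1 = np.array([[1.0, 2.0], [2.0, 4.0]])   # rank 1: circle -> line segment
    A_zero = np.zeros((2, 2))                      # rank 0: circle -> single point

    for A in (A_full, A_rank1, A_zero):
        print(np.round(A @ circle, 2))             # images of the circle points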

Now for the SVD. You apply numpy.linalg.svd to your matrix, and how can you understand what it gives you? Essentially, the SVD describes the ellipse at the output of the matrix: it gives you the directions and lengths of the ellipse's semi-axes.

The columns of the "U" matrix are vectors that start at the origin: the first column points along the first (largest) semi-axis, the second column points along the second-largest semi-axis, and so on. They are all normalized, so they don't actually reach the tips of the semi-axes, they only point towards them. The diagonal of the middle matrix (often denoted Σ) contains the "singular values", which are the lengths of the semi-axes. If all of them are 1, then the matrix did not change the size of the circle one bit.
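
In numpy terms, the pieces line up like this (a small sketch; since A = U Σ V^T, applying A to the first right singular vector Vt[0] lands exactly on the tip of the longest semi-axis):

    import numpy as np

    A = np.array([[3.0, 1.0],
                  [1.0, 2.0]])
    U, s, Vt = np.linalg.svd(A)

    # Columns of U point along the semi-axes; s holds the semi-axis lengths.
    # A maps the first right singular vector onto the tip of the longest
    # semi-axis, which is s[0] * U[:, 0].
    print(np.allclose(A @ Vt[0], s[0] * U[:, 0]))  # True
    print(s)                                       # lengths, largest first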

From this intuitive description follow a lot of the properties mentioned in the article. For instance, want to know if the matrix is invertible? Just check the singular values. If one of them is zero (or near zero, since the algorithm for computing the SVD only gives you an approximate result), the ellipse at the output has collapsed down a dimension. That means the matrix is definitely not invertible: a whole set of points gets mapped to the zero vector, and there is no way to decide where those points should map back to if you wanted to construct an inverse function.
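
As a sketch, that check could look like the following (looks_invertible is just an illustrative name, and the relative tolerance is a judgment call):

    import numpy as np

    def looks_invertible(A, tol=1e-12):
        # No singular value numerically zero (relative to the largest one).
        s = np.linalg.svd(A, compute_uv=False)
        return s.min() > tol * s.max()

    print(looks_invertible(np.array([[3.0, 1.0], [1.0, 2.0]])))  # True
    print(looks_invertible(np.array([[1.0, 2.0], [2.0, 4.0]])))  # False: rank 1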

By the way, if you want to learn more, go check out Boyd's course on linear dynamical systems[1]. That's essentially where I stole this explanation from.

[1] https://www.youtube.com/watch?v=ZNQ_pBCtYDg


I really liked the animated diagrams of principal component analysis [1] in this Stack Exchange answer:

https://stats.stackexchange.com/questions/2691/making-sense-...

There, it's not about ellipses, it's about springs.

[1] PCA, not SVD, but PCA and SVD are, in a useful though not exact sense, the same thing [2]

[2] https://math.stackexchange.com/questions/3869/what-is-the-in...
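
A small numpy sketch of that correspondence: on centered data, the eigenvalues of the covariance matrix equal the squared singular values of the data matrix divided by n - 1, so both routes find the same principal directions:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 3))
    Xc = X - X.mean(axis=0)                    # PCA centers the data first

    # Route 1: eigendecomposition of the covariance matrix.
    evals, evecs = np.linalg.eigh(Xc.T @ Xc / (len(Xc) - 1))

    # Route 2: SVD of the centered data matrix.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)

    # Same variances along the principal directions (up to ordering and sign):
    print(np.allclose(np.sort(evals), np.sort(s**2 / (len(Xc) - 1))))  # True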


https://m.youtube.com/playlist?list=PLZHQObOWTQDPD3MizzM2xVF...

The linear algebra series from 3blue1brown has some great explanations and visualizations for thinking about linear transforms.


Jeremy Kun also has a couple good articles about SVD:

https://jeremykun.com/2016/04/18/singular-value-decompositio...

https://jeremykun.com/2016/05/16/singular-value-decompositio...

The first one is particularly helpful for understanding SVD conceptually.


This finally nudged me into uploading a note on the relationship between PCA and SVD, which is not covered in Jeremy's otherwise excellent article. If anyone is interested, you can find it at http://bastian.rieck.ru/research/Note_PCA_SVD.pdf.


Useful links to the library used in the article:

source code: http://github.com/uncomplicate/neanderthal

and the documentation: http://neanderthal.uncomplicate.org


Note that computing the SVD of an m*n matrix with n <= m is O(mn^2).

So never try to do this for large square matrices: at m = n = 100,000, mn^2 is on the order of 10^15 floating-point operations. (Just like you should never compute the inverse of a large matrix.)

Even for large non-square matrices, computing the SVD can be prohibitively expensive.


Not with randomized methods (which you should almost always use IMO).
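
For the curious, here is a bare-bones sketch of the randomized SVD idea (after Halko, Martinsson and Tropp; rand_svd is just an illustrative name, and production implementations such as sklearn.utils.extmath.randomized_svd add refinements like power iterations):

    import numpy as np

    def rand_svd(A, k, oversample=10, seed=None):
        # Randomized range finder: Y's columns (approximately) span the
        # top-k column space of A; then do a small, cheap SVD of Q^T A.
        rng = np.random.default_rng(seed)
        Y = A @ rng.normal(size=(A.shape[1], k + oversample))
        Q, _ = np.linalg.qr(Y)              # orthonormal basis for range(Y)
        Ub, s, Vt = np.linalg.svd(Q.T @ A, full_matrices=False)
        return (Q @ Ub)[:, :k], s[:k], Vt[:k]

    A = np.random.default_rng(1).normal(size=(2000, 500))
    U, s, Vt = rand_svd(A, k=10, seed=2)
    print(s[:3])   # approximates the top singular values of A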



