A good sparse matrix implementation is key to any serious numerics. SciPy's implementation is very nice and quite complete, and a foundation for the entire scientific Python ecosystem. Unfortunately, though, many of the low-level sparse operations are not implemented very efficiently, which has led some projects to roll their own implementations; see e.g. this discussion for the QuTiP project:
https://github.com/qutip/qutip/issues/850#issuecomment-38400...
It would be really nice if all of the sparse linear algebra in SciPy could be heavily optimized (to a similar extent as e.g. Intel optimizes their sparse operations in MKL), so that the entire ecosystem could benefit from that. This is probably something that would require some corporate support, but given how many data science and finance companies use Python for their workflow these days, it might be a wise investment.
This is true across the board. NumPy, Tensorflow, Pytorch...no library has anywhere near 100% coverage of core functionality in sparse tensors. It's just sort of generally assumed that if you're using sparse tensors, you're on your own.
TensorFlow is better than most, in that it at least implements most ops for sparse tensors, even if it doesn't have efficient implementations for all of them. I was recently frustrated to find that PyTorch doesn't implement a lot of basic functionality for sparse tensors at all -- it just throws an exception.
TensorFlow is indeed the one-eyed king here. We found that with large SparseTensors, CPU training jobs are actually faster than jobs on GPU nodes. An individual job is faster on GPU, but SparseMatMul blows up GPU memory use to the point that only one training job fits in memory. SparseMatMul is inefficient on CPU as well, but you can keep getting a decent speedup by cranking up the number of parallel training jobs per node.
I did have great success using Netflix's Vectorflow. I gained roughly 2x performance for my use case. It does leave some performance on the table (no vectorized minibatches, etc.). If you are training on huge sparse datasets with a shallow feed-forward network, Vectorflow is a good alternative.
I wholeheartedly agree. Partly because of that, it is too easy in SciPy to accidentally trip a conversion to a dense representation. As users typically only reach for sparse matrices because their data otherwise wouldn't fit in memory, this makes for a minefield of operations that can blow up in their face.
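As a sketch of the kind of trap meant here (a hypothetical example; exactly which operations densify varies by SciPy version):

```python
import numpy as np
import scipy.sparse as sp

# A modest sparse matrix: 10^8 entries if dense, only ~10^4 stored.
A = sp.random(10_000, 10_000, density=1e-4, format="csr", random_state=0)

# Stays sparse: the matvec only touches stored entries.
x = np.ones(A.shape[1])
y = A @ x

# Returns a dense result -- harmless for a 1 x 10_000 reduction...
col_sums = A.sum(axis=0)

# ...but an explicit (or accidental) conversion allocates the full
# ~800 MB dense array:
# B = A.toarray()
```

The point is that the dense fallback is often silent, so the memory blow-up only shows up at runtime on full-size data.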
I do want to say a word about how hard it is to offer a set of optimized sparse representations. It is by many measures many times harder than offering an optimized dense linear algebra library. The crux is that the library or routine cannot know what the actual data movement and dependencies are until the data arrives. Dense routines do not care about what the data actually is beyond its dimensions: the access pattern, compute and control flow stay exactly the same. The result is that a set of optimized sparse linear algebra routines has to cover, on top of the usual optimization difficulties, an entire extra Cartesian product of representations and low-level implementation strategies. Intel MKL has a good SparseBLAS, but compare the performance of feeding it a clustered or block-sparse dataset (e.g. something out of a solver) versus a large graph500 RMAT matrix.
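To make the representation zoo concrete, here is the same tiny matrix in three of SciPy's formats (a minimal sketch; the format names are SciPy's, the numbers are made up):

```python
import numpy as np
import scipy.sparse as sp

# One matrix, three layouts; the best choice depends on the
# sparsity pattern and on which operation dominates.
rows = np.array([0, 1, 1, 3])
cols = np.array([0, 1, 2, 3])
vals = np.array([4.0, 5.0, 6.0, 7.0])

coo = sp.coo_matrix((vals, (rows, cols)), shape=(4, 4))  # easy to construct
csr = coo.tocsr()  # contiguous rows: fast matvec and row slicing
csc = coo.tocsc()  # contiguous columns: fast column slicing
```

Every optimized kernel has to be written (and tuned) against each of these layouts, which is where the Cartesian product of representations times operations comes from.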
In short, do not underestimate the great effort to offer a generally-applicable yet optimized set of sparse linear algebra routines. It makes dense algebra look really easy.
While there is a "suitesparse-5.2.0" package in conda, I haven't found any documentation. Do you have a link? There doesn't seem to be anything on the project website. I wasn't even quite sure this contains a Python package.
I suspect it's not. The code is largely written in C and C++ and what I can see of the package on Github makes it look like it's the version of SuiteSparse with CMake as the build system.
I realise my comment may have come off a bit flippant - it wouldn't be a trivial piece of work to write Python bindings for SuiteSparse, but would conceivably be simpler than trying to write a better sparse matrix algebra implementation.
> A good sparse matrix implementation is key to any serious numerics.
I agree with much of your comment, but there is a great deal of serious numerics which doesn't use sparse linear algebra at all. In fact there are entire subfields of numerical analysis which involve fast algorithms for dense linear algebra.
There seems to be no end of sparse formats, potential sparse formats and algorithms for dealing with them.
But it also seems like for a given sparse matrix in a given situation, there's no guarantee that there's an algorithm for handling it. The whole thing requires deep experience, or black magic.
To get the best performance you typically want to know something about the sparsity structure of your problem. Do most nonzeros fall near the diagonal? Or do they clump into dense submatrices?
If you don't have that information, you can look at how your matrix will be accessed. Do you need fast access to random rows, or columns? Do you need to write new values? Are those values already in the sparsity structure?
All of these questions can lead you to choosing the right sparse format, but it does take some experience to know where to look.
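A rough mapping from those questions to SciPy's formats might look like this (a sketch only; the right choice is always workload-dependent):

```python
import numpy as np
import scipy.sparse as sp

n = 1000

# Nonzeros near the diagonal -> DIA stores whole diagonals.
tridiag = sp.diags(
    [np.ones(n - 1), 2.0 * np.ones(n), np.ones(n - 1)],
    offsets=[-1, 0, 1], format="dia")

# Dense sub-blocks -> BSR stores them as small dense tiles.
blocked = sp.random(n, n, density=1e-3, format="csr",
                    random_state=0).tobsr(blocksize=(10, 10))

# Incremental writes -> build in LIL (or DOK), convert afterwards.
builder = sp.lil_matrix((n, n))
builder[0, 0] = 1.0
A = builder.tocsr()  # fast row access and matvec from here on
```

Writing new values into a CSR matrix outside its existing sparsity structure is expensive, which is why the build-then-convert pattern is the usual advice.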
That's true, but mostly you just want to be able to store massive sparse matrices without filling your RAM with zeros (or spending tons of time reading/writing zeros from disk). Any implementation will do, as long as it works.
Storage is a minor worry. The main thing is whether the storage makes the frequent operations efficient. The tuple (index_position, value) is reasonably storage efficient but abysmal for linear algebraic operations. So one cannot decouple storage efficiency concerns from compute concerns too much.
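To illustrate the coupling: a pure-Python CSR matvec (a sketch for exposition, not how SciPy actually implements it) reads each row's values and column indices as contiguous slices, an access pattern a flat list of (index_position, value) tuples cannot offer:

```python
import numpy as np

def csr_matvec(indptr, indices, data, x):
    """y = A @ x with A in CSR form.

    Each row's column indices and values are contiguous slices of
    `indices` and `data`, so the inner product streams through
    memory in order.
    """
    y = np.zeros(len(indptr) - 1)
    for i in range(len(y)):
        lo, hi = indptr[i], indptr[i + 1]
        y[i] = np.dot(data[lo:hi], x[indices[lo:hi]])
    return y

# A = [[2, 0],
#      [1, 3]] in CSR form:
indptr = np.array([0, 1, 3])
indices = np.array([0, 0, 1])
data = np.array([2.0, 1.0, 3.0])
```

The same storage that makes this loop cache-friendly makes random insertion expensive, which is exactly the storage/compute coupling described above.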
I think this is one of those shibboleths that separate folks who are experts in ML/data-science at scale from those who are coming more from an enterprise/DBA background. I often have to convince folks why it's not a great idea to write sparse matrix multiplication in SQL.
For many domains, it's a big worry. I routinely encounter large, sparse datasets where I/O and storage dominate other concerns. Big sparse tensors are worse than the "slowness" associated with a suboptimal sparse vector math implementation.
"I think this is one of those shibboleths that separate folks who are experts in ML/data-science at scale vs those who are coming more from a enterprise/DBA background."
I've been doing ML work for a long time. This is actually a shibboleth that separates folks who mostly do image classification from other kinds of ML people.
"Minor" in the sense relatively easy to solve if not constrained by the compute needs.
Interesting comment regarding image classification. I would have assumed they are the ones who don't have to worry about sparsity much. Physics, image, and video are some of the places where you encounter large, dense matrices. LAPACK's wet dream :)