
Huh, I had wondered why I saw so many Python packages blacklist MKL; now I know why.


The philosophy behind MKL is that each CPU vendor provides an MKL for their CPU. If you expect to mix and match MKLs and CPUs, you don’t understand the goals of MKL.


That would be 'each CPU vendor provides an optimized BLAS library for their CPU'. The problem is that Intel's MKL is more than just BLAS.

But AMD does have its own optimized libraries:

https://developer.amd.com/amd-aocl/


Each CPU vendor or each CPU architecture? (genuinely asking, I don't know how it's intended)


The expectation in the HPC community is that an interested vendor will provide their own BLAS/LAPACK implementation (MKL is a BLAS/LAPACK implementation, along with a bunch of other stuff), which is well-tuned for their hardware. These sorts of libraries aren't just tuned for an architecture; they might be tuned for a given generation or even particular SKUs.


I learned about this recently when trying to optimize an ML test architecture running on Azure. It turns out that having access to Ice Lake chips would allow optimizations that should decrease compute time, and therefore cost, by 20-30%.


Some AVX-512 stuff I guess?

AVX-512 had a rough rollout, but it seems like it is finally turning into something nice.


Each vendor. Intel BLAS (MKL) has Intel-specific optimizations and AMD BLAS has AMD-specific optimizations.

Intel is still acting in bad faith by allowing MKL to run in crippled mode on AMD. They should either let it use all available instructions or make it refuse to run.
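For what it's worth, the widely circulated workaround is to force MKL onto its AVX2 code path before the library loads. A rough Python sketch (MKL_DEBUG_CPU_TYPE is undocumented, and reportedly removed in MKL 2020 Update 1 and later, so this only helps on older versions):

    import os

    # Must be set before MKL initializes, i.e. before importing NumPy
    # (assuming a NumPy build that links against MKL).
    os.environ.setdefault("MKL_DEBUG_CPU_TYPE", "5")  # 5 = AVX2 code path

    import numpy as np

    a = np.random.rand(2000, 2000)
    b = np.random.rand(2000, 2000)
    c = a @ b  # dgemm; on Zen this now takes the AVX2 kernels, not the SSE fallback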


The latest oneMKL versions have sgemm/dgemm kernels for Zen CPUs that are almost as fast as the AVX2 kernels (which require disabling Intel's CPU detection on Zen).


Are there any implementations of MKL other than Intel's?


No. There are AMD's AOCL and Apple's 'Accelerate', but they cover only subsets of MKL AFAIK.

https://developer.amd.com/amd-aocl/

https://developer.apple.com/documentation/accelerate


Accelerate and MKL have some overlap (notably BLAS, LAPACK, signal processing libraries and basic vectorized math operations), but each also contains a whole bunch of API that the other lacks. Neither is a subset of the other.

They both contain a sparse matrix library, but exactly what operations are offered is somewhat different between the two. They both have image processing operations, but fairly different ones. Accelerate has BNNS, MKL has its own set of deep learning interfaces...
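That BLAS/LAPACK overlap is why most NumPy/SciPy code doesn't care which one it gets: the call sites are identical whether SciPy was linked against MKL, Accelerate, or OpenBLAS. A quick sketch:

    import numpy as np
    from scipy.linalg.blas import dgemm  # dispatches to whichever BLAS SciPy was built against

    a = np.random.rand(4, 3)
    b = np.random.rand(3, 5)
    c = dgemm(alpha=1.0, a=a, b=b)  # standard BLAS dgemm, same call on every backend
    assert np.allclose(c, a @ b)

    np.show_config()  # shows which BLAS/LAPACK implementation is actually linked in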


In case you or anyone else knows: are there other libraries that implement a high-performance sparse QR? Really, I need a Q-less QR factorization for sparse matrices. As far as I know, there are only two. One comes from MKL:

https://www.intel.com/content/www/us/en/developer/articles/t...

The other comes from SPQR, which is part of SuiteSparse:

https://people.engr.tamu.edu/davis/suitesparse.html

Part of the issue is that SPQR is dual-licensed GPL/commercial, and the last time I checked, a license was not cheap. Conversely, MKL has no redistribution fee, so it's been essentially the only option for this factorization when the code can't be bundled in a way compatible with the GPL.
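For anyone wondering why Q-less is enough: since A^T A = R^T R, a least-squares solve only needs R and two triangular solves (the seminormal equations). A dense NumPy sketch of the idea (a real sparse solver like SPQR or MKL's sparse QR keeps R sparse, but the algebra is the same):

    import numpy as np
    from scipy.linalg import solve_triangular

    rng = np.random.default_rng(0)
    A = rng.standard_normal((8, 3))  # stand-in for a large sparse matrix
    b = rng.standard_normal(8)

    R = np.linalg.qr(A, mode='r')  # Q-less QR: only the triangular factor

    # Seminormal equations: R^T R x = A^T b, since A^T A = R^T R.
    y = solve_triangular(R, A.T @ b, trans='T')  # solve R^T y = A^T b
    x = solve_triangular(R, y)                   # solve R x = y

    assert np.allclose(x, np.linalg.lstsq(A, b, rcond=None)[0])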


Replying to [dead] sibling post from kxyvr: yes, Accelerate provides a Q-less sparse QR on Apple platforms (https://developer.apple.com/documentation/accelerate/sparse_..., in particular SparseFactorizationCholeskyAtA). I believe that MA49 from HSL does it as well, and may have more acceptable licensing than SuiteSparse depending on your situation.
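On other platforms the same CholeskyAtA trick can be had from CHOLMOD via scikit-sparse, assuming that dependency is acceptable. A sketch (cholesky_AAt(M) factors M*M^T, so pass A^T to factor A^T A):

    import numpy as np
    import scipy.sparse as sp
    from sksparse.cholmod import cholesky_AAt  # CHOLMOD bindings

    # Tall sparse matrix; identity block on top guarantees full column rank.
    A = sp.vstack([sp.identity(200), sp.random(800, 200, density=0.01)]).tocsc()
    b = np.random.default_rng(0).standard_normal(1000)

    # Factor A^T A without ever forming Q; this mirrors Accelerate's
    # SparseFactorizationCholeskyAtA.
    factor = cholesky_AAt(A.T.tocsc())
    x = factor(A.T @ b)  # least-squares solution via the normal equations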



