I tried it out on one of the Udacity Deep Learning assignments using the Wasserstein loss functions built into TensorFlow. I was unsuccessful in my limited use: the discriminator always ‘won out’ rather than the pair finding a saddle point. I eventually got my project to work without it, and didn't go back to compare against just swapping the EM loss back in.
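For reference, the plain WGAN objectives the built-in losses implement boil down to something like the sketch below. This is a minimal TensorFlow 2 version under my own assumptions, not necessarily the exact built-in functions referred to above:

```python
import tensorflow as tf

def critic_loss(real_scores, fake_scores):
    # The critic tries to maximise E[f(real)] - E[f(fake)],
    # i.e. minimise the negated difference of mean scores.
    return tf.reduce_mean(fake_scores) - tf.reduce_mean(real_scores)

def generator_loss(fake_scores):
    # The generator tries to push the critic's scores on fakes up.
    return -tf.reduce_mean(fake_scores)

# Toy scores just to show the shapes; in training these come from the critic.
real = tf.constant([1.2, 0.8, 1.5])
fake = tf.constant([-0.3, 0.1, -0.7])
print(critic_loss(real, fake).numpy(), generator_loss(fake).numpy())
```

Training alternates critic and generator steps; if the critic is too strong (e.g. no weight clipping or gradient penalty, or too many critic steps per generator step), the pair can fail to settle near a saddle point, which matches the failure mode described above.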
It needs the Hungarian algorithm to solve it, and that's not the easiest algorithm to implement. In fact, it's by far the hardest algorithm I've ever implemented (I can't remember exactly why). I wrote it in Common Lisp and worked on the performance quite a bit. It's still an O(n^3) algorithm, though.
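For anyone who wants the result without hand-rolling the Hungarian algorithm: when both point sets are the same size and uniformly weighted, the transport problem reduces to an assignment problem, which SciPy solves with a Jonker-Volgenant-style O(n^3) routine. A minimal sketch under those assumptions:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

def emd_equal_weights(x, y):
    """Earth mover's distance between two uniformly weighted point sets
    of equal size, solved as an assignment problem."""
    cost = cdist(x, y)                        # pairwise ground distances
    rows, cols = linear_sum_assignment(cost)  # optimal one-to-one matching
    return cost[rows, cols].mean()            # average matched distance

# Toy usage: two small 2-D point clouds.
a = np.random.rand(5, 2)
b = np.random.rand(5, 2)
print(emd_equal_weights(a, b))
```

For unequal sizes or non-uniform weights you need a full transportation solver (e.g. the POT package) rather than a pure assignment.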
My explorations with WMD were for sentence similarity too. I reached a setup in Keras with a Siamese configuration and a Wasserstein + KL loss (with a known vocabulary, feeding in both the word-vector sequences and their LDA distributions as input). Post-training, the cosine distance between encodings of such sequences looks pretty decent, with one issue I've spotted though: WMD really seems to prefer about the same number of valid tokens in both sentences, which is not how the real world looks. Eager to see results of EM distance between image feature vectors, cheers.
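A quick way to probe that token-count sensitivity is gensim's WMD implementation. A rough sketch, assuming gensim with its WMD dependency (POT in recent versions) and the downloadable "glove-wiki-gigaword-50" vectors as an arbitrary embedding choice:

```python
import gensim.downloader as api

# Pretrained word vectors; any KeyedVectors model with wmdistance() works.
vectors = api.load("glove-wiki-gigaword-50")

short = "the cat sat on the mat".split()
long_ = "the cat sat on the mat in the corner of the old house".split()
other = "a dog slept on the rug".split()

# Compare WMD for a similar-length pair vs. a longer, padded paraphrase
# to see how much the length mismatch alone moves the distance.
print(vectors.wmdistance(short, other))
print(vectors.wmdistance(short, long_))
```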