Hey Dave - Your visualization is cool, no doubt, but I'm not seeing as much structure in the 3D cloud as I see in 2D visualizations. In 2D, the words form clusters of similarity that are pretty obvious, but I don't really see that here. What training data did you use for word2vec, and for how many iterations did you run t-SNE?
I trained word2vec on Wikipedia and trained a bunch of different models with t-SNE for between 800 and 1,500 iterations. In 2D, the results actually look fine (similar to your graphics and others that have been demonstrated). Unfortunately, at least for me, that doesn't seem to be translating to 3D.
It's probably some mix of the quality of the vectors (maybe I need to try some other word vectors) and t-SNE needing to be tuned a bit better (and/or there's a bug). It's not quite where I want it to be yet, so consider this a "cool tech demo we're still working on" kind of thing.
Did you roll your own version of t-SNE to use in Ersatz? Now that I think about it, I've never seen a 3D visualization with t-SNE before (and searching just now didn't find any); it could be that t-SNE doesn't work as well in 3D (or has bugs that only appear in >2 dimensions).
We're rolling our own for release, but right now we're borrowing liberally from the Barnes-Hut-SNE implementation here: http://homepage.tudelft.nl/19j49/t-SNE.html Barnes-Hut is a modification that makes t-SNE a lot more efficient, so I can train on hundreds of thousands of vectors instead of <10,000.
However, I'm wondering if there's something I'm missing that makes it unsuitable for 3D--i.e., maybe the assumptions being made to speed things up break down after the second dimension. Also, there's interesting discussion in the literature about whether t-SNE is a good dimensionality reduction technique in general (as opposed to only a very powerful visualization technique), so my next step will probably be running the vectors through an autoencoder to generate 3D coordinates and then plotting those and comparing the visualizations.
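One cheap way to test the "approximation breaks down in 3D" hypothesis is to run both the exact and Barnes-Hut solvers at 3 output dims on the same vectors and compare. This is a minimal sketch using scikit-learn as a stand-in implementation (not the Ersatz code or the Delft C++ code), with random clustered vectors standing in for word vectors; note scikit-learn's `barnes_hut` method only supports `n_components` up to 3:

```python
# Sanity check: exact vs Barnes-Hut t-SNE at 3 output dims on stand-in
# "word vectors". If clusters survive with 'exact' but not 'barnes_hut',
# the approximation (theta / tree construction) is the likelier suspect.
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.RandomState(0)
# Three well-separated Gaussian clusters, 100-dim, as fake word vectors.
vecs = np.vstack([rng.randn(50, 100) + 10 * i for i in range(3)])

for method in ("exact", "barnes_hut"):
    emb = TSNE(n_components=3, method=method, perplexity=30,
               random_state=0).fit_transform(vecs)
    print(method, emb.shape)  # (150, 3) either way
```

If the exact solver shows clean clusters in 3D and Barnes-Hut doesn't, that points at the speedup rather than the vectors.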
Re: another example of TSNE w/ text--yeah, I've only seen this http://homepage.tudelft.nl/19j49/t-SNE_files/semantic_tsne.j... which seems to work but isn't interactive. Frankly, I'm surprised we got it to work with three.js--we're able to render as many as 250,000 unique words and it runs smooth (it just takes longer to download--this demo has 25,000).
Barnes-Hut uses a quadtree, doesn't it? I don't know whether the code was adapted to use an octree in 3D; maybe it was? FWIW, there's a really interesting Google techtalk on how it works: http://www.youtube.com/watch?v=RJVL80Gg3lA
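For what it's worth, the quadtree-to-octree generalization is mechanical: the same recursive subdivision works in any dimension by splitting each node into 2^d children. A toy illustration (not the actual Barnes-Hut-SNE code, which is C++):

```python
# Toy d-dimensional Barnes-Hut-style tree: 4 children in 2D (quadtree),
# 8 in 3D (octree). Illustrative only -- no center-of-mass bookkeeping.
import numpy as np

class NDTree:
    def __init__(self, center, half_size, capacity=1):
        self.center = np.asarray(center, dtype=float)
        self.half_size = float(half_size)
        self.capacity = capacity
        self.points = []
        self.children = None  # created lazily when the node splits

    def _child_index(self, p):
        # One bit per dimension: which side of the center the point is on.
        return sum(1 << i for i, (x, c) in enumerate(zip(p, self.center)) if x >= c)

    def insert(self, p):
        p = np.asarray(p, dtype=float)
        if self.children is None:
            if len(self.points) < self.capacity:
                self.points.append(p)
                return
            # Split into 2**d children and push existing points down.
            d = len(self.center)
            self.children = []
            for idx in range(2 ** d):
                offset = np.array([1 if (idx >> i) & 1 else -1 for i in range(d)])
                self.children.append(NDTree(self.center + offset * self.half_size / 2,
                                            self.half_size / 2, self.capacity))
            for q in self.points:
                self.children[self._child_index(q)].insert(q)
            self.points = []
        self.children[self._child_index(p)].insert(p)

tree2d = NDTree(center=[0, 0], half_size=1)
tree3d = NDTree(center=[0, 0, 0], half_size=1)
for pt2, pt3 in [(( .5,  .5), ( .5,  .5,  .5)),
                 ((-.5, -.5), (-.5, -.5, -.5))]:
    tree2d.insert(pt2)
    tree3d.insert(pt3)
print(len(tree2d.children), len(tree3d.children))  # 4 8
```

So if the 3D path reuses a hard-coded 4-way split, that would be exactly the kind of bug that only shows up past 2D.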
Thanks, but I've tried that and it gives really messy results. For example, every accented character turns into a space, dividing the word into two meaningless pieces.
Edit: it also seems to put the entire article on a single line with no punctuation. But Google's word2vec examples run with each sentence on one line. Wouldn't this make a difference in training?
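A minimal preprocessing sketch for both issues above, assuming plain Python (this is not the script being discussed): keep accented characters as word characters instead of mapping them to spaces, and emit one sentence per line, which is the shape Google's word2vec demo scripts feed in:

```python
# Lowercase, keep accented letters intact, one sentence per output line.
import re

def sentences_for_word2vec(text):
    # Naive sentence split on ., !, ? -- good enough for a sanity check.
    out = []
    for s in re.split(r"[.!?]+", text):
        # In Python 3, \w matches Unicode word chars, so "café" stays whole.
        tokens = re.findall(r"\w+", s.lower())
        if tokens:
            out.append(" ".join(tokens))
    return out  # join with "\n" to get one sentence per line

lines = sentences_for_word2vec("Café culture thrives. Naïve splitting breaks words!")
print(lines)  # ['café culture thrives', 'naïve splitting breaks words']
```

Whether one-sentence-per-line matters for training depends on the window: word2vec won't let context windows cross line boundaries, so one giant line lets windows span sentence (and even article) breaks.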
Yeah, I've tried from 0.5 to 50--3 seems to be the best value currently. I may try it with the non-Barnes-Hut implementation too, maybe on a smaller number of words. If that works, then I'll bet it's something related to how the quadtree is constructed...
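That kind of sweep is easy to script against the exact (non-Barnes-Hut) solver on a small word set. A sketch assuming scikit-learn as the stand-in implementation and random vectors in place of real word vectors:

```python
# Sweep a few parameter values with the exact solver on a small "vocabulary"
# to separate tuning problems from quadtree/octree problems.
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.RandomState(1)
vecs = np.vstack([rng.randn(30, 50) + 8 * i for i in range(2)])  # 60 "words"

embeddings = {}
for perplexity in (0.5, 3, 10, 30):
    embeddings[perplexity] = TSNE(n_components=3, method="exact",
                                  perplexity=perplexity,
                                  random_state=0).fit_transform(vecs)
    print(perplexity, embeddings[perplexity].shape)
```

If some setting clusters well in 3D here but not under Barnes-Hut, the tree construction is the prime suspect.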
Not at all... "documents" were the user's play history and "words" were artists. So if you played one artist twice and another artist once, it'd be like a document that says "artist1 artist1 artist2". The assumption is that document topics are analogous to music genres: each artist creates music within a small set of genres, and each user prefers music within a small set of genres.
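The document construction described above is just a bag-of-artists per user. A tiny sketch (names and data made up):

```python
# Each user's play history becomes one "document": an artist appears once
# per play, so frequency carries listening intensity, and genres play the
# role that topics do for text.
play_history = {
    "user1": ["artist1", "artist1", "artist2"],
    "user2": ["artist2", "artist3"],
}

documents = {user: " ".join(plays) for user, plays in play_history.items()}
print(documents["user1"])  # "artist1 artist1 artist2"
```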
These are a bunch of "word vectors" (generated with word2vec https://code.google.com/p/word2vec/) put through a dimensionality reduction/visualization technique called t-SNE and then plotted in 3D. As far as what the clustering represents, check out t-SNE (http://homepage.tudelft.nl/19j49/t-SNE.html), but the short answer is--it's hard to say...
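The pipeline is short end to end. A minimal sketch with scikit-learn standing in for the t-SNE step and random vectors standing in for real word2vec output (the demo itself renders the coordinates with three.js rather than matplotlib):

```python
# word vectors -> t-SNE -> one (x, y, z) per word, ready to plot.
import numpy as np
from sklearn.manifold import TSNE

words = ["word%d" % i for i in range(30)]      # stand-in vocabulary
rng = np.random.RandomState(2)
vectors = rng.randn(len(words), 64)            # stand-in word2vec vectors

coords = TSNE(n_components=3, method="exact", perplexity=5,
              random_state=0).fit_transform(vectors)
points = {w: tuple(xyz) for w, xyz in zip(words, coords)}
print(coords.shape)  # (30, 3)
```

Each word ends up as a point in 3D space; nearby points had similar high-dimensional vectors, which is all the clustering formally means.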
Here's a longer explanation from a webinar I did yesterday where I demoed this: http://www.youtube.com/watch?v=wmlj5uTUTFY (skip to 12:11 to get to the applicable section)
I ran some visualizations of my own a while back, and they're a lot less aesthetically pleasing, but you can see more structure and clustering.
https://raw.github.com/dhammack/Word2VecExample/master/visua... https://raw.github.com/dhammack/Word2VecExample/master/visua...
(It could also just be a WebGL thing, because the third dimension seems to have a lot less variance than the other two)