This post is a great word2vec/keras intro, but it doesn't do one thing you should _always_ do before you break out the neural networks: try to solve the problem with traditional machine learning to establish a performance baseline and confirm the data is predictive.
78.9% accuracy on a sentiment classification of tweets with no neutral class is actually slightly _worse_ than you get if you do this in scikit-learn with plain old bag of words and Logistic Regression: https://github.com/williamsmj/sentiment/blob/master/sentimen....
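To make the comparison concrete, here's a minimal sketch of that kind of baseline: bag-of-words features into Logistic Regression with scikit-learn. The toy `texts`/`labels` data is a stand-in for illustration, not the linked notebook's actual dataset or code.

```python
# Minimal bag-of-words + Logistic Regression baseline (sketch).
# The tiny texts/labels lists below are placeholder data.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

texts = ["great movie", "terrible film", "loved it", "awful plot"]
labels = [1, 0, 1, 0]

X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.5, stratify=labels, random_state=0)

baseline = make_pipeline(CountVectorizer(ngram_range=(1, 2)),
                         LogisticRegression())
baseline.fit(X_train, y_train)
acc = baseline.score(X_test, y_test)  # accuracy to compare the CNN against
print(acc)
```

A pipeline like this takes a few minutes to set up and gives you the "better than what?" number before any deep learning enters the picture.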
I think you're missing the parent's point. A CNN is a more complicated model, not a simpler one: it's better to try simple linear classifiers on bags of words, bigrams, or trigrams before breaking out the more complicated neural models. Note that you can do this easily with VW or FastText.
The aim of my post is not to make a prediction based on a standard algorithm. My goal is to show that DL techniques make better predictions. What I wrote was a first draft of the algorithm, and I haven't made my point yet, unfortunately. I'm working on a better solution now.
Again, you're missing the point. If your goal is to show that DL techniques make better predictions, you need to begin by answering the question "better than what?"
I have to point out, though, that it's dangerous to measure classifier accuracy as the percentage of correctly classified samples when you have no idea how the test data is skewed toward one class or the other (this applies to binary classification and generalizes to multi-class problems too).
It's always much better to report the F1 score[1] or to examine a confusion matrix of the predictions[2].
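Here's a quick illustration of why, with made-up predictions on a skewed test set: a degenerate classifier that always says "positive" gets 80% accuracy, but the macro-averaged F1 and the confusion matrix expose it immediately.

```python
# Skewed test set: 8 positives, 2 negatives. A classifier that
# always predicts "positive" scores 80% accuracy anyway.
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score

y_true = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
y_pred = [1] * 10  # always predict the majority class

acc = accuracy_score(y_true, y_pred)               # 0.8, looks decent
macro_f1 = f1_score(y_true, y_pred, average="macro")  # ~0.44, much less flattering
cm = confusion_matrix(y_true, y_pred)              # negatives never predicted

print(acc, macro_f1)
print(cm)
```

The confusion matrix shows the entire negative class being misclassified, which a single accuracy number hides completely.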
I haven't had the opportunity to measure the difference in quality, and I've mostly used word2vec until now (with vectors I trained myself after lemmatization and PoS-tagging of a corpus), but the fact that GloVe provides pretrained models from Twitter, Wikipedia, and so on is pretty nice.
GloVe is great. Simpler and faster, with a very small trade-off in quality. Word2vec has an advantage in that you can produce document vectors with only a small change in the network architecture. Tf-idf weighted word vector averages will probably be the best you can do using GloVe.
Word2Vec is not deep learning. I think the neural network it uses has a single hidden layer. It's much simpler than deep-learning approaches and all the neural network is doing is dimensionality reduction in a way that can be done with SVD for similar results.
You're right. It's a shallow network with a single hidden layer, and the weights at that intermediate layer are extracted as the vector-space representation of a word, based on its context.
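To illustrate the SVD point from the comment above, here's a sketch of the classical route to dense word vectors: build a word co-occurrence matrix and factor it with truncated SVD. The toy corpus and window size are made up for the example.

```python
# Classical alternative to word2vec: SVD over a co-occurrence matrix
# yields dense word vectors. Toy corpus, context window of 1.
import numpy as np
from sklearn.decomposition import TruncatedSVD

corpus = [["the", "cat", "sat"], ["the", "dog", "sat"], ["a", "cat", "ran"]]
vocab = sorted({w for sent in corpus for w in sent})
index = {w: i for i, w in enumerate(vocab)}

# Symmetric co-occurrence counts within a +/-1 token window.
cooc = np.zeros((len(vocab), len(vocab)))
for sent in corpus:
    for i, w in enumerate(sent):
        for j in range(max(0, i - 1), min(len(sent), i + 2)):
            if i != j:
                cooc[index[w], index[sent[j]]] += 1

svd = TruncatedSVD(n_components=2, random_state=0)
word_vecs = svd.fit_transform(cooc)  # one dense 2-d vector per word
print(dict(zip(vocab, word_vecs.round(2))))
```

Word2vec's skip-gram with negative sampling has been shown to be closely related to factorizing a (shifted PMI) co-occurrence matrix, which is why this comparison comes up.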