It's learning the meaning of words, and the relationships between them.
Word2vec is definitely an impressive algorithm. But at the end of the day, it's just a tool that cranks out a fine-grained clustering of words based on (a proxy measure for) contextual similarity (or rather: an embedding in a high-dimensional space, which implicitly allows the words to be more easily clustered). And yes, some additive relations between words, when the signal is strong enough.
But to say that it's "learning" the "meaning" of these words is really quite a stretch.
I wonder if anyone has ever run this system on Lewis Carroll's Jabberwocky[1] or even something like Anthony Burgess's A Clockwork Orange, both of which contain a large number of made-up words/slang/re-use.
I remember that when I first read A Clockwork Orange, it took me a while, but I finally started to understand the meanings of those words/phrases (though I may not have ever encountered them before). It did feel like my brain was re-wiring itself to a new language. It'd be interesting to see how some type of language AI would treat these works.
Word2vec may be crude, but it demonstrates that you can learn non-trivial relationships between words with even such a simple algorithm. What is the meaning of a word, if not the relationship it has to other words?
Gender was just an example. There is a lot of semantic information learned by word2vec, and the vectors have been shown to be useful in text classification and other tasks. It can learn subtle stuff, like the relationships between countries, celebrities, etc. All that information is contained in a few hundred dimensions, which is tiny compared to the number of neurons in the brain.
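If you want to poke at this yourself, here's roughly what those queries look like with gensim's KeyedVectors. This is just a sketch: the file name below is a placeholder for whatever pretrained embedding you actually have.

```python
# Sketch: querying analogy-style relationships from pretrained word2vec
# vectors with gensim. The file name is a placeholder for whatever
# pretrained embedding is available locally.
from gensim.models import KeyedVectors

vectors = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin", binary=True
)

# Gender relation: king - man + woman -> ?
print(vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=3))

# Country/capital relation: Paris - France + Germany -> ?
print(vectors.most_similar(positive=["Paris", "Germany"], negative=["France"], topn=3))
```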
I use word2vec a lot, and things like it, and I've always found it overstated to say that it "learns the relationships between things".
You say, as many people do, that the operation "king - man + woman = queen" indicates that it understands that the relation between "king" and "queen" is the same as the relation between "man" and "woman". But there is something much simpler going on.
What you're asking it for, in particular, is to find the vector represented by "king - man + woman", and then find the vector Q with the highest dot product with this synthetic vector, out of a restricted vocabulary.
The dot product is distributive, so distribute it: you want to find the maximum value of (king * Q) - (man * Q) + (woman * Q).
So you want to find a vector that is like "king" and "woman", and not like "man", and is part of the extremely limited vocabulary that you use for the traditional word2vec analogy evaluation, but not one of the three words you used in the question. (All of these constraints are relevant.) Big surprise, the word that fits the bill is "queen".
(I am not the first to do this analysis, but I've heard it from enough people that I don't know who to credit.)
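If you want to see it concretely, here's a minimal sketch in plain numpy, with made-up unit-normalized vectors standing in for a real model. It shows that the analogy score is literally a sum and difference of pairwise word similarities, nothing more:

```python
# Sketch: the "analogy" score decomposes into three similarity terms.
# `vec` here is just a dict of made-up unit-normalized numpy vectors; with a
# real model you'd use its word vectors instead.
import numpy as np

def analogy_score(a, b, c, candidate, vec):
    # Score of `candidate` for the query a - b + c (e.g. king - man + woman).
    query = vec[a] - vec[b] + vec[c]
    return np.dot(query, vec[candidate])

def decomposed_score(a, b, c, candidate, vec):
    # The same number, written as plain similarities: like a, like c, unlike b.
    q = vec[candidate]
    return np.dot(vec[a], q) - np.dot(vec[b], q) + np.dot(vec[c], q)

rng = np.random.default_rng(0)
vec = {w: rng.normal(size=50) for w in ["king", "man", "woman", "queen"]}
vec = {w: v / np.linalg.norm(v) for w, v in vec.items()}

print(analogy_score("king", "man", "woman", "queen", vec))
print(decomposed_score("king", "man", "woman", "queen", vec))
# The two numbers are identical (up to floating point): no "relation" is ever
# represented, only sums and differences of pairwise similarities.
```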
It's cool that you can use sums of similarities between words to get the right answer to some selected analogy problems. Really, it is. It's a great thing to show to people who wouldn't otherwise understand why we care so much about similarities between words.
But this is not the same thing as "solving analogies" or "understanding relationships". It's a trick where you make a system so good at recognizing similarities that it doesn't have to solve analogies or understand relationships to solve the very easy analogy questions we give it.
Well, in my example the AI doesn't have to interact with the world at all. Passing the Turing test simply requires imitating a human, predicting what words they would say. You only need to know the relationships between words.
If literally the only thing you know is the relationship between words, but you have a perfect knowledge of the relationship between words, you'll quickly determine that "Day" and "Night" are both acceptable answers, and have no means of determining which is the right one. At the very minimum, you need a clock, and an understanding of the temporal nature of your training set, to get the right one.
A beautiful rainbow glimmering gently in the sky after a summer shower.
What do you see?
What do you smell?
What do you hear?
What does the landscape look like?
What memory does this bring up?
These are messages that the language is communicating. If an AI can't understand at least some of the content of the message then can it compose one effectively? I'm not certain it can understand the meaning from words alone, but we can certainly try.
Knowing only the relationships between words would just be a poor proxy for knowing the meanings of the words, e.g. what real-world concepts the words attempt to represent. You might be able to get pretty far with this technique, but I would bet a lot of money you would not be able to get reliable, in-depth, human-level communication. The system needs to have an understanding of the world.
And then there is the fundamentally dynamic aspect of language, which strengthens the need for a rich understanding of the world that words describe and convey.
There are other tests for AI besides the Turing Test, some of which require more understanding on the part of the program. Check out Winograd Schemas: http://www.cs.nyu.edu/faculty/davise/papers/WinogradSchemas/... which hinge on understanding the subtleties of how words refer to the world.
But it and methods like it are still very limited in what they can learn. For example, they can't learn relations involving antonyms. They can't tell hot from cold or big from small.
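As a quick illustration (assuming a pretrained model loaded as a gensim KeyedVectors object, as in the earlier sketch):

```python
# Sketch: antonyms tend to look *similar* to word2vec, because they appear in
# the same contexts. Assumes `vectors` is a pretrained gensim KeyedVectors
# instance, as in the earlier snippet.
for a, b in [("hot", "cold"), ("big", "small")]:
    print(a, b, vectors.similarity(a, b))
# These cosine similarities typically come out high, which is exactly why a
# purely distributional model struggles to say which member of the pair is which.
```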