Why do you assume this is necessarily true? Consider how word2vec produces analogies among discrete symbols; lots of theorems, proofs, functions, etc. are analogies of previous ones.
Word2vec results in similarity scores you can use to find analogies. Great! Now how do you know which analogies to look for? Which ones to remember? Which ones to pursue? You need some kind of intuition for where to look. So you need (1) a way to move around the search space and (2) a way to know when to backtrack, i.e. to recognize when a line of inquiry is unfruitful, and this second capacity is precisely boredom. If you can't come up with new ideas and you can't get bored, you'll suck at finding proofs just like you'll suck at math or problem solving in general. There's also a danger in getting bored by the wrong things. So how do you develop this boredom intuition?
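To make that concrete, here is a minimal sketch of boredom as a control signal, assuming a toy depth-first search; the `score` and `expand` functions are hypothetical stand-ins, not any real system:

```python
def explore(state, score, expand, patience=3, max_depth=6, depth=0):
    """Depth-first search that abandons ("gets bored with") a branch after
    `patience` consecutive children fail to improve on the best score.

    score:  state -> float, higher is better (hypothetical)
    expand: state -> iterable of successor states (hypothetical)
    """
    best, best_score = state, score(state)
    if depth >= max_depth:              # hard cap so the toy terminates
        return best, best_score
    stalls = 0
    for child in expand(state):
        cand, cand_score = explore(child, score, expand,
                                   patience, max_depth, depth + 1)
        if cand_score > best_score:
            best, best_score = cand, cand_score
            stalls = 0                  # progress resets the boredom clock
        else:
            stalls += 1
            if stalls >= patience:      # bored: unfruitful line of inquiry
                break                   # backtrack to the parent
    return best, best_score
```

The interesting design question lives entirely in `patience` and `score`, which is exactly the "where does the boredom intuition come from" problem above.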
We choose what to think about, but until we make conscious machines, we are probably the only creatures that have to make this choice.
You don't use the similarity score to extract analogies from word2vec; it's the offset vectors, linear relations like the infamous "king" - "man" + "woman" ≈ "queen" (the result lands closest, by cosine similarity, to the embedding of "queen").
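A minimal numpy sketch of that offset-vector arithmetic; the three-dimensional embeddings here are hand-made toys for illustration, not trained vectors:

```python
import numpy as np

# Toy embeddings purely for illustration; real word2vec vectors are
# hundreds of dimensions and learned from a corpus, not hand-written.
emb = {
    "king":  np.array([0.8, 0.9, 0.1]),
    "queen": np.array([0.8, 0.1, 0.9]),
    "man":   np.array([0.2, 0.9, 0.1]),
    "woman": np.array([0.2, 0.1, 0.9]),
}

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

def analogy(a, b, c):
    """Return the word d maximizing cos(emb[d], emb[a] - emb[b] + emb[c]),
    excluding the query words themselves (standard practice)."""
    target = emb[a] - emb[b] + emb[c]
    candidates = (w for w in emb if w not in {a, b, c})
    return max(candidates, key=lambda w: cosine(emb[w], target))

print(analogy("king", "man", "woman"))  # -> "queen" with these toy vectors
```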
Extracting the analogies is not that hard (a naive brute force is to loop over combinations of four words and test how closely the relationship holds), but more importantly, one doesn't need to extract the analogies: the neural networks use the analogies implicit in their embeddings.
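A sketch of that naive brute force, assuming a small embedding dict like the one above (it is O(n^4), so a toy vocabulary only):

```python
from itertools import permutations
import numpy as np

def mine_analogies(emb, tol=0.05):
    """Naive brute force: test every ordered 4-tuple (a, b, c, d) and keep
    those where the offsets a - b and d - c nearly coincide, i.e. a:b :: d:c.
    Real systems never do this; it's only to show the idea is simple."""
    found = []
    for a, b, c, d in permutations(emb, 4):
        offset_gap = np.linalg.norm((emb[a] - emb[b]) - (emb[d] - emb[c]))
        if offset_gap < tol:
            found.append((a, b, c, d))
    return found
```

Run against the toy `emb` above, this recovers tuples like ("king", "man", "woman", "queen").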
I have never seen the boredom of neural networks investigated; the closest concept that comes to mind is surprisal, which has been well understood since Shannon and information theory...
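For reference, surprisal is just the negative log-probability of an event under a given distribution:

```python
import math

def surprisal(p):
    """Shannon surprisal (self-information) of an event with probability p,
    in bits: rarer events carry more information."""
    return -math.log2(p)

surprisal(0.5)    # 1.0 bit
surprisal(0.01)   # ~6.64 bits
```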
Right, so the point is, you have your vectors and now you have a way to find 4-tuples representing analogous sets of words. Great! You built an analogizing machine! Now what do you do with the analogies?
Neural networks can't get bored because they don't decide what to think about. They are like a program in a total functional language with no data-dependent control flow: execution always terminates and always follows the same path. This is why they always give an answer in the same amount of time, regardless of the input vector.
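A sketch of why, assuming a plain feedforward net (the shapes are made up and the random weights stand in for a trained model): the forward pass is a fixed sequence of operations with no input-dependent branching:

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy 3-layer MLP; random weights stand in for a trained model.
W1 = rng.normal(size=(4, 8))
W2 = rng.normal(size=(8, 8))
W3 = rng.normal(size=(8, 2))

def forward(x):
    """One forward pass: a fixed sequence of matmuls and nonlinearities.
    No branch or loop depends on x, so every input costs exactly the
    same amount of computation; there is no way to 'think longer'."""
    h = np.tanh(x @ W1)
    h = np.tanh(h @ W2)
    return h @ W3

forward(np.zeros(4))           # easy input: same three matmuls
forward(rng.normal(size=4))    # hard input: same three matmuls
```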
Surprisal is only a measure of information (given a distribution) and is only distantly related to what I'm talking about. However, if a neural network could choose to think more, choose to get new data and reconsider, choose to go back and look at that one from a couple of minutes ago... then you'd also want it to have the ability to get bored, i.e. to recognize when it is spending resources on an unprofitable line of inquiry.
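Sketching that hypothetical: a wrapper that lets a model spend more refinement steps on some inputs and quit once extra thinking stops changing the answer, loosely in the spirit of adaptive-computation-time ideas. Everything here (`step`, the thresholds) is made up for illustration:

```python
import numpy as np

def refine_until_bored(step, x, max_steps=50, patience=3, tol=1e-3):
    """Keep applying a refinement step to an estimate and stop ("get bored")
    once `patience` consecutive steps change it by less than `tol`, i.e.
    once more thinking stops paying for itself.

    step: estimate -> improved estimate (hypothetical refinement operator)
    """
    estimate, stalls = x, 0
    for n in range(max_steps):
        new = step(estimate)
        if np.linalg.norm(new - estimate) < tol:
            stalls += 1
            if stalls >= patience:     # unprofitable line of inquiry
                return new, n + 1      # give up early, report effort spent
        else:
            stalls = 0
        estimate = new
    return estimate, max_steps

# Toy usage: iteratively average toward a fixed point, halting on stall.
target = np.array([1.0, 2.0])
value, effort = refine_until_bored(lambda v: (v + target) / 2, np.zeros(2))
```

Unlike the fixed forward pass above, `effort` varies per input, which is the minimal precondition for anything boredom-like.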