Very much so!

The two tasks are surprisingly interchangeable. I once worked on a project where we used a statistical MT approach to "translate" between image features and captions, and I don't think we were the only ones trying such things.

In a pleasing bit of symmetry, the attentional network used here looks like it was initially developed for image captioning.