I agree with you that this is by no means a complete solution, and there is a long way to go before it is actually usable.
I think one big problem with image captioning could be the lack of high-quality training data, whereas in this case we can generate lots of good training data. Whether we can generate enough good data, and have enough compute power to train on it, is something we need to find out.
Playing Go was considered too complex a problem to solve a couple of years ago, but it's now a solved problem. So I am hoping we can get a breakthrough on this sooner than we think.
I don't want to dampen your high hopes, but the rules of Go seem a lot simpler than the rules and grammar of the current hypertext markup "language", particularly when taking the "browser dialects" into account, which are crucial for professional pages...