Are there any difficulties in generating a program in a standard language such as Python? Did you choose a DSL because the neural network is sensitive to the output programming language?
It turns out the full grammar of Python (and of almost all real programming languages) is quite large. This is very early work in neural program synthesis, so we chose a fairly limited DSL to make sure we could at least solve this one before moving on to more general ones that contain state, conditionals, for-loops, etc. In theory, however, we could apply the exact same architecture to Python programs and see what happens. We haven't tried yet. :)
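To make the contrast with Python's grammar concrete, here is a minimal sketch of what a deliberately limited string-transformation DSL might look like: a program is just a concatenation of a few primitives, with no state, conditionals, or loops. The constructor names (`ConstStr`, `SubStr`, `Upper`, `Concat`) and the integer-index substring semantics are hypothetical, chosen for illustration only; this is not the DSL used in the paper.

```python
from dataclasses import dataclass
from typing import List, Union

# Hypothetical toy DSL for string transformations (illustrative names only).

@dataclass
class ConstStr:
    text: str  # emit a fixed string

@dataclass
class SubStr:
    start: int  # slice of the input string, Python slice semantics
    end: int

@dataclass
class Upper:
    inner: "Expr"  # uppercase the result of another expression

Expr = Union[ConstStr, SubStr, Upper]

@dataclass
class Concat:
    parts: List[Expr]  # a whole program: expressions joined left to right

def eval_expr(e: Expr, s: str) -> str:
    """Evaluate one expression against the input string s."""
    if isinstance(e, ConstStr):
        return e.text
    if isinstance(e, SubStr):
        return s[e.start:e.end]
    if isinstance(e, Upper):
        return eval_expr(e.inner, s).upper()
    raise TypeError(f"unknown expression: {e!r}")

def run(program: Concat, s: str) -> str:
    return "".join(eval_expr(p, s) for p in program.parts)

# Usage: turn "john smith" into "Smith, J."
prog = Concat([Upper(SubStr(5, 6)), SubStr(6, 10), ConstStr(", "),
               Upper(SubStr(0, 1)), ConstStr(".")])
print(run(prog, "john smith"))  # Smith, J.
```

Because every program in this toy DSL is a flat `Concat` of three expression types, the space a model has to search is small; supporting state, conditionals, or loops would mean adding new node types and a much larger search space.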
Why does the final example in figure 14 fail completely? The outputs are correct as far as they go, but they're all incomplete.
Is it because, under the scoring metric, a sufficiently good start can outscore an alternative in the beam search that could have led to a more complete solution? In non-trivial real-world examples, would this be a major problem?
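The hypothesis in this question can be illustrated with a toy beam search. The sketch below assumes sequences are ranked by summed log-probability, which is a common choice but is not confirmed here as the scoring used in the paper; the next-token probabilities are invented, and a short sequence that ends early stands in for an incomplete program. It shows two effects: a narrow beam can prune the prefix that leads to the longer answer, and even when that answer survives, summed log-probabilities can still rank the shorter sequence higher, while length normalization reverses the ranking. Whether this is what actually happens in figure 14 is not established here.

```python
import math
from typing import Dict, List, Tuple

EOS = "<eos>"
Seq = Tuple[str, ...]

# Hypothetical next-token probabilities keyed by the prefix generated so far.
# These numbers are invented for illustration; they do not come from any real model.
TOY_MODEL: Dict[Seq, Dict[str, float]] = {
    (): {"A": 0.6, "B": 0.4},
    ("A",): {EOS: 0.7, "C": 0.3},
    ("A", "C"): {EOS: 1.0},
    ("B",): {"C": 0.9, EOS: 0.1},
    ("B", "C"): {"D": 0.9, EOS: 0.1},
    ("B", "C", "D"): {EOS: 1.0},
}

def beam_search(beam_width: int, max_len: int = 10) -> List[Tuple[Seq, float]]:
    """Return all finished sequences with their summed log-probabilities."""
    beams: List[Tuple[Seq, float]] = [((), 0.0)]
    finished: List[Tuple[Seq, float]] = []
    for _ in range(max_len):
        candidates: List[Tuple[Seq, float]] = []
        for prefix, score in beams:
            for tok, p in TOY_MODEL[prefix].items():
                if tok == EOS:
                    finished.append((prefix, score + math.log(p)))
                else:
                    candidates.append((prefix + (tok,), score + math.log(p)))
        # Keep only the top `beam_width` unfinished prefixes.
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
        if not beams:
            break
    return finished

def best(finished: List[Tuple[Seq, float]], length_normalize: bool) -> Seq:
    """Pick the top finished sequence, optionally dividing score by length."""
    def key(item: Tuple[Seq, float]) -> float:
        seq, score = item
        # +1 counts the EOS token as part of the sequence length.
        return score / (len(seq) + 1) if length_normalize else score
    return max(finished, key=key)[0]

# Width 1: the "B" prefix is pruned at step one, so ("B", "C", "D") is never produced.
print(best(beam_search(beam_width=1), length_normalize=False))
# Width 2: ("B", "C", "D") is found, but raw scores still prefer the short ("A",).
print(best(beam_search(beam_width=2), length_normalize=False))
print(best(beam_search(beam_width=2), length_normalize=True))
```

Here the short sequence has probability 0.6 × 0.7 = 0.42 versus 0.4 × 0.9 × 0.9 = 0.324 for the longer one, so summing log-probabilities favors stopping early; dividing by length is one common mitigation, at the cost of another heuristic in the ranking.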
Theorem proving is very closely related to program induction (we just change the grammar). Just as with Python, the underlying search space would be enormous, and while in theory we could simply swap in a different DSL and it should work, it will probably take a few more iterations of the model, or other insights, to bring this to fruition (but it's definitely not impossible).
It looks like this is a more comprehensive version of FlashFill (it can handle more tasks), and it is based on deep learning rather than the rule-based techniques used in FlashFill.