I just realised that in the above comment I made a mistake in my analogy to the optimal substructure of shortest paths. In shortest path, if you start from an optimal solution, its "subproblems" (the subpaths) are themselves optimal. This is not true if you start from a non-optimal path, although its length-1 subpaths are still optimal. I'm still not sure of a precise way to phrase optimal substructure for code size, though. Sorry for the confusion!
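To make the shortest-path half of the analogy concrete, here is a minimal sketch on a small unweighted graph. The graph, the two paths, and the helper names (`bfs_dist`, `is_shortest`, `subpaths`) are all made up for illustration, and it says nothing about the code-size case.

```python
from collections import deque

def bfs_dist(graph, src):
    """Edge-count distances from src in an unweighted graph."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def is_shortest(graph, path):
    """True if path uses as few edges as any path between its endpoints."""
    return len(path) - 1 == bfs_dist(graph, path[0])[path[-1]]

def subpaths(path):
    """All contiguous subpaths with at least one edge."""
    n = len(path)
    return [path[i:j + 1] for i in range(n) for j in range(i + 1, n)]

# Hypothetical undirected graph: a chain a-b-c-d-e plus a shortcut a-c.
GRAPH = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c", "e"],
    "e": ["d"],
}

optimal = ["a", "c", "d", "e"]      # a shortest a-to-e path (3 edges)
detour = ["a", "b", "c", "d", "e"]  # a non-optimal a-to-e path (4 edges)

# Every subpath of the optimal path is itself a shortest path.
all_optimal = all(is_shortest(GRAPH, p) for p in subpaths(optimal))

# The detour has non-optimal subpaths, e.g. a-b-c when a-c exists...
bad = [p for p in subpaths(detour) if not is_shortest(GRAPH, p)]

# ...but each of its single-edge (length-1) subpaths is still optimal.
edges_ok = all(is_shortest(GRAPH, p) for p in subpaths(detour) if len(p) == 2)
```

The graph is unweighted on purpose: with arbitrary edge weights a single edge need not be the shortest route between its endpoints, so the "length-1 subpaths are optimal" remark would need that extra assumption.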