I totally agree with this. We learned pretty quickly that classification does not generalize across domains, so we narrowed the problem space by focusing on one domain at a time and then fixing a predefined set of categories for it. That let us measure the effectiveness of our solution as we experimented with different algorithms and deployment pipelines.