To be fair, most analytical models working with data already incorporate a lot of 'data science' techniques. That does not have to be ML or NN; it could be as trivial as a least squares regression to fit your model to a set of observations that overdetermines the system of equations.
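As a toy illustration of that overdetermined case (made-up numbers, NumPy only for convenience), fitting a two-parameter line to five observations:

    # Toy overdetermined fit: y = a*x + b from five noisy observations (made-up data).
    import numpy as np

    xs = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
    ys = np.array([1.1, 2.9, 5.2, 6.8, 9.1])     # noisy measurements

    A = np.column_stack([xs, np.ones_like(xs)])  # design matrix for a*x + b
    (a, b), *_ = np.linalg.lstsq(A, ys, rcond=None)
    print(f"a={a:.2f}, b={b:.2f}")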
A more advanced example of a technique that was used before it was called data science is data assimilation (DA)[1]. Here you assume that you have observations (e.g. sensor data) that you want to use to inform the model, but they are noisy in some sense. With DA you take a set of observations at t=0 and fit a numerical model to them. Then you time-step the model to t=1, where you have new observations. The model and observations don't necessarily agree, but there is value in incorporating information from both. Based on, e.g., your statistical description of the sensor noise, DA techniques give you the tools to combine data and models.
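As a rough sketch of that forecast/analysis cycle, here is a scalar Kalman filter, one of the classic DA techniques; the toy model, noise variances, and observations below are invented purely for illustration:

    # Minimal sketch of DA with a scalar Kalman filter (made-up numbers).
    def step_model(x):
        # toy physics: temperature relaxes toward 20 degrees each step
        return 20.0 + 0.9 * (x - 20.0)

    F = 0.9            # linearised model operator
    Q = 0.05           # model error variance (the model is not perfect either)
    R = 0.5            # sensor noise variance (your statistical description)

    x, P = 25.0, 1.0                         # estimate fitted to the t=0 observations
    observations = [24.1, 23.0, 22.4, 21.7]  # noisy sensor readings at t=1..4

    for z in observations:
        # forecast: time-step the model and grow its uncertainty
        x_f = step_model(x)
        P_f = F * P * F + Q
        # analysis: weigh the forecast against the new observation
        K = P_f / (P_f + R)                  # Kalman gain
        x = x_f + K * (z - x_f)
        P = (1.0 - K) * P_f
        print(f"obs={z:5.2f}  forecast={x_f:5.2f}  analysis={x:5.2f}")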
A good example of DA is 4DCOOL[2], which combines temperature sensors in a datacenter with a CFD model. Because the model is physics-based, after some time you get a good idea of the temperature distribution in the whole room, even if you only have pretty sparse sensor data. (Disclosure: I work for the company.)
Flexibility: the problem might change in ways that are difficult to model analytically. A data-driven model might also capture "unknown unknowns". And before you say that you need massive amounts of data: so does an analytical model, assuming you want to verify it.