I worked on this for a long time -

1. it's possible to make it "easy to switch" by having common building blocks and only changing the "selector" across sites - lots of companies in the space do this

2. it's impossible to do "just DOM" or "just vision/text" if you want to be able to generalize "get the price of the items"

- DOM doesn't represent spatial positioning very well (see: fixed/absolute positioning, IDs and DOM structure changing without the visuals changing, ...) so you'd need the equivalent of an entire browser rendering engine in your "model" anyway!

- vision/text gets thrown off by random marketing popups (see: Medium, Amazon, Walmart, ...), is significantly more computationally expensive, and can't currently get >95% accuracy (which makes it useless: scraping needs very close to 100% accuracy in most use cases)
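
To make point 1 concrete, here's a minimal sketch of the "common building blocks, per-site selector" idea. Everything in it is made up for illustration: the site names, the `PRICE_SELECTORS` table, and the `get_price` / `parse_price` helpers are hypothetical, and it uses the standard library's limited XPath support on well-formed markup, where a real scraper would more likely use a proper HTML parser or a headless browser.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Hypothetical per-site selectors: the only part that changes between sites.
PRICE_SELECTORS = {
    "shop-a.example": ".//span[@class='price']",
    "shop-b.example": ".//div[@id='cost']/b",
}


def parse_price(text: str) -> Decimal:
    """Shared building block: keep digits and the decimal point only."""
    cleaned = "".join(ch for ch in text if ch.isdigit() or ch == ".")
    return Decimal(cleaned)


def get_price(site: str, markup: str) -> Decimal:
    """Shared pipeline: parse, apply the site's selector, normalize."""
    root = ET.fromstring(markup)  # assumes well-formed markup
    node = root.find(PRICE_SELECTORS[site])
    if node is None or node.text is None:
        # Fail loudly so a site redesign is noticed instead of silently
        # producing wrong data.
        raise LookupError(f"price selector for {site} matched nothing")
    return parse_price(node.text)


# Two toy pages with different structure but the same extraction pipeline.
page_a = "<html><body><span class='price'>$1,299.00</span></body></html>"
page_b = "<html><body><div id='cost'><b>19.99 USD</b></div></body></html>"

price_a = get_price("shop-a.example", page_a)
price_b = get_price("shop-b.example", page_b)
```

Switching to a new site here means adding one entry to the selector table, and the parsing and normalization stay shared. It also shows the limit from point 2: the selector only knows DOM structure, so it breaks as soon as the markup changes, even if the page looks identical.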