Possibly. Having millions of URLs to populate each class is a good thing to start with. Gathered through other means, our current dataset has around 10M URLs in the 'ads' category. The model we made available to the public was built from 2M of these URLs.

EDIT: of possible interest: these models output a probability, and possibly a confidence, that a URL should be blocked. Based on these, a blocker could ask the user for confirmation in uncertain cases.
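
To make the confirmation idea concrete, here is a rough sketch of how a blocker might act on those two numbers. Everything in it is an assumption for illustration: the `decide` function, the threshold values, and the idea that confidence is a separate score in [0, 1] are hypothetical, not how our model or any existing blocker works.

```python
from typing import Optional


def decide(probability: float,
           confidence: Optional[float] = None,
           block_above: float = 0.9,      # hypothetical threshold
           allow_below: float = 0.1,      # hypothetical threshold
           min_confidence: float = 0.5) -> str:
    """Return "block", "allow", or "ask" for a single URL.

    probability: the model's estimate that the URL is an ad, in [0, 1].
    confidence:  optional score in [0, 1] for how much to trust that estimate.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be in [0, 1]")

    # If the model reports low confidence, defer to the user.
    if confidence is not None and confidence < min_confidence:
        return "ask"

    # Act automatically only at the extremes.
    if probability >= block_above:
        return "block"
    if probability <= allow_below:
        return "allow"

    # Anything in between is uncertain, so ask for confirmation.
    return "ask"
```

With these made-up thresholds, `decide(0.97)` blocks, `decide(0.03)` allows, and `decide(0.5)` or `decide(0.97, confidence=0.2)` falls back to asking the user. Real thresholds would have to be tuned against how often users are willing to be prompted.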