The site owner can allow such crawlers. There is the issue of bad actors pretending to be these types of crawlers, but that can already happen to a site that wants to allow Google search crawlers but not Gemini training-data crawlers, for example, so there's strong support for solving that problem.
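
For what it's worth, the search-vs-training split can be written down in robots.txt today. As far as I know, Google uses the `Google-Extended` token for the Gemini training opt-out. Here's a rough sketch using Python's standard `urllib.robotparser`; the robots.txt contents, the `allowed` helper, and the example.com URL are all made up for illustration:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: allow Google Search, opt out of Gemini training.
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: Google-Extended
Disallow: /
"""


def allowed(user_agent: str, url: str = "https://example.com/page") -> bool:
    """Return True if the rules above let `user_agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("Googlebot", "Google-Extended"):
        print(agent, allowed(agent))
```

Note this only states the owner's preference. robots.txt is advisory and does nothing about a bad actor sending a fake user agent, which is the part that still needs solving.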