That would probably help, but it's also a continuation of the cat-and-mouse game. There are plenty of captcha-breaking services out there; it only costs about $1 to programmatically solve 1,000 captchas.
> There are plenty of captcha breaking services out there
Give it a try and see what happens.
People said greylisting against email spam wouldn't work, since spammers would just resend. It has worked for 20 years. To get your IP off the DNSBL NiX Spam, you just have to follow a link. People said spammers would automate that process. It never happened in 19 years. Sometimes spammers are just lazy.
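For anyone unfamiliar with how greylisting works: the server temporarily rejects the first delivery attempt for an unseen (IP, sender, recipient) triplet and accepts a retry after a delay. A minimal sketch in Python (the function and constant names are hypothetical, not any real milter's API):

```python
import time

GREYLIST_DELAY = 300  # seconds the sender must wait before a retry is accepted

seen = {}  # (ip, sender, recipient) -> timestamp of first attempt

def check(ip, sender, recipient, now=None):
    """Return 'TEMPFAIL' (SMTP 4xx) or 'ACCEPT' for a delivery attempt."""
    now = time.time() if now is None else now
    key = (ip, sender, recipient)
    first = seen.get(key)
    if first is None:
        seen[key] = now
        return "TEMPFAIL"   # unknown triplet: "try again later"
    if now - first < GREYLIST_DELAY:
        return "TEMPFAIL"   # retried too quickly
    return "ACCEPT"         # real MTAs retry on a 4xx; lazy spam bots often don't
```

The whole trick is that a compliant MTA treats the 4xx as routine and retries, while fire-and-forget spam software moves on to the next target.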
Agreed. I suspect that this is an arbitrage game on the part of the SEO spammers. Each search is cheaper for them than it is for a competitor who's using a major search engine with more extensive anti-spammer protections, and that difference equals $$$. A captcha doesn't have to be an unbeatable solution. It just has to provide enough of a barrier to equalize the cost.
I'm not so sure about this. The spammers' goal is to build up as big a list of link-spam targets as possible. If one spammer chooses to scrape only minor engines and another only major engines, the one scraping the major engines will probably come out on top despite the higher cost. Whoever is abusing OP's search engine is likely doing it to supplement the data they are already scraping from the major engines.
For OP, I think simply not returning results at all is a more practical measure because it removes the reward completely. Captchas and bot detection keep the reward in play, while taking away the results entirely makes the entire pursuit futile.
It might be a better idea to return low-quality results than nothing at all. When a bot receives no results, it's immediately obvious that it has been banned. Having to inspect the results manually to determine whether one is banned is a much more time-consuming endeavor.
Well what I'm suggesting isn't about blocking the bots, it's about removing the incentive. So in this case, I think the more obvious it is the better. I would want them to realize as soon as possible that they are 100% wasting their time.
If anything, it might be best to return a page that explicitly states "Sorry, this search engine no longer supports SEO footprint search queries."
On the other hand, making content difficult to parse is easy to do and a very strong weapon. Make them waste dev time... It is much easier to generate variants of HTML than it is to parse them, and you can even automate the variation to some degree.
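As a sketch of the automated-variants idea (everything here is hypothetical, just one way to do it): randomize class names and wrapper nesting on every render, so scrapers can't rely on stable CSS selectors and have to keep re-doing their parsing work.

```python
import html
import random

def render_result(title, url, rng=random):
    """Render one search result with randomized classes and nesting depth."""
    def rand_class():
        return "c" + "".join(rng.choice("abcdef0123456789") for _ in range(8))

    markup = (
        f'<a class="{rand_class()}" href="{html.escape(url, quote=True)}">'
        f"{html.escape(title)}</a>"
    )
    # Wrap in 0-3 throwaway containers with random tags and classes,
    # so the DOM path to the link differs between responses.
    for _ in range(rng.randint(0, 3)):
        tag = rng.choice(["div", "span", "section"])
        markup = f'<{tag} class="{rand_class()}">{markup}</{tag}>'
    return markup
```

A browser renders every variant identically, but a scraper keyed to last week's selectors silently breaks.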
Then monetize by setting up your own captcha farm, but instead of paying for compute, send the captchas to the spam bots, who forward them to another captcha farm and solve them for you.
As I understand it, the main point of CAPTCHAs isn't to keep out bots completely, but to add enough friction to make automated abuse economically infeasible, while keeping the friction low enough that normal users aren't driven away.