tirreno (1) guy here. Our open-source system can block IP addresses based on rul...

reconnecting · 2025-12-22T10:16:08 1766398568

I believe there is a slight misunderstanding regarding the role of 'AI crawlers'.

Bad crawlers have been there since the very beginning. Some of them looking for known vulnerabilities, some scraping content for third-party services. Most of them have spoofed UAs to pretend to be legitimate bots.

This is approximately 30–50% of traffic on any website.

notachatbot123 · 2025-12-22T09:25:26 1766395526

The article is about AI web crawlers. How can your tool help and how would one set it up for this specific context?

reconnecting · 2025-12-22T09:32:41 1766395961

I don't see how an AI crawler is different from any others.

The simplest approach is to count the UA as risky or flag multiple 404 errors or HEAD requests, and block on that. Those are rules we already have out of the box.

It's open source, there's no pain in writing specific rules for rate limiting, thus my question.

Plus, we have developed a dashboard for manually choosing UA blocks based on name, but we're still not sure if this is something that would be really helpful for website operators.

Roark66 · 2025-12-22T10:31:24 1766399484

>It's open source, there's no pain in writing specific rules for rate limiting, thus my question.

Depends on the goal.

Author wants his instance not to get killed. Request rate limiting may achieve that easily in a way transparent to normal users.

mmarian · 2025-12-22T21:32:46 1766439166

> count the UA as risky

It's trivial to spoof UAs unfortunately.

reconnecting · 2025-12-23T11:21:18 1766488878

It depends. If you want to stop OAI-SearchBot/1.3, UA will be enough.

mmarian · 2025-12-24T17:44:46 1766598286

Why would you need tirreno if you just want to stop OAI's bot though?

reconnecting · 2025-12-24T20:04:46 1766606686

OAI's is just an example that's easy to explain.

I believe that if something is publicly available, it shouldn't be overprotected in most cases.

However, there are many advanced cases, such as crawlers that collect data for platform impersonation (for scams) or custom phishing attacks, or account brute-force attacks. In those cases, I use tirreno to understand traffic through different dimensions.

mmarian · 2025-12-22T21:31:55 1766439115

> block IP addresses based on rules triggered by specific behavior

Problem is, bots can easily can resort to resi proxies, at which point you'll end up blocking legitimate traffic.

reconnecting · 2025-12-23T11:28:35 1766489315

Again, it depends. Residential proxies are much more expensive, and most vulnerability scanners will never shift to them.

I believe that there is a low chance that a real customer behind this residential IP will come to your resource. If you do an EU service, there is no pain to block Asian IPs and vice-versa.

What is really important here is that most people block IPs on autopilot without seeing the distribution of their actions, and this really matters.