How would you actually use an anti-detect browser programmatically? Would you need to write a custom Selenium driver for it, or the equivalent for Playwright? Even if the browser is built on something like Chrome, you'd still need a way to interact with its anti-detect features.
A good trick I discovered is using WebKit through Playwright to bypass fingerprinting and related anti-bot measures. Firefox and Chrome simply leak too much information, even with various "stealth" modifications. For example, I've been able to reliably scrape a well-known company's site that implemented a "state of the art, AI-powered, behavioral analysis, etc." anti-bot product. Using Chrome/Firefox plus stealth measures in Playwright did not work; simply switching to WebKit with no further modifications did the trick.
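For reference, a minimal sketch of what that switch looks like with Playwright's sync Python API (the URL here is a placeholder, and whatever scraping logic you need goes where the content grab is):

    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        # The only change from a Chromium-based script: launch webkit instead.
        browser = p.webkit.launch(headless=True)
        page = browser.new_page()
        page.goto("https://example.com/protected-page")  # placeholder URL
        html = page.content()  # grab the rendered page for scraping
        browser.close()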
Not exactly what you're asking, but my point is that with a little time and effort, I've usually been able to find fairly simple holes in most anti-bot measures. It probably wouldn't be terribly hard (especially since you're versed in scraping) to build out something similar to what you're looking to achieve without having to pay for sketchy anti-detect browsers.
Yes, that's what I've done up to now. When forced to use Playwright, I've also noticed that WebKit is less detected, but it varies from website to website.
I tried the solution described in the Substack post. Fundamentally, the GoLogin browser, which is based on Chromium, opens a port on your local machine, and Playwright connects to that browser to automate the crawling.
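In Playwright's Python API, that connection looks roughly like the sketch below. The port is whatever the anti-detect browser reports when it starts its local debugging endpoint; 9222 is just a placeholder:

    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        # Attach to an already-running Chromium-based browser (e.g. GoLogin)
        # via the Chrome DevTools Protocol endpoint it exposes locally.
        browser = p.chromium.connect_over_cdp("http://localhost:9222")
        # Reuse the browser's existing context/profile rather than a fresh one,
        # assuming the anti-detect browser has already created one.
        context = browser.contexts[0]
        page = context.new_page()
        page.goto("https://example.com")  # placeholder URL
        print(page.title())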
Yeah, Chrome is the worst choice for this use case; see my last comment on this thread for more on that. Can you speak a bit more about what you'd want a headless anti-detect browser for over a regular headless browser? Is it to leverage their built-in fingerprinting control, effectively avoiding anti-bot measures with little effort, or the management of multiple "profiles", etc.? My system effectively comes down to using WebKit and storing credentials (encrypted with a symmetric key), along with whatever information Playwright needs to reconstruct the session. Simply using WebKit plus a DB effectively gives you a headless anti-detect browser, but you're right that WebKit alone isn't always a one-and-done solution.
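A minimal sketch of that "WebKit + storage" idea, assuming Fernet from the cryptography package for the symmetric encryption and a plain file standing in for the DB (the login URL and file name are placeholders):

    import json
    from pathlib import Path

    from cryptography.fernet import Fernet
    from playwright.sync_api import sync_playwright

    key = Fernet.generate_key()  # in practice, load this from your secret store
    fernet = Fernet(key)
    state_file = Path("session.enc")  # stand-in for a DB row

    with sync_playwright() as p:
        browser = p.webkit.launch()
        context = browser.new_context()
        page = context.new_page()
        page.goto("https://example.com/login")  # placeholder: log in here
        # storage_state() captures cookies + localStorage, which is enough
        # for Playwright to reconstruct the session later without re-logging in.
        state = context.storage_state()
        state_file.write_bytes(fernet.encrypt(json.dumps(state).encode()))
        browser.close()

        # Later (or in another process): decrypt and rebuild the session.
        restored = json.loads(fernet.decrypt(state_file.read_bytes()))
        browser = p.webkit.launch()
        context = browser.new_context(storage_state=restored)
        browser.close()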
> I think using anti-fingerprinting is itself a fingerprint. I imagine it would be easier to hide in the noise of regular browsers.
That's what I thought originally too. The problem is the leakiness of Chrome and Firefox: they expose a large amount of information that can easily be used to train ML classifiers. The Chrome DevTools Protocol is what's most commonly used when headless access to Chrome is desired, and it's inherently "leaky" by design, since it's a debugging protocol. Don't even try to use any flavor of headless Chrome, even with stealth plugins. Firefox isn't much better.
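To see a couple of the more obvious giveaways yourself (exact values depend on the Playwright and Chromium versions, so verify on your own setup):

    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        ua = page.evaluate("() => navigator.userAgent")
        wd = page.evaluate("() => navigator.webdriver")
        # Out of the box, headless Chromium tends to advertise itself: the UA
        # string typically contains "HeadlessChrome" and navigator.webdriver
        # is true. Real anti-bot products look at far more signals than these.
        print(ua)
        print("navigator.webdriver:", wd)
        browser.close()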
WebKit doesn't seem to expose as much information, and since it has a much smaller share of usage, I think there's simply less data to feed into a classifier to learn to detect it reliably. There are a few sites that offer fingerprint testing, such as:
Try writing a script that goes to a page like this and takes a screenshot, using Chrome, Firefox, and then WebKit, to see the difference yourself. I use the Python port of Playwright personally. In the project I mentioned in my last comment, all I had to do was change the browser Playwright was using to WebKit, i.e. "browser = p.webkit.launch()", where "p" is a sync_playwright context manager instance. I tried Chrome and Firefox with many, many attempts at stealth modifications and none worked. Removing my "stealth code" for the other browsers and switching to WebKit was all that was needed. Blew me away that it was that simple, honestly. I've used this trick on other websites and have noticed WebKit just gets processed differently by captchas, anti-bot products, etc. Selenium should also offer support for a WebKit driver if you prefer it over Playwright.
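If you want to run that comparison, here's a rough sketch; the fingerprint-test URL is a placeholder for whichever testing site you pick:

    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        for name in ("chromium", "firefox", "webkit"):
            browser = getattr(p, name).launch()
            page = browser.new_page()
            page.goto("https://example.com/fingerprint-test")  # placeholder URL
            # Full-page screenshot so each engine's results can be
            # compared side by side.
            page.screenshot(path=f"fingerprint-{name}.png", full_page=True)
            browser.close()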