How would you actually use an anti-detect browser programmatically? Would you need to write a custom Selenium driver for it, or the equivalent for Playwright? Even if the browser is built on something like Chrome, you'd still need a way to interact with its anti-detect features.
A good trick I discovered is using WebKit through Playwright to bypass fingerprinting and related anti-bot measures. Firefox and Chrome simply leak too much information, even with various "stealth" modifications. For example, I've been able to reliably scrape a well-known company's site that implemented a "state of the art, AI-powered, behavioral analysis, etc." anti-bot product. Using Chrome/Firefox plus stealth measures in Playwright did not work; simply switching to WebKit with no further modifications did the trick.
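For reference, a minimal sketch of what that switch looks like with Playwright's sync Python API (the URL here is a placeholder, and whatever scraping logic you need goes where the content grab is):

    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        # The only change from a Chromium-based script: launch webkit instead.
        browser = p.webkit.launch(headless=True)
        page = browser.new_page()
        page.goto("https://example.com/protected-page")  # placeholder URL
        html = page.content()  # grab the rendered page for scraping
        browser.close()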
Not exactly what you're asking, but my point is that with a little time and effort, I've usually been able to find fairly simple holes in most anti-bot measures. It probably wouldn't be terribly hard (especially since you're versed in scraping) to build out something similar to what you're looking to achieve without having to pay for sketchy anti-detect browsers.
Yes, that's what I've done up to now. When forced to use Playwright, I've also noticed that WebKit is less detected, but it varies from website to website.
I tried the solution described in the Substack post. Fundamentally, the GoLogin browser, which is based on Chromium, opens a port on your local machine, and Playwright connects to that browser to automate the crawling.
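In Playwright's Python API, that connection looks roughly like the sketch below. The port is whatever the anti-detect browser reports when it starts its local debugging endpoint; 9222 is just a placeholder:

    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        # Attach to an already-running Chromium-based browser (e.g. GoLogin)
        # via the Chrome DevTools Protocol endpoint it exposes locally.
        browser = p.chromium.connect_over_cdp("http://localhost:9222")
        # Reuse the browser's existing context/profile rather than a fresh one,
        # assuming the anti-detect browser has already created one.
        context = browser.contexts[0]
        page = context.new_page()
        page.goto("https://example.com")  # placeholder URL
        print(page.title())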
Yeah, Chrome is the worst choice for this use case; see my last comment on this thread for more on that. Can you speak a bit more about what you'd want a headless anti-detect browser for over a regular headless browser? Is it to leverage their built-in fingerprinting control, effectively avoiding anti-bot measures with little effort, or the management of multiple "profiles", etc.? My system effectively comes down to using WebKit and storing credentials (encrypted with a symmetric key), along with whatever information Playwright needs to reconstruct the session. Simply using WebKit plus a DB effectively gives you a headless anti-detect browser, but you're right that WebKit alone isn't always a one-and-done solution.
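A minimal sketch of that "WebKit + storage" idea, assuming Fernet from the cryptography package for the symmetric encryption and a plain file standing in for the DB (the login URL and file name are placeholders):

    import json
    from pathlib import Path

    from cryptography.fernet import Fernet
    from playwright.sync_api import sync_playwright

    key = Fernet.generate_key()  # in practice, load this from your secret store
    fernet = Fernet(key)
    state_file = Path("session.enc")  # stand-in for a DB row

    with sync_playwright() as p:
        browser = p.webkit.launch()
        context = browser.new_context()
        page = context.new_page()
        page.goto("https://example.com/login")  # placeholder: log in here
        # storage_state() captures cookies + localStorage, which is enough
        # for Playwright to reconstruct the session later without re-logging in.
        state = context.storage_state()
        state_file.write_bytes(fernet.encrypt(json.dumps(state).encode()))
        browser.close()

        # Later (or in another process): decrypt and rebuild the session.
        restored = json.loads(fernet.decrypt(state_file.read_bytes()))
        browser = p.webkit.launch()
        context = browser.new_context(storage_state=restored)
        browser.close()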
> I think using anti-fingerprinting is itself a fingerprint. I imagine it would be easier to hide in the noise of regular browsers.
That's what I thought originally too. The problem is the leakiness of Chrome and Firefox: they expose a large amount of information that can easily be used to train ML classifiers. The Chrome DevTools Protocol is what's most commonly used when headless access to Chrome is desired, and it's inherently "leaky" by design, since it's a debugging protocol. Don't even try to use any flavor of headless Chrome, even with stealth plugins. Firefox isn't much better.
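To see a couple of the more obvious giveaways yourself (exact values depend on the Playwright and Chromium versions, so verify on your own setup):

    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        ua = page.evaluate("() => navigator.userAgent")
        wd = page.evaluate("() => navigator.webdriver")
        # Out of the box, headless Chromium tends to advertise itself: the UA
        # string typically contains "HeadlessChrome" and navigator.webdriver
        # is true. Real anti-bot products look at far more signals than these.
        print(ua)
        print("navigator.webdriver:", wd)
        browser.close()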
WebKit doesn't seem to expose as much information, and since it has a much smaller share of usage, I think there's simply less data to feed into a classifier to learn to detect it reliably. There are a few sites that offer fingerprint testing, such as:
Try writing a script that goes to a page like this and takes a screenshot, using Chrome, Firefox, and then WebKit, to see the difference yourself. I use the Python port of Playwright personally. In the project I mentioned in my last comment, all I had to do was change the browser Playwright was using to WebKit, i.e. "browser = p.webkit.launch()", where "p" is a sync_playwright context manager instance. I tried Chrome and Firefox with many, many attempts at stealth modifications and none worked. Removing my "stealth code" for the other browsers and switching to WebKit was all that was needed. Blew me away that it was that simple, honestly. I've used this trick on other websites and have noticed WebKit just gets processed differently by captchas, anti-bot products, etc. Selenium should also offer support for a WebKit driver if you prefer it over Playwright.
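If you want to run that comparison, here's a rough sketch; the fingerprint-test URL is a placeholder for whichever testing site you pick:

    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        for name in ("chromium", "firefox", "webkit"):
            browser = getattr(p, name).launch()
            page = browser.new_page()
            page.goto("https://example.com/fingerprint-test")  # placeholder URL
            # Full-page screenshot so each engine's results can be
            # compared side by side.
            page.screenshot(path=f"fingerprint-{name}.png", full_page=True)
            browser.close()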