Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Any tips/code examples for your webkit solution(s)? Where does one begin with using webkit for scraping?

I think using anti-fingerprinting is itself a fingerprint. I imagine it would be easier to hide in the noise of regular browsers.



> I think using anti-fingerprinting is itself a fingerprint. I imagine it would be easier to hide in the noise of regular browsers.

That's what I thought originally too. The problem is the "leaky-ness" of Chrome and Firefox - they expose a large amount of information that can be easily used to train various ML classifiers. Chrome's DevTool Protocol is most commonly used when headless access to Chrome is desired and is inherently "leaky", by design as a protocol for debugging. Don't even try to use any flavor of headless Chrome, even with stealth plugins. Firefox isn't much better.

Webkit doesn't seem to expose as much information, and having a much lesser percentage of usage, I think there's simply less information to feed into a classifier to learn to detect it reliably. There's a few sites that offer fingerprint testing such as:

- https://amiunique.org/fp

- https://webscraping.pro/wp-content/uploads/2021/02/testresul...

Try writing a script that goes to a page like this and have it take a screenshot, using Chrome, Firefox, and then Webkit to see the difference yourself. I use the Python port of Playwright personally. In the project I mentioned in my last comment, all I had to do was change the browser Playwright was using to webkit - i.e "browser = p.webkit.launch()" where "p" is a sync_playwright context manager instance. I tried Chrome and Firefox with many, many, attempts at stealth modifications and none worked. Removing my "stealth code" for the other browsers and changing it to webkit was all that was needed. Blew me away that it was that simple honestly. I've used this trick on other websites and have noticed webkit just gets processed differently by captchas/anti-bot, etc. Selenium should also offer support for a WebKit driver if you prefer it over Playwright.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: