stayml's comments (Hacker News)

Thanks and good point! (it's not happening, but I can understand the concern). I'll think about open sourcing the code for people to self-host.


I filter out any user agents that are invalid, but there's no way to tell which are real and which are faked. The access logs include the user agent of every single site visitor, not only errors/bad actors.
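Roughly the kind of validity filter I mean, as a sketch (the pattern and thresholds here are made up, and it can only reject malformed strings, not detect fakes):

```python
import re

# Loose sanity check: a plausible UA is printable ASCII of reasonable length
# and starts with a known product token. This rejects garbage strings, but
# cannot tell a real browser from a scraper sending the same header.
UA_SHAPE = re.compile(r"^[\x20-\x7e]{10,512}$")

def looks_valid(ua: str) -> bool:
    return bool(UA_SHAPE.match(ua)) and ua.startswith(("Mozilla/", "Opera/"))
```

So a normal Chrome or Edge string passes, while control characters or truncated junk get dropped before counting.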


Hm! It could just be a parsing error. The user agent in question is: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36 Edg/108.0.1462.41


Pretty sure that's the Edge on Mac user-agent.
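A quick sketch of why a naive parser could misread that string: the Edge UA still contains "Chrome/" and "Safari/" tokens for compatibility, so the "Edg/" token has to be checked before the others (function name hypothetical):

```python
def browser_family(ua: str) -> str:
    # Token order matters: the Edge UA also contains "Chrome/" and
    # "Safari/", so "Edg/" must be matched first or Edge is counted
    # as Chrome.
    for token, name in (("Edg/", "Edge"), ("OPR/", "Opera"),
                        ("Chrome/", "Chrome"), ("Safari/", "Safari")):
        if token in ua:
            return name
    return "Unknown"

ua = ("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
      "AppleWebKit/537.36 (KHTML, like Gecko) "
      "Chrome/108.0.0.0 Safari/537.36 Edg/108.0.1462.41")
```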


Coming soon! I separate them out for now, as most scraping tasks require either desktop or mobile useragents and not both together.
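The structure I have in mind is something like this (a sketch with hypothetical list contents, not the actual data): keep the two pools separate so a job draws only from the one matching the site variant it targets.

```python
import random

# Hypothetical pools: desktop and mobile UAs kept separate so a scraping
# job never mixes variants within one session.
DESKTOP_UAS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36",
]
MOBILE_UAS = [
    "Mozilla/5.0 (iPhone; CPU iPhone OS 16_1 like Mac OS X) "
    "AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.1 Mobile/15E148 Safari/604.1",
]

def pick_ua(mobile: bool = False) -> str:
    # Draw a random UA from the pool matching the target variant.
    return random.choice(MOBILE_UAS if mobile else DESKTOP_UAS)
```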


> most scraping tasks require either desktop or mobile useragents and not both together

Why? Do some sites serve completely different content? Or is it simply markup differences? I've never done much scraping, and I'd expect the viewport size to be the decisive factor these days, not the user agent. But, again, I don't know much about that.


Good point, thanks. I'll add that in


Yes, this too. It should just be a -passable- sample of what's popular and seen on the web


I would accept this argument if the sample was unbiased but noisy. In this case it's extremely biased but (potentially) low in noise.

If people from Uganda aren't part of the target audience of this site, we won't get Ugandan user agents even if they happen to be a fair chunk of web users worldwide (certainly more than in my small but high-tech country.)
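A toy simulation of the bias-vs-noise point (all numbers made up): say a hypothetical "UA-X" has 20% share worldwide but only 1% share among this site's audience. A huge sample of the site's logs is very precise, yet lands nowhere near the worldwide figure, and more data doesn't fix it.

```python
import random

random.seed(0)

# Worldwide share of the hypothetical "UA-X" is 20%, but the site's
# audience uses it only 1% of the time. Sampling 100k site visits gives
# a low-noise estimate of the wrong (biased) quantity.
SITE_AUDIENCE_RATE = 0.01
sample = [random.random() < SITE_AUDIENCE_RATE for _ in range(100_000)]
estimate = sum(sample) / len(sample)
print(f"estimated share: {estimate:.3f} (worldwide share: 0.200)")
```

The estimate converges tightly around 1%, not 20%: low variance, large bias.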


I wonder if we can help crowdsource this.


Thanks! And yep, fair comment; I had noticed this as well, even more so in last week's list. I've been thinking about how I could adjust the numbers to counteract it, or add another data source.

