stayml's comments (Hacker News)

Thanks and good point! (it's not happening, but I can understand the concern). I'll think about open sourcing the code for people to self-host.


I filter out any user agents that are invalid, but there's no way to tell which are real and which are faked. The access logs include the user agent of every single site visitor, not only errors/bad actors.
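Roughly the kind of validity filter I mean, as a sketch (the pattern and thresholds here are made up, and it can only reject malformed strings, not detect fakes):

```python
import re

# Loose sanity check: a plausible UA is printable ASCII of reasonable length
# and starts with a known product token. This rejects garbage strings, but
# cannot tell a real browser from a scraper sending the same header.
UA_SHAPE = re.compile(r"^[\x20-\x7e]{10,512}$")

def looks_valid(ua: str) -> bool:
    return bool(UA_SHAPE.match(ua)) and ua.startswith(("Mozilla/", "Opera/"))
```

So a normal Chrome or Edge string passes, while control characters or truncated junk get dropped before counting.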


Hm! It could just be a parsing error. The user agent in question is: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36 Edg/108.0.1462.41


Pretty sure that's the Edge on Mac user-agent.
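A quick sketch of why a naive parser could misread that string: the Edge UA still contains "Chrome/" and "Safari/" tokens for compatibility, so the "Edg/" token has to be checked before the others (function name hypothetical):

```python
def browser_family(ua: str) -> str:
    # Token order matters: the Edge UA also contains "Chrome/" and
    # "Safari/", so "Edg/" must be matched first or Edge is counted
    # as Chrome.
    for token, name in (("Edg/", "Edge"), ("OPR/", "Opera"),
                        ("Chrome/", "Chrome"), ("Safari/", "Safari")):
        if token in ua:
            return name
    return "Unknown"

ua = ("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
      "AppleWebKit/537.36 (KHTML, like Gecko) "
      "Chrome/108.0.0.0 Safari/537.36 Edg/108.0.1462.41")
```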


Coming soon! I separate them out for now, as most scraping tasks require either desktop or mobile useragents and not both together.
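The structure I have in mind is something like this (a sketch with hypothetical list contents, not the actual data): keep the two pools separate so a job draws only from the one matching the site variant it targets.

```python
import random

# Hypothetical pools: desktop and mobile UAs kept separate so a scraping
# job never mixes variants within one session.
DESKTOP_UAS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36",
]
MOBILE_UAS = [
    "Mozilla/5.0 (iPhone; CPU iPhone OS 16_1 like Mac OS X) "
    "AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.1 Mobile/15E148 Safari/604.1",
]

def pick_ua(mobile: bool = False) -> str:
    # Draw a random UA from the pool matching the target variant.
    return random.choice(MOBILE_UAS if mobile else DESKTOP_UAS)
```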


> most scraping tasks require either desktop or mobile useragents and not both together

Why? Do some sites serve completely different content? Or is it simply markup differences? I've never done much scraping, and I'd expect the viewport size to be the decisive factor these days, not the user agent. But, again, I don't know much about that.


Good point, thanks. I'll add that in


Yes, this too. It should just be a -passable- sample of what's popular and seen on the web


I would accept this argument if the sample was unbiased but noisy. In this case it's extremely biased but (potentially) low in noise.

If people from Uganda aren't part of the target audience of this site, we won't get Ugandan user agents even if they happen to be a fair chunk of web users worldwide (certainly more than in my small but high-tech country.)
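A toy simulation of the bias-vs-noise point (all numbers made up): say a hypothetical "UA-X" has 20% share worldwide but only 1% share among this site's audience. A huge sample of the site's logs is very precise, yet lands nowhere near the worldwide figure, and more data doesn't fix it.

```python
import random

random.seed(0)

# Worldwide share of the hypothetical "UA-X" is 20%, but the site's
# audience uses it only 1% of the time. Sampling 100k site visits gives
# a low-noise estimate of the wrong (biased) quantity.
SITE_AUDIENCE_RATE = 0.01
sample = [random.random() < SITE_AUDIENCE_RATE for _ in range(100_000)]
estimate = sum(sample) / len(sample)
print(f"estimated share: {estimate:.3f} (worldwide share: 0.200)")
```

The estimate converges tightly around 1%, not 20%: low variance, large bias.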


I wonder if we can help crowdsource this.


Thanks! And yep, fair comment; I had noticed this as well, even more so in last week's list. I've been thinking about how I could adjust the numbers to counteract it, or add another data source.

