
There is a filter by date and a sort by date. The former works. The latter, when enabled, adds a banner at the top of the page (in large but gray type) that says “Articles added in the last year, sorted by date”, and resets any filter you might have set before.



Was this change ever logged or noted in some way? Or did it just show up one day?


If it ever returned time-sorted results without limit, that was long in the past. It has truncated results to one year for the last several years I have used Scholar.


It seems so intentionally "broken" that I can only guess it is to prevent scraping, since searching for generic-ish search terms and sorting by date is a common scraping strategy.

Still, you'd think they'd do a cutoff of e.g. 500 or 1,000 items rather than filter by the past year.

So I can't help but wonder if it's a contractual limitation insisted on by publishers? Since the publishers also don't want all their papers being spidered via Scholar? It feels kind of like a limitation a lawyer came up with.


PubMed is literally built for academic scraping. It even has a command line interface to access it. If publishers were worried about scraping, they'd target that, but they don't. In fact, when papers go on PubMed after a year they are rehosted by PubMed Central and made freely available to anyone in the world.


Unlikely, since the easy work-around for scrapers is to search by date range and grab things that way. That's what I do now manually.
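For what it's worth, the date-range approach amounts to splitting the period you care about into windows and running one search per window. A rough Python sketch of just the windowing step, assuming one window per calendar year is fine-grained enough; the `year_windows` name is made up for illustration, and nothing here talks to Scholar or any other service:

```python
from datetime import date


def year_windows(start_year: int, end_year: int) -> list[tuple[date, date]]:
    """Return one (first_day, last_day) pair per calendar year, inclusive.

    Hypothetical helper: it only computes the date ranges that a manual
    (or scripted) date-range search would step through.
    """
    if start_year > end_year:
        raise ValueError("start_year must not exceed end_year")
    return [
        (date(year, 1, 1), date(year, 12, 31))
        for year in range(start_year, end_year + 1)
    ]


# Example: list the windows to search, one per year.
for first_day, last_day in year_windows(2019, 2021):
    print(first_day.isoformat(), last_day.isoformat())
```

Each pair would then go into whatever date-range fields the search form offers.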



