Hacker News | dmingod's comments

Yeah, it's sad, because you have this cool middleware that can do so many things, with the whole EIP theory and all the premade plugins.

What is really eating this guy's lunch? I think there is a void there somewhere that's not nicely filled currently: some pain points that are exposed because of its demise. Thoughts?


Is there a way to download the whole archive? People could do cool stuff with it, like visualization.


Yes, but it's huge. At one time you could torrent it piece by piece, but now the link appears broken...

http://libgen.io/scimag/repository_torrent_notforall/

Anyone have the current link?

Good luck extracting much of anything useful out of older PDFs though.


This? http://libgen.io/scimag/repository_torrent/

I'm not sure what the numbers mean, but the last-modified dates on those torrents span a range of 3 years ago to this month.


Yes, that's the one! Those numbers refer to the number of papers in each torrent, so each one contains 100,000 papers, giving a current total of 66+ million.

The torrents of 100,000 are broken into 1000-paper zip archives that can be downloaded individually, so it's pretty manageable if you want to just check out a random sampling of the papers.
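For anyone who wants to pick a random sample, the arithmetic above can be sketched roughly like this. It assumes papers are numbered sequentially from 0 and packed in order, which is only my guess at the layout; `locate` and `sample_locations` are made-up helper names, not anything the repository provides.

```python
import random

PAPERS_PER_TORRENT = 100_000   # from the comment above
PAPERS_PER_ZIP = 1_000         # from the comment above
ZIPS_PER_TORRENT = PAPERS_PER_TORRENT // PAPERS_PER_ZIP


def locate(paper_index):
    """Map a 0-based paper index to (torrent number, zip number inside that torrent).

    Assumes sequential numbering and in-order packing (an assumption, not a
    documented property of the torrents).
    """
    torrent = paper_index // PAPERS_PER_TORRENT
    zip_in_torrent = (paper_index % PAPERS_PER_TORRENT) // PAPERS_PER_ZIP
    return torrent, zip_in_torrent


def sample_locations(total_papers, k, seed=0):
    """Pick k distinct random paper indices and report where each would live."""
    rng = random.Random(seed)
    return [locate(i) for i in rng.sample(range(total_papers), k)]


if __name__ == "__main__":
    total = 66_000_000  # "66+ million", rounded down
    print("torrents needed:", total // PAPERS_PER_TORRENT)
    print("zips per torrent:", ZIPS_PER_TORRENT)
    print(sample_locations(total, 5))
```

Under those assumptions a small sample only touches a handful of the 1000-paper zips.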

I would love to see somebody do some kind of massive-scale analysis of the papers, but just extracting plain text from all those PDFs is a pretty Herculean task, considering that many would need to be OCRed, and others end up pretty garbled or misformatted with pdftotext and the like.
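A minimal sketch of that triage step, assuming poppler's `pdftotext` is installed and on the PATH. The `needs_ocr` heuristic and its thresholds are arbitrary guesses of mine for illustration; they are not tuned on real papers.

```python
import shutil
import subprocess


def extract_text(pdf_path):
    """Run pdftotext on one PDF and return its text ('' if the tool reports an error)."""
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext not found on PATH")
    # A trailing "-" tells pdftotext to write the text to stdout.
    result = subprocess.run(
        ["pdftotext", "-layout", pdf_path, "-"],
        capture_output=True,
        encoding="utf-8",
        errors="replace",
    )
    return result.stdout if result.returncode == 0 else ""


def needs_ocr(text, min_chars=200, min_alpha_ratio=0.6):
    """Guess whether extracted text is too short or too garbled to be usable.

    Both thresholds are illustrative assumptions.
    """
    stripped = "".join(text.split())  # drop all whitespace
    if len(stripped) < min_chars:
        return True  # scanned pages usually yield little or no text
    alpha = sum(ch.isalpha() for ch in stripped)
    return alpha / len(stripped) < min_alpha_ratio
```

Usage would be something like `needs_ocr(extract_text("paper.pdf"))`, sending the flagged files to an OCR pass and keeping the rest as-is.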



I could have sworn they were uploaded to Usenet too. But I can't find it for the life of me.


I thought about mirroring it. The repository DB is 200 MB and simple in structure, but then you would need quite a lot of disk space on your side (20 TB, 200 TB, maybe more; I can't recall).


Your logo is just a reversed CodeIgniter logo... hope you are aware...


+1 to that

