
Please tell me, why is Scrapy better than wget? I can easily call wget from my Python scripts...


wget is synchronous, while Scrapy is built on Twisted, an asynchronous networking engine. This means that you don't need to wait for a request to finish before making another one (or making pancakes, or doing whatever you want).
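To make the "don't wait for one request before starting the next" point concrete, here is a minimal sketch. It uses Python's stdlib asyncio rather than Twisted, and it simulates network latency with a sleep instead of making real requests, so `fake_fetch`, `crawl`, `timed_crawl`, and the 0.2 second delay are all illustrative assumptions, not Scrapy's API.

```python
import asyncio
import time


async def fake_fetch(url, delay=0.2):
    # Hypothetical stand-in for a network request: the sleep yields control
    # to the event loop, the way waiting on a socket would.
    await asyncio.sleep(delay)
    return f"body of {url}"


async def crawl(urls):
    # Start every request at once and wait for all of them together,
    # instead of finishing one before starting the next.
    return await asyncio.gather(*(fake_fetch(u) for u in urls))


def timed_crawl(urls):
    # Run the crawl and report how long the whole batch took.
    start = time.perf_counter()
    bodies = asyncio.run(crawl(urls))
    return bodies, time.perf_counter() - start


if __name__ == "__main__":
    urls = [f"http://example.com/page/{i}" for i in range(10)]
    bodies, elapsed = timed_crawl(urls)
    print(f"{len(bodies)} pages in {elapsed:.2f}s")
```

With these simulated delays the ten requests should finish in roughly the time of one, whereas ten blocking wget calls made one after another would take about ten times as long.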

I essentially wrote a parallelized version of Scrapy that can make hundreds of requests per second, depending on available CPUs. You could never achieve that level of performance using wget.


This is great. I was running threads on a current crawl job, but the real bottleneck is BeautifulSoup, not the network. So splitting the project into threads (while it helped about 10%) wasn't really necessary, and Twisted probably would have done the trick.
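For anyone wanting to check where their own crawl spends its time, a rough sketch of timing the fetch and parse phases separately might look like this. It is not the code from the job above: the stdlib `html.parser` stands in for BeautifulSoup, and `fake_fetch`, `LinkCounter`, and `profile_crawl` are hypothetical names that return a canned page instead of touching the network.

```python
import time
from html.parser import HTMLParser


class LinkCounter(HTMLParser):
    """Counts <a> tags; a stdlib stand-in for a BeautifulSoup parse."""

    def __init__(self):
        super().__init__()
        self.links = 0

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links += 1


def fake_fetch(url):
    # Hypothetical fetch: returns a canned page rather than using the network.
    return "<html><body>" + "<a href='/x'>x</a>" * 2000 + "</body></html>"


def profile_crawl(urls):
    # Accumulate wall-clock time per phase so the slower one stands out.
    totals = {"fetch": 0.0, "parse": 0.0, "links": 0}
    for url in urls:
        t0 = time.perf_counter()
        html = fake_fetch(url)
        t1 = time.perf_counter()
        parser = LinkCounter()
        parser.feed(html)
        parser.close()
        t2 = time.perf_counter()
        totals["fetch"] += t1 - t0
        totals["parse"] += t2 - t1
        totals["links"] += parser.links
    return totals


if __name__ == "__main__":
    print(profile_crawl([f"http://example.com/{i}" for i in range(20)]))
```

If the parse total dominates in a real run, the work is CPU-bound, and neither threads nor an asynchronous engine will help much with that part.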



