I concur with the Nutch vote; more specifically, take a look at the crawler code in the Nutch src trunk, which is written for use with Hadoop. That is probably a good place to start. Also worth a look is Heritrix (the crawler behind archive.org): http://sourceforge.net/projects/archive-crawler
Sadly, this too is written in Java.
The only Python one I am aware of for which code is available is: http://sourceforge.net/projects/ruya/
Edit: You might also want to take a look at http://wiki.apache.org/hadoop/AmazonEC2
Edit2: Polybot is another Python-based crawler, but no code is available. However, the paper has some interesting ideas:
Design and Implementation of a High-Performance Distributed Web Crawler. V. Shkapenyuk and T. Suel. IEEE International Conference on Data Engineering, February 2002. http://cis.poly.edu/westlab/polybot/
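For what it's worth, all of the crawlers above share the same core loop: a frontier queue of URLs, a fetch step, and link extraction feeding new URLs back into the frontier. A minimal single-process sketch of that loop in stdlib Python might look like the following (the `fetch` callable is injected, so in a real crawler it would issue HTTP requests, honour robots.txt, and throttle per-host; the distributed designs in the Polybot paper essentially partition this frontier across machines):

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects href targets from <a> tags, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(urljoin(self.base_url, value))


def crawl(seed, fetch, max_pages=10):
    """Breadth-first crawl starting from `seed`.

    `fetch` is any callable mapping a URL to an HTML string.
    Returns the list of URLs visited, in crawl order.
    """
    frontier = deque([seed])   # URLs waiting to be fetched
    enqueued = {seed}          # everything ever added, to avoid re-crawling
    visited = []
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        try:
            html = fetch(url)
        except Exception:
            continue  # skip unreachable pages
        visited.append(url)
        parser = LinkExtractor(url)
        parser.feed(html)
        for link in parser.links:
            if link not in enqueued:
                enqueued.add(link)
                frontier.append(link)
    return visited
```

You can exercise it offline by passing a fake `fetch` that serves pages from a dict, which is also a convenient way to unit-test crawl order and deduplication.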