Pretty cool analysis! Scrapy is great but if you're looking to extend this further, I recommend checking out the 3taps API. 3taps is the only API I've seen that breaks down the different components of a Craigslist post (e.g. heading, price, number of bedrooms) and makes it available for scraping. Using 3taps will also make it easy for you to extend the analysis to new cities (you literally change a single parameter). And you should set up a cron job (or Heroku scheduler task if a web app) so you don't have to run the scraper manually every day :-)
For an example of automated Craigslist scraping, you can check out a search tool I built (craigslist-scraper.herokuapp.com) and the accompanying tutorial (baserails.com/apiscraper). Both are best viewed via laptop.
I like the varied analysis! Though, if you want to build a prediction model, it's probably best to avoid variables that can be gamed, e.g. # of pictures in a listing... For the larger zip codes, a heat-map of prices might help cut them down to more consistent behavior for prediction. Keep iterating - would love to see your progress over time!
I'm curious about this too, especially since OP says they'd like to continue scraping. I've heard of other people crawling craigslist getting banned pretty fast and even one site getting slapped with a lawsuit [0]. (Which looks like it was not entirely successful for craigslist.)
Craigslist has even taken some measures to get their users to assign copyright of the content over to craigslist themselves [1].
I scraped it probably at least 50 times and nothing has happened. I heard that craigslist only blocks IPs that direct a lot of traffic toward their sites like Padmapper did.
Fun fact - many apartments run by management companies / owned by institutional investors dynamically price their units on a daily basis based on the number of phone inquiries, walk ins to the leasing office, occupancy rates in the neighborhood, and many other factors.
Yes, and scraping their publicly-posted apartment rents daily for a couple months can reveal patterns that'll save you a decent chunk of your lease.
Biggest determinant is supply of a particular unit size, though. If there are five one-bedrooms available in a yield-managed complex, you're getting a better deal relative to the building than if there's only one one-bedroom remaining - sometimes to the tune of hundreds of dollars a month.
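If you've been logging a complex's posted rents daily, a rough way to see the supply effect is to group the observations by how many units of that size were available that day. This is just a sketch: the field names (`available`, `rent`) and the sample numbers are made up for illustration, not real data.

```python
from collections import defaultdict
from statistics import mean


def rent_by_availability(observations):
    """Average the observed rent for each count of available units.

    `observations` is a list of dicts with hypothetical keys:
    'available' (units of this size listed that day) and 'rent'.
    """
    buckets = defaultdict(list)
    for obs in observations:
        buckets[obs["available"]].append(obs["rent"])
    # Sort by availability so the trend is easy to read.
    return {n: mean(rents) for n, rents in sorted(buckets.items())}


# Made-up daily observations for one-bedrooms in a single complex.
sample = [
    {"day": 1, "available": 5, "rent": 1400},
    {"day": 2, "available": 5, "rent": 1420},
    {"day": 3, "available": 1, "rent": 1650},
]

summary = rent_by_availability(sample)
for available, avg_rent in summary.items():
    print(available, avg_rent)
```

With a couple of months of real observations you'd want to control for time of year too, but even this simple grouping should show whether rents drop when more units are open.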
A few other products out there apart from YieldStar include LRO and Rent Maximizer.
It's fascinating how much goes into the pricing. It's all based on quite a few variables, ranging from availability (like you mention), competing communities in the area, lease end date optimizations (I don't want >X leases expiring in one month, since that's a lot of work for my leasing staff), time of year, and price history. And on top of this, prices can change daily.
I've had the chance to work on a revenue management product myself--and I can safely say that it's changed my perspective as a renter. There's a lot that can go into the pricing.
My roommate swears that checking the posting a few times can raise the price. Is the number of clicks/views on a posting taken into account when prices are changed daily?
No systems I've seen use that specific data point. Sadly for communities (and probably good for prospective renters), the third party sites you might find listings on (Craigslist, Apartments.com, ForRent.com, etc.) don't share that data, as far as I've ever seen.
However, demand, which is usually a variable in the pricing equations, will definitely be generated by the lead tracking the community does, which can include: # of incoming phone calls, # of walk-ins, # of incoming emails.
tl;dr: Clicking on the ads won't do anything. Taking action and reaching out to the community can do something. And a lead is usually tracked on a unique user basis--so one person calling 50 times vs. 50 people calling once is vastly different.
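To make the unique-user point concrete, here's a minimal sketch of how lead counting might work. This is not any particular product's implementation; the `(contact_id, channel)` record format and the `count_unique_leads` name are assumptions for illustration.

```python
def count_unique_leads(contacts):
    """Count distinct people who reached out, ignoring repeat contacts.

    `contacts` is an iterable of (contact_id, channel) tuples, where
    contact_id is whatever hypothetical key identifies a person
    (phone number, email address, etc.).
    """
    return len({contact_id for contact_id, _channel in contacts})


# One person calling 50 times...
one_caller = [("555-0100", "phone")] * 50
# ...versus 50 different people calling once each.
fifty_callers = [(f"caller-{i}", "phone") for i in range(50)]

print(count_unique_leads(one_caller))
print(count_unique_leads(fifty_callers))
```

Both lists have 50 contact records, but the first counts as a single lead and the second as 50, so only the second would look like real demand to the pricing side.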
Insightful read. Your mention of text mining in "Future Look" reminds me of Levitt's findings regarding sales correlations with keywords in real estate listings.