I think it's great that two well-known economists with household names in the academy have taken to marketing programming to economists. I just wish they had called this venture something other than Quantitative Economics when they launched it last year. It's not that the name is wrong, it's just misleading.
A better name would have been Quantitative Macroeconomics, because that is clearly the target market. Yes, I know there are some generic topics in there like Linear Algebra that both micro and macro economists could use, but it's not like there is a lack of numpy and scipy examples on the Internet.
There are applications of job search in both micro and macro (I happen to have written a PhD dissertation involving a micro approach to job search).
The treatment in the book we are discussing is decidedly macro, and looks generally similar to the treatment in a previous Sargent book titled "Recursive Macroeconomic Theory."
What are you talking about? Did you even look at their section on job search? It is most definitely a macro application.
Disregarding the fact that job search is a subtopic of unemployment, one of the main concepts in macro, you'll notice their model parameters are human capital, investment, and the wage. The only quasi-micro flavor of that model is search effort, since these variables will almost always be in equilibrium based on some kind of optimal stopping rule. Whether that rule has to do with finding a new job or with discovering information about, say, prices in the market dictates whether it is a micro or a macro use case. Again, if you actually looked at the model, their optimal stopping rule is the point at which the person accepts another job. And once again, jobs are the main ingredient in the concept of unemployment -- a macro topic. Not to mention, whether or not I'm searching for a job dictates whether I'm factored into the unemployment rate.
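For anyone who hasn't seen this class of model, here's a toy version of that optimal stopping problem in Python/numpy (my own quick sketch with made-up parameters, not the code from their lectures): the worker keeps drawing wage offers, and the stopping rule collapses to a reservation wage.

    import numpy as np

    # Toy McCall-style job search model (illustrative only; numbers are invented).
    beta, c = 0.96, 25.0                  # discount factor, unemployment compensation
    w = np.linspace(10, 60, 60)           # grid of possible wage offers
    p = np.ones_like(w) / w.size          # uniform offer distribution

    v = w / (1 - beta)                    # initial guess for the value of each offer
    for _ in range(1000):
        accept = w / (1 - beta)                # value of taking the job at wage w
        reject = c + beta * (v * p).sum()      # value of waiting for a fresh draw
        v_new = np.maximum(accept, reject)
        if np.max(np.abs(v_new - v)) < 1e-8:
            v = v_new
            break
        v = v_new

    # Optimal stopping rule: accept the first offer at or above the reservation wage.
    reservation_wage = w[np.argmax(accept >= reject)]
    print(reservation_wage)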
"In microeconomics, search theory studies buyers or sellers who cannot instantly find a trading partner, and must therefore search for a partner prior to transacting."
"Search theory has been influential in many areas of economics. It has been applied in labor economics to analyze frictional unemployment resulting from job hunting by workers."
Yeah, this is why Wikipedia is not always the most trustworthy source.
I definitely studied these types of models in macro classes from Pissarides, one of the pioneers in the job search field. I think he'd be amused to learn that he had become a microeconomist.
Also, if you look at the example, it is taken directly out of Sargent's textbook: Recursive Macroeconomic Theory.
You bring up a good point in that the use cases for each language vary. R is a good example because one of the main reasons it gets chosen is that it already has the largest set of drop-in econometrics routines for tons of extremely specific situations. In any other language, you'd often have to code a bunch of complex linear algebra by hand, which would totally negate any marginal benefit in computation time.
In terms of econometrics, Python is going to surpass closed-source software packages like SPSS, SAS, EViews, etc. soon, simply because the rate at which econometric procedures are being implemented in Python keeps growing (e.g. via a statsmodels/pandas/numpy/scipy based stack). I don't think Python will ever pass R in this respect, though, as R has a much larger share of statisticians contributing implementations than Python does.
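Just to illustrate how little ceremony that stack requires, here's a throwaway example (the data and variable names are invented, not from any actual study):

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm

    # Made-up data standing in for a real problem.
    np.random.seed(0)
    df = pd.DataFrame({"schooling": np.random.normal(12, 2, 500),
                       "experience": np.random.normal(10, 5, 500)})
    df["log_wage"] = (1.0 + 0.08 * df["schooling"] + 0.02 * df["experience"]
                      + np.random.normal(0, 0.3, 500))

    # OLS with heteroskedasticity-robust (HC1) standard errors.
    X = sm.add_constant(df[["schooling", "experience"]])
    print(sm.OLS(df["log_wage"], X).fit(cov_type="HC1").summary())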
I used to be gung-ho about convincing economists to use Python (I started econpy.org in the first year of my PhD in economics and stopped updating it after two years). But now I just don't care. The vast majority of economists are horrible programmers; however, most know how to script in at least one language, and a small number of them are actually really good at scripting. An extremely small number of economists know their way around at least one entire general-purpose programming language.
It seems that every time this gets brought up online a sea of infamous Fortran-programming-economists always tell you how fast Fortran is compared to everything else, signing it with "Fortran or GTFO". I've met hundreds/thousands of economists and econ grad students at schools ranked from 1 to 200, and I don't recall a single one that actually used Fortran (I'm sure there is a non-zero quantity -- I probably just haven't cared to talk with them b/c they are most likely macro theorists).
The fact of the matter is that in economics graduate school, your professors couldn't care less what language you use. It isn't an algorithm competition, it's a storytelling competition. But the kind of storytelling economists do is no less noble than writing elegant algos. Writing a story based on economic mechanisms and behaviors and supporting it with quality data, sound statistics, and logical, exact theories is no simple task.
"The vast majority of economists are horrible programmers, however most know how to script in at least 1 language, and a small number of them are actually really good at scripting. An extremely small number of economists know they're way around at least 1 entire general purpose programming language."
As an economist whose hobby is the study of programming languages, you can imagine the frustration I feel. What I have found works best is to make grad students use basic functional programming techniques. Everything they do ends up just a few lines long, as opposed to a 250-line heap of garbage with three nested for loops.
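To make that concrete, here's the kind of contrast I have in mind, using a made-up "average outcome by group and year" task (not anyone's actual research code):

    from itertools import groupby
    from statistics import mean

    # Invented records standing in for whatever the grad student is crunching.
    rows = [{"group": "a", "year": 2012, "outcome": 1.0},
            {"group": "a", "year": 2012, "outcome": 3.0},
            {"group": "b", "year": 2013, "outcome": 4.0}]

    # Loop-and-mutate style (the kind of thing that balloons to 250 lines):
    totals, counts = {}, {}
    for r in rows:
        k = (r["group"], r["year"])
        totals[k] = totals.get(k, 0.0) + r["outcome"]
        counts[k] = counts.get(k, 0) + 1
    means_loopy = {k: totals[k] / counts[k] for k in totals}

    # Functional style: sort, group, map a pure function over each group.
    key = lambda r: (r["group"], r["year"])
    means = {k: mean(r["outcome"] for r in grp)
             for k, grp in groupby(sorted(rows, key=key), key=key)}

    assert means == means_loopy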
I'm sorry, but that's a wasted emotion. Economists are not software developers. They do the minimum they need and nothing more. It's like complaining that a masseur has poor baking skills.
Get over it! It's a support tool for them, and until they demand that you take their code as a perfect example, developers have no reason to complain.
It's more like saying a masseur doesn't have a deep knowledge of anatomy. For most healthy people it wouldn't be an issue if the masseur just did what they were taught, but they'll consistently be suboptimal and will occasionally do some real damage. See the recent high-profile errors from economists using Excel.
When you have to review their code, you have a very good reason to care about what it looks like. I also feel that I have an obligation to do what I can to improve the accuracy of our research.
"They do the minimal they need and nothing more."
In other words, they work on their program until it runs to completion without throwing an error message.
Hey, that's really neat. I'm doing a post-masters right now in energy economics and have been transitioning into using python for everything (from previously using Matlab or whatever was available), but I'm eyeing an economics PhD down the line.
Do you have any specific questions? I teach in a PhD-granting department and do research in energy economics. (I'm an associate editor of one of the energy field journals.)
Huh, some of your papers sound like ideas I sketched out in my commonplace book (particularly the ones on oil prices in the context of the larger economy), looks like I'm a few years behind.
I guess my biggest concern is the viability of pursuing a PhD in economics without having a previous degree in econ. I have a BS and MS in environmental science and an MPA in energy policy/economics. I think I have stronger scientific chops than many economists, but I don't know how valuable that actually is.
Also, it seems to me that the general market for energy economists is very strong, and will remain that way for quite some time. Do you think that's accurate?
I love plowing through data and developing methods and analysis to try and draw out relationships and show why things are what they are. I know that sounds incredibly vague, but as far as I can tell economics is the field that most closely aligns with this in the context of energy and the environment.
"I guess my biggest concern is the viability of pursuing a PhD in economics without having a previous degree in econ."
It's hard to say without knowing more about your background, but many have done it before. A degree in econ is useful but by no means necessary.
You should probably worry more about your math training (have you taken real analysis or another course with real proofs?) than your econ training. I had three calculus classes and linear algebra. The first week of classes the professors were talking about quasiconcavity and the Bellman equation. I was lost. I worked 80-90 hours a week the first year in order to catch up.
"Also, it seems to me that the general market for energy economists is very strong, and will remain that way for quite some time. Do you think that's accurate?"
I have no specific data, but yeah, the market for economists in most fields is strong. Our program is not highly ranked, yet almost all of our PhDs are able to get tenure-track jobs with reasonable teaching loads. To my knowledge, environmental/energy policy is a hot field right now, particularly if you can bring in grants.
"as far as I can tell economics is the field that most closely aligns with this in the context of energy and the environment"
You should also consider PhD programs in agricultural economics or environmental economics. Highly-ranked economics programs often have a theoretical/mathematical focus that doesn't fit with what you want to do. I know from experience that energy isn't fully respected in economics departments.
I actually have a friend in Cornell's PhD economics program right now, and he's warned me about math (and he majored in math) and that Real Analysis in particular is crucial. That's what most concerns me, as I've leaned on my physics-PhD significant other in the past for some math-related help.
That's a really good point about field-specific economics programs, I hadn't really considered that, but it makes a lot of sense given my interests and current abilities. It seemed to me that energy was weirdly neglected when looking through programs (even though it has so many neat peculiarities and inefficiencies), so it's good to know I'm not the only person who has noticed that.
Thankfully, I'm not looking at it for the immediate future as my current position will last a few years (we have boatloads of funding and bipartisan support), and I'm at possibly the easiest national lab to get a staff position and advance without a PhD (Oak Ridge). Down the line though, I know I love academia and teaching, so I'll probably go back, and your advice is super helpful to think about in the interim, thanks again!
I have been researching the used car market for the last year in an academic lab. I would love to share stories about data collection and management if the Carlypso folks are interested. I'm not looking to be a consultant, to be consulted, or to land any kind of "gig". As I'm sure they know, used car data is a crazy world of imperfect data points, especially as a car gets older (beyond about 6 years old, I'd say data quality begins to break down fast). What proportion of your data is scraped, purchased, or maybe even obtained from a free API like Edmunds.com? I doubt you're using Edmunds, as I didn't catch any affiliation info on the site.
I laughed at the "oh yeah, that makes sense" story of the big truck being priced higher in TX. My go-to story is always the price of convertibles in winter in MN vs. TX, or the price of an AWD sedan in winter in MN vs. TX.
Speaking of AWD sedans, I know it's a rare car so this is nitpicky, but your 2008 Saab 9-3 data classifies a Turbo X as an Aero. Also, the Turbo X trim isn't listed for the wagon. At any rate, your estimate for the sedan Turbo X is about 20% too low. Granted, they only made 600 for the US market, but I was still bummed that a car which is so dear to me had this issue. Haha, I'm not trying to be that guy whatsoever! In fact, I was happy when I saw Saab was even listed as a make, as many don't even acknowledge them anymore! :(
Anyways let me know if you guys are up for a friendly talk.
@Zissou - I'm Chris, a Carlypso co-founder. I'm always up for a friendly talk!
Getting the pricing to be reliably accurate was one of the biggest challenges. We tried many of the third-party tools only to realize that most were not accurate for our needs.
We do a few things that distinguish our pricing from other sources. Granted, there are anomalies, but here are a few brief guiding principles:
(1) As you pointed out, the region matters -- a truck in TX is not the same as one in central San Francisco.
(2) A listing is not the same as a sale. Dealers often attempt to hold gross margin when a car comes in, then gradually reduce to market rates to sell the car within 30-45 days (e.g. price it high to start and then lower it). We measure how long a given car has been listed on a dealer's site, and often cars listed at a higher price only sell after they are reduced to a lower price. We can do some validation of the final negotiated price by pulling DMV records and comparing those values to final listing prices. Fewer and fewer cars have a large negotiating margin.
(3) You can measure the relative demand and supply in the vehicle market by looking at the flow rate of the vehicle relative to the total local market supply (e.g. measure how many Civics sell in a given month relative to how many are available that month -- a rough sketch of this appears at the end of this comment).
(4) Everyone claims their vehicle is perfect, but every vehicle we inspect needs some level of refurbishment, so we factored that into our pricing to reflect "average" condition. If a car truly is perfect, we're more than happy to help sell it for more than we predicted; we just prefer to be direct and honest upfront rather than reduce prices after the inspection occurs.
(5) The price floor is always set by what someone else would pay in very short notice. This is most easily observable by looking at auction values.
As a side note, pricing a rare car is virtually impossible -- we can only price cars where there's a significant market and low levels of heterogeneity. The variance on a 1967 Porsche 912 can't be estimated by traditional models, since a numbers-matching car with Fuchs wheels and three gauges is worth more than a restored car with a non-numbers-matching engine, five gauges, and re-welded floor pans.
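To illustrate point (3) above, here's roughly the kind of flow-rate calculation I mean, written as a toy Python sketch (the numbers and column names are invented; this isn't our production code):

    import pandas as pd

    # Toy listing snapshots for one local market (invented data).
    listings = pd.DataFrame({
        "model": ["civic", "civic", "civic", "f150", "f150"],
        "month": ["2014-05"] * 5,
        "sold":  [True, True, False, True, False],   # disappeared from listings
    })

    # Flow rate = cars that sold this month / cars available this month.
    grouped = listings.groupby(["model", "month"])["sold"]
    flow_rate = grouped.sum() / grouped.count()
    print(flow_rate)   # higher = demand is strong relative to local supply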
I just checked out your site and wanted to give a little feedback, since I like the idea after some mulling.
Buying a car is somewhat annoying. There is no good way to filter on things such as transmission, or to search on anything at all, such as location or price. I have to go through the whole list, and as you grow that will no longer be viable; it was already a pita.
Also, when I go to an individual car there are very few details. The photos are good, and that's what a lot of terrible CL ads lack, but it would be nice to have the specific engine in the car, for example, since many cars offer several options there. I also expected a list of options and standard equipment like CarMax gives.
Finally, you should include the VIN. I always have a friend run it for me before I even see a car; that has saved me so much time.
Thanks for the info and I do look forward to the new API!
However, my question still remains with regards to historic posts/comments. The historic aspect is really the important element here. Generally speaking, building an ngram viewer requires a collection of texts over time, with each text having some kind of metadata that is categorical, boolean, datetime, or numeric. Categorical data can always be derived from numeric data by creating bins -- e.g. posts by people whose karma or ranking was 1-50, 51-150, 151-300, etc. at the time the comment/post was created. Datetimes can also be made into useful categorical variables for an ngram viewer, such as day of the week (to spot weekly seasonality) or day of the year (annual seasonality).
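For instance, with pandas that kind of metadata prep is only a few lines; the column names and bin edges below are just hypothetical stand-ins for whatever HN could actually provide:

    import pandas as pd

    # Hypothetical comment metadata: timestamp and author karma at posting time.
    comments = pd.DataFrame({
        "created": pd.to_datetime(["2013-01-07", "2013-06-15", "2014-02-03"]),
        "karma":   [42, 180, 950],
        "text":    ["first comment", "second comment", "third comment"],
    })

    # Numeric -> categorical via bins (e.g. karma tiers at posting time).
    comments["karma_tier"] = pd.cut(comments["karma"],
                                    bins=[0, 50, 150, 300, float("inf")],
                                    labels=["1-50", "51-150", "151-300", "300+"])

    # Datetime -> categorical features for seasonality (weekly / annual).
    comments["day_of_week"] = comments["created"].dt.dayofweek
    comments["day_of_year"] = comments["created"].dt.dayofyear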
If I were allowed, I would be willing to write a scraper/crawler to discover as many historic threads as possible (since threads -> comments) using HNSearch, but this could take a long time depending on rate limits and/or be subject to unknown biases in my discovery method. I'm sure you can understand why a "top-down" approach like a database dump would make for a much higher-quality corpus than attempting the "bottom-up" approach of a crawler. I have no idea if a "database dump of everything" is even feasible, as I don't know anything about HN's backend infrastructure. However, if it is feasible, then I'm certain I can work with whatever would be available. Adding structure to unstructured data is my bread and butter.
I really think this would be a very cool tool that a lot of people would enjoy, so I'm willing to do what is needed on my end to help make it work. After all, I'd be on the clock while working on this rather than just a hobby project, so the incentives are definitely aligned on my end.
If you want to discuss anything in private, I can be reached at the following reversed address: moc{dot}liamg{at}yalkcin{dot}wehttam
In your original business model you wanted to understand the price of everything. In what ways did the problem of a lack of information on the demand side come up? That is, it is easy to scrape the price in many markets (supply side), but what kinds of conversations came up within your team about the lack of information on how many units were actually sold at a posted price?
By the way, glad to see you guys were able to make a business out of crawling. I've landed a handful of freelance gigs since leaving grad school based on scraping data for clients, but never tried to expand it to anything beyond consulting projects.
Not an economist, but I have been mulling over a project surrounding scraping and pricing.
Without having access to the actual monetary transaction data, how does one know what was sold and for how much? Without this (or a mechanism by which the lister closes or updates the listing), how do you know anything was actually sold?
"how does one know what was sold and for how much?"
For example, with domain name sales, only a small number of transactions' prices are public. I've been doing it for 16 or 17 years and have never made any of my data public, nor have the people I've consulted for.
Another example might be commercial rents. You can track asking rents but you can't really get a handle on actual rent paid since there are many deal factors (renovations, free rent, triple net etc.) that would change the numbers significantly.
Also an economist and a data scraper/consultant here -- depending on the data, sometimes all you need to figure out is correlation: frequency of updates, listings being live for X time, clusters of listings around Y days, etc.
In terms of a few real-life examples, on the one hand you have eBay, which provides you with sold data (API through Terapeak). On the other hand you have Craigslist, which is kinda opaque and hates scraping, but you can monitor listings and their half-life. (Listings that disappear quickly presumably sold fast; listings that stick around for weeks, relisted over and over, presumably have lower liquidity and/or are priced too high.)
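A crude version of that half-life calculation looks something like this (toy numbers, and of course "disappeared = sold" is only a noisy proxy):

    import pandas as pd

    # Toy scrape log: when each listing was first and last seen live.
    scrapes = pd.DataFrame({
        "listing_id": [1, 2, 3, 4],
        "price":      [4500, 5200, 14800, 15500],
        "first_seen": pd.to_datetime(["2014-05-01", "2014-05-01", "2014-05-03", "2014-05-03"]),
        "last_seen":  pd.to_datetime(["2014-05-06", "2014-05-25", "2014-05-10", "2014-06-20"]),
    })

    # "Half-life" proxy: how long a listing stays up before disappearing.
    scrapes["days_live"] = (scrapes["last_seen"] - scrapes["first_seen"]).dt.days

    # Median days live by price bucket -- short-lived buckets are presumably priced
    # at or below market; long-lived ones are presumably priced high.
    scrapes["price_bucket"] = pd.cut(scrapes["price"], bins=[0, 10000, 20000])
    print(scrapes.groupby("price_bucket")["days_live"].median())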
eBay's completed listings is definitely one of the best sources of sales data on the Internet that I'm aware of. Besides that, in some cases there are ways to imperfectly estimate quantities when best-seller rankings are available (e.g. at Amazon) -- Chevalier and Goolsbee were the first to suggest this approach back in 2003.[1]
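The basic idea -- and this is only a generic sketch of the approach, not their exact calibration -- is to assume a power-law relationship between sales rank and quantity, pin it down with a few rank/quantity pairs you can observe, and then map any observed rank to an estimated quantity:

    import numpy as np

    # A few (rank, weekly quantity) calibration points you somehow observe,
    # e.g. from your own products. These numbers are invented.
    ranks = np.array([100, 1000, 10000])
    qtys  = np.array([200,  40,    8])

    # Power law: log(quantity) = a + b * log(rank); fit a and b by least squares.
    b, a = np.polyfit(np.log(ranks), np.log(qtys), 1)

    def estimated_quantity(rank):
        """Map an observed sales rank to an estimated quantity sold."""
        return np.exp(a + b * np.log(rank))

    print(estimated_quantity(500))   # rough estimate for a rank-500 item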
As you mentioned, monitoring half-life is another imperfect approach, but it is of course plagued by false positives (a listing goes away but no sale was made). There was a Google Tech Talk many years ago where some economists took this approach[2], except they were looking at pricing power instead of measuring quantity sold.
As a long-time pandas user, I'd say this is one of the better write-ups I've seen that illustrates the versatility and functionality of the Series and DataFrame objects without being too long-winded.
Just one thing to point out regarding the final example: read_csv will actually fetch a URL if it's given as input, so there is no need to use urllib2 and StringIO. Instead, you can just do:
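    import pandas as pd

    # Placeholder URL -- the original example's URL isn't reproduced here.
    df = pd.read_csv("http://example.com/data.csv")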
It's my creation. :)