deadline because for the 500th time someone promised something impossible to the client
This is a killer in machine learning applications. The toolsets rarely cover the entire extent of what needs to be done, so at least some custom code needs to be written. But results aren't deterministic - you don't really know if it's going to work until you run it. Several iterations are often needed to get to the first usable results. It has all the problems of building any piece of software, plus another layer of risk that the accuracy just won't be there with the first thing(s) you try.
My point is... actually agreeing to be the machine learning guy on a project totally sucks because time estimates are almost meaningless, and the modern business culture is to label anything late as a failure.
I couldn't agree more. Accuracy is a problem, variation is another problem. Dealing with layers in the business who have no math or statistics background but very strong opinions is yet another complication.
These types of conversations aren't uncommon.
Other - "I need you to prove our stuff does X, Y, and Z".
Me - "Ok.."
<time elapses>
Me - "Ok the data shows our stuff does X but Y and Z are just random noise"
Other - "We ran it once before with this other guy and it showed our stuff did X,Y and Z. We've been promising it to our clients for a year. He gave us several examples, but when the clients asked to see the underlying data he couldn't produce it. So we just need you to prove it does X,Y, and Z."
Me - "The data only shows it does X. Y and Z are impacted positively through X, but once you condition on X, Y and Z are not causally affected by our stuff"
Other - "Yeah...well I promised client we would give them a report by {{insert random ridiculous date here}} proving it did X, Y and Z. We are going to lose them if we don't deliver a report saying that"
Me - trying for the 50th time to explain they shouldn't promise a positive result when we've never looked at the data.
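To make the 'condition on X' point concrete, here is a minimal sketch with made-up synthetic data (none of the numbers come from any real project): a treatment T genuinely improves X, X improves Y, so Y looks correlated with T - until you control for X, at which point T's apparent effect on Y disappears.

    # Synthetic illustration only: "our stuff" (T) improves X, X improves Y,
    # so Y looks correlated with T -- but conditioning on X removes the effect.
    import numpy as np

    rng = np.random.default_rng(0)
    n = 10_000
    T = rng.integers(0, 2, n)            # did the client use "our stuff"?
    X = 2.0 * T + rng.normal(0, 1, n)    # X genuinely improves with T
    Y = 3.0 * X + rng.normal(0, 1, n)    # Y depends only on X, not directly on T

    def ols(y, *cols):
        """Least-squares coefficients for y ~ intercept + cols."""
        A = np.column_stack([np.ones(len(y)), *cols])
        return np.linalg.lstsq(A, y, rcond=None)[0]

    print("Y ~ T     :", ols(Y, T))      # large T coefficient (it proxies for X)
    print("Y ~ T + X :", ols(Y, T, X))   # T coefficient collapses toward zero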
There are hundreds of variations on this conversation. "Your code is wrong" is one variant (which, depending on the timeline, is hard to dispute). Of course, if you take long enough to be sure your code is correct, then you're going too slow. "This isn't a science experiment, just make it work" is another. Watching someone go slack-jawed and start drooling because you accidentally used a math term is always interesting.
I have a whole new perspective on being on the cutting edge. It seems like it mostly means you are on the cutting edge of comments from people who don't know how ridiculously hard what you are doing is.
Man, this is statistics. You should be able to get any result you want!
I'm only half kidding. I can remember writing my first report (project summary) when I did a contract right after grad school. I put in maybe 5 graphs. Two looked good, three looked bad. The project manager just deleted the bad looking graphs and sent it on to the client.
The company I used to work for had a performance-based product. They only got paid if they actually showed improved accuracy against a given evaluation set. Then they got a fraction of the cost savings (say, one year's worth).
This seems like it could be a good model for machine learning consulting, and one that I would certainly be willing to explore.
It would work something like this:
1) You show me your problem and your data.
2) We come to an agreement on how accuracy would translate into financial results, and on a fair split of the savings or earnings.
3) I develop a model.
4) You evaluate it based on 2.
5) I get paid based on 2.
If my model doesn't meet the minimum performance criteria, I don't get paid. If it does very well, and assuming the problem was economically interesting in the first place, you save a lot of money and I get a fair-sized chunk of it.
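As a rough sketch of step 2 and the payout rule, the fee calculation could look something like this. The split, the minimum-improvement threshold, and the dollar figures are purely hypothetical assumptions for illustration, not terms anyone actually proposed:

    # Hypothetical payout rule -- split, threshold, and dollar figures are
    # illustrative assumptions, not numbers from the discussion above.
    def consultant_fee(baseline_cost, model_cost, split=0.25, min_improvement=0.02):
        """Pay a fraction of first-year savings, but only past a minimum bar."""
        savings = baseline_cost - model_cost
        if savings <= min_improvement * baseline_cost:
            return 0.0                  # below the agreed minimum: no fee
        return split * savings          # otherwise a fixed share of the savings

    # e.g. the current process costs $1M/yr and the model cuts that to $800k
    print(consultant_fee(1_000_000, 800_000))   # -> 50000.0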
Feel free to explain why this business model wouldn't work.
Most business people aren't interested in model accuracy as a term. They want something that provides benefit, e.g. cost savings, increased revenue, increased profits, etc.
The sales process of convincing someone they need an accurate model is tough, especially because robust models are time consuming and expensive to build.
If you can come up with a model that shows good results, and people know they need those results, then you can start a company selling either a service or product to get those results. If people don't know they need your results - then you have to educate them, in which case it's a much more difficult business to start.
I don't know many business people with the temperament, understanding, or pocketbook to deal with general research-type problems.
I think there will be a day when this will work. But right now my concerns would be:
1) The people looking for outside help probably don't have ANY model, so baselines are difficult. (I use synthetic baselines, like just predicting the average of the predicted variable every time; that's a very valuable tool, but I don't think you could get paid just for beating them. See the sketch after this list.)
2) When would you cut your losses on a failed project and move on? That would be incredibly difficult to do on a project that you had spent weeks or months on and not been paid. It's like cutting your losses on a failed trade in the stock market... it seems like it would be easy and obvious until you actually experience it yourself.
3) Once a company was in a position to take you up on their offer, they might as well post it on Kaggle and get a hundred people to work on it for peanuts. I'm really rooting for Kaggle b/c one of their long term goals is to let people like you (and me) make a living doing analytics work like you describe. But right now they just don't have the volume and all the projects pay out just a few thousand dollars (and only if you beat the other hundred participants).
4) If I were a company, and didn't have the expertise in house to build the model myself, I'd be wary that I was really getting what I paid for. If I'm paying for a 10% boost in accuracy, how do I measure it rather than just taking your word for it?
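To make points 1 and 4 concrete, here is a minimal sketch (all data and numbers are made up) of the kind of synthetic baseline I mean - "always predict the training average" - scored with mean absolute error on a test set the client holds back. Because the client computes both error numbers themselves, the claimed boost isn't just the consultant's word:

    # Made-up data: a mean-predicting baseline vs. a model, scored by the client
    # on a held-out test set. Everything here is an illustrative assumption.
    import numpy as np

    rng = np.random.default_rng(1)
    y_train = rng.normal(100, 20, 500)          # client's historical values
    y_test  = rng.normal(100, 20, 200)          # held back by the client
    y_pred  = y_test + rng.normal(0, 10, 200)   # stand-in for the model's predictions

    baseline_mae = np.mean(np.abs(y_test - y_train.mean()))   # "predict the average"
    model_mae    = np.mean(np.abs(y_test - y_pred))            # consultant's model

    print("baseline MAE:", round(baseline_mae, 1))
    print("model MAE   :", round(model_mae, 1))
    print("improvement :", round(1 - model_mae / baseline_mae, 2))  # verifiable boost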