
> A very nice thing about Monte Carlo simulation is that at the end your distribution of results are all within the feasible range.

I guess Monte Carlo helps provide conservative estimates behind a façade of rigour, but the truth of the matter is that in the end it's still GIGO.

Any empirical distribution only reflects the empirical measures that were used to generate it. If you bundle everything from the time it took employee A to walk the dog while allocated to project Foo to the time it took employee Z to fix a nasty Heisenbug while allocated to project Bar, and Foo and Bar used totally different tech stacks and team members and even approaches to project planning, that distribution is meaningless in estimating, say, how much time it will take employee G to implement a React widget.




I’m smiling a bit while reading your comment. You are completely correct that that estimate is virtually meaningless. And yet… that doesn’t mean it’s completely useless!

I’ve been using a somewhat meaningless Monte Carlo inspired approach to project planning for a while in my consulting company. The vast majority of the projects I do have only a small amount in common with previous projects, so when I’m estimating I’m informed by past estimate vs actual numbers, but not really relying on much other than intuition and gut feelings from past work.

My basic approach is to estimate 2- or 3-sigma upper/lower estimates for each task, modelled (incorrectly) as a Normal distribution in hours, and then sum the random variables to come up with a final distribution (whose relative spread, sigma as a fraction of the mean, is quite a bit smaller than what you see on the original bag of tasks). From there, if I am making a quote, I’ll quote at +3-sigma x hourly rate as a high-end effort estimate. If it seems like a very meeting-heavy client, I’ll either add meetings in as development tasks or just pad it out by a percentage or fixed hours/week.
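The summing step above can be sketched in a few lines. The task names, hour figures, and the reading of the low/high bounds as a ±3-sigma interval are all invented for illustration, not the parent commenter's actual numbers:

```python
import math

# Hypothetical tasks: (low, high) effort estimates in hours, treated as
# a +/- 3-sigma interval of a per-task Normal distribution.
tasks = {
    "auth flow":     (8, 24),
    "React widget":  (4, 16),
    "API endpoints": (12, 40),
}

total_mean = 0.0
total_var = 0.0
for low, high in tasks.values():
    mean = (low + high) / 2    # midpoint of the interval
    sigma = (high - low) / 6   # 6 sigma span between low and high
    total_mean += mean
    total_var += sigma ** 2    # variances of independent Normals add

total_sigma = math.sqrt(total_var)
quote = total_mean + 3 * total_sigma  # +3-sigma high-end effort estimate

print(f"mean {total_mean:.1f} h, sigma {total_sigma:.1f} h, "
      f"quote at {quote:.1f} h")
```

Note that the total sigma here (about 5.7 h) is well below the sum of the per-task sigmas (about 9.3 h), which is the shrinking relative spread mentioned above: independent errors partially cancel when tasks are summed.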

This technique has worked amazingly well for me, and it’s been quite rare that I blow the estimate, and in the… one time I can think of, we missed by very little and there were tasks that we hadn’t thought of when we did the estimate.

To your point though (with walking the dog), there’s a really key thing that this process doesn’t capture, somewhat on purpose: the resulting estimate is in effort-hours, not delivery date. While modelling per-task effort hours as Normal is suspect, modelling task delivery dates as Normal is completely, irrecoverably wrong: delivery dates, in my experience, only ever slide in one direction. People get sick and the project slips a week; people don’t ever get super healthy and effectively knock out 80 or 120 hours worth of tasks in a week.

I honestly haven’t found a good way to estimate delivery dates very well. At one point I did put together some regression on my “actual billed hours per week” based on my billing, but ran into the same problem. “Oh, my dog died and my mother-in-law got sick during that project.”

Someone who’s better at statistics might have a better way to model that as a high-skew distribution, but when I tried doing that myself I ended up with a distribution that didn’t feel like it did a good job of capturing the non-negligible long tail of things that slow down calendar estimates without burning hours.
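One common choice for that kind of one-sided, long-tailed slip is a log-normal: the multiplier on the nominal schedule can never go below zero and the right tail is heavy. A toy sketch, with mu/sigma invented rather than fitted to any real billing data:

```python
import random

random.seed(42)

# Toy model: calendar slip factor ~ log-normal, so delivery only ever
# slides later relative to the median, never symmetrically earlier.
mu, sigma = 0.0, 0.5        # invented parameters, not fitted
nominal_weeks = 8
samples = sorted(nominal_weeks * random.lognormvariate(mu, sigma)
                 for _ in range(100_000))

p50 = samples[len(samples) // 2]
p90 = samples[int(len(samples) * 0.9)]
print(f"median {p50:.1f} weeks, 90th percentile {p90:.1f} weeks")
```

With these made-up parameters the median stays near the nominal 8 weeks while the 90th percentile lands roughly twice that, which is the shape of "the dog died and the mother-in-law got sick" risk: rare, but only ever in one direction.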


This sounds like a really useful approach. But I'm bad at math and statistics, and English is not my native language.

Could you maybe provide an example with figures?

Thanks


The traditional distribution for costs and dates is Beta-PERT. Douglas W. Hubbard uses log-normal distribution. I am a pessimist, so I like Beta-PERT with a fat, fat tail.



