
According to this reasoning you should set targets based on expected time, not "median" time. You're saying the tail of "completion time" is long, and so the median is a poor estimator for expected time. But if you actually knew the distribution of times for task completion (which is implied if you know the 'median time'), you could simply use the expected value of this distribution and your projects would then tend to complete on time.
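
To make that concrete, here's a minimal simulation sketch (Python; the lognormal task times are purely an illustrative assumption, not anything from the article):

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical project: 10 tasks, each with a right-skewed
    # (lognormal) completion-time distribution.
    n_tasks, n_sims = 10, 100_000
    tasks = rng.lognormal(mean=1.0, sigma=1.0, size=(n_sims, n_tasks))
    totals = tasks.sum(axis=1)

    sum_of_medians = np.median(tasks, axis=0).sum()  # the "intuitive" estimate
    sum_of_means = tasks.mean(axis=0).sum()          # the expected total

    print(f"sum of medians: {sum_of_medians:.1f}")   # ~27
    print(f"sum of means:   {sum_of_means:.1f}")     # ~45
    print(f"P(total > sum of medians): {(totals > sum_of_medians).mean():.2f}")  # ~0.85
    print(f"P(total > sum of means):   {(totals > sum_of_means).mean():.2f}")    # ~0.42

A deadline set at the sum of medians gets blown the vast majority of the time in this toy setup; the sum of means at least gets you close to a coin flip.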

I don't think this is the correct explanation. I think it's far more likely that (1) people don't know the true distribution of task times, so estimates are just crap guesses based on hubris, what managers want to hear, etc., and (2) scope creep.



Another reason is a kind of selection bias: an organisation wants something built, gets offers from various software companies, and picks the cheapest.

That is, if you are doing the project, your company's initial estimate was the one that had the highest chance of being too low.


It's not only that; the field is full of bullshit. Advertising features as available when they haven't even been seriously discussed has been the norm at the companies I've worked for over the last 10 years. Our website and sales people go around advertising "We support X. We are ready for regulation Y", while we are actually waiting for a paying customer to buy it before we even look into it.

And that's all companies. You can't really be honest, because all your competitors are bullshitting the same way. (I used to be upset about that, thinking I was always working for the black sheep; but over the years you end up working with your competitors one way or another, and you find out it is the same everywhere.)

There is also the deadline game. The client will push for an earlier and earlier release date. The provider will accept, because the provider knows the client will not actually be able to test the product. I used to be upset about delivering code that would not even compile. Then, over the years, we have had clients not ready to test for several years. An extreme case: a client that took a package I developed 5 years after we delivered the working version of it.

That's the biggest problem I have had with Agile. Very often companies are not ready to support the lack of bullshit, even internally: no more Schrödinger statuses, no more creative budget allocation, and developers appear to be slower and to cost more.


> An extreme case: a client that took a package I developed 5 years after we delivered the working version of it.

You completely lost me there, what happened exactly?


A client wanted to run our application on Windows, with a deadline that fell, at the latest, a week before I even joined the two-man project. Two months (?) later we delivered a "patch" to the initial solution (the first version didn't even work at all), using the same Unix scripts running on Interix.

Seems like they were not in any hurry after all. Some 5 years later, they contacted my company again asking how to install it in their test environment. They were apparently not happy that, after 5 years, the solution was still a bodge using Interix rather than a proper Windows port. I don't know what happened from there; I had moved on to another project right after that delivery, could not remember anything, and to be honest only painful memories would come back from that shitty codebase. (The company was making something like 5K a year gross from that application.)


I think he means (and this is also my experience running services companies) that even if you deliver the software to your client, the client does not have time scheduled internally to actually test it and report issues. So you delivered everything the client wanted, and they will not even look at it, because the internal manager who commissioned the project and his team are too busy.

We delivered a project a few weeks ago and heard nothing; I learned only yesterday that the manager went on holiday and will be back the third week of January. And when he comes back his inbox will be full, so I do not expect any testing until the second week of February...


Exactly. All this article is really saying is "most of the time, adding together the median task times underestimates the sum of the mean task times". Well then, easy fix: add together the mean times off the bat.


Bingo. When we think about the steps intuitively, we tend to gravitate toward the median. "How long does this usually take?" is asking about a median in surprisingly precise terms.

Instead we have to understand the whole distribution, the mean included, and add up how long each step takes on average, rather than just in the most common case.

Further, and perhaps more importantly, by understanding each distribution, you have far greater knowledge about what the expected time will be, and where the delays are most probable; and you can begin to start analyzing where those delays might be coming from in the interactions of all the parts of the system. This is straight-up quality talk right out of W. Edwards Deming, and it's how you start to improve how you produce in general. Good stuff.


The problem with high estimates, if accepted by the stakeholders, is the risk of over-engineering. When given lots of time, development teams will often find ways to use that time that don't add to the value of the features being delivered: over-testing, making things "re-usable", a chance to try something new. And the project is late anyway.

Maybe optimistic estimates are a motivating factor for teams: the desire to finish faster, to be more efficient than you were before, a commitment with a challenge.

In my experience, it takes a strong product owner and a mature development team to meet deadlines. Shipping on-time is always a game of tradeoffs. Accuracy in estimates is usually the result of doing things in a known, measurable way. And by keeping the stakeholders close, you can make critical decisions together to keep a project on track.


And account for correlations. It would be interesting to see whether multi-step projects with overruns tend to overrun in more steps than expected. I could imagine a stakeholder-management-heavy project being more likely to overrun in every step. In other words, even by summing the distributions you might be underestimating the expected time to completion.
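
For example (a toy Monte Carlo sketch with made-up numbers, not data): give every step of a project a shared "difficulty" factor while keeping each step's own distribution identical, and the totals grow a much fatter tail than independence would suggest:

    import numpy as np

    rng = np.random.default_rng(0)
    n_sims, n_steps = 100_000, 10

    # Independent baseline: each step lognormal(1, 1) on its own.
    independent = rng.lognormal(1.0, 1.0, size=(n_sims, n_steps)).sum(axis=1)

    # Correlated version: a shared per-project factor drags every step
    # of the same project in the same direction. The per-step marginals
    # stay the same (log-sd = sqrt(0.7^2 + 0.7^2) ~= 1).
    shared = rng.normal(0.0, 0.7, size=(n_sims, 1))
    noise = rng.normal(0.0, 0.7, size=(n_sims, n_steps))
    correlated = np.exp(1.0 + shared + noise).sum(axis=1)

    for name, totals in (("independent", independent), ("correlated", correlated)):
        p90, p99 = np.percentile(totals, [90, 99])
        print(f"{name:>11}: mean {totals.mean():5.1f}  p90 {p90:6.1f}  p99 {p99:6.1f}")

Same per-step distributions, roughly the same overall mean, but the correlated totals overrun far more badly at the 90th and 99th percentiles.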


There's more to it than your easy fix. The article alludes to it but does a poor job of covering it: if you know the distributions of the intermediate steps but you only sum their means, then you are throwing away data that could help you calculate a better estimate.
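
A rough sketch of what keeping the full distributions buys you (hypothetical step distributions; in practice you'd fit them to historical data): Monte-Carlo the sum and you get percentiles to commit to, not just a point estimate:

    import numpy as np

    rng = np.random.default_rng(0)
    n_sims = 100_000

    # Hypothetical per-step time distributions (in days).
    design = rng.lognormal(0.5, 0.8, n_sims)
    implementation = rng.lognormal(1.5, 1.0, n_sims)
    testing = rng.lognormal(0.8, 1.2, n_sims)

    totals = design + implementation + testing

    print("sum of means:", round(totals.mean(), 1))
    print("50th/80th/95th percentiles:", np.percentile(totals, [50, 80, 95]).round(1))

Quoting the 80th or 95th percentile as the deadline is a very different (and more defensible) commitment than quoting the mean.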


Oh of course! More data (sanely applied) is almost always going to result in more accuracy. I just found it very strange that the core assumption of the article -- that the default setting is to add median times of subtasks -- was never questioned. Why was that the chosen method to begin with? Most time estimating tools (Pivotal Tracker, etc.) work with averages, not medians, for precisely this reason.


I think what he/she is saying is that you can't predict how badly something can go (since it can go infinitely badly); you can only predict how well it can go (because the best case actually touches a finite minimum). Which passes the logical sniff test and is in fact the correct explanation. At least it has been for absolutely every project I've ever been on.

You could feasibly be on a project that could last until the heat death of the universe.


If it's a long-tailed distribution, knowing the expected value isn't going to tell you anything you can count on.
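
True once the tail is heavy enough. A toy illustration (Pareto with shape just above 1, chosen so the theoretical mean exists but the variance doesn't): the sample mean never settles, so even a perfectly known "expected value" isn't something you can plan around:

    import numpy as np

    rng = np.random.default_rng(0)

    # numpy's pareto() draws a Lomax; adding 1 gives a classical Pareto
    # with minimum 1. With alpha = 1.1 the theoretical mean is 11, but
    # the variance is infinite.
    alpha = 1.1
    for n in (100, 10_000, 1_000_000):
        samples = rng.pareto(alpha, n) + 1
        print(f"n={n:>9}: mean {samples.mean():8.2f}   median {np.median(samples):5.2f}")

The median stays put around 1.9; the mean jumps around even for huge samples, because a single tail event can dominate the sum.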


If you're working with censored data, it's possible to know the median but not the mean of your data.

https://en.wikipedia.org/wiki/Censoring_(statistics)
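
Concrete toy case (hypothetical numbers, plain Python): you tracked 10 tasks but stopped watching at day 30, and 3 were still unfinished (right-censored). The median is knowable; the mean is not:

    # Completion times in days for 10 tasks. Three tasks were still
    # running when observation stopped at day 30, so all we know is
    # that each of them took MORE than 30 days.
    observed = [3, 5, 8, 9, 12, 14, 20]
    censored_at = [30, 30, 30]  # true values unknown, all >= 30

    ordered = sorted(observed) + censored_at  # censored values sort last

    # The median (average of the 5th and 6th of 10 values) falls among
    # the observed completions, so it is known exactly:
    print("median:", (ordered[4] + ordered[5]) / 2)  # 13.0

    # The mean depends on how far past day 30 the censored tasks ran,
    # so only a lower bound is available:
    print("mean >=", (sum(observed) + sum(censored_at)) / 10)  # 16.1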



