Not always. Just finished up a 7-week deadline in 12. 2x in my experience has been a fairly reliable estimate. A 3-4x project has either gone completely out of scope or was poorly estimated. And it's not always the development team's fault. Sometimes the client is not timely, or is even habitually late with deliverables. Unless you've worked with them before and know what to expect, give the client a timeline of X but quote for 2X just in case.
Estimates are interesting and I encourage anybody to play with the statistics to see how it works. There are 2 scenarios you need to consider. First is the scenario where the stakeholder asks for something, the thing is possible (i.e. it's a matter of working on it until it's done), they don't change their mind, and the team doesn't have some kind of emotional incident (like people refusing to work together, someone having a serious health issue, etc.). In that case you can split the overall task into many sub-tasks and estimate each one. Each sub-task estimate will have a high variance, but the more sub-tasks you break the work into, the less variance (relative to its size) the resulting overall estimate will have.
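To make that variance point concrete, here's a toy Monte Carlo sketch. The lognormal noise factor and all the numbers are illustrative assumptions, not data from any real team; the point is just that the relative spread of the total shrinks as you split into more sub-tasks:

```python
import random

def simulate(total_hours, n_subtasks, trials=20_000):
    """Split total_hours of estimated work into n_subtasks equal estimates.
    Each sub-task's actual time is its estimate times a random noise factor
    (lognormal here, purely for illustration), so individual sub-tasks have
    high variance. Returns (mean actual total, relative standard deviation)."""
    per_task = total_hours / n_subtasks
    totals = []
    for _ in range(trials):
        total = sum(per_task * random.lognormvariate(0, 0.5)
                    for _ in range(n_subtasks))
        totals.append(total)
    mean = sum(totals) / trials
    var = sum((t - mean) ** 2 for t in totals) / trials
    return mean, var ** 0.5 / mean

for n in (1, 10, 100):
    mean, rel_sd = simulate(100, n)
    print(f"{n:>3} sub-tasks: mean {mean:6.1f}h, relative spread {rel_sd:.2f}")
```

The relative spread falls roughly as 1/sqrt(n), which is why the sum of many noisy estimates is far more trustworthy than any single one.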
The key thing to understand is that while the overall variance will be small, you don't know what the multiplier will be. For teams that you are familiar with, technology that you are familiar with, etc., in my experience people do tend to come out with about the same overall multiplier. Different teams are vastly different, though. I've had teams that operated anywhere from 0.5x (i.e. they overestimated by a factor of 2) to 4.0x (i.e. they underestimated by a factor of 4). It's always best to keep measuring your team's performance to figure out what the best multiplier to use is (this is what's known as "load average" in some "agile" systems -- the inverse is "velocity"). Care must be taken to dissuade people from trying to optimise the load average/velocity because it will undermine your estimations -- usually I don't tell people what it is.
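A minimal sketch of what "keep measuring your team's multiplier" can look like in practice. The history pairs below are made up for illustration:

```python
def load_average(history):
    """history: list of (estimated_hours, actual_hours) pairs for completed
    tasks. Returns the team's multiplier: how much longer work actually
    takes than estimated. The inverse of this is "velocity"."""
    estimated = sum(e for e, _ in history)
    actual = sum(a for _, a in history)
    return actual / estimated

def calibrated_estimate(raw_estimate, history):
    """Scale a raw estimate by the team's measured multiplier."""
    return raw_estimate * load_average(history)

# Hypothetical history: the team consistently underestimates.
past = [(8, 14), (3, 7), (20, 38), (5, 9)]
print(load_average(past))             # the team's multiplier
print(calibrated_estimate(10, past))  # calibrated forecast for a 10h estimate
```

Note that this only works if the team doesn't know you're doing it; once people start gaming the multiplier, the historical data stops predicting anything.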
Now the other scenario is also fascinating, but requires much less statistics to understand well. Basically this is when the stakeholder constantly inserts urgent requests, or cancels tasks mid-development because they thought of something better. Or when your team sits around for 5 days arguing about whether to indent with spaces or tabs. Or upper management decides to "accelerate" the process by constantly inserting new members into the team. Or someone intentionally tries to sabotage your project because they have a "competing" project in the company (or possibly just because they see you as a threat to their eventually triumphant march to the heights of the org chart). Or your lead developer decides that today is the day to institute a national "bring a firearm to work" day. In those cases your schedule is fiction, no matter what process you used to create it.
BTW, all of those things have happened on teams I was on in my career :-). Incidentally, not having a firing pin in a weapon is not considered evidence that it was "just a joke", strangely enough...
Excellent points. I think the whole point of SCRUM planning is to get estimates right, and it is not as easy as reading a few articles on it. Hence involving a scrum master is highly recommended, usually a person like mikekchar who has done it multiple times at least.
The scrum master has to have a fundamental understanding that 8 hours of work is not one day. It's generally interrupted by meetings, helping out a coworker, crisis mode on some bug, and also generally fucking around with coworkers. When a dev says 8 hours, that should automatically be translated to 16-24, because they won't actually be able to be "heads down".
The interesting thing is that statistically, the things that interrupt a developer occur at a predictable rate. What's not obvious, though, is that you can't predict the completion of a single task. You can predict the completion of a statistically significant collection of tasks. If someone is asking "How long will it take to do X", where X is a single task, and is banking the company on the reply, they are a fool. But if you ask "How long will it take to do 30 X" and you have some data to show how long an X usually takes, then you can actually give some good replies.
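Here's one way to turn "how long will 30 X take" into a number, by bootstrapping from historical durations. The sample data and the choice of the 90th percentile are hypothetical; the technique just resamples past tasks to build a distribution of plausible totals:

```python
import random

def forecast_total(samples, n_tasks, percentile=0.9, trials=10_000):
    """Bootstrap a completion-time forecast for n_tasks tasks similar to X,
    given 'samples': historical durations of past X-like tasks.
    Returns the total duration we'd expect to come in under `percentile`
    of the time."""
    totals = sorted(sum(random.choice(samples) for _ in range(n_tasks))
                    for _ in range(trials))
    return totals[int(percentile * trials) - 1]

# Hypothetical historical durations (hours), including one nasty outlier.
past_durations = [4, 6, 5, 12, 3, 8, 7, 30, 5, 6]
print(forecast_total(past_durations, 30))
```

The single-task prediction is hopeless precisely because of outliers like that 30-hour task, but over 30 draws the outliers average out and the percentile forecast becomes fairly stable.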
Even more interesting (and I was thinking about this only 5 minutes ago), there are models for defect discovery rates. This basically means that when we write code we make software errors. The amount of time it takes before we discover an error is a random variable with a particular distribution. There are several models which try to let you determine how long it will take before you find 90% of the errors, for instance. I was just thinking that I bet the model for errors in requirements is probably very similar to the model for software errors, and you could probably fairly easily model how long it will take before you discover 90% of the remaining functionality that you need after you make the initial plan. Indeed, I've measured in a few groups I've worked with that those groups needed another 30% of the overall project time to complete tasks that were not anticipated at the start. Unfortunately I no longer have access to that data so I can't look at it in more detail.
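As a rough illustration of the defect-discovery idea, take the simplest possible model, where each undiscovered error is found at a constant rate, giving an exponential discovery curve. This is a sketch of one such model, not the specific models mentioned above, and the rate value is invented:

```python
import math

def time_to_find(fraction, discovery_rate):
    """Simple exponential discovery model: each undiscovered defect is
    found at a constant per-defect rate, so the fraction found by time t
    is 1 - exp(-rate * t). Solve that for t given a target fraction."""
    return -math.log(1 - fraction) / discovery_rate

# Hypothetical: defects are found at a rate of 0.1 per defect per day.
print(time_to_find(0.90, 0.1))  # about 23 days to find 90% of defects
```

If the analogy to requirements holds, you could fit the same curve to "tasks we didn't anticipate at planning time" and estimate how much undiscovered work remains.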
(NB: 30% was just for those teams and those projects -- I don't for a second believe that the number is universal. The fact that it was similar for a few teams is probably coincidental.)