Doing It Wrong (tbray.org)
114 points by gthank on Jan 5, 2010 | 45 comments


I think the author is misled by survivor bias. How many Basecamp clones have been released and failed? How many more have been planned, half-implemented, and shelved?

Also, how many "enterprise" development teams can commit to creating and supporting one and only one (1) app over a multi-year period? How many can depend on thousands and millions of users to discover bugs, do load testing, penetration testing, usability, etc etc?

How many "massages" has Flickr had in the last 5 years? Is that uptime acceptable to a bank?

How many shoe retailers and state governments even know how to recruit and retain the type of talent that produces great software? How many could admit they don't?


> How many shoe retailers and state governments even know how to recruit and retain the type of talent that produces great software?

IMHO, this is the truth nugget. Where the rock stars go is where you will find great software and the company culture that works is "stand back and let the rock stars do their thing". That culture can't exist in the enterprise. It doesn't even exist in most startups. I never wanted to believe this but after years of first hand experience, I find the conclusion inescapable.

Agile methodologies can't extract excellence from mediocrity, but they might allow half-decent software to be made without the use of rock stars, by plugging the holes that mediocre developers typically fall into. But is half-decent software even worth making?


> Where the rock stars go is where you will find great software and the company culture that works is "stand back and let the rock stars do their thing". That culture can't exist in the enterprise.

XBox is an interesting counter-example. Microsoft completely let that team do their thing. They had total freedom to design and market it. Notice they didn't even have to call it Microsoft Gaming Console.

Source: Chris Capossela's talk at the Business of Software 2009 conference.


From what I've read, Microsoft's original corporate culture wasn't that far removed from "stand back and let the rock stars do their thing." For example, Spolsky likes to talk about how management's job at Microsoft was to keep people from bothering the developers so they could keep writing code. I think he also told anecdotes about developers coming into a manager's office to resolve a technical dispute, and the manager responding "Why are you asking me? Of the people in this room, I'm the one who knows the least about it." Then the manager sent them away to work it out between themselves.

That attitude probably had a lot to do with Microsoft's success.


> Is that uptime [Flickr's] acceptable to a bank?

I would love a Flickr-like bank. Banks are only open about 8 hours out of 24, and (at least here in NZ) even when I do an online money transfer, it only goes through in their batch that midnight. I would love a front-end to our bank system that did things right, especially instant and easy transfers.

I guess you could say PayPal tries to do this, but you have to get money in and out of PayPal, and they have high fees.


Yeah, I've only ever found one bank that had a web interface that let you do immediate transfers 24/7 between accounts and credit cards. Unfortunately it's a 2 branch credit union 300 miles from where I now live, but I still hang onto that account just because.

I know they're not playing with a lot of resources, so I'm always amazed at how much better they are at doing EVERYTHING than Chase, etc.


Ok but I think his point is that in the enterprise software space even the survivors fail in the sense that users hate them.


He then oddly goes on to talk about the buy vs build argument for things like Oracle/SAP financials/ERP, which are deeply hated by those that use them (granted, it may not be possible to make that kind of BigCo HR/financial stuff fun!). But what is interesting is that those systems are not really off the shelf. They are usually multi-million dollar "implementations" and sometimes (seriously!) "re-implementations", which cost orders of magnitude more than most bespoke IT projects, often due to licencing costs. This makes me think that the numbers make things look better than they are. For example: if the cost of a setup for SAP financials is 2mil, and 400K for the "implementation", then a blow out of 100% in the "implementation" cost is not, percentage wise, as big a deal as it would be for a bespoke project (ie sans the licence cost), as the bulk of the cost is a fixed licence.

This is also why building (as in skyscraper style) blow outs don't appear as huge - the labour cost is generally a smaller portion of the project cost than the materials, compared to IT projects.
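A quick sketch of the arithmetic in the SAP example, assuming the 2mil is a fixed licence cost and the 400K "implementation" sits on top of it (that split is one reading of the numbers above, and the function name `overrun_share` is made up for illustration). It measures a 100% implementation blow out against the whole budget, with and without the licence:

```python
def overrun_share(fixed_cost: float, variable_cost: float, blowout: float) -> float:
    """Return the overrun as a fraction of the original total budget.

    fixed_cost    -- the part that cannot blow out (assumed: the licence)
    variable_cost -- the part that can (assumed: the "implementation")
    blowout       -- overrun on the variable part, 1.0 meaning 100%
    """
    extra = variable_cost * blowout
    return extra / (fixed_cost + variable_cost)


# Packaged system: 2mil licence plus 400K implementation (assumed split).
packaged = overrun_share(fixed_cost=2_000_000, variable_cost=400_000, blowout=1.0)

# Bespoke project: same 400K of work, no licence.
bespoke = overrun_share(fixed_cost=0, variable_cost=400_000, blowout=1.0)

print(f"packaged: {packaged:.1%}")
print(f"bespoke:  {bespoke:.1%}")
```

With these numbers the same 400K overrun is roughly 17% of the packaged budget but 100% of the bespoke one, which is the sense in which the licence cost makes the blow out look smaller.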


You know, a lot of banking systems still shut down for nightly "maintenance".


True, but it is a known quantity. I mean failwhale "unscheduled maintenance"


This is right on.

I've worked at big and medium size companies in the past decade. I worked at a startup that was acquired by a big company. I've implemented enterprise systems from design through support for governments and dow30 companies.

Those of you that think that these systems are a symptom of the corporate culture are wrong. I'm not defending the poor corporate cultures that I saw. I am defending the argument that better technologies and processes could lead to better systems.

From a process standpoint the problem is that it's counterintuitive to most management that smaller, higher paid teams using obscure* technology could outproduce the highly polished cogs-in-the-machine process. For example, this usually isn't true for (most) mechanical engineering projects. We all know it as fact on HN but it doesn't apply to many enterprise industries. I remember times when as a team we were denied the ability to create scripts or tools to do work for us. Or we were denied the use of open source tools. Instead we spent days doing it manually. Then, when it needed to be done for the next project, we would again do it manually. We couldn't push a small but critical change without completing a multi-day code move process. When we did write scripts, IT had to approve them, which meant you were at their mercy. There were many more of these sorts of poor process decisions. And there were smart capable programmers working there, often pushing to make the right decision.

When I run into old co-workers many still believe it's impossible to implement enterprise code in something like ruby, python, lisp, haskell, etc. It's Java or C#. Common reasoning - What happens if I need a new Clojure or Lisp programmer!? Or, where will ruby be in 20 years? We know today where Microsoft support will be in the future. Look at Microsoft's support plans. Etc.

I'll add a little bit of my own opinion in here too. It is my hope that eventually there will be a few elegant generic enterprise solutions where DSLs are implemented for business analysts. It's insane how much of this work is still done by programmers. How many more times does a programmer have to code a { main-menu customer organization admin ... } window or custom approval process, etc. (Yes, I know this has been attempted often without success, it's still a good idea. And I think in the end it'll happen. It just takes time to figure out what will work)

* obscure to management.

EDIT: Grammar


> "I remember times where as a team we were denied the ability to create scripts or tools to do work for us."

When people say 'corporate culture is the problem', that's what they're talking about. Of course better technology and processes can help. But if corporate culture actively prohibits anything new/unknown/non-standard, what is the true problem? That the tools aren't being used, or that the tools can't be used?

You're not actually disagreeing with anyone who points to corporate culture as the culprit. You're just not at the root cause yet.


Step 1 is to get out of the mindset of using the same solution for 20 years. You often don't, and when you do, it almost always costs you more in lost opportunity than it saves.

Yes, yes, I know many many legacy systems are out there that are decades old. And every single one of them represents major lost opportunity.

Release early, release often isn't just for startups.


> Common reasoning - What happens if I need a new Clojure or Lisp programmer!?

That's a reasonable argument when read as "What happens to our Erlang+Ruby crystal palace when management takes over and talented devs won't touch us with a ten foot pole and we are forced to hire Java monkeys who think lambda is something you barbecue on a skewer?"


Lot of good points you make here.

One enterprise that I spent some time at has some systems that are very old and have been continuously running. Some are on the order of 40 years old. It wasn't until the 90's that they finally decreed that new assembler code was not allowed. And lots of COBOL.

So as functionality has been added to these systems, the word "Legacy" doesn't quite give the feel of the sheer size of them.

I would agree that management resists smaller more talented teams, and add that the obscure tag is supplied by Gartner, which is about as risk averse as one can imagine.

And as you point out, there is the whole change control issue. I remember two hour meetings weekly for a small division in which changes had to be fought for and coordinated.

I don't mean to sound pessimistic, but let's say that Sun or another vendor does exactly the right thing and offers the right kind of solvent for the enterprise monolith. The systems that exist are so interwoven into the fabric of the business that not only do the IT portions of the problem change, but also the structure of the business itself. It's not clear that Sun or IBM or MS can add much here.

A useful thought experiment here is to consider the spreadsheet and how it, as a significant piece of new technology and a forward looking tool, did change the way that business is done. Did it reach down to the core of the business to alter the monolith?


Maybe I am misreading, but you seem to be contradicting yourself. You say that it isn't a cultural problem, but then describe what I would call cultural problems! I agree with the gist of your post though - and yes, choosing appropriate technology would help, a lot, but it is a cultural problem preventing it! (culture of management perhaps?).


Yeah you are right. It reads as a contradiction. I thought of that when I was posting but couldn't reconcile it at the time. Let me clear it up.

What I meant was it's possible to implement such a system in the framework of a dow30 type culture. The few times that I've seen it implemented, management supported it. But without someone just taking the initiative and doing it, it doesn't happen. The average newly graduated junior enterprise developer with an average lead enterprise developer probably can't pull it off. So it could be enterprise just has to catch up and they are just not there yet. Maybe that makes it a cultural problem, I'm not sure.

No doubt, management has yet to see the light. They want data to show that change in their process and new technology works before they commit. There should be a place for it when management finally realizes it's not a fad but a trend.


The thing is, those companies are not technology companies. They would see such practices as things that software companies would do (and they don't believe the potential savings - as often that is all they are, savings, not new revenue). Consequently, the talent goes to companies that are technology companies, and the cycle continues.


Do you have personal experience with mechanical engineering projects that suggests that most work best with large, lowly-paid design teams using management-friendly technology, or are you just assuming that they would?

I'm just curious, because I would have thought that this would work about equally well on software and mechanical engineering projects.


> Do you have personal experience with mechanical engineering projects that suggests that most work best with large, lowly-paid design teams using management-friendly technology, or are you just assuming that they would?

Yeah I think they work differently in general. From my experience they do not benefit from iterative development the same way we do. They do but they iterate when solving the problem, usually on paper. If they make a design mistake and it's in production or prototype phase it's much more costly to address. And because of this development cycles are longer and a lot more time is spent upfront.

I've watched and assisted mechanical engineers that spent days working out the math on the reliability of a class of lasers (and belts, motors, fans) subjected to sitting in a warehouse, followed by periodic motion and change in atmospheric conditions then launched (missile launch or space launch).

After the analysis was complete I could code up the math in a matter of minutes. Then expose it to all sorts of boundary cases and add it into a mathematical model of the entire system. Sometimes, at this point, it would go back to analysis to address some new found issues.

I have great respect for such engineers. Solving real life problems, like this, thoroughly is worlds away from what most programmers enjoy - creating.

EDITED: Clarifications ~


> Yeah I think they work differently in general. From my experience they do not benefit from iterative development the same way we do. They do but they iterate when solving the problem, usually on paper.

This is partly true in my experience and you will struggle to find a Mechanical Engineer that doesn't know what Kaizen is. With engineering, iterating is expensive, so iteration normally would happen at design stage. However, when it comes to Construction there are different levels of iteration; for example, the Consultant will produce a set of specs and drawings but it is up to the specialists to produce the detail studies and shop drawings. They will also spend months with mock-ups in order to confirm compliance with requirements.

However, I think every Engineering discipline (and I include Software Engineering here) can learn a lesson or two from Sanitary Engineers:) Systems produced by Drainage Engineers virtually outlive the buildings they are installed in, and a large part of such systems is standardized (use your standard libraries). They run for almost forever and anywhere (ie, quality assurance brings reliability), they require almost no maintenance (minimize bugs). They have very well defined objectives (no feature creep). If they don't perform well they stink, so someone notices and does something about it! They employ a 'set of primitives or principles' like gravity that always works, they are simple, and construction can be delegated to cheaper labour (outsourcing). It is not a very glamorous field but civilization progressed when this field progressed (social responsibility). They are not expensive to design and produce. They use standard details (ie DRY principle) and I have seen .. a lot of these Sanitary Engineers debating if their discipline is ... art, engineering or science. Rings a bell?

Mechanical Engineering, like IT - should be IE, Information Engineering:) - has its own disasters and Management problems.

It is a very creative field and the only unfortunate thing is that it does not benefit from - open source. In this respect anyone in Software Engineering, should count themselves lucky!

Qualifier: I hold a PhD in Mechanical Engineering.


> They do but they iterate when solving the problem, usually on paper. If they make a design mistake and it's in production or prototype phase it's much more costly to address. And because of this development cycles are longer and a lot more time is spent upfront.

This I agree with. Most engineers would kill to have the kind of feedback cycle that we have. (Disclaimer: my experience, such as it is, is with electrical rather than mechanical engineering.)

> Solving real life problems, like this, thoroughly is worlds away from what most programmers enjoy - creating.

This I disagree with. Of course (mechanical) engineers enjoy creating - for example, read Florman's Existential Pleasures of Engineering.


I worked in the US aerospace industry as a stress analyst: I can confirm that bit about feedback cycles. For missiles, the feedback cycle is years.


> Yeah I think they work differently in general. From my experience they do not benefit from iterative development the same way we do. They do but they iterate when solving the problem, usually on paper.

Yup, I've seen it too, from my experience. I was once writing some desktop engineering apps, to help mechanical engineers / designers do analysis on materials etc. They spend an enormous amount of time doing iterations/studies on paper. It is worth it, because it usually saves lots of money and time down the road. They don't have the luxury that software developers have.


Everything tbray says is correct, but I had the feeling I'd heard this before. Then I realized, this post is a combination of Gall's law "all working complex systems evolved from working simple systems", and Joel's advice of "never outsource core competencies"

http://en.wikipedia.org/wiki/Galls_law

http://www.joelonsoftware.com/articles/fog0000000007.html


"Of course, we’re not in the Promised Land yet. I’m actually surprised that Salesforce isn’t a lot bigger than it is; a variety of things are holding back the migration to the utility model"

Many organizations still attempt to 'build' their own highly customized CRM system using SFDC as a starting point. It doesn't matter if they buy a system off the shelf, the IT organizations for big companies will manage to make these projects just as unsuccessful as their built from scratch counterparts.

The success of these implementations, and in turn the companies that sell the systems like SFDC, are dependent on how willing their own customers are to change their internal processes to match what the system does well. The biggest project failures for these systems occur when these organizations refuse to change how they do business and instead spend millions of dollars in customization fees to the vendor. SFDC isn't immune to the failures that the author describes, they just experience it from a different perspective.


I think the author is missing the point. Large, overly complex, ineffective systems are a symptom of corporate culture. It isn't something that you can just fix without fixing the underlying cause.


Exactly - it is NOT a technology problem. Well, other than vendors capitalising on the corporate culture in selling "solutions" which just make things worse.


Been there for quite some time. I can't agree more - a lot of things are wrong in enterprise world. Couple big ones from my experience:

- Process-centric culture. Most enterprises have processes for making processes. Process always complicates everything. And most people inside will blindly protect the process, so it's really difficult to question it.

- Artificially complicated solutions from "big vendors" (read Oracle, IBM etc). They keep adding new "features" over and over only to have something to sell constantly. I have never seen a piece of enterprise software being simplified. A lot of people inside the enterprise will again protect these since their jobs may become redundant if simpler solutions are introduced.


And this is why I prefer to work for small companies/startups vs. large companies.


He definitely has a point, but then again to make an accounting package as simple as twitter you'd have to make the tax code as simple as twitter too :-)


Really, the problem boils down to this:

Enterprise software is not written for end-users. It's written for companies. It's not marketed to the people who will actually use it and it's not paid for by the people who are going to use it.

So it should come as no surprise that the incentives and motives that shape enterprise software are completely orthogonal to the incentives for creating good software.


Even private failures (outages, data loss) involving enterprise systems can cost an enterprise business gigantic sums of money directly or in fines or in customer perception or (for some applications) can put patients' health and lives at risk. Public outages can involve fires and building collapses.

Various of the customers I've worked with over the years have calculated their direct losses at thousands or more dollars per minute of outage.

This is a whole different world view from a web outage or a PC reboot; of how to upgrade an application or a server environment without restarting it. Up-times here can be measured in decades.

Your development processes and your management accordingly become conservative. Sometimes hidebound. This is a whole different world from what most folks are accustomed to working in, too. This in terms of scale and cost and speeds and feeds involved; thirty million lines of even the best source code around is not the most agile of platforms ever constructed, but it can be critical to the operations and profits of a business.

There are numbers of (quiet) successes here and there have been many successful COTS migrations; I've worked on a number of these migrations and application replacements over the years, and these projects just don't get the press attention of the screw-ups. And yes, I'm aware of both successes and screw-ups. And of one box that was up and running for seventeen years.


So the silver bullet is to reduce the cost of failure? So that you can allow failure, which seems to be a necessary requirement for fast iteration and hence fast development?

i.e. the hard thing to do is to find a way to slice off 5 or 10% of the problem space so you can have a fast-iterating pilot system which can be a little flakey.

As it beds down into maturity after its period of rapid development (and higher-than-acceptable downtime), the new system can then be rolled out to the wider audience.

So the trick is to designate an "early adopter" section of your enterprise (maybe one branch office), give them free rein and then scale their good solutions enterprise-wide.


The easy stuff usually gets sliced off and dealt with.

The folks working on these boxes are Not Dumb.

When you're dealing with financial transactions or stock trading or with medical records, you're either able to deal with the fire hose of data, or with the uptime requirements, or the scale of the data. Or not.

These servers and clusters are very different than what most folks are accustomed to dealing with; vastly larger servers, application environments, communications, storage. And unfortunately for upgrades and migrations and incremental work, usually also involving (often fragile and ill-documented and poorly understood) interconnections all over the place. And critical.


> When you're dealing with financial transactions or stock trading or with medical records, you're either able to deal with the fire hose of data, or with the uptime requirements, or the scale of the data. Or not.

But you might find that there is one stock market with a smaller trade volume than the others. Or one of the 50 hospitals in your district is smaller.

If you can peel off a bunch of "early adopters" and make them guinea pigs, you can get some of the benefits of rapid development. You either need the buy in of the early adopters or be able to push them around.

This happens all the time. The UK used to test their laws on the Scots before rolling them out to the rest of the country.


> The folks working on these boxes are Not Dumb.

No-one's saying they're dumb. They're saying they're Doing It Wrong. And the things you're talking about - banks and stock - are edge cases of the enterprise world. I would actually argue that most stock markets are actually very well implemented. Banks usually have competent IT too, since IT is a core competency of theirs.

These are edge cases though; most corporate IT is orders of magnitude less sexy than real time trading systems.


They estimate the costs of downtime, but do they estimate the cost of having bad systems?


No, they don't. And what is not said here is that they don't acknowledge failures, even when they are obvious, thus depriving the organization of a chance to learn from the failure.


Then again there are businesses which don't lose money on a per minute basis. Consider an insurance company. Its cycles (except for web portals) are monthly. And how much money does your bank lose when you have to stand in line for their system to reboot? Maybe not all that much.

Then there are businesses, like ITA which is used to sell seats on airplanes, where cascading downtime could cost money all along the line.

Much more likely is the project failure cost. Dozens or hundreds of programmers working on a large project that gets cancelled. This is the real nub of enterprise software risk.


If my bank's ATMs start rebooting more than about once then I'm finding a new bank.

The Fail Whale is fine for twitter - less so when there's real $$ involved.


The design of a system will reflect the organization that creates it.


> The time between having an idea and its public launch is measured in days not months, weeks not years.

I like and totally agree with most of the article but this myth annoys me. Exactly none of the huge complex web apps he namedrops could have been made in under a week. And even if they were, that time omits what must have been an extended period of contemplation and conceptual design by a definite "rock star" level programmer.


Sure, creating an entire new product can take a lot longer, but almost everyone working in web startups has the experience of launching a new feature (for an existing site) the same day it was first suggested.


Sure, if it's a small feature that fits in well with the existing structure. But Tbray's talking like people can just knock up facebook or basecamp from scratch, in a matter of days. No-one can do that. Oh yeah sure, you might be able to make a decent stab at re-implementing one of them in a few days but it's a different story when you have to just make it up.

Anyway I do agree with him generally, just that I see this stuff a lot. Good web developers are stunningly productive compared to your average .NET corporate seat-warmer but they're not magic!



