We Just Undid Three Months of Dev work. Here's What We Learned.

stingraycharles · on Oct 7, 2009

I think the real lesson to learn here is to never assume anything. They assumed performance wouldn't be a problem, so they didn't test for it when prototyping, and they didn't make those tests part of their development cycle.

It's tough, these things, especially if you're talking about a feature you personally appreciate a lot. The article talks about the performance problems leading focus away from actively marketing these solutions, which makes me wonder: if properly marketed, would this have been a killer feature ?

The fact that they decided to get rid of it suggests no. However, are they putting the code in the freezer for a while until they fix these issues and re-release it ? Or is it simply a problem that can only be properly solved by giants like Google ?

acl · on Oct 7, 2009

They assumed performance wouldn't be a problem

We did pretty extensive performance tests, but not for long enough. We load tested tested for hours at a time, and the problems started to show up after days of production load and really compounded after that. It's probably a topic for another post, but the performance of database tables with a lot of churn (rows being frequently inserted and deleted) really degrades over time.

stingraycharles · on Oct 7, 2009

Guess that's my bad for assuming you thought performance wouldn't be a problem. :-)

fizx · on Oct 7, 2009

> ... never assume anything ...

Maybe you meant "Question your assumptions, often." You can't get anywhere constantly verifying things ahead of time. In the extreme case, imagine writing a test suite for printf just to be sure it works as advertised.

The way forward for a startup is to make only the right assumptions. Failing that, make as many assumptions as possible, but correct the failed ones in a reasonably short amount of time. Some minimal cost/benefit analysis on the assumption would also be good.

camccann · on Oct 7, 2009

Failing that, make as many assumptions as possible, but correct the failed ones in a reasonably short amount of time. Some minimal cost/benefit analysis on the assumption would also be good.

As a corollary to this, try to arrange that any failures will occur as quickly and obviously as possible. For instance: Component X, which is essentially a black box to you, will be a small but critical part of your project. You don't have time to extensively test it, so you make assumptions about its behavior. Ideally, you should start integrating Component X into the full project immediately such that it's heavily used in development and testing environments, so that any violations of your assumptions will show up incidentally to other work.

On at least a couple occasions, I've been bitten by assuming that a Component X (which I thought I understood) would do what I expected and thus leaving the final integration until near the end of the project. This sounds like an obvious, easily avoided mistake, but it's surprisingly easy to make in the heat of the moment.

toadpipe · on Oct 8, 2009

In my opinion, a lot of The Software Problem(tm) can be traced back to this tendency to underestimate the difficulty of using magic black box X. Maybe this is because most programmers never try to make a real magic black box X that will be used by random person Y in random situation Z themselves. If they did, they might have a little more respect for the fact that it's really fucking hard, the interface will inevitably have all sorts of corners, and the result is never going to be as magic as you would like.

NathanKP · on Oct 8, 2009

These are some really good points to keep in mind. I'm working on a fairly serious website system for a client right now and just reading over the article and the comments here makes me wonder if I should perhaps think ahead and make sure that I don't have to end up rewriting things.

At the moment I feel that my code is fairly future proof, but the truth of the matter is that when it comes to performance its hard to tell how things will work out in the long run.

megamark16 · on Oct 7, 2009

I wish I could send this to my boss. The functionality I'm working on right now was requested by a single client and is really only applicable to their unique billing situation. Why am I spending so much time on something that is going to have so little payoff, and yet which adds so much additional complexity to the system? But I've only been here a month and I don't feel like I can cast a dissenting voice. Maybe I'm just a coward.

chriskelley · on Oct 7, 2009

Being vocal about imperfections in the development cycle and suggesting improvements that matter to the bottom line is not dissent.

It's important to have open communication about this very topic - it improves the pipeline and keeps margins where they need to be. Be confident and prepared with data, and make yourself an asset to your company, not just a keypusher! :)

cmos · on Oct 7, 2009

When discussing the merit's of this feature be open minded that you may not have all the information they had in deciding to implement it. Perhaps this is heavily requested by the sales department or by other customers further up the pipeline. Maybe the company threatened to take their business elsewhere.

It's always less obnoxious to approach potentially illogical situations by giving someone the benefit of doubt. So, while you collect your 'data', also do some brainstorming on area's where the feature your adding makes a ton of sense, and might even open up new sales channels or markets.

Instead of assuming there's been some catastrophic mistake you must remedy, and instead of assuming that management puts no value on your time, try assuming that they have good reasons for it that you, especially being their for a month, might not have been fully explained. Quite often the problem is communication, not intent to waste money.

At the end, if all your effort is for seemingly insignificant return, smile and do an amazing job. Repeat until you grow weary and embattled, then leave for another job with a different set of problems.

chriskelley · on Oct 8, 2009

Absolutely. It's certainly something that needs to be a dialogue, not at all "I know something you don't, here is why you are messing up."

There are always circumstances that affect decisions that you may not know of. This is one of the reasons I am always encouraging artists/developers to be knowledgeable of their project at a higher level. The more you know about what's going on in the big picture, the more of an asset you can make yourself. It's also important for sanity! The OP sounds like they are stewing daily about disagreeing with the feature -- but if there is in fact a relevant reason, some of that burnout-causing heartache could have been avoided.

Dialogue and accountability two huge keys to the management/artist relationship.

kogir · on Oct 7, 2009

I don't know the details of your particular situation. That said, I've been in situations where that "single client" accounts for 75% of the user base and more of the revenue. Sometimes you give the client anything they ask for :)

mr_luc · on Oct 7, 2009

    The database operations on the nested data were 
    just taking too much processing power.

Given that this is Rails, and given that it's certainly SQL involved here, I just have to ask (and I know the answer is probably "yes") -- did you try implementing nested sets?

I ask because my experience has been that nested data is (with the kinds of nested data I've been handed, anyway) not a performance problem. Selects and updates of nested data can be as responsive as any range query on flat data, and when it comes to managing the performance of inserting new nodes, deleting or moving subtrees, there are a lot of options depending on what you want to optimize for (like spreading out the range from 1 to the max integer supported and periodically re-packing; that way, inserting leaves and any kind of deletion is as fast as with flat data).

I'm curious about what your nested data looked like. Sorry to get distracted on a minor point, but I'm intrigued! When I'm developing, I just always feel as though I should only worry when I start seeing data that has to be a graph and can't be represented as a tree, but as long as it actually is hierarchical then I won't have to worry about speed too much; but now I'm wondering if that intuition will bite me.

acl · on Oct 8, 2009

did you try implementing nested sets?

The big thing we needed to do was a rolling archive to progressively broader timeframes. As metrics come in, we keep every single datapoint for the first 6 hours. After 6 hours, data gets rolled up into 5-minute archive. Each datapoint in the 5-minute archive then contains avg, min, max, etc for all the points that lived within that 5-minute span.

The archiving carries on through progressively broader windows as time goes on -- a 10-minute archive, 1-hr archive, etc. This progressive aggregation is the only sane approach to storing the massive amount of data we get. And, it reflects the need for higher resolution for recent events -- it's rare you need to see what happened at one exact minute 6 months ago.

It was this progressive archiving that bit us, specifically as DB performance degraded over time with lots of insertions/deletions. Nested set didn't/wouldn't help with aggregation costs and degradation from churn during the archiving process.

Hope this helps -- I'm going to try to do a more technical post on this in the future.

mr_luc · on Oct 8, 2009

Huh, that's interesting. I guess I've never worked with really "churny" data like that before.

Sure, sometimes I've had cleanup/integrity/whatever tasks that run every few minutes, but the amount of records affected has always been pretty small.

That's an interesting conundrum. See, this is why we're all messing around with Cassandra et al; sometimes, in SQL, the answer is "don't do that", because it'd be too hard to tailor the db's behavior to suit your needs. Although frankly, with a design that deletes and updates a significant percentage of records in the system on a certain schedule, I can see any number of storage solutions having trouble.

That's interesting. I'll be thinking about this at "work" today. ;) I have a bunch of comments, but they're of the half-baked "oh, what about this?" variety.

fizx · on Oct 7, 2009

Haha, I'm doing the same thing right now. When one feature costs > 10% of dev time, and you can afford to get rid of it, do it! In my case, I was spending 50%+ of my time for a few weeks on stability issues caused by a feature that <10% of my users need. Bye!

smokinn · on Oct 7, 2009

On the other hand, punting on any hard problems just makes it that much easier for your competitors.

zaidf · on Oct 7, 2009

You mean hard problems that matter.

There are a lot of hard problems startups try to solve which no one cares about. That's the trap to avoid.

Fixing something hard does not inherently make your company valuable in the marketplace.

evgen · on Oct 7, 2009

The problem with this ideas is that you don't get to decide which hard problems matter, but your users will not really know if the feature matters until you have made a real effort at providing the feature. There is a particularly insidious meme going around (usually from the so-called "lean startup" crowd) that building a lame/simple version first to see if people like the feature is how you learn what your users want. The problem with this is that you never know if people don't like the feature because the problem you are solving is not important to them or because your half-assed "iteration" has led them to decide that you don't have the chops to solve the problem so they should look elsewhere. I can't even count the number of times I have seen one company introduce a poorly implemented version of a feature, pull the feature or let it languish (presumably "because our metrics show no one wants it"), and then watch as customers flock to another company that actually solved the hard problem.

Fixing a hard problem does not automatically make your company more valuable, but failing to fix a hard problem will never increase the value of your company.

zaidf · on Oct 7, 2009

your half-assed "iteration" has led them to decide that you don't have the chops to solve the problem

While this theory sounds good, it is disproven time and again by initial half-baked versions of sites that then go on to take off. Just check the original launch of YouTube, Digg, facebook.

Also, a HUGE idea coming from lean startup way is to invest very little in marketing until you have a product users like. You don't need to get one million users to tell you a product sucks. Often, 50 would do. Now if you are saying that 50 users writing off your product will doom it for its lifetime, the problem isn't the lean way it's that your market is too small. YouTube guys had very poor reaction to their initial site.

"because our metrics show no one wants it"

They have little idea on how to use metrics. Don't blame lean startup ideas for that.

ie. What lean startup would do is put up a button that looks as good as your best competitor can put up. Then see how many people click on it. What you measure is action until the click, not the engagement after the click to draw conclusions about the demand for that feature. Now if 1000 people are clicking on the link but only few are using it, chances are your product sucks. Take that insight and work on your product. Just one small example.

failing to fix a hard problem will never increase the value of your company.

If you are saying that you have to solve really hard technical problems to increase value of your company, I full disagree. Just look at the web2 companies that took off.

Craigslist did not take off because it solved a huge technical problem. Craigslist also has a lot of value as a company.

evgen · on Oct 8, 2009

> While this theory sounds good, it is disproven time and again by initial half-baked versions of sites that then go on to take off. Just check the original launch of YouTube, Digg, facebook.

It is easy to disprove the theory if you get to cherry-pick your examples. Would you like me to list the thousands of other companies that had a couple of poorly implemented features masquerading as a "beta" that were stomped into dust by others who worked a bit harder to do the job better?

zaidf · on Oct 10, 2009

Of course! I'd like to hear about them.

Btw, I don't consider using some of the most popular web2 properties as an example to be cherry picking. I'm curious to see your examples nonetheless.

edw519 · on Oct 7, 2009

This reminds me of one of my biggest internal conflicts: when to optimize.

"Premature optimization" has earned a negative reputation because it has a tendency to inflate dev schedules unnecessarily.

So I tend to just crank something, anything out just to have something. Once you can see what you have, it's often a lot easier to modify than come up with in the first place.

OTOH, I like to think that everything I build is a foundation for the next thing to be built upon it. I am constantly getting bitten in the ass by some grossly underperforming building block. If a prototype runs poorly a dozen times, it's a concern. If the same code runs poorly a million times, it's a disaster.

It's a constant trade-off. Get something running vs. build solid building blocks. Make a mistake one way and never release. Make a mistake the other way and have a time bomb to clean up. Sounds like OP has a lot of the same issues.

davidmathers · on Oct 8, 2009

Short version:

In our case, the move from flat data to nested data was the killer...We came up with a sweet way of storing the nested data and abstracting away most of complexities of dealing all kinds of data...However, the load on our database was far more than we envisioned...MySQL.

nestlequ1k · on Oct 8, 2009

Great, as a paying customer of scout they are telling me their going to now focus on getting new customers instead of servicing the ones they have.

Might make sense from a business standpoint, but probably something you dont want to advertise :-).

acl · on Oct 8, 2009

Actually we're also able to spend a lot more time now on things that really matter to our customers -- things that customers request, like more graphing options and better support for cloud instances. The cloud functionality is already available, and graphing is coming up fast. It's a real pleasure to finally have time to address these things, in addition to the sales and partnership efforts.

Also, performance is significantly better since we simplified the architecture (http://blog.scoutapp.com/articles/2009/10/01/simplify-get-an...), which benefits customers old and new!

amichail · on Oct 7, 2009

Just a meta comment about these lessons learned posts. The motivation behind these is to promote the product in question and one must therefore take them with a grain of salt.

Peer reviewed research is more reliable.

johns · on Oct 7, 2009

If you want to dismiss a post on that theory, you should remove the links to your products in your profile.

amichail · on Oct 7, 2009

There is no need to dismiss all such posts. Some may be interesting/educational regardless.

But you do need to keep in mind the primary motivation behind them -- marketing.

johns · on Oct 7, 2009

Almost everything submitted here has marketing as an ulterior motive. They're either marketing a product or themselves. But thanks for the reminder.