- Training / warmup overhead for new employees: the amount of time a new developer needs to warm up when joining a company grows exponentially with the number of different technologies / tools used on the job. It's not just learning Redis - it's learning _your Redis_:
all of the configuration details, deployment procedures, setup, integration with other services, data model, etc. specific to your company. This can be a killer for small companies, where everyone needs to know a little bit about everything...
- Future-proofing: the greater the number of different technologies used, the higher the likelihood of future driver / server / runtime / etc. incompatibility issues. Say a future release of Cassandra has a mission-critical feature for your business, but the CQL drivers that support it all require a later version of Node than your background workers currently run. Unfortunately, if you upgrade to a later version of Node then your Redis driver won't work, because it hasn't been updated for the past 6 months of Node releases thanks to breaking changes Node.js introduced in a critical security update... and so on, ad nauseam.
There will always be a level of this even in a tightly integrated stack, but you're setting yourself up for more frequent headaches the greater the number of technologies you have to maintain in parallel.
- Operational complexity: beyond the basic stuff like service outages, there's the less catastrophic but more frequent ops work such as monitoring and configuration management. While there are a number of great generic solutions for monitoring the health of processes, services, and VMs, there's a level of application-specific monitoring that needs to be deployed for each service too: query plans and cache hit rates for Postgres, JMX metrics for any JVM application, compaction and read/write latency for Cassandra, etc.
Setting up that level of monitoring and _actually using it_ on a day-to-day basis for a large number of different platforms is expensive and cumbersome. If you're Facebook, it may not be a big cost to manage. If you're a 6-person engineering team at a startup, it's a bitch.
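For what it's worth, here's roughly what that per-datastore glue looks like in practice - a minimal Python sketch, with the DSN, host, and port as made-up placeholders; you'd feed the numbers into whatever alerting you already run. The point is that the Postgres check and the Redis check share nothing, so every extra datastore means writing and babysitting another one of these:

```python
# Minimal sketch: two health checks that have nothing in common, one per datastore.
# Connection details below are placeholders, not a real deployment.
import psycopg2  # pip install psycopg2-binary
import redis     # pip install redis

def postgres_cache_hit_ratio(dsn="dbname=app"):
    # Postgres keeps block-level cache counters in pg_stat_database;
    # the hit ratio has to be derived from blks_hit vs. blks_read.
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        cur.execute(
            "SELECT sum(blks_hit)::float / nullif(sum(blks_hit) + sum(blks_read), 0)"
            " FROM pg_stat_database"
        )
        return cur.fetchone()[0]

def redis_hit_ratio(host="localhost", port=6379):
    # Redis exposes a completely different set of counters via INFO;
    # nothing from the Postgres check carries over.
    stats = redis.Redis(host=host, port=port).info("stats")
    hits, misses = stats["keyspace_hits"], stats["keyspace_misses"]
    return hits / (hits + misses) if (hits + misses) else None

if __name__ == "__main__":
    print("postgres cache hit ratio:", postgres_cache_hit_ratio())
    print("redis cache hit ratio:", redis_hit_ratio())
```

And that's just the collection side - someone still has to look at the numbers and know what "bad" means for each system.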
Yeah, tool-specific monitoring is always forgotten. There was a great article I once read on how your second datastore inevitably turns into a ghetto. People overestimate how many things they can keep track of.
Yeah, I agree, I wince when someone uses that phrase. The truth is that most technology choices, whether languages or databases, are designed to be general purpose. They are designed to fill as many needs as they can. Because no one wants their product to be a niche.
Failing together usually only makes sense for the simplest of apps, not for anything with a lot of moving parts. If my pub-sub message queue falls over, I'd rather the web workers stay up so visitors can still see the site - they'll just be without realtime notifications (a rough sketch of what I mean is below). If the background workers die, those tasks should stay on the queue, but everything else keeps running as normal.
So really, the math works out such that if you fail together, you'll have X amount of downtime. If you fail separately, you'll have X*3 amount of degraded service.
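To make the "web workers stay up" part concrete, here's a rough sketch of that failure isolation in Python - the redis client is real, but the channel name, connection details, and DB helper are just stand-ins for whatever your app actually does:

```python
import logging
import redis  # any pub-sub broker with a similar failure mode works here

r = redis.Redis(host="localhost", port=6379)  # placeholder connection details

def save_user_to_primary_db(user):
    # Stand-in for the real write to the primary datastore.
    pass

def handle_signup(user):
    # Core path: persist the signup no matter what.
    save_user_to_primary_db(user)

    # Nice-to-have path: realtime notification via pub-sub. If the broker is
    # down, log it and move on instead of failing the whole request, so the
    # site keeps working - just without realtime notifications.
    try:
        r.publish("signups", user["id"])
    except redis.exceptions.ConnectionError:
        logging.warning("pub-sub broker unavailable; skipping realtime notification")

    return {"status": "ok"}
```

The background-worker side is the same idea: the tasks sit on the queue, and the workers just stop consuming until the broker comes back.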
This assumes that only one of the parts is essential and the others can just fail, leaving you with merely degraded service -- which is not always (or even often) the case.
E.g., if the DB fails, you're down. If the web server fails, you're down. If you're a photo service and the file storage service fails, you're down.
Great article!