Indexes are fast when they are well built and frequently used. When they are seldom or never used, they are expensive, and that cost is paid in triplicate once backups are counted. Sometimes you just need to materialize a table temporarily. You can, of course, do that in the RDBMS as well, but sometimes the data sources are so scattered (or so ephemeral) that keeping all of the processing inside the database system is a stretch.
But perhaps the most compelling justification is the team's familiarity with database systems. Not everyone has the same level of SQL expertise. For some people, the visualization tools built on top of MapReduce systems, and the source language itself, are more familiar than the output of an EXPLAIN statement. That is especially true when the same pipeline would run to hundreds of lines of SQL.