> Invisible cliffs in your code that it will fall off at some unknown point in the future are the antithesis of engineering.
I think this describes the general shape of problem seen in any engineering domain. After a certain point of time (i.e. unknown), we can no longer guarantee certain properties of the system. This is why we add all kinds of qualifiers and constraints in our discussions with the customer.
Certainly, you can make some predictions about how tables will grow over time, but there are also some potential headaches that can be incurred if you arbitrarily add indexing. "We might need to expire records to manage growth" so you go ahead and index that CreatedUtc column. Turns out, the usage patterns of your app changed such that that table is now experiencing 20x more insert volume than you could have ever anticipated and your prediction is now a gigantic liability. With 20x insert volume you would have _never_ indexed that column. You would have used some alternative path involving temporary tables, etc. Now, you are stuck with essentially same problem - a judgement call at 2am regarding whether or not you should drop an index while the rest of your team is in bed.
Since I am unable to predict the future I feel like waiting until the actual problem arises and then dealing with it at that moment is the best option. Strategically, not having an index will likely fail more gracefully than if an index is bottlenecking requests during peak load (i.e. instantaneous vs cumulative data volume).
The key to sleeping at night is to add metrics and alarms near those cliffs or 'edges of the map' so they aren't invisible anymore. It's hard to anticipate every angle to this but in any case where you're making this kind of assumption or tradeoff it's a good idea.
A simple example is setting an alarm on your max load test traffic. When you get that alarm you know your system is now operating in unknown territory and it's probably time to do another load test or at least take a close look at scaling.
Largely agree with this take, it often becomes "add an index because query might be slow" without much discussion or trade offs around query patterns. There's a lot of "what if" over engineering that happens where you end up in hypothetical scenarios.
Look at your real world use cases, query patterns and think hard about it.
I think this describes the general shape of problem seen in any engineering domain. After a certain point of time (i.e. unknown), we can no longer guarantee certain properties of the system. This is why we add all kinds of qualifiers and constraints in our discussions with the customer.
Certainly, you can make some predictions about how tables will grow over time, but there are also some potential headaches that can be incurred if you arbitrarily add indexing. "We might need to expire records to manage growth" so you go ahead and index that CreatedUtc column. Turns out, the usage patterns of your app changed such that that table is now experiencing 20x more insert volume than you could have ever anticipated and your prediction is now a gigantic liability. With 20x insert volume you would have _never_ indexed that column. You would have used some alternative path involving temporary tables, etc. Now, you are stuck with essentially same problem - a judgement call at 2am regarding whether or not you should drop an index while the rest of your team is in bed.
Since I am unable to predict the future I feel like waiting until the actual problem arises and then dealing with it at that moment is the best option. Strategically, not having an index will likely fail more gracefully than if an index is bottlenecking requests during peak load (i.e. instantaneous vs cumulative data volume).