Unrelated to the post, but since you seem well informed in the field: would you agree that if a schema is not likely to change and is controlled, as you put it, there is no reason to attempt to store that data as a denormalized document?
Or at least, as you suggest: where required for performance, the data would still be stored denormalized, materialized / document-ized as needed?
At my current company there seems to be a belief that everything should be moved off SQL Server and onto Mongo / Cosmos (as a document store) for performance reasons. But I really think the issue is that the code uses an in-house ORM that requires code generation for schema changes and probably generates less-than-ideal queries.
Then again, I am also aware of the ease of horizontal scaling with the more NoSQL-oriented products, and I am trying to be aware of my bias as someone who did not write the original code base.
> would you agree that if a schema is not likely to change and is controlled, as you put it, there is no reason to attempt to store that data as a denormalized document
As a general rule of thumb, yes. Starting with denormalization often opens you up to all sorts of data consistency issues and data anomalies.
> Denormalization is a strategy used on a previously-normalized database to increase performance.
The nice thing about starting with a normalized schema and then materializing denormalized views from it is that you always have a reliable source of truth to fall back on (and you'll appreciate that, on a long enough timeline).
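To make that concrete, here's a minimal sketch using Python's built-in sqlite3 (the table and field names are made up for illustration): the normalized tables stay the source of truth, and the denormalized document is something you can regenerate at any time.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE posts (
        id INTEGER PRIMARY KEY,
        author_id INTEGER NOT NULL REFERENCES authors(id),
        title TEXT NOT NULL
    );
    INSERT INTO authors VALUES (1, 'Ada');
    INSERT INTO posts VALUES (10, 1, 'Normalize first');
""")

def materialize_author_doc(author_id: int) -> dict:
    """Rebuild a denormalized 'author' document from the normalized tables.

    The tables remain the source of truth; this document is disposable and
    can always be regenerated if it drifts or its shape needs to change.
    """
    (name,) = conn.execute(
        "SELECT name FROM authors WHERE id = ?", (author_id,)
    ).fetchone()
    posts = conn.execute(
        "SELECT id, title FROM posts WHERE author_id = ?", (author_id,)
    ).fetchall()
    return {
        "id": author_id,
        "name": name,
        "posts": [{"id": pid, "title": title} for pid, title in posts],
    }

print(json.dumps(materialize_author_doc(1)))
```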
You also tend to get better data validation, referential integrity, type checking, and data compactness with a lot less effort. That is, it comes built into the DB rather than requiring some additional framework or serialization library in your application layer.
I guess it's worth noting that denormalized data and document-oriented data aren't strictly the same, but they tend to be used in similar contexts with similar patterns and trade-offs (you could, however, have normalized data stored as documents).
Typically I suggest you start by caching your API responses, possibly breaking one API response into multiple cache entries along what would be document boundaries. Denormalized documents are, through a certain lens, basically cache entries with an infinite TTL, so it helps to start by thinking of them as a cache. And if you give them a TTL, then at least when you get inconsistencies, or need to make a massive migration, you just have to wait a little bit and the data corrects itself for "free".
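As a rough illustration of that idea (a toy in-process cache with hypothetical names and fetch functions; in practice you'd use something like Redis or Memcached, which support TTLs natively):

```python
import time

class TTLCache:
    """Toy in-process cache; stands in for Redis/Memcached in this sketch."""

    def __init__(self):
        self._store = {}  # key -> (expires_at, value)

    def set(self, key, value, ttl_seconds):
        self._store[key] = (time.monotonic() + ttl_seconds, value)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if time.monotonic() >= expires_at:
            # Expired entries fall out, so stale data corrects itself "for free".
            del self._store[key]
            return None
        return value

cache = TTLCache()

def get_order_page(order_id, fetch_order, fetch_customer):
    """Assemble one API response from per-document cache entries.

    Each entry sits on what would be a document boundary; a denormalized
    document store would hold roughly the same shapes, just with no TTL.
    """
    order = cache.get(f"order:{order_id}")
    if order is None:
        order = fetch_order(order_id)  # hit the normalized DB
        cache.set(f"order:{order_id}", order, ttl_seconds=300)

    customer = cache.get(f"customer:{order['customer_id']}")
    if customer is None:
        customer = fetch_customer(order["customer_id"])
        cache.set(f"customer:{order['customer_id']}", customer, ttl_seconds=300)

    return {"order": order, "customer": customer}
```

The point of the per-entry keys is that each cached piece can be invalidated or expire independently, which is exactly the property you lose when you bake everything into one big denormalized document.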
Also, there are really great horizontally scalable caching solutions out there and they have very simple interfaces.
Thanks for your response. The comparison between infinite-TTL cache entries and denormalized docs is an insight I can't say I've had before, and it makes intuitive sense.