I think the behavior is: if you have a Vec of u128 (say 1000 elements), filter it down to fewer elements (say 10), and then collect into a Vec of u32, you might expect the result to take around 40 bytes, but on beta Rust it takes 16,000 bytes. On current stable, the collect allocates a fresh 10-element Vec of u32; on beta, it reuses the original, much larger allocation. The author's code is doing a bit more than that, but essentially the new optimization caused memory usage to increase by a large multiple when moving to beta Rust.
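A minimal sketch of that scenario (the element counts and the modulus filter are made up for illustration; the capacity you actually observe depends on which toolchain and which in-place-collect behavior you're running):

    fn main() {
        // 1000 * 16 bytes = 16,000 bytes
        let big: Vec<u128> = (0..1000).collect();

        // Filter down to ~10 elements and narrow the element type.
        let small: Vec<u32> = big
            .into_iter()
            .filter(|x| x % 100 == 0)
            .map(|x| x as u32)
            .collect();

        // len is 10 either way. Without allocation reuse the capacity is
        // around 10 (~40 bytes); with in-place reuse the original
        // 16,000-byte block is kept, i.e. a capacity of ~4000 u32s.
        println!("len = {}, capacity = {}", small.len(), small.capacity());
    }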


Ok, I missed that there's a filter step that compounds the problem. The more I read, the less this sounds like a bug and the more it sounds like application code that's missing a shrink_to_fit and was relying on a pessimization.

That being said, it's also not an unreasonable expectation on the user's part that size and capacity won't diverge wildly in code as innocuous and idiomatic as this.
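The defensive fix being alluded to is a one-liner after the collect (a variant of the hypothetical snippet above with the workaround applied):

    let mut small: Vec<u32> = big
        .into_iter()
        .filter(|x| x % 100 == 0)
        .map(|x| x as u32)
        .collect();
    // Give back any excess capacity inherited from the source allocation.
    small.shrink_to_fit();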

I wonder how the open bug will end up being resolved.


Someone here linked the open ticket for this issue. In the comments, at least one person made basically the same argument: holding on to a potentially large fraction of unused memory is a surprising sharp edge, while shrinking the Vec (and perhaps reallocating) is unsurprising behavior. Requiring lots of additional defensive shrink_to_fit calls to avoid this seems like the wrong tradeoff, but I don't write enough Rust to have a strong opinion.


The question is how much overhead a check on every collect (shrink if more than some percentage of the capacity is unused) would impose on all the code that doesn't need it.

The reason it's important to consider is that I can always add a shrink_to_fit, even if it's a sharp edge, but I can't remove a conditional inside the standard library even when I know it doesn't apply to my code. Adding explicit APIs to control this nuance is a bit much (whether a dedicated collect_maybe_shrink function or a new argument to collect that controls shrinkage), and complicating the API has usability implications of its own. It may be that ultimately this should be fixed purely through documentation, even though defensive shrink_to_fit sucks. Not all technical problems can be solved; it's all tradeoffs.
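For comparison, the user-side version of that heuristic is trivial to write yourself (the helper name and the threshold parameter here are invented for illustration, not anything in std):

    /// Shrink `v` if less than `min_occupancy` of its capacity is in use.
    /// Purely illustrative; the name and threshold are made up.
    fn shrink_if_sparse<T>(v: &mut Vec<T>, min_occupancy: f64) {
        if (v.len() as f64) < (v.capacity() as f64) * min_occupancy {
            v.shrink_to_fit();
        }
    }

    // e.g. shrink_if_sparse(&mut small, 0.5);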


> The question is how much overhead a check on every collect (shrink if more than some percentage of the capacity is unused) would impose on all the code that doesn't need it.

Once per collect? Damn near nothing.


I'm guessing allocator time, memory usage, and cache residency are the major performance considerations. A Vec knows its own length and capacity, so the comparison is cheap, and in any case collect is already expected to allocate depending on what it's fed.


The comparison is "cheap" if the branch predictor can speculate through it, and even then it isn't free. If the branch is 50/50, you pay a misprediction penalty every time you hit that code path. It's entirely possible the map operation dominates the check; I'm just highlighting that it's not free, and it's a good idea to validate the cost somehow first (while recognizing there are applications you'd penalize without them realizing it).



