
There’s a broader problem here, which also applies to the C++ ecosystem, which is that debug builds are far less usable than they should be. C++ compilers in common debug configurations will emit a function call for std::move(), which is not in any way useful for typical debugging tasks and can make the program significantly slower.

I don’t want to rely on compiler optimizations to make my code work. Or alternatively, find a way to deliver those optimizations in debug builds.

The idea of using a Vec would be nice—if only the boxed item were an array! It’s a struct, you see…



Having been bitten by that exact problem in C++, I think the original sin is to treat stuff like copy elision as a mere optimization, instead of a semantic guarantee.


The C++ committee recognized this problem. As of C++17, copy elision is mandatory. Several forms of it, at least.


Rust also recognized the problem from a very early stage on.

For example this is why there was a `box` operator in early rust.

And placement-new-like APIs, for example, have been in the works for years; it's just that no satisfying and sound solution has been found (though there have been multiple solutions which initially seemed sound).

Which is why we are currently in a "hope the optimizer does the right thing" situation (though it is pretty much guaranteed to do the right thing for a lot of cases around Box).
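To make the failure mode concrete, here is a minimal sketch (with a deliberately small array so it actually runs) of the pattern that can overflow the stack in unoptimized builds:

```rust
fn main() {
    // The array literal is constructed on the stack first, then moved into
    // the heap allocation. With optimizations the copy is usually elided;
    // without them, an 8 MiB array here could overflow a typical 8 MiB stack.
    let boxed: Box<[u8; 4096]> = Box::new([0u8; 4096]); // kept small on purpose
    assert_eq!(boxed.len(), 4096);
}
```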

But then it also isn't the highest priority: as it turns out, a lot of "big" data structures (lists, maps, etc.) tend to grow on the heap anyway, so the situation where someone runs into debug builds crashing because of a big data structure is pretty rare, and it also crashing release builds is even rarer.

One of the likeliest ways to end up with a too-big data structure on the stack is having some very deep type nesting. But such types are (in the Rust ecosystem) often seen as an abuse of the type system and an anti-pattern anyway. It can be a fun puzzle, though, and some people are obsessed with bending the type system to their will to create DSLs or encode everything possible in the type system. But I have yet to see a commercial project with mixed-skill-level team members where using such libraries didn't lead to a productivity reduction in the long run (independent of programming language).


It’s just a bit of a surprise, and Rust hasn’t ironed out some of these surprises. I’m sure it will get fixed eventually.

Yes, you can give examples of cases where unusual code (like deep type nesting) can create these large data structures, and you can call it an anti-pattern. But Rust is also pitched as a C++ replacement for greenfield projects, so you have all of these C++ programmers who are used to being able to “new” something into existence of any size, and then initialize it. A series of design decisions in Rust has broken that for objects which don’t fit on the stack.

I’m satisfied with the explanation that “no satisfying and sound solution has been found” and I’m also satisfied with “Rust developers haven’t gotten around to addressing this issue”. I’m not really interested in hearing why some people who run into the same issue are making bad decisions.


One piece of context I want to add: although there's no language construct for placement new, the unsafe `MaybeUninit` API allows you to write partially to memory, and a macro[1] can be written to make it almost seamless to use.

[1]: https://crates.io/crates/place
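As an illustration of the kind of in-place allocation being discussed (this is a hypothetical helper, not the API of the crate above), one can bypass the stack entirely for plain byte arrays by going through the global allocator:

```rust
use std::alloc::{alloc_zeroed, handle_alloc_error, Layout};

// Hypothetical helper: allocate a zeroed [u8; N] directly on the heap,
// so the array never exists on the stack. Requires N > 0.
fn boxed_zeroed<const N: usize>() -> Box<[u8; N]> {
    let layout = Layout::new::<[u8; N]>();
    assert!(layout.size() > 0, "zero-sized allocations are not allowed here");
    unsafe {
        let ptr = alloc_zeroed(layout) as *mut [u8; N];
        if ptr.is_null() {
            handle_alloc_error(layout);
        }
        // Safety: ptr came from the global allocator with the right layout,
        // and all-zero bytes are a valid value for [u8; N].
        Box::from_raw(ptr)
    }
}

fn main() {
    let buf = boxed_zeroed::<{ 1024 * 1024 }>();
    assert!(buf.iter().all(|&b| b == 0));
}
```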


> But I have yet to see commercial projects with mixed skill level team members where using such libraries didn't lead to productivity reduction on the long run (independent of programming language).

Mixed skill team or not, I really don’t see why Box<[u8; 1024 * 1024]> should be something the language struggles with.


EDIT: I realized the TryFrom is just implemented for Box<[T]>, not Vec<T>, but you can easily convert a Vec<T> to a Box<[T]>. I updated the code accordingly.

  vec![0u8; 1024*1024].into_boxed_slice().try_into().unwrap()

isn't that terrible to use, or here as a function:

  fn calloc_buffer<const N: usize>() -> Box<[u8; N]> {
      vec![0u8; N].into_boxed_slice().try_into().unwrap()
  }

If you want to rely a bit less on the optimizer, using `reserve_exact()` + `resize()` can be a good choice. I guess it could be worthwhile to add a helper method to std.
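A sketch of that suggestion: since `resize()` after `reserve_exact()` fills exactly the reserved capacity, `into_boxed_slice()` can hand the allocation over without a shrinking copy:

```rust
fn main() {
    const N: usize = 1024 * 1024;
    let mut v: Vec<u8> = Vec::new();
    v.reserve_exact(N); // allocate exactly N bytes up front
    v.resize(N, 0);     // zero-fill in place; no reallocation needed
    let b: Box<[u8]> = v.into_boxed_slice(); // len == capacity, so no copy
    assert_eq!(b.len(), N);
}
```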


Agreed – but why would you want to box an array instead of simply using a Vec?


You can save memory by having fewer fields. This can matter when you have lots of small arrays.

Vec<u8> has {usize length, usize capacity, void* data}. Box<[u8]> has {usize length, void* data}. Box<[u8;N]> has {void* data}.
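These layouts can be checked directly with `std::mem::size_of`; the exact counts below assume a typical 64-bit target:

```rust
use std::mem::size_of;

fn main() {
    // On a typical 64-bit target:
    assert_eq!(size_of::<Vec<u8>>(), 3 * size_of::<usize>());     // ptr + len + cap
    assert_eq!(size_of::<Box<[u8]>>(), 2 * size_of::<usize>());   // ptr + len
    assert_eq!(size_of::<Box<[u8; 1024]>>(), size_of::<usize>()); // ptr only
}
```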


For a typical use case that seems like a rather extreme optimization, no? If you have a lot of objects with many small arrays and you're keeping them in a Vec, they'll be on the heap. If you're dealing with a bunch of small parts of a big blob of binary data, you'd use slices and not create new arrays. If you're on an embedded system you're not likely to have an allocator anyways.

(without trying to be too argumentative) right? Or?

Edit since I've been throttled:

  For example it can make a difference between passing values per register or per
  stack in some situations. … But then for some fields where C++ is currently very
  prevalent it might matter all the time.
That's an interesting one I hadn't thought about (and I didn't realize that the register keyword was deprecated in C++17). In a rather broad sense I hope Rust catches on in the kinda niche stuff where C++ is often popular. For example I've only done a little bit of dabbling with Rust in an embedded context but overall I thought it brought a lot to the table.


In a system at $WORK I recently optimized a structure from String to Box<str> (a similar optimization to remove the 8-byte capacity field) and saved ~16 GB of memory. Granted, the program uses 100-200 GB of RAM at peak, but it still was a nice win for basically no work. It's also a semantic win, since it encodes "this string can't be mutated" into the type.
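The same capacity-field saving can be seen for strings in a small sketch (sizes again assume a 64-bit target):

```rust
use std::mem::size_of;

fn main() {
    // Box<str> drops the capacity field, saving one usize per string.
    assert_eq!(size_of::<String>(), 3 * size_of::<usize>());
    assert_eq!(size_of::<Box<str>>(), 2 * size_of::<usize>());

    let s = String::from("immutable from here on");
    let b: Box<str> = s.into_boxed_str(); // may reallocate if capacity > len
    assert_eq!(&*b, "immutable from here on");
}
```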


yes but also no,

In some situations "optimizing smart pointers" to just be a single pointer size (Box<[T; N]>) instead of two pointer sizes (Box<[T]>) or instead of three pointer sizes (Vec<T>) can make a difference.

For example it can make a difference between passing values per register or per stack in some situations.

Or it could make the difference of how many instances of the boxed slice fit efficiently into the stack part of a smallvec. Which if it's the difference between the most common number fitting or not fitting can make a noticeable difference.

Though for a lot of fields of programming you likely won't opt to do such optimizations, as there are more important things to do/better things to spend time optimizing. But for some fields where C++ is currently very prevalent, it might matter all the time.


I guess what they mean is that the Vec would allocate heap space, and you could steal the allocation for your object to make the Box? You'd need to create this MyType manually and then tell Box what you made, unsafely, with something like Box::from_raw().

It feels like a better way to do that directly with Box is Box::<MyType>::new_zeroed() which will make you a Box<MaybeUninit<MyType>> full of zero bytes. If MyType is definitely valid when made entirely of zero bytes and you're sure of that, you can unsafely assume_init() to have the MaybeUninit resolve to an actual MyType.

[[ If you lied, now everything is on fire, I did warn you that you need to be sure and it is an unsafe function ]]

If MyType is very much not valid if consisting entirely of zero bytes well, new_uninit() gives you memory in unspecified (must not be read) state, you can properly initialise it and then assume_init() as before - but all the extra work kinda sucks, and in either case clearly it would be nicer to just write what you meant and have it work.


I think the commenter made a guess that I was boxing an array, which is a good guess, it just happens to be wrong in this case.

Maybe that will work in the future—I don’t use nightly Rust, so for now, new_zeroed() won’t work. The basic problem is “I want to allocate something large on the heap” and it doesn’t seem like I should need to use nightly builds or unsafe{} to do it.


    let heap_value = vec![the_struct];
Based on another comment addressing this, I don't think the original commenter was making assumptions about the shape of your data.


This doesn’t actually work, it will still overflow the stack. The vec! macro will just copy its arguments into the heap; the arguments are still on the stack to begin with.
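A small sketch of why this is: the `vec!` argument is evaluated before it is moved to the heap (the array here is kept small so the example runs):

```rust
// vec![x] evaluates x first, then moves it into the heap allocation, so a
// huge x still lives on the stack momentarily before the copy happens.
struct Big([u8; 256]); // small here; imagine 8 MiB

fn main() {
    let x = Big([7u8; 256]); // constructed on the caller's stack
    let v = vec![x];         // then copied into the Vec's heap buffer
    assert_eq!(v[0].0[0], 7);
}
```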


> I don’t use nightly Rust, so for now, new_zeroed() won’t work

That's a completely fair observation. The main thing I want stabilised is a single niche for custom types. I would take more if offered but experience says that every extra little thing doubles the discussion time, so, one niche is all I need, and Rust guarantees this exists in some form so even if a later mechanism does - say - fancy non-contiguous niches, I just want one value ASAP.

https://github.com/rust-lang/rfcs/pull/3334

> “I want to allocate something large on the heap” and it doesn’t seem like I should need to use nightly builds or unsafe{} to do it.

The former makes sense to me. As for the latter (a requirement to use unsafe), I can see there being cases where the compiler would have to do a lot of contortions to safely but optimally mint the type in place on the heap, and just writing the unsafe case is reasonable. I don't know anything about your type, so I can't judge.


> The idea of using a Vec would be nice—if only the boxed item were an array! It’s a struct, you see…

Vectors of length 1 are still vectors :)


If you hate relying on optimizations in principle, I have nothing for you, but if you pragmatically want your debug build to be more like your release build, then there are options.

All the major C++ compilers support some variation on the idea of "release with debug symbols". If you are using CMake or another meta-build system, there is usually a default set of options for this. If you are writing your own build scripts, you might just add -g and -O2 to your GCC or Clang flags.

The debug symbols will still consume space, but that is not likely to be a huge issue in all but the tightest performance regimes. And all the optimizations should be there.
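For what it's worth, the Cargo equivalent of that setup on the Rust side is a profile tweak; a minimal sketch:

```toml
# Either optimize the dev profile, or add debug info to release.
[profile.dev]
opt-level = 2

[profile.release]
debug = true
```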


There’s not an underlying principle here, just trying to avoid nasty surprises.

Many optimizations are in practice unreliable—they are buried in the depths of a compiler and not part of the docs, it may be difficult to find out what conditions are necessary for the optimization to work, you may find that an optimization stops working when you update your compiler, you may find that changing a seemingly-unrelated piece of code breaks the optimization (maybe some function is no longer inlined for various reasons), or you may use a different compiler.

So I prefer to write code that works correctly without optimizations. It’s not a hard rule, but in this scenario, I would prefer to rewrite the code—and this happens to be annoying here.


> common debug configurations will emit a function call for std::move()

This is being fixed in Clang, I think, at least. std::move will be treated as an intrinsic rather than a function call, and I recall something similar for std::forward.


fwiw, GCC now has -ffold-simple-inlines exactly for this issue.


That’s not what -ffold-simple-inlines does. The -ffold-simple-inlines flag simply removes debugging information for certain inlined functions. It doesn’t affect whether the function is inlined in the first place. The result is that debug builds may have a smaller amount of debug information, but the code will be the same.


That's not my reading of the docs, which explicitly talk about folding. Also, simple tests show that it does indeed inline the call even in debug mode.

In addition to inlining, it also removes debug info.


You may be right—I was reading the release notes, and the actual docs go into more detail about what the flag does.


There's also the `artificial` attribute which instructs the debugger to skip through marked functions.



