
> at least not in the sense that the actual content of the string was copied anywhere

...unless it's a short string within the limits of the small-string-optimization capacity.
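A minimal sketch of the short-string case, assuming a mainstream standard library with the small-string optimization (the standard does not require it). `stored_inline` is a hypothetical helper, not a standard facility. It checks whether the character buffer lives inside the `std::string` object itself.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>

// Hypothetical helper: true if the characters of `s` are stored inside the
// std::string object itself rather than in a separate heap allocation.
bool stored_inline(const std::string& s) {
    const auto obj_begin = reinterpret_cast<std::uintptr_t>(&s);
    const auto obj_end   = obj_begin + sizeof(std::string);
    const auto buf       = reinterpret_cast<std::uintptr_t>(s.data());
    return buf >= obj_begin && buf < obj_end;
}

// Short string: the destination holds the characters in its own inline
// buffer, so the move had to copy the content.
bool short_move_copies_chars() {
    std::string src = "Dave";
    std::string dst = std::move(src);
    return stored_inline(dst) && dst == "Dave";
}

// Long string: the destination takes over the source's heap buffer, so the
// content is not copied (assumption: typical implementation behaviour).
bool long_move_steals_buffer() {
    std::string src(1000, 'x');
    const char* before = src.data();
    std::string dst = std::move(src);
    return dst.data() == before;
}
```

Calling the two functions from `main` and printing the results should show the difference on implementations that use SSO. The exact inline capacity varies between libraries.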

I think what confuses many people is that a C++ move assignment can still copy a significant number of bytes, since it's just a flat copy plus the source object 'giving up' ownership of whatever data it pointed to.

For a POD struct, 'move assignment' and 'copy assignment' are identical in terms of cost.
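A small sketch of the POD case. `Pod` is an illustrative type, not something from the article. For a trivially copyable struct, `std::move` changes nothing: the "move" is a plain byte copy and the source keeps its contents.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Illustrative POD type: about 4000 bytes of plain data, no owned resources.
struct Pod {
    int values[1000];
};

// Both assignments are trivial, i.e. equivalent to a bytewise copy.
static_assert(std::is_trivially_copy_assignable<Pod>::value,
              "copy assignment is a plain byte copy");
static_assert(std::is_trivially_move_assignable<Pod>::value,
              "move assignment is the same plain byte copy");

// Returns true if the source still holds its data after being "moved from".
bool source_intact_after_move() {
    Pod src{};
    src.values[0] = 7;
    src.values[999] = 9;

    Pod dst{};
    dst = std::move(src);  // copies all 1000 ints; there is nothing to steal

    return src.values[0] == 7 && src.values[999] == 9 &&
           dst.values[0] == 7 && dst.values[999] == 9;
}
```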




The same is true of Rust. I have no idea why the author decided to print addresses only for C++ and not for Rust.

  // (1)
  struct Person {
      name: String,
      age: u8,
  }
  
  fn show(person: Person) {
      println!("Person record is at address  {:p}", &person);
      println!("{} is {} years old", person.name, person.age);
  }
  
  fn main() {
      let p = Person { name: "Dave".to_string(), age: 42 }; // (2)
      println!("Person record is at address  {:p}", &p);
      show(p); // (3)
  }
Its output is:

  Person record is at address  0x7ffcfb2b4e40
  Person record is at address  0x7ffcfb2b4ec0
  Dave is 42 years old


I feel like that's a pedantic detail. True, yes, but irrelevant. You may as well also point out that the return address is going to be copied to the instruction pointer when the constructor returns.


It's a real semantic difference, not a pedantic detail: It means that there is a practical reason that the moved-from object could be non-empty.

A few standard library types do guarantee that the moved-from object is empty (e.g., the smart pointer types).
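For the smart pointer case, a short sketch of the guaranteed behaviour (the function names are only for illustration): after a move, the source `std::unique_ptr` or `std::shared_ptr` compares equal to `nullptr`.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// After moving from a unique_ptr, the source is guaranteed to be null.
bool unique_ptr_is_null_after_move() {
    std::unique_ptr<int> src(new int(42));
    std::unique_ptr<int> dst = std::move(src);
    return src == nullptr && dst != nullptr && *dst == 42;
}

// The same holds for shared_ptr: the source is empty and the reference
// count is unchanged, because ownership was transferred, not shared.
bool shared_ptr_is_null_after_move() {
    std::shared_ptr<int> src = std::make_shared<int>(42);
    std::shared_ptr<int> dst = std::move(src);
    return src == nullptr && src.use_count() == 0 && dst.use_count() == 1;
}
```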

For some others (basically, all containers except string), it is not explicitly stated that this is the case, but it is hard to imagine an implementation that doesn't (due to time complexity and iterator invalidation rules). Arguably, this represents a bigger risk than string's behaviour, but it's still interesting.


>It's a real semantic difference, not a pedantic detail

What's the semantic difference? Of course moving a class will involve some amount of copying. How could it be any other way? If you have something like struct { int a[1000]; }, how are you supposed to move the contents of the struct without copying anything? What, you take a pair of really tiny scissors and cut a teeny tiny piece of the RAM, then glue the capacitors somewhere else?


> how are you supposed to move the contents of the struct without copying anything?

By taking the physical page this one struct resides in and mapping it into the virtual address space a second time. This approach is mostly used in kernel-level development, but there has been a lot of research since the seventies on how to use it in runtimes for high-level programming languages.

Now, it does involve copying the address of this struct from one place to another, that I concede.


Sure. At the cost of needing >=4K per object, since otherwise "moving" an object involves also moving the other objects sharing the same page.


I think it's a worthwhile distinction to bring up because it highlights a common misconception people have about strings and vectors. A string value is not the string content itself, just a small struct containing a pointer and other metadata. If we're talking about the in-depth semantics of a language then it's important to point out that this struct is the string, and the array of UTF-8 characters it points to is not. C++ obfuscates this distinction because of how it automatically deep copies vectors and strings for you in many cases.


> then it's important to point out that this struct is the string, and the array of UTF-8 characters it points to is not.

So then under this model, what’s the difference between a string and a string_view?


> So then under this model, what’s the difference between a string and a string_view?

string_view doesn't do any deep copying.


...one is a string and one is a string view?

I'm not sure what you're getting at. They're both small structs holding pointers to char data, they just operate on that data differently.


Exactly, thinking about things in terms of their implementations is usually not a good way to actually understand what that thing is. By arguing that std::string is just the struct itself, which consists of who knows what... you fail to appreciate the actual semantics of std::string and how those semantics are really what defines the std::string.

std::string_view also has implementation details that in principle could be similar to std::string, it's a pointer with a size, but the semantics of std::string_view are very different from the semantics of std::string.

And that's the crux of the issue, it's better to understand classes in terms of their semantics, how they operate, rather than their implementations. Implementations can change, and two very separate things can have the same or very similar implementations.

A std::string is not just some pointers and some record keeping data; a std::string is best understood as a class used to own and manage a sequence of characters with the various operations that one would expect for such management. A std::string_view is a non-owning, read-only variation of such a class that operates on an existing sequence of characters.
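The owning versus non-owning distinction can be observed directly. This is a rough sketch, assuming C++17 for `std::string_view`. The function names are made up for illustration.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Copying a std::string gives the copy its own character buffer.
bool string_copy_owns_new_buffer() {
    std::string a(100, 'x');
    std::string b = a;  // deep copy: b manages a separate sequence
    return a.data() != b.data() && a == b;
}

// Copying a std::string_view copies only the pointer and the size; both
// views refer to the characters still owned by `owner`.
bool view_copy_shares_buffer() {
    std::string owner(100, 'x');
    std::string_view v = owner;
    std::string_view w = v;
    return v.data() == owner.data() && w.data() == v.data() &&
           w.size() == owner.size();
}
```

The views are only valid while `owner` is alive and unmodified, which is the practical consequence of being non-owning.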

How these are implemented and their structural details is not really what's important, it's how someone is expected to use them and what can be done with them that counts.


My original comment was just saying that it's useful to point out to people that the concrete representation of a string in memory is a struct when relevant, since some people might not realize that. I'm not claiming anything about the best way to think about it overall.

> How these are implemented and their structural details is not really what's important

Usually this isn't important, unless you're talking about low level details impacting performance, which is exactly what the article is about.


> Usually this isn't important, unless you're talking about low level details impacting performance,

And if you’re going down that path, the string may not have a pointer at all.

“A string value is not the string content itself”, but in most cases it is if the string is short enough, implementation dependent disclaimer and all that.


My point is that the description “the array is not the string” isn’t very illuminating for someone who doesn’t understand the nuances of ownership, lifetimes, and move semantics (the topic of the article).

“C++ obfuscates this distinction because of how it automatically deep copies vectors and strings”

It does this because it has to, to guarantee its interface invariants. That “array” (if there is one) really is the string. Just because there might be an indirection doesn’t change that.

> they just operate on that data differently.

Well, they operate on the underlying “array” of char data differently (and string_view doesn’t modify it at all).

Also a nitpick: std::string unlike String in Rust or other languages is not married to an encoding. And C++ managed to fuck that one up even more so recently.


It should be, but it's very much not in the real world at least as far as I've seen.

Using std::move for anything other than "unique ownership without pointers" really messes things up. People put std::move everywhere expecting performance gains, just like we used to put "&" everywhere expecting performance gains. It's a bit of cargo cultism that can be nicely dispelled by realizing std::move is just std::copy with a compiler-defined constructor invocation potentially run to determine the old value. With that phrasing, it's hard to hallucinate performance gains that might come automatically.


> std::move is just std::copy with a compiler-defined constructor invocation potentially run to determine the old value

I have no idea what that means.

std::move is a cast to an rvalue reference. That can potentially trigger a specific overloaded function to be selected and possibly, ultimately, a move constructor or assignment operator to be called.

For an explicit move to be profitable, an expression would have otherwise chosen a copy constructor for a type with an expensive copy constructor and a cheap move constructor.
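A minimal sketch of that description, using a hypothetical `Counted` type that counts constructor calls. It shows that `std::move` on its own runs no constructor, and that it only changes which overload gets selected.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Hypothetical type that records which constructor was selected.
struct Counted {
    static int copies;
    static int moves;
    Counted() = default;
    Counted(const Counted&) { ++copies; }
    Counted(Counted&&) noexcept { ++moves; }
};
int Counted::copies = 0;
int Counted::moves = 0;

// std::move is only a cast: its result type is an rvalue reference.
static_assert(
    std::is_same<decltype(std::move(std::declval<Counted&>())), Counted&&>::value,
    "std::move yields Counted&&");

// The cast alone calls no constructor at all.
bool cast_alone_does_nothing() {
    Counted::copies = Counted::moves = 0;
    Counted a;
    Counted&& ref = std::move(a);
    (void)ref;
    return Counted::copies == 0 && Counted::moves == 0;
}

// Initializing from an lvalue selects the copy constructor.
bool lvalue_selects_copy() {
    Counted::copies = Counted::moves = 0;
    Counted a;
    Counted b = a;
    (void)b;
    return Counted::copies == 1 && Counted::moves == 0;
}

// Initializing from std::move(a) selects the move constructor instead.
bool moved_selects_move() {
    Counted::copies = Counted::moves = 0;
    Counted a;
    Counted b = std::move(a);
    (void)b;
    return Counted::copies == 0 && Counted::moves == 1;
}
```

Whether the selected move constructor is actually cheaper than the copy depends entirely on the type, which is the point being made above.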

std::copy is a range algorithm, not sure what's the relevance.


Yes, typed too fast. I meant the explicit copy constructor. Luckily, HN will hide my garbage text quickly enough. Thanks for the correction!


In fact, using std::move everywhere can actually make your performance worse!

https://devblogs.microsoft.com/oldnewthing/20231124-00/?p=10...
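One commonly cited way this happens is `return std::move(local);`, which blocks the named return value optimization. This is offered only as an illustration and is not necessarily the linked article's example. A sketch with a hypothetical `Traced` type that counts move constructions:

```cpp
#include <cassert>
#include <utility>

// Hypothetical type that counts move-constructor calls.
struct Traced {
    static int moves;
    Traced() = default;
    Traced(const Traced&) = default;
    Traced(Traced&&) noexcept { ++moves; }
};
int Traced::moves = 0;

Traced make_plain() {
    Traced t;
    return t;             // names a local: eligible for NRVO (no move at all)
}

Traced make_with_move() {
    Traced t;
    return std::move(t);  // not a plain local name: the move constructor must run
}

int moves_for_plain_return() {
    Traced::moves = 0;
    Traced t = make_plain();
    (void)t;
    return Traced::moves;
}

int moves_for_moved_return() {
    Traced::moves = 0;
    Traced t = make_with_move();
    (void)t;
    return Traced::moves;
}
```

NRVO is permitted but not mandatory, so the plain version is only guaranteed never to do more moves than the `std::move` version. With optimizing compilers it typically does none.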


The real gem of the article is the interlude. E.g., reaching back to C days and pointing out that "It's either copy, or pointer". Once someone has that mental model solidly in hand, all the syntax sugar in the world cannot harm you.

Also "It was an ergonomic advancement." hides a lot of the overwrought syntax sugar in C++ that causes it to be such a weird language if you come from elsewhere. But still an excellent insight into the state of affairs.

I think the "Apparently" language makes it seem like this is some kind of accident that nobody would know about, when really the author was probably just being a creative writer, and the example was fundamental to the post.


You can think of a C++ move as a shallow copy that takes ownership of all objects originally owned by the source.


I mean it'll copy 3 pointers' worth of data in all cases. It's just that for short strings, those 3 pointers' worth of data contain the text of the string.



