Formatting text in C++: Old and new ways (mariusbancila.ro)
117 points by signa11 on Sept 16, 2023 | hide | past | favorite | 65 comments



    unsigned char str[]{3,4,5,6,0};
    std::stringstream ss;
    ss << "str=" << str;
    std::string text = ss.str();
> The content of text will be "str=♥♦♣♠".

no, it won't. if you are on an old Windows with code page 437 then sure. but on any sane UTF-8 system, you're just going to get some binary data.

1. https://wikipedia.org/wiki/Code_page_437
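
A minimal sketch of what the stream actually stores, assuming the default "C" locale. The helper names `stream_unsigned_chars` and `to_hex` are made up for this illustration. The bytes 3 to 6 are copied into the string unchanged; whether they show up as card suits depends on how the terminal decodes them.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Streams an unsigned char array. operator<< treats it as a C string,
// so the bytes 3, 4, 5, 6 are copied verbatim, not printed as numbers.
std::string stream_unsigned_chars() {
    unsigned char str[]{3, 4, 5, 6, 0};
    std::stringstream ss;
    ss << "str=" << str;
    return ss.str();
}

// Renders every byte as two hex digits so the raw content is visible.
std::string to_hex(const std::string& s) {
    static const char digits[] = "0123456789abcdef";
    std::string out;
    for (unsigned char c : s) {
        out += digits[c >> 4];
        out += digits[c & 0x0f];
    }
    return out;
}
```

Here `to_hex(stream_unsigned_chars())` yields `7374723d03040506`: the four ASCII bytes of `str=` followed by the raw values 03 to 06.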


Thanks for identifying this as a codepage issue. The implementation of operator<< will indeed call ostream::widen() to expand each character into its locale-dependent equivalent.


Something else to consider is compile time versus runtime validation with formatting libraries, e.g. due to passing the wrong number or type of arguments. The Abseil str_format library does compile time validation for both when possible: https://abseil.io/docs/cpp/guides/format


{fmt} certainly does this too. It works quite nicely with the clangd language server flagging a line as an error until the format string and arguments match.
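
To illustrate the idea of compile-time validation (this is not how Abseil or {fmt} implement it), here is a rough C++17 sketch that only checks the argument count. `count_placeholders` and `arity_matches` are hypothetical helpers, and the sketch ignores escaped braces and type checking.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Counts "{}" placeholders in a format string; usable in constant expressions.
// Simplification: escaped braces such as "{{" are not handled.
constexpr std::size_t count_placeholders(std::string_view fmt) {
    std::size_t count = 0;
    for (std::size_t i = 0; i + 1 < fmt.size(); ++i) {
        if (fmt[i] == '{' && fmt[i + 1] == '}') {
            ++count;
            ++i;  // skip the closing brace
        }
    }
    return count;
}

// True when the number of placeholders equals the number of arguments.
template <typename... Args>
constexpr bool arity_matches(std::string_view fmt, const Args&...) {
    return count_placeholders(fmt) == sizeof...(Args);
}

// Both checks are evaluated by the compiler; a failing one stops the build.
static_assert(arity_matches("{} + {} = {}", 1, 2, 3), "counts must match");
static_assert(!arity_matches("{} and {}", 1), "one argument is missing");
```

The real libraries go further and also validate format specifiers against argument types, but the principle is the same: the format string is inspected in a constant expression, so a mismatch becomes a compile error instead of a runtime surprise.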


Does `std::format` still use locales under the hood like `std::stringstream` does?

They dropped locale support for `std::to_chars` so hopefully they can be turned off for `std::format` too
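
For reference, a small sketch of the locale-independent path mentioned here, using `std::to_chars` from `<charconv>` (C++17). The wrapper name `int_to_string` and the buffer size are choices made for this example.

```cpp
#include <cassert>
#include <charconv>
#include <string>
#include <system_error>

// Converts an int to decimal text without consulting any locale.
std::string int_to_string(int value) {
    char buf[16];  // assumes int is at most 32 bits: 10 digits plus a sign
    const std::to_chars_result r = std::to_chars(buf, buf + sizeof buf, value);
    assert(r.ec == std::errc{});  // only fails if the buffer is too small
    return std::string(buf, r.ptr);
}
```

`std::to_chars` writes no terminating NUL, which is why the result is built from the `[buf, r.ptr)` range.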


Why do execution times drop so drastically with increasing number of iterations? Shouldn’t the caches be filled after one iteration already? There is no JIT in C++, or is it?


I only had a quick look at the code, but it looks like it's timing memory allocation. For example the sprintf part uses std::string str(100, '\0'). I'm not a C++ expert, but I believe this is essentially doing a malloc and memset of 100 bytes for every call to sprintf. So this is probably a poorly setup benchmark.
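
A sketch of the difference being described, assuming the benchmark looks roughly like the first function. Both function names and the `"value=%d"` format are invented for illustration and are not taken from the article's code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Builds a fresh 100-byte, zero-filled string on every call, so each call
// likely pays for an allocation and a fill in addition to the formatting.
std::string format_fresh(int value) {
    std::string str(100, '\0');
    const int n = std::snprintf(&str[0], str.size(), "value=%d", value);
    assert(n >= 0);
    str.resize(std::min(static_cast<std::size_t>(n), str.size() - 1));
    return str;
}

// Reuses a caller-owned buffer: after the first call its capacity is already
// at least 100, so later calls do not need to allocate again.
void format_reused(std::string& buf, int value) {
    buf.resize(100);
    const int n = std::snprintf(&buf[0], buf.size(), "value=%d", value);
    assert(n >= 0);
    buf.resize(std::min(static_cast<std::size_t>(n), buf.size() - 1));
}
```

Timing `format_reused` in a loop with one long-lived `std::string` would isolate the formatting cost from the allocation cost.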


I think that might be it. Too bad, the results of this kind of benchmark would have been interesting.


Your CPU is effectively a virtual machine with stuff like branch prediction, speculative execution w/rollback, pipelining, implicit parallelism, etc. etc.

Of course, it isn't able to do quite as much as a VM running in software (because fixed buffers for everything, etc.), but even so...


> There is no JIT in C++, or is it?

This question doesn't make sense for the context*. C++ is Ahead of Time, by design; there is nothing to "just in time" compile.

JIT (as a concept) only makes sense if you are, in some way, abstracting your code from native machine code (usually via some sort of VM, like Python or Java's), which the "system" languages (C, Rust, Zig, C++, etc) do not.

What I think you are trying to reference are "runtime optimizations", in which case the answer is probably no. The core language and the standard library are pretty conservative about what they put into the runtime. Extended runtimes like Windows' and glibc might do some conditional optimizations, however.

* Yes, some contrarian is going to point out a project like Cling or C++/CLI. This is why I'm being very clear about "context".


> C++ is Ahead of Time, by design; there is nothing to "just in time" compile.

Can I talk to you about our Lord and Savior the CPU trace cache[1]?

That is to say, I know next to nothing about how modern CPUs are actually designed and hardly more about JITs, but a modern CPU’s frontend with a microop cache sure looks JITy to me. The trace cache on NetBurst looks even more classically JITy, but by itself it was a miserable failure, so meh.

In any event, a printf invocation seems like it should be too large for the cache to come into play; on the other hand, all the predictors learning stuff over the iterations might make a meaningful impact?

Seems to me like that learning, if present, would make the benchmark less interesting, not more, as an actual prospective application of string formatting seems unlikely to go through formatting the same (kind of) thing and nothing else in a tight loop.

[1] https://chipsandcheese.com/2022/06/17/intels-netburst-failur...


If you want to muddy the waters for contrarianism, go for it.

This is clearly not what the OP was asking about.


> If you want to muddy the waters for contrarianism, [..].

No, and I don’t appreciate the accusation.

> This is clearly not what the OP was asking about.

Eh. I thought this was on topic when I wrote. On a second read I’m not sure either way. In any case, my point stands, I think: there are things happening that warm up after multiple loop iterations, as characteristic of JITs and not caches, and one potential source of those things is in fact JITish despite the fact that the translation of C++ into x86-64 has nothing to do with it—even if I’m not sure whether this is the right explanation in this particular case. The general answer to “can JITish things happen to my C++ code” is a definite yes.


Could be dynamic frequency scaling. To minimize the impact of it when benchmarking one can pin the process to a single core and warm it up before running the benchmark.
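
The warm-up part can be sketched in portable C++ as below; `average_ns_after_warmup` is a hypothetical helper. Pinning to a core is platform-specific (for example `taskset` on Linux) and is left out.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Runs `work` a number of times untimed, then reports the average wall-clock
// time per call, in nanoseconds, over the timed runs.
template <typename Work>
double average_ns_after_warmup(Work&& work,
                               std::size_t warmup_runs,
                               std::size_t timed_runs) {
    assert(timed_runs > 0);
    for (std::size_t i = 0; i < warmup_runs; ++i) {
        work();  // lets clocks ramp up and predictors and caches settle
    }
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < timed_runs; ++i) {
        work();
    }
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::nano> elapsed = stop - start;
    return elapsed.count() / static_cast<double>(timed_runs);
}
```

A real benchmark would also need to keep the compiler from optimizing the work away, which this sketch does not attempt.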


Branch predictor maybe.


That was my guess. Training the branch predictor on all those virtual calls.


finally!

I remember using variadic templates to print things in a single function call, like this:

    int i; float f; string s;
    print_special(i, f, s);
It would somehow imitate the behavior of python's print()

I never really understood how variadic templates work, maybe one day I will, and to be honest, I suspect they're really not very kind to compile time, since a lot of type checks are done under the hood.

It's a bit problematic that C++ cannot be compiled quickly without a fast CPU, I wonder how this is going to be addressed one day, because it seems that modules aren't a good solution to that, yet.
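
For anyone curious, a `print_special`-style function is fairly short with a C++17 fold expression. This is only a sketch of one way to write it; `print_to` and `join_to_string` are extra helpers added for the example, and every argument type is assumed to support `operator<<`.

```cpp
#include <cassert>
#include <iostream>
#include <ostream>
#include <sstream>
#include <string>

// Writes all arguments to `os`, separated by spaces, followed by a newline.
template <typename... Args>
void print_to(std::ostream& os, const Args&... args) {
    const char* sep = "";
    // Fold over the comma operator: expands to one "os << sep << arg"
    // per argument, switching the separator to " " after the first one.
    ((os << sep << args, sep = " "), ...);
    os << '\n';
}

// Python-like print(): same thing, but to standard output.
template <typename... Args>
void print_special(const Args&... args) {
    print_to(std::cout, args...);
}

// Same formatting, collected into a string instead of printed.
template <typename... Args>
std::string join_to_string(const Args&... args) {
    std::ostringstream os;
    print_to(os, args...);
    return os.str();
}
```

The compiler instantiates one version of the template per distinct combination of argument types, which is where the compile-time cost comes from.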


I find it disappointing that cpp20 still doesn't have a solution that is more convenient than good ol printf (except for memory safety).

Another example would be convenient list comprehension, convenient maps without juggling around with tuples, first(), second(), at()...


Maps have been improved quite a bit.

For example, if you have a std::map<std::string, int>, you can iterate over it like this:

    for (auto [s, n]: my_map) {
        // s = string key, n = int value
    }
You can test for membership:

    if (my_map.contains("foo")) { /* do something */ }
Although I still usually use find because if the key is in the map, I probably want the value.

You can use initialization lists with them too:

    std::map<std::string, int> my_map = {
      { "one", 1 },
      { "two", 2 },
      { "three", 3 }
    };
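
Putting the pieces together, here is a small C++17 sketch of the `find` pattern, where a single lookup both tests for membership and yields the value. The function names are invented for the example.

```cpp
#include <cassert>
#include <map>
#include <string>

// Builds the map with an initializer list.
std::map<std::string, int> make_numbers() {
    return {{"one", 1}, {"two", 2}, {"three", 3}};
}

// Returns the mapped value, or `fallback` if the key is absent.
// Uses one find() instead of contains() followed by at().
int value_or(const std::map<std::string, int>& m,
             const std::string& key, int fallback) {
    if (const auto it = m.find(key); it != m.end()) {
        return it->second;
    }
    return fallback;
}

// Iterates with structured bindings and adds up the values.
int sum_values(const std::map<std::string, int>& m) {
    int total = 0;
    for (const auto& [name, n] : m) {
        total += n;
    }
    return total;
}
```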


    for (auto [s, n]: my_map)
copies all the data needlessly, better to use

    for (const auto& [s, n]: my_map)
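
The copying can be made visible with a value type that counts its own copies. `Tracked` and the two functions are hypothetical, written only to demonstrate the point.

```cpp
#include <cassert>
#include <map>

// A value type that counts how often it is copied.
struct Tracked {
    static inline int copies = 0;  // C++17 inline variable
    Tracked() = default;
    Tracked(const Tracked&) { ++copies; }
    Tracked& operator=(const Tracked&) { ++copies; return *this; }
};

// "auto [k, v]" copies each key/value pair into a hidden local object.
int copies_when_iterating_by_value(const std::map<int, Tracked>& m) {
    Tracked::copies = 0;
    for (auto [key, value] : m) {
        static_cast<void>(key);
        static_cast<void>(value);
    }
    return Tracked::copies;
}

// "const auto& [k, v]" binds to the pair stored in the map; nothing is copied.
int copies_when_iterating_by_reference(const std::map<int, Tracked>& m) {
    Tracked::copies = 0;
    for (const auto& [key, value] : m) {
        static_cast<void>(key);
        static_cast<void>(value);
    }
    return Tracked::copies;
}
```

For a map with three elements, the first function reports three copies and the second reports none.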


That's why some newer languages have 'const are the default'.


I'm pretty sure it's the reference that makes the data not be copied.


Without the const the key would still be const.


Copying the data to a const makes little sense in this case. The extra & choice that has emerged makes things more complicated than needed. The sad fate of this old language.


what a language


just consider const reference to be the default for non-primitive types


My rule of thumb is to use const references when the sizeof the type is larger than the sizeof a pointer or reference.


Might be worth using values/copies even a bit bigger, so long as it's "simple" data. This[1] short post argues for passing `std::string_view` (~2 pointers) by value, for

- Omitting pointer indirections (loads),

- Not forcing the pointee to have an address (i.e. gotta be in memory, not just registers), and

- Eliminating aliasing questions, potentially leading to better codegen if the function isn't inlined.

1: https://quuxplusone.github.io/blog/2021/11/09/pass-string-vi...
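
In code, the recommendation amounts to writing the parameter as a plain `std::string_view` instead of `const std::string_view&`. A minimal sketch, with `count_char` as a made-up example function:

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// `text` is taken by value: the view is just a pointer and a length, so
// copying it is cheap and the callee never has to load through a reference.
std::size_t count_char(std::string_view text, char wanted) {
    std::size_t n = 0;
    for (const char c : text) {
        if (c == wanted) {
            ++n;
        }
    }
    return n;
}
```

Callers can pass a string literal, a `std::string`, or another view without any allocation.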


This is very useful, thanks!


For list comprehension, we have (C++23): `std::ranges::to<std::vector>(items | std::views::filter(shouldInclude) | std::views::transform(f))` it’s not quite `[f(x) for x in items if shouldInclude(x)]` but it’s the same idea.


To be honest, if that's the notation, I will not be very eager to jump on cpp23. That said, I admire people whose minds stay open for c++ improvements and make that effort.


Well you could write it as

    to<vector>(items | filter(shouldInclude) | transform(f))
if you really want to, but generally C++ programmers prefer to be explicit and include the namespaces.


>but generally C++ programmers prefer to be explicit and include the namespaces.

why, though? Collisions in the stdlib? stdlib is too new to not be the default for these names?


The using declaration modifies your namespace.

From the docs:

    namespace X
    {
        using ::f;        // global f is now visible as ::X::f
        using A::g;       // A::g is now visible as ::X::g
    }
 
    void h()
    {
        X::f(); // calls ::f
        X::g(); // calls A::g
    }
Link to docs: https://en.cppreference.com/w/cpp/language/namespace#Using-d...


Thanks


Sweet baby Jesus I thought that was a joke as I started reading it. Still not entirely sure.


What's wrong with std::format?


Nothing but

(1) notation wise not very different from the old printf which is looked down upon.

(2) f"{name}'s hobby is {hobby} " would read like a novel and there would be far fewer comma-separated arguments.

(3) std::format is quite a lot of characters to type for something so ubiquitous.


(2) is a point I firmly agree with (though not everyone does), but it’s a hard one.

Here’s the way I think about it. I don’t think I’m wrong but I’m absolutely open to being told otherwise.

C++ is a language. It has a standard library. The library depends on the language, but the language shouldn’t depend on the library. This is because many applications cannot use the standard library, or parts of it.

The conceptual issue with fstrings in C++ is that the formatting is done on a library level. An fstring would be a language feature. It wouldn’t be reasonable for syntax sugar to resolve to a library call.

So what we’d need is a way of having parameterised strings that the language knows to separate out into parameters in a function call. For instance:

f(f"Hello, {planet}");

would resolve to:

f("Hello, {}", planet);

such that replacing f with std::format, std::print (C++23), fmt::format, fmt::print, spdlog::info, spdlog::error, or even scn::scan (?), would do exactly what you want.

However, the expression f"Hello, {planet}" would be meaningless on its own, and care would need to be taken to avoid:

std::string x = f"Hello, {planet}";

from resolving to:

std::string x = "Hello, {}", planet;

Which would be equivalent to

std::string x = planet;

Thanks to C++'s ludicrous comma operator.
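
A small sketch of the comma operator's behavior. One caveat: in a declaration, an unparenthesized comma separates declarators rather than acting as the comma operator, so the example adds parentheses to get the effect described. The function name is made up.

```cpp
#include <cassert>
#include <string>

// With parentheses, the comma is the built-in comma operator: the left
// operand is evaluated and discarded, and the result is the right operand.
std::string comma_picks_right_operand(const std::string& planet) {
    // The cast to void only silences "unused value" warnings.
    std::string x = (static_cast<void>("Hello, {}"), planet);
    return x;
}
```

Calling it with `"Mars"` returns `"Mars"`; the format string is silently dropped, which is exactly the kind of quiet failure a naive f-string expansion would have to avoid.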


I understand what you are saying.

I wonder/doubt if the comma would have to be an explicit step as part of the hand over to the std lib. The comma is a separator that the programmer would use, but does the compiler need that? The compiler needs to translate 1 argument to multiple arguments of multiple types, such that stdlib can receive all info. (The original fstring indeed cannot be expanded by the lib itself, since that would give stdlib a special status). So the needed language feature is a one to multi args translation, where at compile time all types are known. That would mean that in the context of an assignment (instead of a function argument), the f string does not make sense. In that case the compilation can simply fail, no?

I guess the handing over the multiple args of different types is a problem. Not all infinite combinations can be solved by templating and I guess even if it could, the header would contain too much logic since the template expansion needs to happen from the header?


There's precedent for that in JavaScript's format strings. It also allows for cute ideas, like passing the template to a function that does SQL.


Typically, you'd just put `using namespace std` at the top of your file. Afterwards calls to `format` have the exact same length as `printf`.


This is questionable advice. In header files 'using namespace' should never be used, in implementation files it opens up some weird edge cases. Instead, do

   using std::format;
and then use format(...) as the function call.

... at least that's the current advice AIUI.


I don't understand. I genuinely thought that `using namespace std` is considered bad practice because of possible arising conflicts. Also you still need to write the word format (though you could alias it to one character, with the same namespace conflict possibilities). Am I being pedantic?


This is not widely considered a good practice, please don't "using namespace std".


I know it's not recommended but I use it for main().

And extensive formatting and/or output should be minimised outside main(), imho.


It does not have enough greater or less signs in the function signature :)


It seems so trite, especially in 2023, but please don't use sprintf. It isn't safe in general. (Even snprintf is tricky.)


There are a lot of use cases where "isn't safe" is absolutely irrelevant


OTOH, in many of the cases where “isn’t safe” IS relevant, the developer believes it isn’t.


Buffer overflows are never irrelevant. You might get away with it until the day it blows up or someone manages to exploit it. Or you could code it correctly the first time.


"It isn't safe in general"

Disagree. Use whatever is the most simple and boring option for the problem's solution.

Also, the standard library has so much stuff that will give you pain in runtime, that avoiding sprintf really is not relevant.

I don't know what "Safe" means in general in the scope of C++. If there is a memory corruption, my program will crash. Then I will compile it with C++ debug runtime, which will pinpoint the exact location that caused the memory leak. Then I fix the leak.

Not using sprintf will not result in code that has no memory leaks. C++ as a whole is unsafe. You need to write code and have a production system that foremost takes this into account. You can't make C++ safe by following the dogma of not using functions with possible side-effects. There is a very high chance your fall-back algorithms themselves will leak memory anyway.

The only way to write 'as non-leaky' C++ as possible is to make the code as simple, and easy to reason about as possible, and to have tooling to assist in program validation. This requirement is much more important than avoiding some parts of standard library.

Use static checkers, use Microsoft's debug C runtime, use address sanitizers, etc.

If you know some parts of your standard library are broken then ofc avoid them. But what can be considered "broken" really depends on what one is trying to achieve, and which platforms one is targeting.


Saying "don't use" something isn't really actionable, unless you give a safe alternative.


The safe alternatives are in the article.


asprintf is often a better choice if you're heap allocating anyway. For a static or stack buffer, obviously snprintf, unless you know the maximum possible length won't exceed the buffer size (which you often do...).
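
Since asprintf is a GNU/BSD extension and not standard C or C++, here is a hedged sketch of both patterns using only `std::snprintf`. The function names and the `"%s=%d"` format are made up for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Fixed buffer: snprintf never writes past `size`, and its return value is
// the length the full output needed, which reveals truncation.
bool format_into(char* buf, std::size_t size, const char* label, int value) {
    assert(buf != nullptr && size > 0);
    const int n = std::snprintf(buf, size, "%s=%d", label, value);
    return n >= 0 && static_cast<std::size_t>(n) < size;
}

// Heap buffer: ask for the required length first, then format into a string
// of exactly that size. Roughly what asprintf does, in portable form.
std::string format_to_string(const char* label, int value) {
    const int needed = std::snprintf(nullptr, 0, "%s=%d", label, value);
    if (needed < 0) {
        return {};
    }
    std::string out(static_cast<std::size_t>(needed) + 1, '\0');
    std::snprintf(&out[0], out.size(), "%s=%d", label, value);
    out.resize(static_cast<std::size_t>(needed));  // drop the trailing NUL
    return out;
}
```

The tricky part of snprintf is that a truncated result is still NUL-terminated and looks valid, so callers that ignore the return value can silently lose data.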


And explain why it’s not safe too


YOLO


Will C++ ever get the possibility to just print the contents of an object like Rust does (with the automatic debug trait)? I am tired of writing my own print functions for random objects when debugging because the API developers did not bother to overload operator<<. One of those things that are hard to accept when coming back from Rust.


Not until we get reflection. And reflection efforts seem to be at an impasse, so I don’t imagine we will see it for a few years at least.

I will add, though, that I’ve found copilot to be very handy when it comes to generating formatting. Last week I used it while writing fmt::formatter specialisations for a library with 50 structs, some having over 20 members. Writing all of them took about 10 minutes. I dare say the same would hold for operator<< overloads.


Rust does not use reflection for this. Reflection support in rust isn’t actually much better than in C++.


I’m not a rust dev, but looking into how the Debug trait works, it’s basically what we’d need compile time reflection to get in C++.

It generates code that uses the class name and class members. In C++ we have no way of doing that (without hacky macro stuff).
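
For comparison, this is roughly what has to be written by hand in C++ today for each type. `Point` is a hypothetical struct, and the output layout merely imitates Rust's Debug style.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical type standing in for a library struct.
struct Point {
    int x;
    int y;
};

// Hand-written equivalent of a derived Debug implementation: the type name
// and every member name must be spelled out and kept in sync manually.
std::ostream& operator<<(std::ostream& os, const Point& p) {
    return os << "Point { x: " << p.x << ", y: " << p.y << " }";
}

// Convenience wrapper that returns the text instead of streaming it.
std::string debug_string(const Point& p) {
    std::ostringstream os;
    os << p;
    return os.str();
}
```

Adding a member to `Point` without touching `operator<<` compiles fine and silently omits the new field, which is the maintenance burden reflection would remove.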


The fact that Rust also has a pretty-print("{:#}")[1] a single character away is also really convenient. When you're working with JSON or debug printing types it's nice to be able to format that without reaching for external tools.

[1] https://doc.rust-lang.org/std/fmt/


> I am tired of writing my own print functions for random objects when debugging because the API developers did not bother to override the <<operator.

Won't you face the same problem in Rust? If the library developer did not derive the Debug trait, you're out of luck.


TIL C++ now has a third builtin way to format strings. Can't wait to not use it.


The C++ STL is so bad in so many ways, it never ceases to amaze me



