Thanks for tracking this down to a codepage issue. The implementation of operator<< will indeed call ostream::widen() to convert the character to a locale-dependent equivalent.
Something else to consider is compile time versus runtime validation with formatting libraries, e.g. due to passing the wrong number or type of arguments. The Abseil str_format library does compile time validation for both when possible: https://abseil.io/docs/cpp/guides/format
{fmt} certainly does this too. It works quite nicely with the clangd language server flagging a line as an error until the format string and arguments match.
Why do execution times drop so drastically as the number of iterations increases? Shouldn’t the caches be filled after one iteration already? There is no JIT in C++, or is there?
I only had a quick look at the code, but it looks like it's timing memory allocation. For example, the sprintf part uses std::string str(100, '\0'). I'm not a C++ expert, but I believe this essentially does a malloc and memset of 100 bytes for every call to sprintf. So this is probably a poorly set up benchmark.
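I haven't seen the benchmark in question, but the difference being described can be sketched like this (function names are mine, using snprintf for the formatting):

```cpp
#include <cstdio>
#include <string>

// Allocates (and zero-fills) a fresh 100-byte std::string per call,
// so the measured time includes allocator work, not just formatting.
std::string format_fresh(int value) {
    std::string str(100, '\0');
    int n = std::snprintf(&str[0], str.size(), "value = %d", value);
    str.resize(n > 0 ? static_cast<size_t>(n) : 0);
    return str;
}

// Reuses a caller-provided buffer, so repeated calls measure only
// the formatting itself.
const char* format_reused(char* buf, size_t len, int value) {
    std::snprintf(buf, len, "value = %d", value);
    return buf;
}
```

If the benchmark loop calls something shaped like format_fresh, a large share of the time goes to the allocator rather than to sprintf.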
Your CPU is effectively a virtual machine with stuff like branch prediction, speculative execution w/rollback, pipelining, implicit parallelism, etc. etc.
Of course, it isn't able to do quite as much as a VM running in software (because fixed buffers for everything, etc.), but even so...
This question doesn't make sense for the context*. C++ is Ahead of Time, by design; there is nothing to "just in time" compile.
JIT (as a concept) only makes sense if you are, in some way, abstracting your code from native machine code (usually via some sort of VM, like Python or Java's), which the "system" languages (C, Rust, Zig, C++, etc) do not.
What I think you are trying to reference are "runtime optimizations"; in which case, the answer is probably no. The base language and the standard C++ library are pretty conservative about what they put into the runtime. Extended runtimes like Windows' and glibc might do some conditional optimizations, however.
* Yes, some contrarian is going to point out a project like Cling or C++/CLI. This is why I'm being very clear about "context".
> C++ is Ahead of Time, by design; there is nothing to "just in time" compile.
Can I talk to you about our Lord and Savior the CPU trace cache[1]?
That is to say, I know next to nothing about how modern CPUs are actually designed and hardly more about JITs, but a modern CPU’s frontend with a microop cache sure looks JITy to me. The trace cache on NetBurst looks even more classically JITy, but by itself it was a miserable failure, so meh.
In any event, a printf invocation seems like it should be too large for the cache to come into play; on the other hand, all the predictors learning stuff over the iterations might make a meaningful impact?
Seems to me like that learning, if present, would make the benchmark less interesting, not more, as an actual prospective application of string formatting seems unlikely to go through formatting the same (kind of) thing and nothing else in a tight loop.
> If you want to muddy the waters for contrarianism, [..].
No, and I don’t appreciate the accusation.
> This is clearly not what the OP was asking about.
Eh. I thought this was on topic when I wrote it. On a second read I’m not sure either way. In any case, my point stands, I think: there are things happening that warm up over multiple loop iterations, as is characteristic of JITs and not caches, and one potential source of those things is in fact JITish, despite the fact that the translation of C++ into x86-64 has nothing to do with it. I’m not sure whether this is the right explanation in this particular case, but the general answer to “can JITish things happen to my C++ code” is a definite yes.
Could be dynamic frequency scaling. To minimize the impact of it when benchmarking one can pin the process to a single core and warm it up before running the benchmark.
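A minimal sketch of both steps, assuming Linux (sched_setaffinity is Linux-specific; the warm-up loop and iteration count are arbitrary stand-ins):

```cpp
#include <sched.h>    // Linux-specific: sched_setaffinity, CPU_SET
#include <cstdint>

// Pin the calling process to a single core so the scheduler doesn't
// migrate it between cores with different clock states.
bool pin_to_core(int core) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    return sched_setaffinity(0, sizeof(set), &set) == 0;
}

// Busy work to let the core ramp up to a steady clock before the
// timed section starts; returns a checksum so it can't be optimized away.
uint64_t warm_up(uint64_t iterations) {
    uint64_t acc = 0;
    for (uint64_t i = 0; i < iterations; ++i)
        acc += i * i;
    return acc;
}
```

Calling pin_to_core(0) followed by warm_up(some large count) before the measured loop removes two common sources of the "first iterations are slow" effect.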
I remember using variadic templates to print things in a single function call, like this:
int i; float f; string s;
print_special(i, f, s);
It would somehow imitate the behavior of python's print()
I never really understood how variadic templates work; maybe one day I will. To be honest, I suspect they're really not very kind to compile times: there are a lot of type checks done under the hood.
It's a bit problematic that C++ cannot be compiled quickly without a fast CPU, I wonder how this is going to be addressed one day, because it seems that modules aren't a good solution to that, yet.
Copying the data to a const makes little sense in this case. The extra & choice that has emerged makes things more complicated than needed. The sad fate of this old language.
Might be worth passing by value even somewhat bigger types, so long as they're "simple" data. This[1] short post argues for passing `std::string_view` (~2 pointers) by value, for
- Omitting pointer indirections (loads),
- Not forcing the pointee to have an address (i.e. gotta be in memory, not just registers), and
- Eliminating aliasing questions, potentially leading to better codegen if the function isn't inlined.
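A small illustration of the recommendation (the function is a made-up example):

```cpp
#include <string_view>
#include <cstddef>

// std::string_view taken by value: the pointer/length pair can travel
// in registers, the callee owns its copy, and the compiler doesn't
// have to assume the argument aliases anything the function modifies.
std::size_t count_spaces(std::string_view text) {
    std::size_t n = 0;
    for (char c : text)
        if (c == ' ') ++n;
    return n;
}
```

With a `const std::string_view&` parameter instead, the caller would have to materialize the view in memory and the callee would load it through a pointer on every access (absent inlining).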
For list comprehension, we have (C++23): `std::ranges::to<std::vector>(items | std::views::filter(shouldInclude) | std::views::transform(f))` it’s not quite `[f(x) for x in items if shouldInclude(x)]` but it’s the same idea.
To be honest, if that's the notation, I will not be very eager to jump on C++23. That said, I admire people whose minds stay open to C++ improvements and who make that effort.
void f();                  // a function in the global namespace
namespace A { void g(); }

namespace X
{
    using ::f; // global f is now visible as ::X::f
    using A::g; // A::g is now visible as ::X::g
}

void h()
{
    X::f(); // calls ::f
    X::g(); // calls A::g
}
(2) is a point I firmly agree with (though not everyone does), but it’s a hard one.
Here’s the way I think about it. I don’t think I’m wrong but I’m absolutely open to being told otherwise.
C++ is a language. It has a standard library. The library depends on the language, but the language shouldn’t depend on the library. This is because many applications cannot use the standard library, or parts of it.
The conceptual issue with fstrings in C++ is that the formatting is done on a library level. An fstring would be a language feature. It wouldn’t be reasonable for syntax sugar to resolve to a library call.
So what we’d need is a way of having parameterised strings that the language knows to separate out into parameters in a function call. For instance:
f(f”Hello, {planet}”);
would resolve to:
f(“Hello, {}”, planet);
such that replacing f with std::format, std::print (C++23), fmt::format, fmt::print, spdlog::info, spdlog::error, or even scn::scan (?), would do exactly what you want.
However, the expression f”Hello, {planet}” would be meaningless on its own, and care would need to be taken to avoid:
I wonder whether the comma would have to be an explicit step in the hand-over to the standard library; I doubt it. The comma is a separator the programmer would use, but does the compiler need it? The compiler needs to translate one argument into multiple arguments of multiple types, so that the stdlib can receive all the information. (The original fstring indeed cannot be expanded by the library itself, since that would give the stdlib a special status.) So the needed language feature is a one-to-many argument translation, where all types are known at compile time. That would mean that in the context of an assignment (instead of a function argument), the f-string does not make sense. In that case the compilation can simply fail, no?
I guess handing over multiple args of different types is a problem. Not all of the infinite combinations can be solved by templating, and I guess even if they could, the header would contain too much logic, since the template expansion needs to happen from the header?
This is questionable advice. In header files 'using namespace' should never be used, in implementation files it opens up some weird edge cases. Instead, do
I don't understand. I genuinely thought that using namespace std is considered bad practice because of possible conflicts. Also, you still need to write the word format (though you could alias it to one character, with the same namespace-conflict possibilities). Am I being pedantic?
Buffer overflows are never irrelevant. You might get away with it until the day it blows up or someone manages to exploit it. Or you could code it correctly the first time.
Disagree. Use whatever is the most simple and boring option for the problem's solution.
Also, the standard library has so much stuff that will give you pain in runtime, that avoiding sprintf really is not relevant.
I don't know what "Safe" means in general in the scope of C++. If there is a memory corruption, my program will crash. Then I will compile it with C++ debug runtime, which will pinpoint the exact location that caused the memory leak. Then I fix the leak.
Not using sprintf will not result in code that is free of memory leaks. C++ as a whole is unsafe. You need to write code and have a production system that takes this into account from the start. You can't make C++ safe by following a dogma of not using functions with possible side effects. There is a very high chance your fall-back algorithms themselves will leak memory anyway.
The only way to write 'as non-leaky' C++ as possible is to make the code as simple, and easy to reason about as possible, and to have tooling to assist in program validation. This requirement is much more important than avoiding some parts of standard library.
Use static checkers, use Microsoft's debug C runtime, use address sanitizers, etc.
If you know some parts of your standard library are broken, then of course avoid them. But what counts as "broken" really depends on what one is trying to achieve, and which platforms one is targeting.
asprintf is often a better choice if you're heap-allocating anyway. For a static or stack buffer, obviously snprintf, unless you know the maximum possible length won't exceed the buffer (which you often do....).
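Note that asprintf is a GNU/BSD extension, not standard. A portable equivalent is to call snprintf with a null buffer first to measure the output, then allocate exactly that much (the function below is my sketch of the pattern):

```cpp
#include <cstdio>
#include <string>

// Portable stand-in for asprintf: snprintf with a null buffer returns
// the number of characters that would have been written, so we can
// size the string exactly before formatting into it.
std::string format_int(const char* label, int value) {
    int needed = std::snprintf(nullptr, 0, "%s=%d", label, value);
    if (needed < 0) return {};
    std::string out(static_cast<size_t>(needed) + 1, '\0');
    std::snprintf(&out[0], out.size(), "%s=%d", label, value);
    out.resize(static_cast<size_t>(needed));  // drop the trailing NUL
    return out;
}
```

Two snprintf calls instead of one, but no guessing at buffer sizes and no truncation.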
Will C++ ever get the possibility to just print the contents of an object like Rust does (with the automatic debug trait)?
I am tired of writing my own print functions for random objects when debugging, because the API developers did not bother to overload operator<<. One of those things that are hard to accept when coming back from Rust.
Not until we get reflection. And reflection efforts seem to be at an impasse, so I don’t imagine we will see it for a few years at least.
I will add, though, that I’ve found copilot to be very handy when it comes to generating formatting. Last week I used it while writing fmt::formatter specialisations for a library with 50 structs, some having over 20 members. Writing all of them took about 10 minutes. I dare say the same would hold for operator<< overloads.
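For reference, the boilerplate being generated per struct looks roughly like this (Point is a made-up example type; until reflection lands, every struct needs its own version, hand-written or generated):

```cpp
#include <iostream>
#include <sstream>
#include <string>

// A made-up example type standing in for one of the library's structs.
struct Point {
    int x;
    int y;
};

// The per-type operator<< boilerplate the parent comment is tired of
// writing; Rust's #[derive(Debug)] generates the equivalent for free.
std::ostream& operator<<(std::ostream& os, const Point& p) {
    return os << "Point { x: " << p.x << ", y: " << p.y << " }";
}
```

A fmt::formatter specialisation is the same idea with a different interface: a format() member that writes each field to the format context.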
The fact that Rust also has pretty-printing ("{:#?}")[1] a single character away is also really convenient. When you're working with JSON or debug-printing types, it's nice to be able to format that without reaching for external tools.
No, it won't. If you are on an old Windows with code page 437[1], then sure, but on any sane UTF-8 system you're just going to get some binary data.
1. https://wikipedia.org/wiki/Code_page_437