Type-Safe Printf for C (github.com/moehriegitt)
71 points by tinkersleep on Dec 14, 2021 | 50 comments



D supports calling C functions directly, including printf. When we added printf format checking against the arguments, many bugs were exposed and fixed. It was a big win.


GCC has this also; this work is different because it removes the type errors. Instead of giving an error like "%d expects a parameter of type int, not char *", it just lets %d print the string anyway. It's more like format in Lisp, say:

  [1]> (format t "~05,'0d" 5)
  00005
  NIL
  [2]> (format t "~05,'0d" "abc")
  00abc
  NIL
Here ~d (decimal) doesn't care that it didn't get an integer.


For D's formatted write function, %s means "I don't care what type it is, just do the right thing with it", which works fine for nearly all uses.


How is this safer than enabling all warnings in GCC or clang?

  'Type-safe' in this context does not mean that you get more compile errors; it means the format specifier does not need to encode the argument type, only the print format. In fact, format strings with this library get less compile-time checking (namely none) than standard printf gets with modern compilers. This approach is still safer.
EDIT The answer is at the bottom apparently. Maybe put that up higher?


> There is absolutely no chance to give a wrong format specifier and access the stack (like printf does via stdarg.h) in undefined ways. This is particularly true for multi-arch development, where with printf you need to be careful about length specifiers: you might not get a warning on your machine, but the next person will, and it will crash there. I usually need to compile a few times on multiple architectures to get the integer length correct, e.g., %u vs. %lu vs. %llu vs. %zu.


This is really annoying on architectures like ARM32 where size_t is closely related to unsigned int but uint32_t is long unsigned int and gets flagged as a different type. It becomes a real problem when using a stripped down printf like the one in newlib that doesn't support %zu.


Exactly! Or on Windows 64-bit, where 'long' is 32-bit and 'size_t' is 'unsigned long long'.
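For illustration, a small sketch of the trap (exact warnings depend on the compiler and target):

    /* The same line can be clean on one target and warn or misbehave on
       another, depending on how size_t is defined there. */
    #include <stdio.h>

    int main(void) {
        size_t n = sizeof(int);

        /* printf("%lu\n", n);  -- only correct where size_t happens to be
           unsigned long (e.g. Linux x86-64); wrong on Win64 and flagged on
           many 32-bit targets. */

        printf("%lu\n", (unsigned long)n);  /* classic workaround: cast */
        printf("%zu\n", n);                 /* portable since C99, if the libc supports %zu */
        return 0;
    }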


The Real Problem (tm) is: specifiers like 'char' and 'int' etc. should not be allowed; they 'should be' things like c8 or i16 or u64 --- that is, the type specifier should state the number of bits for that datatype. This is what <stdint.h> is trying to fix.

What maybe 'should' happen in C2x is: 'int' is defined as i16, 'long' as i32, 'long long' as i64 etc and then see which programs break. Because it's perfectly OK to have 16-bit 'ints' on a 64-bit arch. (size_t is what you use to deal with architecture-specific chunks). And then remove all this 'int' etc crap from C. (Obv, some 'compat switch' would need to exist, but you get the idea.)


No that should not happen. Integer types that adapt to the platform word size enhance portability. Nobody wants a 32-bit default int on an 8-bit platform and using uint8_t or uint16_t can introduce performance regressions on wider platforms. The traditional integer types are perfectly suited for scenarios where the exact width doesn't matter and you know the guaranteed minimum is good enough.


To address the problem you mention, uint_fast8_t and co. exist. It is at least 8 bits wide, but is whatever type is fastest that meets that requirement. So on an 8-bit system it is uint8_t; on a 32-bit ARM it is typically uint32_t. There is also uint_fast16_t and so on...
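A tiny sketch of what those typedefs give you (the printed sizes are platform-dependent, which is the point):

    #include <stdint.h>
    #include <stdio.h>

    int main(void) {
        printf("sizeof(uint8_t)       = %zu\n", sizeof(uint8_t));       /* always 1 */
        printf("sizeof(uint_least8_t) = %zu\n", sizeof(uint_least8_t)); /* smallest type with >= 8 bits */
        printf("sizeof(uint_fast8_t)  = %zu\n", sizeof(uint_fast8_t));  /* 1 on many targets, 4 or 8 on others */
        return 0;
    }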


As the programmer, I don't really care what the h/w does; if I specify u8 and every operation produces results 'as if' the type were 8 actual unsigned bits, e.g., when using an underlying u32 type -- great. From 'my model' -- it's still a u8.

But there must be no conceptual 'leaking' (behaviors that I experience using e.g. uint_fast8_t that are in any way different from behaviors I experience when using u8 ).

I don't especially care how the HW works; because my 'virtual machine' is C. (yes, yes, I know Reality intrudes, and sometimes you need to get closer to the machine. But, isn't this because of 'leaking' between 'virtual machine' and 'actual machine' that I mention above?)


I would argue that most code nowadays implicitly assumes that int is 32 bits long, and won't work correctly on an 8-bit platform anyway. If 'platform-size-conforming' ints are used, they probably should be opt-in, instead of opt-out.


That's great for people weaned on Java's false promise of a uniform type system. Then you find out you want the behavior of unsigned integer overflow and have to jump through contortions to get it. You can't set a single standard for the default that works universally.


For Windows, the reason they couldn't switch to LP64 is that they screwed up the type system with LONG and allowed it to be incorporated into OS structs. That prevents long from being made 64-bit, however rational that would be.


Can't you just use inttypes.h along with the stdint fixed-width types to avoid the multiple compiles?

This stackoverflow answer gives an example: https://stackoverflow.com/questions/7597025/difference-betwe...
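Roughly like this (the PRI macros from <inttypes.h> expand to the right length modifier for the platform):

    #include <inttypes.h>
    #include <stdint.h>
    #include <stdio.h>

    int main(void) {
        uint64_t big = 1234567890123ULL;
        int32_t  neg = -42;

        /* PRIu64 expands to "llu" or "lu" as appropriate, so this line is
           correct on both 32- and 64-bit targets without recompiling twice. */
        printf("big = %" PRIu64 ", neg = %" PRId32 "\n", big, neg);
        return 0;
    }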


Of course if you just do everything correctly you don't need safety. But if you are fallible it is nice to have.


Ok, thanks for the hint. I put the most important info into the intro: you just don't need the 'll', 'l', 'z' modifiers to specify sizeof(operand); the compiler does that via _Generic.


Also, put an example in the first pageful. I had almost lost hope, scrolling through the wall of format spec, when I finally saw the first example.


OK, yes, good idea.


Probably the 1e6th approach, but anyway, I also wanted to play with this myself: here's a _Generic- and macro-based approach to making printf type-safe in C. It needs C11, and uses some gcc extensions.


A few times I got reasonably far implementing a generic, type-safe, variadic, macro- and _Generic-based "print" for C.

I copied some examples of how to implement variadic macros and expanded on that for the basic C types. It mostly worked; you'll always have difficulty with corner cases like distinguishing pointers from arrays, but it worked well for the basic C types.

I gave up for a few reasons:

  - I wanted a form to register new types, so it could work for user-defined types;

  - the C pre-processor knows nothing about lists that can be expanded multiple times;

  - variadic C macros are ugly hacks.
Maybe one day I'll get back to it and publish it.

The interesting part is that _Generic combined with macros enables some powerful tools for implementing primitive forms of polymorphism. Actually, if the C pre-processor supported lists, it would be possible to implement RTTI in C.
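For anyone who hasn't played with it, a minimal sketch of the _Generic dispatch idea (a single-argument toy, nothing like the full variadic machinery discussed here):

    #include <stdio.h>

    /* The compiler picks the print function from the argument's static type;
       the caller never writes a format specifier. */
    #define PRINT1(x) _Generic((x),          \
            int:          print_int,         \
            double:       print_double,      \
            char *:       print_str,         \
            const char *: print_str)(x)

    static void print_int(int v)         { printf("%d\n", v); }
    static void print_double(double v)   { printf("%g\n", v); }
    static void print_str(const char *v) { printf("%s\n", v); }

    int main(void) {
        PRINT1(42);       /* picks print_int    */
        PRINT1(3.14);     /* picks print_double */
        PRINT1("hello");  /* picks print_str    */
        return 0;
    }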


> - I wanted a form to register new types, so it could work for user-defined types;

Yes, I had the same urge. It's easy to fall into the trap of putting too many features on the list. I settled on keeping user types out: you can always write a stringify() and pass that to the printf. Not the same, I know. But a more finite project.

> - the C pre-processor knows nothing about lists that can be expanded multiple times;

Yeah, that's a hack. Look at the 'VA_EXP()' macros in include/va_print/base.h. Ugly. Incomprehensible.

> - variadic C macros are ugly hacks.

Absolutely. But I think there is no other way in C.

> Actually, if the C pre-processor supported lists, it would be possible to implement RTTI in C.

I couldn't resist putting in '%t', which prints the C type of the argument...
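Roughly how a '%t'-style feature can be built (a toy sketch, not the library's actual code): _Generic selecting a string that names the argument's static type.

    #include <stdio.h>

    #define TYPE_NAME(x) _Generic((x),        \
            int:          "int",              \
            unsigned:     "unsigned int",     \
            double:       "double",           \
            char *:       "char *",           \
            const char *: "const char *",     \
            default:      "<unknown>")

    int main(void) {
        printf("%s\n", TYPE_NAME(42));    /* int    */
        printf("%s\n", TYPE_NAME(1.5));   /* double */
        printf("%s\n", TYPE_NAME("hi"));  /* char * */
        return 0;
    }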


__attribute__((overloadable)) is also worth looking at...
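For the curious, a minimal sketch of what it does (Clang-only; the function name here is made up):

    #include <stdio.h>

    /* Clang allows C function overloading with this attribute; the compiler
       picks the overload from the argument type, much like _Generic. */
    static void __attribute__((overloadable)) show(int x)         { printf("int: %d\n", x); }
    static void __attribute__((overloadable)) show(double x)      { printf("double: %g\n", x); }
    static void __attribute__((overloadable)) show(const char *x) { printf("str: %s\n", x); }

    int main(void) {
        show(1);        /* int overload    */
        show(2.5);      /* double overload */
        show("three");  /* string overload */
        return 0;
    }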


Hm… interesting… it would be good if GCC supported it too.


> - I wanted a form to register new types, so it could work for user-defined types;

> - the C pre-processor knows nothing about lists that can be expanded multiple times;

I'm actually working on both features as Clang extensions.

#repeat, a preprocessor directive to loop, can be combined with _Pragma(push_macro/pop_macro) to create lists by redefining a macro.

And currently #increment, though I think I want to expand on this so that other macros can be redefined more easily to create lists via push/pop macro.

The reason the push_macro/pop_macro pragmas can't do this on their own is that the macro has to be undefined and redefined, with the value then pushed onto a stack in the compiler.

And you can't redefine a macro in the body of another macro directly.

So I've been thinking about maybe a _Pragma(redefine_macro(MacroToRedefine, NewValueForRedefinedMacro)).

But I don't want it to be limited to the _Pragma area of the compiler; I want it to eventually be standardized.

I've been talking to a friend at WG14 who suggested making it a "Preprocessor Expression", like `__has_c_attribute` and `defined()`.

So that's the area I've been working on lately for the Increment/Redefine PE.
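For context, what the existing push_macro/pop_macro pragmas can already do in GCC, Clang and MSVC, and where the limitation described above bites (you still can't #define inside another macro's body):

    #include <stdio.h>

    #define DEPTH 1
    #pragma push_macro("DEPTH")    /* save the current definition of DEPTH */
    #undef DEPTH
    #define DEPTH 2                /* shadow it...                         */

    static const int shadowed = DEPTH;   /* 2 */

    #pragma pop_macro("DEPTH")     /* ...and restore the saved definition  */

    static const int restored = DEPTH;   /* 1 */

    int main(void) {
        printf("%d %d\n", shadowed, restored);  /* prints: 2 1 */
        return 0;
    }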


I spent a long time thinking about this. My conclusion is that the simplest way to achieve this, at least in GCC, is to create a #copy directive that allows a macro, together with its stack, to be copied to another. GCC already allows stack expansion with push and pop but it can only be expanded once; the #copy directive would fix that.

If you get anything close to that working, it would be a godsend. It is the last remaining piece of the puzzle for me to implement complete RTTI in C. It would certainly help to minimize glib boilerplate code too.

I'd really like it to be part of C2x, but I think it is too late now. If it is implemented by either GCC or Clang, the other would certainly adopt it too, since it is too useful not to. So getting it to work in either of them would be good enough for me.

How can I track/follow your progress?


There is __VA_OPT__ in C++2a, which handles recursion termination in macro expansion. This will probably be in future C, too, right?

And if there was also __EVAL__ to force the macro preprocessor into another evaluation level, you could write recursive macros quite easily, e.g., to wrap every argument into a function call:

    #define EACH(f,x,...) f(x) __VA_OPT__(, __EVAL__(EACH(f, __VA_ARGS__)))
This would make the macro magic for this library trivial: you could process lists recursively.

Edit: added missing paren
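For what it's worth, the effect of a hypothetical __EVAL__ can already be approximated with a fixed number of expansion passes (the classic deferred-expansion trick); a sketch, assuming a recent GCC or Clang with __VA_OPT__ support:

    #include <stdio.h>

    /* A bounded stand-in for __EVAL__: each level forces extra rescans of the
       expansion, so the "recursion" unrolls as far as the nesting allows. */
    #define EVAL(...)  EVAL1(EVAL1(EVAL1(__VA_ARGS__)))
    #define EVAL1(...) EVAL2(EVAL2(EVAL2(__VA_ARGS__)))
    #define EVAL2(...) EVAL3(EVAL3(EVAL3(__VA_ARGS__)))
    #define EVAL3(...) __VA_ARGS__

    #define PARENS ()  /* defers the next call to a later rescan */

    #define EACH(f, ...) __VA_OPT__(EVAL(EACH_HELPER(f, __VA_ARGS__)))
    #define EACH_HELPER(f, x, ...) f(x) __VA_OPT__(, EACH_AGAIN PARENS (f, __VA_ARGS__))
    #define EACH_AGAIN() EACH_HELPER

    #define SQUARE(x) ((x) * (x))

    int main(void) {
        int v[] = { EACH(SQUARE, 1, 2, 3) };     /* ((1)*(1)), ((2)*(2)), ((3)*(3)) */
        printf("%d %d %d\n", v[0], v[1], v[2]);  /* prints: 1 4 9 */
        return 0;
    }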


Do you have a usage example? One early in the readme, maybe. Seeing is believing.


So I also have a custom printf, but there is a limitation: if you ask gcc to check it with "__attribute__((__format__ (__printf__", then you are forced into using gcc's idea of the printf format string syntax.

How can I have strict type checking, but a user defined format string?


The format attribute takes the argument positions of the format string and of the first variadic argument. You can pass the types as a separate (compound literal) array in any argument before the start of the variadic portion, just as you would when passing the number of variadic arguments (which one should also do, to ensure the number of format specifiers matches the number of passed arguments). The magic for detecting types (e.g. _Generic) would all be the same, but there'd be a little more duplication in the variable-argument macro magic. I've implemented this both ways and I don't recall there being any significant difference, but it's been a while.
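A condensed sketch of that layout; the names (my_printf, TYPE_OF, the arg_type enum) are made up for illustration, not taken from any particular library:

    #include <stdarg.h>
    #include <stddef.h>
    #include <stdio.h>

    enum arg_type { T_INT, T_DOUBLE, T_STR };

    /* Classify each argument's static type at the call site. */
    #define TYPE_OF(x) _Generic((x), int: T_INT, double: T_DOUBLE, \
                                char *: T_STR, const char *: T_STR)

    /* The engine gets the count and tag array just before the variadic part,
       so it never has to trust the format string for argument types. */
    static void my_printf(const char *fmt, size_t n, const enum arg_type *types, ...)
    {
        (void)fmt;  /* a real engine would also walk fmt and cross-check it */
        va_list ap;
        va_start(ap, types);
        for (size_t i = 0; i < n; i++) {
            switch (types[i]) {
            case T_INT:    printf("%d ", va_arg(ap, int));    break;
            case T_DOUBLE: printf("%g ", va_arg(ap, double)); break;
            case T_STR:    printf("%s ", va_arg(ap, char *)); break;
            }
        }
        printf("\n");
        va_end(ap);
    }

    int main(void) {
        /* A wrapper macro would normally build the compound-literal tag array;
           it is written out by hand here to keep the sketch short. */
        my_printf("%d %s %g", 3,
                  (enum arg_type[]){ TYPE_OF(42), TYPE_OF("hi"), TYPE_OF(2.5) },
                  42, "hi", 2.5);
        return 0;
    }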


Write your own linter.


It's a big mistake that the format language looks like that of printf.

If you use this in a big code base, there will still be the old printf all over the place.

Now you have to think: is this custom logging function here based on the safe printf from github, or is it vsprintf under the hood?


You are right, it was a poor decision. The format is not C compatible, and using the same sigil is dangerous when switching back and forth, or when this library's format string is accidentally passed to standard printf. This started out as a drop-in replacement, but it isn't one, and cannot be. The library should avoid, not encourage, accidental format string mistakes.

So thank you, this is fixed.


GCC has had printf checking for, what, 20 years?


If you want a modern C, use Go.

Unless you need maximum performance (SIMD, GPUs, etc.) you should use a developer-efficient, productive language.

Well-written Go executes almost as fast as C, and you will be more productive as a programmer.


Go is not a "modern C". It may or may not be a swell language, but it differs fundamentally from C:

1. Go is a garbage-collected language, C is not.

2. Go is a single-company-managed language, while C is managed by an international standards committee within ISO. You might not care about this difference, but it's quite significant w.r.t. how future language developments happen.

3. C types are intentional, Go types are extensional ("structural typing").

These fundamental differences are not cases of one language being superior to, or more advanced than, the other - they're about going in different directions.


I have been writing C for decades, but now I almost exclusively use Go.

What I mean is that if you like C99, you will probably like Go. Go can be understood as a modernization of C that doesn't abandon C's simplicity but adds a few important facilities that C lacks.

Go obviously derives from C. It's a very C-like language. It makes sense to view Go as an enhanced C that makes slightly different trade-offs and that is applicable to a slightly different set of purposes.


Honestly, I think D's betterC would be the right choice for someone that wants to keep writing C but wants modernized features. Go might be great for someone looking to replace C, but betterC is comfortable for someone that prefers to continue to write C.


Has anyone benchmarked Go's garbage collector lately? I like a lot of stuff about Go, but a lot of my work is in video games and real time audio, and I am extremely hesitant to use a garbage collected language for those things.


I've been working on a Go -> C++ compiler pretty much for this use case, one that skips the GC and concurrency stuff -- https://www.reddit.com/r/golang/comments/r2795t/i_wrote_a_si... -- Includes a demo video of a game I'm making with it and a built-in scene editor that uses reflection etc.

Repo for compiler itself: https://github.com/nikki93/gx (no README.md etc. yet, will be getting to that when I next have a chance (it's a side project)). It just takes around 1500 lines of Go thanks to the parser and typechecker in the standard library.

Go's perf was definitely non-trivially bad for me on WebAssembly.


> I know I can "do things to maybe cause the GC to run less" or such, but then that immediately starts to detract from the goal of having a language where I can focus on just the gameplay code.

Did you try implementing pooling (e.g. sync.Pool) for game objects/entities/components/etc? How did that go perf-wise?


I think the main thing is it starts to become a distraction from just writing the gameplay code. I don't have to implement the pooling stuff now that I have this compiler--naive / simple code tends to also start off with a high perf ceiling. But yeah if I did go further with the game in vanilla Go I might have to try the pool approach. Having worked on game engines with GC language runtimes (using Lua etc.) before, you always ultimately hit a perf ceiling due to lack of memory control and wish you could move out of it, but the runtimes don't give you a way to do that incrementally.

Ultimately in the game scenario the GC is actually just ... not helpful. Game logic code already explicitly handles lifetimes to some degree (e.g. when this entity collides with that one, destroy it, etc.) -- emergently deciding when to free things based on references is usually not what you want. You do want it for resource management (like a texture cache), but it actually makes sense to kind of roll that on your own and adapt it to the game. So having a GC and then fighting it just sounds like an ill-fitted solution.


Allocation pooling gets around the Go GC but is also used in non-GC languages because it can drastically reduce the overall number of allocations AND improve cache performance. In a GC lang, it also forces you to be explicit with your lifetimes which can lead to better code (i.e. you need to Pool.Put rather than let the GC clean up).

In a well designed game engine, you will only need to implement it a handful of times (if that) to cover 99% of the hot code. Certainly not something to need to do for each class of game object.


Pooling is indeed what the ECS I use (entt) basically does--every component type has a contiguous pool of instances. Compiling Go to C++ lets me use entt among other things (target all of C++'s targets including Wasm, have some types of metaprogramming like statically reflecting all component types, etc.). The GC thing is just one of the results. There is no comparison for the amount of control you get vs. vanilla Go (where things can escape to the heap "whenever").


WebAssembly is notably a pathological case for _any_ stack-scanning GC, since the stack isn't addressable.


Beware of Go. Google may use it to do the "Embrace Extend Extinguish" move. It might be type safe, but not ideologically safe.

This is on top of Go being an unstable, immature language.


Go is very stable, and 12 years old.


Have you compared the performance and generated binary size of C vs. Go on WebAssembly?


For WebAssembly, use the TinyGo Go compiler:

https://tinygo.org/


Yeah the Go -> C++ compiler I linked from my other comment is pretty much overlapping with this idea. TinyGo is still a bit early afaict and also tries to exactly implement Go semantics but I'm kind of interested in extending / adapting it to my use case.



