Type-Safe Printf for C (github.com/moehriegitt)
71 points by tinkersleep on Dec 14, 2021 | 50 comments



D supports calling C functions directly, including printf. When we added printf format checking against the arguments, many bugs were exposed and fixed. It was a big win.


GCC has this also; this work is different because it removes the type errors. Instead of giving an error like "%d expects a parameter of type int, not char *", it just lets %d print the string anyway. It's more like format in Lisp, say:

  [1]> (format t "~05,'0d" 5)
  00005
  NIL
  [2]> (format t "~05,'0d" "abc")
  00abc
  NIL
Here ~d (decimal) doesn't care that it didn't get an integer.


For D's formatted write function, %s means "I don't care what type it is, just do the right thing with it", which works fine for nearly all uses.


How is this safer than enabling all warnings in GCC or clang?

  'Type-safe' in this context does not mean that you get more compile errors; it means the format specifier does not need to encode the argument type, only the print format. In fact, format strings with this library get less compile-time checking (namely none) than standard printf gets with modern compilers. This approach is still safer.
EDIT The answer is at the bottom apparently. Maybe put that up higher?


> There is absolutely no chance to give a wrong format specifier and access the stack (like printf does via stdarg.h) in undefined ways. This is particularly true for multi-arch development, where with printf you need to be careful about length specifiers: you might not get a warning on your machine, but the next person will, and it will crash there. I usually need to compile a few times on multiple architectures to get the integer length correct, e.g., %u vs. %lu vs. %llu vs. %zu.


This is really annoying on architectures like ARM32 where size_t is closely related to unsigned int but uint32_t is long unsigned int and gets flagged as a different type. It becomes a real problem when using a stripped down printf like the one in newlib that doesn't support %zu.


Exactly! Or on Windows 64-bit, where 'long' is 32-bit and 'size_t' is 'unsigned long long'.
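For illustration, a small sketch of the trap (exact warnings depend on the compiler and target):

    /* The same line can be clean on one target and warn or misbehave on
       another, depending on how size_t is defined there. */
    #include <stdio.h>

    int main(void) {
        size_t n = sizeof(int);

        /* printf("%lu\n", n);  -- only correct where size_t happens to be
           unsigned long (e.g. Linux x86-64); wrong on Win64 and flagged on
           many 32-bit targets. */

        printf("%lu\n", (unsigned long)n);  /* classic workaround: cast */
        printf("%zu\n", n);                 /* portable since C99, if the libc supports %zu */
        return 0;
    }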


The Real Problem (tm) is: specifiers like 'char' and 'int' etc. should not be allowed; they 'should be' things like c8 or i16 or u64 --- that is, the type specifier should state the number of bits for that datatype. This is what <stdint.h> is trying to fix.

What maybe 'should' happen in C2x is: 'int' is defined as i16, 'long' as i32, 'long long' as i64 etc and then see which programs break. Because it's perfectly OK to have 16-bit 'ints' on a 64-bit arch. (size_t is what you use to deal with architecture-specific chunks). And then remove all this 'int' etc crap from C. (Obv, some 'compat switch' would need to exist, but you get the idea.)


No that should not happen. Integer types that adapt to the platform word size enhance portability. Nobody wants a 32-bit default int on an 8-bit platform and using uint8_t or uint16_t can introduce performance regressions on wider platforms. The traditional integer types are perfectly suited for scenarios where the exact width doesn't matter and you know the guaranteed minimum is good enough.


To address the problem you mention, uint_fast8_t and co. exist. It is at least 8 bits wide, but is whatever type is fastest that meets that requirement. So on an 8-bit system it is uint8_t; on a 32-bit ARM it is typically uint32_t. There is also uint_fast16_t and so on...
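A tiny sketch of what those typedefs give you (the printed sizes are platform-dependent, which is the point):

    #include <stdint.h>
    #include <stdio.h>

    int main(void) {
        printf("sizeof(uint8_t)       = %zu\n", sizeof(uint8_t));       /* always 1 */
        printf("sizeof(uint_least8_t) = %zu\n", sizeof(uint_least8_t)); /* smallest type with >= 8 bits */
        printf("sizeof(uint_fast8_t)  = %zu\n", sizeof(uint_fast8_t));  /* 1 on many targets, 4 or 8 on others */
        return 0;
    }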


As the programmer, I don't really care what the h/w does; if I specify u8 and every operation produces results 'as if' the type were 8 actual unsigned bits, e.g., when using an underlying u32 type -- great. From 'my model' -- it's still a u8.

But there must be no conceptual 'leaking' (behaviors that I experience using e.g. uint_fast8_t that are in any way different from behaviors I experience when using u8 ).

I don't especially care how the HW works; because my 'virtual machine' is C. (yes, yes, I know Reality intrudes, and sometimes you need to get closer to the machine. But, isn't this because of 'leaking' between 'virtual machine' and 'actual machine' that I mention above?)


I would argue that most code nowadays implicitly assumes that int is 32 bits long, and won't work correctly on an 8-bit platform anyway. If 'platform-size-conforming' ints are used, they probably should be opt-in, instead of opt-out.


That's great for people weaned on Java's false promise of a uniform type system. Then you find out you want the behavior of unsigned integer overflow and have to jump through contortions to get it. You can't set a single standard for the default that works universally.


For Windows, the reason they couldn't switch to LP64 is that they screwed up the type system with LONG and allowed it to be incorporated into OS structs. That prevents long from being made 64-bit, however rational that would be.


Can't you just use inttypes.h along with the stdint fixed-width types to avoid the multiple compiles?

This stackoverflow answer gives an example: https://stackoverflow.com/questions/7597025/difference-betwe...
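Roughly like this (the PRI macros from <inttypes.h> expand to the right length modifier for the platform):

    #include <inttypes.h>
    #include <stdint.h>
    #include <stdio.h>

    int main(void) {
        uint64_t big = 1234567890123ULL;
        int32_t  neg = -42;

        /* PRIu64 expands to "llu" or "lu" as appropriate, so this line is
           correct on both 32- and 64-bit targets without recompiling twice. */
        printf("big = %" PRIu64 ", neg = %" PRId32 "\n", big, neg);
        return 0;
    }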


Of course if you just do everything correctly you don't need safety. But if you are fallible it is nice to have.


Ok, thanks for the hint. I put the most important info into the intro: you just don't need the 'll', 'l', 'z' modifiers to specify sizeof(operand); the compiler does that via _Generic.


Also, put an example in the first pageful. I had almost lost hope, scrolling through the wall of format spec, when I finally saw the first example.


OK, yes, good idea.


Probably the 1e6th approach, but anyway, I also wanted to play with this myself: here's a _Generic- and macro-based approach to making printf type-safe in C. It needs C11, and uses some gcc extensions.


A few times I got reasonably far implementing a generic, type-safe, variadic, macro- and _Generic-based "print" for C.

I copied some examples of how to implement variadic macros and expanded on that for the basic C types. It mostly worked; you'll always have difficulty with corner cases like distinguishing pointers from arrays, but it worked well for the basic C types.

I gave up for a few reasons:

  - I wanted a form to register new types, so it could work for user-defined types;

  - the C pre-processor knows nothing about lists that can be expanded multiple times;

  - variadic C macros are ugly hacks.
Maybe one day I'll get back to it and publish it.

The interesting part is that _Generic combined with macros enables some powerful tools for implementing primitive forms of polymorphism. Actually, if the C pre-processor supported lists, it would be possible to implement RTTI in C.
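For anyone who hasn't played with it, a minimal sketch of the _Generic dispatch idea (a single-argument toy, nothing like the full variadic machinery discussed here):

    #include <stdio.h>

    /* The compiler picks the print function from the argument's static type;
       the caller never writes a format specifier. */
    #define PRINT1(x) _Generic((x),          \
            int:          print_int,         \
            double:       print_double,      \
            char *:       print_str,         \
            const char *: print_str)(x)

    static void print_int(int v)         { printf("%d\n", v); }
    static void print_double(double v)   { printf("%g\n", v); }
    static void print_str(const char *v) { printf("%s\n", v); }

    int main(void) {
        PRINT1(42);       /* picks print_int    */
        PRINT1(3.14);     /* picks print_double */
        PRINT1("hello");  /* picks print_str    */
        return 0;
    }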


> - I wanted a form to register new types, so it could work for user-defined types;

Yes, I had the same urge. It's easy to fall into the trap of putting too many features on the list. I settled on keeping user types out: you can always write a stringify() and pass that to the printf. Not the same, I know. But a more finite project.

> - the C pre-processor knows nothing about lists that can be expanded multiple times;

Yeah, that's a hack. Look at the 'VA_EXP()' macros in include/va_print/base.h. Ugly. Incomprehensible.

> - variadic C macros are ugly hacks.

Absolutely. But I think there is no other way in C.

> Actually, if the C pre-processor supported lists, it would be possible to implement RTTI in C.

I couldn't resist putting in '%t', which prints the C type of the argument...
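Roughly how a '%t'-style feature can be built (a toy sketch, not the library's actual code): _Generic selecting a string that names the argument's static type.

    #include <stdio.h>

    #define TYPE_NAME(x) _Generic((x),        \
            int:          "int",              \
            unsigned:     "unsigned int",     \
            double:       "double",           \
            char *:       "char *",           \
            const char *: "const char *",     \
            default:      "<unknown>")

    int main(void) {
        printf("%s\n", TYPE_NAME(42));    /* int    */
        printf("%s\n", TYPE_NAME(1.5));   /* double */
        printf("%s\n", TYPE_NAME("hi"));  /* char * */
        return 0;
    }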


__attribute__((overloadable)) is also worth looking at...
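For the curious, a minimal sketch of what it does (Clang-only; the function name here is made up):

    #include <stdio.h>

    /* Clang allows C function overloading with this attribute; the compiler
       picks the overload from the argument type, much like _Generic. */
    static void __attribute__((overloadable)) show(int x)         { printf("int: %d\n", x); }
    static void __attribute__((overloadable)) show(double x)      { printf("double: %g\n", x); }
    static void __attribute__((overloadable)) show(const char *x) { printf("str: %s\n", x); }

    int main(void) {
        show(1);        /* int overload    */
        show(2.5);      /* double overload */
        show("three");  /* string overload */
        return 0;
    }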


Hm… interesting… it would be good if GCC supported it too.


> - I wanted a form to register new types, so it could work for user-defined types;

> - the C pre-processor knows nothing about lists that can be expanded multiple times;

I'm actually working on both features as Clang extensions.

#repeat, a preprocessor directive to loop, can be combined with _Pragma(push_macro/pop_macro) to create lists by redefining a macro.

And currently #increment, though I think I want to expand on this so that other macros can be redefined more easily to create lists via push/pop macro.

The reason the push_macro/pop_macro pragmas can't do this on their own is that the macro has to be undefined and redefined, with the value then pushed onto a stack in the compiler.

And you can't redefine a macro in the body of another macro directly.

So I've been thinking about maybe a _Pragma(redefine_macro(MacroToRedefine, NewValueForRedefinedMacro)).

But I don't want it to be limited to the _Pragma area of the compiler; I want it to eventually be standardized.

I've been talking to a friend at WG14 who suggested making it a "Preprocessor Expression", like `__has_c_attribute` and `defined()`.

So that's the area I've been working on lately for the Increment/Redefine PE.
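For context, what the existing push_macro/pop_macro pragmas can already do in GCC, Clang and MSVC, and where the limitation described above bites (you still can't #define inside another macro's body):

    #include <stdio.h>

    #define DEPTH 1
    #pragma push_macro("DEPTH")    /* save the current definition of DEPTH */
    #undef DEPTH
    #define DEPTH 2                /* shadow it...                         */

    static const int shadowed = DEPTH;   /* 2 */

    #pragma pop_macro("DEPTH")     /* ...and restore the saved definition  */

    static const int restored = DEPTH;   /* 1 */

    int main(void) {
        printf("%d %d\n", shadowed, restored);  /* prints: 2 1 */
        return 0;
    }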


I spent a long time thinking about this. My conclusion is that the simplest way to achieve this, at least in GCC, is to create a #copy directive that allows a macro, together with its stack, to be copied to another. GCC already allows stack expansion with push and pop but it can only be expanded once; the #copy directive would fix that.

If you get anything close to that working, it would be a godsend. It is the last remaining piece of the puzzle for me to implement complete RTTI in C. It would certainly help to minimize glib boilerplate code too.

I'd really like it to be part of C2x, but I think it is too late now. If it is implemented by either GCC or Clang, the other would certainly adopt it too, since it is too useful not to. So getting it to work in either of them would be good enough for me.

How can I track/follow your progress?


There is __VA_OPT__ in C++2a, which handles recursion termination in macro expansion. This will probably be in future C, too, right?

And if there was also __EVAL__ to force the macro preprocessor into another evaluation level, you could write recursive macros quite easily, e.g., to wrap every argument into a function call:

    #define EACH(f,x,...) f(x) __VA_OPT__(, __EVAL__(EACH(f, __VA_ARGS__)))
This would make the macro magic for this library trivial: you could process lists recursively.

Edit: added missing paren
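For what it's worth, the effect of a hypothetical __EVAL__ can already be approximated with a fixed number of expansion passes (the classic deferred-expansion trick); a sketch, assuming a recent GCC or Clang with __VA_OPT__ support:

    #include <stdio.h>

    /* A bounded stand-in for __EVAL__: each level forces extra rescans of the
       expansion, so the "recursion" unrolls as far as the nesting allows. */
    #define EVAL(...)  EVAL1(EVAL1(EVAL1(__VA_ARGS__)))
    #define EVAL1(...) EVAL2(EVAL2(EVAL2(__VA_ARGS__)))
    #define EVAL2(...) EVAL3(EVAL3(EVAL3(__VA_ARGS__)))
    #define EVAL3(...) __VA_ARGS__

    #define PARENS ()  /* defers the next call to a later rescan */

    #define EACH(f, ...) __VA_OPT__(EVAL(EACH_HELPER(f, __VA_ARGS__)))
    #define EACH_HELPER(f, x, ...) f(x) __VA_OPT__(, EACH_AGAIN PARENS (f, __VA_ARGS__))
    #define EACH_AGAIN() EACH_HELPER

    #define SQUARE(x) ((x) * (x))

    int main(void) {
        int v[] = { EACH(SQUARE, 1, 2, 3) };     /* ((1)*(1)), ((2)*(2)), ((3)*(3)) */
        printf("%d %d %d\n", v[0], v[1], v[2]);  /* prints: 1 4 9 */
        return 0;
    }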


Do you have a usage example? One early in the readme, maybe. Seeing is believing.


So I also have a custom printf, but there is a limitation: if you ask gcc to check it with "__attribute__((__format__ (__printf__", then you are forced into using gcc's idea of the printf format string syntax.

How can I have strict type checking, but a user defined format string?


The format attribute takes the argument positions of the format string and of the first variadic argument. You can pass the types as a separate (compound literal) array in any argument before the start of the variadic portion, just as you would when passing the number of variadic arguments (which one should also do, to ensure the number of format specifiers matches the number of passed arguments). The magic for detecting types (e.g. _Generic) would all be the same, but there'd be a little more duplication in the variable-argument macro magic. I've implemented this both ways and I don't recall there being any significant difference, but it's been a while.
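A condensed sketch of that layout; the names (my_printf, TYPE_OF, the arg_type enum) are made up for illustration, not taken from any particular library:

    #include <stdarg.h>
    #include <stddef.h>
    #include <stdio.h>

    enum arg_type { T_INT, T_DOUBLE, T_STR };

    /* Classify each argument's static type at the call site. */
    #define TYPE_OF(x) _Generic((x), int: T_INT, double: T_DOUBLE, \
                                char *: T_STR, const char *: T_STR)

    /* The engine gets the count and tag array just before the variadic part,
       so it never has to trust the format string for argument types. */
    static void my_printf(const char *fmt, size_t n, const enum arg_type *types, ...)
    {
        (void)fmt;  /* a real engine would also walk fmt and cross-check it */
        va_list ap;
        va_start(ap, types);
        for (size_t i = 0; i < n; i++) {
            switch (types[i]) {
            case T_INT:    printf("%d ", va_arg(ap, int));    break;
            case T_DOUBLE: printf("%g ", va_arg(ap, double)); break;
            case T_STR:    printf("%s ", va_arg(ap, char *)); break;
            }
        }
        printf("\n");
        va_end(ap);
    }

    int main(void) {
        /* A wrapper macro would normally build the compound-literal tag array;
           it is written out by hand here to keep the sketch short. */
        my_printf("%d %s %g", 3,
                  (enum arg_type[]){ TYPE_OF(42), TYPE_OF("hi"), TYPE_OF(2.5) },
                  42, "hi", 2.5);
        return 0;
    }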


Write your own linter.


It's a big mistake that the format language looks like that of printf.

If you use this in a big code base, there will still be the old printf all over the place.

Now you have to think: is this custom logging function here based on the safe printf from github, or is it vsprintf under the hood?


You are right, it was a poor decision. The format is not C compatible, and using the same sigil is dangerous when switching back and forth, or when this library's format string is accidentally passed to standard printf. This started out as a drop-in replacement, but it isn't one, and cannot be. The library should avoid, not encourage, accidental format string mistakes.

So thank you, this is fixed.


GCC has had printf checking for, what, 20 years?


If you want a modern C, use Go.

Unless you need maximum performance (SIMD, GPUs, etc.) you should use a developer-efficient, productive language.

Well-written Go executes almost as fast as C, and you will be more productive as a programmer.


Go is not a "modern C". It may or may not be a swell language, but it differs fundamentally from C:

1. Go is a garbage-collected language, C is not.

2. Go is a single-company-managed language, while C is managed by an international standards committee within ISO. You might not care about this difference, but it's quite significant w.r.t. how future language developments happen.

3. C types are intentional, Go types are extensional ("structural typing").

These fundamental differences are not cases of one language being superior to, or more advanced than, the other - they're about going in different directions.


I have been writing C for decades, but now I almost exclusively use Go.

What I mean is that if you like C99, you will probably like Go. Go can be understood as a modernization of C that doesn't abandon C's simplicity but adds a few important facilities that C lacks.

Go obviously derives from C. It's a very C-like language. It makes sense to view Go as an enhanced C that makes slightly different trade-offs and that is applicable to a slightly different set of purposes.


Honestly, I think D's betterC would be the right choice for someone that wants to keep writing C but wants modernized features. Go might be great for someone looking to replace C, but betterC is comfortable for someone that prefers to continue to write C.


Has anyone benchmarked Go's garbage collector lately? I like a lot of stuff about Go, but a lot of my work is in video games and real time audio, and I am extremely hesitant to use a garbage collected language for those things.


I've been working on a Go -> C++ compiler pretty much for this use case, one that skips the GC and concurrency stuff -- https://www.reddit.com/r/golang/comments/r2795t/i_wrote_a_si... -- Includes a demo video of a game I'm making with it and a built-in scene editor that uses reflection etc.

Repo for compiler itself: https://github.com/nikki93/gx (no README.md etc. yet, will be getting to that when I next have a chance (it's a side project)). It just takes around 1500 lines of Go thanks to the parser and typechecker in the standard library.

Go's perf was definitely non-trivially bad for me on WebAssembly.


> I know I can "do things to maybe cause the GC to run less" or such, but then that immediately starts to detract from the goal of having a language where I can focus on just the gameplay code.

Did you try implementing pooling (e.g. sync.Pool) for game objects/entities/components/etc? How did that go perf-wise?


I think the main thing is it starts to become a distraction from just writing the gameplay code. I don't have to implement the pooling stuff now that I have this compiler--naive / simple code tends to also start off with a high perf ceiling. But yeah if I did go further with the game in vanilla Go I might have to try the pool approach. Having worked on game engines with GC language runtimes (using Lua etc.) before, you always ultimately hit a perf ceiling due to lack of memory control and wish you could move out of it, but the runtimes don't give you a way to do that incrementally.

Ultimately in the game scenario the GC is actually just ... not helpful. Game logic code already explicitly handles lifetimes to some degree (e.g. when this entity collides with that one, destroy it, etc.) -- emergently deciding when to free things based on references is usually not what you want. You do want it for resource management (like a texture cache), but it actually makes sense to kind of roll that on your own and adapt it to the game. So having a GC and then fighting it just sounds like an ill-fitted solution.


Allocation pooling gets around the Go GC but is also used in non-GC languages because it can drastically reduce the overall number of allocations AND improve cache performance. In a GC lang, it also forces you to be explicit with your lifetimes which can lead to better code (i.e. you need to Pool.Put rather than let the GC clean up).

In a well designed game engine, you will only need to implement it a handful of times (if that) to cover 99% of the hot code. Certainly not something to need to do for each class of game object.


Pooling is indeed what the ECS I use (entt) basically does--every component type has a contiguous pool of instances. Compiling Go to C++ lets me use entt among other things (target all of C++'s targets including Wasm, have some types of metaprogramming like statically reflecting all component types, etc.). The GC thing is just one of the results. There is no comparison for the amount of control you get vs. vanilla Go (where things can escape to the heap "whenever").


WebAssembly is notably a pathological case for _any_ stack-scanning GC, since the stack isn't addressable.


Beware of Go. Google may use it to do the "Embrace Extend Extinguish" move. It might be type safe, but not ideologically safe.

This is on top of Go being an unstable, immature language.


Go is very stable, and 12 years old.


Have you compared the performance and generated binary size of C vs. Go on WebAssembly?


For WebAssembly, use the TinyGo Go compiler:

https://tinygo.org/


Yeah the Go -> C++ compiler I linked from my other comment is pretty much overlapping with this idea. TinyGo is still a bit early afaict and also tries to exactly implement Go semantics but I'm kind of interested in extending / adapting it to my use case.



