Example of an unnecessary change, func void main() instead of void main().
A better C should just be like what Apple is doing for iBoot, or Microsoft with Checked C, by disabling what makes C unsafe while being mostly compatible.
Changing stuff like
- ptr = array into ptr = &array[0]
- num_value = enum_value into num_value = base_value(enum_value)
- char *str into cstring str
- char str[10] into a proper array without pointer decay (alternatively like cstring, a new array like type declaration)
If it is C like, while requiring major code rewrites, then it is just another attempt, regardless of how much valuable work has been put into it.
C++ got its adoption by having been born in the same place as C and UNIX, as a C pre-processor with zero friction in the C toolchain, which is why it was quickly adopted by C compiler vendors.
Any C replacement needs the same ease of transition, especially for domains that will never move beyond C, because reasons.
> Example of an unnecessary change, func void main() instead of void main().
Introducing keywords in front of declarations is actually a required syntactical change to make the syntax context-free, i.e. to avoid having to carry around a symbol table during syntax parsing, as C requires. (Well, another method would be something like "declaredname :: type { impl }", but it requires more lookahead and is a much bigger departure from C syntax.)
And all modern languages have added something like this. Personally I'm used to the C way and I like it for its terseness, but the modern approach allows for simplified parsers and better tooling.
In fact C started out that way as well: there was only a fixed set of keywords, like int, void, struct... that could introduce a declaration. But with the advent of typedef (and of C++, which makes every struct name usable as a type name without the struct keyword) that changed, and in effect a symbol table is now required for parsing.
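The classic illustration, as a minimal sketch with placeholder names:

typedef int A;

void f(void) {
    A * B;        /* a declaration: B is a pointer to A (an int*)        */
    B = 0;
    (void)B;
}

void g(void) {
    int A = 1, B = 2;
    A * B;        /* the exact same tokens, now an expression statement  */
                  /* (a multiplication), because the local A shadows the */
                  /* typedef -- the parser can only tell the difference  */
                  /* by consulting the symbol table                      */
    (void)A; (void)B;
}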
If you're breaking backwards compatibility anyway, why not go all the way?
I agree there is a need for easy transition, but I think that ease mostly comes from being able to be compiled and linked in existing tool chains like this project does.
I wrote a C compiler as well, got as far as compiling a hello world and all the includes it needs on a standard Linux or MacOS system. My plan was to have a sort of strict option that would be enabled if you wanted to use the new features like modules and namespaces, and the strict option would error on problematic C code.
Anyway I stopped the project when Rust released their first version without garbage collection. It seems to me they solved all my concerns and they didn't even need to integrate C itself.
> Anyway I stopped the project when Rust released their first version without garbage collection. It seems to me they solved all my concerns and they didn't even need to integrate C itself.
Illustrates why projects don't do this:
> If you're breaking backwards compatibility anyway, why not go all the way?
Rust, even with millions of dollars of marketing and eleven years of hype, is still almost a rounding error in terms of usage.
> is still almost a rounding error in terms of usage
Within the tiny set of scenarios where Rust is applicable as of now, isn't it completely dominating?
It's dominating in hype and excitement for toy and pet projects. This is just the first few steps. Too few people have dabbled enough yet.
The next step is enough stability/maturity that more open source and maybe FAANG infrastructure can use it. And when the reasoning behind this becomes widely digested, most "green field" projects will choose Rust.
Millions of dollars in marketing? 1.0 was released in 2015. Yeah, it will take a couple years more before a 6 year old language will replace a 40 year old language, but we'll get there no doubt.
But that's not the right comparison. The use of the C language itself is a rounding error compared to all the languages that replaced it. Java, C#, C++, Python, Javascript, Go, they all replaced most use cases of C in modern programming. Rust is just attacking the last vestiges of C's usefulness.
> If you're breaking backwards compatibility anyway, why not go all the way?
One tiny hobby language I use was changed to accept semicolons as an additional end-of-line marker, just to allow easier copy-pasting of C code. It's surprising how portable C is in that sense too. If you're going C-style, it's probably best to stick C-style. (But I used fn in my own C-alike... who am I to judge?)
`ptr = array` vs `ptr = &array` (no [0] needed) - that change is also part of being able to capture array sizes at compile time for macros, and of being able to copy arrays by value in an easy way. I don't think there would have been a way to do that consistently while retaining pointer decay. In general, all implicit conversions are associated with difficulty when trying to add other features - just to explain why pointer decay is dropped.
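For readers less steeped in C, a small example of the decay being dropped here - this is plain C as it is today, nothing C3-specific:

#include <stdio.h>

#define COUNT_OF(a) (sizeof(a) / sizeof((a)[0]))

void takes_array(int arr[10]) {            /* parameter decays to int*       */
    printf("%zu\n", sizeof(arr));          /* size of a pointer, not 40      */
}

int main(void) {
    int a[10] = {0};
    printf("%zu\n", COUNT_OF(a));          /* 10: no decay has happened yet  */
    takes_array(a);                        /* here 'a' decays to &a[0]       */
    return 0;
}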
`num_value = enum_value`... that might actually happen. I haven't decided.
"`char str` into cstring str"... no? The whole story with strings haven't been 100% decided yet, but likely this will be: `char str` raw character string `char[]` a string slice, preferred over null terminating char*, and finally `String` which is a userland, dynamic string.
Something you can do with C3 is convert as much or as little as you want from C to C3. C3 is ABI compatible with C, so you can compile some files with the C3 compiler and the C files with GCC or Clang. In fact I did this with vkQuake, converting a little bit of code into C3 and removing it from the .c files, then compiling the C3 code with c3c and the C code with Clang, then linking it all together - and it runs as if it had all been written in C.
As a long time C programmer I love seeing these sorts of 'better C' languages.
I absolutely abhor modern C++ syntax, but there's one thing I think they deserve credit for. The C++ community is thinking deeply about first principles, consistency, composability, memory models and forward progress guarantees. I hope anyone looking to improve C learns from the longsuffering of the C++ world while avoiding many of their syntactical mistakes and evolutionary half-steps.
I really hope the C++ standards committee has a well funded legal purse. In 20 years when the lawsuits start coming from all the ex-programmers battling brain tumors they're going to need it.</s>
C++ already improved C. For example, you can now declare variables in the middle of a code block. Or write "for(int i = 0;" These things were not possible in old versions of C.
Unfortunately the few improvements are overshadowed by many more deprovements, the most critical being that the C subset in C++ has been forked into its own incompatible C dialect which is no longer compatible with C99 and beyond.
IMO, what C++ brings to the table is mainly everything that follows as a consequence from RAII.
And RAII, that's just some syntax to make writing badly structured programs practical. Because it does the tedious parts of matching "parentheses" for you, it allows you to use a lot of them, without considering whether you could refactor the program to get away with fewer of them.
RAII also comes at the cost of requiring or encouraging "features" like exceptions and all the good stuff like copy/move constructors and what not. All highly non-orthogonal language features that require lots and lots of special cases and extra boilerplate.
I think money from Sony, Nintendo, AMD, NVidia, Microsoft, Apple, Google, IBM, Intel, Codeplay, Facebook should be enough for those lawsuits. :)
Aaaa, people are doing a lot of C alternatives lately, I should really make a post for my C alternative before the field is completely saturated... But it's not ready yet :(
I kind of want to at least get hashmaps in before I go public.
Re C3, I think the README could do with more sample code? Not exactly "Hello World", but something to get you hyped about using it.
Things I like:
- macros! macros are fucking awesome. give examples
- modules are just a straight win
- built-in dynamic arrays, yess.
- compile-time execution is kind of a precondition for macros. Hopefully the same system.
- I've always wanted to play with generic modules. `import foo!(int) as IntFoo;` It seems a logical extension.
- Result-based error handling for the win. Though it really depends on language support how straightforward this is; it can easily degenerate into very spammy error handling. Definitely would like to see examples of this.
- Built-in strings: hopefully UTF-8!
- No preprocessor. Heck yes, it's a crutch.
- Pre/postconditions are nice, but they make a lot of mess on inheritance.
- Immutability by default is definitely a win.
Things you should totally steal from my language: :)
- Format strings are just nice.
- Packages as a generalization of modules: a package is a folder in the same way a module is a file. Dependencies between packages must be explicitly stated. This makes the build system's dependency tracking actually meaningful by effectively doing away with the global search path. I wish more languages would do this.
- D recently acquired automatic C header file import. (Neat has this as a macro.) I cannot overstate how useful this is for hitting the ground running.
- I don't know if your macro implementation has quasiquoting (it's hard to tell from the examples) but if not: add it. This makes macros immensely more convenient.
Hopefully just bytes, which trivially allows storage of UTF-8.
> built-in dynamic arrays, yess.
Dynamic arrays are only a few lines to implement. There are different ways to do them, and no matter how, there will always be some problematic aspects. Not sure why you would want to choose one particular implementation and elevate it to a higher status.
> No preprocessor. Heck yes, it's a crutch.
It's ugly and inexperienced users will write bugs using it, but it's also tremendously useful. You mentioned quasiquoting as an alternative, but I'm not positive that it works as a preprocessor replacement for a language that lacks the "homoiconicity" of LISP. Are there examples that show it works?
> Dynamic arrays are only a few lines to implement. There are different ways to do them, and no matter how, there will always be some problematic aspects. Not sure why you would want to choose one particular implementation and elevate it to a higher status.
Just having a default in the language is insanely useful. I can only appeal to experience here (with D, which has built-in arrays), but I never want to be without them again. This goes doubly for my language, where dynamic arrays are actually a bit involved due to the need for slices, refcounting and capacity tracking for the doubling strategy on append. Not something you want to reimplement everywhere.
> It's ugly and inexperienced users will write bugs using it, but it's also tremendously useful. You mentioned quasiquoting as an alternative, but I'm not positive that it works as a preprocessor replacement for a language that lacks the "homoiconicity" of LISP. Are there examples that show it works?
As an example, something like
#define SQUARE(X) ({ typeof(X) x = X; x * x; })
could in a language with macros (and a better function macro syntax than I have at the moment :p) be rewritten as
macro SQUARE(X) ({ typeof($X) x = $X; x * x; });
Which has the exact same effect, but does not suffer from the C preprocessor problems caused by string/token interpolation. Also, errors can be easily and cleanly attributed to the actual location they occur, because SQUARE's nature is a parse tree, not a token list.
As for macros like `#define BEGIN {`, I consider it an advantage that they don't work. :)
> refcounting and capacity tracking for the doubling strategy on append. Not something you want to reimplement everywhere.
Not something I want everywhere, in the first place. Especially refcounting.
> macro SQUARE(X) ({ typeof($X) x = $X; x * x; });
Yes, stuff like that works, but only if your replacement body is a fully formed syntactical expression. Such a "syntactical macro" system can be nice because it's safer, and it's applicable in _most_ cases. But not in all - there are many situations where the full generality of C preprocessor macros (which are lexical / "token list" macros, as you say) is useful.
X-macros might be the example that I use most, and that wouldn't work with syntactical macros.
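For readers who haven't run into X-macros, the pattern looks roughly like this (a made-up COLOR example; one list, expanded twice with different definitions of X):

#define COLOR_LIST \
    X(RED)         \
    X(GREEN)       \
    X(BLUE)

#define X(name) COLOR_##name,
enum color { COLOR_LIST };                            /* COLOR_RED, ... */
#undef X

#define X(name) #name,
static const char *color_names[] = { COLOR_LIST };    /* "RED", ...     */
#undef X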
Another application is macros around for loops, like FOREACH_FOO(...) or SCOPED_LOCK(...) for example. All dirty hacks that I like to use from time to time, and that I vastly prefer to systems built into the programming language, which lead to immense complications there.
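For instance, a SCOPED_LOCK along these lines (a rough pthreads-based sketch, not anyone's production macro):

#include <pthread.h>

/* Runs the statement/block that follows with the mutex held. The macro
   expands to a bare for-loop header and borrows the statement after it,
   which is exactly the kind of trick a purely syntactical macro system
   would have to support explicitly.
   Caveat: a break inside the body would skip the unlock. */
#define SCOPED_LOCK(m)                                        \
    for (int _once = (pthread_mutex_lock(m), 1); _once;       \
         _once = (pthread_mutex_unlock(m), 0))

/* usage:
   SCOPED_LOCK(&my_mutex) {
       shared_counter++;
   }
*/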
Another example would be partial lists of any kind, for example lists of compiler intrinsic attributes.
String interpolation is probably not in scope for a syntactical macro system either:
printf("Hello from " PROG_NAME " version: " PROG_VERSION "\n");
Another example, but you might sneeze at that one - my current project is a pile of macro hacks, it contains a lot of stuff like
#define DEFINE_BUILTIN(bkind, bname, num_args) else if (is_identifier(t->t_name.buf, bname)) \
...
#define PREC(t, p) case t: prec = p; break;
It's more "temporary" stuff that will need factoring into data tables where there are extensions, but a lot of it is just good enough and will never be touched again.
--
Expecting code to always be so clean, from the beginning, that every macro one would ever need can be defined with a syntactically complete expression body - that is not going to work. Just like there have been many attempts at getting rid of text-based programming languages and moving to structured (syntactical) editors - that hasn't panned out either.
> Another application is macros around for loops, like FOREACH_FOO(...) or SCOPED_LOCK(...) for example.
This is the exact case where full macros can deliver a lot more power, more cleanly, than preprocessor macros. In this case, in Neat, you'd probably use a full parser macro rather than a call macro, and you could recognize arbitrary syntax, without allowing you to violate parenthesis order like C does.
Your macro can do whatever it wants, but it cannot conflict with other, preexisting syntax, such as (in C) defining half a loop or half a variable declaration. This way, it can fully own its syntax.
> Expecting code to always be so clean, from the beginning, that every macro one would ever need can be defined with a syntactically complete expression body - that is not going to work.
Correct, however the example syntax was a simplification of macros. Fundamentally, there's no reason a macro shouldn't be able to do anything that the compiler itself can do.
> Hopefully just bytes, which trivially allows storage of UTF-8.
Yes, it trivially allows it, but then you'd need to deal with UTF-8 errors all over the program at runtime. Every string method would need to be written in a way that deals with invalid UTF-8 byte sequences.
Unix and C are a living example of what happens when you think of a string as an arbitrary sequence of bytes. It gives a lot of edge cases which a programmer must bear in mind constantly, because these edge cases will choose the least expected moment to jump on you. And then you need, for example, to invent a way to output an arbitrary byte sequence into a place where only UTF-8 is allowed. I, personally, hate it. Take arbitrary byte sequences as file names: even when I know that no one uses non-UTF-8 file names, I need to write programs that work with file names in a way that allows arbitrary byte sequences, and to devise some syntax to output an arbitrary byte sequence into a terminal. Or into a web page.
I see no good reason to replace strings with arbitrary byte sequences. If you need arbitrary byte sequences, then you have another abstraction for your task: an array. Much more powerful, because it can be an array of bytes, of uint16_t, of int64_t, or of your own struct. Why spoil the abstraction of a string with the ability to hold arbitrary byte sequences?
Detecting UTF-8 encoding errors is only needed on the input/output boundary though. This should be solved in a string processing library, not in the language (and all other string processing functions in said library should not produce invalid output strings).
Not so easy. A string processing library might want to iterate over chars; to do that the code needs to decode the UTF-8 string, and if the string is invalid UTF-8, you'll get a UTF-8 error while trying to find a substring in a string. Or when trying to get a slice with chars from 5 to 12. Such an error can jump on you unexpectedly anywhere in your program.
> This should be solved in a string processing library
Then the type String should also be defined in a string processing library. Either String follows the assumptions of the string processing library, or it doesn't. If it doesn't, it makes the task of writing a good processing library much more difficult, and you'll get errors thrown from inside the library, and you'll need to deal with them.
> A string processing library might want to iterate over chars; to do that the code needs to decode the UTF-8 string
UTF-8 was specifically designed such that most code can deal with it byte-by-byte without any decoding step. I've written many parsers, for example; they all just read byte-by-byte and special things happen at ASCII characters (such as ';' or '\n'), and it works trivially with UTF-8 inputs - I don't have to care at all.
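A sketch of what that looks like in practice (splitting on a 7-bit ASCII delimiter while treating everything else as opaque bytes):

#include <stdio.h>

/* Counts ';'-separated fields. UTF-8 lead and continuation bytes are all
   >= 0x80, so they can never collide with the ASCII delimiter -- the loop
   never needs to decode anything. */
static size_t count_fields(const char *s) {
    size_t n = 1;
    for (; *s; s++)
        if (*s == ';')
            n++;
    return n;
}

int main(void) {
    printf("%zu\n", count_fields("höhe;breite;tiefe"));  /* prints 3 */
    return 0;
}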
I've also written a text editor with a complicated text rope data structure in it. Do you think I should have made a different text rope for each different text encoding the editor should deal with?
No - what my editor does in UTF-8 mode, for example, is that the visual and editing layer pulls data byte-by-byte out of the text rope and interprets it as UTF-8. If there is invalid UTF-8, it has to deal with it. But hey, that is the reality of files on a file system - they can contain encoding errors, right? Deal with them! (If you can't simply ignore them.)
> Or when trying to get a slice with chars from 5 to 12.
There isn't a single definition of "char". What even is a "char"? Is it a byte? Is it a Unicode codepoint? Is it some other kind of Unicode combination of codepoints, or glyphs, or combining sequences and emoji modifiers, or whatever all that junk is called?
If you need a specific subsequence of some UTF-8 encoded text, use a library that fetches it from the byte storage. There is no point in making a programming language type, because you'll lock in to certain usages, and next thing you'll need is a completely different type.
The reality is that data is stored in memory in byte sequences, and that's the representation that a programming language should expose. Everything else is code / libraries.
> I've also written a text editor with a complicated text rope data structure in it. Do you think I should have made a different text rope for each different text encoding the editor should deal with?
No, I'd think that you should make a text rope for UTF-8, and then change the encoding of the text at the boundary. Or it might not be UTF-8 but some other representation of Unicode, it depends. I see no reason to create a structure for efficient manipulation of strings without choosing the representation of a character at compile time, because otherwise it will be slower than it could be. The more assumptions about your data you've made at the compile time, the less runtime conditioning you'd need, the faster your code would be.
Believe me, I have dealt with different encodings all the time. I'm Russian, and we had three widely used single-byte encodings for Cyrillic, plus different encodings for Unicode. So one had to deal with all of them all the time. The easiest way is to deal internally with Unicode only and to change the encoding at the boundaries where your program communicates with the external world. There (at the boundaries) you can deal with errors, like a character which cannot be represented in the output encoding (or in the internal one, but if you use Unicode that won't be a problem). You can treat user input as input in an external encoding and throw errors if the user enters something that cannot be encoded in the output encoding. It works all the time, while making your program able to deal with different internal encodings is a PITA, with errors thrown from the most unexpected places, and with spaghetti code trying to deliver errors to places where these errors could be sensibly dealt with.
> There isn't a single definition of "char". What even is a "char"? Is it a byte? Is it a Unicode codepoint? Is it some other kind of Unicode combination of codepoints, or glyphs, or combining sequences and emoji modifiers, or whatever all that junk is called?
You can use all of them, just pick distinct names for them, like "char", "glyph", ... and any others you like. But once you do, you'll want to know where the boundaries of these things are. You'll want to make slices of sequences of these things. If you cannot rely on the validity of the underlying UTF-8, you'll be in trouble.
> If you need a specific subsequence of some UTF-8 encoded text, use a library that fetches it from the byte storage. There is no point in making a programming language type, because you'll lock in to certain usages, and next thing you'll need is a completely different type.
When I need to work with bytes, I use an array of bytes. Not a string, but an array of bytes. It was hard to grasp after years of experience with C, but I managed it at some point. Char is not a byte. Byte is not a char. An array is not a string, a string is not an array. When I need an array to deal with bytes, I use an array. When I need a string to deal with characters, I use a string. It is a non-trivial idea for a C programmer, because all his experience tells him that character and byte are the same thing. So if a character is not a byte, then (he reasons) characters don't exist.
If characters as codepoints are too low an abstraction for my task, I can create on top of them another abstraction dealing with glyphs, words, tokens, sentences or something. But the abstraction of codepoints must be a library feature, or I'd be forced to create it myself, to validate UTF-8 all over the place, and so on. And if that's so, then what's the point of having an abstraction of string?
If characters as codepoints are too high an abstraction for my task, I can go lower and use an array of bytes.
It is really an easy idea; it's just that C as a language tends to confuse people's minds by teaching them that char == int8_t. At least my mind was confused, and I managed to untangle that mess completely only around my 30th birthday. And several years later I found that Rust's std differentiates chars/bytes and strings/arrays exactly as I do. I fell in love with Rust immediately.
> No, I'd think that you should make a text rope for UTF-8
But that's what I made. UTF-8 is encoded as bytes. I can store the bytes in the rope just fine.
The rope has a very simple API, basically read() and write() functions, just like a standard FILE I/O API. Do you want to pick on file system developers that they should add APIs for write_UTF8(), write_LATIN1(), write_KOI8(), write_BYTES(), etc.? And then go to network API designers to do the same for the socket I/O functions? And so on? Of course you don't do that, that would be very bad factoring.
And it's just the same for a rope API.
> The more assumptions about your data you've made at the compile time, the less runtime conditioning you'd need, the faster your code would be.
This is true in general, but the rope is just storage. No processing happens there. The rope couldn't care less what you store in it. There is no point in having multiple identical read/write implementations.
But if you insist, I recommend to ask for an UTF-8 optimized HDD at your local computer shop :-)
> making your program able to deal with different internal encodings is a PITA
If your program has to deal with multiple external encodings, either you can convert at the boundaries to a canonical internal encoding, or you can't in which case it probably becomes a little more work since you have to convert at different places.
> [...] Char is not a byte. Byte is not a char. [...]
In this paragraph you seem to be confusing C's "char" with the much fuzzier idea of a "Character", which has like 13 valid definitions.
C's "char" is abstractly defined as the smallest addressable unit of memory available on the machine (required to have at least 8 bits), and historically there have existed 8-bit, 9-bit, 16-bit, and even 36-bit chars. In today's practice it is universally taken as synonymous with the (8-bit) byte, since essentially all hardware is 8-bit by now. Some people like to be pedantic about the distinction between byte and char, but I most often do not, especially since char is the generally interoperable type in C (with respect to type punning etc.), while uint8_t to my knowledge is not.
"Character" is sometimes understood as "Unicode codepoint" (typically represented as a 32-bit entity, or even as a UTF-8 encoded slice of bytes) or in some cases understood as "Unicode glyph" (probably represented as a slice of codepoints), sometimes understood as even other things.
There is a prominent counterexample to the 8-bit char: DSPs. TI has several DSPs with 16-bit chars, and I think I've encountered one with a 32-bit char. These boards are actually pretty common in industrial settings.
Sure, but all those things belong in a library, not in the language (at least not in a systems programming language). To the language, a string should just be an opaque bag of bytes, and it needs a convention for how string literals are laid out as such a bag of bytes.
> Unix and C are a living example of what happens when you think of a string as an arbitrary sequence of bytes.
This stuff is specifically some of the big design wins of these systems. I recommend you to peek over at the Win32 API for example, with its myriads of *A and *W functions, required compiler settings and macro magic to switch between those (even though of course they can't really paper over the difference), then best practice recommendations that have changed multiple times in history, then strange bugs that appear when some invalid data has sneaked into a system that was assumed "pristine"...
Making a difference between "UTF-8" and arbitrary byte storage is like racism. It isn't only socially unacceptable, it also creates a massive bureaucratic overhead and requires duplication of implementation efforts. I call it bad engineering.
On Windows, unless you really, really, really have to support Windows 9x/ME you ignore the A functions and exclusively use the W variants. You then also don't need the macro magic to switch, as your code doesn't have to work on both Windows NT and 9x (which is the only reason to use that macro magic).
And a difference between "UTF-8" and arbitrary bytes is sensible insofar as you can perform text operations on the former, but not on the latter. Unicode cannot be treated as just a byte stream as soon as you want to do something to the content (or just for very narrow circumstances).
In fact I use exclusively the A variants as far as possible, because anything else is really bad engineering, as I've explained. Even Microsoft has started to acknowledge this, and whereas the A variants had been implemented as wrappers around the W variants before, I heard they started to reverse that and started recommending the A functions.
If you browse around various documentation, you'll see contradicting statements which variants are recommended, which I take as another sign that the idea of making a distinction on the type level is simply a bad idea.
As to the macros, yes, if you give the A or W explicitly, you won't need the macro setting that translates the unspecific names to the A or W variants. And as said, the macros aren't a good idea anyway, as it's still extremely hard to properly abstract the distinction (the types are different sizes!!), so code is generally tied to a specific choice either way.
> Unicode cannot be treated as just a byte stream as soon as you want to do something to the content (or just for very narrow circumstances).
It can be stored as a byte stream. Whenever you work with the data, you might need to operate on transformed representations - 32-bit codepoints, or larger combinations, or even more complicated stuff like words, sentences, paragraphs, tags, whatever is needed to get the task done. This is programming: you transform data to achieve things.
What I'm saying is that it's stupid to make a distinction on the type level between things that are exactly the same thing in memory, and that are going to be used for the same things. It's stupid because it unnecessarily creates incompatibilities between data, introduces unnecessary "conversions" / copies, and requires more code.
Firstly, you do not need the A functions. They are there only for old programs - really old programs from the Win9x era.
Secondly: if there are worse places than Unix, it doesn't mean that Unix is good.
I mean, I love Unix, but Unix is sometimes a nightmare to deal with. And the kernel believing that "string" means "a byte array with a zero terminator" is one of the worst things about Unix. If one needs an array of variable size, it can be done either as:
struct unsized_array_t {
    size_t size;
    uint8_t bytes[]; // really bytes[size], stored inline
};
or as a plain pointer plus a separate size_t passed alongside it.
The second, for example, can easily replace C strings, you only need to pass one more size_t into functions dealing with strings. You can get substrings without modifying string or excessive copying. But then you might wonder why to call this "string" if it is just an array with no compile-time known size.
The first one is trickier: you'd need to pass around pointers to it, and slices of it would be a problem. But at least you wouldn't need to scan memory to learn its size.
Unix strings are not strings but zero-terminated arrays of non-null bytes. That would be obvious if you tried another way of dealing with the variable size. The Unix fathers had an idea of how to deal with strings, but their idea was proven bad. A lot of the functions they invented to deal with their "strings" are now deprecated or even forbidden, like `gets` for example. I'd hate to use strcpy; I'd rather use strncpy and the other functions with an `n` inside. But once you start juggling not just pointers to strings but those n's as well, you start thinking of representing strings as structs with an embedded n. And -- voila -- all of Unix's string crap goes out the window. You end up with variable-sized arrays, and you don't need strings anymore. Without losing anything useful.
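To make the "one more size_t" point concrete, a small sketch of substring-taking with a size + pointer pair wrapped in a struct (the `slice_t` name is just for illustration):

#include <stdint.h>
#include <stdio.h>
#include <string.h>

typedef struct {
    size_t size;
    const uint8_t *bytes;
} slice_t;

/* Sub-slice [start, start+len): pointer arithmetic only -- no copy,
   no scan for a terminator. (No bounds check here, just the idea.) */
static slice_t sub(slice_t s, size_t start, size_t len) {
    slice_t r = { len, s.bytes + start };
    return r;
}

int main(void) {
    const char *text = "hello, world";
    slice_t all  = { strlen(text), (const uint8_t *)text };
    slice_t word = sub(all, 7, 5);                        /* "world" */
    printf("%.*s\n", (int)word.size, (const char *)word.bytes);
    return 0;
}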
> Making a difference between "UTF-8" and arbitrary byte storage is like racism. It isn't only socially unacceptable, it also creates a massive bureaucratic overhead and requires duplication of implementation efforts. I call it bad engineering.
> The second, for example, can easily replace C strings, you only need to pass one more size_t into functions dealing with strings.
C doesn't prevent you from creating such a string type, or just passing a separate length as argument to functions. In fact it's the right thing to do (pointer + length) in many cases.
All C does is offer you string literals that are zero-terminated, which is typically practical. Often those are all you need, i.e. to store a plain-text identifier that doesn't allow for NUL characters anyway - so NUL can be used as a sentinel for efficient storage.
Then there is the C standard library, which is a bag of bad practice. Just ignore 90% of the stuff in there. Stuff like strtok() etc. is bad. Mostly you want memcpy(), memcmp(), strcmp(), maybe strcpy()/strncpy(), etc. Then there's stuff like malloc()/free() and stdio, and especially the formatting functions, which you can use to get something up and running before possibly replacing them with something better. Well, that's about it :-)
"Until recently, Windows has emphasized "Unicode" -W variants over -A APIs. However, recent releases have used the ANSI code page and -A APIs as a means to introduce UTF-8 support to apps. If the ANSI code page is configured for UTF-8, -A APIs operate in UTF-8. This model has the benefit of supporting existing code built with -A APIs without any code changes."
To make it "safe" as in "protect against out-of-bounds accesses", slices would be enough. My strong opinion is that "data shape" concerns should be separate from "storage allocation" concerns as far as possible.
This is especially true for the "to use across libraries without wasting time on conversions". I've said it many times, plain C interfaces (pointer + length, or slices if you insist but I don't like them because they are a less normalized representation) ... are the best way to design interfaces optimizing for interoperability. No need for any pointless conversion, just tell the API where your data is located. The physical fact that is needed for communication is the memory (address + length), it's the necessary and sufficient information to carry out the task.
Yes, nowadays "safety" is not just about Out-of-bounds accesses but people expect the system to even protect against resource leaks, double-free, user-after-free, and race conditions. But even when it is the goal to machine check this by introducing a system that requires thinking on the small scale in isolated mini-units ("classes"/"types") - is there a point in locking in on a specific implementation of dynamic arrays? (Not a rhetoric question)
To loop back to the original point, Neat arrays are pointer + length + base. This is necessary for refcounting, but it also allows managing capacity, ie. appending to slices. D gets away with pointer + length, but it can ask the GC for capacity.
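In C terms that representation would look roughly like this (a sketch of the layout as described, not Neat's actual definition):

#include <stddef.h>

/* "pointer + length + base": the slice itself, plus a pointer back to the
   allocation it came from, so the runtime can find the refcount/capacity
   bookkeeping even for a sub-slice. */
typedef struct {
    void   *ptr;     /* first element of this slice         */
    size_t  length;  /* number of elements in this slice    */
    void   *base;    /* start of the underlying allocation  */
} neat_array_repr;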
So Oracle, Apple, ARM, Google and Microsoft (Intel botched their design) are investing piles of money moving the industry into hardware memory tagging for nothing?
Maybe we should tell them to stop if they are so good.
"Oracle, Apple, ARM, Google, and Microsoft" are actually a LOT of programmers and non-programmers with a huge variety of opininons, and I'm sure opinions similar to mine can be found there as well.
Also, they have loads and loads of money and their jobs come with prestige, so they have no problem attracting developers to jobs that are perceived by some programmers (such as me) as boring boilerplate jobs that make me miserable.
That answer was more related to the dynamic arrays discussion. If you want to move to hardware memory tagging, is that even a big thing? In any case my understanding is that it would work with pointer + length just as well, because the hardware tags are created at buffer allocation time, not based on arguments passed to a function.
Of course it works with pointer + length; the whole point of hardware memory tagging is that it is a proven failure - with 50 years of examples - to leave to C developers the task of manually proving that pointer + length are valid. It just doesn't work, regardless of what story is being sold.
So lots of money is being burned to ensure that C code is caged and does no harm, in scenarios where C is to still be used.
Regarding dynamic arrays. Right now it looks like I have a way to do nicely namespaced (macro based!) operator overloading for types (it was unclear whether it would be possible with the feature set or not), so with this actually being in the language the need for built-in dynamic arrays is smaller - since you can use the normal foreach on the dynamic type - and so it can be implemented as a library type. (I want to emphasize that operator overloading is not done by functions, but through macros in C3 - so it's different from C++)
- Format strings, are you thinking about string interpolation or?
- A module isn't a file in C3 but that's a deep subject to get into.
- Automatic C header file import is something Zig also touts as a feature. It was something I thought I would want early on. But as I worked my way through examples, I found that it's hard to get right in all cases, which means you'll run into cases where your language "almost" works. Plus now you've actually tied your language not only to the C ABI, but to the entire C standard (note how headers will, for example, contain static inline code that you will need to parse, or macros that define aliases of functions and builtins). That said, a tool to automatically extract a "best effort" interface is planned.
- Regarding macros the difficulty has been to balance power with readability. So that is something which I am considering but still haven't quite embraced. Instead I have macros taking unevaluated expressions and you can in compile time get different things, e.g. `$offsetof("Foo", "a")` will give you the offset of the member `a` in the type `Foo`, but it's done through a special function rather than allowing straight up string interpolation. We'll see once the standard library work starts for real.
D's ImportC is actually a full C compiler (or will be). You're supposed to use it on header files, but the capability is there - it's not just a binding generator.
Huh? Please explain if you don't mind. UTF-8 is completely backward compatible with 7-bit ASCII, so if you don't need international characters everything remains exactly the same, and if you need international characters, strings are still just regular "bags of bytes". The only difference (for international strings) is that the number of bytes in the strings isn't the same as the number of characters (or rather UNICODE code points). But that's only relevant if you actually need to process a string down on the character level. Most CRT ASCII string functions work just fine on UTF-8 data, even strtok() if the delimiters are 7-bit ASCII.
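For example (the delimiter is 7-bit ASCII, the field contents are multi-byte UTF-8; a minimal sketch assuming the source file itself is UTF-8 encoded):

#include <stdio.h>
#include <string.h>

int main(void) {
    char line[] = "größe=42;höhe=17";
    /* Works because ';' (0x3B) can never appear inside a UTF-8 multi-byte
       sequence -- all of those bytes are >= 0x80. */
    for (char *tok = strtok(line, ";"); tok; tok = strtok(NULL, ";"))
        printf("field: %s\n", tok);
    return 0;
}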
Because Neat is self-hosted and frequently depends on syntax features added a few commits ago, building from fresh source can take up to half an hour. You also need a D compiler (for the initial bootstrap version), but that comes with gcc nowadays. If you wanna try that, just run `bootstrap.sh`. (You may have to patch it to use gdc, not ldc, but the commandline should be the same.) This takes a while because it has to build the compiler something like 60 times, each with the previous version.
Don't do that though! Gimme a ping and I'll slap a new release tag on it. The releases use the C backend to generate a C dump of the compiler, that can then be shipped and compiled on the target system.
Neat is more a D-like than a C-like, but it only breaks C syntax in areas where I think C straight up made the wrong call, like the inside-out type syntax.
Memory management uses automatic ref counting, with some optimizations to keep number of inc/dec manageable.
The thing I'm most proud of is the full-powered macro system, which is really more of a compile-time compiler plugin system.
`compiler.$expr xxx` is itself a macro, that parses an expression `xxx` and returns an expression that creates a syntax tree that, when compiled, is equivalent to having written `xxx`. It's effectively the opposite of `eval`. In that expression, `$identifier` is expanded to a variable reference to "identifier".
So `ASTSymbol test = compiler.$expr $where && $test;` is equivalent to `ASTSymbol test = new ASTBinary("&&", where, test)`. (This shows its worth as expressions become more expansive.)
All in all, this lets you write `bool b = [all a == 5 for a in array]`, and it's exactly equivalent to a plain for loop. You can see the exact for loop at line 103 in that file. `({ })` is stolen from gcc; google "statement expression".
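In C terms, roughly this (a sketch of the equivalent loop, not Neat's actual generated code - see the linked line for that):

#include <stdbool.h>
#include <stddef.h>

/* Roughly what `bool b = [all a == 5 for a in array];` lowers to. */
static bool all_equal_5(const int *array, size_t len) {
    bool b = true;
    for (size_t i = 0; i < len; i++) {
        if (!(array[i] == 5)) {
            b = false;
            break;
        }
    }
    return b;
}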
The one thing I'm still blocking on is hashmaps, once that's in I'll make a proper announcement post.
So, Objective-C #import plus some linting rules and minor syntax adjustments? I somehow doubt that this would be worth the change (especially if you have to "extern" stuff like a C++ programmer). People can live with all kinds of minor annoyances but tend to switch mostly for bigger changes.
But hey, hope that something interesting grows out of this. I quite like almost-C languages, but then again I also liked the Bourne Shell macros ;)
One of these days I have to write an Oberon-in-sheep's-brackets.
For an interesting "C++ that doesn't look like C" variant, consider SPECS (Conway/Werther;1996)[1].
No, that's probably not a good description. I'm not saying you will appreciate the language, but it is basically C.
- "Failables" (which is somewhat like Result) offers an alternative to error handling which mostly mirrors how one commonly does it in C, but with conveniences. Since it works a bit different from all other error systems, I'd have to point to the docs for a summary. :(
- Semantic macros where the big win is that they are easier to read and write. (This is heavily inspired by the ASTEC macro system for C)
- Generic modules, which work similarly to macro-based generics in C, but are easier to read and work with.
- Subarrays (slices), yes they make a huge difference even though they're an obvious addition.
- A bunch of GCC extensions
- Optional design by contract
There are no objects, and certainly no dynamic OO system. There are no constructors and destructors or similar explicit code.
Calling C is straightforward, you just need to declare that it exists, like you would in C.
So `extern func int printf(char*, ...)` -> now you can do `printf("Hello %s\n", "World");`. The opposite also works, so if you define a function `func void foo() @extname("c3_foo")` you can then call it from C as `c3_foo()` (or skip the `@extname`, but then you would have to call `my_module_foo()` instead due to namespacing).
No mandatory header files
New semantic macro system
Module based name spacing
Subarrays (slices) and dynamic arrays built in
Compile time reflection
Enhanced compile time execution
Generics based on generic modules
"Result"-based zero overhead error handling
Defer
Value methods
Associated enum data
Built in strings
No preprocessor
Undefined behaviour trapped on debug by default
Optional pre and post conditions
Associated enum data sounds like variant types. It has namespacing and it has generics. That alone is almost worth it to me (not that I'd use a language with no community.) If it had lambdas, that'd be the quadrivium for me.
This paper [1] should be required reading for all would-be C replacement authors who don't want to follow in the footsteps of their unsuccessful predecessors. As for my two cents, the design of modern programming languages seems to be based on a premise of distrust for the programmer, but the designers appear to have forgotten that the distrust goes both ways. In [2], an old book about C programming, readers are reminded that rarely used language features incur the risk of the compiler writer not having implemented them correctly. It's unlikely that any fancy new feature would make me prefer your C replacement, because I don't even use all of the features of C most of the time.

That being said, if you're crazy enough to write a C replacement in the first place, then maybe you'll be crazy enough to incorporate my crazy suggestion for function specialization. In functional languages, given a function f(w,x,y,z) with many parameters, it's straightforward to define another function g(a,b) with fewer parameters as being equal to f(H,b,a,K): a more general function specialized by fixed constants for some of the parameters and arbitrary permutations of the others. I'd like to be able to express the transformation that takes a pointer to the general function f as input and returns a pointer to the specialized function g, such that the pointer to g can be used in the same context as any other pointer to a function of that arity and type defined the normal way. I'd like it to be possible in a library written by me operating on user-supplied pointers to functions not known in advance, I'd like to avoid workarounds such as global static variables or thread-specific storage, and I'd like you to convince me that your implementation of this feature is too simple to be wrong.
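To make the request concrete, here is the kind of global-static workaround described above as the thing to avoid (a sketch in plain C; the names f, g, H, K follow the description, and the point is that the captured state has to leak into globals):

#include <stdio.h>

/* The general function: f(w, x, y, z). */
static int f(int w, int x, int y, int z) {
    return 1000 * w + 100 * x + 10 * y + z;
}

/* The workaround being complained about: the fixed constants and the
   pointer to f have to live in globals (or thread-locals), because a
   plain C function pointer cannot carry captured state. */
static int (*g_target)(int, int, int, int) = f;
static int g_H = 7, g_K = 9;

static int g(int a, int b) {              /* g(a, b) == f(H, b, a, K)     */
    return g_target(g_H, b, a, g_K);
}

int main(void) {
    int (*fp)(int, int) = g;              /* usable like any int(int,int) */
    printf("%d\n", fp(3, 4));             /* f(7, 4, 3, 9) == 7439        */
    return 0;
}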
Coincidentally, I am also designing my own "Better C" language, and I deeply considered the benefits and drawbacks of modules and headers; headers, for practical purposes, come out slightly ahead.
Headers have some really unbeatable advantages over module interfaces: the tooling is incomparable plus the header serves as the API documentation.
While modules are great, in practice you have interop problems and you still need a tool (if it is a binary file) to extract the interfacing API.
Modules are better from an elegance and clean-design PoV, but headers win on the practicality.
Headers are a pricier mistake than the billion-dollar-mistake.
IMO they are the #1 reason C still has a lot of adoption, but not for good reasons: headers hinder interoperability. To interop with C you either need a C-compatible compiler (C++, Obj-C, and now D, Zig) or a human writing interop code by hand. Both things come with a hefty cost, and the first carries the danger of your language having to keep terrible features forever (the C part of C++).
Headers hinder the evolution of the language and the ecosystem.
Modules, on the other hand, enable interop easily, preventing lock-in.
> To interop with C you either need a C-compatible compiler (C++, Obj-C, and now D, Zig) or a human writing interop code by hand.
It's the other way around - C is so easy to interop with that you don't even need to do anything. Modules, on the other hand, are such an effective lock-in that you can't reasonably automate the interop with them.
With a C header, all I need to do is run a single command that generates interfaces based on it for almost every language there is. Try that with any module-based system.
> Why not generate the headers from the modules, then users don't have to worry about them.
That's the approach I took; this brings the limitation that the only thing that can be publicly exported from a module are things that are compatible with C code.
It also makes modules useless - what use would a module be if the interface is fully specified in a header anyway? Just read the header.
TBH, I'm still in the design phase so I tend to change my idea of what's good and bad quite frequently.
> But then all macros go to headers, even private ones.
I don't understand what you mean by this; specifically, what's so special about macros? You can always define module-private macros in the .c file and not in the header, like with any other symbol.
Yes, headers do mean that if modules A, B and C have shared private information then that information is not private (as it is shared via a header), but it doesn't mean that A cannot have private information that is visible only to A.
I deliberately removed the "static" methods, so that above would not be allowed. Why I found that to be a good idea is a longer discussion.
The main motivation why it is nice to have it this way is that it's straightforward to understand what the type is when doing `&Point.add`. I have considered a `this`, but it hasn't felt super important to have. Also, note that this is allowed: `Point *x = null; x.doSomething();` - this is made more obvious by taking the reference as an explicit parameter.
> Not to shit on decent work, but I don't see a reason to use this over Zig.
Maybe you don't mean to sound harsh but from their About page:
" It is an evolution of C enabling the same paradigms and retaining the same syntax as far as possible."
I don't think Zig has the same goal.
Also, they usefully compare the languages:
"In Zig but not in C3
Pervasive compile time execution.
Memory allocation failure is an error.
Zig's compile time execution is the build system.
Different syntax and behaviour compared to C.
Structs define namespace.
Async primitives built in.
In C3 but not in Zig
Module system
Integrated build system
Built-in strings, maps, vararrays
Optional contracts
Familiar C syntax and behaviour
"
It seems to me that the only meaningful difference in their goals, is that C3 wants to stay as close to C as possible, including having familiar syntax.
> Also, they usefully compare the languages [...]
The comparison seems disingenuous.
Zig has a module system where each zig file is a module and it has a mechanism to specify public parts of the module.
One of the standout features of zig is the integrated build system, which uses a reliable caching mechanism to allow incremental builds without rebuilding the world. It's somewhat reminiscent of bazel, actually.
The zig std library features hashmaps and variable length arrays. Strings are not different from slices of u8. It's unclear to me what C3 means by "built-in strings". The only reference I could find was under "crazy ideas", where they vaguely refer to needing to figure out memory management. That seems like a pretty big outstanding issue for a C competitor.
So the unique feature of C3 appears to be the contracts system of pre- and postconditions, which are only used as language hints to the compiler.
A "module system" should be something beyond a name spacing scheme. So that's the difference.
Re: strings I had a stronger idea of that initially but I wasn't sure, so I kept postponing it, and due to features I added later it's now possible to make strings in userland... probably. I still need to think about this a lot.
But I agree that I should revisit this comparison to make it more up to date. For example an important difference would be that Zig adds quite a bit of UB on top of C, whereas C3 removes UB compared to C.
There are minor things Zig doesn't have, like: substructs (I think!), trailing macros (allowing easy "scoping" macros), limited operator overloading (allowing, for example, userland lists to use foreach), and type methods that other modules can extend. Then the whole way to do generic types and functions is different, with Zig building it on top of parameterized structs, whereas C3 uses parameterized modules, which creates a bit of difference.
> Zig has a module system where each zig file is a module and it has a mechanism to specify public parts of the module.
"Module system" has many meanings. In functional languages, it has much more flexible semantics than just namespacing and data hiding declarations, so perhaps C3 means it has this sort of module system.
> Funny how we can get stuck with a language from the early 70s and we're still in the process of replacing it half a century later.
I see that as a good thing. I’d much rather have a battle tested framework than everything being rewritten every 5 years in whatever is currently trendy simply because a new generation of developers are suffering from NIH (not invented here) syndrome.
I mean, yes C has its problems so I’m all for using safer languages, but just take a look at the mess that is front end web development and tell me that the alternative isn’t better.
I seriously hope languages like Rust do evolve into being much more than a fad. I've been in the industry a fair few years now and have seen languages fall in and out of favour (Pascal, Java, OCaml, Go, etc - sure, some of them are still popular, but nowhere near as much as when their respective hype machine was in town), and what systems development really needs is another 'C' - as in a language that survives the next 50 years as a standard low-level language that people can build solid operating systems from. Whatever language people want to code on top of that base is then fair game.
You're being rather unfair in your interpretation of my post there. I've clearly demonstrated in my writing that I'm in favour of other languages growing. There's a huuuge gulf of difference between my point about praising platform stability and yours saying newer languages would never get a chance to prove themselves. Actually, that is what's happening right now with Rust being trialled for Linux kernel development, which I acknowledged when I said I hope Rust turns out to be more than a passing fad.
To me this says that C got a lot of things right. I've noticed that even the things people complain about, like the somewhat arcane promotion rules to "int", make perfect sense given the constraints and the expected type sizes. It's only now that this is somewhat breaking down, and that's because with 64 bits, `int` didn't follow the normal trend of being register sized.
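A small example of those promotion rules at work (plain C, nothing C3-specific):

#include <stdio.h>

int main(void) {
    unsigned char a = 200, b = 100;
    /* Both operands are promoted to int before the addition, so the
       result is 300, not (200 + 100) % 256 -- sensible when int was
       the natural register width of the machine. */
    int sum = a + b;
    printf("%d\n", sum);                  /* prints 300 */
    return 0;
}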
I would say C3 is closer to Odin than Zig. Given that C3 isn't done yet, I'd personally recommend using Odin over Zig. So there is a clear philosophical difference between the two. Aside of course from Zig departing from C syntax all over.
Zig is in many ways a more ambitious language. Not just in the language itself but in the tooling. Unlike C3 it has the ambition to "do things right" - often by doing things differently. Sometimes that pans out, but sometimes not. For example Zig adds a lot of UB "to be fast", but there's a particularly worrying intersection of (a) UB in overflow, (b) introduction of unsigned overflow and (c) implicit type widening which adds a lot of hidden UB / runtime aborts.
The simple example here is `a : i32 = b + c + d`. Even knowing that b, c and d fit in a short is not sufficient to guarantee this will not have UB. And even if we know that `b + c + d` does not trigger UB, we cannot guarantee that `b + d + c` does not have UB(!). The usual solution is to require explicit widening casts, but that solution seems hard for Zig due to relying on a lot of non-power-of-two types in the core language.
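The order-dependence is easy to see with concrete values; here sketched in plain C with 32-bit ints, where signed overflow is likewise UB (in Zig the same issue shows up already at the i16 level, since there is no promotion to a wider type):

/* With b = INT_MAX, c = 1, d = -1:
 *   (b + c) + d   -- the first partial sum overflows: UB
 *   (b + d) + c   -- both partial sums stay in range: fine
 * So knowing the final value fits is not enough; the order of the partial
 * sums matters. The usual fix is an explicit widening cast: do the sum in
 * a wider type, then narrow once at the end.
 */
long long sum_widened(int b, int c, int d) {
    return (long long)b + c + d;   /* cannot overflow for any int inputs */
}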
I guess also it's a matter of how much you think C sucks :D
I like C, and C3 is basically just trying to tweak a few things C can't change due to legacy reasons. It doesn't try to be a new "let's write everything from scratch because C is bad" kind of language, if you hate C then C3 isn't for you.
The "inconvenient truth" is that C will never really be replaced, just augmented by "better C" languages, or at most wrapped away under language bindings. Even if Zig's goal is to replace C (which is an important motivation to really cover all use cases of C), I think that most non-trivial real-world projects will actually be mixed C/C++/Zig projects, at least for the next few decades.