I'm enjoying this so far, although a couple of things in the example code allocating a vector have made me scratch my head.
int create_vector(struct vector *vc, int size) {
    vc->data = 0;
    vc->size = 0;
    if (vc == NULL) {
        return VECTOR_NULL_ERROR;
    }
    /* check for integer and SIZE_MAX overflow */
    if (size == 0 || size > SIZE_MAX) {
        errno = ENOMEM;
        return VECTOR_SIZE_ERROR;
    }
Accessing the fields of vc before the NULL check seems like an error. Also, would it not be simpler to change type of the size parameter to size_t so that it's the same type as the size field of the vector struct?
Yes, by having the NULL check after those members have been accessed, you're telling the compiler that vc cannot be NULL (because accessing the members if vc is NULL would be undefined behavior). The compiler can (and GCC will) eliminate the NULL check completely.
huh :) have you actually tried doing that, i.e. calling the function with NULL params and seeing what happens? You will see a segfault.
Anyway, rather than rely on the vagaries of a compiler's -O2 implementation, I would rather be explicit about it and just assert invariants...
I believe you got my comment completely bass ackwards.
I'm not describing some optimization that you, the programmer, can do. I'm talking about what the compiler can (and will) do by assuming your code doesn't have undefined behavior.
In this particular case, the compiler recognizes that if the pointer is NULL, undefined behavior has already been invoked and it can therefore eliminate the check.
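A minimal sketch of the pattern being described (the struct and names here are illustrative, not the article's code):

struct vec { int size; };

int vec_size(struct vec *v) {
    int n = v->size;   /* dereference first: the compiler may now assume v != NULL */
    if (v == NULL)     /* so GCC's -fdelete-null-pointer-checks (on at -O2) may drop this branch */
        return -1;
    return n;
}

With the check deleted, passing NULL simply crashes on the dereference, exactly as if the check had never been written.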
> In this particular case, the compiler recognizes that if the pointer is NULL, undefined behavior has already been invoked and it can therefore eliminate the check.
> Accessing the fields of vc before the NULL check seems like an error. Also, would it not be simpler to change type of the size parameter to size_t so that it's the same type as the size field of the vector struct?
What should be done in case vc is NULL? Currently it almost certainly causes a segfault. I've seen people insist on returning an error code when a NULL argument is given, but that error code might go unchecked. A segfault is easy to debug; a forgotten error code is not.
int poke_foo(struct foo *foo) {
    // if you're writing a library, this is asking for trouble...
    if (foo == NULL)
        return -1; // eventually someone forgets to check for this
    foo->bar += 1;
    return 0;
}
The only sensible thing you can do is add an assert. That will stop the program when the error happens and make it easy to debug.
int poke_foo(struct foo *foo) {
    assert(foo != NULL); // asserting here...
    foo->bar += 1;       // ... is not much better than segfaulting here
    return 0;
}
But in the end, we're still reliably crashing the program if NULL is passed, and it is as easy to debug as a segfault would be. The assert may keep your static analysis tool (like Coverity) happy, but it won't really make a huge difference.
NULL pointers are generally not very difficult to debug; what is hard is dealing with free()'d pointers and other non-NULL invalid pointer values.
I'm on a weird embedded platform where segfaults are almost impossible to debug; the only way to get information out is a log facility over USB that loses the most recent lines of log when it segfaults and resets :(
Would an assert be any better? I guess you could print out a message to the log and enter an infinite loop if an assertion fails.
Similar problems may haunt you when working in kernel space.
But still, null pointers are easy compared to other invalid pointers that cannot be distinguished by their value.
Best practices in C are not as easy to establish as in other languages, because the code may run anywhere from a microcontroller to an embedded platform to kernel space to a modern user-space app. It's not so much about the language as about the environment the code runs in.
My actual solution is a combination of careful guesswork and a set of macros that write to a region of memory that is not zeroed at reboot. I use them to mark function entry and exit events; in lieu of an actual stack trace, this lets me progressively narrow down where the problem is.
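For illustration, a sketch of the kind of macros that might be meant here; the address, buffer size, and names are all assumptions and would depend entirely on the platform and linker script:

#include <stdio.h>
#include <string.h>

/* Assumed address/size of RAM that the reset does not clear. */
#define TRACE_BUF  ((char *)0x20010000)
#define TRACE_MAX  128

static void trace_mark(const char *func, const char *event) {
    char buf[TRACE_MAX];
    /* keep only the most recent mark; whatever is here survives the reset */
    snprintf(buf, sizeof buf, "%s %s", func, event);
    memcpy(TRACE_BUF, buf, sizeof buf);
}

#define TRACE_ENTER() trace_mark(__func__, "enter")
#define TRACE_EXIT()  trace_mark(__func__, "exit")

void handle_packet(void) {
    TRACE_ENTER();
    /* ... work that might crash ... */
    TRACE_EXIT();
}

After a reset you read the region back and see the last function that entered but never exited, which narrows down where the crash happened.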
You suggest using 'calloc' to avoid calculating the size of the allocation (which would be necessary with 'malloc'). This is fine, and you store the number of elements in the 'size' field. But later, in the grow function, you double 'size' and then use it as-is in the call to 'realloc', which expects a size in bytes, not elements.
That is because, to my knowledge, we don't have something like "crealloc" to grow allocated memory efficiently. A naive implementation of such a function would always copy memory from the old area to the new one, and an efficient one would be far more difficult to implement (thus more error-prone, IMHO).
Hi, my point is that you have a bug in the code. Your 'newsize' is the number of elements; when you call realloc (which expects a byte size) you have to take this into account and multiply by the element size, which the code as posted does not do.
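A sketch of the kind of fix being described; the struct layout, element type, and names here are guesses, not the article's actual code:

#include <stdlib.h>

struct vector { int *data; size_t size; };   /* assumed layout */

int grow_vector(struct vector *vc) {
    size_t newsize = vc->size * 2;
    /* realloc takes a byte count, so multiply the element count by the
       element size instead of passing newsize through directly */
    int *tmp = realloc(vc->data, newsize * sizeof *vc->data);
    if (tmp == NULL)
        return -1;           /* the old block is still valid on failure */
    vc->data = tmp;
    vc->size = newsize;
    return 0;
}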
On conventional platforms, sure, but there's nothing in the standard stopping int being larger than size_t. It would make plenty of sense on segmented or small-pointer architectures, like x32 (word size 64 bit, pointer size 32 bit).
> "What happens is that variable i is converted to unsigned integer." No: 'long i' is converted to 'unsigned long'.
Actually, unsigned long is an unsigned integer. He didn't write unsigned int.
> "Usually size_t corresponds with long of given architecture." No: For example, on Win64 size_t is 64 bits whereas long is 32 bits.
"Usually" is the keyword here. He could have said "Usually size_t has at least the same amount of bits as long" and it would be better related to the referred rule.
size is not given a type, but if it is a size_t, then the comparison against SIZE_MAX is always false, so the whole condition is true only when size == 0. The second part is useless, since the maximum value of a size_t is SIZE_MAX.
Worse, it is not even a reliable check against integer overflow. A sufficiently large expression could overflow so far that it becomes positive again!
This, in fact, was the source of a vulnerability in PHP [1]. (The way they 'resolved' it, uh, isn't much better.)
There is almost never a magic bullet for integer overflow. Programming secure systems requires thought.
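To make that concrete, a small sketch; signed overflow is undefined behaviour, so this only illustrates what commonly happens on wrapping two's complement implementations:

#include <stdio.h>

int main(void) {
    int a = 65536, b = 65537;
    int n = a * b;   /* mathematically 2^32 + 2^16; commonly wraps around to 65536 */
    if (n < 0)
        puts("overflow caught by the after-the-fact check");
    else
        puts("overflow missed: the result wrapped back to a positive value");
    return 0;
}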
It seems that I have totally messed up the whole vector example with last-minute changes. The check was meant to illustrate what eliteraspberrie described in the previous comment.
I'm not sure how malloc/free could be considered "advanced" C programming techniques. It's pretty hard to write any non-trivial program in C without using malloc/free and knowing the differences between static, stack-allocated and heap-allocated memory.
You'd be surprised with how far you can get without dynamic memory or recursion...
Still, I agree with your general sentiment. That some of these are "advanced" topics is troubling. Too many programmers today don't know how a computer works. Our ideas of "mastery" are way too low.
The challenging part is getting the details of using them correctly and avoiding security and portability issues, and I consider it a necessary introduction for the section dealing with memory management.
True, but the return type of free() is void, thus it says nothing about a return value. POSIX says: "If size is 0, either a null pointer or a unique pointer that can be successfully passed to free() shall be returned."
Depends on one's reference point. A lot of people in the web development world (for example) would probably consider manual memory management to be an advanced topic, because typically they've "grown up" with languages where memory management was so removed or abstracted as to be almost irrelevant beyond trivial "do I have enough RAM to hold this array?" type concerns.
I like the idea behind this article, but I'm a little skeptical of some of its advice.
For instance, I don't think it's a good idea to recommend using Boehm GC as a general solution to avoid memory management. It can't really tell the difference between pointers and pointer-sized integers, which means it usually leaks memory.
Integers matching valid pointers to allocated objects are rare; they are only common on 32-bit systems running near the limits of the virtual address space, which is a bad idea anyway.
Language runtimes that eventually move off Boehm, such as Mono, usually cite other reasons.
C is inherently not suitable for a GC. If you want automatic memory management, it's better to use something like Go. I'm not saying this is a bad GC library, just that I'm not sure I would recommend it for general use when writing C code.
This is not to troll, but Nimrod (which has been mentioned a few times here) is a language that looks mostly like Pascal or Python and compiles to C, which is then built either with a soft-realtime garbage collector or without one (it does dead-code elimination, GC included, if you are clever). If you are nuts you can opt to force the use of the Boehm GC with it, but do not expect great things from it. There is also Rust; I am sure if you spend time on HN you have heard a lot about the latter.
There are (were) a couple of bugs in the example code. Here are some guidelines that will help avoid those, and most problems with integer operations and the heap in general.
First, don't mix unsigned and signed types in arithmetic; and always prefer the size_t type for variables representing the size of an object.
Second, check for overflow before an operation, not after, like so:
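A minimal sketch of that kind of check; the function name and parameters are illustrative choices, not necessarily the original snippet:

#include <stdint.h>
#include <stdlib.h>

/* Allocate an array of nmemb elements of 'size' bytes each, rejecting the
   request up front if nmemb * size would overflow a size_t. */
void *alloc_array(size_t nmemb, size_t size) {
    if (size != 0 && nmemb > SIZE_MAX / size)
        return NULL;                 /* would overflow: refuse before multiplying */
    return malloc(nmemb * size);
}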
Boehm is bad advice in general... there are situations where it could work, maybe, like if you implement an interpreter for the fun of it or something like that.
For many kinds of programs, reference counting is the way to go in C; it is still manual, but an order of magnitude safer...
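A minimal sketch of what that looks like in plain C (the struct and names are illustrative):

#include <stdlib.h>

struct buffer {
    size_t refcount;
    char  *bytes;
};

struct buffer *buffer_new(size_t len) {
    struct buffer *b = malloc(sizeof *b);
    if (b == NULL) return NULL;
    b->bytes = malloc(len);
    if (b->bytes == NULL) { free(b); return NULL; }
    b->refcount = 1;                 /* the creator owns one reference */
    return b;
}

void buffer_retain(struct buffer *b)  { b->refcount++; }

void buffer_release(struct buffer *b) {
    if (b != NULL && --b->refcount == 0) {   /* last owner frees */
        free(b->bytes);
        free(b);
    }
}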
The article says two different things about how free() works:
1. In the case of a NULL pointer, free takes no action.
2. In the "double free corruption" section, it says "Could be caused by calling free with pointer, which is [..] NULL pointer".
So which is it? If the first is right, there's no point in the NULL/assert() dance; you can freely free() with impunity.
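For what it's worth, the C standard is unambiguous on the first point: free(NULL) is required to do nothing. A minimal illustration:

#include <stdlib.h>

int main(void) {
    char *p = malloc(64);

    free(p);       /* fine even if malloc failed and p is NULL */
    p = NULL;      /* nulling the pointer turns a would-be double free... */
    free(p);       /* ...into another harmless no-op */
    return 0;
}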
Is the phrase "locator value" defined by the C standard? I've never heard of it before. I always knew the 'l' in "lvalue" as meaning "this expression can appear on the left-hand side of an assignment operation."
EDIT: Found the answer (thanks, draft C11 standard.)
Section 6.3.2.1 contains the definition of lvalue, and it does not use the phrase "locator value" at all. However if you want to use it as a reminder that modifiable lvalues can be assigned to, then more power to ya :-)
"Check for NULL at beginning of function or blocks which are dereferencing pointers to dynamically allocated memory"
I really strongly disagree with this. Better for it to crash if you expect this memory to always have been allocated. This way you can fix your bug instead of putting your app into some potentially unexpected state... if allocations fail it doesn't make sense to just 'carry on anyway' in many situations.
I disagree. Unless you know with absolute certainty that there is no cleanup required, it is not a good idea to simply crash and leave resources in an inconsistent state.
If there is cleanup to do, you need a way for the cleanup code to dispose of the resources and leave the system in a consistent state before crashing.
OK, it's obviously good practice to develop your software in a way that a random outage doesn't leave anything in an inconsistent state, but a lot of software fails to do this. At the very least, you may end up with something like a server that crashes and cannot restart because the port is still considered in use by the OS.
If you are writing software which has to have some kind of guaranteed reliability, then maybe there is a sort of argument for this. But unless you are handling the failure case in a deterministic and well-behaved way, I still think that is a much worse state to be in than a crash, even in production. In either case you are putting the system in a potentially weird or unrecoverable state; the difference with a hard crash is that you know straight away and it's easier to debug, even after it's shipped.
Imagine the customer error report, or imagine it's a critical system and it starts making mistakes...
For most software, heavy smoke testing is enough to make it super rock solid. I know that most software does not do this: just using a web browser or a smartphone makes it painfully obvious that even the big software houses have some seriously shoddy practices and that testing gets seriously neglected (it may be that it's impractically big... I struggle to buy that, tbh).
The sum() example does not work as the article says it does. Under Visual Studio 2013 compiling for x86 (a C++ compiler, but C++ also has integer promotion rules), the function returns zero. The reason: 65535 = 2^16 - 1, and int16_t is signed, so it has only 15 bits to represent the value. When executing "int16_t a = 65535;" under a debugger, "a" is set to -1.
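A tiny stand-alone check of that conversion (the article's sum() itself isn't reproduced here; the result of the out-of-range assignment is implementation-defined, -1 being what common two's complement compilers produce):

#include <stdint.h>
#include <stdio.h>

int main(void) {
    int16_t a = 65535;    /* out of range for int16_t; the conversion is
                             implementation-defined, commonly yielding -1 */
    printf("%d\n", a);    /* typically prints -1 */
    return 0;
}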
Thanks. I tested with GCC and Clang on Linux on 64-bit x86, GCC on OpenBSD on 32-bit x86, and GCC on Linux on a PowerPC 603. The point I was aiming for is that the described operation is indeed undefined.
What about that funky brace style? Code with wrong braces or whitespace mess-ups indicates a lack of care. If the coder can't put a { in the right spot, what else has he screwed up?