Not doing trivial overflow checking before arithmetic.
Using results from said arithmetic as indices or starting points for memory writes.
My gut says that any application using this library to process untrusted input is exploitable.
Of course, it was never advertised as being secure :-)
Which is unusual, as many "better strings" libraries make claims about the security of the traditional way of string handling (and then go on to do it wrong).
Edit: To give a concrete example, someone here recommended BString just a few days ago. That library has a "security statement" in which it claims to prevent overflow type attacks by using signed integers instead of size_t, and then checking whether the result is negative after arithmetic. But signed overflow is undefined behavior in C, and these are not guaranteed to wrap around. And yes such things have been exploited. I don't know how likely or easy bstring is to "own" but in any case it's doinnit wrong.
And yes I checked the code, it does what it claims to do..
Hello clarry, I don't think there are APIs that if not grossly misused will lead to security issues in general. There is one issue I'm aware of that perhaps is not easily exploitable but surely is unexpected behavior, that is, overflow when you reach a 2GB string: that requires a check in sdsMakeRoomFor() for sure.
Note that this could be fixed by using uint64_t type for example in the header, however in the original incarnation of SDS inside Redis this was not possible for memory usage concerns. In the standalone library I believe it is instead a great idea, since 2GB is no longer an acceptable limit.
From the point of view of the security the most concerning function looks like sdsrange(), there are sanity checks in place to avoid that indexes received from the outside can lead to issues, but I'll do a security review in the future in order to formally check for issues.
If it's any reassurance, I had a look at it, and although there is unnecessary signed/unsigned arithmetic, I could not find anything exploitable. Change the types of variables representing an object size, length, array index, and so on to size_t (from int) and it will be fine.
Might be acceptable for an individual project where you can make statements about the sizes that will be used ahead of time, but for a general purpose library your statement is absolutely false. The issue is that your size requirement can overflow size_t, malloc gets passed a smaller value than you really intended, then if you try to use what you think you allocated (remember, it succeeded with a smaller size), it's a derefence outside the bounds of the allocation. Changing the size or type of the length variable won't solve that. This is a common and well known vulnerability pattern - it is disappointing to see folks so dismissive of it.
I'm aware of arithmetic overflow, thanks. It was the subject of my research (which will eventually be open source, watch www.peripetylabs.com/publications/ if you're interested). See my sizeop library in the meantime: https://bitbucket.org/eliteraspberries/sizeop
I am not dismissive of the problem. I just know from experience that the chances of code like that being fixed at all, let alone correctly, is near zero.
I think when doing malloc(n * m) or similar the most cautious thing to do would be to check for overflow even if you don't think it's exploitable. Especially for a library. Witness for example that OpenBSD's calloc does an overflow check.
I often leave this out of my own code, but not without feeling somewhat guilty about it.
In general, just because you cannot exploit something doesn't mean nobody else can. What to you seems at worst "merely" unexpected behavior is a very good starting point for someone who actually writes exploits.
Consider what happens if an attacker controls initlen passed to sdsnewlen() (also via sdsnew() if he controls the initial string). He can overflow the arithmetic on malloc/calloc. Now either the memcpy or the NUL-byte assignment can write out of bounds, which is potentially exploitable, especially if the attacker has a nearly unbounded number of attempts (which they often do as far as online-facing services go) or if they can do things to affect the program's memory layout -- which they likely can if they're feeding the program with data.
But if these two aren't enough for a successful exploit (I wouldn't make any bets!), the initlen is assigned to sh->len (whoops, conversion to signed... and possible overflow), which then can completely throw off the arithmetic in sdsMakeRoomFor. Here I would actually be quite willing to bet an attacker can arrange these numbers so that he can do a more controlled out-of-bounds write in one of the functions that rely on sdsMakeRoomFor.
These two functions aren't necessarily the only problematic ones; they're just the ones I noticed after a minute of scanning.
And I don't really agree with what eliteraspberrie said; using the right types is a good start but it's absolutely not fine until after you actually do all the overflow checking; for an example of this, see the OpenBSD man page for malloc: http://www.openbsd.org/cgi-bin/man.cgi?query=malloc
Rule of thumb: whenever you do any arithmetic, ask yourself, can it overflow (how do you know it doesn't?) and who's in charge of making sure it doesn't? In this case it's your API's responsibility. This is exactly the same thing you do whenever you call a function: you need to know if it can fail, and if it can, you need to check the return value and do the right thing.
In the years I've spent reading CVEs (and the broken code that caused the alert), proof-of-concept exploits as well as real world exploits, I've learned that some incredibly subtle and seemingly insignificant things can be exploited. As a general rule, don't worry about exploitability, it's usually not worth your time to prove one way or another. Just fix the code whenever you see something that could go wrong. Make it easy for the next guy who audits your code; so he too can tell that your code is secure, just by seeing the right checks in the right place.
It's a good read for anyone doing C these days, and covers a good deal of problems, including the one discussed here. It comes with snippets of known vulnerable real world code too.
Not doing trivial overflow checking before arithmetic.
Using results from said arithmetic as indices or starting points for memory writes.
My gut says that any application using this library to process untrusted input is exploitable.
Of course, it was never advertised as being secure :-)
Which is unusual, as many "better strings" libraries make claims about the security of the traditional way of string handling (and then go on to do it wrong).
Edit: To give a concrete example, someone here recommended BString just a few days ago. That library has a "security statement" in which it claims to prevent overflow type attacks by using signed integers instead of size_t, and then checking whether the result is negative after arithmetic. But signed overflow is undefined behavior in C, and these are not guaranteed to wrap around. And yes such things have been exploited. I don't know how likely or easy bstring is to "own" but in any case it's doinnit wrong.
And yes I checked the code, it does what it claims to do..