It still blows my mind that so many devices depend on a library largely written by a single person. The level of pressure must be insane, and it shows in this bit:
"Reading the code now it is impossible not to see the bug. Yes, it truly aches having to accept the fact that I did this mistake without noticing and that the flaw then remained undiscovered in code for 1315 days. I apologize. I am but a human."
If Daniel happens to read this, thanks for your hard work, and I really don't think any apology is necessary. After all, the source was right there for any of us to read and review.
Reminds me of how, for a while, OpenSSL was being developed and maintained by just two guys called Steve [0]. I think they've at least upgraded to a team of seven now.
i really feel that the first sentence in that quote is an apt way to sum up a part of how learning and attention work for us as humans.
personally i've spent some time thinking about how, in my life, it is very easy to pay attention as hard as i can manage and still miss so many things. i feel like things like code review are a great way to train this (attention, global and contextual awareness, etc.). for example, there is a place i've been walking by regularly for a couple of years where someone maintains a nice flowerbed, and for about a year, since i first noticed it, i've paused to take a glance at it. but a couple of months ago i noticed that there is a companion flowerbed a few feet away that i had just never seen because i was so focused on the first one. was that companion flowerbed there all along? i suspect that it was! but i have no idea, because i didn't even know it existed.
i have tried to extrapolate this to knowledge across humanity in general. i suspect sometimes it takes just one person to notice something, to make an observation that informs their behavior, and it begins to disseminate to us all. sometimes it takes several tries and a long time. and i wonder how much low-hanging fruit is out there that none of us have noticed yet, and how much adopting a simple, fundamental set of best practices, already established by others who have been paying attention, could raise the baseline for us all.
and, to conclude, i too have a lot of gratitude for stenberg and all the folks who've worked on curl and the many other parts of our internet infrastructure that i take for granted every day.
I have found single-dev projects to _always_ have a wonderful contributor experience, with company or vendor ones often being quite poor. You can really see the passion when they put in the effort to work together on improving the project.
While there are likely sustainability issues, I guess we will continue to have these projects, and hats off to everyone like Daniel working not just for themselves but also for their community.
Very sensible conclusions about memory safety/memory-safe languages; excerpts follow.
---
Yes, this family of flaws would have been impossible if curl had been written in a memory-safe language instead of C [...]
The only approach in that direction I consider viable and sensible is to:
- allow, use and support more dependencies written in memory-safe languages and
- potentially and gradually replace parts of curl piecemeal, like with the introduction of hyper.
Such development is however currently happening in a near glacial speed and shows with painful clarity the challenges involved. curl will remain written in C for the foreseeable future.
Everyone not happy about this are of course welcome to roll up their sleeves and get working.
Including the latest two CVEs reported for curl 8.4.0, the accumulated total says that 41% of the security vulnerabilities ever found in curl would likely not have happened should we have used a memory-safe language. But also: the rust language was not even a possibility for practical use for this purpose during the time in which we introduced maybe the first 80% of the C related problems.
[...]
We repeatedly run several static code analyzers on the code and none of them have spotted any problems in this function.
In the last week, new fuzzers and tests also appear to have been added. At this point, I think any C project that isn't safety critical would do well to mimic this pipeline.
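For anyone wanting to mimic it, the entry point for a libFuzzer-style fuzzer is tiny. A minimal sketch, where parse_socks5_reply is a hypothetical stand-in for whatever function you want to exercise:

    /* Minimal libFuzzer harness sketch. parse_socks5_reply() is a
     * hypothetical stand-in for the parser under test. */
    #include <stdint.h>
    #include <stddef.h>

    int parse_socks5_reply(const uint8_t *buf, size_t len); /* hypothetical */

    int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
      parse_socks5_reply(data, size); /* crashes and ASan reports become findings */
      return 0;
    }

Build with clang -fsanitize=fuzzer,address and the crash reports come essentially for free.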
Replacing parts of curl with Rust will not be possible. Most places want to build from source and don’t want to introduce a new compiler to the tool chain. And Rust supports a tiny subset of possible C targets.
What are those "most places" that "want to build from source"? It's an honest question, and I intentionally left out the "don’t want to introduce a new compiler to the tool chain" statement to make the set larger. I understand "most" as larger than 50% on some metric.
I know about Gentoo and other Linux distributions that do build from source, but it has been very uncommon for me to see a company building things from source instead of using apt, rpm, or Docker in this century. Using packages is at least two orders of magnitude faster. I remember how in the 90s I had to download the source, scan the README for the dependencies (recursively so), then build the libraries, then eventually build the program I had downloaded in the first place.
Security-wise, we were not reading the code, except for NetHack.
The places that want to build from source are the creators of embedded devices that use curl. These devices include (but are not limited to) wifi access points, routers, home automation devices, data loggers, IP video cameras, smart TVs, set top boxes, and streaming devices. There are a lot of these devices, and there are more all the time.
Even if the engineering teams do not build every dependency every time they build firmware (like busybox), they will at some point build all of the components of their system themselves (as supported by Yocto).
Anybody using the standard embedded build solutions like Buildroot or Yocto will get Rust support without doing anything. It's already there and implemented.
> These devices include (but are not limited to) wifi access points, routers, home automation devices, data loggers, IP video cameras, smart TVs, set top boxes, and streaming devices
Access points/routers/smart TVs/set top boxes are typically based on ARM/Linux/Android, and should be supported.
People often point this out, silently implying that Rust is not really supported or usable on the lower tiers and that those platforms should therefore stick to C. But that ignores that the situation is the same with gcc/clang: niche platforms get much less testing, and very niche ones might have bitrotted without anyone noticing. Gcc doesn't publish or adhere to a tiered platform list, but if it used rustc's definitions it would be at most tier 2 (because tier 1 requires distributing official binaries and blocks any tier-1-breaking change from being merged).
If I'm using a device on one of the lower tiers, it's fair to assume that the C compiler for it has been exercised quite a lot, much more than the shiny new Rust compiler that I'm introducing and that I have no reason to think anybody else has ever used on this platform.
Why would you think that nobody has ever used Rust there? Somebody has obviously put in the work to support that platform; rustc doesn't just blindly inherit the list of target triples from LLVM.
While it's safe to assume that C gets a decent amount of use on every platform, you can't expect all platforms to be as well-supported as the major ones. Undoubtedly, some of those platforms would be listed as low-tier if the C compilers cared to maintain a platform tier list. But a platform being low-tier doesn't mean you shouldn't use that compiler there.
As for trusting Rust or C on niche platforms: C is so full of UB, platform-specific choices, and vendor extensions that it's hard to ever fully know how well this or that project will work. Rust is much less surprising; if it works at all, I expect it to fully work. I'd definitely pick Rust on niche platforms if I have the choice.
There actually isn't that much work in adding basic support for a new platform target to Rust. You're mostly banking on LLVM's ability to target it, and where LLVM can, clang already gives you the C compiler maturity you're after.
Yeah, I admit that maybe this isn't "most places," since there are almost certainly more shops running various versions of Linux servers in production than there are embedded Linux development teams.
> Replacing parts of curl with Rust will not be possible.
It's not just possible, it's been done. You can compile curl with rustls, you could for a time compile it with quiche, and work is ongoing to compile it with hyper. Curl is remarkably modular; none of those are mandatory.
> And Rust supports a tiny subset of possible C targets.
Gross overstatement. Rust supports the vast majority of devices that people buy today. Even if you ignore platform popularity and just count platforms supported by gcc versus rustc/llvm, there's only a handful missing from the latter. And if you're talking about vendor-specific compilers, a lot of them don't support modern C or C++ either.
I know almost nothing about the Rust ecosystem but am interested. What Rust packages could completely replace Curl? The top result on Google suggested hyper and reqwest.
You'd need a few different ones, just like curl itself uses a lot of 3rd-party libraries to provide its full feature set.
Some likely first choices would be hyper (HTTP client/server), rustls (TLS), and tokio (async runtime). The Rust ecosystem is quite rich in protocols and codecs, so it shouldn't be too hard to find most (all?) of the crates you need, but there's still work needed to bring them together into one curl-like tool.
Note that Rust crates tend to be more focused than what you're used to in C, made to be composed together rather than used as a one-stop lib. So your dependency tree would look much bigger than curl's.
Curl supports a lot of protocols, including surprising ones like LDAP, SMTP and POP3, so there is no exact curl replacement anywhere, but Rust already has libraries for every protocol that Curl supports.
Hyper is a pretty robust HTTP toolkit, and reqwest is a higher-level library on top of it.
"Most places want to build from source" is not something he considered; there is a brief "for users against the C implementation on the platforms you care about".
Should the majority of us continue to bear the risks of memory unsafe code because of a small minority that uses exotic architectures or doesn't want to install a new compiler?
You need to have someone, or a group of people, willing to write the code to implement a replacement that is fully compatible, and maintain it. Until then, curl written in C is what we have.
With that, curl can be used on the exotic architectures and the rust/other language version can be used on other platforms.
You need to have someone, or a group of people, willing to write the code to implement support for modern tooling on their platform. Until then, old harder-to-secure codebases is what these platforms have.
I know reversing the burden of implementation seems flippant, but it's pragmatic. At some stage, it's less community-wide work to support Rust on a new platform than to spend extra time maintaining and securing dozens of C codebases. Curl may not be making its Rust components mandatory anytime soon, but other projects like Python's cryptography package already have, to say nothing of projects written in Rust to begin with.
rustc_codegen_gcc is pretty close to ready; let's focus on getting it out the door, and on getting more target triples supported by rustc and LLVM.
"Most" is a very heavy claim. Any concrete evidence of this?
Also, in embedded Linux I guess we are technically "building from source" when using a build system like Buildroot or Yocto. But Rust is already supported there, and in fact already in use in our project (due to python3-cryptography).
Just to be clear: since this hypothetical library exposes a C ABI, pre-built binaries wouldn't be an issue if curl were somehow magically reimplemented in Rust. Someone would have to compile it, but that's the same situation as for anyone who doesn't want to build the C implementation from source.
Not wanting to introduce a new compiler amounts to not wanting to introduce a new language. If you don't like the choice the package maintainer makes, roll up your sleeves.
That said, we could mitigate it by using mrustc to generate C. Does anyone keep a list of all the targets libcurl is getting built for?
This is a great writeup. However, even after also reading the CVE, I'm still unsure under which circumstances you would be affected. As far as I understand it, you are affected if you
* use a SOCKS5 proxy, AND
* you use the SOCKS5 proxy to resolve hostnames (which AFAICS is NOT the default), AND
* the size of the buffer was changed from the default (100kB) to something below 65541 bytes, AND (EDIT: not quite correct: 'libcurl' reuses the download buffer for this, which is 16kB by default; however, 'curl' itself is said to set it to 100kB UNLESS you use --limit-rate)
* the SOCKS5 proxy is too slow to handle the request immediately (which however, as the CVE states, can usually be provoked if the attacker has control over the request rate). (EDIT: This is wrong, the CVE actually says "Typical server latency is likely "slow" enough to trigger this bug without an attacker needing to influence it by DoS or SOCKS server control.")
So to me, the attack vector seems very small. Am I missing something?
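To make the "slow proxy" condition concrete, here is a deliberately simplified sketch of the flaw pattern the advisory describes, NOT curl's actual code: a "resolve locally" decision living in a local variable that is recomputed on every entry into a non-blocking state machine, and therefore lost when a slow proxy forces a second call that resumes in a later state:

    /* Simplified sketch of the bug pattern, not curl's actual code. */
    #include <string.h>

    enum socks_state { ST_INIT, ST_REQUEST };

    static void handshake_step(enum socks_state *st, const char *host,
                               size_t hostlen, unsigned char *buf,
                               size_t buflen) {
      int resolve_local = 0;      /* reset on EVERY call: the bug */

      if (*st == ST_INIT) {
        if (hostlen > 255)
          resolve_local = 1;      /* name too long for the SOCKS5 protocol */
        *st = ST_REQUEST;
        return;                   /* slow proxy: return and come back later */
      }

      if (*st == ST_REQUEST && !resolve_local) {
        /* On re-entry resolve_local is 0 again, so an over-long host
         * name reaches the copy below; if hostlen plus the protocol
         * header exceeds buflen, the memcpy overflows the heap buffer. */
        (void)buflen;             /* the flawed path never consulted it */
        memcpy(buf + 5, host, hostlen);
      }
    }

In curl's real code the fast path completes the handshake in one go with the flag still set; only the re-entry path forgets it, which is why latency is part of the trigger conditions.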
I'm confused why plain 'curl' would crash here: the CVE says that
"The target buffer is the heap-based download buffer in libcurl that is reused for SOCKS negotiation before the transfer has started. The size of the buffer is 16kB by default, but can be set to different sizes by the application. The curl tool sets it to 102400 bytes by default - but it sets the buffer size to a smaller size if --limit-rate is set lower than 102400 bytes per second."
I thought with a buffer of 100kB, this bug wouldn't trigger?
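For what it's worth, the 16kB default applies to applications using libcurl directly; in API terms an affected setup would look roughly like this (a sketch; the proxy address and URL are made up):

    /* Sketch of a libcurl setup matching the affected conditions;
     * the proxy address and URL are hypothetical. */
    #include <curl/curl.h>

    int main(void) {
      CURL *h = curl_easy_init();
      if (!h)
        return 1;
      /* socks5h:// asks the proxy to resolve host names (not the default) */
      curl_easy_setopt(h, CURLOPT_PROXY, "socks5h://127.0.0.1:1080");
      /* the reused download buffer: 16kB by default in libcurl;
       * anything below ~64kB is in scope for this bug */
      curl_easy_setopt(h, CURLOPT_BUFFERSIZE, 16384L);
      curl_easy_setopt(h, CURLOPT_URL, "https://example.com/");
      CURLcode rc = curl_easy_perform(h);
      curl_easy_cleanup(h);
      return rc != CURLE_OK;
    }

So if I read it right, the plain tool stays safe at 100kB unless --limit-rate shrinks the buffer below the threshold.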
I guess this is mostly relevant for software that runs on shared infra, sends requests to a URL provided by an attacker (e.g. webhooks), and uses a SOCKS5 proxy?
I wrote a series of shell scripts called docker-proxy that creates SOCKS5 tunnels to Docker containers running openconnect VPNs, useful when you're working with multiple customers whose different VPNs each want all traffic on your machine forwarded through them. https://github.com/carlosonunez/docker-proxy
Since DNS resolution is meant to occur remotely, this CVE would be directly applicable here.
I have to say this is the best CVE writeup I've ever read. And I can only commend the author, who is humble throughout despite having authored one of the cornerstone software products of the age. Much respect.
> I think it was downright wrong to switch mode like this, since the user asked for remote resolve curl should stick to that or fail. It is not even likely to work to just switch, even in “good” situations.
> Yes, this family of flaws would have been impossible if curl had been written in a memory-safe language instead of C
It could have been impossible, but you never know if there is a way to escape language VM barriers. However, the author clearly ignored the DNS hostname limit stated in RFC1123 [1], which is hardcoded even in Java libraries.
It's covered in the section titled "host name length":
> A host name in a URL has no real size limit, but libcurl’s URL parser refuses to accept names longer than 65535 bytes. DNS only accepts host names up to 253 bytes. So, a legitimate name that is longer than 253 bytes is unusual. A real name that is longer than 1024 is virtually unheard of.
DNS is not the only mechanism for resolving host name to address, even if it's what's used 99.9% of the time today.
Legitimate requirements for host names longer than that have to be vanishingly rare. I dare anybody to come up with a single anecdote from real-world use.
> The DNS defines domain name syntax very generally -- a string of labels each containing up to 63 8-bit octets, separated by dots, and with a maximum total of 255 octets.
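To make those limits concrete, a validator for the textual form is only a few lines. A sketch (a hypothetical helper, not curl's parser) that enforces at most 63 octets per label and 253 octets total, rejecting empty labels, and thus trailing dots, for simplicity:

    /* Hypothetical validator for the RFC 1035/1123 limits quoted
     * above: labels of at most 63 octets, at most 253 octets total
     * in textual form. */
    #include <string.h>

    static int dns_name_ok(const char *name) {
      size_t total = strlen(name);
      if (total == 0 || total > 253)
        return 0;
      size_t label = 0;
      for (size_t i = 0; i <= total; i++) {
        if (i == total || name[i] == '.') {
          if (label == 0 || label > 63)
            return 0;   /* empty or over-long label */
          label = 0;
        }
        else
          label++;
      }
      return 1;
    }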
A memory-safe language does not imply the use of a VM.
It can just as well mean that the compiler attempts to prove that memory is only ever accessed in bounds, with well-defined ownership, and within the lifetime of the underlying object. Which is what Rust does, for example.
Of course it can also mean that the compiler additionally inserts internal failure checks and safeguards at critical places (Rust does not do this, but it would be nice to have in systems where one might worry about in-register bit flips, e.g. high-radiation environments like X-ray scanners, i.e. faults not caught by, say, ECC memory).
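For illustration, a hand-rolled version of that kind of bit-flip safeguard might look like this in C (a sketch of the general idea, not something any compiler emits today): store a redundant complement next to a critical value and verify it before use.

    /* Sketch: detect a single bit flip in a critical flag by keeping
     * its one's complement alongside and checking before every use. */
    #include <stdint.h>
    #include <stdlib.h>

    struct guarded_flag { uint32_t v; uint32_t not_v; };

    static void set_flag(struct guarded_flag *f, uint32_t v) {
      f->v = v;
      f->not_v = ~v;
    }

    static uint32_t get_flag(const struct guarded_flag *f) {
      if (f->v != ~f->not_v)
        abort();        /* corruption detected: fail fast */
      return f->v;
    }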
... and that is installed on billions of devices. I can imagine you quickly get very specific conditions here and there when something is deployed at that scale.
This is a long and interesting note which could be reduced to "this is why system libraries need to be either written in a safe language or proved correct".
When one of the best, friendliest, and most transparent C programmers of our time is writing these posts, we need to pay attention.
> When one of the best, friendliest, and most transparent C programmers of our time is writing these posts, we need to pay attention.
Given that said programmer wrote,
> Such development is however currently happening in a near glacial speed and shows with painful clarity the challenges involved. curl will remain written in C for the foreseeable future.
I'm curious what conclusion you'd like us to draw.
I didn't post it as an angry response; I just knew what kind of comments this would receive. Not sure why it got flagged: I wasn't saying this was some sort of excuse or that it should be rewritten, I just found it interesting that he included it. I'm sure he gets that comment a lot.
Come on, it starts out with a link to the CVE and a remark "While the advisory contains all the necessary details. I figured I would use a few additional words..."
If you just want the gist of it, the article does everything it can to send you there.
"Reading the code now it is impossible not to see the bug. Yes, it truly aches having to accept the fact that I did this mistake without noticing and that the flaw then remained undiscovered in code for 1315 days. I apologize. I am but a human."
If Daniel happens to read this, thanks for your hard work, and I really don't think any apology is necessary. After all, the source was right there for any of us to read and review.