It still blows my mind that so many devices depend on a library largely written by a single person. The level of pressure must be insane, and it shows in this bit:
"Reading the code now it is impossible not to see the bug. Yes, it truly aches having to accept the fact that I did this mistake without noticing and that the flaw then remained undiscovered in code for 1315 days. I apologize. I am but a human."
If Daniel happens to read this, thanks for your hard work, and I really don't think any apology is necessary. After all, the source was right there for any of us to read and review.
Reminds me of how, for a while, OpenSSL was being developed and maintained by just two guys called Steve [0]. I think they've at least upgraded to a team of seven now.
i really feel that the first sentence in that quote is an apt way to sum up a part of how learning and attention work for us as humans.
personally i've spent some time thinking about how, in my life, it is very easy to pay attention as hard as i can manage and still miss so many things. i feel like things like code review are a great way to train this (attention, global and contextual awareness, etc.). for example, there is a place i've been walking by regularly for a couple of years where someone maintains a nice flowerbed, and for about a year, since i first noticed it, i've paused to take a glance at it. but a couple of months ago i noticed that there is a companion flowerbed a few feet away that i had just never seen because i was so focused on the first one. was that companion flowerbed there all along? i suspect that it was! but i have no idea, because i didn't even know it existed.
i have tried to extrapolate this to knowledge across humanity in general. i suspect sometimes it takes just one person to notice something, to make an observation that informs their behavior, and it begins to disseminate to us all. sometimes it takes several tries and a long time. and i wonder how much low-hanging fruit is out there that none of us have noticed yet, and how much adopting a simple, fundamental set of best practices, already established by others who have been paying attention, could raise the baseline for us all.
and, to conclude, i too have a lot of gratitude for stenberg and all the folks who've worked on curl and the many other parts of our internet infrastructure that i take for granted every day.
I have found single-dev projects to _always_ have a wonderful contributor experience, with company or vendor ones often being quite poor. You can really see the passion when they put in the effort to work together on improving the project.
While there are likely sustainability issues, I guess we will continue to have these projects, and hats off to everyone like Daniel working not just for themselves but also for their community.
Very sensible conclusions about memory safety/memory-safe languages; excerpts follow.
---
Yes, this family of flaws would have been impossible if curl had been written in a memory-safe language instead of C [...]
The only approach in that direction I consider viable and sensible is to:
- allow, use and support more dependencies written in memory-safe languages and
- potentially and gradually replace parts of curl piecemeal, like with the introduction of hyper.
Such development is however currently happening in a near glacial speed and shows with painful clarity the challenges involved. curl will remain written in C for the foreseeable future.
Everyone not happy about this are of course welcome to roll up their sleeves and get working.
Including the latest two CVEs reported for curl 8.4.0, the accumulated total says that 41% of the security vulnerabilities ever found in curl would likely not have happened should we have used a memory-safe language. But also: the rust language was not even a possibility for practical use for this purpose during the time in which we introduced maybe the first 80% of the C related problems.
[...]
We repeatedly run several static code analyzers on the code and none of them have spotted any problems in this function.
In the last week, new fuzzers and tests also appear to have been added. At this point, I think any C project that isn't safety critical would do well to mimic this pipeline.
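For anyone wanting to mimic it, the entry point for a libFuzzer-style fuzzer is tiny. A minimal sketch, where parse_socks5_reply is a hypothetical stand-in for whatever function you want to exercise:

    /* Minimal libFuzzer harness sketch. parse_socks5_reply() is a
     * hypothetical stand-in for the parser under test. */
    #include <stdint.h>
    #include <stddef.h>

    int parse_socks5_reply(const uint8_t *buf, size_t len); /* hypothetical */

    int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
      parse_socks5_reply(data, size); /* crashes and ASan reports become findings */
      return 0;
    }

Build with clang -fsanitize=fuzzer,address and the crash reports come essentially for free.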
Replacing parts of curl with Rust will not be possible. Most places want to build from source and don’t want to introduce a new compiler to the tool chain. And Rust supports a tiny subset of possible C targets.
What are those "most places" that "want to build from source"? It's an honest question, and I intentionally left out the "don’t want to introduce a new compiler to the tool chain" statement to make the set larger. I understand "most" as larger than 50% on some metric.
I know about Gentoo and other Linux distributions that do build from source, but it has been very uncommon for me to see a company building things from source instead of using apt, rpm, or Docker in this century. Using packages is at least two orders of magnitude faster. I remember how in the 90s I had to download the source, scan the README for the dependencies (recursively so), then build the libraries, then eventually build the program I had downloaded in the first place.
Security-wise, we were not reading the code, except for NetHack.
The places that want to build from source are the creators of embedded devices that use curl. These devices include (but are not limited to) wifi access points, routers, home automation devices, data loggers, IP video cameras, smart TVs, set top boxes, and streaming devices. There are a lot of these devices, and there are more all the time.
Even if the engineering teams do not build every dependency every time they build firmware (like busybox), they will at some point build all of the components of their system themselves (as supported by Yocto).
Anybody using the standard embedded build solutions like Buildroot or Yocto will get Rust support without doing anything. It's already there and implemented.
> These devices include (but are not limited to) wifi access points, routers, home automation devices, data loggers, IP video cameras, smart TVs, set top boxes, and streaming devices
Access points/routers/smart TVs/set top boxes are typically based on ARM/Linux/Android, and should be supported.
People often point this out, silently implying that Rust is not really supported or usable on the lower tiers and that those platforms should therefore stick to C. But that ignores that the situation is the same with gcc/clang: niche platforms get much less testing, and very niche ones might have bitrotted without anyone noticing. Gcc doesn't publish or adhere to a tiered platform list, but if it used rustc's definitions it would be at most tier 2 (because tier 1 requires distributing official binaries and blocks any tier-1-breaking change from being merged).
If I'm using a device on one of the lower tiers, it's fair to assume that the C compiler for it has been exercised quite a lot, much more than the shiny new Rust compiler that I'm introducing and that I have no reason to think anybody else has ever used on this platform.
Why would you think that nobody has ever used Rust there? Somebody has obviously put in the work to support that platform; rustc doesn't just blindly inherit the list of target triples from LLVM.
While it's safe to assume that C gets a decent amount of use on every platform, you can't expect all platforms to be as well-supported as the major ones. Undoubtedly, some of those platforms would be listed as low-tier if the C compilers cared to maintain a platform tier list. But a platform being low-tier doesn't mean you shouldn't use that compiler there.
As for trusting Rust or C on niche platforms: C is so full of UB, platform-specific choices, and vendor extensions that it's hard to ever fully know how well this or that project will work. Rust is much less surprising; if it works at all, I expect it to fully work. I'd definitely pick Rust on niche platforms if I have the choice.
There actually isn't that much work in adding basic support for a new platform target to Rust. You're mostly banking on LLVM's ability to target it, and where LLVM can, clang already gives you the C compiler maturity you're after.
Yeah, I admit that maybe this isn't "most places," since there are almost certainly more shops running various versions of Linux servers in production than there are embedded Linux development teams.
> Replacing parts of curl with Rust will not be possible.
It's not just possible, it's been done. You can compile curl with rustls, you could for a time compile it with quiche, and work is ongoing to compile it with hyper. Curl is remarkably modular; none of those are mandatory.
> And Rust supports a tiny subset of possible C targets.
Gross overstatement. Rust supports the vast majority of devices that people buy today. Even if you ignore platform popularity and just count platforms supported by gcc versus rustc/llvm, there's only a handful missing from the latter. And if you're talking about vendor-specific compilers, a lot of them don't support modern C or C++ either.
I know almost nothing about the Rust ecosystem but am interested. What Rust packages could completely replace Curl? The top result on Google suggested hyper and reqwest.
You'd need a few different ones, just like curl itself uses a lot of 3rd-party libraries to provide its full feature set.
Some likely first choices would be hyper (HTTP client/server), rustls (TLS), and tokio (async runtime). The Rust ecosystem is quite rich in protocols and codecs, so it shouldn't be too hard to find most (all?) of the crates you need, but there's still work needed to bring them together into one curl-like tool.
Note that Rust crates tend to be more focused than what you're used to in C, made to be composed together rather than used as a one-stop lib. So your dependency tree would look much bigger than curl's.
Curl supports a lot of protocols, including surprising ones like LDAP, SMTP and POP3, so there is no exact curl replacement anywhere, but Rust already has libraries for every protocol that Curl supports.
Hyper is a pretty robust HTTP toolkit, and reqwest is a higher-level library on top of it.
"Most places want to build from source" is not something he considered; there is a brief "for users against the C implementation on the platforms you care about".
Should the majority of us continue to bear the risks of memory unsafe code because of a small minority that uses exotic architectures or doesn't want to install a new compiler?
You need to have someone, or a group of people, willing to write the code to implement a replacement that is fully compatible, and maintain it. Until then, curl written in C is what we have.
With that, curl can be used on the exotic architectures and the rust/other language version can be used on other platforms.
You need to have someone, or a group of people, willing to write the code to implement support for modern tooling on their platform. Until then, old harder-to-secure codebases is what these platforms have.
I know reversing the burden of implementation seems flippant, but it's pragmatic. At some stage, it's less community-wide work to support Rust on a new platform than to spend extra time maintaining and securing dozens of C codebases. Curl may not be making its Rust components mandatory anytime soon, but other projects like Python's cryptography package already have, to say nothing of projects written in Rust to begin with.
rustc_codegen_gcc is pretty close to ready; let's focus on getting it out the door, and on getting more target triples supported by rustc and LLVM.
"Most" is a very heavy claim. Any concrete evidence of this?
Also, in embedded Linux I guess we are technically "building from source" when using a build system like Buildroot or Yocto. But Rust is already supported there, and in fact already in use in our project (due to python3-cryptography).
Just to be clear: since this hypothetical library exposes a C ABI, pre-built binaries wouldn't be an issue if curl were somehow magically reimplemented in Rust. Someone would have to compile it, but that's the same situation as for anyone who doesn't want to build the C implementation from source.
Not wanting to introduce a new compiler amounts to not wanting to introduce a new language. If you don't like the choice the package maintainer makes, roll up your sleeves.
That said, we could mitigate it by using mrustc to generate C. Does anyone keep a list of all the targets libcurl is getting built for?
This is a great writeup. However, even after also reading the CVE, I'm still unsure under which circumstances you would be affected. As far as I understand it, you are affected if you
* use a SOCKS5 proxy, AND
* you use the SOCKS5 proxy to resolve hostnames (which AFAICS is NOT the default), AND
* the size of the buffer was changed from the default (100kB) to something below 65541 bytes, AND (EDIT: not quite correct: 'libcurl' reuses the download buffer for this, which is 16kB by default; however, 'curl' itself is said to set it to 100kB UNLESS you use --limit-rate)
* the SOCKS5 proxy is too slow to handle the request immediately (which however, as the CVE states, can usually be provoked if the attacker has control over the request rate). (EDIT: This is wrong, the CVE actually says "Typical server latency is likely "slow" enough to trigger this bug without an attacker needing to influence it by DoS or SOCKS server control.")
So to me, the attack vector seems very small. Am I missing something?
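To make the "slow proxy" condition concrete, here is a deliberately simplified sketch of the flaw pattern the advisory describes, NOT curl's actual code: a "resolve locally" decision living in a local variable that is recomputed on every entry into a non-blocking state machine, and therefore lost when a slow proxy forces a second call that resumes in a later state:

    /* Simplified sketch of the bug pattern, not curl's actual code. */
    #include <string.h>

    enum socks_state { ST_INIT, ST_REQUEST };

    static void handshake_step(enum socks_state *st, const char *host,
                               size_t hostlen, unsigned char *buf,
                               size_t buflen) {
      int resolve_local = 0;      /* reset on EVERY call: the bug */

      if (*st == ST_INIT) {
        if (hostlen > 255)
          resolve_local = 1;      /* name too long for the SOCKS5 protocol */
        *st = ST_REQUEST;
        return;                   /* slow proxy: return and come back later */
      }

      if (*st == ST_REQUEST && !resolve_local) {
        /* On re-entry resolve_local is 0 again, so an over-long host
         * name reaches the copy below; if hostlen plus the protocol
         * header exceeds buflen, the memcpy overflows the heap buffer. */
        (void)buflen;             /* the flawed path never consulted it */
        memcpy(buf + 5, host, hostlen);
      }
    }

In curl's real code the fast path completes the handshake in one go with the flag still set; only the re-entry path forgets it, which is why latency is part of the trigger conditions.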
I'm confused why plain 'curl' would crash here: the CVE says that
"The target buffer is the heap-based download buffer in libcurl that is reused for SOCKS negotiation before the transfer has started. The size of the buffer is 16kB by default, but can be set to different sizes by the application. The curl tool sets it to 102400 bytes by default - but it sets the buffer size to a smaller size if --limit-rate is set lower than 102400 bytes per second."
I thought with a buffer of 100kB, this bug wouldn't trigger?
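For what it's worth, the 16kB default applies to applications using libcurl directly; in API terms an affected setup would look roughly like this (a sketch; the proxy address and URL are made up):

    /* Sketch of a libcurl setup matching the affected conditions;
     * the proxy address and URL are hypothetical. */
    #include <curl/curl.h>

    int main(void) {
      CURL *h = curl_easy_init();
      if (!h)
        return 1;
      /* socks5h:// asks the proxy to resolve host names (not the default) */
      curl_easy_setopt(h, CURLOPT_PROXY, "socks5h://127.0.0.1:1080");
      /* the reused download buffer: 16kB by default in libcurl;
       * anything below ~64kB is in scope for this bug */
      curl_easy_setopt(h, CURLOPT_BUFFERSIZE, 16384L);
      curl_easy_setopt(h, CURLOPT_URL, "https://example.com/");
      CURLcode rc = curl_easy_perform(h);
      curl_easy_cleanup(h);
      return rc != CURLE_OK;
    }

So if I read it right, the plain tool stays safe at 100kB unless --limit-rate shrinks the buffer below the threshold.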
I guess this is mostly relevant for software that runs on shared infra, sends requests to a URL provided by an attacker (e.g. webhooks), and uses a SOCKS5 proxy?
I wrote a series of shell scripts called docker-proxy that creates SOCKS5 tunnels to Docker containers running openconnect VPNs, useful when you're working with multiple customers whose different VPNs each want all traffic on your machine forwarded through them. https://github.com/carlosonunez/docker-proxy
Since DNS resolution is meant to occur remotely, this CVE would be directly applicable here.
I have to say this is the best CVE writeup I've ever read. And I can only commend the author, who is humble throughout despite having authored one of the cornerstone software products of the age. Much respect.
> I think it was downright wrong to switch mode like this, since the user asked for remote resolve curl should stick to that or fail. It is not even likely to work to just switch, even in “good” situations.
> Yes, this family of flaws would have been impossible if curl had been written in a memory-safe language instead of C
It could have been impossible, but you never know if there is a way to escape language VM barriers. However, the author clearly ignored the DNS hostname limit stated in RFC1123 [1], which is hardcoded even in Java libraries.
It's covered in the section titled "host name length":
> A host name in a URL has no real size limit, but libcurl’s URL parser refuses to accept names longer than 65535 bytes. DNS only accepts host names up to 253 bytes. So, a legitimate name that is longer than 253 bytes is unusual. A real name that is longer than 1024 is virtually unheard of.
DNS is not the only mechanism for resolving host name to address, even if it's what's used 99.9% of the time today.
Legitimate requirements for host names longer than that have to be vanishingly rare. I dare anybody to come up with a single anecdote from real-world use.
> The DNS defines domain name syntax very generally -- a string of labels each containing up to 63 8-bit octets, separated by dots, and with a maximum total of 255 octets.
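To make those limits concrete, a validator for the textual form is only a few lines. A sketch (a hypothetical helper, not curl's parser) that enforces at most 63 octets per label and 253 octets total, rejecting empty labels, and thus trailing dots, for simplicity:

    /* Hypothetical validator for the RFC 1035/1123 limits quoted
     * above: labels of at most 63 octets, at most 253 octets total
     * in textual form. */
    #include <string.h>

    static int dns_name_ok(const char *name) {
      size_t total = strlen(name);
      if (total == 0 || total > 253)
        return 0;
      size_t label = 0;
      for (size_t i = 0; i <= total; i++) {
        if (i == total || name[i] == '.') {
          if (label == 0 || label > 63)
            return 0;   /* empty or over-long label */
          label = 0;
        }
        else
          label++;
      }
      return 1;
    }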
A memory-safe language does not imply the use of a VM.
It can just as well mean that the compiler attempts to prove that memory is only ever accessed in bounds, with well-defined ownership, and within the lifetime of the underlying object. Which is what Rust does, for example.
Of course it can also mean that the compiler additionally inserts internal failure checks and safeguards at critical places (Rust does not do this, but it would be nice to have in systems where one might worry about in-register bit flips, e.g. high-radiation environments like X-ray scanners, i.e. faults not caught by, say, ECC memory).
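For illustration, a hand-rolled version of that kind of bit-flip safeguard might look like this in C (a sketch of the general idea, not something any compiler emits today): store a redundant complement next to a critical value and verify it before use.

    /* Sketch: detect a single bit flip in a critical flag by keeping
     * its one's complement alongside and checking before every use. */
    #include <stdint.h>
    #include <stdlib.h>

    struct guarded_flag { uint32_t v; uint32_t not_v; };

    static void set_flag(struct guarded_flag *f, uint32_t v) {
      f->v = v;
      f->not_v = ~v;
    }

    static uint32_t get_flag(const struct guarded_flag *f) {
      if (f->v != ~f->not_v)
        abort();        /* corruption detected: fail fast */
      return f->v;
    }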
... and that is installed on billions of devices. I can imagine you quickly get very specific conditions here and there when something is deployed at that scale.
This is a long and interesting note which could be reduced to "this is why system libraries need to be either written in a safe language or proved correct".
When one of the best, friendliest, and most transparent C programmers of our time is writing these posts, we need to pay attention.
> When one of the best, friendliest, and most transparent C programmers of our time is writing these posts, we need to pay attention.
Given that said programmer wrote,
> Such development is however currently happening in a near glacial speed and shows with painful clarity the challenges involved. curl will remain written in C for the foreseeable future.
I'm curious what conclusion you'd like us to draw.
I didn't post it as an angry response; I just knew what kind of comments this would receive. Not sure why it got flagged: I wasn't saying this was some sort of excuse or that it should be rewritten, I just found it interesting that he included it. I'm sure he gets that comment a lot.
Come on, it starts out with a link to the CVE and a remark "While the advisory contains all the necessary details. I figured I would use a few additional words..."
If you just want the gist of it, the article does everything it can to send you there.
"Reading the code now it is impossible not to see the bug. Yes, it truly aches having to accept the fact that I did this mistake without noticing and that the flaw then remained undiscovered in code for 1315 days. I apologize. I am but a human."
If Daniel happens to read this, thanks for your hard work, and I really don't think any apology is necessary. After all, the source was right there for any of us to read and review.