Since we launched this last year, GitHub released a v2 of their internal cache API [0], based on Twirp [1] of all things, so we adapted to that. Interestingly, that Twirp service also receives Actions artifacts, though we haven't intercepted those today, given that you likely still want them to appear in the GitHub UI / be accessible from the GitHub API.
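For anyone unfamiliar with Twirp, the wire format is just one HTTP POST per RPC. Here's a minimal sketch in Python; the service, method, and field names are made up for illustration and are not the actual GitHub cache v2 API:

```python
import json
import urllib.request

def twirp_call(base_url: str, service: str, method: str, payload: dict) -> dict:
    # Twirp routes every RPC as an HTTP POST to /twirp/<package>.<Service>/<Method>,
    # with protobuf or JSON request bodies; JSON is used here for readability.
    req = urllib.request.Request(
        f"{base_url}/twirp/{service}/{method}",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Hypothetical usage against a cache-shaped service (names are invented):
# twirp_call("https://cache.example.test", "example.cache.v1.CacheService",
#            "GetCacheEntry", {"key": "deps-abc123"})
```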
Depot is the fastest place to build software. We accelerate builds for customers using GitHub Actions, Docker, Bazel, Gradle, and more. We're seeking our first Enterprise Support Engineer to become a customer-facing expert on build optimization.
We're looking for someone with DevOps / CI consulting experience - you'll work directly with customers as the subject-matter expert on best practices, help migrate legacy infrastructure, and partner with the founders on product gaps.
Bonus: experience with Docker buildx, API integrations, or previous CI consulting.
Not quite for every container, but we operate a multi-tenant remote build execution service (container builds, GitHub Actions jobs, etc.), so we launch a lot of ephemeral VMs in response to customer build requests. We use separate EC2 instances for strong workload isolation between different customers / jobs, and optimize boot time since that directly translates to queue time.
We also do GitHub Actions runners as a service, so a very high volume of differently-sized ephemeral VMs. We've experimented with .metal hosts, but they turn into a bin-packing optimization problem: you're always running some amount of spare compute while trying to fit incoming build requests onto physical hosts as tightly as possible.
Eventually you realize (IMO) that doing the bin packing yourself is just competing with AWS - that's exactly what they do when you launch a non-metal EC2 instance - and it's best to let them do what they're good at. That's why we've focused on optimizing that launch path rather than trying to take over the virtualization.
There are other security and performance reasons too: AWS is better at workload isolation than we can be, both because their isolation boundary is very strong and because preventing noisy neighbors is genuinely difficult. Especially with things like disk, the strategies for ensuring fair access to the physical hardware (rate-limiting I/O) carry their own CPU overhead that slows everything down and prevents perfect bin packing.
A list of fun things we've done on our CI runners to improve CI:
- Configured a block-level in-memory disk accelerator / cache (fs operations at the speed of RAM!)
- Benchmarked EC2 instance types (m7a is the best x86 today, m8g is the best arm64)
- "Warming" the root EBS volume by accessing a set of priority blocks before the job starts to give the job full disk performance [0]
- Launching each runner instance in a public subnet with a public IP - the runner gets full throughput from AWS to the public internet, and IP-based rate limits rarely apply (Docker Hub)
- Configuring Docker with containerd/estargz support
- Just generally turning off kernel options and systemd units that aren't needed
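On the EBS warming point, here's a rough sketch of what that could look like; the device path and block ranges are assumptions. EBS volumes restored from snapshots fetch blocks lazily from S3 on first read, so touching the hot blocks up front moves that latency out of the job:

```python
#!/usr/bin/env python3
"""Hypothetical sketch: hydrate ("warm") priority EBS blocks before a CI job starts.

The block ranges below are illustrative; in practice you'd record which ranges a
typical job touches and replay them here."""
import os

DEVICE = "/dev/nvme0n1"          # root EBS volume (assumption: Nitro instance)
BLOCK_SIZE = 1 << 20             # read in 1 MiB chunks

# (offset_bytes, length_bytes) ranges assumed to be hot for this runner image
PRIORITY_RANGES = [
    (0, 64 * BLOCK_SIZE),                # partition table, bootloader, kernel
    (2 * 1024**3, 512 * BLOCK_SIZE),     # e.g. where the Docker data dir lives
]

def warm(device: str, ranges: list[tuple[int, int]]) -> None:
    fd = os.open(device, os.O_RDONLY)
    try:
        for offset, length in ranges:
            end = offset + length
            while offset < end:
                # The data itself is discarded; the point is forcing EBS to pull
                # the block from S3 so later reads hit local storage at full speed.
                data = os.pread(fd, min(BLOCK_SIZE, end - offset), offset)
                if not data:
                    break
                offset += len(data)
    finally:
        os.close(fd)

if __name__ == "__main__":
    warm(DEVICE, PRIORITY_RANGES)
```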
> Launching each runner instance in a public subnet with a public IP - the runner gets full throughput from AWS to the public internet, and IP-based rate limits rarely apply (Docker Hub)
Are you not using a caching registry mirror, instead pulling the same image from Hub for each runner...? If so, that seems like it would be an easy win to add, unless you specifically do mostly hot/unique pulls.
The more efficient answer to those rate limits is almost always to pull fewer times for the same work, rather than scaling in a way that circumvents them.
Today we (Depot) are not, though some of our customers configure this. For the moment at least, the ephemeral public IP architecture makes it generally unnecessary from a rate-limit perspective.
From a performance / efficiency perspective, we generally recommend using ECR Public images[0], since AWS hosts mirrors of all the "Docker official" images, and throughput to ECR Public is great from inside AWS.
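For teams that do want the caching-mirror approach from the parent comment, here's a minimal sketch of pointing the Docker daemon at a pull-through registry mirror. The mirror URL is hypothetical and this is not Depot's setup:

```python
"""Sketch: add a pull-through cache to /etc/docker/daemon.json so repeated pulls
of the same Hub image hit the mirror instead of Docker Hub. Requires root."""
import json
import pathlib
import subprocess

DAEMON_JSON = pathlib.Path("/etc/docker/daemon.json")
MIRROR_URL = "https://registry-mirror.internal.example:5000"  # assumption

# Merge into any existing daemon config rather than clobbering it.
config = json.loads(DAEMON_JSON.read_text()) if DAEMON_JSON.exists() else {}
config["registry-mirrors"] = [MIRROR_URL]
DAEMON_JSON.write_text(json.dumps(config, indent=2))

# Restart the daemon so the mirror takes effect.
subprocess.run(["systemctl", "restart", "docker"], check=True)
```

Alternatively, as mentioned above, Dockerfiles can reference the ECR Public mirrors of official images directly instead of pulling from Hub at all.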
If you're running inside AWS us-east-1, Docker Hub will give you direct S3 URLs for layer downloads (or it used to, anyway).
Any pulls doing this are zero cost for Docker Hub.
Any sort of cache you put between Docker Hub and your own infra would probably be S3-backed anyway, so adding another cache in between could be mostly a waste.
Yeah we do some similar tricks with our registry[0]: pushes and pulls from inside AWS are served directly from AWS for maximum performance and no data transfer cost. Then when the client is outside AWS, we redirect all that to Tigris[1], also for maximum performance (CDN) and minimum data transfer cost (no cost from Tigris, just the cost to move content out of AWS once).
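A hedged sketch of the kind of redirect logic described above: per blob request, decide whether to send the client to S3 (inside AWS) or to a CDN-backed store (outside AWS). The bucket URLs are placeholders, not Depot's actual endpoints:

```python
"""Sketch of location-aware registry blob redirects. Assumes the registry
answers GET /v2/<name>/blobs/<digest> with a 307 to one of two object stores."""
import ipaddress
import json
import urllib.request
from functools import lru_cache

AWS_IP_RANGES_URL = "https://ip-ranges.amazonaws.com/ip-ranges.json"
S3_BLOB_BASE = "https://registry-blobs.s3.us-east-1.amazonaws.com"   # assumption
CDN_BLOB_BASE = "https://registry-blobs.example-cdn.tigris.dev"      # assumption

@lru_cache(maxsize=1)
def aws_networks() -> list[ipaddress.IPv4Network]:
    # AWS publishes its public IP ranges; cache them for the process lifetime.
    with urllib.request.urlopen(AWS_IP_RANGES_URL) as resp:
        prefixes = json.load(resp)["prefixes"]
    return [ipaddress.ip_network(p["ip_prefix"]) for p in prefixes]

def blob_redirect_url(digest: str, client_ip: str) -> str:
    """Return the redirect target for a blob download based on where the client is."""
    ip = ipaddress.ip_address(client_ip)
    inside_aws = any(ip in net for net in aws_networks())
    base = S3_BLOB_BASE if inside_aws else CDN_BLOB_BASE
    return f"{base}/{digest}"
```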
Forgive me, I'm not trying to be argumentative, but doesn't Linux (and presumably all modern OSes) already have a ram-backed writeback cache for filesystems? That sounds exactly like the page cache.
No worries, entirely valid question. There may be ways to tune page cache to be more like this, but my mental model for what we've done is effectively make reads and writes transparently redirect to the equivalent of a tmpfs, up to a certain size. If you reserve 2GB of memory for the cache, and the CI job's read and written files are less than 2GB, then _everything_ stays in RAM, at RAM throughput/IOPS. When you exceed the limit of the cache, blocks are moved to the physical disk in the background. Feels like we have more direct control here than page cache (and the page cache is still helping out in this scenario too, so it's more that we're using both).
> reads and writes transparently redirect to the equivalent of a tmpfs, *up to a certain size*
The last bit (emphasis added) sounds novel to me; I don't think I've heard of anybody doing that before. It sounds like an almost-"free" way to get a ton of performance ("almost" because somebody has to figure out the sizing - though I bet you could automate that by having your tool export a "desired size" metric equal to the high watermark of tmpfs-like storage used during the CI run).
Just to add, my understanding is that unless you also tune your workload writes, the page cache will not skip backing storage for writes, only for reads. So it does make sense to stack both if you're fine with not being able to rely on persistence of those writes.
No, it's more like swapping pages to disk when RAM is full, or like using RAM when the L2 cache is full.
Linux page cache exists to speed up access to the durable store which is the underlying block device (NVMe, SSD, HDD, etc).
The RAM-backed block device in question here is more like tmpfs, but with an ability to use the disk if, and only if, it overflows. There's no intention or need to store its whole contents on the durable "disk" device.
Hence you can do things entirely in RAM as long as your CI/CD job can fit all the data there, but if it can't fit, the job just gets slower instead of failing.
If you clearly understand your access patterns and memory requirements, you can often outperform the default OS page cache.
Consider a scenario where your VM has 4GB of RAM but your build touches 16GB of data over its lifetime, while at any moment its active working set is only around 2GB. If you preload all Docker images at the start of your build, they'll initially be cached in RAM. However, as the build progresses, the kernel will begin evicting those cached images to accommodate recently accessed data, potentially even files used infrequently or just once. And that's the key bit: you want to force caching of the files you know are accessed more than once.
By implementing your own caching layer, you gain explicit control, allowing critical data to remain persistently cached in memory. In contrast, the kernel-managed page cache treats cached pages as opportunistic, evicting the least recently used pages whenever new data must be accommodated, even if this new data isn't frequently accessed.
That is true and correct, except that Linux does not have raw devices, and O_DIRECT on a file is not a complete replacement for raw devices (the buffer cache still gets involved, as does the file system).
The ramdisk that overflows to a real disk is a cool concept that I hadn't previously considered. Is this just clever use of bcache? If you have any docs about how this was set up, I'd love to read them.
Why do it at the block level (instead of tmpfs)? Or do you mean that you're doing actual real persistent disks that just have a lot of cache sitting in front of them?
The block level has two advantages: (1) you can accelerate access to everything on the whole disk (like even OS packages) and (2) everything appears as one device to the OS, meaning that build tools that want to do things like hardlink files in global caches still work without any issue.
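Depot hasn't said exactly how their accelerator is built, but one way to approximate the "RAM cache that overflows to a real disk, presented as a single block device" behavior with stock Linux pieces is bcache in writeback mode with a brd ramdisk as the cache device. A sketch only, with device names and sizes as assumptions:

```python
#!/usr/bin/env python3
"""Sketch only: a RAM-backed block cache that overflows to a real disk, built
from brd (ramdisk) + bcache in writeback mode. NOT necessarily how Depot does
it. Dirty data in RAM is lost on power-off, which is only acceptable for
ephemeral CI machines."""
import subprocess

RAMDISK_KIB = 2 * 1024 * 1024        # brd takes its size in KiB (2 GiB here)
BACKING_DEV = "/dev/nvme1n1"         # scratch disk that absorbs the overflow
MOUNTPOINT = "/var/lib/docker"       # what we want to accelerate

def run(*cmd: str) -> None:
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

# 1. Create /dev/ram0 as the in-memory cache device.
run("modprobe", "brd", "rd_nr=1", f"rd_size={RAMDISK_KIB}")

# 2. Bind cache + backing device into a single /dev/bcache0 device.
run("make-bcache", "--wipe-bcache", "-C", "/dev/ram0", "-B", BACKING_DEV)

# (on systems without bcache udev rules, register the devices manually)
run("sh", "-c", "echo /dev/ram0 > /sys/fs/bcache/register || true")
run("sh", "-c", f"echo {BACKING_DEV} > /sys/fs/bcache/register || true")

# 3. Writeback mode: writes land in RAM first and flush to disk in the background.
run("sh", "-c", "echo writeback > /sys/block/bcache0/bcache/cache_mode")

# 4. Filesystem on top; to the OS (and Docker) it's just one block device.
run("mkfs.ext4", "-q", "/dev/bcache0")
run("mkdir", "-p", MOUNTPOINT)
run("mount", "/dev/bcache0", MOUNTPOINT)
```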
noatime is mostly irrelevant because everyone has been on relatime for ages, and when relatime does update the atime field it's because mtime is being updated too, so that inode block is being written to disk anyway. No I/O saved.
Depot is a build acceleration platform that makes Docker builds and GitHub Actions faster. We've already helped companies like PostHog, Wistia, Semgrep, and Secoda save thousands of hours in build time every week.
We're looking for our first marketing hire to define and execute our go-to-market strategy. You'll own everything from content creation to demand gen, with a focus on developer audiences. We're growing rapidly, with 500+ paying customers and double-digit monthly growth.
Requirements:
* 5+ years marketing experience with a focus on developer audiences
* Experience with content marketing, SEO, social, and email campaigns
* Comfortable with analytics tools (Google Analytics, ahrefs, PostHog)
* Experience with paid channels (LinkedIn, Reddit, etc.)
* Strong communication skills and ability to work asynchronously
We're a small, remote team building the developer tools we wish we'd had. If you're passionate about developer productivity and marketing to technical audiences, we'd love to hear from you.
The dates feel like intentional deception. The community question is "how long have you been discussing trademark usage?" and the answer is "I had lunch in February 2023!"
Like, somebody not trying to be deceptive would say "we started talking about trademarks and a commercial relationship in February 2023", but that's not what this post says, and that's not the answer Matt has given in interviews; it's always this strange list of dates instead.
[0] https://github.com/actions/cache/discussions/1510
[1] https://github.com/twitchtv/twirp