Layers are fully independent of each other in the OCI spec (which makes them reusable). They are wired together through a separate manifest file that lists the layers of a specific image.
It's a mystery... Here are the bits of the OCI spec about multipart pushes (https://github.com/opencontainers/distribution-spec/blob/58d...). In short, you can only upload the next chunk after the previous one finishes, because you need to use information from the response's headers.
If you've got plenty of time for the build, you can. Make a two-stage build where the first stage installs Python and pytorch, and the second stage does ten COPYs which each grab 1/10th of the files from the first stage. Now you've got ten evenly sized layers. I've done this for very large images (lots of Python/R/ML crap) and it takes significant extra time during the build but speeds up pulls because layers can be pulled in parallel.
Surely you can have one layer per directory or something like that? Splitting along those lines works as long as everything isn't in one big file.
I think it was a mistake to make layers as a storage model visible in to the end user. This should just have been an internal implementation detail, perhaps similar to how Git handles delta compression and makes it independent of branching structure. We also should have delta pushes and pulls, using global caches (for public content), and the ability to start containers while their image is still in transfer.
It should be possible to split into multiple layers as long as each file is wholly within in its layer. This is the exact opposite of the work recommended combining commands to keep everything in one layer which I think is done ultimately for runtime performance reasons.
I've dug fairly deep into docker layering, it would be wonderful if there was a sort of `LAYER ...` barrier instead of implicitly via `RUN ...` lines.
Theoretically there's nothing stopping you from building the docker image and "re-layering it", as they're "just" bundles of tar files at the end of the day.
In true "when all you have is a hammer" fashion, as very best I can tell that syntax= directive is pointing to a separate docker image whose job it is to read the file and translate it into builtkit api calls, e.g. https://github.com/moby/buildkit/blob/v0.15.0/frontend/docke...
But, again for clarity: I've never tried such a stunt, that's just the impression I get from having done mortal kombat with builtkit's other silly parts
Thanks, that helps a lot and I didn't know about it:) It's a touch less powerful than full transactions (because AFAICT you can't say merge a COPY and RUN together) but it's a big improvement.
It's a mystery... Here are the bits of the OCI spec about multipart pushes (https://github.com/opencontainers/distribution-spec/blob/58d...). In short, you can only upload the next chunk after the previous one finishes, because you need to use information from the response's headers.