Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Layers are fully independent of each other in the OCI spec (which makes them reusable). They are wired together through a separate manifest file that lists the layers of a specific image.

It's a mystery... Here are the bits of the OCI spec about multipart pushes (https://github.com/opencontainers/distribution-spec/blob/58d...). In short, you can only upload the next chunk after the previous one finishes, because you need to use information from the response's headers.



Ah thanks.

That's chunks of a single layer though, not multiple layers right?


Indeed, you are free to push multiple layers in parallel. But when you have a 1 GiB layer full of AI/ML stuff you can feel the pain!

(I just updated my original comment to make clear I'm talking about single-layer pushes here)


Split the layer up?


You can’t. Installing pytorch and supporting dependencies takes 2.2GB on debian-slim.


If you've got plenty of time for the build, you can. Make a two-stage build where the first stage installs Python and pytorch, and the second stage does ten COPYs which each grab 1/10th of the files from the first stage. Now you've got ten evenly sized layers. I've done this for very large images (lots of Python/R/ML crap) and it takes significant extra time during the build but speeds up pulls because layers can be pulled in parallel.


I see your point on the pull speed. Most of my pulls are stuck at waiting for the pytorch/dependencies layer.

This might work with pip but I absolutely hate pip and using poetry with great success. I will investigate how to do this with poetry.


Surely you can have one layer per directory or something like that? Splitting along those lines works as long as everything isn't in one big file.

I think it was a mistake to make layers as a storage model visible in to the end user. This should just have been an internal implementation detail, perhaps similar to how Git handles delta compression and makes it independent of branching structure. We also should have delta pushes and pulls, using global caches (for public content), and the ability to start containers while their image is still in transfer.


It should be possible to split into multiple layers as long as each file is wholly within in its layer. This is the exact opposite of the work recommended combining commands to keep everything in one layer which I think is done ultimately for runtime performance reasons.


I've dug fairly deep into docker layering, it would be wonderful if there was a sort of `LAYER ...` barrier instead of implicitly via `RUN ...` lines.

Theoretically there's nothing stopping you from building the docker image and "re-layering it", as they're "just" bundles of tar files at the end of the day.

eg: `RUN ... ; LAYER /usr ; LAYER /var ; LAYER /etc ; LAYER [discard|remainder]`


I've wished for a long time that Dockerfiles had an explicit way to define layers ripped off from (postgre)sql:

    BEGIN
    RUN foo
    RUN bar
    COMMIT


At the very real risk of talking out of my ass, the new versioned Dockerfile mechanism on top of builtkit should enable you to do that: https://github.com/moby/buildkit/blob/v0.15.0/frontend/docke...

In true "when all you have is a hammer" fashion, as very best I can tell that syntax= directive is pointing to a separate docker image whose job it is to read the file and translate it into builtkit api calls, e.g. https://github.com/moby/buildkit/blob/v0.15.0/frontend/docke...

But, again for clarity: I've never tried such a stunt, that's just the impression I get from having done mortal kombat with builtkit's other silly parts



Thanks, that helps a lot and I didn't know about it:) It's a touch less powerful than full transactions (because AFAICT you can't say merge a COPY and RUN together) but it's a big improvement.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: