Layers are fully independent of each other in the OCI spec (which makes them reu...

IanCal · on July 12, 2024

Ah thanks.

That's chunks of a single layer though, not multiple layers right?

wofo · on July 12, 2024

Indeed, you are free to push multiple layers in parallel. But when you have a 1 GiB layer full of AI/ML stuff you can feel the pain!

(I just updated my original comment to make clear I'm talking about single-layer pushes here)

killingtime74 · on July 12, 2024

Split the layer up?

thangngoc89 · on July 12, 2024

You can’t. Installing pytorch and supporting dependencies takes 2.2GB on debian-slim.

electroly · on July 12, 2024

If you've got plenty of time for the build, you can. Make a two-stage build where the first stage installs Python and pytorch, and the second stage does ten COPYs which each grab 1/10th of the files from the first stage. Now you've got ten evenly sized layers. I've done this for very large images (lots of Python/R/ML crap) and it takes significant extra time during the build but speeds up pulls because layers can be pulled in parallel.
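A rough sketch of how the "ten evenly sized layers" split could be planned. It is not electroly's actual script: the function name `split_evenly` and the input shape (a dict of path to size in bytes) are assumptions made for illustration. It greedily assigns the largest files first to whichever bucket is currently lightest; each resulting bucket could then be staged into its own directory in the first build stage and copied by one COPY in the second.

```python
import heapq


def split_evenly(file_sizes, n_layers):
    """Greedy partition of files into n_layers buckets of roughly equal total size.

    file_sizes: dict mapping path -> size in bytes (hypothetical input shape).
    Returns a list of n_layers lists of paths.
    """
    # Min-heap of (current_total_bytes, bucket_index): the lightest bucket pops first.
    heap = [(0, i) for i in range(n_layers)]
    buckets = [[] for _ in range(n_layers)]
    # Largest files first; ties are broken by path so the result is deterministic.
    for path, size in sorted(file_sizes.items(), key=lambda kv: (-kv[1], kv[0])):
        total, idx = heapq.heappop(heap)
        buckets[idx].append(path)
        heapq.heappush(heap, (total + size, idx))
    return buckets
```

Each file lands in exactly one bucket, so no file is split across layers. The greedy heuristic does not guarantee a perfectly even split, but it tends to work well when there are many files and none of them dominates the total.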

thangngoc89 · on July 13, 2024

I see your point on the pull speed. Most of my pulls are stuck waiting for the pytorch/dependencies layer.

This might work with pip, but I absolutely hate pip and have been using poetry with great success. I will investigate how to do this with poetry.

fweimer · on July 12, 2024

Surely you can have one layer per directory or something like that? Splitting along those lines works as long as everything isn't in one big file.

I think it was a mistake to make layers as a storage model visible to the end user. This should just have been an internal implementation detail, perhaps similar to how Git handles delta compression and makes it independent of branching structure. We also should have delta pushes and pulls, using global caches (for public content), and the ability to start containers while their image is still in transfer.

password4321 · on July 12, 2024

It should be possible to split into multiple layers as long as each file is wholly within its layer. This is the exact opposite of the usual recommendation to combine commands to keep everything in one layer, which I think is ultimately done for runtime performance reasons.

ramses0 · on July 12, 2024

I've dug fairly deep into docker layering; it would be wonderful if there were a sort of explicit `LAYER ...` barrier instead of layers being created implicitly via `RUN ...` lines.

Theoretically there's nothing stopping you from building the docker image and "re-layering it", as they're "just" bundles of tar files at the end of the day.

eg: `RUN ... ; LAYER /usr ; LAYER /var ; LAYER /etc ; LAYER [discard|remainder]`
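To illustrate the "just bundles of tar files" point, here is a minimal sketch of re-layering a single tar archive into one archive per top-level directory. The function name `relayer_by_top_dir` is made up for this example. It also ignores things a real tool would have to handle: whiteout files, hard links that cross directories, and rewriting the image manifest and config with the new layer digests.

```python
import io
import tarfile
from collections import defaultdict


def relayer_by_top_dir(layer_bytes):
    """Split one tar archive into several, keyed by top-level directory.

    Returns a dict mapping directory name -> bytes of an uncompressed tar.
    Everything is held in memory, which is fine for a sketch but not for GiB layers.
    """
    groups = defaultdict(list)
    with tarfile.open(fileobj=io.BytesIO(layer_bytes), mode="r:*") as src:
        for member in src.getmembers():
            name = member.name
            if name.startswith("./"):
                name = name[2:]
            top = name.split("/", 1)[0]
            # Only regular files carry a payload; dirs and links are header-only.
            data = src.extractfile(member).read() if member.isfile() else None
            groups[top].append((member, data))

    layers = {}
    for top, entries in groups.items():
        buf = io.BytesIO()
        with tarfile.open(fileobj=buf, mode="w") as dst:
            for member, data in entries:
                dst.addfile(member, io.BytesIO(data) if data is not None else None)
        layers[top] = buf.getvalue()
    return layers
```

Because every member goes into exactly one output archive, each file stays wholly within a single layer.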

yjftsjthsd-h · on July 12, 2024

I've wished for a long time that Dockerfiles had an explicit way to define layers ripped off from (postgre)sql:

    BEGIN
    RUN foo
    RUN bar
    COMMIT

mdaniel · on July 12, 2024

At the very real risk of talking out of my ass, the new versioned Dockerfile mechanism on top of BuildKit should enable you to do that: https://github.com/moby/buildkit/blob/v0.15.0/frontend/docke...

In true "when all you have is a hammer" fashion, as best I can tell that `syntax=` directive points to a separate docker image whose job it is to read the file and translate it into BuildKit API calls, e.g. https://github.com/moby/buildkit/blob/v0.15.0/frontend/docke...

But, again for clarity: I've never tried such a stunt, that's just the impression I get from having done mortal kombat with BuildKit's other silly parts

skrause · on July 12, 2024

    RUN <<EOF
    foo
    bar
    EOF

https://www.docker.com/blog/introduction-to-heredocs-in-dock...

yjftsjthsd-h · on July 12, 2024

Thanks, that helps a lot and I didn't know about it :) It's a touch less powerful than full transactions (because AFAICT you can't, say, merge a COPY and a RUN together) but it's a big improvement.