More

euank · 2025-10-01T09:04:47 1759309487

I can give an n=1 anecdote here: the dns resolver used to have hard-coded caching which meant that it would be unresponsive to pod updates, and cause mini 30s outages.

The code in question was: https://github.com/grpc/grpc-go/blob/b597a8e1d0ce3f63ef8a7b6...

That meant that deploying a service which drained in less than 30s would have a little mini-outage for that service until the in-process DNS cache expired, with of course no way to configure it.

Kuberesolver streams updates, and thus lets clients talk to new pods almost immediately.

I think things are a little better now, but based on my reading of https://github.com/grpc/grpc/issues/12295, it looks like the dns resolver still might not resolve new pod names quickly in some cases.

euank · on July 13, 2022

Coincidentally, Google has sorta done that too!

ChromiumOS/ChromeOS is built using portage, and is certainly rolling release. It's not really a traditional linux distro, but it's something.

antoncohen · on July 13, 2022

Container-Optimized OS is based on Chromium OS and is more of a traditional Linux, in the sense that is runs on servers.

https://cloud.google.com/container-optimized-os/docs

eliaspro · on July 13, 2022

...and so was CoreOS before the equally named company was acquired by RedHat.

kaeso · on July 13, 2022

In a funny twist of events, the author of the grandparent comment above yours was indeed building that OS (Container Linux) at CoreOS :)

euank · on May 26, 2022

> Does anyone know if and where I could file a bug report?

The linux kernel does take bug reports: https://docs.kernel.org/admin-guide/reporting-issues.html

However, that bug probably isn't specific enough as you've described it, unless you can find the commit causing it (such as via a git bisect https://docs.kernel.org/admin-guide/bug-bisect.html), or come up with a clearer repro.

Alternatively, if you're seeing the issue on a distro-maintained kernel (such as on fedora/ubuntu/debian with their kernel package), reporting the issue to the distro maintainers may be more appropriate.

asddubs · on May 27, 2022

i've had an ubuntu employee fix a kernel bug that rendered my machine unbootable on newer kernels. it was a newer thinkpad so there were quite a few affected users, it'd probably be lesser priority with more obscure hardware. Still, just to support your statement with some concrete experience, reporting to your distro can definitely help

euank · on Sept 13, 2021

> Email has never had explicit threading

The "In-Reply-To" header is described in rfc2822. It is an explicit header in the RFC that is how you create threads.

Every mail client I've used correctly understands how to thread reply-chains using In-Reply-To.

The thing you're talking about, steam receipts grouping, is not a feature of email, but a specific feature of gmail's web view which is not mandated by any RFC and indeed is not explicit threading...

But there is a real way to thread which is defined in the RFC, and if you use a reasonable email client (aka not gmail), every mailing list's threading will work for you.

euank · on Dec 3, 2020

The kubelet can talk to containerd's cri endpoint, yes, but there's one additional bit of complexity.

If someone wants to use kubelet + docker so that they can, for example, ssh into a node and type 'docker ps' to see containers, or have something else using the docker api see the containers the kubelet started, that won't work after re-pointing the kubelet from docker to containerd.

The difference here is namespacing[0], but not the linux-kernel-container-namespace, rather the containerd concept by the same name to allow "multi-tenancy" of a single containerd daemon.

In addition, I don't think you could have docker + cri run in the same containerd namespace since they end up using different networking and storage containerd plugins. I think that terminology is right.

So yeah, repointing the kubelet to containerd directly works fine, but it won't be the same thing as running docker containers.

[0]: https://github.com/containerd/containerd/blob/9561d938/docs/...

euank · on Dec 2, 2020

I think that the title of this is a bit misleading.

Kubernetes is removing the "dockershim", which is special in-process support the kubelet has for docker.

However, the kubelet still has the CRI (container runtime interface) to support arbitrary runtimes. containerd is currently supported via the CRI, as is every runtime except docker. Docker is being moved from having special-case support to being the same in terms of support as other runtimes.

Does that mean using docker as your runtime is deprecated? I don't think so. You just have to use docker via a CRI layer instead of via the in-process dockershim layer. Since there hasn't been a need until now for an out-of-process cri->docker-api translation layer, there isn't a well supported one I don't think, but now that they've announced the intent to remove dockershim, I have no doubt that there will be a supported cri -> docker layer before long.

Maybe the docker project will add built-in support for exposing a CRI interface and save us an extra daemon (as containerd did).

In short, the title's misleading from my understanding. The Kubelet is removing the special-cased dockershim, but k8s distributions that ship with docker as the runtime should be able to run a cri->docker layer to retain docker support.

For more info on this, see the discussion on this pr: https://github.com/kubernetes/kubernetes/pull/94624

icco · on Dec 2, 2020

Also, people probably don't understand the difference between the container runtime and container build environment. You can build your container with Docker still and it can run in a different environment.

xorcist · on Dec 3, 2020

You can, but buildah exists.

root_axis · on Dec 3, 2020

What's the advantage of using buildah?

gravypod · on Dec 3, 2020

It's docker without the dockerfile which, from what I can tell, is the biggest feature of docker most engineers like.

I've personally switched to bazel for building most of my containers but that's a far departure from what the majority of people are doing I suspect.

rockwotj · on Dec 3, 2020

My company uses bazel to build containers and the distroless images that Google provides, it's a really nice setup IMO

gravypod · on Dec 3, 2020

I love the experience and performance. If more adoption happens it'll just get better as more languages are supported.

JACKSONMEISTER · on Dec 3, 2020

Can you point to any sources using bazel for this?

adamkf · on Dec 3, 2020

https://github.com/bazelbuild/rules_docker

kgoutham93 · on Dec 3, 2020

Is containerd CRI compliant? Kubelet still interacts with cri-containerd which inturn calls containerd. Isn't cri-containerd the dockershim of containerd?

Maybe I'm mixing up things, pls correct me wherever needed.

euank · on Dec 3, 2020

containerd can serve CRI requests itself. This has been the case since the containerd v1.1.0 release[0], which included the cri "plugin" as an in-process part of the containerd binary. For a while, to keep up the plugin idea, it was in a separate github repo too, but these days it's in the main containerd repo directly[1].

[0]: https://github.com/containerd/containerd/releases/tag/v1.1.0

[1]: https://github.com/containerd/containerd/tree/9561d9389d/pkg...

Havoc · on Dec 2, 2020

Thanks for explaining.

I suspect this will nuke a huge amount of tutorials out there though & frustrate newbies.

ZiiS · on Dec 2, 2020

This is deep in the internals of Kubernetes, nothing about `docker build/push` or `kubectl apply` will change.

manigandham · on Dec 2, 2020

This changes nothing for 99.9% of Kubernetes users.

euank · on Dec 3, 2020

For what it's worth, there are a few cases where docker vs some other runtime does make a difference.

One difference is that if you 'docker build' or 'docker load' an image on a node, with docker as a runtime a pod could be started using that image, but if containerd is the runtime it would have had to be 'ctr image import'ed instead.

I know that minikube, at some point, suggested people use 'DOCKER_HOST=..' + 'docker build' to make images available to that minikube node, which this would cause to not work.

It would be nice if k8s had its own container image store so you could 'kubectl image load' in a runtime agnostic way, but unfortunately managing the fetching of container images has ended up as something the runtime does, and k8s has no awareness of above the runtime.

Oh, and for production clusters, a distribution moving from dockerd to containerd could break a few things, like random gunk in the ecosystem that tries to find kubernetes pods by querying the docker api and checking labels. I think there's some monitoring and logging tools that do that.

If distributions move from docker to docker-via-a-cri-shim, that won't break either of those use cases of course.

euank · on Nov 11, 2020

> is there a name for this bias?

I think it's response bias. https://en.wikipedia.org/wiki/Response_bias

devinplatt · on Nov 11, 2020

I think actually op is referring to https://en.wikipedia.org/wiki/Participation_bias: "Participation bias or non-response bias is a phenomenon in which the results of elections, studies, polls, etc. become non-representative because the participants disproportionately possess certain traits which affect the outcome."

Confusing terminology, but I guess "response bias" is about a false response by the participant to a question, and "non-response bias" is about whether the selected participant responded (participated) at all.

euank · on Aug 13, 2020

A Dockerfile is not a reproducible set of build instructions in most cases. I'd guess that the vast majority of Dockerfiles are not reproducible.

Let's look at an example dockerfile for redis (based on [0])

    FROM debian:buster-slim
    RUN apt-get update; apt-get install -y --no-install-recommends gcc
    RUN wget http://download.redis.io/releases/redis-6.0.6.tar.gz && tar xvf redis* && cd redis-6.0.6 && make install

(Note, modified from upstream for this example; won't actually build)

The unreproducible bits are the following:

1. FROM debian:buster-slim -- unreproducible, the base image may change

2. apt-get update && apt-get install -- unreproducible, will give a different version of gcc and other apt packages at different times

Those two bits of unreprodicble-ness are so core to the image, that they result in every other step not being reproducible either.

As a result, when you 'docker build' that over time, it's very unlikely you'll get a bit-for-bit identical redis binary at the other end. Even a minor gcc version change will likely result in a different binary.

As a contrast to this, let's look at a reproducible build of redis using nix. In nixpkgs, it looks like so [1].

If I want a reproducible shell environment, I simply have to pin down its dependencies, which can be done by the following:

    let
      pkgs = import (builtins.fetchTarball {
        url = "https://github.com/NixOS/nixpkgs/archive/48dfc9fa97d762bce28cc8372a2dd3805d14c633.tar.gz";
        sha256 = "0mqq9hchd8mi1qpd23lwnwa88s67ac257k60hsv795446y7dlld2";
      }) {};
    in pkgs.mkShell {
      buildInputs = [ pkgs.redis];
    }

If I distribute that nix expression, and say "I ran it with nix version 2.3", that is sufficient for anyone to get a bit-for-bit identical redis binary. Even if the binary cache (which lets me not compile it) were to go away, that nixpkgs revision expresses the build instructions, including the exact version of gcc. Sure, if the binary cache were deleted, it would take multiple hours for everything to compile, but I'd still end up with a bit-for-bit identical copy of redis.

This is true of the majority of nix packages. All commands are run in a sandbox with no access to most of the filesystem or network, encouraging reproducibility. Network access is mediated by special functions (like fetchTarball and fetchGit) which require including a sha256.

All network access going through those specially denoted means of network IO means it's very easy to back up all dependencies (i.e. the redis source code referenced in [1]), and the sha256 means it's easy to use mirrors without having to trust them to be unmodified.

It's possible to make an unreproducible nix package, but it requires going out of your way to do so, and rarely happens in practice. Conversely, it's possible to make a reproducible dockerfile, but it requires going out of your way to do so, and rarely happens in practice.

Oh, and for bonus points, you can build reproduible docker images using nix. This post has a good intro to how to play with that [2].

[0]: https://github.com/docker-library/redis/blob/bfd904a808cf68d...

[1]: https://github.com/NixOS/nixpkgs/blob/a7832c42da266857e98516...

[2]: https://christine.website/blog/i-was-wrong-about-nix-2020-02...

ahmedtd · on Aug 13, 2020

Unless something changed in the months since I have used Nix, this will not get you bit-for-bit reproducible builds. Nix builds its hash tree from the source files of your package and the hashes of its dependencies. The build output is not considered at any step of process.

I was under the impression that Nix also wants to provide bit-for-bit reproducible builds, but that that is a much longer term goal. The immediate value proposition of Nix is ensuring that your source and your dependencies' source are the same.

euank · on Aug 13, 2020

You're right that nixos / all nix packages isn't/aren't perfectly reproducible.

In practice, most of the packages in the nixos base system seem to be reproducible, as tested here: https://r13y.com/

Naturally, that doesn't prove they are perfectly reproducible, merely that we don't observe unreproducibility.

Nix has tooling, like `nix-build --check`, the sandbox, etc which make it much easier to make things likely to be reproducible.

I'm actually fairly confident that the redis package is reproducible (having run `nix-build --check` on it, and seen it have identical outputs across machines), which is part of why I picked it as my example above.

However, I think my point stands. Dockerfiles make no real attempt to enforce reproducibility, and rarely are reproducible.

Nix packages push you in the right direction, and from practical observation, usually are reproducible.

beefee · on Aug 13, 2020

This is true, but the Nix sandbox does make it a little easier. If you're going for bit-for-bit reproducibility, it has some nice features that help, like normalizing the date, hostname, and so on. And optionally you can use a fixed output derivation where you lock the output to a specific hash.

juliosueiras · on Aug 13, 2020

the focus of nix in the build process is the ideal of if you have three build inputs bash 4, gcc 4.8.<patch>, libc <whatever version> , and the source of the package being the same(hash-wise) , the output is very much(for most cases) going to be the same, since nix itself(even on non-nixos) uses very little of the system stuff, it won't be using the system libc, gcc, bash, ncurses, etc, it will use its own that is lock to a version down to the hash, it follow a target(with exact spec) -> output , where as Dockerfile more resemble, of a build that is output first , and not doing build very often, this is why Nix have their own CI/CD system, Hydra to allow ensure daily or even hourly safety of reproducible builds

quotemstr · on Aug 13, 2020

Exactly. Basically, if your product needs network access during build, you don't have a reproducible build, and if you don't have a reproducible build, it's only a matter of time before something goes horribly wrong.

euank · on July 24, 2020

This proposal is more powerful than that one rust macro, but rust's abilities around embedding files are much more powerful than go's approach.

This proposal allows "go build" to embed things in a very specific way, but it's not meant to be extensible.

Rust's 'include_bytes!' macro on the other hand is a macro in the stdlib that can be emulated in an external library. I'm fairly sure every feature of go's embed proposal could be implemented via a rust macro outside the stdlib.

For a specific example, I had a project where I wanted to serve the project's source code as a tarball (to comply with the AGPL license of the project). I was able to write a rust macro that made this as easy as effectively "SOURCE_TARBALL = include_repo!()" [0] to embed the output of 'git archive' in my binary.

Of course, there's a very conscious tradeoff being made here. In rust, "cargo build" allows arbitrary execution of code for any dependency (trivially via build.rs), while in go, "go build" is meant to be a safe operation with minimal extensibility, side effects, or slowdowns.

[0]: https://github.com/euank/include-repo

IshKebab · on July 24, 2020

There is a problem with Rust's "anything goes" approach though - it makes it really difficult to know the inputs to compilation. That makes build systems, IDEs, sccache etc. way less robust.

codys · on July 24, 2020

No more than the alternates here:

- go tooling (ides, etc) have to be taught about the _specific_ embedding in the same way one could teach rust tooling about specifically `include_bytes()` (or any other specific macro in the same way one teaches go tooling to handle specific pragmas)

In the world of rust build scripts, there is tooling that exposes information about which files are used if dependency info is all that is required (I don't know to what extent imperative macros are able to expose similar info).

The core of how I see the comparison here: if we restrict ourselves to the capabilities of go pragmas in rust, the same level of support is possible, but even without that restriction there are ways to obtain (though with more work) the same info.

zenhack · on July 24, 2020

> Of course, there's a very conscious tradeoff being made here. In rust, "cargo build" allows arbitrary execution of code for any dependency (trivially via build.rs), while in go, "go build" is meant to be a safe operation with minimal extensibility, side effects, or slowdowns.

I've been working off and on on a language that tries to get the best of both worlds to some extent. The whole language is built around making sandboxing code natural and composable. Like Rust, it has a macro system, so lots of compile time logic is possible without adding complexity to the build system, but macros don't have access to anything but the AST you give them, so they are safe to execute. There's a built in embed keyword that works like Rust's include_bytes, which runs before macro expansion, which you can use to feed the contents external files to macros for processing. At some point I'll probably add a variant that lets you pass whole directory trees.

nicoburns · on July 24, 2020

I believe that despite the macro syntax, `include_bytes!()` is implemented as compiler magic.

erk__ · on July 24, 2020

It is but it could be implemented in a macro,which is what I think the above commenter means.

euank · on Jan 30, 2020

Go's channels aren't exposed outside a specific go process's runtime. The runtime doesn't give you any convenient way to redirect them. They're not like erlang's mailboxes at all in that regard.

Furthermore, channels aren't the primitive used for multiplexing IO / handling connections on a socket in go. You typically have a goroutine (e.g. 'http.ListenAndServe' spins up goroutines), and the gorutines are managed not by channels, but by the internal go runtime's scheduler and IO implementation (which internally uses epoll).

Because of all those things, replacing a running go process that's listening on sockets is no different from that same problem in C. You end up using SO_REUSEPORT and then passing the file-descriptors to the new process and converting them back into listeners. Channels don't end up factoring into it meaningfully.

If you're interested in what this looks like, cloudflare wrote a library called tableflip [0] which does this. I also forked that library [1] to handle file-descriptor handoff in a more generic way, so I've ended up digging pretty deeply into the details of how this works in go.

[0]: https://github.com/cloudflare/tableflip

[1]: https://github.com/ngrok/tableroll