
One problem is that you can't filter its "syscalls" as you can regular syscalls. This removes a security boundary that e.g. container runtimes regularly use. So you cannot use it in your regular Kubernetes cluster without weakening security for the pods that use it.


This just reinforces the (maybe unfounded) impression that security is a secondary consideration, and performance is primary.

I'd use io_uring in a heartbeat on a dedicated system where the job is only I/O and security isolation isn't a concern. But multiuser/multiapplication/networked? Not a chance.


I think there is a very small amount of overlap between the people who

1. know what io_uring is

2. are interested in performance enough to look at improvements based on new Linux kernel system calls and talk about it in public

3. care about security in multitenant environments or the syscalls used by third party libraries

I think io_uring right now probably makes a lot of sense for HPC and highly technical, performance-sensitive financial stuff, but they can be kind of insular. I don't think most Linux hobbyists really need the performance benefits enough to care about it, and most businesses are using a major cloud vendor or don't have the scale or expertise to be thinking about this kind of stuff. Which leaves major cloud providers and really big businesses like Meta with their own internal clouds as the ones that stand to benefit enough to care about performance while really caring about security.


For me it's less about performance than cleaner concurrency. Did you know that (unless this has changed recently) io_uring is the only way to open a file asynchronously? Erlang and GHC both have lightweight threads/processes that use asynchronous I/O (for sockets, say), but they keep a separate OS thread pool to be able to do stuff like open files. io_uring lets you write an actual multitasking OS-like thing that runs in a single Linux thread.
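
For contrast, here is roughly what the thread-pool workaround looks like, sketched with Python's asyncio. This shows the general pattern only, not how Erlang or GHC actually implement it, and `open_async`, `read_text_async` and `demo` are made-up names for illustration.

```python
import asyncio
import os
import tempfile


async def open_async(path, mode="r"):
    # open() can block on a slow disk or network filesystem, so hand it
    # to a worker thread instead of calling it on the event-loop thread.
    return await asyncio.to_thread(open, path, mode)


async def read_text_async(path):
    f = await open_async(path)
    try:
        # read() can block too, so it takes the same detour.
        return await asyncio.to_thread(f.read)
    finally:
        f.close()


def demo():
    # Write a small temp file, then read it back through the thread pool.
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "w") as f:
            f.write("hello")
        return asyncio.run(read_text_async(path))
    finally:
        os.remove(path)
```

The event loop stays single-threaded, but every file open still costs a hop to another OS thread, which is the overhead that an io_uring-based runtime could avoid.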


There should be no issue with disabling it altogether by banning its setup and usage syscalls.


Which would be prone to misconfiguration, accidents and exploits. Better to not include it at all.


Are you saying it’s impossible to misuse disabling the accept syscall but it’s prone to misconfiguration with disabling io_uring_enter?


I'm saying that just compiling a kernel with stuff not compiled in is misuse-proof. That way you can disable io_uring entirely (but not accept()).
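
To check whether a given kernel was built that way, something like the following sketch works on the text of a kernel config file. `io_uring_build_state` is a hypothetical helper name, and where the config text comes from (for example a file under /boot, or /proc/config.gz if the kernel exposes it) depends on the distro, so the function just takes a string.

```python
def io_uring_build_state(config_text):
    """Return 'y', 'n', or None if CONFIG_IO_URING is not mentioned."""
    for raw in config_text.splitlines():
        line = raw.strip()
        # Kernel .config files record disabled options as a comment.
        if line == "# CONFIG_IO_URING is not set":
            return "n"
        if line.startswith("CONFIG_IO_URING="):
            return line.split("=", 1)[1]
    return None
```

A result of `None` only means the option does not appear in the text that was passed in; it says nothing about the running kernel.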


Yup, but that leads to io_uring devs complaining that people dislike software that uses io_uring because it doesn't run in containers and other environments that block io_uring entirely.


Isn't the issue here just that io_uring needs to be enhanced such that, when a seccomp-bpf filter is installed, the filter gets called to approve each SQE, before it gets executed?
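
A toy model of the idea, in Python. This is not kernel code and every name in it is hypothetical; real seccomp filters look at syscall numbers and arguments, which is simplified here to a name. It only illustrates why a filter that sees `io_uring_enter` cannot see the operations queued behind it, and what a per-SQE hook would change.

```python
from dataclasses import dataclass


@dataclass
class Sqe:
    opcode: str  # stand-in for an IORING_OP_* value, e.g. "openat"
    arg: str


DENIED = {"openat"}  # example policy: this sandbox may not open files


def syscall_filter(name):
    # Stand-in for a seccomp-style filter: it only sees the call's name.
    return name not in DENIED


def direct_syscall(name, arg):
    if not syscall_filter(name):
        return "EPERM"
    return f"{name}({arg}) ok"


def ring_enter(sqes, per_sqe_filter=None):
    # One io_uring_enter-like call submits many operations at once.
    # Today's filter is consulted once, for the enter call itself.
    if not syscall_filter("io_uring_enter"):
        return ["EPERM"] * len(sqes)
    results = []
    for sqe in sqes:
        # Proposed extension: ask the filter again for each queued entry.
        if per_sqe_filter is not None and not per_sqe_filter(sqe.opcode):
            results.append("EPERM")
        else:
            results.append(f"{sqe.opcode}({sqe.arg}) ok")
    return results
```

In this model `direct_syscall("openat", ...)` is refused, the same operation submitted through `ring_enter` goes through, and passing `per_sqe_filter=syscall_filter` closes the gap.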


That can be done, but reading https://lwn.net/Articles/902466/, writers of security tools are unhappy that:

- io_uring initially was conceived without considering security or auditing tools

- io_uring later was changed to allow ioctl calls, even though security people do not like ioctl because what its arguments mean depends on the device being called (possibly even on the version of the driver), not on the type of device, and often is poorly documented, making it hard for a security filter to decide what to do with a command.

That also made them fear that similar security-breaking changes might be made in the future.


I don't think this is an appropriate use of "just". If io_uring doesn't work with seccomp-bpf filters today, there are many situations where you just can't use it, period.

That someone with kernel IO dev experience may be able to add such a feature relatively easily in the future (though I doubt that, given that it apparently hasn't been implemented yet) doesn't make it a small problem.


I believe you can deny io_uring altogether with the syscalls io_uring_enter, io_uring_register, io_uring_setup?

This would be useful if you want to boot with io_uring but deny it for some sensitive workloads.
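
As a sketch of what that could look like for containers, the snippet below builds a seccomp profile in the JSON format that Docker accepts, denying just those three syscalls. The allow-by-default action is an assumption made to keep the example short (real profiles are usually allowlists), and `deny_io_uring_profile` is a made-up helper name.

```python
import json

IO_URING_SYSCALLS = ["io_uring_setup", "io_uring_enter", "io_uring_register"]


def deny_io_uring_profile(errno_ret=1):
    # Allow everything by default (assumption, for brevity), then make
    # the three io_uring syscalls fail with errno_ret (1 is EPERM on Linux).
    return {
        "defaultAction": "SCMP_ACT_ALLOW",
        "syscalls": [
            {
                "names": list(IO_URING_SYSCALLS),
                "action": "SCMP_ACT_ERRNO",
                "errnoRet": errno_ret,
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(deny_io_uring_profile(), indent=2))
```

Saved to a file, the output can be passed to a single container with `docker run --security-opt seccomp=profile.json ...`, which leaves io_uring available to everything else on the host.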


What regular filter for syscalls do you use?


seccomp BPF, eBPF, in a way SELinux/AppArmor/Tomoyo/..., maybe you can even call namespaces some kind of syscall filter. And then there is the auditing framework, where you can at least record which critical syscalls were performed.

Nowadays it's mostly a combination of eBPF, SELinux and auditd, plus namespaces in the case of containers. Usually in whatever combination the distro ships, so nothing really fancy.


seccomp-bpf, for instance.


Is that a true limitation that cannot be overcome? Are solutions possible and/or available, but require further work to be shipped?



