>If a handle is provided to a directory, it should imply access to resources wit...

wahern · on Nov 28, 2018

> Further, if your process is treating '..' as a subdirectory, you're doing it wrong. Paths must be normalized (e.g. ~ expanded, . and .. resolved) before requesting a handle via absolute path.

".." is required for atomic traversal (similar to symlinks), which is important in some situations, such as making sure a file tree you've descended isn't removed out from under you in a way that breaks your state. (The directory itself can be moved while you hold the handle, but the important thing is that being able to rely on ".." permits certain algorithms that are safer and more consistent.) Canonicalizing paths introduces race conditions between canonicalization and actual access, which is why this is performed in the kernel.

Canonicalizing paths makes sense if you're accepting paths from untrusted sources and you cannot make use of POSIX openat() + extensions like O_BENEATH. HTTP GET requests are supposed to be idempotent, anyhow, so there shouldn't exist any sort of race condition as a conceptual matter.
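A minimal sketch of the race described above, assuming a Unix-like system where Python's `os.open(..., dir_fd=...)` maps to `openat()`. The directory names and the helper `parent_survives_rename` are made up for illustration. It holds a handle to a directory, renames the parent so that the canonical path string goes stale, and then asks the kernel for ".." relative to the handle.

```python
import os
import tempfile


def parent_survives_rename():
    """Return True if ".." from an open directory handle tracks a rename."""
    with tempfile.TemporaryDirectory() as tmp:
        os.makedirs(os.path.join(tmp, "a", "b"))
        # Hold a handle to the leaf directory.
        leaf = os.open(os.path.join(tmp, "a", "b"), os.O_RDONLY | os.O_DIRECTORY)
        try:
            # Something else renames the parent; the old path string is now stale.
            os.rename(os.path.join(tmp, "a"), os.path.join(tmp, "c"))
            stale = not os.path.exists(os.path.join(tmp, "a", "b"))
            # openat(leaf, "..") is resolved by the kernel from the handle itself.
            parent = os.open("..", os.O_RDONLY | os.O_DIRECTORY, dir_fd=leaf)
            try:
                same = os.path.samestat(
                    os.fstat(parent), os.stat(os.path.join(tmp, "c"))
                )
            finally:
                os.close(parent)
        finally:
            os.close(leaf)
        return stale and same
```

A path canonicalized before the rename would point at nothing afterwards, while the handle-relative ".." still lands on the real parent.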

But in regular software, it's better to just pass paths as-is. The shell performs "~" expansion, but it doesn't resolve "." or "..". And the shell performs other expansions, like file globbing, which there's rarely any reason to implement in regular software. Supporting "~" expansion but not file globbing is inconsistent; if you're not the shell, don't implement shell-like features, as it just creates confusion and unnecessary complexity.

Calling ".." a holdout from POSIX is misleading. The semantics exist for legitimate and even security-relevant reasons, albeit not necessarily reasons that an embedded smartphone OS may care about. What POSIX lacks is O_BENEATH (from Capsicum) or a similar flag for tagging a directory descriptor to prevent relative paths or otherwise ascending the tree. Capsicum extensions make POSIX perfectly capable of implementing strict capability semantics, and they do so by extending semantics in a manner consistent with POSIX. POSIX isn't inherently broken in this regard, it's just perhaps too feature rich for some use cases and not feature rich enough for others.
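For illustration only, here is a userspace approximation of what a "beneath" restriction is meant to enforce, assuming no kernel flag is available from Python. The helper name `open_beneath` is hypothetical, and the sketch is deliberately stricter than a kernel-side flag would be: it walks one component at a time with `O_NOFOLLOW` and refuses every "..", every absolute path, and every symlink.

```python
import os


def open_beneath(dirfd, relpath, flags=os.O_RDONLY):
    """Open relpath under dirfd, refusing "..", absolute paths, and symlinks.

    Hypothetical helper; not equivalent to a kernel O_BENEATH.
    """
    if relpath.startswith("/"):
        raise PermissionError("absolute path not allowed: " + relpath)
    parts = [p for p in relpath.split("/") if p not in ("", ".")]
    if not parts:
        raise ValueError("empty path")
    if ".." in parts:
        raise PermissionError("'..' not allowed: " + relpath)
    cur = os.dup(dirfd)  # work on a copy; never close the caller's descriptor
    try:
        for name in parts[:-1]:
            # O_NOFOLLOW makes the open fail if this component is a symlink.
            nxt = os.open(
                name, os.O_RDONLY | os.O_DIRECTORY | os.O_NOFOLLOW, dir_fd=cur
            )
            os.close(cur)
            cur = nxt
        return os.open(parts[-1], flags | os.O_NOFOLLOW, dir_fd=cur)
    finally:
        os.close(cur)
```

Doing this in the kernel avoids the extra system calls and allows symlinks and ".." that provably stay inside the tree, which is the argument for a dedicated flag.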

There's no end of complaints about POSIX--either it's too complex or too simple. The fact is, no single specification will ever please everybody, so griping about how POSIX is broken is not only pointless but betrays a failure to appreciate the underlying issues. The alternative to POSIX is basically nothing. Standardizing on Linux is, at best, a lateral move. Adding optional components to specifications has proven the worst of all worlds, and like most other standards POSIX has been slowly removing some optional components altogether while making others mandatory.

cyphar · on Nov 29, 2018

As a complete aside, I'm currently working on a patchset to add O_BENEATH (and some other path resolution restriction flags) to Linux.

https://lore.kernel.org/patchwork/cover/1011459/

loeg · on Nov 29, 2018

Thanks for working to incorporate it in Linux.

FreeBSD has moved to be compatible with the proposed Linux O_BENEATH as well[1]. We're hoping this helps write portable, capability-restricted code between FreeBSD and Linux.

[1]: https://svnweb.freebsd.org/base?view=revision&revision=33974...

(We already had very similar functionality for Capsicum-restricted directory fds to piggy-back off of.)

cyphar · on Nov 29, 2018

Yes, though FreeBSD's O_BENEATH would probably have to be done using (O_BENEATH|O_NOMAGICLINKS|O_XDEV). Linus, Andy and Al were very insistent that various parts should be split into separate flags.

ken · on Nov 28, 2018

> Further, if your process is treating '..' as a subdirectory, you're doing it wrong.

That's the point they make. And in Unix, it's really easy to do it the "wrong" way (like, "type 3 characters" easy), so lots of people do.

Back before protected memory was common on PCs, some people said "If you're chasing pointers without being sure that it points to a valid address, you're doing it wrong". Well, perhaps so, but in practice lots of programs were doing it wrong, and it was the users who suffered.

Capabilities sound to me a bit like protected memory for persistent storage. It'll be a little inconvenient for a little while, and eventually we'll wonder how we ever lived without it.

wahern · on Nov 28, 2018

> "If you're chasing pointers without being sure that it points to a valid address, you're doing it wrong". Well, perhaps so, but in practice lots of programs were doing it wrong, and it was the users who suffered.

Except ".." also is a solution to permitting you to traverse a directory tree without accidentally chasing an invalid pointer; no bug in your program nor any external application can ever make ".." invalid. ".." is equivalent to a back pointer in a linked list or tree structure. Imagine you have a handle to /foo/bar/baz and want to ascend to baz's parent. Between acquiring the handle to /foo/bar/baz and attempting to ascend to the parent, "baz" could have been moved and "/foo/bar" may not exist. Without "..", all of a sudden you're orphaned and your application is stuck in an inconsistent state. Maybe the best solution is to just panic, but that's like saying that all applications should be prepared for any pointer access to segfault at any moment. That's one solution, and it's actually how some smartphone application environments work. Another solution is guaranteeing the condition can't happen, period. Which is preferable depends on your use cases and which side of an interface you'd prefer to place the burden. For pointers it's fairly obvious which provides the most preferable semantics for maximum safety (or it was until the smartphone and cloud paradigms), but for file systems the answer is less obvious. In Unix, when ascending directory trees, the scenario of a valid pointer becoming invalid is impossible, such that there's always a valid path to the root of the tree (even if the depth can change; you just stop when opening ".." simply reopens the same directory), but for descent there remains the race between readdir + open (as opposed to readdir atomically returning file handles).
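The ascent loop mentioned in the parenthetical can be sketched as follows, assuming a Unix-like system and Python's `dir_fd` support. The function name `depth_to_root` is hypothetical. It keeps opening ".." relative to the current handle and stops when the parent is the same directory as the current one. The opens use `O_RDONLY`, so they can still fail with a permission error on an unreadable ancestor.

```python
import os


def depth_to_root(dirfd):
    """Count how many ".." steps lead from dirfd to the root of its tree."""
    cur = os.dup(dirfd)  # work on a copy; the caller keeps its descriptor
    steps = 0
    try:
        while True:
            parent = os.open("..", os.O_RDONLY | os.O_DIRECTORY, dir_fd=cur)
            # At the root, ".." reopens the same directory.
            at_root = os.path.samestat(os.fstat(parent), os.fstat(cur))
            os.close(cur)
            cur = parent
            if at_root:
                return steps
            steps += 1
    finally:
        os.close(cur)
```

No path string is ever built or parsed here, so a concurrent rename of any ancestor changes the count at most, never the validity of the walk.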

These particular semantics were clearly purposeful, not an accident. The former was thought useful, the latter an acceptable simplification. It's why in Unix you can't delete a populated directory in or when a process holds an open reference (unlike other file types), and why you can't hardlink directories.

ken · on Nov 28, 2018

> no bug in your program nor any external application can ever make ".." invalid

Sure it can, if you consider lack of permissions to be invalid. We already do, in other similar situations.

> Without ".." all of a sudden you're orphaned and your application is stuck in an inconsistent state.

No, it just means you need an out-of-band method to accomplish this.

> that's like saying that all applications should be prepared for any pointer access to segfault at at any moment

No, nobody's talking about crashing. It's more like saying you can't assume you can do raw pointer arithmetic to jump around in an array. Languages like Java and Python feel restrictive to C programmers at first, too.

> Without ".." all of a sudden you're orphaned and your application is stuck in an inconsistent state.

I don't understand these claims of races and segfaults. Doesn't Fuchsia avoid race conditions like this with VFS cookies?

blablabla123 · on Nov 29, 2018

> > Further, if your process is treating '..' as a subdirectory, you're doing it wrong.

> That's the point they make. And in Unix, it's really easy to do it the "wrong" way (like, "type 3 characters" easy), so lots of people do.

The OP is writing about iterating subdirectories and accidentally treating ".." as such, if you use certain low level functions - which is kind of bad practice in most cases anyway.

> That's the point they make. And in Unix, it's really easy to do it the "wrong" way (like, "type 3 characters" easy), so lots of people do.

Yeah, pointing to a wrong directory can lead to unexpected results. On the other hand, if this becomes a security issue, then maybe the process should have properly restricted rights as pointed out elsewhere already.

Talking about "..", one could BTW extend the discussion to mixing filename characters with path separator characters in the same string. ;)

ninkendo · on Nov 28, 2018

The answer to that for a capability-based system would be to not grant a process access to /Users, but instead give it an opaque handle that grants access to /Users/delinka. It's definitely not how unix systems work (where you need read access to all of the parents to access a child directory), but in a capability-based system it makes sense IMO.

cyphar · on Nov 29, 2018

You don't need read access to all parents (on Linux and BSDs at least) -- a privileged process can pass a dirfd to a less privileged process and that process can access paths under that dirfd without any permission checks being done for parent directories of the dirfd.
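A rough sketch of that handoff, assuming a Unix-like system and Python 3.7+. The file names and the helper `read_via_inherited_dirfd` are invented for the example, and the child here runs as the same user (actually dropping privileges is left out). The parent opens a directory, removes all permissions from the directory above it, and lets a child process inherit the descriptor via `pass_fds`.

```python
import os
import subprocess
import sys
import tempfile

# Child program: it only knows the descriptor number, not any path.
CHILD = (
    "import os, sys\n"
    "fd = int(sys.argv[1])\n"
    "f = os.open('note.txt', os.O_RDONLY, dir_fd=fd)\n"
    "sys.stdout.write(os.read(f, 100).decode())\n"
)


def read_via_inherited_dirfd():
    with tempfile.TemporaryDirectory() as tmp:
        parent = os.path.join(tmp, "parent")
        child_dir = os.path.join(parent, "child")
        os.makedirs(child_dir)
        with open(os.path.join(child_dir, "note.txt"), "w") as f:
            f.write("hello")
        dirfd = os.open(child_dir, os.O_RDONLY | os.O_DIRECTORY)
        try:
            # Make the parent directory untraversable by path.
            os.chmod(parent, 0o000)
            result = subprocess.run(
                [sys.executable, "-c", CHILD, str(dirfd)],
                pass_fds=(dirfd,),  # keep this descriptor open in the child
                capture_output=True,
                text=True,
                check=True,
            )
            return result.stdout
        finally:
            os.chmod(parent, 0o700)  # restore so cleanup can proceed
            os.close(dirfd)
```

The lookup of `note.txt` starts at the directory the descriptor refers to, so the permissions on `parent` are never consulted.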

nine_k · on Nov 29, 2018

This is correct. But this is not traversing a path, or even knowing it.

loeg · on Nov 29, 2018

The comment was refuting the claim, "where you need read access to all of the parents to access a child directory."

daurnimator · on Nov 29, 2018

You don't need read access, only execute access.
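A small sketch of that distinction, assuming a Unix-like system. The helper `probe_search_only_dir` and the file name are hypothetical. It sets a directory to execute-only (`--x`) and checks what still works. When run as root, the permission bits are typically bypassed, so listing may succeed there.

```python
import os
import tempfile


def probe_search_only_dir():
    """Return (can_list, can_open) for a directory with mode --x------."""
    with tempfile.TemporaryDirectory() as tmp:
        d = os.path.join(tmp, "d")
        os.mkdir(d)
        with open(os.path.join(d, "known.txt"), "w") as f:
            f.write("x")
        os.chmod(d, 0o100)  # execute (search) only, no read
        try:
            try:
                os.listdir(d)  # needs read permission on the directory
                can_list = True
            except PermissionError:
                can_list = False
            try:
                # Needs only search permission, given that the name is known.
                with open(os.path.join(d, "known.txt")) as f:
                    can_open = f.read() == "x"
            except PermissionError:
                can_open = False
        finally:
            os.chmod(d, 0o700)  # restore so cleanup can proceed
        return can_list, can_open
```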

Ajedi32 · on Nov 28, 2018

Sure, you can do it that way; but now that process being run by bob has full access to everything owned by bob (including, for example, `/Users/bob/.ssh/id_rsa`).

Fuchsia considers that level of access to be unacceptably broad for most applications, which is why it uses a capability-based permissions model instead of a user-based one.

cryptonector · on Nov 28, 2018

Indeed, this ship has sailed. And if .. did not exist, then chdir(2) would be the same as chroot(2), unless knowing an absolute path was enough to allow you to access it (assuming --x permissions on the path's dirname's directory components). In that case, yeah, you wouldn't gain that much, as many paths can be guessed.

There just isn't a short-cut for making sandboxes trivial to set up.

I really wish that Solaris/Illumos Zones were standard on Linux. You could have really light-weight containers as anonymous/ephemeral zones whose "init" is the program you want to sandbox, and more heavy-duty guest-like containers as Zones already is.

The difference between Zones (or BSD jails) and Linux containers is that with Zones (jails) you have to explicitly decide what to share with the zone, while with clone(2) you have to be explicit about all the things you DON'T want to share with the container. I.e., Zones require white-listing while containers require black-listing, and we all know that black-listing doesn't work as a security device. Granted, the kernel developers could have forgotten to virtualize something important, but when they fix that you don't have to modify and rebuild the zone/jail launcher.

kiriakasis · on Nov 28, 2018

> unless knowing an absolute path was enough to allow you to access it

If I understand correctly, in Fuchsia an "absolute path" is always relative to a filesystem handle, so knowing it and being able to use it are pretty similar.

cryptonector · on Nov 28, 2018

Ok, that works, though you pay a price: you have to keep track of a fair number of such handles. You'll need one for /usr/bin, and for /bin, and all the lib and libexec and share and varstate directories, and /etc. You do get to not let processes see $HOME if you don't want to, and that's very nice.

In a shell one would have to expose a path->handle dictionary for scripts.
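One way such a dictionary could look, as a sketch only: the class `HandleTable` and its methods are hypothetical, and it assumes a Unix-like system with `dir_fd` support. Keys are the path prefixes a script is allowed to name, values are directory descriptors, and a lookup picks the longest granted prefix. Without a "beneath"-style restriction on the final open, a relative part containing ".." could still climb out, so this shows only the bookkeeping.

```python
import os


class HandleTable:
    """Hypothetical path-prefix -> directory descriptor table."""

    def __init__(self):
        self._fds = {}

    def grant(self, prefix, real_dir):
        # The visible prefix need not match where the directory really lives.
        self._fds[prefix.rstrip("/")] = os.open(
            real_dir, os.O_RDONLY | os.O_DIRECTORY
        )

    def open(self, path, flags=os.O_RDONLY):
        # Pick the longest granted prefix that contains the path.
        for prefix in sorted(self._fds, key=len, reverse=True):
            if path.startswith(prefix + "/"):
                rel = path[len(prefix) + 1:]
                return os.open(rel, flags, dir_fd=self._fds[prefix])
        raise PermissionError("no handle covers " + path)

    def close(self):
        for fd in self._fds.values():
            os.close(fd)
        self._fds.clear()
```

A script would then call something like `table.open("/etc/hosts")`, and anything outside the granted prefixes fails before any system call is made.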

Faark · on Nov 29, 2018

You mean key->handle dict? Most environment variables on my Windows machine seem to be what you are describing, except currently being path strings instead of file system handles.

loeg · on Nov 28, 2018

I think you're confusing namespace and unix file permissions.

You can think of capability-restricted directory descriptors as (sort of) individual-fd chroots. File permissions still apply inside a chroot. But the namespace of anything outside the chroot is totally inaccessible.

xg15 · on Nov 28, 2018

> Lastly, this document reads as if knowing a full path grants access to that path and its subdirectories. If that's the case ... oy.

Well, they speak of the path as a "resource provided to a subprocess". In that context, it sounds more like a handle/file descriptor that the child process can pass to some "read", "write" or "get handles of children" syscalls - and that happens to correspond to the file object at /home/bob/foo.

If so, it wouldn't imply that knowing (or guessing) the string "/home/bob/foo" would automatically give you access to the handle.

That's just my reading of our though, no idea if that is what they actually do.

xg15 · on Nov 29, 2018

* our -> it

rzzzt · on Nov 28, 2018

NTFS has a toggle bit for access inheritance: http://www.ntfs.com/ntfs-permissions-explicit.htm

trasz · on Nov 28, 2018

Not just NTFS; it’s a common feature in systems that support NFSv4 ACLs, like Solaris or FreeBSD.

noja · on Nov 28, 2018

It does on Ubuntu. /home/* is readable and executable to everyone.

caymanjim · on Nov 28, 2018

Ubuntu, with default configuration, does create new user directories as globally-readable, but that's just a poor security decision on their part. There's nothing special about /home itself; it's just the default behavior of adduser. And the permissions on /home have no effect on what the permissions are on new subdirectories created when a user is added.

userbinator · on Nov 29, 2018

> Lastly, this document reads as if knowing a full path grants access to that path and its subdirectories. If that's the case ... oy.

Indeed. When I read...

> As a consequence, this implies that a handle to a directory can be upgraded arbitrarily to access the entire filesystem.

...I was wondering whether the author even knows what filesystem permissions are and how they work. I say let the filesystem handle resolving relative paths; and let the permissions system handle the check on whether one is allowed to access the referenced object.

ithkuil · on Nov 29, 2018

User-centric permissions are too broad. Classic example: you might not want your browser to ever access your own ssh private keys.

The topic here is to let a user start a process and pass a restricted view of the file system to that process, which in turn can spawn child processes to which it could restrict access even further. In order to make it possible to do useful work, it's sometimes necessary to also pass around handles/file descriptors between processes (possibly within different sandboxes), and it's a good idea that the rules governing the view narrowing are not broken.