Two pieces of code someone really ought to write (rondam.blogspot.com)
57 points by lisper on Aug 28, 2010 | 34 comments


There are a few attempts to make sockets look like a Unix filesystem, but none seem to have caught on, perhaps partly because sockets are pretty portable, while fs-like interfaces, unless one gets widespread support to the extent of being a POSIX-like "assumed to be on all UNIXes", are inevitably tied to a particular OS flavor.

I believe Plan9 was the first to have support for the idea, though, and NetBSD has had it as an option for about 15 years, though I'm not sure if it's currently maintained. Here's an old paper about NetBSD's implementation: http://www.kohala.com/start/portals.ps

The existence of netcat also makes it a bit less pressing imo. In the cases where I want a quick shell script that does sockets, I can use netcat as basically a socket library. It's not the everything-is-a-file style of Unixy coding, but it is at least the pipe-simple-utilities style of Unixy coding. And if I want to write a "real app" in some higher-level language, the language can wrap socket-opening at the library level to look like file-opening if it wants to, so it doesn't really have to be done by the OS.
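
To make the library-level option concrete, here is a minimal Python sketch. `open_tcp` is a made-up helper name, not a standard function, and the loopback server in `demo` exists only so the example runs without touching the network.

```python
import socket
import threading


def open_tcp(host, port, timeout=5.0):
    """Hypothetical helper: connect to host:port, return a file-like object."""
    sock = socket.create_connection((host, port), timeout=timeout)
    # makefile() wraps the socket in a buffered binary stream. The stream
    # keeps the connection alive, so we can drop our own reference here and
    # let closing the stream close the connection.
    stream = sock.makefile("rwb")
    sock.close()
    return stream


def _serve_once(server):
    """Accept one connection and send back one upper-cased line."""
    conn, _ = server.accept()
    with conn, conn.makefile("rwb") as f:
        line = f.readline()
        f.write(line.upper())
        f.flush()


def demo():
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]
    worker = threading.Thread(target=_serve_once, args=(server,))
    worker.start()
    # From here on the connection is used like an ordinary file object.
    with open_tcp("127.0.0.1", port) as f:
        f.write(b"hello\n")
        f.flush()
        reply = f.readline()
    worker.join()
    server.close()
    return reply
```

Calling `demo()` writes one line and reads the reply back through the same file-like object, with no OS support beyond plain sockets.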


FreeBSD has this (mount_portalfs, whose man page references the paper you link to), though I've only messed with it sparingly out of curiosity. Not sure how stable it is, but I thought it was really cool when I stumbled upon it one day.


socat is a little closer to what you really want than netcat: http://www.dest-unreach.org/socat/


> while fs-like interfaces, unless one gets widespread support to the extent of being a POSIX-like "assumed to be on all UNIXes"

FUSE runs on pretty much everything, doesn't it?


It runs on most popular Unix-ish systems.


Regarding the first point (sockets as files): the Plan 9 operating system doesn't use any system calls for sockets; it uses a pseudo-filesystem, /net, instead.

http://doc.cat-v.org/plan_9/4th_edition/papers/net/

Using the same approach in a FUSE implementation would make sense.


FWIW, Bash has this built in:

    cat </dev/tcp/www.google.com/80

It's unfortunately disabled in the bash that ships with Debian, but should work pretty much everywhere else.


gawk (GNU awk) has something similar too:

  The following special filenames may be used with the |&
  co-process operator for creating TCP/IP network connections.

  /inet/tcp/lport/rhost/rport  File for TCP/IP connection 
  on local port lport to remote host rhost on remote port 
  rport.  Use a port of 0 to have the system pick a port.


Is there anything similar in zsh? That could be quite helpful.


There are two much more powerful TCP interfaces built into zsh:

The zsh/net/tcp Module:

http://tinyurl.com/3alarxf

And zshtcpsys:

http://tinyurl.com/32z6bwe


If you allow this:

open("/fuse/sockets/www.whatever.com/tcp/80", "r+")

Then what do you do with these?

/fuse/sockets/www.whatever.com/tcp
/fuse/sockets/www.whatever.com
/fuse/sockets

They and other variations would each need different semantics, some allowing rw, some ro, some wo, and some not being allowed at all. Some path segments would not allow arbitrary names (tcp/udp, port number) while others require a specific format (host/ip) and others are arbitrary. Some have to be directories, some have to be files, and some are neither. None of this is very filesystem-like. I think the current design is right: opening sockets is kind of special, but you get a filehandle that is very much like an ordinary filehandle.
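
A rough Python sketch of the per-segment rules such a filesystem would have to enforce. The `/fuse/sockets` layout comes from the example above; the `classify` function and its labels are invented here for illustration and are not part of any real FUSE module.

```python
PROTOCOLS = {"tcp", "udp"}  # the only names allowed in the protocol segment


def classify(path, root="/fuse/sockets"):
    """Say what kind of object a path under the hypothetical mount must be."""
    if path != root and not path.startswith(root + "/"):
        return "invalid"
    parts = [p for p in path[len(root):].split("/") if p]
    if not parts:
        return "root-dir"      # a directory that cannot list every host
    if len(parts) == 1:
        return "host-dir"      # arbitrary name, but must look like a host/IP
    if parts[1] not in PROTOCOLS:
        return "invalid"       # fixed vocabulary, not an arbitrary name
    if len(parts) == 2:
        return "proto-dir"     # a directory with up to 65535 implied entries
    port = parts[2]
    if len(parts) == 3 and port.isascii() and port.isdigit() and 0 < int(port) < 65536:
        return "endpoint"      # the only thing that can be opened read/write
    return "invalid"
```

Each depth gets a different rule, which is the point being made: `classify("/fuse/sockets/www.whatever.com/tcp/80")` is an endpoint, while every prefix of it has to be some special kind of directory.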


> Then what do you do with these?

The same things you do with any other incomplete path name. I don't see why this is an issue at all.


There's no such thing as an "incomplete path name"; those have to be directories.


So treat them as directories (or more specifically, as mount points). I still don't see the problem.


Treat them as empty, read only directories?


The first issue is a programming language problem, not an OS problem. His language of choice exposes the direct low-level API, which is not usually what application-level programmers want.

A few library functions, and the problem is solved without FUSE or a performance impact. And it's still portable to everywhere.


But then you have to implement the solution in every language, while with FUSE, you implement it once and each language runtime instantly has access to it.


Knock yourself out, then. Every language I use (including C) already has a sugary socket API, though.


Regarding your first point... You forget one thing: you open a file and get a file descriptor. If you open a socket (though it takes 3 calls), it also returns a file descriptor. Sockets can then be written to and read from just like files. I won't even get started on the fact that sockets let you do just about any type of networking, and are very abstract on purpose, to allow a variety of uses.
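
A small Python sketch of that claim, assuming a Unix-like system: once a socket exists, the generic descriptor calls `os.write` and `os.read` work on it. `roundtrip` is a name made up for this example, and `socket.socketpair()` is used only to get two connected sockets without any network setup.

```python
import os
import socket


def roundtrip(payload):
    """Send payload through a socket pair using only file-descriptor calls."""
    a, b = socket.socketpair()
    try:
        # os.write/os.read take a plain integer descriptor, the same calls
        # one would use on a descriptor returned by os.open().
        os.write(a.fileno(), payload)
        # A stream socket may return fewer bytes than requested; for a tiny
        # payload on a local pair a single read is enough in practice.
        return os.read(b.fileno(), len(payload))
    finally:
        a.close()
        b.close()
```

The only socket-specific part is how the descriptors were created; the reading and writing are ordinary file I/O.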


Sounds like opening a socket could be reduced to one call, so why not fopen()? Seems like it'd be an easy thing to get into a kernel, any reason not to?


The question is whether fopen() would then still work on embedded or archaic architectures. Examples such as radio links on tiny 16-bit hardware come to mind.


NO NO NO!!! For the sake of all that is holy don't do it!!!!

Or rather, read & understand the fallacies of network programming: http://www.jezuk.co.uk/cgi-bin/view/jez?id=2650 before you do it.

The failure mode of networks is quite different to that of files, and the programming model reflects that.


> The failure mode of networks is quite different to that of files

Actually, with a remote file system, the failure modes of networks are necessarily exactly the same as those of files.


Yes, and you'll note that a huge amount of the remote file driver code involves dealing with those failures.

If you drop that down a level and turn sockets into files then suddenly programs that are written to work with files will try to deal with these kind-of-file-but-not-really things, and fail in unexpected ways.
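
One concrete way this shows up, sketched in Python and assuming a Unix-like system: a descriptor that refers to a socket rejects operations that file-oriented code takes for granted, such as seeking. `probe` is a name invented for this example.

```python
import errno
import os
import socket
import stat


def probe():
    """Return (is_socket, errno raised by lseek) for a socket descriptor."""
    a, b = socket.socketpair()
    try:
        # fstat() works on any descriptor and reveals what it refers to.
        is_socket = stat.S_ISSOCK(os.fstat(a.fileno()).st_mode)
        try:
            # Rewinding is routine for regular files, but a socket has no
            # file offset to move.
            os.lseek(a.fileno(), 0, os.SEEK_SET)
            seek_errno = None
        except OSError as exc:
            seek_errno = exc.errno
        return is_socket, seek_errno
    finally:
        a.close()
        b.close()
```

On Linux the seek fails with `ESPIPE`, so any program that rewinds or re-reads its input would break if handed a socket dressed up as a file path.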


What about files hosted over the network? Such as on a SAN?


What about them? Every last point on that list applies to them, by necessity, because they automatically apply to all networks.


"But now suppose that I have a single machine with a single IP address hosting multiple virtual servers, and I want to replicate this setup for each virtual server, i.e. I want each virtual server to have its own instantiation of the custom server application. Now I have to manually assign each instantiation of the app to a separate TCP port number. If I have hundreds or thousands of virtual servers on the same machine (Oh? You think that's not reasonable? Can you say "multi-core architecture"?) that can become a serious administrative (to say nothing of security) nightmare."

Not quite sure I'm interpreting you correctly here - but wouldn't you be using localhost for this and/or using loopback interfaces?

Also - if you were designing an application that had to scale out to that many cores, you'd be dealing with things at a system level and using sockets anyway. Whether you are opening unix domain sockets or assigning ports, you still have to track and configure everything - and that can still be automated. While it would be a neat feature - it doesn't seem like a terribly necessary one.

Yes, it would be cool if apache could forward things to a local unix domain socket. Agreed there.

But there's no security nightmare unless you create one - in the instance you describe you bind applications to loopback interfaces, either on ports or creating more loopback interfaces. Designing the applications to use TCP sockets rather than unix domain sockets also leaves you with more flexibility when it comes to design decisions later on - you don't have to leave them on the same machine.

Also - one wouldn't generally use apache for this - one would use something else as a front end these days that's better suited to forwarding requests and dealing with timeouts, resources, etc.


> Not quite sure I'm interpreting you correctly here - but wouldn't you be using localhost for this and/or using loopback interfaces?

Sure. But you still have to assign a port number. And you still have to make a round trip through the network stack. It's also a potential security hole if you're not careful to configure the server apps to only listen on the loopback interface.

> Also - if you were designing an application that had to scale out to that many cores, you'd be dealing with things at a system level and using sockets anyway.

Why? Running N servers on N machines is completely straightforward. Why should it not be just as straightforward to run N servers on one machine with N cores?

> you still have to track and configure everything - and that can still be automated

Of course it can be automated, but automating the assignment of virtual servers to TCP/IP ports is not straightforward. You have to coordinate the port assignment on both the client and the server side. You have potential synchronization issues if one virtual server goes down and another comes up and wants to use the same port as the one that went down. Port numbers are a scarce resource. The file system namespace is essentially infinite. Why make things harder than they need to be?
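
A minimal Python sketch of the filesystem-namespace alternative, assuming a Unix-like system with Unix domain sockets. The server names, the `.sock` suffix, and the `start_servers`/`demo` helpers are all invented for illustration.

```python
import os
import socket
import tempfile


def start_servers(names, directory):
    """Bind one listening Unix domain socket per virtual server name."""
    servers = {}
    for name in names:
        # The socket's address is a path derived from the server name, so
        # there is no port number to assign or coordinate.
        path = os.path.join(directory, name + ".sock")
        srv = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        srv.bind(path)
        srv.listen(1)
        servers[name] = (srv, path)
    return servers


def demo():
    with tempfile.TemporaryDirectory() as directory:
        servers = start_servers(["alice.example", "bob.example"], directory)
        srv, path = servers["bob.example"]
        # The client only needs the name to find the right server.
        client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        client.connect(path)
        conn, _ = srv.accept()
        client.sendall(b"hi bob")
        data = conn.recv(64)
        client.close()
        conn.close()
        names = sorted(os.path.basename(p) for _, p in servers.values())
        for s, _ in servers.values():
            s.close()
        return names, data
```

Both sides derive the address from the virtual server's name, and access can be restricted with ordinary file permissions on the socket path or its directory.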


About the first point: several people have mentioned Plan 9; GNU awk also handles sockets as a virtual filesystem. You seem to think it's better handled at the OS level (and I agree), but it's a good way to try it out.


You can sorta get what he's asking for with a bash feature. http://tldp.org/LDP/abs/html/devref1.html#DEVTCP


Yes, but that's a hack built into bash.

The point of Unix is to do everything in the file system, so that anything that can open a file can do everything.


Nginx can proxy incoming connections to unix sockets

http://wiki.nginx.org/NginxHttpProxyModule#proxy_pass


My network programming background is a bit small. Could someone explain for me the difference between what OP calls for with sockets and a tun/tap device?


A tun/tap device is a way to write a user space network driver for a piece of hardware to make it look like a network device to the OS, i.e., you could make a serial cable look like a network device.

What this blog is asking for is a different way to access network resources: instead of calling a function like socket(some_ip, some_port, SOME_PROTOCOL); (which is a gross oversimplification of the typical way it is done), he wants to be able to say open("/net/some_ip/protocol/some_port", "rw");



