So, this is making a TCP connection serializable in an ad hoc fashion.
> It is natural to want those connections to follow the container to its new host, preferably without the remote end even noticing that something has changed, but the Linux networking stack was not written with this kind of move in mind.
Some languages, such as Lisp, provide serialization of almost anything by default. Now, even in Lisp there are objects which don't make sense to serialize, including a TCP connection; however, its components can be collected and sent across the wire, or wherever else, in a standardized way. The C language, in comparison, offers a few serialization routines for non-structured types, and that's about all.
So, my point is that the ability to take running state, serialize it, and reinstate it elsewhere is only impressive to those who have misused computers for so long that they don't understand this was something basic in 1970 at the latest.
But this isn't state of the process image that needs to be serialised, it's state of the connection between two hosts and some kernel configuration on those hosts. Programming language doesn't play into it at all. Languages "such as Lisp" will have the exact same problem, for the same reason. Collecting all of the "components" of the connection and sending them to a different host won't make the other host start sending packets to the new recipient, or replay the in-flight packets (which is state on intermediate routers, different computers than the connected ones entirely), or fix the ARP tables on the neighbouring hosts. None of that is available, and certainly isn't writeable, to the host doing the serialising.
To play some silly semantics games, this isn't so much about _serialising_ a connection as it is about _deserialising_ the connection and having it work afterwards. That act has literally nothing to do with programming language.
> But this isn't state of the process image that needs to be serialised, it's state of the connection between two hosts and some kernel configuration on those hosts. Programming language doesn't play into it at all.
It is because UNIX is written in the C language that there are even multiple flat address spaces instead of segments or a single address space systemwide. The fact that the kernel exists at all is also due to this. It has everything to do with the implementation language.
> Languages "such as Lisp" will have the exact same problem, for the same reason.
Under UNIX, yes.
> Collecting all of the "components" of the connection and sending them to a different host won't make the other host start sending packets to the new recipient, or replay the in-flight packets (which is state on intermediate routers, different computers than the connected ones entirely), or fix the ARP tables on the neighbouring hosts. None of that is available, and certainly isn't writeable, to the host doing the serialising.
It may very well require some specialized machinery, but not nearly so much as one might think.
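In fact, Linux has grown some of that machinery: the TCP_REPAIR socket option (added around kernel 3.5 for CRIU) lets a privileged process read out, and later write back, the sequence numbers and queue contents of an established connection. Here's a minimal sketch of the checkpoint side, assuming CAP_NET_ADMIN and an already-connected socket; the function name and the omitted error handling are mine:

    /* Sketch: read out the TCP state needed to recreate this
     * connection on another host.  Constants are from linux/tcp.h,
     * defined here in case the libc headers lack them. */
    #include <stdio.h>
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <netinet/tcp.h>

    #ifndef TCP_REPAIR
    #define TCP_REPAIR       19
    #define TCP_REPAIR_QUEUE 20
    #define TCP_QUEUE_SEQ    21
    #endif
    #ifndef TCP_RECV_QUEUE
    #define TCP_RECV_QUEUE   1
    #define TCP_SEND_QUEUE   2
    #endif

    void checkpoint_tcp(int fd)
    {
        int on = 1, q;
        unsigned seq;
        socklen_t len = sizeof seq;

        /* Repair mode freezes the socket: nothing is sent or ACKed. */
        setsockopt(fd, IPPROTO_TCP, TCP_REPAIR, &on, sizeof on);

        q = TCP_SEND_QUEUE;   /* select the send queue... */
        setsockopt(fd, IPPROTO_TCP, TCP_REPAIR_QUEUE, &q, sizeof q);
        getsockopt(fd, IPPROTO_TCP, TCP_QUEUE_SEQ, &seq, &len);
        printf("send seq %u\n", seq);

        q = TCP_RECV_QUEUE;   /* ...then the receive queue. */
        setsockopt(fd, IPPROTO_TCP, TCP_REPAIR_QUEUE, &q, sizeof q);
        getsockopt(fd, IPPROTO_TCP, TCP_QUEUE_SEQ, &seq, &len);
        printf("recv seq %u\n", seq);

        /* Queued data can be drained with recv(fd, buf, n, MSG_PEEK)
         * while a queue is selected; restore is the same dance in
         * reverse on the new host.  In-flight packets and neighbours'
         * ARP caches still have to be handled outside the socket, as
         * the parent comment says. */
    }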
> To play some silly semantics games, this isn't so much about _serialising_ a connection as it is about _deserialising_ the connection and having it work afterwards.
That's implicit. I needn't write of deserializing when writing of serializing, as one is worthless without the other, at least in most cases.
> That act has literally nothing to do with programming language.
Look at what Lisp and Smalltalk systems could do before UNIX existed and tell me that again.
> It is because UNIX is written in the C language that there are even multiple flat address spaces instead of segments or a single address space systemwide.
That is flat out wrong. C supports multi-programming in a system that has one address space (that includes the kernel too). Programs just have to be compiled relocatable.
You know, like what happens with shared libraries, which are written in C and get loaded at different addresses in the same space, yet access their own functions and variables just fine.
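To make that concrete, here's a hedged little demonstration (the file names are invented for the example, and error checks are omitted): position-independent C finds its own globals through the GOT no matter where the loader maps it.

    /* plugin.c -- build with: cc -shared -fPIC -o plugin.so plugin.c */
    static int counter;   /* reached via the GOT, not an absolute address */
    int bump(void) { return ++counter; }

    /* main.c -- build with: cc -o main main.c -ldl */
    #include <dlfcn.h>
    #include <stdio.h>

    int main(void)
    {
        /* The loader chooses plugin.so's base address at runtime;
         * the code inside still finds `counter` correctly. */
        void *h = dlopen("./plugin.so", RTLD_NOW);
        int (*bump)(void) = (int (*)(void))dlsym(h, "bump");
        printf("%d %d\n", bump(), bump());   /* prints: 1 2 */
        dlclose(h);
        return 0;
    }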
Multics used segments and Lisp Machines had a single address space. UNIX breaks down quickly without multiple fake single address spaces for each program.
> Programs just have to be compiled relocatable.
Yes, and with unrestricted memory access, one program can crash the entire system.
> You know, like what happens with shared libraries, which are written in C and get loaded at different addresses in the same space, yet access their own functions and variables just fine.
That is, except when one piece manipulates global state in a way with which another piece can't cope, and at best the whole thing crashes. Dynamic linking in UNIX is so bad that some believe it can't work, and instead use static linking exclusively.
> UNIX breaks down quickly without multiple fake single address spaces for each program.
So do MS-DOS, Mac OS < 9, and others: any non-MMU OS.
> Yes, and with unrestricted memory access, one program can crash the entire system.
That's true in any system with no MMU that runs machine-language native executables written in assembly language or using unsafe compiled languages.
Historically, there existed partition-based memory management whereby, even in a single physical address space, programs were isolated from stomping over each other.
> when one piece manipulates global state in a way with which another piece can't cope
This problem is the same with both static and dynamic linking.
And Lisp too!
> UNIX breaks down quickly without multiple fake single address spaces for each program.
Citation needed. I don't think my programs very commonly try to go completely outside their address space. The closest thing I see is null pointer crashes, which are still not very common, and those would work the same way in a shared address space.
Edit: Yes, fork doesn't work the same. That's a very narrow use case on the vast majority of machines.
But this isn't about address spaces. They're moving connections between hardware hosts. It sounds like you've got your drum to beat, but this isn't about that.
This is right up my alley, as it were. I'm not going to read the entire document, but I can remark on how the assembler language model is inferior to something I've created; details are on my user page. In brief, an assembler language is not only a batch tool, but also hides details from the hacker. Optimizing machine code requires the ability to use instructions as data when the opportunity be noticed, but that requires noticing it at all, which requires seeing the numerical values; an assembler language also makes it difficult to do things such as put labels in the middle of an instruction. Assembler languages often introduce arbitrary name arithmetic and special names to handle this and other cases, but I found that inelegant and unnecessary; merely naming individual octets is generally sufficient.
Dang told me I should submit links to places other than my website, so that's what I'm doing. What does it mean for society when people realize "go build your own" won't be tolerated either? Democracy requires the ability for people to speak, does it not? What does it mean for democracy when one side is told it mustn't have a voice?
It's a shame he doesn't mention Ada, which is static but oh so nice about it.
Right, but the problem, as Peter Harkins mentions here, is that once programmers master something hard (often pointlessly hard), rather than then making it easy, they feel proud of themselves for having done it and just perpetuate the hard nonsense.
Yes, see the entire history of UNIX. I convinced someone that programming was a bad choice for his major because of this stupid attitude.
I'm still reading, but I like how he takes issue with what currently passes for machine text. Some of my work covers the same problem. I should send him an e-mail.
Well, there are lots of replacements for common commands like "ls", "grep", "find", even "cd", and some of them are pretty popular. And there is a wide variety of shells and terminals. No one is complaining about them, and the worst you get is an "I don't care / not my cup of tea" attitude.
Of course, the key idea is to keep compatibility with the existing world, for example by choosing a new command name ("rg"? "ack"?). If you just take over an existing name that has been used for years and break existing scripts, people will be unhappy.
I like pointing out that millions of lines of code in the Linux kernel have no real memory exhaustion strategy beyond randomly killing a process. Those are millions of lines of code, few of which are reusable, and they do so very little.
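For anyone who wants to watch it happen, here's a sketch (run it somewhere disposable): under the default overcommit heuristic, malloc() on Linux rarely reports exhaustion; the process just gets SIGKILLed by the OOM killer once enough pages are actually touched.

    /* Allocate and touch memory until something gives.  On a default
     * Linux configuration this usually ends with the OOM killer
     * (SIGKILL), not with malloc returning NULL. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(void)
    {
        for (size_t mib = 0; ; mib++) {
            char *p = malloc(1 << 20);        /* 1 MiB */
            if (!p) {                         /* the polite path, rarely taken */
                printf("malloc failed after %zu MiB\n", mib);
                return 1;
            }
            memset(p, 1, 1 << 20);            /* force real page allocation */
            if ((mib + 1) % 1024 == 0)
                printf("%zu MiB\n", mib + 1);
        }
    }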
I feel like this is one of the better points in the thread.
The asymmetry in copyright law, where large corporations can enforce their copyright to the point of breaking the law themselves (YouTube's Content ID is another non-legal, but still very impactful, example), is absolute bullshit.
Unfortunately, I think that if training ML models on Internet data is found not to be fair use, things will get harder for individuals training models, while corporations will be barely inconvenienced, as they can afford to pay for sources, make deals with other large institutions for data, etc.
It's funny how this argument doesn't fly with people when it comes to blocking advertisements, with one man claiming it takes food out of his child's mouth.
> It is natural to want those connections to follow the container to its new host, preferably without the remote end even noticing that something has changed, but the Linux networking stack was not written with this kind of move in mind.
Yes, it was written in the C language.