
As I’ve noted above, this problem would go away if data streams were adequately tagged in the first place.

Having that high-level knowledge of data structure enables all sorts of intelligent automation.

In the event that the client uses a different memory layout, it could look up a coercion handler that converts the supplied data from its original layout to the layout required by the client. This is, for instance, how the Apple Event Manager was designed to work: all data is tagged with a type descriptor:

typeSInt16, typeSInt32, typeSInt64, typeUInt16, typeUInt32, typeUInt64

typeIEEE32BitFloatingPoint, typeIEEE64BitFloatingPoint, typeIEEE128BitFloatingPoint

typeUTF8Text, typeUnicodeText (UTF-16BE), typeUTF16ExternalRepresentation (UTF-16 with byte-order mark)

typeAEList (ordered collection), typeAERecord (keyed collection)

and so on. (The tags themselves are encoded as UInt32; nothing so advanced as MIME types, but at least they’re compact.)
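A minimal sketch of that tag encoding, assuming the familiar four-character-code scheme (Apple's OSType): four ASCII characters packed into a big-endian UInt32. The specific codes shown are believed to match Apple's headers, but treat them as illustrative:

```python
# Sketch: AEM-style type tags pack four ASCII characters into a UInt32
# (Apple's OSType). The real constants live in Apple's AEDataModel.h;
# the codes below are illustrative.

def ostype(code: str) -> int:
    """Pack a four-character code into a big-endian UInt32."""
    assert len(code) == 4
    return int.from_bytes(code.encode("ascii"), "big")

# 'long' is Apple's historical code for a 32-bit signed integer
typeSInt32 = ostype("long")
typeUTF8Text = ostype("utf8")

print(hex(typeSInt32))  # 0x6c6f6e67
```

Compact (four bytes per tag), trivially comparable, and yet still human-readable when dumped as ASCII.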

The AEM includes a number of standard coercion handlers for converting data between different representations, and clients may also supply their own handlers if needed. Thus the server just packs data in its current representation, and if the client uses the same representation then, great, it can use it as-is. Otherwise the client-side AEM automatically coerces the data to the form the client requires as part of the unpacking process.
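The mechanism can be sketched in a few lines. This is not the real AEM API (that would be `AEInstallCoercionHandler` and friends); it is a toy registry showing the lookup-then-coerce flow described above, with made-up names:

```python
# Minimal sketch of AEM-style coercion on unpack: data travels tagged
# with its type; if the receiver asks for a different type, a registered
# handler converts it. All names here are illustrative, not the AEM API.

import struct

coercions = {}  # (from_type, to_type) -> handler function

def register_coercion(src, dst, handler):
    coercions[(src, dst)] = handler

def unpack(tag, payload, wanted):
    if tag == wanted:
        return payload                      # same representation: use as-is
    handler = coercions.get((tag, wanted))  # otherwise look up a coercion
    if handler is None:
        raise TypeError(f"no coercion {tag} -> {wanted}")
    return handler(payload)

# A "standard" handler: 32-bit big-endian integer -> 64-bit float
register_coercion("SInt32", "Float64",
                  lambda b: float(struct.unpack(">i", b)[0]))

value = unpack("SInt32", struct.pack(">i", 42), "Float64")
print(value)  # 42.0
```

The sender never needs to know what the receiver wants; the receiver's side of the bridge does the conversion, which is exactly the loose coupling being argued for.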

There are limits in the AEM’s design, not least the inability to describe complex data (arrays and structs) with a single “generic” descriptor, e.g. `AEList(SInt32)`. That would vastly simplify packing and unpacking—in the best case to a simple flat serialization/deserialization, in the worst to a single recursive deserialization—instead of two recursive operations with lots of extra mallocs and frees for interim data. But the basic design is sound, and adheres well to the “be liberal in what you accept” principle.
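To make the “generic descriptor” idea concrete, here is a hypothetical sketch (not anything the AEM actually offers): one `AEList(SInt32)` tag describing a whole homogeneous list lets pack/unpack be a single flat buffer operation, with no per-element tags and no recursive walk:

```python
# Hypothetical sketch of a generic AEList(SInt32) descriptor: one tag
# for the whole collection, one flat buffer of big-endian 32-bit ints.

import struct

def pack_flat(values):
    # one descriptor + one flat payload, no per-element tags or allocs
    return ("AEList(SInt32)", struct.pack(f">{len(values)}i", *values))

def unpack_flat(tag, payload):
    assert tag == "AEList(SInt32)"
    count = len(payload) // 4
    return list(struct.unpack(f">{count}i", payload))

tag, buf = pack_flat([1, 2, 3])
print(unpack_flat(tag, buf))  # [1, 2, 3]
```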

I believe PowerShell does something similar when connecting outputs to inputs of different (though broadly compatible) types, intelligently coercing the output data to the exact type the input requires. No manual work required; it “Just Works”.

Or, if you don’t mind the extra overhead, content negotiation is also an option, which is something HTTP does very well (though web programmers often do badly). That is advantageous when communicating with “less intelligent” clients, as it permits the server, which best understands its own data, to pre-convert (e.g. via lossy coercion) its data to a form the client will accept.
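The server-side half of that negotiation can be sketched as follows. This is a deliberately simplified reading of the `Accept` header (real HTTP negotiation also weighs q-values and wildcards like `*/*`); the media types and converters are just examples:

```python
# Rough sketch of server-side content negotiation in the HTTP spirit:
# the server inspects the client's Accept header and pre-converts its
# data to the best representation it can offer. Simplified: ignores
# q-values and wildcard types.

import json

def negotiate(accept_header, data):
    offered = {
        "application/json": lambda d: json.dumps(d),
        "text/plain": lambda d: ", ".join(map(str, d)),
    }
    # take the first acceptable media type, in the order the client listed
    for media_type in (t.split(";")[0].strip() for t in accept_header.split(",")):
        if media_type in offered:
            return media_type, offered[media_type](data)
    return None, None  # nothing acceptable: respond 406

mt, body = negotiate("text/plain, application/json", [1, 2, 3])
print(mt, body)  # text/plain 1, 2, 3
```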

Lots of ways that Unix’s “throw its hands up and dump the problem all over the users” non-answer can be massively improved on, in other words, without ever losing the lovely loose coupling that is a Unix system’s strength. It only requires a single piece of essential—yet missing—information: the data’s type.



The problem outlined in the original article isn't about data streams. It is, at bottom, about the contrast between data storage and in-memory representation.

Typed data streams were not invented by Apple. Back in the 1980s, there was (for example) Sun's RPC mechanism, which gave you "seamless" remote procedure call, including transfer of arbitrary structures over a network.

But the original post is much more about filesystems. I used the socket example merely to illustrate the problem, not the actual topic.



