Tuple Spaces – Good Ideas Don't Always Win (2011)

nostrademons · on July 29, 2018

The CTO at my first job out of college was the chief architect for JavaSpaces. I think we'd used it in an early version of the product but had migrated away from it by the time I joined full-time.

Aside from the fairness issues that other commenters mention, tuple-spaces work at an awkward level of abstraction. You can easily implement many other distributed concurrency models with them - semaphores, message queues, producer/consumer channels, broadcasts, even MVCC and transactions - but oftentimes, in the application it's more natural to just use the more specific abstractions. For example, you could implement a message queue by putting tuples of [type tag, sequence number, ...data] in and taking out the next message of a given type - but usually you'd want guarantees that there are no name collisions among type tags, and that sequence numbers are never skipped, and a mechanism to resend missing messages if for whatever reason they aren't produced correctly. At that point you'd rather just use golang-style channels or a real message queue library rather than roll that on top of the tuple-space.

There are relatively few problem domains I can think of that map directly to a tuple-space, without building some other concurrency abstraction on top of it. Dependency graphs and workflows, perhaps, but there are other libraries to handle that specific problem which also handle things like tracing, debugging, and error correction.

ralphc · on July 29, 2018

Early to mid 2000's a product I worked on had a messaging abstraction that you could use JMS, MQSeries, MSMQ, etc. We tried, really tried, to use JavaSpaces but it had no FIFO guarantee. You could theoretically add and get tuples and one could stay in the space for days. A non-starter for that.

esfandia · on July 29, 2018

It introduces a nice level of indirection between different actors, so that they don't need to know about each other's existence in order to communicate and solve problems together. The actors can be added and removed dynamically, and so the problems to solve and the problem-solvers can evolve over time. It's a pretty powerful and flexible idea! To use GoF terminology, it's like having a Mediator for a bunch of asynchronous and fully decoupled pub-sub Observers and Observables. It's a way to implement Barbara Hayes-Roth's Blackboard model.

JavaSpaces, TSpaces emphasized the over-the-internet aspects of it, which didn't matter to us for our project. We ended up finding and using something much simpler called LighTS. With all the nice support for concurrency in Java these days though, it wouldn't be much work to put one together from scratch.

yagyu · on July 29, 2018

I used tuplespaces to implement a poor man's distributed computing for Matlab back in 2009 or so.

It would simply put Matlab code and parameters in a tuple, a worker would pick it up, compute, and put the results back. Used it to distribute the heavy function evaluation in a genetic optimization.

It was very easy and trouble free..

Edit: mixed up the implementations, I used TSpaces by IBM http://www.almaden.ibm.com/cs/TSpaces/Version3/ClientProgrGu...

Immortalin · on July 29, 2018

Sounds like the current Function-as-a-service trend?

DonHopkins · on July 29, 2018

"Go Service Yourself" is the latest fad in Concurrent Micro-Managed Golang Function-As-A-Service Oriented Architecture Implementing Robust Distributed Parallel Y-Combinator Cloud Servers with Amazon Lambda, so other Amazon Lambda Functions can recursively call Themselves without knowing Their own names.

DonHopkins · on July 29, 2018

In the discussion of "X and NeWS History", I mentioned "PIX", which integrated PostScript with tuple spaces on Transputers, in thread about how X-Windows is actually just a terribly designed and implemented distributed database with occasional visual side effects and pervasive race conditions:

https://news.ycombinator.com/item?id=15327211

Jon Steinhart: "Had he done some real design work and looked at what others were doing he might have realized that at its core, X was a distributed database system in which operations on some of the databases have visual side-effects. I forget the exact number, but X includes around 20 different databases: atoms, properties, contexts, selections, keymaps, etc. each with their own set of API calls. As a result, the X API is wide and shallow like the Mac, and full of interesting race conditions to boot. The whole thing could have been done with less than a dozen API calls."

To that end, one of the weirder and cooler re-implementations of NeWS was Cogent's PIX for transputers. It was basically a NeWS-like multiprocessing PostScript interpreter for Transputers, with Linda "tuple spaces" as an interprocess communication primitive:

http://ieeexplore.ieee.org/document/301904/

The Cogent Research XTM is a desktop parallel computer based on the INMOS T800 transputer. Designed to expand from two to several hundred processors, the XTM provides a transparent distributed computing environment both within a single workstation and among a collection of workstations. Using Linda tuple spaces as the basis for interprocess communication and synchronization, a Unix-compatible, server-based OS was constructed. A graphic user interface is provided by an interactive PostScript window server called PIX. All processors see the same set of system services, and within protection limits, programs capable of using many processors can spread out over a network of workstations and resource servers, acquiring the services of unused processors.

https://en.wikipedia.org/wiki/Transputer

http://wiki.c2.com/?TupleSpace

https://en.wikipedia.org/wiki/Tuple_space

wmleler · on July 29, 2018

Another good article to read about this is "Linda Meets Unix", which was published in IEEE Computer magazine. https://ieeexplore.ieee.org/document/44903/ (full disclosure: I wrote it).

sitkack · on July 29, 2018

Don, I absolutely love your posts. Please keep them coming!

toast0 · on July 29, 2018

This sounds interesting, but feels like magic. How do the tuples get to where they're going? In my experience, computing bits that feel like magic have hidden costs, that are usually rather high.

Contrast this approach with Erlang, there's still a lot of tuples, but you have to (somehow) know where to send them at a (sometimes high) human cost to developers, but usually low runtime cost.

makmanalp · on July 29, 2018

It's better to think of tuple spaces as a concurrency / communications model than an implementation. So it's more like "the actor model" rather than Erlang's or Java/Akka's specific implementation of it. It's more about "if we had this type of system with these constraints and these features, abstracting away these details, what would we gain or lose?". You're right that in the end a good or bad implementation can make or break things (take a look at this paper: https://arxiv.org/abs/1612.02979), but that's not the point, at least with the original paper.

The interesting thoughts from the paper as far as I can see were: 1) Tuple spaces are programming language or architecture or program independent, and vastly different programs can communicate with each other 2) You don't communicate directly to other agents by address, you write to a topic and read from a topic, which is a form of decoupling producers and consumers 3) The "block when nothing to read in this topic" idea, which makes programming coordination SO easy. I guess it's a bit like unix pipes.

If tuple spaces don't seem that interesting and novel, it's probably because of the benefit of hindsight and that a lot of these ideas are so subsumed into the tools of today. I can't definitively make the claim that Linda is the cause of this, but I suspect it had some effect. I think the original author also had a lot of wacky ideas around "cyberspace" and all that, but that's another deal and I don't think it's why people find the Linda paper interesting now. The closest useful descendants of Linda to be seem to be modern Pub / Sub systems or coordination databases like RabbitMQ, Kafka, Zookeeper.

toast0 · on July 29, 2018

I can't run my service on a model, I have to run it on an implementation. When people try to abstract away the hard parts of the problems, it usually leads to bad results. For examples, SQL let's you express all sorts of interesting queries, very few of which are a wise idea to run thousands of time per second; ORMs make this even worse -- how data is stored and how data is queried really need to be determined in concert in order to make data storage and retrieval work, assuming you have sufficient data and query volumes to care -- if you've got only a few thousand pieces of data, it barely matters. That's not to say SQL isn't useful, or interesting, it's just not enough.

In this case, the mechanism seems too general -- if I want to read all the outstadning tuples with 2 as the second elememt, that's possible, but seems rather hard. If it's really about consuming tuples within a named channel, I want to know more about the expected or desired properties of distribution -- how is it determined which agent gets to consume a tuples when multiple request a matching tuple simultaneously, which tuple is consumed when many tuples exist that match. What guarantees are needed to ensure progress, what guarantees are hard to provide, what guarantees are useful, but not required etc. Is this the most basic abstraction for distribution, upon which other useful abstractions can be built -- or are there other underlying basic abstractions that are needed, are there useful abstractions which cannot be built upon this, etc.

naasking · on Aug 8, 2018

> In this case, the mechanism seems too general -- if I want to read all the outstadning tuples with 2 as the second elememt, that's possible, but seems rather hard

How is this any different than an SQL query with 2 in the second column? Tuple spaces seem like a RDBMS with a more restricted query model.

teilo · on July 29, 2018

I immediately thought of ETL when I read this.

I don't see how this tuple space concept can work possibly replace robust supervised process management. Also, how is tuple space different than message passing? Somehow a process needs to know what values it is supposed to consume. Something has to manage that, and before you know it, you are passing messages through tuple space and have, essentially, re-invented the wheel.

sago · on July 29, 2018

Tuple spaces are simply a way of coordinating between distributed processes.

> possibly replace robust supervised process management.

No more than message passing replaces supervised process management. A supervisor might use message passing to do its work, or it might use a tuple space, or something else again. It is still a separate thing.

> how is tuple space different than message passing

Message passing is an example of a shared nothing coordination scheme. For process A to know about some information, process B needs to send a message containing a copy of that information. This is useful, but not always practical. So few systems, even that use message passing, use only this simple form of message passing (more on that below).

> Somehow a process needs to know what values it is supposed to consume.

Of course. Is is no different regardless of the communication mechanism. Processes need to know what data they need to know. If anything, this is more difficult for message passing: where the sender needs to know what the receiver might need.

> Something has to manage that

Not necessarily. The processes can just go and read the data from memory. You would be right if you think that sounds like a terrible idea. But unfortunately it was, and still is, more than common. To avoid catastrophe it requires careful use of data locks, mutexes and dances around race conditions and deadlocks.

So you are right in thinking that managed approaches are superior. Message passing is one such approach. Tuple spaces are another.

> before you know it, you are passing messages through tuple space and have, essentially, re-invented the wheel

Partially true, but the opposite is more common.

Effectively tuple spaces use a database (of tuple data) as a coordination system, along with one blocking primitive (wait for data).

You can, of course, use the data in the database as messages, create some kind of message brokering process, and reinvent message passing. But the opposite is also true. In a complicated system co-ordinated with message passing, you often end up with processes whose job it is purely to hold and be queried for data by multiple consumers/modifiers. Effectively this is coordinating via data. Or re-implementing a cut down tuple space via message passing. When you start to design a 'querying' protocol to allow different processes to access and wait for data in a consistent way, regardless of what that data is, you are very close.

This isn't unique to just message passing and tuple spaces. A sufficiently complex (and well designed) program performing parallel computation with threads and semaphores will usually have its own ad hoc reimplementation of both message passing and tuple spaces. Whatever parallel computing formalism you use (and there are others besides these three), you are essentially solving the same problem. It is no surprise that they will begin to resemble each other.

Your point seems to assume that message passing is a default. It is certainly the thing that seems to have won the battle for mindshare. But the thesis of the article is that basing a system around a tuple space is a better foundation. And by better I mean that it is easier to code at scale.

I am not totally convinced. But I have to say it kind of chimes with my experience. Very large message passing systems become unwieldy, And I have ended up coding something that looks like a tuple space to tame the maze of producers and consumers.

buckminster · on July 29, 2018

"Any sufficiently complicated message passing system contains an ad-hoc, informally-specified, bug-ridden, slow implementation of half of tuple spaces."

protomyth · on July 29, 2018

If you are interested in tuple spaces then I recommend reading David Gelernter’s “Mirror Worlds: or the Day Software Puts the Universe in a Shoebox...How It Will Happen and What It Will Mean” and any Jini documentation you can get.

jarpineh · on July 29, 2018

Huh. That's interesting. I remember reading Wired article about Jini as a pudding programmer and thinking "Yes, this can't happen soon enough". And I can still continue thinking...

Article is from 1998 and mentions (among other things) Tuple Spaces:

https://www.wired.com/1998/08/jini/

I wonder what ultimately decided Jini's fate.

sitkack · on July 30, 2018

A few jumping off points if you like this sort of thing

LuaTS — A Reactive Event-Driven Tuple Space https://pdfs.semanticscholar.org/91cb/8c359920682fda35abd9c2...

https://redis.io/ (uses Lua for scripting)

https://en.wikipedia.org/wiki/Comparison_of_triplestores

Comet: An Active Key Value Store

https://vanish.cs.washington.edu/pubs/osdi2010comet.pdf

https://vanish.cs.washington.edu/pubs/osdi2010comet_presenta...

ZeroMQ http://zeromq.org/

I'd love to hear to distributed languages with first class support for tuple space type operations (not Erlang).

carapace · on July 29, 2018

See also "chemical" programming

http://chemist.io/chymyst01.html

https://en.wikipedia.org/wiki/Join-calculus

isthatart · on July 29, 2018

Or chemlambda https://github.com/chorasimilarity/chemlambda-gui/blob/gh-pa...

DonHopkins · on July 29, 2018

And if that's not weird and robust enough for you, try "Programming the Movable Feast Machine with λ-Codons":

https://www.youtube.com/watch?v=DauJ51CTIq8

Or "Intercellular Transport in the Movable Feast Machine":

https://www.youtube.com/watch?v=6YucCpYCWpY

And especially "Robust-first Computing: Distributed City Generation":

https://www.youtube.com/watch?v=XkSXERxucPc

http://www.cs.unm.edu/~ackley/papers/paper_tsmall1_11_24.pdf

riffraff · on July 29, 2018

Ruby had a bundled distributed tuplespace implementation (Rinda) for many years, built on top of druby.

I remember playing with it, and wondering why it wasn't more popular.

mpweiher · on July 29, 2018

"And compile-time analysis of tuple in/out patterns can make it run efficiently in most cases; adhering to some simple patterns can help too."

Sounds like it might be overly generalized, with the developer having to implement actual mechanisms ("simple patterns") on top and the compiler/runtime having to figure out efficient implementations by presumably sophisticated analysis, all the time hoping that the two align.

lukego · on July 29, 2018

Nix reminds me of tuplespaces. Each derivation in the store is a tuple describing how to evaluate a result. Active evaluations also activate their dependencies.

macintux · on July 29, 2018

Somewhere I should still have an old JavaSpaces book I picked up specifically because I found tuplespaces such a compelling idea. I’ve tried to find a good way to use it over the years, but it’s never quite matched any problem I was trying to solve.

I seriously considered it as a way to share monitoring events between various systems that might be interested in consuming them: logging, billing, alerting, etc.

mcguire · on July 29, 2018

The paper mentioned in the article, "Generative communication in Linda" by David Gelernter is available from citeseer: ://citeseer.ist.psu.edu/viewdoc/summary?doi=10.1.1.113.9679

With broken links. Nice. Try http://signallake.com/innovation/p80-gelernter.pdf

DonHopkins · on July 29, 2018

Author of the visionary and flamboyantly named book "Mirror Worlds or the Day Software Puts the Universe in a Shoebox...How It Will Happen and What It Will Mean (1992)", and unfortunate target of the Unibomber.

https://en.wikipedia.org/wiki/David_Gelernter

https://en.wikipedia.org/wiki/Mirror_Worlds

mhd · on July 29, 2018

One does wonder how tuple spaces would've fared without the influence of the unabomber…

sulam · on July 29, 2018

I wonder more how it would have fared if DG weren’t kind of an asshole.

wmleler · on July 29, 2018

I spent several days with DG when we were working on PIX and QIX, and I found him to be a smart, helpful person. Of course, he was on a crusade for tuple spaces, but you shouldn't hold that against him. Also, that was before he had several fingers blown off.

sulam · on July 29, 2018

He named Linda before he had several fingers blown off too. I believe his crusade against the 90-something percent of climate scientists that believe in anthropogenic climate change is after he lost the fingers. I’m very sad he lost them, no one should have to deal with that. He was kind of an asshole before -and- after. And fwiw I’m using a broad definition of asshole, one that explicitly includes both people who name languages after porn stars because they were offended by Ada and people who deny established science outside of their field which has the not-inconsequential likelihood of causing mass extinctions.

dangoor · on July 29, 2018

I worked at a company in 2002 that used tuplespaces for managing distribution of searches to worker machines. It worked really well! I don't remember us ever having trouble with that part of our system.

aghillo · on July 29, 2018

Gosh, for my Bachelor’s final year dissertation in 1990 I implemented a distributed version of Linda / Tuplespace in C++ across Ansaware and Tanenbaum’s Amoeba OS. Seems a lifetime ago!

jmount · on July 29, 2018

With JavaSpaces I remember really getting burned by the lack of queuing fairness guarantees. The ideas were nice- but execution was very laggy.

galaxyLogic · on July 29, 2018

This would still run on top of threads or something. So how could a basic Java -based wen-server be better if it took advantage of JavaSpaces?

klodolph · on July 29, 2018

Could we equally ask, “how something written in machine code be better if it took advantage of the JVM, because the JVM is implemented in machine code?”

DonHopkins · on July 29, 2018

The Linda programming language was named as a pun on Ada.

https://en.wikipedia.org/wiki/Linda_(coordination_language)#...

DrJosiah · on July 30, 2018

Built a python tuplespace around '06. Realized it was really just a task queue and / or RPC abstraction with a database.

Now mostly just use RPCs and queues directly, depending on whether I need something now, or later.

sometimesijust · on July 29, 2018

I don't get it. Isn't this just a threadsafe schemaless database?

panic · on July 29, 2018

The main difference is that you can have code waiting for a query to yield results. Imagine a chat service which has a separate process for each user’s TCP connection. Normally, when a new message arrives, you have to tell each user’s connection process to send the message over the network. In a tuple space system, each connection process can wait for new messages to be added to the database and send them automatically.

foota · on July 29, 2018

You can access by wildcarded fields, it looks like. Like if you have a tuple (string, int, int) you can query ("test", 1, ) and get back all those rows, or (, 1, 1), etc. This isn't generally efficient in a schemaless database.

zubairq · on July 29, 2018

Eve language was actually a good tuple spaces implementation which had a lot of fans