Idempotence: What is it and why should I care?

ageitgey · on Aug 21, 2018

This is vital if you are designing apis or clients that deal with charging a user money. It should be literally impossible for a user to accidentally get charged twice due to a flakey connection if you design correctly.

The trick is to have the client generate a random 'idempotency key' (a uuid) to start each logical transaction and have the server use that id to prevent double charges of the same transaction. By always passing that key, client can request that the payment be processed a 100 times with no fear of it being processed more than once.

This stripe blog post has as good a description as any: https://stripe.com/blog/idempotency

rooam-dev · on Aug 21, 2018

Good point, but I'd add that not only when charging money, but any operation that can have side effects if executed more than once (emails, external calls, etc.).

Having idempotent queue consumers/listeners can help with overall systems resilience (someone already mentioned this below). It helps as much as you don't have to worry when restarting a queue broker and/or apps with consumers.

It adds complexity, but I think it pays off in the long run.

hardwaresofton · on Aug 21, 2018

So I think this is actually the secret to creating actually dependable, no-downtime transitioning endpoints. It's just an idea that has been rolling around in my head but:

- Express all operations as log messages (ez pz distribution)

- Ensure all operations are idempotent

- Record the operations (this is the log you can distribute if you please)

- Disallow API code modification, only allow accretion/use of new API endpoints.

- All APIs that come up have their own databases, a bit of the CQRS model here (but without events -- just the actions performed)

- When you need to stand up new API servers, start the new ones (with handling code for old operations completely unchanged) next to the old ones, and update the http-server code (like request handlers) to output the new commands. Older servers that don't understand the new commands will ignore (or redirect), and new servers that do understand will process and add to the distributed log. New nodes just stream the replications of the already existing nodes and no one spends any time with an inconsistent view of the database

Of course, writing to a distributed log is slow (pick whichever consensus algo you want, you either have durability with a quorum or best-effort without), but this only is a huge deal if you're doing lots of writes, and for most web applications, that's not what's happening, the vast majority is reads.

CRDTs might even fit in here, because if you want a multi-master setup, you could literally keep the log as a set (I'm not quite sure how truncation of super old records would want) keyed by transaction ID -- Assuming the same request doesn't go to multiple servers, their logs should be easily combinable at the end of the day -- API1 is gonna see events A B and E, API2 might see C and F, and API3 will likely see D G and H.

Honestly everything I've described here is really more like moving the coordination/distributed log problem to the application level (up until now all this action would just happen @ the Postgres/DB level), but I'm not yet convinced it's a terrible idea.

I haven't found the time to actually try to make what I'm describing here a thing but would love to hear thoughts

Groxx · on Aug 29, 2018

I'm curious about how you anticipate handling new APIs / how this approach helps ensure consistency for people who aren't on the new APIs. Seems like if

    a -> b -> c

becomes

    a    b -> c
     \-> x

then C won't be aware of the new stuff happening in X... unless it's 100% compatible with everything B does, including every log it produces, at the moment it starts receiving traffic, which seems unlikely. In the time until C updates to read from X:

    a    b    c
     \-> x -->^

isn't C (and its consumers) operating on an "inconsistent view of the database", as produced by A?

hardwaresofton · on Aug 30, 2018

I think I mentioned it earlier, but the idea is that the commands are immutable -- API growth happens through accretion only. New APIs must handle a superset of old ones.

Realistically, this is basically the same as how it's handled in most APIs today -- until you can guarantee (or choose to strictly enforce) that no one use a particular API, it just stays.

In addition to this, new instances use completely different databases, but rely on the replaying the stream of commands that got the old instance there to catch up (and new commands as they come in).

Groxx · on Aug 30, 2018

To be clear, I'm interpreting "APIs" as "a method". So adding a new method is equivalent to adding a new service. If you mean for methods to never increase in number, only flexibility, then yea - I think I follow, this all makes sense. Then new stuff is truly new and disjoint from others, and there's no migration to worry about.

---

Also, since you're mentioning "replaying the stream of commands", I think this means "consistent" is strictly bound to "... at the point in time it has read to, from API X"? Then yea, switching APIs / methods is fine, you just delay the readers. It's event sourcing in a nutshell - there are undeniable benefits between any two "services", it's a compelling design.

I was interpreting it more in a system-wide sense with a large number of services, which is where I don't have a good feel for event sourcing - consumers of C and [others] are not "up to date" with what A has done until they read all data derived from all sources from the same minimum A-timestamp. So without a vector clock (probably) it's generally unsafe to consume from C and Q until they're both up to date, because C is missing stuff from A that Q already handled. Building something that maintains correctness and usefulness in the face of this seems extremely difficult or constrained, unless you accept unbounded delays (in practice: likely weeks of dev time in some cases).

---

And last but not least: CRDTs solve pretty much all of this without synchronization of any kind, yea. Are they still a pain to design? Or have we developed relatively-repeatable strategies nowadays? I haven't kept up much here, sadly.

---

I'll probably have to reread this all a couple times to make sure I'm not totally off somewhere irrational, sorry! Yours was a rather dense comment to comprehend, and I'm not sure I'm following correctly. Event sourcing has been interesting to me for quite a while, but I've never really developed a feel for how to build large, multi-developer(-team) systems out of it and it sounds like you might have an idea.

pgt · on Aug 21, 2018

You will enjoy the fact-based database Datomic, which is the reason Rich Hickey made Clojure: https://www.youtube.com/watch?v=Cym4TZwTCNU

Datalog is a much, much, much query language better than SQL: http://www.learndatalogtoday.org/

hardwaresofton · on Aug 21, 2018

I do remember Datomic and I think it's a great tool but I fell out of love with the Clojure ecosystem and JVM-based languages as a whole and don't think I'll be getting back into it/them.

I do remember wanting to check out Datomic (I believe after seeing a talk on how it was being used at a bank in southern america?[0]), but I found it unreasonably hard to find and download/experiment with the community edition -- compare this to something like Postgres which is much more obvious, more F/OSS compliant (I understand that they need to make money) and Datomic doesn't really look that appealing to me these days.

At this point in my learning of software craftmanship I can't do non-statically type-checked/inferenced languages anymore -- I almost never use JS without Typescript for example. Typed clojure was in relatively early stages when I was last actively using clojure, and I'm sure it's not bad (probably way more mature now), but it's a staple in other languages like Common Lisp (the declare form IIRC). The prevailing mood the clojure community seemed to be against static type checking and I just don't think I can jive with that anymore.

Thinking this way right now Datomic wouldn't be a good fit for me pesonally but I believe that it is probably high quality paradigm.

[EDIT] - I found the talk: https://www.youtube.com/watch?v=7lm3K8zVOdY

[0]: https://www.datomic.com/nubanks-story.html

pgt · on Aug 21, 2018

Yes, Datomic is the killer app for Clojure [^1]. Have a look at Datascript[^2] and Mozilla's Mentat[^3], which is basically an embedded Datomic in Rust.

Hickey's Spec-ulation keynote is probably his most controversial talk, but it finally swayed me toward dynamic typing for growing large systems: https://www.youtube.com/watch?v=oyLBGkS5ICk

The Clojure build ecosystem is tough. Ten years ago, I could not have wrangled Clojure with my skillset - it's a stallion. We early adopters are masochists, but we endure the pain for early advantages, like a stable JavaScript target, immutable filesets and hot-reloading way before anyone else had it.

Is it worth it? Only if it pays off. I think ClojureScript and Datomic are starting to pay off, but it's not obvious for who - certainly not very ever organisation.

React Native? I tore my hair out having to `rm -rf ./node-modules` every 2 hours to deal with breaking dependency issues.

Whenever I try to use something else (like Swift), I crawl back to Clojure for the small, consistent language. I don't think Clojure is the end-game, but a Lisp with truly immutable namespaces and data structures is probably in the future.

[^1]: In 2014 I wrote down "Why Clojure?" - http://petrustheron.com/posts/why-clojure.html [^2]: https://github.com/tonsky/datascript [^3]: https://github.com/mozilla/mentat

pgt · on Aug 23, 2018

Also look at Magic, an experimental typed JVM Lisp with immutable namespaces: https://github.com/mikera/magic

hood_syntax · on Aug 21, 2018

I take issue with some of Rich Hickey's technical opinions, but Datomic seems pretty cool. I knew Datalog has been around for a while, but had never looked into it further.

SmirkingRevenge · on Aug 21, 2018

No need to write the distributed log yourself, you just need Kafka.

You basically just independently conceived of, what is becoming a pretty popular architecture these days - event driven systems based on distributed logs.

The source of truth in your system becomes the idempotent distributed log of events (rather than your rdbms or data warehouse) which ought to be "replayable", allowing you to audit or even potentially recreate your application state (databases, etc), at any point in time.

Transitioning to a new version of an api/service means adding another subscriber to the event stream (and perhaps reprocessing some or all of its history), that runs in parallel with the old version - then when everything looks good you can quietly disable the old version.

hardwaresofton · on Aug 21, 2018

I think I wasn't clear enough about why this solution isn't just a distributed log/event sourcing approach:

- The description above was intended to NOT specify any tech it depends on -- Kafka is the best queueing and long-term-log-storage engine I've ever heard of, but the point is that it shouldn't matter. Kafka doesn't "write" the distributed log, it just stores it.

- I don't want an Event Sourcing solution. It's often combined with CQRS, but the thing is, I don't think you actually need events (I might be wrong) -- you just need the commands, and as long as what they do never changes, you can skip a bunch of problems with event sourcing -- you can more easily compress stuff if you don't allow the semantics of operations to change. Also there's stuff about how you handle things like schema changes, etc -- in event sourcing it's often "recalculate the whole DB", but I think that should not be the solution -- In this system I'm proposing, I'd literally ensure I could reflect the migration as operations in the log, and do it that way.

What you're describing is a Kafka-log based event sourcing system, and it's close to what I mean, but not quite the same -- I want the systems to be able to able to trivially cluster, for example, by hitting an endpoint like `/replicate`, which does nothing but stream log messages out as they hit one particular server. In the system you describe I think the equivalent thing would be a tiny bit harder to achieve since you'd need to be able to listen to another subscriber's messages.

I'm also completely aware that I might go about building this and realize I just took the long way around to the system you're describing but I don't think I am just yet.

beaconstudios · on Aug 21, 2018

what you've just described is very similar to the data management side of the platform I'm building. I can confirm that given event sourcing, idempotent mutation, and only adding new API endpoints, you're a long way towards what I would consider the ideal online software architecture. So many problems with API maintenance just go away with this approach.

rectangletangle · on Aug 21, 2018

This is solid advice. Another common trick is to disable the submit event when the submit button is clicked for the first time, preventing two requests from firing, when the user double clicks. Then re-enable the event, if the request fails. Ideally this is done in addition to server side nonce validation, and not as the only preventative measure, because browser differences, or network issues could cause a double request (more likely if the HTTP GET method is used, instead of a proper POST). Regardless server and client side techniques should be used together to create a better UX, similar to using both client and server side email format validation.

digitaLandscape · on Aug 21, 2018

I don't like this because in practice sites usually fail to re-enable the submit button if something goes wrong. Just let the user submit multiple requests if they want to retry, don't take that away from them, just make it harmless.

jen729w · on Aug 21, 2018

Newbie programmer here, but even I am already implementing finite state machines which handily fix this sort of issue.

Any sort of UI without them now feels archaic.

hood_syntax · on Aug 21, 2018

Agree 100%. If you don't have explicit state transitions in a UI, you're asking for trouble. I think some people don't realize how much harder they make it on themselves by not setting clear boundaries; they see the up front cost and balk, when it saves a non-trivial amount of headache in the future, not to mention reducing mental overhead when you've got the structure hashed out.

digitaLandscape · on Aug 21, 2018

Meh. The logic still applies: on a bad network, the first request can hang for half a minute when a retry would get lucky and go through immediately.

Don't handcuff your users for no benefit, just because you can't write a well-behaved backend.

PinkMilkshake · on Aug 21, 2018

That's only a partial solution. It may be a user agent or a queuing system (or anything else sitting between a person clicking the button and where the final transaction is recorded) that attempts a retry.

BoiledCabbage · on Aug 21, 2018

Restricting what the client can do isn't idempotency. Allowing the user to perform the same action (make the same api-call) multiple times, Nd your API coalesce them as if it were 1, is. The point and benefit of idempotency on this scenario is that you silently support "bad" behavior.

qoi2ijds0 · on Aug 21, 2018

IMO disabling buttons causes more trouble than it's worth - you can never guarantee the user hasn't clicked twice (because don't forget the code that disabling the button in the first place is relying on an event firing that tells you the button was clicked - this doesn't mean 2 events can't be queued before the event handler is called), and then you need a whole chunk of code around re-enabling the button depending on what has happened after the fact which is then a big source of bugs.

notatoad · on Aug 21, 2018

disabling the submit button is a nice UI feature to indicate idempotency, but it's not the same thing as the function actually being idempotent. both are important.

genericid · on Aug 21, 2018

This is basically client-side validation.

enraged_camel · on Aug 21, 2018

I apologize for the noob question, but... is it a good idea to have the client be in charge of generating UUIDs? When dealing with idempotence, they have to be truly random, not just pseudorandom, right? I looked into this recently, and my understanding (from the cursory amount of research I did) was that the UUID generation method depends on the random number generator used by the browser, which can vary a lot.

Why not generate the UUID on the server and send it to the client along with the page request (if using SSR)? I.e. the server generates the UUID, sticks it in a database field, then sends it to the client. When the client responds with the UUID, you can check that against the database to make sure its valid.

chaboud · on Aug 21, 2018

You can do a mix of the two to reduce server traffic (especially for abandoned sessions) and user latency.

Give the client a client ID that you generate that should be properly random... Have them submit that ID along with their self-generated ID.

tankerdude · on Aug 21, 2018

Yup. This is actually an interview question that I go through to see if they know mechanisms to prevent the double charge, fulfillment, etc. problem.

not_kurt_godel · on Aug 21, 2018

I find that the easiest way to reason about idempotency is thinking about what happens when a client incorrectly retries a request (that is, resubmits it even though it was successful the first time). You should always strive to make sure that the duplicate request is handled gracefully and without negative consequences as much as possible. Requests that mutate state should, as a general rule, accept the desired state as an explicit parameter - avoid toggles, increments, decrements, 'go to next step' types of calls. Idempotency tokens are another valid strategy, though they can be clunky to implement.

drawkbox · on Aug 21, 2018

There was a story about toggles in apis on HN a few months ago [1][2], basically the toggle function was called for the garage door, which is clearly not idempotent, rather than an action like open/close so that is wasn't repeated. The toggle action led to opening and closing of the door repeatedly. Non idempotent requests are especially bad when tied to real physical machinery.

[1] https://twitter.com/rombulow/status/990684453734203392

[2] https://news.ycombinator.com/item?id=16964907

empath75 · on Aug 21, 2018

I once wrote a Jenkins job that created a sub directory every time it ran, if it didn’t exist and somehow managed to write it so every time it ran it went another layer deep, adding another nested subdirectory each time and it was set to reuse the same workspace. After a few months, all the jobs on the server started failing because the worker node ran out of inodes. And that was my introduction to idempotence.

gotodengo · on Aug 21, 2018

Reminds me of the Spotify unicode username issue[1]. Where a function assumed to be idempotent (think tolower(username)) actually wasn't with certain unicode inputs. Allowing account takeovers.

[1] https://labs.spotify.com/2013/06/18/creative-usernames/

tzs · on Aug 21, 2018

spotify> For example it is hard to see the difference between Ω and Ω even though one is obviously a Greek letter and the other is a unit for electrical resistance and in unicode they indeed have different code points

This surprised me, because the correct Ohm symbol is in fact the Greek letter, so why does Unicode have a special code point for it?

Unicode also does this for Kelvin, where the correct symbol is a capital K but Unicode has a separate code point for it, and for ångström where the correct symbol is a capital A with a circle above it but Unicode gives it a separate code point.

They do not do this for Newtons (capital N), Joules (capital J), Watts (capital W), or anything else I can see where the standard symbol is an ordinary letter or group of letters.

In all three of these cases the Unicode Consortium recommends NOT using the separate code point.

So...what's special about Ohms, Kelvins, and ångström that (1) gives them their own place in Unicode, and (2) what is the point since we are not, according to the Unicode Consortium, supposed to use them?

bloak · on Aug 21, 2018

Unicode was originally proposed as a universal character set to replace all existing character sets. For it to have any chance of acceptance it had to be possible to convert from JIS/whatever into Unicode then back again without any loss of information. So if there were any daft duplicates in legacy character sets those had to be duplicated in Unicode. I don't know if that explains those three physical units, but that's what I'd guess happened.

akira2501 · on Aug 21, 2018

> So...what's special about Ohms, Kelvins, and ångström

Nothing other than misguided thinking in the early versions of the standard.

The other problems with these special symbols is that if you call tolower() or similar on them they'll return the "normal" character they're based off of. So toupper(tolower(char)) != char.

r_c_a_d · on Aug 21, 2018

Does tolower() or toupper() even make sense with general unicode characters? I wouldn't expect it to... but I've never really thought about it before :-)

tialaramex · on Aug 21, 2018

Mostly, we're used to defining tolower() and toupper() to return either a lower or upper case variant if one exists, otherwise you get back what you put in. For most Unicode codepoints no such variants exist and so you just get back whatever you fed in. Some of the alphabets have uppercase/ lowercase, but obviously most writing systems don't do this.

However, lower(upper(X)) is not defined to be the same as lower(X), and there's no promise that meddling with a string transforming with lower() or upper() does what you hoped because that isn't how language actually works (e.g. in English the case sometimes marks proper nouns so "May" is the Prime Minister of the UK, but "may" is just an auxiliary verb).

Where standards tell you something is case-insensitive, but it's also allowed to be Unicode rather than ASCII, you can and probably should "case crush" it with tolower() and then never worry about this problem. In a few places you have to be careful because a standard says something in particular is case-insensitive, but not everything that goes in that slot is case-insensitive. For example MIME content type names like "text/plain", "TEXT/PLAIN" and "Text/Plain" are case-insensitive, but

multipart/mixed; boundary="ABCDEFGHIJKL" multipart/mixed; boundary="abcdefghijkl" multipart/mixed; boundary="AbcDefGhiJkl"

... declare three different boundary tokens, and none of them matches the sequence abCdeFghIjkL.

majewsky · on Aug 21, 2018

What's worse, tolower() and toupper() are locale-dependent. In most locales,

  tolower("I") = "i"

but in Turkish,

  tolower("I") = "ı"

Same in the other direction, because there is also a large I with dot.

gotodengo · on Aug 21, 2018

At this point it's a backwards compatibility issue. Like you say for Ohm they now recommend using the omega symbol[1] but there's still code out there using the Ohm symbol.

Solving that wouldn't have helped in the Spotify case though since there's a ton of other edge cases like combining characters 'e' + ' ́' vs precomposed characters 'é' which still cause the need for an idempotent canonicalization of usernames.

Not to get too far from the topic at hand, but I came across the Spotify article earlier this week while looking to support Unicode usernames in an application. After consideration I've decided to just lock things down to ASCII for now. It's just too big a case to consider and there are bigger fish to fry.

[1] https://en.wikipedia.org/wiki/Ohm#Ohm_symbol

idempotent · on Aug 20, 2018

A function f is idempotent if: f(f(x)) = f(x)

The "absolute value" function is idempotent: abs(abs(-42)) = abs(-42)

The "squared" function is not: sq(sq(7)) != sq(7)

kazinator · on Aug 21, 2018

This is about idempotent operations, which are basically state changes that have the same affect if executed multiple times as if executed once:

  $ chmod a+x file
  $ chmod a+x file

This is linked to mathematical idempotence in the sense that an operation is a function which takes some inputs i and the state of the world S and produces a new state S':

S' = f(S, i).

So then if f(f(S, i), i) = f(S, i) then f is idemponent mathematically, and the operation is idempotent in the software sense.

rhacker · on Aug 21, 2018

Posting this to drive people crazy :) (only kidding)

How about state changes as a side effect, with a near meaningless result.

LockAccount('user')

The account is now locked.

LockAccount('user')

Error: The account was already locked!

However, the internal state of the user after the second operation remains the same. We also cannot use any of the information in the result as something meaningful as shown in these math equations because the API has a wall over an opaque data structure. Which is often not the case in Math. Similar to chmod, there might be a low-level file API that throws an internal error because the file is already executable, but the chmod wrapper hides that possible error. (It probably doesn't have an error internally, but for argument sake).

haswell · on Aug 21, 2018

For HTTP resources, this is addressed in RFC7231 - essentially, servers may implement non-idempotent behavior such as logging for methods otherwise considered idempotent.

https://tools.ietf.org/html/rfc7231#section-4.2.2

not_kurt_godel · on Aug 21, 2018

A more idempotency-friendly API would be:

    setAccountLockStatus('user', LockStatus.LOCKED)

morokhovets · on Aug 21, 2018

Or maybe ensureAccountLocked('user')?

clord · on Aug 21, 2018

Your error changed the state S of the world.

jgoldfar · on Aug 21, 2018

You're right, and the technical distinction is warranted, but the definition you give is still not equivalent to that of being mathematically idempotent, since the domain and range of f are not isomorphic.

kazinator · on Aug 21, 2018

That's only because the domain is multi-dimensional, while the range is the domain of just the first argument.

We can fold the input into S. That is to say, the input is just an aspect of the state of the word, and then we can reduce f(S, i) to just f(S), so then we have f(f(S)) = f(S).

rectangletangle · on Aug 21, 2018

One of my favorite parts of Python is the idempotent type casting functions.

Say you have a function that takes a list as input, but sometimes a tuple is passed. Wrapping the input like lvalues = list(values), ensures that lvalues will always be a list (or it throws a type error), so you won't get annoying attribute errors, when trying to access the list's methods. In the case of a list type(list([])).__name__ == 'list', as does type(list(list([]))).__name__, etc.

This can get a bit tricky with situations where type casting may be undesirable, say casting a string to a list. So judicious use is necessary.

tutfbhuf · on Aug 21, 2018

In category theory we also say h is idempotent iff h^2 = h whereas h is an endomorphism.

filmor · on Aug 21, 2018

The "morphism" property is completely unneeded for idempotence and the "endo" part is implied :)

whatpad · on Aug 21, 2018

This differs from the definition in the article. I think the article's definition is the significantly more common one for software engineering.

CaliforniaKarl · on Aug 21, 2018

One other reason to care: If using TLS 1.3, having an operation be indempotent increases the chances that the client could safely use 0-RTT for that specific operation.

drej · on Aug 21, 2018

I work in data engineering and fixating on idempotence has been one of the best things I've ever done. Now whenever we build a new job or we review an existing one, the first question (well, second, first being 'do we actually need this?') is usually 'is this idempotent?' Saves SO much hassle. Processes fail, nodes disconnect, OOM kills stuff, these things happen on a daily basis in larger systems, be ready for that.

SmirkingRevenge · on Aug 21, 2018

Amen. And if you combine idempotent operations with atomic operations wherever possible, it gets even better!

pmarreck · on Aug 21, 2018

One of the practical applications of this concept I've found is that I try to write idempotent database migrations (so rerunning the migration, which is a common necessity while you're developing it, but is also useful if problems occur, won't error).

So in essence both the "up" and the "down" migrations are idempotent and warn if they are not (and why).

flukus · on Aug 21, 2018

Database migrations are inherently stateful so I'm not a fan of indompodence here, it can leave the schema in some arbitrary states. I much prefer tools like flyway (https://flywaydb.org/) that are more deterministic, each migration will only be run once so you're going from known state to known state.

pmarreck · on Aug 21, 2018

I've had situations where a migration that ran locally just fine, failed in staging and/or production (config-level stuff running into hosted DB access rights, etc.). Those situations are a mess to untangle without idempotent migrations.

Also, minor niggle but Flyway isn't database-agnostic, you'd have to use the SQL of whatever DB you happen to be using (although if you code in Java I guess you could use ORM commands)

jackthetab · on Aug 21, 2018

How do you do it without throwing an error?

My philosophy has been to ignore the error if it's a re-application of the migration because the migration _is_ idempotent. So ALTER TABLE foo ADD COLUMN bar INT either succeeds (because there is no bar column) or it fails (because there is a bar column) with no harm done.

joombaga · on Aug 21, 2018

This would work if you're just ignoring that specific error, but it's better to guard for idempotence. Only apply that ALTER if COLUMN bar doesn't already exist.

mbushey · on Aug 20, 2018

Here's a great blog on the subject: What is Idempotence? https://www.sendthemtomir.com/blog/what-is-idempotence

JauntyHatAngle · on Aug 21, 2018

Interesting, never thought about applying idempotency to the more traditional programming areas like web dev, although I would like to think I develop my apps like this even without thinking about it in terms of idempotency.

For those who might not be in the know, this is a crucial concept for IaC through tools like Puppet/Salt/Ansible etc, as it allows you to think/program infrastructure configuration as a state rather than scripting everything and having to take account of all the minute states that may exist on a legacy or well entrenched system.

hnruss · on Aug 21, 2018

I often combine idempotence with the command-query separation principle (CQS) by making it so that queries are idempotent. This tends to encourage simple, predictable code.

mlajszczak · on Aug 21, 2018

Idempotence can be very helpful when one strives for resiliency.

Suppose you have an application that processes tasks that (among others) call an external API. Both external API call and task processing can fail independently. Moreover, task processing can fail after API call succeeded. If you the external API is idempotent, you can simply retry on any task processing failure, no matter when it happened. It can simplify error handling a lot.

NightMKoder · on Aug 21, 2018

One thing to note about this - idempotency is a very simple concept, but the implementation is actually quite hard. Case-in-point: the kafka producer API recently gained support for "idempotent/transactional capabilities": https://kafka.apache.org/documentation/#upgrade_11_exactly_o... . That is to say exactly-once persistence by using an idempotency key.

Everything looks bulletproof unless you take a step back. To connect with the example you gave: just having the kafka producer guarantee that one call to send() is idempotent is not enough. Your application needs to be able to be idempotent - e.g. if the same RPC/web request has to be retried on a different server. You need to be able to pass an idempotency key TO the kafka API - which you currently cannot do. The API currently allows for either a global(ish) lock or duplicated messages - so it's not quite idempotent.

Idempotency needs to be end-to-end, otherwise it doesn't work. Unfortunately that's very rarely the case - almost nobody tries to idempotently make XHR requests to their servers. In effect it's almost always easier to de-duplicate idempotently on read rather than attempt to write idempotently. It's a really hard simple problem with lots of corner cases.

js2 · on Aug 21, 2018

At 13:45 in this interview from 2003 with Sergey Brin and Larry Page, you can listen to them try to explain idempotentence on the air to Terry Gross and her NPR audience:

http://www.npr.org/2003/10/14/167643282/google-founders-larr...

rimliu · on Aug 21, 2018

And in 2005 Google came up with Google Accelerator, which has shown many sloppy programmer what happens when you use HTTP method which is supposed to be idempotent—GET—for others purposes. What happened that GA crawled all the links it found for prefetching and some of them were "delete" links in admin interfaces. I think that was the biggest push not to use GETs to modify data :)

tw1010 · on Aug 21, 2018

I read the title as impedance, then I read the article and kept reading impedance, and was about to write a comment mentioning that that's not at all impedance he's talking about, it's idempotency, but then I read the title again and now I can't understand why I read it as the former all that time.

yjftsjthsd-h · on Aug 21, 2018

That's okay, I read "impotence" and couldn't believe that anyone would need it explained as to why they should care.

wincy · on Aug 21, 2018

A comment about the site. On an iPhone X and I’d imagine other phone screens, the margins are huge. Like the margin takes up maybe 15% around the text, then there’s padding on the div taking up maybe 10-15% more space. So there’s 3-4 words per line without turning on reading mode.

dmourati · on Aug 21, 2018

I just think of the Wailers song: Do It Twice. "I'd like to say baby, you so nice I'd like to do the same thing twice, yeah!"

ianamartin · on Aug 21, 2018

Pure idempotency isn't usually desirable. In any important database table where this can be an issue, you want two timestamps. 1 for when the row was created and 1 for when it was last updated. The upsert should change the ts_updated value, and the attempt should be logged.

If you want genuine pure idempotent interactions, you can't do that, and you have to rely on unique constraints in the database and swallow that particular error, so that nothing about the universe of the application state changes and nothing gets logged.

But that is a pretty garbage way to do things. As with many things, some moderation and flexibility are a good idea.

Instead of focusing on the exact meaning of the word, we should focus on making sure that nothing bad happens if someone does something twice. That's what the operational concept of idempotence is.

It's relatively easy to get something to happen at least one time, and it's slightly less easy to get something to happen at most one time. Getting something to happen exactly one time is really, really hard. That's why we should care: most of the things we want to happen will happen more than once in any nontrivial system.

Purity in concept isn't important. Safety in the sense that nothing bad happens the second or third or nth time around is.

Side note about the Stripe blog linked in the current top post by ageitgey: that's not a useful solution. You can't trust a uuid created outside the context of the uniqueness that needs a guarantee. That's one of the fundamental problems of distributed systems. And it's one I would think Stripe should know better than to espouse since, after all, they did hire aphyr.

You need something closer to home, not something received from a relatively untrusted source. Before you gripe at me and tell me that uuids are, in fact, uuids, let me explain. There are all kinds of situations that can force a client to regen a uuid.

A gas station pump resets because of a blink in power. It remembers all the information about the transaction except for the uuid because the programmer was smart and wanted that to be, well, unique. And he wanted the pump to be smart and retry the failed attempt. Same txn; different uuid.

The user hits the refresh button in the middle of the txn, but the rest of the form data is cached. Not the uuid.

The corner store owner who hates people who use credit/debit cards in general in Queens gets pissed because it takes more than two seconds to process, and pulls the power plug to reset it. Yeah, POS units should clear after that, but they don't always.

User's phone switches from cell service to wifi. Forces a refresh in the app in the middle of a txn. It's still listening for the same transaction response and when it times out tries again with a different uuid.

These are real scenarios that we have to deal with. Trusting a client to only ever retry with the same uuid is not safe. So when I say that you need a uuid in the same context, I mean the uuid created in your database when it receives an auth request, inside the SQL transaction used to create the entry for that attempt.

Everything else about the transaction is used to fingerprint it with the local uuid as the upsert key. Once that is in place, then you can start to have some duplicate txn safety. That's the beginning of your hell, but it's a better one than trusting client devices.

I'm not trying to take a potshot at Stripe. I have tons of respect for them, but that particular article is misplaced. You can't just pass uuids around and think that you're safe because they are actually unique. I wish it were that simple.

It most definitely is not, as the person said, "the trick."

lmm · on Aug 21, 2018

> you want two timestamps. 1 for when the row was created and 1 for when it was last updated. The upsert should change the ts_updated value, and the attempt should be logged.

If timestamps are worth recording they're worth recording in the normal way you'd record business data. Associate a timestamp with when the user made their request, not on when your server happened to process it.

> A gas station pump resets because of a blink in power. It remembers all the information about the transaction except for the uuid because the programmer was smart and wanted that to be, well, unique. And he wanted the pump to be smart and retry the failed attempt. Same txn; different uuid.

Don't do that. If it's a transaction ID, generate the ID for the transaction and keep it with the data for the transaction.

> The user hits the refresh button in the middle of the txn, but the rest of the form data is cached. Not the uuid.

Again, keep the ID with the data it goes with.

> Trusting a client to only ever retry with the same uuid is not safe. So when I say that you need a uuid in the same context, I mean the uuid created in your database when it receives an auth request, inside the SQL transaction used to create the entry for that attempt.

That's an unscalable approach. If you can afford to run everything on a single database with monotonically increasing time then sure, knock yourself out, it makes everything easier and is good enough for a lot of cases. But in these days of client-side UI, like it or not you are working on a distributed system where the client and server are separate nodes, and distributed system techniques are your best hope of getting sensible behaviour.

drTriumph · on Aug 21, 2018

I've seen this before but it never hurts to re-read