
The thing I always get stuck on with these techniques is: how do you handle transactions that perform validations/enforce invariants on data when you’re just appending writes to a log and computing materialized views down the line? How can you do, essentially, an “add item to shopping cart” if, for example, users can only have a max of 10 items and so you need to validate that there aren’t already 10 items in the cart?


This all sounds to me very close to the event-sourcing/CQRS/DDD area of thinking. In which case you look at it in two parts:

- Event firing: Here is where you fire an event saying that the thing has happened (i.e. item_added_to_cart, not add_item_to_cart). Crucially, this event states the thing has happened. This isn't a request; it is a past-tense statement of fact, which is oddly important. It is therefore at this point that you must do the validation.

- Event handling: Here you receive information about an event that has already happened. You don't get to argue about it; it has happened. So you either have to handle it, or accept that you have an incomplete view of reality. So perhaps you have to accept that the cart can have more than 10 items in some circumstances, in which case you prompt the user to correct the problem before checking out.

In fact, this is typically how it goes with this kind of eventual consistency. First fire the event that is as valid as possible. Then, when handling an 'invalid' event, accept that it has simply got to happen (the cart has 11 items now), then prompt the user to fix it (if there is one).
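
A minimal sketch of that handler side in Python (CartView, needs_user_fix, and the event shape are made up for illustration): the read model accepts every item_added_to_cart event as a fact, and merely flags carts that ended up over the limit so the user can be prompted before checkout.

    from dataclasses import dataclass, field

    MAX_ITEMS = 10  # the assumed business rule from the example above

    @dataclass
    class CartView:
        # Read model built from past-tense events; it never rejects them.
        items: list = field(default_factory=list)
        needs_user_fix: bool = False  # hypothetical flag surfaced in the UI

    def handle_item_added_to_cart(view: CartView, event: dict) -> CartView:
        # The event already happened, so record it unconditionally...
        view.items.append(event["item_id"])
        # ...and only note that the invariant is violated, for later correction.
        view.needs_user_fix = len(view.items) > MAX_ITEMS
        return view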

Not sure how helpful this is here, but thought it a useful perspective.


You're not "just" writing to a log. You always need a view of the state of the world to take decisions that enforce invariants (in this case, the "world" is the cart, the "decision" is whether an item can be added).

What you'd do is, when you receive the "addToCart" command, construct the current state of the cart by reading the log stream (`reduce` it into an in-memory object), which has enough data to decide what to do with the command (e.g. throw some sort of validation exception). Plus some concurrency control to make sure you don't add multiple items concurrently.

For reading data, you could just read the log stream to construct the projection (which doesn't need to be the same structure as the object you use for writes) in-memory; it's a completely reasonable thing to do.

So at the core, the only thing you persist is a log stream and every write/read model only exists in-memory. Anything else is an optimization.

DDD calls this "view of the world" an "aggregate". Reading the log stream and constructing your aggregate from it is usually fast (log streams shouldn't be very long); if it's not fast enough, there are caching techniques (aka snapshots).

Similarly, if reducing the log stream into read models is too slow, you can cache these read models (updated asynchronously as new events are written), this is just an optimization. This comes at the cost of eventual consistency though.
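
A rough sketch of that write path in Python, assuming a naive in-memory event store keyed by cart id (all names here are hypothetical; a real store would persist the log and enforce the version check atomically):

    # Hypothetical in-memory event store: one event list ("stream") per cart.
    streams: dict[str, list[dict]] = {}

    def load_cart(cart_id: str) -> dict:
        # Rehydrate the aggregate by reducing its event stream into an object.
        state = {"items": [], "version": 0}
        for event in streams.get(cart_id, []):
            if event["type"] == "ItemAddedToCart":
                state["items"].append(event["item_id"])
            state["version"] += 1
        return state

    def add_to_cart(cart_id: str, item_id: str) -> None:
        state = load_cart(cart_id)
        if len(state["items"]) >= 10:           # enforce the invariant here
            raise ValueError("cart already has 10 items")
        stream = streams.setdefault(cart_id, [])
        if len(stream) != state["version"]:     # naive optimistic concurrency check
            raise RuntimeError("concurrent write detected, retry")
        stream.append({"type": "ItemAddedToCart", "item_id": item_id})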


This kind of thing only works if your whole universe belongs to your database.

Transactions and conditional-updates work smoothly if it's your customer browsing your shop in your database - up to a point.

But I usually end up with partner integrations where those techniques don't work. For instance, partners will just tell you true facts about what happened - a customer quit, or a product was removed from the catalogue. Your system can't reject these just because your DB can't or won't accept that state change.


You don't use it for that sort of thing.

But if you did you'd need an aggregatable (commutative) rule.

For example, you can't aggregate P99 metrics. (To see why: it's the same reason you can't aggregate P50 - a median of a bunch of medians is not the overall median.)

So you measure the number of requests with latency < 100ms and the total number of requests. Both of these aggregate nicely. Divide one by the other and you get the fraction of requests under 100ms. So if your P99 target was 100ms, you set your 100ms target to 99%.
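
A quick Python sketch of why that works (the shard counts are invented): the per-shard counters combine by simple addition, which is exactly the property raw percentiles lack.

    # Per-shard counters aggregate by addition; percentiles would not.
    shards = [
        {"under_100ms": 980, "total": 1000},
        {"under_100ms": 495, "total": 500},
    ]

    under = sum(s["under_100ms"] for s in shards)
    total = sum(s["total"] for s in shards)
    fraction_under_100ms = under / total          # ~0.983
    meets_target = fraction_under_100ms >= 0.99   # the "P99 <= 100ms" goal, restated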

Anyway, you'd need something like this for your shopping cart. It is probably doable as a top 10 (and anything else gets abandoned). Top 10 is aggregatable; you just need an order, which could be added-to-cart time or price.
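
For instance, a sketch of merging two partial top-10 views ordered by added-to-cart time (timestamps and item names are invented):

    import heapq

    # Two partial "top 10 by added-to-cart time" views merge into one,
    # which is what makes the rule aggregatable.
    view_a = [(1700000001, "socks"), (1700000005, "mug")]
    view_b = [(1700000003, "pen")]

    merged_top_10 = heapq.nsmallest(10, view_a + view_b)  # keep the 10 earliest additions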


I was thinking about this as well as these streaming things become more popular. Would you write add-to-cart events that trigger add-cart events, the latter containing a valid field which becomes false after the 10th add-cart? So after that, a remove-from-cart triggers another add-cart, which becomes valid again at < 11 items? And transactions similarly roll back by running the inverse of what happened after the transaction started. I'm just thinking out loud. I understand you probably wouldn't use this for that, but let's have some fun, shall we?


All systems I have worked on like this have some concept of a version number for each entity/aggregate.

So you get that the account_balance was $100 on version 10 of the account, and write an event that deducts $10 on version 11.

If another writer did the same at the exact same time, they would also write an event to deduct $10 at version 11. There will be a conflict and only one of them will win.

This is exactly like ordinary optimistic concurrency control, even without events as the primary storage.
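
A compact sketch of that expected-version check in Python (the names and the list-as-stream representation are assumptions for illustration):

    class VersionConflict(Exception):
        pass

    def append_event(stream: list, event: dict, expected_version: int) -> None:
        # Only the writer whose expected_version still matches the stream wins.
        if len(stream) != expected_version:
            raise VersionConflict(f"expected {expected_version}, found {len(stream)}")
        stream.append(event)

    account = [{"type": "Deposited", "amount": 100}]   # stream is at version 1
    append_event(account, {"type": "Withdrew", "amount": 10}, expected_version=1)
    # A second writer that also read version 1 would now raise VersionConflict.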

Didn't check whether the system linked to supports this; I guess it might not? But this primitive seems quite crucial to me.


I assume that shopping cart limit is a made-up example, but I'm curious what preconditions are you actually enforcing in the real world via DB transaction rollback?


TL;DR: You evaluate your preconditions and invariants before the event is published, using the current state of the aggregate.

Here's how that looks in a DDD world:

* An aggregate is responsible for encapsulating business rules and emitting events

* An aggregate is responsible for maintaining the validity of its own state (ensuring invariants are valid)

* When a command/request is received, the aggregate first rehydrates its current state by replaying all previous events

* The aggregate then validates the command against its business rules using the current state

* Only if validation passes does the aggregate emit the new event

* If validation fails, the command is rejected (e.g., throws CartMaxLimitReached error)

Example flow: Command "AddItemToCart" arrives

>> System loads CartAggregate by replaying all its events

>> CartAggregate checks its invariants (current items count < 10)

>> If valid: emits "ItemAddedToCart" event. If invalid: throws CartMaxLimitReached error
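
A minimal Python sketch of that flow (the event shape is an assumption; CartAggregate and CartMaxLimitReached follow the names above):

    class CartMaxLimitReached(Exception):
        pass

    class CartAggregate:
        MAX_ITEMS = 10

        def __init__(self):
            self.items = []

        @classmethod
        def rehydrate(cls, events: list) -> "CartAggregate":
            # Rebuild current state by replaying all previous events.
            cart = cls()
            for event in events:
                cart.apply(event)
            return cart

        def apply(self, event: dict) -> None:
            if event["type"] == "ItemAddedToCart":
                self.items.append(event["item_id"])

        def add_item(self, item_id: str) -> dict:
            # Validate the command against the invariant before emitting anything.
            if len(self.items) >= self.MAX_ITEMS:
                raise CartMaxLimitReached()
            event = {"type": "ItemAddedToCart", "item_id": item_id}
            self.apply(event)
            return event

    # Command "AddItemToCart" arrives:
    cart = CartAggregate.rehydrate([])       # replay the (here empty) event stream
    new_event = cart.add_item("sku-123")     # would then be appended to the stream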


You write the 'add item' event regardless, and when building the 'cart' view you handle the limit.


add_item is not an event, rather a command/request that is yet to be validated. item_added is the event = a fact that was 'allowed to happen' by the system.

Keeping commands in a persistent store is a matter of choice but not necessary. I've seen people doing command sourcing and calling it event sourcing.


Alternatively "invalid cart" could itself become an event.


Well, assume the non-overdraftable bank account example instead; what do you do then?


Sounds like an easy way to run out of storage space




