The largest feature my team develops is a sync engine. We have a distributed spe...

The largest feature my team develops is a sync engine. We have a distributed speech assistant app (multiple embeddeds [think car and smartphone] & cloud) that utilizes the Blackboard pattern. The sync engine keeps the blackboards on all instances in sync.

It is based on gRPC and uses a state machine on all instances that transitions through different states for connection setup, "bulk sync", "live sync" and connection wind down.

Bulk sync is the state that is used when an instance comes online and needs to catch up on any missed changes. It is also the self-heal mechanism if something goes wrong.

Unfortunately some embedded instances have super unreliable clocks that drift quite a bit (in both directions). We consider switching to a logical clock.

We have quite a bit of code that deals with conflicts.

I inherited this from my predecessor. Nowadays I would probably not implement something like this again, as it is quite complex.