Making Direct Messages Reliable and Fast

mxstbr · on June 15, 2019

If anybody else needs this kind of system, most GraphQL clients (at least in the JavaScript ecosystem) handle all of this out of the box or can be extended to work this way with minimal effort.

They have a client-side cache for the whole app that you can optimistically update; all network requests run through it, and it handles rolling the updates back when responses come in, etc.

Examples in the JS ecosystem that I am familiar with include Apollo and Relay, and urql is working on this.
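
For anyone who wants to see the idea without a library, here is a minimal sketch of an optimistic client-side cache with rollback. Every name in it (`OptimisticCache`, `addOptimistic`, `commit`, `rollback`, the `Thread` shape) is made up for illustration and is not the API of Apollo, Relay, or urql. It also assumes that committing simply replays the update onto confirmed state, whereas a real client would usually merge the server's response instead.

```typescript
// Hypothetical names throughout; not the API of Apollo, Relay, or urql.
type Thread = { id: string; unread: boolean };
type Threads = Record<string, Thread>;
type Update = (threads: Threads) => void;

class OptimisticCache {
  // State the server has confirmed.
  private confirmed: Threads = {};
  // Optimistic updates still waiting for a server response, in issue order.
  private pending: { requestId: number; update: Update }[] = [];
  private nextRequestId = 1;

  setConfirmed(thread: Thread): void {
    this.confirmed[thread.id] = { id: thread.id, unread: thread.unread };
  }

  // Record an optimistic update; the UI sees it on the next read().
  addOptimistic(update: Update): number {
    const requestId = this.nextRequestId++;
    this.pending.push({ requestId, update });
    return requestId;
  }

  // Server accepted the request: make the update part of confirmed state.
  // (Assumption: a real client would merge the server response here.)
  commit(requestId: number): void {
    for (const p of this.pending) {
      if (p.requestId === requestId) p.update(this.confirmed);
    }
    this.drop(requestId);
  }

  // Server rejected the request: forget the update, which rolls the view back.
  rollback(requestId: number): void {
    this.drop(requestId);
  }

  // What the UI renders: confirmed state with pending updates replayed on top.
  read(): Threads {
    const view: Threads = {};
    for (const id of Object.keys(this.confirmed)) {
      const t = this.confirmed[id];
      view[id] = { id: t.id, unread: t.unread };
    }
    for (const p of this.pending) p.update(view);
    return view;
  }

  private drop(requestId: number): void {
    this.pending = this.pending.filter((p) => p.requestId !== requestId);
  }
}
```

Because `read()` always rebuilds the view from confirmed state plus whatever is still pending, a rollback is just "remove the pending entry"; nothing has to be undone by hand.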

dmitriid · on June 15, 2019

> If anybody else needs this kind of system, most GraphQL clients

Well, the client is probably the easiest and the most boring piece of the puzzle. What you do in the backend matters.

sambal · on June 15, 2019

What’s done in the backend matters slightly less in the thread of an article entirely concerned with the client.

https://cdn-images-1.medium.com/max/2400/1*ttu76ysGchnhH3CyT...

dmitriid · on June 16, 2019

"How do you handle DMs for a billion connected clients" is still a much harder and a more interesting story than "if you want something similar, throw a GraphQL at it".

sambal · on June 17, 2019

That may be, yet it was a story nobody in that thread was talking about save for you.

bluesign · on June 15, 2019

If the order of requests is retained, does that mean that if a ‘mark thread as read’ request somehow fails on the server side, all other requests will wait to be sent until it gets a success response from the server?

axxl · on June 15, 2019

We did something very similar to this, and our solution was to have two classes of requests: those that require global ordering (creation of posts etc.) and those that don’t. Requests in the second class can declare dependencies on each other to get a local ordering, but they don’t block any other requests.
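
A rough sketch of how that two-class scheme could decide what may be sent next. The `QueuedRequest` shape and the `readyToSend` function are invented for this example, and the sketch only tracks completed requests, not ones that are in flight or failing.

```typescript
// Hypothetical request shape: `ordered` marks the globally ordered class,
// `dependsOn` lists ids that must complete first (local ordering).
type QueuedRequest = { id: string; ordered: boolean; dependsOn: string[] };

// Returns the ids that may be sent now, given the ids that have completed.
function readyToSend(queue: QueuedRequest[], completed: string[]): string[] {
  const isDone = (id: string): boolean => completed.indexOf(id) !== -1;
  const ready: string[] = [];
  let orderedSlotTaken = false;
  for (const request of queue) {
    if (isDone(request.id)) continue;
    const dependenciesDone = request.dependsOn.every(isDone);
    if (request.ordered) {
      // Only the first unfinished ordered request may go; later ones wait.
      if (!orderedSlotTaken && dependenciesDone) ready.push(request.id);
      orderedSlotTaken = true;
    } else if (dependenciesDone) {
      // Unordered requests wait only for their own dependencies.
      ready.push(request.id);
    }
  }
  return ready;
}
```

With this split, a stuck ‘mark thread as read’ request in the unordered class holds back only the requests that explicitly depend on it.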

bjackman · on June 15, 2019

Yeah, I wondered if, instead of maintaining a global ordering of all requests, it wouldn't be better to just require timestamps in requests that need to be ordered. Then you only need partial orderings, which you can resolve when merging the optimistic state, only among those requests that actually need to be ordered, and requests don't need to block each other.

Edit: although now that I think about it, if you go that route you effectively have to do the same timestamp ordering resolution on the server too, so it's probably more complicated than just keeping a global ordering!
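
To make the timestamp idea concrete, here is a small sketch of resolving updates per key by timestamp when merging. The `StampedUpdate` shape and `mergeByTimestamp` are hypothetical, and the sketch assumes all timestamps come from clocks that can be compared, which does not hold in general when they are taken on different client devices.

```typescript
// Hypothetical update shape: each update targets one key and carries a timestamp.
type StampedUpdate = { key: string; value: string; timestamp: number };

// For each key, keep the value with the latest timestamp, regardless of
// the order in which the updates arrived. On a tie the earlier arrival wins.
function mergeByTimestamp(updates: StampedUpdate[]): Record<string, string> {
  const latest: Record<string, StampedUpdate> = {};
  for (const update of updates) {
    const seen: StampedUpdate | undefined = latest[update.key];
    if (seen === undefined || update.timestamp > seen.timestamp) {
      latest[update.key] = update;
    }
  }
  const merged: Record<string, string> = {};
  for (const key of Object.keys(latest)) merged[key] = latest[key].value;
  return merged;
}
```

Updates to different keys never interact here, which is the partial ordering: only updates to the same key are compared.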

xhgdvjky · on June 15, 2019

timestamps are rarely the answer

invasionofsmall · on June 15, 2019

In my experience it is reasonable to wait in the sense that it's basically a middle layer where all the requests pass through. Also, if, as an example, there is a missing connection, you don't allow people to keep doing things. We did it on our app but you can see the same on FB app where you see a "lost connection" alert.

bluesign · on June 15, 2019

For timeouts etc., OK, but one server-side call failure shouldn’t hang the whole application.

raverbashing · on June 15, 2019

Even while being a part of the industry, it is amazing how much technology it takes for things to work seamlessly, and also how many gotchas there are in the way

mistahenry · on June 15, 2019

It’s also rough when building new applications. Instagram/Facebook-level UX is what the layperson expects.

For instance, if you add photo upload to your app, people will complain if it doesn’t support multi-photo upload, varying sizes, and client side photo filters.

The bar is set very high. It’s taken me a whole year to build a website + app combo in my spare time that has all the expected bells and whistles on top of the app’s regular functionality (social login, site tours, optimistic states, skeleton loading, push notifications, global type-agnostic search, etc.). The MVP took two months, but every time I showed it to people, the reaction was always “Why isn’t it autosuggesting content? Where is the explore page? Why can’t I log in with my Facebook? Why aren’t you saving drafts while I create posts in case my computer crashes?” and so on.

It really is a ton of effort to turn a simple idea into reality

spunker540 · on June 15, 2019

It’s really hard to build consumer-facing products solo these days!

I recently “gave up” on building my own site and just pay for a Shopify site now (only an option if you’re in e-commerce) and I’ve been blown away by how easy and high quality it is, and $30/month is honestly way cheaper than the time it’d take me to create and maintain a similar site. There are plenty of alternatives to Shopify that are competitive too I understand.

Perhaps it is worth your while to explore outsourcing so that your own time is not the limiting factor!

noir_lord · on June 15, 2019

It possibly takes more effort than it should to do many of these things. In a way, the very difficulty of building them serves to reinforce the requirement to have large teams.

In a way the fractured nature of the web serves the big organizations with the resources to build things that are 'clean' experiences across every platform.

While I doubt it was by design it certainly benefits them now.

skinnymuch · on June 15, 2019

I thought Shopify is a lot better than the competition right now.

xhgdvjky · on June 15, 2019

this is why using a popular framework is critical even if it is flawed. there is support for so much stuff you just need to install

matlin · on June 15, 2019

This seems a lot like an eventually-consistent database. I think you could accomplish something similar with something like CouchDB or Firebase that provide conflict resolution between local and server state automatically. I'm wondering what the advantage is here?

dnlgl · on June 15, 2019

My strong guess is that they have just reinvented a client component of an eventually consistent data store, and in this context the blogpost is actually embarrassing for them. I wonder if they will notice further problems and reinvent causal consistency and vector clocks next.

elt193 · on June 15, 2019

Great write-up! I have a few questions, if anyone familiar with the development can share: 1. How do you handle the trust-buster issue, where users see a message go through only to check back an hour later and find that it did not? 2. For this to work, all components must adhere to the same pattern. How did you handle the migration? 3. This seems like something that could work across platforms. Any thoughts on a C++ layer shared between Android and iOS? Or does Facebook tend to approach this per platform?

Thank you!

bluesign · on June 15, 2019

I guess it is handled by a state like ‘sending’: when you send a message, the initial UI update, made before the request goes out, shows a ‘sending’ icon.
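
Something like the following sketch is what I have in mind. The `Message` shape, the status names, and both functions are guesses made for illustration, not Instagram's actual code.

```typescript
// Hypothetical message model with an explicit delivery status.
type MessageStatus = "sending" | "sent" | "failed";
type Message = { id: string; text: string; status: MessageStatus };

// Optimistic step: show the message immediately, marked as still sending.
function sendOptimistically(messages: Message[], id: string, text: string): Message[] {
  const message: Message = { id, text, status: "sending" };
  return messages.concat([message]);
}

// Response step: flip the status once the server answers, so a failure is
// visible to the user instead of the message silently disappearing.
function resolveSend(messages: Message[], id: string, ok: boolean): Message[] {
  return messages.map((m): Message =>
    m.id === id && m.status === "sending"
      ? { id: m.id, text: m.text, status: ok ? "sent" : "failed" }
      : m
  );
}
```

The UI would then render a spinner for `sending` and a retry prompt for `failed`, so the user never sees a message as delivered before the server has confirmed it.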

cloudify · on June 15, 2019

This looks a lot like the Flux pattern. It would be interesting to understand how they deal with conflicts and with updates that keep failing: will the Mutation Manager cancel the optimistic state after a certain number of failed requests to the server?

JetSpiegel · on June 16, 2019

Why is Instagram reinventing the wheel, instead of using WhatsApp backend?

I thought Zuck announced they were merging everything into a unified system?