One thing to keep in mind about how development at larger tech companies works: you’re often not building on your own desktop, you’re usually building on a development server that sits on a well-connected (effectively production-quality, if not literally the same) network. You don’t see a ton of drops in those cases, so it works well. That’s not to say there hasn’t been effort put into recovering from networking issues in this and other build tooling - at scale, someone’s development server is going to have a bad day every day.
You also need much better tools than grep and locate for a monorepo - or any sufficiently large repo, probably. Just load the full repo into memory in a few places around the world, and use an API to find the text you’re looking for. If you already have expertise with search services in your company, this is not that challenging a step - and you can get fancy by using something like Tree-sitter to make those searches more advanced than plain text matching. Hitting disk (especially walking whole directory trees for “grep -r”) is a losing approach in a large repo.
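To make that concrete, here’s a toy sketch of the in-memory idea (class and method names are mine, not any real service’s): keep file contents in RAM and answer queries there, never touching disk per query.

```python
# Toy in-memory code search: load the corpus once, scan it in RAM.
class InMemoryCodeSearch:
    def __init__(self):
        self.files = {}  # path -> file contents, loaded once at startup

    def load(self, path, contents):
        self.files[path] = contents

    def search(self, needle):
        # Linear scan over RAM; production systems use trigram or suffix
        # indexes, but even this beats walking a directory tree on disk.
        for path, text in self.files.items():
            for lineno, line in enumerate(text.splitlines(), start=1):
                if needle in line:
                    yield path, lineno, line

idx = InMemoryCodeSearch()
idx.load("src/main.py", "def handle_request(req):\n    return respond(req)\n")
print(list(idx.search("handle_request")))
```

A real deployment shards the corpus across machines and replicates it to a few regions, but the core trade is the same: RAM and an API instead of per-query disk I/O.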
Since you mention a big tech company, what you’ll probably find is that there are people around you who care about these things, and they’re probably doing what they can - and they could use support, not necessarily help with the work itself, but even just moral support. (I like to pretend I’m one of those people in the quality space in my area at my big tech company…)
As others mention, be careful trying to change the world (and especially saying bad things about what you see) before you’ve grown your credibility. But ask a few questions in team meetings (or, at somewhere like FB, in Workplace groups) about what test, staging, or related infrastructure is available as part of ramping up - that might get the people who care to notice you’re a potential ally.
There’s even the (admittedly probably very) small chance that the team wants to improve here and just doesn’t know how, or didn’t have the resources to do better before. Either way, you’ll learn more without alienating anyone.
Big tech companies also often offer mobility and cultural variety, so keep an eye out for teams that align with what you care about. At the least, learn what they do and how they got started - or possibly move there.
(If you’re at FB, feel free to reach out to me - same account name.)
Perhaps the right word is “system” - the combination of software (DNS protocol server software, health checker, BGP agent, and so forth) involved in making the DNS service available (or not, in this case). These could be running on the same computer, or separate ones if you are particularly imaginative.
Unfortunately, “DNS system” means something very different from what “load balancing system” would suggest, so “server” is simpler.
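As a rough sketch of the kind of “system” I mean (every name here is hypothetical, not any particular implementation): the piece deciding whether the service is reachable at all can be a small health check glued to a BGP agent.

```python
import random

class BgpAgent:
    # Stand-in for a real BGP speaker, on the same box or a separate one.
    def announce(self, prefix): print(f"announcing {prefix}")
    def withdraw(self, prefix): print(f"withdrawing {prefix}")

def dns_healthy():
    # Stand-in for a real probe, e.g. resolving a known name through the
    # local DNS server software and validating the answer.
    return random.random() > 0.01

def health_check_once(agent, prefix="192.0.2.0/24"):
    if dns_healthy():
        agent.announce(prefix)  # attract traffic while we can serve it
    else:
        agent.withdraw(prefix)  # take this box out of rotation

health_check_once(BgpAgent())  # a real checker would run this on a timer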
(Usual disclaimer: work at FB, even on this exact stuff, but not representing it.)
Stories (which is the medium used) about how things went right without effort are boring. That leaves two kinds of stories - things that went right but took a lot of effort, and things that went wrong.
For things that went right with a lot of effort, there’s the case where they went right because of that effort. If you write stories about those, you can be accused of being self-aggrandizing - but they do give people practical options to consider in similar situations.
For things that went wrong, there’s the case where you tried to correct them but weren’t successful. There’s value in exploring what you could have done better - but that’s not the story. The story is the people and archetypes and systems and so forth. In that case, you can be accused of seeking to blame others.
The remaining stories are ones where you did something wrong (though at the time it seemed good and necessary) and the outcome was successful despite it - or unsuccessful because of it. If you’ve read enough of the back catalog, these do exist.
I like that these are presented as stories, and that they’re representative of situations people will recognize and find themselves in. If you’ve seen them, it’s validating that others perceive the challenges the same way. If you haven’t, you’re forewarned about things that may come up.
Her stories gain a sharper edge if you’ve worked in similar sorts of environments before.
If you aren’t aware of how some teams try to “own” turf and prevent alternatives (even one-offs), and how the entire company funnels anything that matches a keyword to that team, even on the most tentative match, then you aren’t aware of the challenges of navigating them.
You see one path (going with the flow), but don’t see what happens when you don’t follow it.
Not going with the flow is definitely valuable - it’s something any good senior person should have in their tool belt. And if you read Rachel’s stories, there are many examples of not going with the flow (and comments in the HN posts about how she’d be more effective if she didn’t go against the flow).
The challenge is that you can’t always go against the flow either. It’s celebrated when you cut through some red tape - but at some point you’ll just get a reputation for being contrary. Even if you don’t, it’s tiring to be the one trying to course-correct the world. Either way, you have to choose your battles. And a button on a dashboard probably isn’t worth spending your capital on…
There’s a degree to which you can just go out and talk to people and build relationships. You’d be mistaken if you think that developing these relationships (as well as a reputation for solving real problems, which puts you on the right foot with many strong engineers) is an avenue that wasn’t explored.
At FB, most ICs are not hired for a specific team in advance, and instead choose between teams (for SWE, a long list of them) during the Bootcamp process. You can chat with future teammates (not just the manager or lead) and ask them straight out about that. (Some people ask for the team’s half-yearly survey results as well.)
If you’re being hired for a specific team (at another company, say), ask for a “follow-up” meeting with future teammates after the offer as a condition of accepting. You have a lot of leverage at that point.
Keep in mind that the paper this page is based on is over a decade old now. Many things have changed since then, as you can imagine. Look for later papers and engineering blog posts for more on what’s changed.
The main benefit is multiplexing - being able to use the same connection for multiple transactions at the same time. That helps you find and keep the congestion window at its calculated maximum size, reduces connection-related start-up costs, and avoids waiting for an in-use connection to free up if you have a max-connections-per-server model.
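For a concrete client-side picture, here’s a small sketch using httpx (a real Python library; HTTP/2 support needs the httpx[http2] extra, and example.com is just a placeholder origin):

```python
import asyncio
import httpx

async def main():
    # One connection: concurrent requests become concurrent streams on it
    # (when the server negotiates HTTP/2), instead of queueing behind a
    # per-server connection limit.
    async with httpx.AsyncClient(http2=True) as client:
        reqs = [client.get(f"https://example.com/item/{i}") for i in range(5)]
        for resp in await asyncio.gather(*reqs):
            print(resp.http_version, resp.status_code)

asyncio.run(main())
```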
HTTP 2’s main problem is head-of-line blocking in TCP - basically, if a packet is lost, everything sent after it has to wait for the retransmission before it can be delivered, slowing the connection down. With multiplexing, that means a bunch of in-flight transactions - and potentially future ones - are blocked at the same time. With multiple TCP connections, you don’t have this problem of one dropped packet affecting multiple transactions.
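Here’s a toy model of that effect (made-up numbers, nothing like a real TCP implementation) showing one lost packet stalling every stream on a shared connection:

```python
RETRANSMIT_DELAY = 200  # ms to recover the lost packet (made-up figure)

def delivery_times(packets, lost_index):
    """packets: list of (stream_id, send_ms); returns stream -> ready_ms."""
    stall_until = packets[lost_index][1] + RETRANSMIT_DELAY
    done = {}
    for i, (stream, sent) in enumerate(packets):
        # TCP delivers bytes in order, so everything sent at or after the
        # lost packet waits for the retransmission, whatever stream it's on.
        ready = max(sent, stall_until) if i >= lost_index else sent
        done[stream] = max(done.get(stream, 0), ready)
    return done

# Three streams multiplexed on one connection; stream B's packet is lost.
packets = [("A", 0), ("B", 1), ("C", 2), ("A", 3)]
print(delivery_times(packets, lost_index=1))
# {'A': 201, 'B': 201, 'C': 201} - every stream stalls, not just B.
```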
HTTP 3 has many more benefits - basically, all the benefits of multiplexing without the head-of-line blocking (a loss only affects its own stream), plus the ability to use newer congestion control algorithms when client TCP stacks don’t support them - or come with bad defaults. And the future is bright for non-HTTP traffic and non-reliable streams over QUIC, the transport HTTP 3 is built on.
Right, all of this kind of feels as if HTTP/2 is trying to solve transport-layer problems in the application layer, especially if you leave out the server-initiated push. I can’t really pretend to know much about this, but I can’t say I’m surprised that it causes problems when the underlying transport-layer protocol is trying to solve the same things.
So is it correct to view HTTP/3 as basically taking a step back and just running HTTP over a different transport-layer protocol (QUIC)? (If so, I think the name is a bit confusing - “HTTP over QUIC” would be much clearer.)
Still sad - it would have been much nicer to just keep HTTP as-is and swap in a different transport layer. Or maybe extend HTTP a little. But right now we’ve got a protocol-independent HTTP/1.1 and a new HTTP/3 which, rather than being more general, strictly relies on a single protocol.
In some ways, HTTP 3 is the same HTTP messages, just over QUIC.
But as you get into other features, there are differences. And your clients and servers need to both have fallbacks to HTTP 2, since UDP connectivity might not be available, and fallback is expected.
So, you have to build support for having working push, or not. Or having working priorities, or not. Or having long-lived non-HTTP sockets, or using fallbacks like web sockets or even long polling. There’s even more fun on the horizon, and I’m not looking forward to my colleagues thinking of a fallback strategy for some of those...
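The connection-level fallback alone looks roughly like this sketch (fetch_h3 and fetch_h2 are hypothetical stand-ins for real client stacks); the feature-level fallbacks above are where the real pain is:

```python
# fetch_h3 / fetch_h2 are hypothetical stand-ins for real client stacks.
def fetch_h3(url, timeout):
    raise ConnectionError("pretend UDP is blocked on this network")

def fetch_h2(url):
    return f"response for {url} via HTTP/2 over TCP"

def fetch_with_fallback(url):
    # Try QUIC (UDP) first with a short timeout; networks that drop or
    # block UDP are common, so the TCP path has to keep working anyway.
    try:
        return fetch_h3(url, timeout=0.3)
    except ConnectionError:
        return fetch_h2(url)

print(fetch_with_fallback("https://example.com/"))
```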
HTTP/2 is what you do if you're confined to using TCP. HTTP/3 is what you get if you use UDP to solve the same problems (and new problems discovered by trying it over TCP).
Author here. Great points, I agree 2 extra round trips are not strictly necessary.
The advantage of P2C is the simplicity that comes from its stateless, just-in-time nature. If you cache load - either via servers broadcasting it or via a load indicator included in responses - you have to consider cache invalidation and TTLs. The longer the TTL, the higher the odds of a thundering herd (all clients decide one server is underloaded and proceed to overload it). If you’re taking on that complexity, it may be better to go with a proxy, which can have other benefits too (connection pooling, no client cooperation required). Like all such decisions, it depends on the situation.
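Here’s a minimal P2C sketch (get_load stands in for whatever just-in-time probe or in-band indicator you use) - note there’s no cached state to invalidate:

```python
import random

def pick_server(servers, get_load):
    # Power of two choices: sample two distinct servers and take the
    # less-loaded one, using load fetched at decision time - no TTLs,
    # and no herd converging on one "underloaded" server.
    a, b = random.sample(servers, 2)
    return a if get_load(a) <= get_load(b) else b

# Usage with a toy load function:
loads = {"s1": 10, "s2": 3, "s3": 7}
print(pick_server(list(loads), loads.get))
```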
Even in the proxy case (which is my day job), it’s often worth paying the price of up-front load polling to have sufficiently recent information to act on. The most obvious cases are those where some (or all) requests are expensive, as opposed to thousands of uniformly tiny requests.
This does depend on having a local load balancing layer (not, say, making a choice between two servers that are each 50+ms away from the load balancer), and also having a high-performance RPC stack (C++ on both sides, and Thrift, in my case).