Roger that. His name is `Vasanth Krishnamoorthy`. I still have the good fortune of working with him at `WalmartLabs`. Extremely talented, multi-faceted, curious, and above all a wonderful human being. I have learnt plenty through my interactions with him and am still learning from him :-). I am serendipitously lucky to have met him in my life.
The main problem with these kinds of exposition topics is that the steps involved are quite dense. By that I mean something like the density of the real numbers: between any two real numbers there are infinitely many more. The writer then just chooses to expand on the steps that they are comfortable with, or the ones that get more eyeballs. I mean, why not explain how the signal travels across various media like air, undersea cables, to and from satellites, etc.?
Now, don't get me wrong, I am not saying this is useless. I am just saying that if one chooses to give a 30k-foot perspective, they had better stay there and not bounce between 40k and 10k.
I don't see how not sticking to 30k is really detrimental to how most people would likely consume a doc like this, which would probably be to sample various parts, find what they're interested in, and then dig in.
I love a good reductionist deep dive into things we use every day. This is a great overview of a wide range of topics.
I wrote something similar a couple of years ago[0] which skips the keyboard and display but goes further into the world of IP packet transmission (and HTTP/2). The parent link is better written though.
The most amazing part about this is that regardless of how much more detail one would want to put into it, it is probably practically infeasible to make it complete. It's a wonderful example to demonstrate the importance of the concept of abstraction in CS.
Hashes are one-way, so they cannot be decrypted. The server can _compare_ the results of a hash (by doing the hash itself and comparing the results), though.
You and turtles are suffering from the cryptographic equivalent of a hypercorrection, in the same way that well-intentioned people insist on the propriety of the grammatically impossible phrase "between you and I" (which should be "between you and me," because prepositions take objects, not subjects.) The two of you have had the irreversibility of one-way hashes drilled into your heads, just as many of us were taught when young not to say "me and Susie were playing on the swingset." And you have an allergic reaction to anyone using "decrypt" and "hash" in the same sentence, which can lead to that allergy triggering a false positive. In this case that's what's happening.
Cryptographic hashes are irreversible; that's the whole point of such a function. But there is nothing stopping someone from taking the result of a cryptographic hash and encrypting it, and then that someone or someone else decrypting the ciphertext to recover the hash result. E(H(S), k) yields an encrypted hash, and D(E(H(S), k), k) recovers the hash. It's computationally infeasible to retrieve S, but nobody wanted to do that; they just wanted to know H(S).
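To make that concrete, here's a toy Python sketch (a throwaway XOR keystream stands in for E and D; real TLS uses nothing this simple): the hash round-trips through encrypt/decrypt just fine, while S itself stays out of reach.

```python
import hashlib
import itertools

def H(data: bytes) -> bytes:
    # one-way cryptographic hash
    return hashlib.sha256(data).digest()

def xor_cipher(data: bytes, key: bytes) -> bytes:
    # toy symmetric "cipher" for illustration only: E and D are the same operation
    return bytes(b ^ k for b, k in zip(data, itertools.cycle(key)))

S = b"the original message"
k = b"a shared session key"

encrypted_hash = xor_cipher(H(S), k)            # E(H(S), k)
recovered_hash = xor_cipher(encrypted_hash, k)  # D(E(H(S), k), k)

assert recovered_hash == H(S)  # the hash round-trips just fine...
# ...but nothing here gets you from H(S) back to S; that part stays one-way
```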
You are correct that the server compares the result of the hash (which in context can also be called a "hash," such as "I used SHA-256 on my term paper, and then I spray-painted the hash on the face of the town clock tower, thus proving the existence of my term paper before the class deadline"). Nobody's arguing that. But how did it obtain the thing it's comparing its own result to, without M also obtaining that thing?
(I'm actually not sure whether TLS sends the actual hash or bases subsequent computations on the assumption that both sides can independently derive it. But if it does the former, it's totally fine to say "it decrypts the hash," which is the objection of the parent of this thread.)
Most of that description is outdated and/or wrong. This HN article should probably say (2015).
But yes, of course you can decrypt an encrypted hash; that way you get back the plain hash.
The client calculates a hash, _encrypts_ that hash, and sends it to the server; the server _decrypts_ it and can then verify that it arrives at the same value.
The reason this is done is that it can detect a situation in which the client and server were persuaded to arrive at the same results by different means, whereupon they should abort the connection. The mechanism in TLS 1.2 and earlier was not very good; a better one is included in TLS 1.3, but alas, last I looked it was disabled in popular browsers because it's incompatible with yet more middlebox crapware from "security" companies.
I wrote it above, but it's perhaps more relevant here: no. There's no need to confirm that; if the keys don't match, everything will fail anyway and the connection aborts, because everything either party sends appears to be gibberish.
The linked description over-simplifies: the hash they're calculating is a summary of the handshake process by which the keys are agreed; we want to prove that both sides saw the _same_ process happen to reach this state.
Suppose I am willing to use archaic method A because I'm a simpleton, although I do know methods C and E, which are safer. The wise people running www.google.com only allow method A if you don't know methods B, C, D, or E.
Now, I try to connect to www.google.com and unknown to me a Bad Guy is in the middle. I say "Hello, I know methods A, C and E", but the bad guy changes that message to say "Hello, I know method A only". Google replies "OK I guess we can do method A then" and we use method A. The Bad Guy knows how to break method A and now my security is ruined!
But with this Finished message in TLS, www.google.com and I will calculate different hashes, since I know I said "I know methods A, C and E" but www.google.com got a message from me saying "I know method A only" and those don't hash the same.
This proves somebody is tampering with our connection, we must abort.
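Roughly, in code, the idea looks like this (a simplified sketch: each side just hashes the handshake messages as it saw them; real TLS binds the Finished value to the transcript with keyed MACs derived from the session secrets, not a bare hash):

```python
import hashlib

def transcript_hash(messages) -> str:
    # hash the handshake messages in order, as one side saw them
    h = hashlib.sha256()
    for m in messages:
        h.update(m.encode())
    return h.hexdigest()

client_hello_sent     = "ClientHello: I know methods A, C and E"
client_hello_received = "ClientHello: I know method A only"   # rewritten by the Bad Guy
server_hello          = "ServerHello: OK, I guess we can do method A then"

client_view = transcript_hash([client_hello_sent, server_hello])
server_view = transcript_hash([client_hello_received, server_hello])

print(client_view == server_view)  # False: the transcripts differ, so abort
```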
Of course, Google in particular is behind a very complex distributed network. Distributed DBs are mentioned, but it would be cool to know more about how web requests are distributed and routed in this system.
I did this years ago so it might be different now, but I blocked google.com and some subdomains at the firewall, found a link to a document hosted on Google Docs (google.com domain) in a google.co.uk search result, and clicked it; instead of the link failing because of the firewall block, a google.se (Swedish) server started sending the IP traffic for the blocked document. I never tried blocking google.se and then repeating to see which other Google domain would send the document next, but it's clear Google has written its own routing to get information around some restrictions, be they deliberate or misconfigured. It's also an excellent way to probe which servers have blocks in place, i.e. censorship. The rerouting is also pretty much instant, i.e. sub-second, so their ability to pass instructions to other servers in a timely manner is obvious. I wonder if their servers are using swarm intelligence in areas or not? They did custom-build their own machines, which would have given them the opportunity to tear up the rule book somewhat.
It most likely depends on what resource you are referring to. If it's a search query, that will differ drastically from a Gmail request. Therefore, it's really case-specific in monolithic companies with distributed and decentralized architecture.
I remember an interview where I answered a similarly in-depth question about what happens when you load a file, from file system traversal, down to the heads moving across the platter (because almost no one used SSDs in 2009).
I used to work with some guys from Taos Consulting. They asked a similar interview question. “What happens when you ping Google”. Your answer was expected to take at least 3 hours.
I also got asked a similar question in an interview (as a programmer), and I answered it well and they hired me. But I think it's an absolutely terrible question for judging whether a person will be a good hire.
I like it, especially for interviewing for ops/devops/sre/whatever we say today.
It’s a good warm up question because pretty much any answer is right, and it quickly lets the candidate get to their comfort zone. It is also amenable to be simplified or tweaked - if the candidate doesn’t know much about CDNs or layer 7 load balancing or whatever, you just move past them. And you can change some of the parameters to be sure the candidate isn’t just memorizing the answer sheet.
It can also give a strong signal where the candidate’s knowledge is weak: if you talk to me at length about Apache throwing read syscalls to pull files from the disk but quickly gloss over the networking bits, that might indicate you don’t know them well.
This is precisely what I use it for, and precisely why I like answering it. I will talk in detail about the bits I know, smell bullshit (and look up answers) in the bits I don't, and it gives me a read on a candidate's depth in all of them. Backend devs will talk about search and set-theory algorithms. SREs will talk protocols. Networking guys will talk layers 3-7. Everyone has their own style, and it's certainly the most informative question for determining where to probe next.
A good candidate has breadth and depth of knowledge. This one question shows breadth and tells you where to find depth. It doesn't work as well for pure frontend people, but only barely; I almost expect frontend people to understand interrupts and syscalls on that end.
I expect frontend people to be able to fire up the developer tools and definitively pin a slow network response on DNS before filing a support ticket saying "my queries are slow because of DNS." I wound up training network engineers to use Chrome dev tools at a really large company, because so many frontend developers knew absolutely nothing about troubleshooting network issues that they were disproportionately filing network support tickets.
I usually try to focus on what the interviewer wants me to demonstrate knowledge on when I answer the question in an interview. As an interviewer I normally avoid the question and make it more job specific. I might ask about troubleshooting slow servers based upon an incident and try to determine the candidate’s thinking pattern and methodology. For developers it’s easier to ask “this block of code is not doing what I expected, what did I do wrong?” where the mistake is a very common one. I actually had a very practical HackerRank problem that asked me to troubleshoot a program and an accompanying docker compose file. If you had prior experience this would take you 2 minutes while you wonder if that’s really the whole problem.
This question is now burnt. Nobody should be using it. (In reality, it was burnt a long time ago, but it’s crispy and black now.)
If you do use it, you’re merely selecting for people who have read posts like this on HN and reddit. Candidates who have memorized any of this will look vastly superior to those who haven’t.
It’s telling that there are already two sibling comments who are arguing that this is a good question - explains a lot about why technical interviews suck.
Just the fact that those people are interested enough to read HN/Reddit for its own sake can't be a bad thing, though.
I'm not an employer but I know I'd rather my workmates were the kind of people who are actually interested in computers enough to read about them for pleasure, not just those "straight by the book" types.
The problem with this question is if someone gives a good answer, it can be hard to tell if they studied it or they actually know. Maybe you can suss this out with follow ups.
If they don't give a good answer, maybe they haven't looked into networking details and debugging for some reason; a lot of junior people haven't, but they may have the aptitude to learn and be great at it and just not have the knowledge base yet. Although it depends on exactly what you're hiring for, too. If you need a person like me, who will find and fix your weird networking problems, maybe they should know this, or be able to make fairly plausible guesses; but most people on my team don't need to do that (although it's always nice to have more).
The difference is that if they studied the answer but didn't grasp the material, they may have gotten enough information to pass the test, but you probably didn't get useful information out of the question.
I guess if you stop at each point and ask "what could go wrong here, and how would you debug it?" and they answer that well, then you've gotten enough information.
Nonsense. Maybe you haven’t noticed, but there’s an entire industry of “tech interview test prep” that exists solely to coach candidates to answer these ridiculous questions. Any signal you might have once detected from trivia like this has now been thoroughly gamed.
What you’re doing here is arguing a truism: any candidate who memorizes the answer to the shibboleth is better than the ones who don’t, because it makes you happier that they memorized the shibboleth.
You notably do not provide any better alternative. Show me some other technical question that can surface useful information about a candidate's knowledge in a typical 30-minute interview.
The question is good because it both involves something most everyone does on a daily basis while providing a wide range of possible areas to explore further: there isn't any single "correct" answer that's possible to cover in a short time, but what candidates do tell you probably indicates what they're most familiar with.
Candidates prepping for this question isn't much of an issue since (a) most simply don't and (b) there's always room to go further into a specific part of the transaction.
If you think you can divine a candidate’s skill by tossing chicken entrails into the air and watching how they land, you’ll only consider rational counterarguments if I offer better alternatives?
How about this: ask questions relevant to the job. Stop trying to be clever.
Kidding. Don’t listen to the parent. The only problem with this question is that it’s general-purpose and very well understood, but those aren’t issues. You should still ask people what port HTTPS uses even if it seems stupid, because you’re going into an interview blind and you need to assess a person's experience very quickly (and sometimes you get surprised by charismatic fools). All questions are tools; this one will tell you that either the person knows this question very well from studying it (which is valid, and you would be able to tell) or they have some experience with it (more valid). Either way, it will tell you what they are most comfortable talking about, and you can dig more into the various parts if you want.
A lot is written about the algorithms and search results from Google but I haven’t seen much on crawling and indexing. I imagine there are multiple teams in Google who help maintain the integrity and completeness of their indexes but would love to know more about it considering it might be the largest single repository of knowledge.
This is almost the same open-ended interview question I've used for many years; it works a charm because it filters the engineers from the chaff (those who just copy-paste code from the internet without understanding why).
Me too. This gives a quick insight into what the applicant knows and what they are interested in.
This is usually one of the three questions I require as part of an application.
The other two are a technical question tailored to the position, and a third 'throwaway' that the applicant answers as they see fit. In the past it's been things like "What is the airspeed of an unladen swallow? (European or African)"
Either they get the reference and say something funny in response, they don't get the reference and google it, or they ask a question like "what are you expecting from something like that?"
In any case, it's a chance to show themselves as a person.
That seems like an uncharitable conclusion. I think most people worth hiring (in an interview where that question is asked) should be able to expound at some length on at least some part of the process. What part of the process they spend the most time talking about is an interesting window into their experience.
This isn't a constructive comment. Elaborate why you dislike Google/Google Search, in a way that adds to the discussion. I think there are many valid reasons, and this isn't one.
The way I read it, this person wasn't criticizing Google Search but was rather saying that it's stupid to type in www.google.com because you can search from the url bar directly.
I invented this question circa 2009. I have to assume I'm one of many who independently invented it.
In my version, it's not "what is behind the scenes", it's "tell me everything you can, in as much detail as you like, about what has to happen to visit www.google.com". Of course, in 2009 Google had only just become google.com.
I've noticed that when anyone claims to have invented something on HN, they get voted down. But I have no reason to doubt it. Please tell us more about how you came to use this question, and why you think it wasn't in circulation before that. I know someone asked me this in an interview in 2010, so it was in wider use by then.
Doesn't it depend on the browser? Does Google Chrome still do that thing where if you fire the browser up and then type www.google.com in it, it actually searches Google for that string and returns a results page, instead of just browsing there?
The behaviour of the URL bar is modifiable in most modern browsers. The default on most is automatic detection of URLs: if something looks like a URL, it's treated like one, and if it doesn't, the text is passed to your search engine of choice.
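Something like this rough approximation, though real browsers use far more elaborate rules plus user settings (the function name and heuristic here are made up for illustration):

```python
from urllib.parse import quote_plus

def resolve_address_bar_input(text: str) -> str:
    # crude heuristic: an explicit scheme, or a single dotted token, looks like a URL
    text = text.strip()
    if "://" in text:
        return text
    if " " not in text and "." in text:
        return "https://" + text
    # everything else goes to the default search engine
    return "https://www.google.com/search?q=" + quote_plus(text)

print(resolve_address_bar_input("www.google.com"))            # navigated to as a URL
print(resolve_address_bar_input("what happens when you ..."))  # sent to the search engine
```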
He also ran teams and his leadership style was very humble and encouraging in a workplace that was, at times, the opposite.
So glad to see he's still publishing awesome content like this.
Be sure to check out some of his other popular repos, great stuff!