Roger that. His name is `Vasanth Krishnamoorthy`. I still have the good fortune of working with him at `WalmartLabs`. Extremely talented, multi-faceted, curious, and above all a wonderful human being. I have learnt plenty through my interactions with him and am still learning from him :-). I am serendipitously lucky to have met him in my life.
The main problem with these kinds of exposition topics is that the steps involved are quite dense. By that I mean something like the density of the real numbers: between any two real numbers there are infinitely many more. The writer then just chooses to expand on the steps that they are comfortable with, or the ones that get more eyeballs. I mean, why not explain how the signal travels across various media like air, undersea cables, to and from satellites, etc.?
Now, don't get me wrong, I am not saying this is useless. I am just saying that if one chooses to give a 30k-foot perspective, they had better stay there and not bounce between 40k and 10k.
I don't see how not sticking to 30k is really detrimental to how most people would likely consume a doc like this, which would probably be to sample various parts, find what they're interested in, and then dig in.
I love a good reductionist deep dive into things we use every day. This is a great overview of a wide range of topics.
I wrote something similar a couple of years ago[0] which skips the keyboard and display but goes further into the world of IP packet transmission (and HTTP/2). The parent link is better written though.
The most amazing part about this is that regardless of how much more detail one would want to put into it, it is probably practically infeasible to make it complete. It's a wonderful example to demonstrate the importance of the concept of abstraction in CS.
Hashes are one-way, so they cannot be decrypted. The server can _compare_ the results of a hash (by doing the hash itself and comparing the results), though.
You and turtles are suffering from the cryptographic equivalent of a hypercorrection, in the same way that well-intentioned people insist on the propriety of the grammatically impossible phrase "between you and I" (which should be "between you and me," because prepositions take objects, not subjects.) The two of you have had the irreversibility of one-way hashes drilled into your heads, just as many of us were taught when young not to say "me and Susie were playing on the swingset." And you have an allergic reaction to anyone using "decrypt" and "hash" in the same sentence, which can lead to that allergy triggering a false positive. In this case that's what's happening.
Cryptographic hashes are irreversible; that's the whole point of such a function. But there is nothing stopping someone from taking the result of a cryptographic hash and encrypting it, and then that someone or someone else decrypting the ciphertext to recover the hash result. E(H(S), k) yields an encrypted hash, and D(E(H(S), k), k) recovers the hash. It's computationally infeasible to retrieve S, but nobody wanted to do that; they just wanted to know H(S).
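To make that concrete, here's a toy Python sketch (a throwaway XOR keystream stands in for E and D; real TLS uses nothing this simple): the hash round-trips through encrypt/decrypt just fine, while S itself stays out of reach.

```python
import hashlib
import itertools

def H(data: bytes) -> bytes:
    # one-way cryptographic hash
    return hashlib.sha256(data).digest()

def xor_cipher(data: bytes, key: bytes) -> bytes:
    # toy symmetric "cipher" for illustration only: E and D are the same operation
    return bytes(b ^ k for b, k in zip(data, itertools.cycle(key)))

S = b"the original message"
k = b"a shared session key"

encrypted_hash = xor_cipher(H(S), k)            # E(H(S), k)
recovered_hash = xor_cipher(encrypted_hash, k)  # D(E(H(S), k), k)

assert recovered_hash == H(S)  # the hash round-trips just fine...
# ...but nothing here gets you from H(S) back to S; that part stays one-way
```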
You are correct that the server compares the result of the hash (which in context can also be called a "hash," such as "I used SHA-256 on my term paper, and then I spray-painted the hash on the face of the town clock tower, thus proving the existence of my term paper before the class deadline"). Nobody's arguing that. But how did it obtain the thing it's comparing its own result to, without M also obtaining that thing?
(I'm actually not sure whether TLS sends the actual hash or bases subsequent computations on the assumption that both sides can independently derive it. But if it does the former, it's totally fine to say "it decrypts the hash," which is the objection of the parent of this thread.)
Most of that description is outdated and/or wrong. This HN article should probably say (2015).
But yes, of course you can decrypt an encrypted hash; that way you get back the plain hash.
The client calculates a hash, _encrypts_ that hash, and sends it to the server; the server _decrypts_ it and can then verify that it arrives at the same value.
The reason this is done is that it can detect a situation in which the client and server were persuaded to arrive at the same results by different means, whereupon they should abort the connection. The mechanism in TLS 1.2 and earlier was not very good; a better one is included in TLS 1.3, but alas, last I looked it was disabled in popular browsers because it's incompatible with yet more middlebox crapware from "security" companies.
I wrote it above, but it's perhaps more relevant here: no. There's no need to confirm that; if the keys don't match, everything will fail anyway and the connection aborts, because everything either party sends appears to be gibberish.
The linked description over-simplifies: the hash they're calculating is a summary of the handshake process by which the keys are agreed; we want to prove that both sides saw the _same_ process happen to reach this state.
Suppose I am willing to use archaic method A because I'm a simpleton, although I do know methods C and E, which are safer. The wise people running www.google.com only allow method A if you don't know methods B, C, D, or E.
Now, I try to connect to www.google.com and unknown to me a Bad Guy is in the middle. I say "Hello, I know methods A, C and E", but the bad guy changes that message to say "Hello, I know method A only". Google replies "OK I guess we can do method A then" and we use method A. The Bad Guy knows how to break method A and now my security is ruined!
But with this Finished message in TLS, www.google.com and I will calculate different hashes, since I know I said "I know methods A, C and E" but www.google.com got a message from me saying "I know method A only" and those don't hash the same.
This proves somebody is tampering with our connection, we must abort.
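Roughly, in code, the idea looks like this (a simplified sketch: each side just hashes the handshake messages as it saw them; real TLS binds the Finished value to the transcript with keyed MACs derived from the session secrets, not a bare hash):

```python
import hashlib

def transcript_hash(messages) -> str:
    # hash the handshake messages in order, as one side saw them
    h = hashlib.sha256()
    for m in messages:
        h.update(m.encode())
    return h.hexdigest()

client_hello_sent     = "ClientHello: I know methods A, C and E"
client_hello_received = "ClientHello: I know method A only"   # rewritten by the Bad Guy
server_hello          = "ServerHello: OK, I guess we can do method A then"

client_view = transcript_hash([client_hello_sent, server_hello])
server_view = transcript_hash([client_hello_received, server_hello])

print(client_view == server_view)  # False: the transcripts differ, so abort
```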
Of course, Google in particular is behind a very complex distributed network. Distributed DBs are mentioned, but it would be cool to know more about how web requests are distributed and routed in this system.
I did this years ago so it might be different now, but I blocked google.com and some subdomains at the firewall, found a link to a document hosted on Google Docs (google.com domain) in a google.co.uk search result, and clicked it; instead of the link failing because of the firewall block, a google.se (Swedish) server started sending the IP traffic for the blocked document. I never tried blocking google.se and then repeating to see which other Google domain would send the document next, but it's clear Google has written its own routing to get information around some restrictions, be they deliberate or misconfigured. It's also an excellent way to probe which servers have blocks in place, i.e. censorship. The rerouting is also pretty much instant, i.e. sub-second, so their ability to pass instructions to other servers in a timely manner is obvious. I wonder if their servers are using swarm intelligence in areas or not? They did custom-build their own machines, which would have given them the opportunity to tear up the rule book somewhat.
It most likely depends on what resource you are referring to. If it's a search query, that will differ drastically from a Gmail request. Therefore, it's really case-specific in monolithic companies with distributed and decentralized architecture.
I remember an interview where I answered a similarly in-depth question about what happens when you load a file, from file system traversal, down to the heads moving across the platter (because almost no one used SSDs in 2009).
I used to work with some guys from Taos Consulting. They asked a similar interview question. “What happens when you ping Google”. Your answer was expected to take at least 3 hours.
I also got asked a similar question in an interview (as a programmer), and I answered it well and they hired me. But I think it's an absolutely terrible question for judging whether a person will be a good hire.
I like it, especially for interviewing for ops/devops/sre/whatever we say today.
It’s a good warm up question because pretty much any answer is right, and it quickly lets the candidate get to their comfort zone. It is also amenable to be simplified or tweaked - if the candidate doesn’t know much about CDNs or layer 7 load balancing or whatever, you just move past them. And you can change some of the parameters to be sure the candidate isn’t just memorizing the answer sheet.
It can also give a strong signal where the candidate’s knowledge is weak: if you talk to me at length about Apache throwing read syscalls to pull files from the disk but quickly gloss over the networking bits, that might indicate you don’t know them well.
This is precisely what I use it for, and precisely why I like answering it. I will talk in detail about the bits I know, smell bullshit (and look up answers) in the bits I don't, and it gives me a read on a candidate's depth in all of them. Backend devs will talk about search and set-theory algorithms. SREs will talk protocols. Networking guys will talk layers 3-7. Everyone has their own style, and it's certainly the most informative question for determining where to probe next.
A good candidate has breadth and depth of knowledge. This one question shows breadth and tells you where to find depth. It doesn't work as well for pure frontend people, but only barely; I almost expect frontend people to understand interrupts and syscalls on that end.
I expect frontend people to be able to fire up the developer tools and definitively pin a slow network response on DNS before filing a support ticket saying "my queries are slow because of DNS." I wound up training network engineers to use Chrome dev tools at a really large company, because so many frontend developers knew absolutely nothing about troubleshooting network issues that they were disproportionately filing network support tickets.
I usually try to focus on what the interviewer wants me to demonstrate knowledge on when I answer the question in an interview. As an interviewer I normally avoid the question and make it more job specific. I might ask about troubleshooting slow servers based upon an incident and try to determine the candidate’s thinking pattern and methodology. For developers it’s easier to ask “this block of code is not doing what I expected, what did I do wrong?” where the mistake is a very common one. I actually had a very practical HackerRank problem that asked me to troubleshoot a program and an accompanying docker compose file. If you had prior experience this would take you 2 minutes while you wonder if that’s really the whole problem.
This question is now burnt. Nobody should be using it. (In reality, it was burnt a long time ago, but it’s crispy and black now.)
If you do use it, you’re merely selecting for people who have read posts like this on HN and reddit. Candidates who have memorized any of this will look vastly superior to those who haven’t.
It’s telling that there are already two sibling comments who are arguing that this is a good question - explains a lot about why technical interviews suck.
Just the fact that those people are interested enough to read HN/Reddit for its own sake can't be a bad thing, though.
I'm not an employer but I know I'd rather my workmates were the kind of people who are actually interested in computers enough to read about them for pleasure, not just those "straight by the book" types.
The problem with this question is if someone gives a good answer, it can be hard to tell if they studied it or they actually know. Maybe you can suss this out with follow ups.
If they don't give a good answer, maybe they haven't looked into networking details and debugging for some reason; a lot of junior people haven't, but they may have the aptitude to learn and be great at it and just not have the knowledge base yet. Although it depends on exactly what you're hiring for, too. If you need a person like me, who will find and fix your weird networking problems, maybe they should know this, or be able to make fairly plausible guesses; but most people on my team don't need to do that (although it's always nice to have more).
The difference is that if they studied the answer but didn't grasp the material, they may have gotten enough information to pass the test, but you probably didn't get useful information out of the question.
I guess if you stop at each point and ask "what could go wrong here, and how would you debug it?" and they answer that well, then you've gotten enough information.
Nonsense. Maybe you haven’t noticed, but there’s an entire industry of “tech interview test prep” that exists solely to coach candidates to answer these ridiculous questions. Any signal you might have once detected from trivia like this has now been thoroughly gamed.
What you’re doing here is arguing a truism: any candidate who memorizes the answer to the shibboleth is better than the ones who don’t, because it makes you happier that they memorized the shibboleth.
You notably do not provide any better alternative. Show me some other technical question that can surface useful information about a candidate's knowledge in a typical 30-minute interview.
The question is good because it both involves something most everyone does on a daily basis while providing a wide range of possible areas to explore further: there isn't any single "correct" answer that's possible to cover in a short time, but what candidates do tell you probably indicates what they're most familiar with.
Candidates prepping for this question isn't much of an issue since (a) most simply don't and (b) there's always room to go further into a specific part of the transaction.
If you think you can divine a candidate’s skill by tossing chicken entrails into the air and watching how they land, you’ll only consider rational counterarguments if I offer better alternatives?
How about this: ask questions relevant to the job. Stop trying to be clever.
Kidding. Don’t listen to the parent. The only problem with this question is that it’s general-purpose and very well understood, but those aren’t issues. You should still ask people what port HTTPS uses even if it seems stupid, because you’re going into an interview blind and you need to assess a person's experience very quickly (and sometimes you get surprised by charismatic fools). All questions are tools; this one will tell you that either the person knows this question very well from studying it (which is valid, and you would be able to tell) or they have some experience with it (more valid). Either way, it will tell you what they are most comfortable talking about, and you can dig more into the various parts if you want.
A lot is written about the algorithms and search results from Google but I haven’t seen much on crawling and indexing. I imagine there are multiple teams in Google who help maintain the integrity and completeness of their indexes but would love to know more about it considering it might be the largest single repository of knowledge.
This is almost the same open-ended interview question I've used for many years; it works a charm because it filters the engineers from the chaff (those who just copy-paste code from the internet without understanding why).
Me too. This gives a quick insight into what the applicant knows and what they are interested in.
This is usually one of the three questions I require as part of an application.
The other two are a technical question tailored to the position, and a third 'throwaway' that the applicant answers as they see fit. In the past it's been things like "What is the airspeed of an unladen swallow? (European or African)"
Either they get the reference and say something funny in response, they don't get the reference and google it, or they ask a question like "what are you expecting from something like that?"
In any case, it's a chance to show themselves as a person.
That seems like an uncharitable conclusion. I think most people worth hiring (in an interview where that question is asked) should be able to expound at some length on at least some part of the process. What part of the process they spend the most time talking about is an interesting window into their experience.
This isn't a constructive comment. Elaborate why you dislike Google/Google Search, in a way that adds to the discussion. I think there are many valid reasons, and this isn't one.
The way I read it, this person wasn't criticizing Google Search but was rather saying that it's stupid to type in www.google.com because you can search from the url bar directly.
I invented this question circa 2009. I have to assume I'm one of many who independently invented it.
In my version, it's not "what is behind the scenes", it's "tell me everything you can, in as much detail as you like, about what has to happen to visit www.google.com". Of course, in 2009 Google had only just become google.com.
I've noticed that when anyone claims to have invented something on HN, they get voted down. But I have no reason to doubt it. Please tell us more about how you came to use this question, and why you think it wasn't in circulation before that. I know someone asked me this in an interview in 2010, so it was in wider use by then.
Doesn't it depend on the browser? Does Google Chrome still do that thing where if you fire the browser up and then type www.google.com in it, it actually searches Google for that string and returns a results page, instead of just browsing there?
The behaviour of the URL bar is modifiable in most modern browsers. The default on most is automatic detection of URLs: if something looks like a URL, it's treated like one, and if it doesn't, the text is passed to your search engine of choice.
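Something like this rough approximation, though real browsers use far more elaborate rules plus user settings (the function name and heuristic here are made up for illustration):

```python
from urllib.parse import quote_plus

def resolve_address_bar_input(text: str) -> str:
    # crude heuristic: an explicit scheme, or a single dotted token, looks like a URL
    text = text.strip()
    if "://" in text:
        return text
    if " " not in text and "." in text:
        return "https://" + text
    # everything else goes to the default search engine
    return "https://www.google.com/search?q=" + quote_plus(text)

print(resolve_address_bar_input("www.google.com"))            # navigated to as a URL
print(resolve_address_bar_input("what happens when you ..."))  # sent to the search engine
```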
He also ran teams and his leadership style was very humble and encouraging in a workplace that was, at times, the opposite.
So glad to see he's still publishing awesome content like this.
Be sure to check out some of his other popular repos, great stuff!