If you gave that response at an interview, you'd be praised for knowing what a URL shortener is, then asked to drill into more detail. For example, what data store would you use for your hash table? What technology do you use for your web tier? Are those physically on the same box? OK, that's fine. Twitter just bought you and you have 48 hours until you're receiving 100,000 requests per second. What needs to change about your architecture? OK, good answer, good answer. We recently discovered that people are abusing our infrastructure to cloak links to pages delivering malware. We need to be able to ban a domain and yank all their links retroactively. How can you accommodate that? etc., etc.
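One way to approach the "ban a domain and yank all their links retroactively" follow-up is to keep a second index from domain to short codes next to the main table. The following is only a minimal in-memory sketch under that assumption; `LinkStore`, its method names, and the example domains are all hypothetical, and a real service would keep both indexes in whatever data store it settled on.

```python
from collections import defaultdict
from urllib.parse import urlsplit


class LinkStore:
    """Toy link table with a per-domain index (hypothetical names throughout)."""

    def __init__(self):
        self.links = {}                    # short code -> original URL
        self.by_domain = defaultdict(set)  # hostname -> short codes pointing at it
        self.banned = set()                # hostnames that may no longer be shortened

    @staticmethod
    def _domain(url):
        # urlsplit lowercases the hostname; fall back to "" for malformed input.
        return urlsplit(url).hostname or ""

    def add(self, code, url):
        domain = self._domain(url)
        if domain in self.banned:
            raise ValueError(f"domain is banned: {domain}")
        # If the code is being reused, drop it from the old domain's index first.
        old_url = self.links.get(code)
        if old_url is not None:
            self.by_domain[self._domain(old_url)].discard(code)
        self.links[code] = url
        self.by_domain[domain].add(code)

    def ban(self, domain):
        """Ban a domain and remove every existing link to it. Returns the count."""
        domain = domain.lower()
        self.banned.add(domain)
        codes = self.by_domain.pop(domain, set())
        for code in codes:
            del self.links[code]
        return len(codes)

    def resolve(self, code):
        return self.links.get(code)
```

The design choice here is paying a little extra on every write so that a ban touches only the offending links instead of scanning the whole table. It matches exact hostnames only; banning subdomains as well would need a different key.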
How would you guys answer this one?
Start by saying that you're capable of approaching that problem at multiple different levels of the stack, from the browser chrome to the networking layer to HTTPS to HTTP to Google's (presumed) infrastructure to the browser again. Ask where they want you to drill down. Networking? OK. Evince some knowledge of what DNS is. Mention that Google controls the records for google.com on a nameserver somewhere. Maybe talk about bind a little bit. Mention in a wee bit of detail how the A record propagates out to the rest of the network, including to the user's DNS server. Observe that since it is google.com it is certainly cached there (and is probably already in the OS's DNS cache to boot, you might add). Ask if the interviewer would like to hear about TCP/IP or SSL handshakes or what have you next.
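The caching point is easier to see with a toy model. The sketch below is not real DNS: it just chains two TTL caches (standing in for the OS cache and the user's DNS server) in front of a dictionary standing in for the authoritative nameserver. `CachingResolver`, the `192.0.2.10` address (a documentation-range placeholder), and the TTL values are all assumptions made up for illustration.

```python
import time


class CachingResolver:
    """Toy resolver: answer from cache while the TTL is valid, else ask upstream."""

    def __init__(self, upstream, clock=time.monotonic):
        self.upstream = upstream   # callable: name -> (address, ttl_seconds)
        self.clock = clock         # injectable so the example can fake time
        self.cache = {}            # name -> (address, expires_at)
        self.upstream_queries = 0  # how often we had to go past our own cache

    def resolve(self, name):
        now = self.clock()
        hit = self.cache.get(name)
        if hit is not None and hit[1] > now:
            return hit[0]          # still fresh: no network traffic needed
        self.upstream_queries += 1
        address, ttl = self.upstream(name)
        self.cache[name] = (address, now + ttl)
        return address


# Stand-in for the authoritative nameserver. Address and TTL are placeholders.
AUTHORITATIVE_RECORDS = {"google.com": ("192.0.2.10", 300)}


def authoritative(name):
    return AUTHORITATIVE_RECORDS[name]


if __name__ == "__main__":
    isp = CachingResolver(authoritative)
    # The OS cache asks the ISP resolver and, in this toy, keeps answers for 60 s.
    os_cache = CachingResolver(lambda name: (isp.resolve(name), 60))
    for _ in range(3):
        print(os_cache.resolve("google.com"))
    print("queries that left the machine:", os_cache.upstream_queries)
    print("queries that reached the authoritative server:", isp.upstream_queries)
```

For a name as popular as google.com, almost every lookup is answered by one of the caches, which is the observation the answer above suggests making.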
I've had it twice. The first time, with Google, they were genuinely interested to know how deep I could go, so I talked about DNS, TCP, SSL, HTTP etc, but I also talked about application event loops and keyboard drivers and interrupt handlers.
The second time, with Rackspace, I presumed they wanted the same, but they were a bit bewildered and wanted to keep it higher level (protocols only).
I sort of hope someday to be in an interview where they ask that question. I'm pretty sure I could go an hour straight on it. Possibly two. Obviously, they won't let me. But still, it ought to be fun... I'll probably get stopped somewhere around the CPU trying to figure out which layer of cache the kernel's event handler is in, if not before....
Come interview at Facebook for a Production Engineering position[1]. I bet you your interviewers will push you at least that far or I'll give you a free lunch[2]. Even if it's just for the interview experience.
[1] I work there and love it. My contact details are on my user page.
[2] I'll give you a free lunch either way. No purchase required. Void where prohibited.
Can you recommend how to learn this? I don't work with the web or networking, and would like to shore up that part of my knowledge. Are there any good books that give a high-level overview of it all?
I find that the best way to learn it is to start poking at it. Fire up Wireshark, load google.com, and take a close look at the traffic. You'll see every packet going back and forth, and can try to figure out what each one of them is. Set up a proxy that can MITM the SSL connection, and look at the contents of the packets. Just setting that up will probably tell you a lot about how it all works, and you'll be in a better position to see the plaintexts of all of the handshaking traffic.
The first question has had a semi-famous answer since this 2011 post: https://plus.google.com/+JeanBaptisteQueru/posts/dfydM2Cnepe I'd be interested to know what books, if any, people recommend as well. My own strategy, when I encounter a black box and want to learn about it, is to use whatever resources I can find to make it transparent (often revealing many internal black boxes...). Occasionally I'll find someone online who made really good visual diagrams that summarize a thing in a way I might not have found in a book, e.g. http://code.google.com/p/corkami/wiki/ELF101 or http://brendangregg.com/linuxperf.html
As for the link shortener infrastructure, I'd ask for more context to get a better idea of the scope. If there is no more context than that, define the scope at which you're operating (e.g., "I understand 'infrastructure' as 'the structure of the service' and not as 'the physical infrastructure'") and define for yourself what the URL shortener does (going with "a web service with a single endpoint" sounds reasonable to me). It's really important to understand what you're trying to achieve first, especially when faced with a real problem.
Ask the interviewer. One of the things that I find most difficult about interviewing is when the candidate doesn't ask me any questions during the technical parts of an interview. If you aren't sure, you need to ask me -- I am perfectly capable of misphrasing a question or assuming you have context that you don't. Asking your coworkers for help is a perfectly legitimate problem solving strategy.
Interviewing is stressful and hard; no need to make it more stressful and harder on yourself.
I sometimes ask questions and get 'What would you do if there was no one to ask?' and 'What would you do if there was no Google?' and I would like to understand this. Should I have everything rote memorized? Should everything I do be a bespoke solution?
The latter is a gentler way of saying, "no, I want your solution to this trivial problem".
The former is a question about your work process; for instance, if they say, "ok, so build Twitter" and you respond rationally with, "what of Twitter should I build?", they're probably looking for how you operate in an environment with underspecified work. It might be a good idea to talk about how you'd defend your decisions to draw the boundaries of the problem; why you chose to ignore e.g. streaming updates or following; &c.
Can anyone cite a good source that provides a fairly detailed answer to the second question? I'm not looking for a hand-waving forum post but an actual answer aimed at someone who doesn't necessarily already know it.
I'm wondering how to answer that?
* Get a short and memorable URL.
* When a URL is entered, hash it, store it in a hash table keyed by the hash, and use the hash as the link.
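The two bullets above can be sketched in a few lines. This is a minimal in-memory version, assuming SHA-256 as the hash, a 7-character code, and a retry counter for collisions; none of those choices come from the thread, and `Shortener` is a hypothetical name.

```python
import hashlib


class Shortener:
    """Toy URL shortener: a dict from short code to original URL."""

    def __init__(self, code_length=7):
        self.code_length = code_length
        self.table = {}  # the "hash table": short code -> URL

    def shorten(self, url):
        attempt = 0
        while True:
            # Hash the URL (salted with the attempt number) and keep a short prefix.
            salted = f"{attempt}:{url}".encode("utf-8")
            code = hashlib.sha256(salted).hexdigest()[: self.code_length]
            # setdefault stores the URL if the code is free, else returns the owner.
            if self.table.setdefault(code, url) == url:
                return code
            attempt += 1  # a different URL owns this code: try another hash

    def resolve(self, code):
        return self.table.get(code)


if __name__ == "__main__":
    s = Shortener()
    code = s.shorten("https://example.com/some/very/long/path?with=query")
    print(code, "->", s.resolve(code))
```

Shortening the same URL twice returns the same code, because the first attempt lands on the entry that URL already owns. This is the point where an interviewer would start asking what replaces the dict once it no longer fits on one box.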
> “If I type https://google.com into my browser and press enter, what happens?”
How would you guys answer this one? I'm not sure I've enough knowledge to do that. I'd say:
* first, a TLS handshake that uses RSA to share a key
* then a GET
* then the server sends a cached version of Google according to location/cookies/etc.
* then the HTML gets displayed in the client's browser
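For anyone who wants to see those steps concretely, here is a rough sketch using only Python's standard library. Two things it makes visible that the list skips: a DNS lookup and a TCP connection happen before the TLS handshake, and TLS 1.3 no longer uses RSA to transport the key (it agrees on keys with an ephemeral Diffie-Hellman exchange, with certificates used for authentication). The helper names are made up for this example, and `fetch` is only a sketch that needs network access to run.

```python
import socket
import ssl


def build_get_request(host, path="/"):
    """The plaintext HTTP/1.1 request that gets sent inside the TLS tunnel."""
    return (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).encode("ascii")


def parse_status_line(raw):
    """Split the first line of a raw response into (version, status, reason)."""
    line = raw.split(b"\r\n", 1)[0].decode("iso-8859-1")
    version, code, *reason = line.split(" ", 2)
    return version, int(code), reason[0] if reason else ""


def fetch(host, port=443, path="/"):
    """DNS + TCP, then TLS, then a GET. Returns (tls_version, raw_response)."""
    context = ssl.create_default_context()  # verifies the server certificate
    # create_connection resolves the name (DNS) and completes the TCP handshake.
    with socket.create_connection((host, port), timeout=10) as tcp:
        # wrap_socket performs the TLS handshake on top of the TCP connection.
        with context.wrap_socket(tcp, server_hostname=host) as tls:
            tls_version = tls.version()
            tls.sendall(build_get_request(host, path))
            chunks = []
            while True:
                data = tls.recv(4096)
                if not data:
                    break
                chunks.append(data)
    return tls_version, b"".join(chunks)


if __name__ == "__main__":
    version, raw = fetch("google.com")
    print("negotiated:", version)
    print("status:", parse_status_line(raw))
```

Rendering the HTML is the browser's job and is not covered here; the sketch stops at the raw bytes of the response.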