
> result in hundreds of thousands of wasted engineering hours trying to enable the idea of server-side and client-side rendering with the same code

Is this problem really that difficult? Why?

Why should your code care if it is running on my computer or yours?

Isomorphic JS has been around for years. Build your product on a bloated tech stack relying on an increasingly poorly planned web of dependencies, and I'll agree it could be challenging.
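
For what it's worth, the core idea is small. A minimal sketch in TypeScript (the names and markup are mine, not from any particular framework):

    // A pure render function: takes data, returns markup.
    // Nothing in here depends on whether it runs in Node or in a browser.
    interface Post { title: string; body: string; }

    function renderPost(post: Post): string {
      return `<article><h1>${post.title}</h1><p>${post.body}</p></article>`;
    }

    // Server: send renderPost(post) as the initial HTML response.
    // Client: call the same function again to re-render after updates.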

> Why can't I expose a REST API to deep link mapping and have Google just crawl my REST API

They do crawl REST APIs. Specifically, ones using Hypertext Markup Language.

> cache intelligently in localStorage and IndexedDB

Speaking of hundreds of thousands of wasted engineering hours... HTTP caches are simple and straightforward. IndexedDB leaks memory in Chrome so badly that Emscripten had to disable it (https://bugs.chromium.org/p/chromium/issues/detail?id=533648 https://github.com/kripken/emscripten/pull/3867/files). Mozilla advised developers not to adopt Local Storage because of its inherent performance issues (https://blog.mozilla.org/tglek/2012/02/22/psa-dom-local-stor...). And how many wasted hours went into WebSQL?
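
For contrast, this is roughly what "simple and straightforward" looks like on the HTTP side. A sketch using Node's built-in http module; the max-age and ETag values are arbitrary:

    import * as http from "http";

    http.createServer((req, res) => {
      // Let the browser's HTTP cache do the work: no IndexedDB,
      // no localStorage, no hand-rolled eviction logic.
      res.setHeader("Cache-Control", "public, max-age=3600");
      res.setHeader("ETag", '"v1"');

      if (req.headers["if-none-match"] === '"v1"') {
        res.statusCode = 304; // Not Modified: the client reuses its cached copy
        res.end();
        return;
      }

      res.end("<html><body>cacheable content</body></html>");
    }).listen(8080);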

> utterly ridiculous that engineering and efficiency decisions are so deeply affected by whether or not the largest search engine will properly index your content

Actually, it makes a lot of sense. Content needs to be discoverable. Serving content through a complex language running in a VM, where the slightest deviation from the 600-page specification (and that's just the core language, not the browser APIs) causes failure -- that's not "discoverable". It's like putting up a billboard with one giant QR code, just because that makes the content easier to develop.



Isomorphic. Thank you, I was searching my brain for that word for like half an hour. :-)

> Why should your code care if it is running on my computer or yours?

It shouldn't. But my users already care about perceived latency, and that is directly limited by the speed of light. My users want feedback as quickly as possible that their input has been received, and that something is happening in response. Thanks to the speed of light, this would ideally take place instantly right in front of their eyeballs. That can't happen yet, so as much as can realistically happen on my user's CPU, memory, and storage is the next best thing.
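
That's the reasoning behind optimistic UI updates. A hypothetical sketch (the endpoint and helper names are mine):

    // Hypothetical helpers; a real app would update the DOM or component state.
    const showPendingComment   = (t: string) => console.log("pending:", t);
    const markCommentConfirmed = (t: string) => console.log("confirmed:", t);
    const markCommentFailed    = (t: string) => console.log("failed:", t);

    // Acknowledge the input locally first, so perceived latency isn't
    // bounded by the network round trip.
    async function submitComment(text: string): Promise<void> {
      showPendingComment(text);                // instant, on the user's CPU
      try {
        await fetch("/api/comments", {         // round trip happens in the background
          method: "POST",
          headers: { "Content-Type": "application/json" },
          body: JSON.stringify({ text }),
        });
        markCommentConfirmed(text);
      } catch {
        markCommentFailed(text);               // roll back or offer a retry
      }
    }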

> They do crawl REST APIs. Specifically, ones using Hypertext Markup Language.

What I meant to say was JSON, so I'm contributing to my own pet peeve of saying "REST" and meaning "JSON." :-p

HTML is awesome and does a wonderful job of letting me mark up content so it can be rendered efficiently and semantically, and so it stays both human-readable and marginally machine-readable. There are two problems, though. The first is that full documents (since the article points out AJAX is not performed by Google) are incredibly repetitive and wasteful, especially when the same content fragments are retrieved multiple times.

The second is that it couples content and presentation, two orthogonal concerns, much earlier than is optimal. Sure, you can cache full documents and display them when they're requested again. But the more common case is that a large subset of what I just displayed to my user will be displayed again, with one new item, and my cache is still invalidated because its granularity is the full-page presentation, not the business domain object. If, instead, I cache and render business objects on the client side, I can be more intelligent and granular with my caching strategy, react much more quickly to my users' feedback, and have a much smaller impact on their constrained devices. On top of that, transmitting structured business objects instead of presentation-structured content lets me reuse that data more efficiently across devices for which HTML may not be the most effective way to present it.
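
Concretely, the kind of thing I mean (a sketch; the endpoint and object shapes are made up):

    // Cache at the business-object level, not the page level: adding one
    // new item invalidates nothing that's already cached.
    interface Item { id: string; title: string; }

    const itemCache = new Map<string, Item>();

    async function getItem(id: string): Promise<Item> {
      const cached = itemCache.get(id);
      if (cached) return cached;                          // reuse, no refetch
      const fresh: Item = await fetch(`/api/items/${id}`).then(r => r.json());
      itemCache.set(id, fresh);
      return fresh;
    }

    // Rendering stays a separate, presentation-only concern.
    const renderItem = (item: Item): string => `<li>${item.title}</li>`;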

My personal architectural bents aside, the truth remains that content discovery agents (e.g. indexers) should not be treated as content delivery agents with such a huge influence on content format. This ends up creating (IMO) too much influence over external engineering decisions, rather than allowing engineers to think critically about the right architecture that gives users the best possible experience.

Most importantly, I'm not saying that all the engineering effort should be placed upon the discovery agents. Of course there are limits on how much they can discover on their own, and (as always in matters involving many parties) there need to be good conversations about the state of things, and what we think is the right direction to go to support each other and our users. It's just been my opinion lately that this is not so much a conversation anymore as a unidirectional stream of "best practices" coming from a single group.


Yeah, I understand the server-side rendering vs. client-side updating distinction, and the design benefits of API-driven development. And unfortunately, a lot of popular JS frameworks haven't done a great job of helping with these.

Closure Library/Templates was meant to render server-side and bind JS functions after render, or to create the markup client-side dynamically. (Interestingly, the historic reasons were performance, not SEO.)

React and Meteor have good server-side stories. Angular 2 is getting one.
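
React's server-side story boils down to something like this (a sketch; the component is just a placeholder):

    import * as React from "react";
    import { renderToString } from "react-dom/server";

    // Placeholder component standing in for a real page.
    const Page = (props: { title: string }) =>
      React.createElement("h1", null, props.title);

    // On the server: render to an HTML string the crawler can index.
    const html = renderToString(React.createElement(Page, { title: "Hello" }));

    // On the client, ReactDOM.hydrate() attaches event handlers to that
    // same markup instead of re-rendering it from scratch.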

I would say there is a lot of low hanging fruit in just avoiding most client-side JS. Take http://wiki.c2.com/ -- the "original" wiki. That should all be static. Same with blogs, documentation, and lots of other public, indexable content.


[Disclosure: I work at Google but don't work on anything related to the crawler]

All this anger aside, I'm actually pretty impressed with the world we live in and proud of my company. Think about how far we've come that merely crawling and indexing the vastness of the internet is so mundane now. Now the expectation is that the whole internet gets downloaded and executed as well. That has to be an enormous security and integrity problem. Surely someone has tried to break out of the sandbox. Could that be abused to affect the SEO of other sites? The easy answer is "spin up a new VM for each page," but that would slow the indexing process down by orders of magnitude.


I'm not sure where you're sensing anger. The thread so far is a pretty great example of the discourse I've come to really appreciate on HN. Sure, disagreement may be uncomfortable or feel awkward to read at times, but I think it's easily for the best. I'd much rather have somebody disagree with me and give good reasons than just blindly agree.


Yeah, I think I'm probably focused much more on the heavily data-driven, dynamic web application use case, since that's the kind of thing I've been working on for 5+ years now. I imagine the vast majority of internet content actually consists of much more long-form prose that doesn't benefit so much from a deferred-rendering approach, since it varies little if at all from user to user. In fact, that would probably be an overall systemic loss, since the same work is now done many times to render the same content when it could be done once and cached for everyone.

And I don't expect Google or anyone to be able to support every edge case, either. I really would just like some sort of better solution that involves a global minimum of effort to achieve the same thing -- indexing what the user actually sees (non-private info, at least), and helping users discover sites that will give them a great experience and not just sites that give indexers a great experience.



