Agreed. When scoping out the size of Google+, one of ArchiveTeam's recent projec...

ta8908695 · on Sept 8, 2020

The external assets for a page could be archived separately though, right? I would think that the static G+ assets: JS, CSS, images, etc. could be archived once, and then all the remaining data would be much closer the 120B of real content. Is there a technical reason that's not the case?

dredmorbius · on Sept 8, 2020

In theory.

In practice, this would likely involve recreating at least some of the presentation side of numerous changing (some constantly) Web apps. Which is a substantial programming overhead.

WARC is dumb as rocks, from a redundancy standpoint, but also atomically complete, independent (all WARCs are entirely self-contained), and reliable. When dealing with billions of individual websites, these are useful attributes.

It's a matter of trade-offs.