
We don't need anything as complex as a package manager. It would be much easier to just link to these libraries (and any other resource) by the hash of their content.

I'm not really sure why this isn't already in place.

edit: The reason I'm not sure why is that multiple threads on this post are all suggesting basically the same simple idea: the ability to serve a file by its hash (either by the hash alone, or by a URL plus the hash). Personally, whatever form these URLs take, I think they ought to be backward compatible, which I believe is possible.



> I'm not really sure why this isn't already in place.

Everything we need is already in place, except for a tweak in the caching strategy of the browsers[1]. With Subresource Integrity [2] you provide a cryptographic hash for the file you include, e.g.

  <script src="https://example.com/example-framework.js"
          integrity="sha384-oqVuAfXRKap7fdgcCY5uykM6+R9GqQ8K/uxy9rx7HNQlGYl1kPzQho1wx4JwY8wC"
          crossorigin="anonymous"></script>
As it is, browsers first download the file and then verify it. But you could also switch this around and build a content-addressable cache in the browser, where it retrieves files by their hash and only issues a request to the network as a fallback, should the file not already be in the cache. Combine this with a CDN that also serves its files via https://domain.com/$hash.js [3] and you have everything you need for a pretty nice browserify alternative, without any new web standardization necessary.
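For a rough idea of what that could look like, here's a minimal sketch using a service worker (not the spec's mechanism; the cache name and key scheme are made up, and real code would have to verify the bytes against the hash before caching):

  // sw.js - approximate a content-addressable cache, keyed on the
  // SRI metadata the Fetch spec exposes on the request object.
  self.addEventListener('fetch', (event) => {
    const integrity = event.request.integrity;  // e.g. "sha384-oqVuAf..."
    if (!integrity) return;                     // no hash: normal network path
    const key = '/sri-cache/' + encodeURIComponent(integrity);
    event.respondWith((async () => {
      const cache = await caches.open('sri-v1');
      const hit = await cache.match(key);
      if (hit) return hit;                      // served by hash, no request
      const resp = await fetch(event.request);  // fallback: the network
      // NB: verify resp against the hash before caching in real code.
      if (resp.ok) await cache.put(key, resp.clone());
      return resp;
    })());
  });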

[1] And lots of optimization to minimize cache eviction and handle privacy concerns, but those are different questions.

[2] https://developer.mozilla.org/en-US/docs/Web/Security/Subres...

[3] Imagine if some CDN worked together with NPM, so that every package on NPM were already present in the CDN.
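For illustration, an include against such a hash-addressed CDN might look like this (cdn.example is a made-up host; the path carries a base64url-safe form of the same hash, so any mirror serving the same bytes is interchangeable):

  <script src="https://cdn.example/sha384-oqVuAfXRKap7fdgcCY5uykM6-R9GqQ8K_uxy9rx7HNQlGYl1kPzQho1wx4JwY8wC.js"
          integrity="sha384-oqVuAfXRKap7fdgcCY5uykM6+R9GqQ8K/uxy9rx7HNQlGYl1kPzQho1wx4JwY8wC"
          crossorigin="anonymous"></script>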


Folks in W3C webappsec are interested, but the cross-origin security problems are hard. We'd love feedback from developers as to what is still useful without breaking the web. Read this doc and reach out! https://hillbrad.github.io/sri-addressable-caching/sri-addre...


What's the best way to reach out?

I think that using the integrity attribute is great, because if it happens it's going to have to work through a lot of the tricky implementation details (e.g. origin laundering) of moving to an internet of content-by-hash rather than content-by-location.

However, beyond just having an integrity attribute added to HTML, I am interested in the question of how we encode an immutable URL, the content hash of what it points to, and any additionally required attributes into a `canonical hash-url` that is backward compatible with all current browsers and devices, and which browsers can use in the future to locate an item by hash and/or by location.

The driving reason for this encoding is to make sharing links to resources more resilient and backward compatible. Eventually browsers could parse apart these `canonical hash-url`s and use their own stores to serve the data, but not until the issues listed in the SRI addressable caching document you linked (and likely others not yet thought of) are worked through.
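One hypothetical encoding, purely as a sketch (the #sha384=... convention is made up here): put the hash in the URL fragment, which today's browsers ignore when fetching, so the link keeps resolving by location everywhere, while a future browser could split it apart and try the hash first:

  //   https://example.com/example-framework.js#sha384=oqVuAf...
  function parseHashUrl(href) {
    const url = new URL(href);
    const m = /^#(sha(?:256|384|512))=(.+)$/.exec(url.hash);
    if (!m) return { location: href, integrity: null };
    return {
      location: url.origin + url.pathname + url.search, // fetch by location
      integrity: m[1] + '-' + m[2],                     // or look up by hash
    };
  }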


These problems are really hairy. Thankfully, all the privacy issues are only one-bit leakages (and there are TONS of one-bit leakages in web browsers), but the CSP bypass with SRI attack is really cool.

One thing that I've found incredibly disappointing about SRI is that it requires CORS. There's some more information here: https://github.com/w3c/webappsec/issues/418 but it essentially means that you can't SRI-pin content on a sketchy/untrustworthy CDN without them putting in work to enable CORS (which, if they're sketchy and untrustworthy, they probably won't do).

The attack that the authors lay out to justify SRI requiring CORS is legitimate, but incredibly silly: a site could use SRI as an oracle to check the hash value of cross-origin content. You could theoretically use this to brute-force secrets on pages, but it's kind of silly because SRI only works with CSS and JavaScript anyway.
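Roughly like this, for the record (victim.example and the handler names are hypothetical); one tag per guess, iterating over candidate secrets:

  <script src="https://victim.example/user-config.js"
          integrity="sha384-HASH_OF_THE_GUESSED_FILE_CONTENTS"
          onload="guessWasRight()"
          onerror="guessWasWrong()"></script>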


I, as someone who worked on the SRI spec, find this incredibly disappointing as well. We've tried to reduce this to "must be publicly cacheable", but attacks have proven us wrong.

And unfortunately, there are too many hosts that make the attack you mention credible rather than silly:

It is not uncommon for the JavaScript served by home routers to contain dynamically inserted credentials. And the JSON response from your API is valid JavaScript.
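For instance (illustrative, not taken from any particular device), the kind of "static" script a router serves, which the oracle above can brute-force one candidate token at a time:

  // Served by the router as /js/config.js, with the credential
  // templated in server-side on every request:
  var routerConfig = {
    user: "admin",
    sessionToken: "8f3a91c2"  // short secret: cheap to enumerate
  };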


Addendum

To be completely honest: only reach out if you have solutions for any of the problems, or can reduce what you want down to something that is solvable with these problems in mind.

If your solution does not live on the web, you'll have a hard time finding allies in the standards bodies that work on the web :)

You'll have a hard time convincing spec editors and browser vendors as it is. The working group mailing list is https://lists.w3.org/Archives/Public/public-webappsec/

If you have minimal edits to the spec, we can take it straight to GitHub. The SRI spec contains a link to the repo.


Well, HTTP does have the ETag ( https://en.wikipedia.org/wiki/HTTP_ETag ), but of course that still requires a request sending the known ETag, to either get a 304 Not Modified or the content. So how about a way to put the ETags of assets into the header of the document loading them, or something like that? Then the browser could decide whether it wants to make that request.
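For reference, the round trip that would be saved is today's conditional request (path and ETag value are placeholders, headers abbreviated):

  GET /js/app.js HTTP/1.1
  If-None-Match: "abc123"

  HTTP/1.1 304 Not Modified
  ETag: "abc123"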

And I also have the feeling that surely something like this already exists and I just don't know about it ^^

Linking things by hash, apart from being very ugly, has a problem. Consider this case: you have an asset that changes every minute and is several MB. If a user uses your site too much, old versions of that file, which will never get referenced again, will flush out all the other cached stuff. That strikes me as extremely wasteful: you get a short-term boost but even worse performance overall. And if other sites do it too much, your own stuff will not even be cached when visitors come back.


It's called subresource integrity. The spec is at: https://www.w3.org/TR/SRI/

Mozilla already implements it: https://hacks.mozilla.org/2015/09/subresource-integrity-in-f...


Does this make the browser avoid the request if it already has a cached script with that hash? If so, that would indeed be exactly what I was hoping for, except it would need to be extended to all resource types, not just JavaScript. Anyway, thanks!


> We don't need to have anything as complex as package manager.

Maybe you're right. But still, today we need very complex build tools and silly server hacks like setting expiry headers ten years in the future.

I wish it were instead as simple as deploying my 3kb app.js and telling the browser: "Hello there, here's my manifest with all the dependencies I need to run my app. Thank you."
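Something like this, say (a made-up manifest format, nothing standardized; the hashes are placeholders):

  {
    "main": "/app.js",
    "dependencies": {
      "example-framework": "sha384-<content hash of the exact build>",
      "example-utils": "sha384-<content hash of the exact build>"
    }
  }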


> It would be much easier to just link to these libraries (and any other resource) by the hash of their content.

That sounds like IPFS [1].

[1] https://ipfs.io/#how


I hope that one day all the browsers will implement IPFS for this!


> It would be much easier to just link to these libraries (and any other resource) by the hash of their content.

And that's exactly what Nix and Guix do. Elegant solutions to age-old problems.


Linking by hash to all common resources would be bad. You do want fixes and updates, after all.


Links are already tied to a specific version. Nobody serves a "jquery.js" that points to a constantly updated version of jQuery. Nobody sane, anyway.
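E.g. the typical CDN include already pins the exact version in the path:

  <script src="https://code.jquery.com/jquery-3.1.0.min.js"></script>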


Fair. My point was more that for many resources, you probably do appreciate fixes and updates. Certainly package managers deliver them. Not sure why the web would be different.



