What if the checksum was the same and you accepted the cache hit if the checksum...

eertami · on April 22, 2016

>How easy is it to create a malicious file that matches the checksum of a known file?

I'd say not easy at all, practically impossible.

https://en.wikipedia.org/wiki/Preimage_attack

AnkhMorporkian · on April 22, 2016

> I'm concerned about people like me who use noscript selectively. How easy is it to create a malicious file that matches the checksum of a known file?

SHA-256? Very, very, very, very hard. I don't believe there are any known collision attacks against SHA-256.

jval43 · on April 22, 2016

I think even a collision (any collision) has yet to be found.

dinkumthinkum · on April 23, 2016

People make too big a deal of this collision stuff; a lot of the time these attacks are very theoretical and would require tremendous computation. Anyway, for this use case, even with MD5, how likely is it really that someone could make a useful malicious file that collides with a particular known and widely used one? I dunno, seems pretty unlikely.

dexterdog · on April 22, 2016

And if you worry about that you can always use SHA-384. Plus a side benefit is that SHA-384 is faster on a 64-bit processor.
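For anyone who wants to check the speed claim on their own machine, here is a rough sketch using Python's standard `hashlib` and `timeit`. The helper name `time_hash` and the 1 MiB dummy payload are just assumptions for illustration, and the result depends on the CPU and on how the hash library was built, so treat it as a measurement aid rather than proof either way.

```python
import hashlib
import timeit


def time_hash(name: str, data: bytes, repeats: int = 20) -> float:
    """Return the seconds taken to hash `data` `repeats` times with `name`."""
    return timeit.timeit(lambda: hashlib.new(name, data).digest(), number=repeats)


if __name__ == "__main__":
    payload = b"x" * (1 << 20)  # 1 MiB of dummy data (arbitrary choice)
    for name in ("sha256", "sha384"):
        size = hashlib.new(name).digest_size  # digest length in bytes
        print(name, size, "byte digest,", time_hash(name, payload), "seconds")
```

SHA-384 produces a 48-byte digest versus 32 bytes for SHA-256, so the attribute value gets longer either way.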

kevincox · on April 22, 2016

It would be interesting if browsers started implementing a content-addressable cache: as well as caching resources by URI, also cache them by hash. Then SRI requests could be served even if the URL was different.

Of course this would need a proposal or something but it would be interesting to consider.
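To make the idea concrete, here is a minimal sketch of a cache indexed both by URL and by SHA-256 digest. The class and method names are made up for illustration; this is not how any browser actually implements its cache.

```python
import hashlib
from typing import Dict, Optional


class ContentAddressableCache:
    """Toy cache that indexes each body by URL and by SHA-256 hex digest."""

    def __init__(self) -> None:
        self._by_hash: Dict[str, bytes] = {}  # hex digest -> body
        self._by_url: Dict[str, str] = {}     # URL -> hex digest

    def store(self, url: str, body: bytes) -> str:
        """Cache `body` under `url` and under its digest; return the digest."""
        digest = hashlib.sha256(body).hexdigest()
        self._by_hash[digest] = body
        self._by_url[url] = digest
        return digest

    def lookup(self, url: str, expected_sha256: Optional[str] = None) -> Optional[bytes]:
        """Return a cached body, or None on a miss."""
        if expected_sha256 is not None:
            # With a declared hash the URL no longer matters: any stored
            # body with that digest is a hit, anything else is a miss.
            return self._by_hash.get(expected_sha256.lower())
        digest = self._by_url.get(url)
        return self._by_hash.get(digest) if digest is not None else None
```

With this shape, a script stored from one CDN URL can satisfy a later request for a different URL as long as the page declares the same hash.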

SixSigma · on April 23, 2016

Plan9's Venti file storage system is content addressable.

http://plan9.bell-labs.com/sys/doc/venti/venti.html

Also available on *nix

jMyles · on April 22, 2016

> How easy is it to create a malicious file that matches the checksum of a known file?

As others have pointed out, it's quite difficult. But here's another way to think about it: if hash collisions become easy in popular libraries, the whole internet will be broken and nobody will be thinking about this particular exploit.

Servers won't be able to reliably update. Keys won't be able to be checked against fingerprints. Trivial hash collisions will be chaos. Fortunately, we seem to have hit a stride of fairly sound hash methods in terms of collision freedom.

llis · on April 22, 2016

This vaguely reminds me of the Content Centric Networking developed by PARC. There's a 1.0 implementation of the protocol on GitHub (https://github.com/PARC/CCNx_Distillery). A CCNx-enabled browser could potentially get the script from a CCN by referring to its signature alone (whether that is a SHA-256 checksum or something else).

wallacoloo · on April 22, 2016

This seems a little redundant - why not just

    <script src="jQuery-1.12.2.min.js" sha-256="31be012d5df7152ae6495decff603040b3cfb949f1d5cf0bf5498e9fc117d546"></script>

? If you wanted to explicitly fetch from google if the client doesn't have a cached copy, then instead do

    <script src="https://ajax.googleapis.com/ajax/libs/jquery/1.12.2/jquery.min.js" sha-256="31be012d5df7152ae6495decff603040b3cfb949f1d5cf0bf5498e9fc117d546"></script>

The first would seem preferable though, as loading from an external source would expose the user to cross-site tracking.
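As a rough sketch of the check such an attribute implies: hash the bytes that were fetched (or found in the cache) and compare against the declared hex digest. The `sha-256` attribute above is hypothetical; the existing SRI `integrity` attribute carries the same digest base64-encoded with an algorithm prefix, which the second helper produces. Both function names are assumptions for illustration.

```python
import base64
import hashlib


def matches_sha256(body: bytes, expected_hex: str) -> bool:
    """True if the SHA-256 of `body` equals the declared hex digest."""
    return hashlib.sha256(body).hexdigest() == expected_hex.strip().lower()


def sri_value(body: bytes) -> str:
    """Build an SRI-style value: 'sha256-' plus the base64 of the raw digest."""
    digest = hashlib.sha256(body).digest()
    return "sha256-" + base64.b64encode(digest).decode("ascii")
```

A browser-like loader would run the script only if `matches_sha256` returns true, regardless of where the bytes came from.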

newjersey · on April 22, 2016

> The first would seem preferable though, as loading from an external source would expose the user to cross-site tracking.

You're right that the first one you had, with just the sha-256, would be pretty much equivalent to what I had, especially given that HN readers have resoundingly supported the idea that it is non-trivial to create a malicious file with the same hash as our script file. I was simply trying to be cautious and retain some control for the web application (even if the extra sense of security is misplaced).

This is the use case I'm trying to protect by adding a new "canonical" reference that the web application decides. As others in this thread have said, it is very unlikely that someone will be able to craft a malicious script with the same hash as what I already have. The reason I still stand by including both is firstly compatibility (I hope browsers can simply ignore the sha-256 hash and the authorized cache links if they don't know what to do with it).

As a noscript user, I do not want to trust y0l0swagg3r cdn (just giving an example, please forgive me if this is your company name). NoScript blocks everything other than a select whitelist. If the CDN happens to be blocked, my website should still continue to function loading the script from my server.

My motivation here was to allow perhaps even smaller companies to sort of pool their common files into their own cdn? <script src="jimaca.js" authoritative-cache-provider="https://cdn.jimacajs.example.com/v1/12/34/jimaca.js""></scri... I also want to avoid a situation where Microsoft can come to me and tell me that I can't name my js files microsoft.js or something. The chances of an accidental collision are apparently very close to zero so I agree with you that there is room for improvement. (:

This is definitely not an RFC or anything formal. I am just a student and in no position to actually effect any change or even make a formal proposal.

15155 · on April 23, 2016

If accompanied by the exact same sha-256 hash idea, loading from any external source cannot expose the user to any additional risk.

SHA + CDN url list (for whitelisting/reliability purposes - public/trusted, and then private for reliability) would be ideal.
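A minimal sketch of that "hash plus URL list" idea, assuming the loader is handed an ordered list of sources and a `fetch` callable that returns bytes or raises `OSError` when a source is blocked or unreachable. The function name `load_verified` and the `fetch` parameter are hypothetical, chosen only to keep the example offline and self-contained.

```python
import hashlib
from typing import Callable, Iterable, Optional


def load_verified(
    urls: Iterable[str],
    expected_sha256: str,
    fetch: Callable[[str], bytes],
) -> Optional[bytes]:
    """Return the first fetched body whose SHA-256 matches, else None."""
    expected = expected_sha256.strip().lower()
    for url in urls:
        try:
            body = fetch(url)
        except OSError:
            continue  # source blocked or unreachable: try the next one
        if hashlib.sha256(body).hexdigest() == expected:
            return body
        # Hash mismatch: discard this copy and keep looking.
    return None
```

Listing the public CDN first and the site's own server last would cover the NoScript case mentioned earlier: if the CDN is blocked, the same verified file still loads from the origin.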