For supply chain security, you might be interested in cargo-vet[0], a tool for coordinating and requiring manual reviews of open source dependencies. Both Mozilla and Google[1] have started publishing their audits.toml files, which are machine-readable files describing what source code reviews they have performed.
Is there a reason why they couldn't split the load across multiple HSMs? For something so sensitive I would've expected a design where one or more root/master keys (held in HSMs) are periodically used to sign certificates for temporary keys (which are also held in HSMs). The HSMs with the temporary keys would handle the production traffic. As long as the verification process can validate a certificate chain, this design should allow them to scale to as many HSMs as are needed to handle the load...
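To make that concrete, here's a minimal sketch of the chain I have in mind, in Python with the `cryptography` package. The key type, validity window, and byte formats are all invented for illustration; this is not how Microsoft's system actually works:

    import time
    from cryptography.hazmat.primitives.asymmetric.ed25519 import (
        Ed25519PrivateKey, Ed25519PublicKey,
    )
    from cryptography.hazmat.primitives.serialization import Encoding, PublicFormat

    root = Ed25519PrivateKey.generate()  # long-lived; never leaves the root HSM
    temp = Ed25519PrivateKey.generate()  # short-lived; lives in a production HSM

    # Periodically, the root signs a "certificate": temp pubkey + expiry.
    expiry = int(time.time()) + 7 * 86400  # e.g. one week of validity
    temp_pub = temp.public_key().public_bytes(Encoding.Raw, PublicFormat.Raw)
    cert = expiry.to_bytes(8, "big") + temp_pub
    cert_sig = root.sign(cert)

    # Hot path: only the temporary key ever signs tokens.
    token = b"example token payload"
    token_sig = temp.sign(token)

    # Verifier walks the chain: root signature, expiry, then token signature.
    root.public_key().verify(cert_sig, cert)  # raises InvalidSignature on failure
    assert int.from_bytes(cert[:8], "big") > time.time(), "temp key expired"
    Ed25519PublicKey.from_public_bytes(cert[8:]).verify(token_sig, token)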
HSMs are expensive, their performance is bad, and administration is a pain. They're almost certainly running many clusters of their auth servers around the world, and would need significant capacity at all the locations, in case traffic shifts.
It's probably a better idea to pursue short-lived private keys rather than HSMs. If the timeline is accurate (the key was saved in a crash dump in 2021 and used for evil in 2023), monthly or quarterly rotation would have made the key useless long before it could be exploited.
A certificate chain is a little too long to include in access tokens, IMHO, but I don't know how Microsoft's auth systems work.
Saving you a click: despite what the repo title might suggest, while the code is open source, the model weights cannot be used commercially without permission.
> The code in this repository is open-source under the Apache-2.0 license. The InternLM weights are fully open for academic research and also allow commercial use with written permission from the official team. For inquiries about commercial licenses and collaborations, please contact internlm@pjlab.org.cn.
FWIW, spot prices for c5a.24xlarge in us-east-2b and us-east-2c seem to have been under $0.92/hr for most of the last 3 months. So, assuming some flexibility on the choice of region, that would adjust your estimate to $0.92 / $1.69 * $1233.70/mo = $671.60/mo, which looks a lot more reasonable. Hopefully I did that math right. Data egress prices are definitely still ridiculous, I agree.
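For anyone double-checking, the arithmetic is just scaling the monthly estimate by the ratio of the spot price to the on-demand price:

    >>> spot, on_demand, monthly = 0.92, 1.69, 1233.70
    >>> round(spot / on_demand * monthly, 2)
    671.6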
True! It sometimes drops even more, which definitely makes spot instances attractive. The r5.16xlarges had a ~80% discount and a <5% termination rate for a while.
Also, if all of scientific computing switched to AWS in order to exploit spot instance pricing, I don't think those market dynamics would stay the same.
As an aside, I've had trouble creating large clusters of high-memory instances in us-east-2. They might have increased capacity recently, though.
That all makes sense, but it doesn't seem to apply to the example code in the article, right? `inc` doesn't decode to a single fused uop on Ivy Bridge. AFAIK, the example code in both cases decodes to the same number of uops in the fused domain...
It doesn't decode to a single fused uop, but rather 2 fused pairs of uops (so 4 total unfused uops). So there is fusion going on, twice (the load and ALU op are fused, and the two store uops are fused).
If you use the three-instruction sequence the load and ALU op can't fuse, which potentially makes it slower (but not in this case since the bottleneck is elsewhere).
I believe you're thinking about `add`. According to Agner Fog's instruction tables, the load and ALU uops are fused for `add`, but not in the case of `inc`.
Yes, weird - it was pointed out elsewhere in this thread and I updated some of my comments, but not this one.
It's quite unusual that add gets the 2-uop treatment but inc doesn't. Yes, they treat flags differently, but that's mostly been resolved through flag renaming, and the reg forms of inc don't suffer any penalty.
I'll have to double-check whether this is true. If it is, compilers should generally prefer add [mem], 1 (except perhaps when optimizing for size) - the difference in flag behavior is pretty much never relevant for compiled code.
Renaming is unrelated to my guess about the flags. The point is that there's a limit to how many inputs a fused uop can have, 3, and the flags register may become one input too many to be able to fuse the uops. For example,
inc [rdi+rbx]
has the obvious rdi and rbx dependencies, the flags register, plus (presumably, depending on implementation details) an allocated virtual register for the 1 that is added. On the register forms this limit is never a bottleneck.
You also see the same behavior, according to Agner, on SHR/SHL m, i, which may or may not alter some flags depending on shift amounts, and strangely on NOT m, which explicitly does not alter the flags in any situation. This latter one makes little sense.
Sure, but everything you say about inc is true of add as well, yet add double-fuses fine (by "double-fuse" I mean it is 2/4 uops in the fused/unfused domains, unlike inc, which is 3/4). In general many RMW instructions (double) fuse, and most (all?) also modify the flags.
I doubt there is a virtual register for the 1 really - sure there is some storage for it somewhere in the ROB or the scheduler or whatever, but it doesn't need to be "renamed" in the usual sense since it's not visible anywhere. In any case, the add case is "worse" since it can have a real architectural register there, not just an implied immediate 1.
Yes, there is definitely a limit on the number of inputs a uop can have - and you can see this in the effect of "unlamination", which is where a uop fuses in the uop cache but then unfuses after that, and so mostly acts like an unfused uop (except for uop cache space). This shows up with indexed addressing modes.
For example:
add [rax], 1
fully double-fuses, but:
add [rax + rbx], 1
Double-fuses only in the uop cache (counts as 2 there), but unlaminates once after that (counts as 3 in the rest of the fused domain).
Interestingly though this guy:
add [rax], rbx
Still fully double-fuses everywhere, despite having the same number of input registers as the add [rax + rbx] case. Probably it's easier for the renamer, though, because the registers are spread across the uops more evenly rather than being concentrated in the load uop?
Moving away from RMW to load-op, there are other indications flags aren't the problem: things like BMI shrx/shlx/rorx with a memory operand don't fuse, even though they don't update flags at all. On the other hand ANDN - which is also in BMI, is a 3-arg instruction (distinct src/dest), and updates flags - does fuse! So actually I'd say updating the flags in a consistent way makes it more likely to fuse.
Maybe that's the answer then?
Anything that updates the flags in the "standard way" - i.e., SF, ZF, CF, OF all set to something - can (potentially) micro-fuse. Anything which doesn't - whether that means updating fewer flags (inc), no flags (shrx), or updating them "weirdly" (shl and friends) - isn't eligible. Interesting theory, and still consistent in broad strokes with your "it's the flags!" claim.
This theory is cool, but I don't think it works, all things considered. PDEP and PEXT should have the same unfused behavior as SHLX, since they also do not change any flags, but they _do_ fuse. BEXTR should (or could) fuse, but doesn't. So I don't know.
The important thing to take away from this article is that MD6 really shouldn't be used in any production software, unfortunately. MD6 didn't even make it past the first round of the SHA-3 competition, so it hasn't received much attention from cryptanalysts.
Cryptohipsters (can I coin this term?) should take a look at Skein (a third-round SHA-3 candidate), BLAKE2 (the successor of a third-round SHA-3 candidate), and Keccak (the SHA-3 winner). These hash functions have undergone much more analysis. Notably, BLAKE2 is faster than MD5 in many cases, but without the security problems of MD5.
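BLAKE2 is also trivially available: it has shipped in Python's hashlib since 3.6, for instance, so you can try it in a couple of lines (the input here is obviously just for illustration):

    import hashlib

    # Full 64-byte digest, and a truncated 32-byte variant.
    full = hashlib.blake2b(b"hello world").hexdigest()
    short = hashlib.blake2b(b"hello world", digest_size=32).hexdigest()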
There's a bit of bash boilerplate, but honestly it was about what I would expect, given a structure with so many layers of indirection.
Pain points:
* Switching between bash and jq's filtering language led me to use string interpolation with bash variables. Malicious inputs can probably exploit this (and it was just awkward anyway); jq's --arg flag, which binds a shell value to a jq variable, would have been the safer route.
* A "select one" filter would be nice, instead of select + get first element.
jq is powerful enough to express it in one query: it has variables (bound via `expr as $name`), which make this at least vaguely feasible. That doesn't mean you should, but you could:
.models
| (.[]
| if .title == "farmers"
then
(.fields | .[] | if .name == "Fruits" then .key else empty end)
as $fruits
| (.fields | .[] | if .name == "Full name" then .key else empty end)
as $name
| .entities
| .[]
| if .[ $name ] == "Bob, the farmer" then .[ $fruits ] else empty end
else empty end)
as $fruits
| .[]
| if .["title"] == "fruits"
then
( .fields | .[] | if .name == "Name" then .key else empty end) as $fruit_name
| .entities | [ .[] | {(._id): .[$fruit_name]} ] | add as $lookup
| $fruits | .[] | $lookup[.]
else empty
end
(I'm not claiming this is the best way to write that query, but it's the first one I came up with.)
Here's the equivalent using jsonaxe[1]. The main difference is the python-like syntax, which is either good or bad depending on your tastes. The pain point about string interpolation remains, tho.
Heads up to anyone considering using this: the author wrote their own crypto code[1]. I would recommend against using this until that is fixed... I've already spotted a few vulnerabilities.
That doesn't immediately mean that the library is useless.
> until that is fixed
I disagree with the word "fixed", as if it's broken. He probably used the highest-level primitives he could to achieve the requirements.
> I've already spotted a few vulnerabilities.
It'd probably be more constructive to open an issue detailing the vulnerabilities rather than saying "I've spotted some, use NaCl" and leaving it at that. What makes you so sure that NaCl is even a suitable replacement without knowing all the considerations that went into the project?
I don't necessarily disagree, but at some point the buck has to stop, right? How would you implement this any other way? The author didn't implement AES or so on himself, he uses standard library encryption and applies it as appropriate. You should probably report the issues you find to federico.ceratto-at-gmail.com (from Github).
The author should use a library that provides a simple "encryptWithPublicKey" method, so that any choices about RSA key size, AES mode of operation, etc are all taken care of. NaCl[1] would probably be best, since it's written and audited by prominent cryptographers.
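In Python, for instance, PyNaCl's SealedBox has pretty much exactly that "encrypt with a public key" shape (minimal sketch; real code would load the recipient's key rather than generate it inline):

    from nacl.public import PrivateKey, SealedBox

    sk = PrivateKey.generate()                             # recipient's keypair
    ct = SealedBox(sk.public_key).encrypt(b"secret data")  # sender side
    pt = SealedBox(sk).decrypt(ct)                         # recipient side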
There are a tremendous number of other ways this could be implemented.
Authenticated encryption? GCM? XTS?
Salt the CFB? Guard against interblock attacks?
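For the authenticated-encryption point specifically: an AEAD mode like AES-GCM both encrypts and authenticates, so tampering is caught at decrypt time instead of producing silently corrupted plaintext. A minimal sketch with Python's `cryptography` package (throwaway key, purely illustrative):

    import os
    from cryptography.hazmat.primitives.ciphers.aead import AESGCM

    key = AESGCM.generate_key(bit_length=256)
    nonce = os.urandom(12)  # must never be reused with the same key
    ct = AESGCM(key).encrypt(nonce, b"plaintext", b"header")  # header is authenticated, not encrypted
    pt = AESGCM(key).decrypt(nonce, ct, b"header")            # raises InvalidTag if anything was altered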
The crypto needs to be completely reworked. This is an asymmetric KEK (key-encryption key) around symmetric encryption, which is done in many other projects.
Half-baked crypto such as this is worse than no crypto at all, as it lulls people into believing they are using a valid cryptographic system. But the project implements (poorly) a subset of what is needed and pushes the rest into application code - and app writers don't know this, and wouldn't know what to implement even if they knew of the shortcomings.
Cryptographers see this all the time. People think they've invented a new concept, when really they've implemented a well-known design, incompletely and with well-known flaws in the crypto. Then people defend the system, when it would be far easier to use better primitives.
It says something when a package called PyCrypto happens to be an insecure way to get random numbers. You would think, with that name, it was the right way to do it.
It looks like the file is replaced every write, too, which removes most of the hard use cases. It really seems to me that he could just use PyNaCl to encrypt the files and not have to bother with all the custom crypto. I don't know what the intentions and tradeoffs are, though, so I can't be sure.
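Something like PyNaCl's SecretBox would cover both encryption and authentication in a few lines (sketch with a throwaway key; the real key would come from wherever the project derives it):

    from nacl.secret import SecretBox
    from nacl.utils import random

    key = random(SecretBox.KEY_SIZE)    # 32 random bytes
    box = SecretBox(key)
    ct = box.encrypt(b"file contents")  # nonce is generated and prepended automatically
    pt = box.decrypt(ct)                # fails loudly if the ciphertext was modified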
You could make similar threat-model arguments as are made about FDE, but that's not really a good excuse when authentication would be technically easy in this case.
Modern password hashes are designed to use a large amount of RAM in addition to CPU time in order to make password cracking using ASICs and GPUs more difficult. The paper on Argon2[1], the winner of the recent password hashing competition, would be a good read if you're interested in learning more about how password hashes are designed.
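If you want to play with it from Python, the argon2-cffi package wraps the reference implementation and exposes the memory/time knobs directly (the parameters below are just for illustration, not a tuning recommendation):

    from argon2 import PasswordHasher

    ph = PasswordHasher(time_cost=3, memory_cost=65536, parallelism=4)  # memory_cost is in KiB
    h = ph.hash("correct horse battery staple")
    ph.verify(h, "correct horse battery staple")  # raises VerifyMismatchError on a wrong password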
[0] https://github.com/mozilla/cargo-vet
[1] https://opensource.googleblog.com/2023/05/open-sourcing-our-...