Yeah they definitely didn't do that in the past. We've lost "as a large language model" and "it's important to remember" but gained "you're absolutely right!"
I would have thought they'd add "don't apologise!!!!" or something like that to the system prompt like they do to avoid excessive lists.
Yes. And that random Internet chatter almost certainly comes from people who don't know what they're talking about at all.
First, nobody is training on H20s; it's absurd. Then their logic was: because of the high inference demand for DeepSeek models, there is high demand for H20 chips, and since H20s were banned, better not to release new model weights now, otherwise people would want H20s even more.
Which is... even more absurd. The reasoning itself doesn't make any sense. And the technical part is just wrong, too. Using H20 to serve DeepSeek V3 / R1 is just SUPER inefficient. Like, R1 is the most anti-H20 model released ever.
The entire thing makes no sense at all and it's a pity that Reuters fell for that bullshit.
MLA uses way more flops in order to conserve memory bandwidth, while the H20 has plenty of memory bandwidth and almost no flops. MLA makes sense on H100/H800, but on the H20, GQA-based models are a way better option.
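To put rough numbers on this (the spec figures below are approximate public numbers for dense BF16 compute and HBM bandwidth, not authoritative): the compute-to-bandwidth ratio tells you how many flops a chip can spend per byte of memory it moves, and a flops-heavy attention scheme like MLA only pays off when that ratio is high. A back-of-the-envelope sketch:

```rust
// Rough sketch: compute-to-bandwidth ratio (flops per byte of HBM traffic).
// Spec numbers are approximate public figures; treat them as illustrative.
fn flops_per_byte(tflops: f64, tb_per_s: f64) -> f64 {
    (tflops * 1e12) / (tb_per_s * 1e12)
}

fn main() {
    // (name, ~dense BF16 TFLOPS, ~HBM TB/s)
    let chips = [("H100", 989.0, 3.35), ("H800", 989.0, 3.35), ("H20", 148.0, 4.0)];
    for (name, tflops, bw) in chips {
        println!("{name}: ~{:.0} flops per byte", flops_per_byte(tflops, bw));
    }
    // H100/H800: ~295 flops/byte -> spending extra flops to save memory
    // bandwidth (as MLA does) is a good trade.
    // H20: ~37 flops/byte -> flops are the scarce resource, so a
    // flops-light scheme like GQA fits the chip much better.
}
```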
Not sure what you are referring to—do you have a pointer to a technical writeup perhaps? In training and inference MLA has way less flops than MHA, which is the gold standard, and way better accuracy (model performance) than GQA (see comparisons in the DeepSeek papers or try deepseek models vs llama for long context.)
More generally, with any hardware architecture you use, you can optimize the throughput for your main goal (initially training; later inference) by balancing other parameters of the architecture. Even if training is suboptimal, if you want to make a global impact with a public model, you aim for the next NVidia inference hardware.
Didn't DeepSeek figure out how to train with mixed precision and so get much more out of the cards, with a lot of the training steps able to run at what were traditionally post-training-quantization-type precisions (block-compressed)?
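A minimal sketch of the block-compressed idea (illustrative only, not DeepSeek's actual recipe; I'm using i8 as a stand-in for fp8, and a tiny block size for readability):

```rust
// Illustrative blockwise quantization: store one f32 scale per block of
// values plus low-precision payloads (i8 here, standing in for fp8).
// Per-block scales preserve dynamic range while most bits are spent on
// a cheap narrow format.
const BLOCK: usize = 4; // real systems use larger blocks, e.g. 128

fn quantize(xs: &[f32]) -> Vec<(f32, Vec<i8>)> {
    xs.chunks(BLOCK)
        .map(|block| {
            // scale so the largest magnitude in the block maps to 127
            let max = block.iter().fold(0.0f32, |m, x| m.max(x.abs()));
            let scale = if max == 0.0 { 1.0 } else { max / 127.0 };
            let q = block.iter().map(|x| (x / scale).round() as i8).collect();
            (scale, q)
        })
        .collect()
}

fn dequantize(blocks: &[(f32, Vec<i8>)]) -> Vec<f32> {
    blocks
        .iter()
        .flat_map(|(scale, q)| q.iter().map(move |&v| v as f32 * scale))
        .collect()
}

fn main() {
    let xs = [0.1f32, -0.5, 3.0, 0.02, 100.0, -42.0, 7.5, 0.0];
    let roundtrip = dequantize(&quantize(&xs));
    for (a, b) in xs.iter().zip(&roundtrip) {
        // error is bounded by half a quantization step of each block
        assert!((a - b).abs() < 0.5);
    }
    println!("roundtrip ok: {roundtrip:?}");
}
```

The point of the per-block scale is that one huge value no longer wrecks the precision of every other value in the tensor, only of its own block.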
Contrary to common belief, it's fine to say negative things about the government in this case, as long as you are not Chinese. They may argue with you (or laugh at you for some even weirder reasons) and you both may have an unpleasant conversation, but that's it.
Apple pushes a narrative that their devices are secure (not private, but secure). And my less tech-savvy friends sincerely believe that it's due to it being a walled garden, with curated software only.
https://www.minimaxi.com is their website for the Chinese parent company 上海稀宇科技有限公司, https://minimax.io is their international website for the Singapore based company Nanonoble Pte Ltd that handles operations outside of China.
What source do you want? I have a few friends who work for them and they all live in either Shanghai (most) or Beijing. And I've never seen anyone who claimed they are based in Singapore or anywhere else before. Does this work?
Wikipedia in itself is no source, and after reading the parent's message I went there to check too, and surprise surprise, neither of the statements has a source attached to it. None of the linked articles has any information about where their headquarters is either.
If someone knows of a trustworthy article that states it outright, please feel free to share.
I'm the OP who claimed it was Singaporean, after checking LinkedIn. I then found the Wikipedia page, which I posted above. Amongst the comments here there is also a link to a Bloomberg article about a potential IPO. I don't have a dog in the race. Just passing on what I found.
Unwrap can definitely bring down a Rust program, but it's not nearly as bad as null pointers in C, C++, Java, etc.
1. You have to explicitly add `unwrap()`; in C you just have to forget to add a null check, which is a lot easier to do, and the compiler won't remind you like it does in Rust. In C the bug is opt-out-if-you-remember; in Rust it's opt-in-if-you're-lazy.
2. The crash is safe. Usually a null pointer crash is safe but I've definitely seen cases where it leads to impossible-to-debug random failures.
3. You can see `unwrap()`s in code review easily.
4. You can actually catch panics, which is probably a good idea in high availability systems like a web server. I don't know if people actually do this in practice though.
It's usually possible to write similar bugs in Rust, but it's also far less likely that you would (though it definitely does happen). So Rust does at least help with this even if it doesn't fully prevent the issue.
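For point 4, here's a sketch of what catching a panic looks like (the request-handler framing is my own invention, not from any particular server):

```rust
use std::panic;

// Hypothetical per-request handler: an unwrap() deep inside may panic.
fn handle_request(input: Option<&str>) -> String {
    // imagine this unwrap slipped through code review
    format!("hello, {}", input.unwrap())
}

fn main() {
    for req in [Some("world"), None] {
        // catch_unwind turns a panic in one request into an Err instead
        // of taking down the whole process -- roughly what a web server's
        // worker loop can do. (With panic=abort, or a panic inside the
        // panic machinery itself, the process still dies.)
        let result = panic::catch_unwind(|| handle_request(req));
        match result {
            Ok(body) => println!("200 OK: {body}"),
            Err(_) => println!("500 Internal Server Error"),
        }
    }
}
```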
But a call to unwrap is usually more explicit than a null pointer dereference when you are reviewing code. If you are deserializing something from an external source and calling `unwrap()` on some optional fields to convert them to non-option types, that should raise alarm bells. Of course, maybe everyone agrees the external source should not be sending such data and it goes into prod anyway. But it's also possible everyone agrees it's worth putting some extra effort into not crashing the process in such a situation, because there is too much risk.
>calling unwrap() on some optional fields to convert them to non-option types then this should raise alarm bells
Yeah, definitely. And the equivalent without optional types, dereferencing a null pointer, might happen because they don't even realize it could be null in the first place. Not everyone writes "assert(ptr != 0)" every time they assume a pointer isn't null, because that assumption comes up so frequently (if the code doesn't use references enough, which IIRC is the case at Google).
When you have an option type, you're made aware of it explicitly, and calling `.unwrap()` should, like you said, raise alarm bells and make you think about whether you actually want to crash the program.
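To make the contrast concrete (hand-rolled types, not any particular deserialization library): the same missing field is a loud `Option` in Rust, versus a silently nullable pointer elsewhere.

```rust
// Hypothetical deserialized record: the external source may omit `email`.
struct User {
    name: String,
    email: Option<String>, // the type forces you to acknowledge absence
}

// The lazy version: crashes the process if the field is missing.
fn email_domain_unwrap(u: &User) -> String {
    u.email.as_ref().unwrap().split('@').last().unwrap_or("").to_string()
}

// The reviewed version: absence becomes a value the caller must handle.
fn email_domain(u: &User) -> Option<String> {
    let email = u.email.as_ref()?; // early-return None instead of panicking
    Some(email.split('@').last().unwrap_or("").to_string())
}

fn main() {
    let complete = User { name: "a".into(), email: Some("a@example.com".into()) };
    let partial = User { name: "b".into(), email: None };

    assert_eq!(email_domain(&complete).as_deref(), Some("example.com"));
    assert_eq!(email_domain(&partial), None); // no crash
    // email_domain_unwrap(&partial) would panic: exactly the unwrap()
    // that should raise alarm bells in review.
    let _ = email_domain_unwrap(&complete);
    println!("ok, user {} handled", partial.name);
}
```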
In most situations, panicking and dereferencing a null pointer lead to the exact same scenario: the binary crashes. You can unwind and catch panics in Rust, but I'm not sure that would have helped in this scenario, as it might have immediately run into the faulty code again.
However, I would assume that the presence of an «unwrap» would have been caught in code review, whereas it’s much harder to be aware of which pointers can be null in Java/C++.
> In most situations, panicking and dereferencing a null pointer lead to the exact same scenario: the binary crashes.
This is a false and dangerous assumption that people make a lot of the time. There's no guarantee a crash happens, especially when working in C, where pointers are often subscripted: indexing a null pointer with a large enough offset can land on validly mapped memory instead of trapping.
A trap is the common behavior, but nothing dictates that's actually what will happen.