Yeah they definitely didn't do that in the past. We've lost "as a large language model" and "it's important to remember" but gained "you're absolutely right!"
I would have thought they'd add "don't apologise!!!!" or something like that to the system prompt like they do to avoid excessive lists.
Yes. And that random Internet chatter almost certainly comes from people who don't know what they're talking about at all.
First, nobody is training on H20s; it's absurd. Then their logic was: because of the high inference demand for DeepSeek models, there is high demand for H20 chips, and since H20s were banned, better not to release new model weights now, otherwise people would want H20s even more.
Which is... even more absurd. The reasoning itself doesn't make any sense. And the technical part is just wrong, too. Using H20 to serve DeepSeek V3 / R1 is just SUPER inefficient. Like, R1 is the most anti-H20 model released ever.
The entire thing makes no sense at all and it's a pity that Reuters fell for that bullshit.
MLA uses way more flops in order to conserve memory bandwidth, while the H20 has plenty of memory bandwidth and almost no flops. MLA makes sense on H100/H800, but on the H20, GQA-based models are a way better option.
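To put rough numbers on this (the spec figures below are approximate public numbers for dense BF16 compute and HBM bandwidth, not authoritative): the compute-to-bandwidth ratio tells you how many flops a chip can spend per byte of memory it moves, and a flops-heavy attention scheme like MLA only pays off when that ratio is high. A back-of-the-envelope sketch:

```rust
// Rough sketch: compute-to-bandwidth ratio (flops per byte of HBM traffic).
// Spec numbers are approximate public figures; treat them as illustrative.
fn flops_per_byte(tflops: f64, tb_per_s: f64) -> f64 {
    (tflops * 1e12) / (tb_per_s * 1e12)
}

fn main() {
    // (name, ~dense BF16 TFLOPS, ~HBM TB/s)
    let chips = [("H100", 989.0, 3.35), ("H800", 989.0, 3.35), ("H20", 148.0, 4.0)];
    for (name, tflops, bw) in chips {
        println!("{name}: ~{:.0} flops per byte", flops_per_byte(tflops, bw));
    }
    // H100/H800: ~295 flops/byte -> spending extra flops to save memory
    // bandwidth (as MLA does) is a good trade.
    // H20: ~37 flops/byte -> flops are the scarce resource, so a
    // flops-light scheme like GQA fits the chip much better.
}
```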
Not sure what you are referring to—do you have a pointer to a technical writeup perhaps? In training and inference MLA has way less flops than MHA, which is the gold standard, and way better accuracy (model performance) than GQA (see comparisons in the DeepSeek papers or try deepseek models vs llama for long context.)
More generally, with any hardware architecture you use, you can optimize the throughput for your main goal (initially training; later inference) by balancing other parameters of the architecture. Even if training is suboptimal, if you want to make a global impact with a public model, you aim for the next NVidia inference hardware.
Didn't DeepSeek figure out how to train with mixed precision and so get much more out of the cards, with a lot of the training steps able to run at what were traditionally post-training-quantization-type precisions (block-compressed)?
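A minimal sketch of the block-compressed idea (illustrative only, not DeepSeek's actual recipe; I'm using i8 as a stand-in for fp8, and a tiny block size for readability):

```rust
// Illustrative blockwise quantization: store one f32 scale per block of
// values plus low-precision payloads (i8 here, standing in for fp8).
// Per-block scales preserve dynamic range while most bits are spent on
// a cheap narrow format.
const BLOCK: usize = 4; // real systems use larger blocks, e.g. 128

fn quantize(xs: &[f32]) -> Vec<(f32, Vec<i8>)> {
    xs.chunks(BLOCK)
        .map(|block| {
            // scale so the largest magnitude in the block maps to 127
            let max = block.iter().fold(0.0f32, |m, x| m.max(x.abs()));
            let scale = if max == 0.0 { 1.0 } else { max / 127.0 };
            let q = block.iter().map(|x| (x / scale).round() as i8).collect();
            (scale, q)
        })
        .collect()
}

fn dequantize(blocks: &[(f32, Vec<i8>)]) -> Vec<f32> {
    blocks
        .iter()
        .flat_map(|(scale, q)| q.iter().map(move |&v| v as f32 * scale))
        .collect()
}

fn main() {
    let xs = [0.1f32, -0.5, 3.0, 0.02, 100.0, -42.0, 7.5, 0.0];
    let roundtrip = dequantize(&quantize(&xs));
    for (a, b) in xs.iter().zip(&roundtrip) {
        // error is bounded by half a quantization step of each block
        assert!((a - b).abs() < 0.5);
    }
    println!("roundtrip ok: {roundtrip:?}");
}
```

The point of the per-block scale is that one huge value no longer wrecks the precision of every other value in the tensor, only of its own block.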
Contrary to common belief, it's fine to say negative things about the government in this case, as long as you are not Chinese. They may argue with you (or laugh at you for some even weirder reasons) and you both may have an unpleasant conversation, but that's it.
Apple pushes a narrative that their devices are secure (not private, but secure). And my less tech-savvy friends sincerely believe that it's due to it being a walled garden, with curated software only.
https://www.minimaxi.com is their website for the Chinese parent company 上海稀宇科技有限公司, https://minimax.io is their international website for the Singapore based company Nanonoble Pte Ltd that handles operations outside of China.
What source do you want? I have a few friends who work for them and they all live in either Shanghai (most) or Beijing. And I've never seen anyone who claimed they are based in Singapore or anywhere else before. Does this work?
Wikipedia in itself is no source, and after reading the parent's message I went there to check too, and surprise surprise, neither of the statements has a source attached to it. None of the linked articles has any information about where their headquarters is either.
If someone knows of a trustworthy article that states it outright, please feel free to share.
I'm the OP who claimed it was Singaporean, after checking LinkedIn. I then found the Wikipedia page, which I posted above. Amongst the comments here there is also a link to a Bloomberg article about a potential IPO. I don't have a dog in the race. Just passing on what I found.
Unwrap can definitely bring down a Rust program, but it's not nearly as bad as null pointers in C, C++, Java, etc.
1. You have to explicitly add `unwrap()`; in C you just have to forget to add a null check, which is a lot easier to do, and the compiler won't remind you like it does in Rust. In C the bug is opt-out-if-you-remember; in Rust it's opt-in-if-you're-lazy.
2. The crash is safe. Usually a null pointer crash is safe but I've definitely seen cases where it leads to impossible-to-debug random failures.
3. You can see `unwrap()`s in code review easily.
4. You can actually catch panics, which is probably a good idea in high availability systems like a web server. I don't know if people actually do this in practice though.
It's usually possible to write similar bugs in Rust, but it's also far less likely that you would (though it definitely does happen). So Rust does at least help with this even if it doesn't fully prevent the issue.
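For point 4, here's a sketch of what catching a panic looks like (the request-handler framing is my own invention, not from any particular server):

```rust
use std::panic;

// Hypothetical per-request handler: an unwrap() deep inside may panic.
fn handle_request(input: Option<&str>) -> String {
    // imagine this unwrap slipped through code review
    format!("hello, {}", input.unwrap())
}

fn main() {
    for req in [Some("world"), None] {
        // catch_unwind turns a panic in one request into an Err instead
        // of taking down the whole process -- roughly what a web server's
        // worker loop can do. (With panic=abort, or a panic inside the
        // panic machinery itself, the process still dies.)
        let result = panic::catch_unwind(|| handle_request(req));
        match result {
            Ok(body) => println!("200 OK: {body}"),
            Err(_) => println!("500 Internal Server Error"),
        }
    }
}
```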
But a call to unwrap is usually more explicit than a null pointer dereference when you are reviewing code. If you are deserializing something from an external source and calling `unwrap()` on some optional fields to convert them to non-option types, that should raise alarm bells. Of course, maybe everyone agrees the external source should not be sending such data and it goes into prod anyway. But it's also possible everyone agrees it's worth putting some extra effort into not crashing the process in such a situation, because there is too much risk.
>calling unwrap() on some optional fields to convert them to non-option types then this should raise alarm bells
Yeah, definitely. And the equivalent without optional types, dereferencing a null pointer, might happen because they don't even realize it could be null in the first place. Not everyone writes "assert(ptr != 0)" every time they assume a pointer isn't null, because that assumption comes up so frequently (if the code doesn't use references enough, which IIRC is the case at Google).
When you have an option type, you're made aware of it explicitly, and calling `.unwrap()` should, like you said, raise alarm bells and make you think about whether you actually want to crash the program.
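To make the contrast concrete (hand-rolled types, not any particular deserialization library): the same missing field is a loud `Option` in Rust, versus a silently nullable pointer elsewhere.

```rust
// Hypothetical deserialized record: the external source may omit `email`.
struct User {
    name: String,
    email: Option<String>, // the type forces you to acknowledge absence
}

// The lazy version: crashes the process if the field is missing.
fn email_domain_unwrap(u: &User) -> String {
    u.email.as_ref().unwrap().split('@').last().unwrap_or("").to_string()
}

// The reviewed version: absence becomes a value the caller must handle.
fn email_domain(u: &User) -> Option<String> {
    let email = u.email.as_ref()?; // early-return None instead of panicking
    Some(email.split('@').last().unwrap_or("").to_string())
}

fn main() {
    let complete = User { name: "a".into(), email: Some("a@example.com".into()) };
    let partial = User { name: "b".into(), email: None };

    assert_eq!(email_domain(&complete).as_deref(), Some("example.com"));
    assert_eq!(email_domain(&partial), None); // no crash
    // email_domain_unwrap(&partial) would panic: exactly the unwrap()
    // that should raise alarm bells in review.
    let _ = email_domain_unwrap(&complete);
    println!("ok, user {} handled", partial.name);
}
```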
In most situations, panicking and dereferencing a null pointer lead to the exact same scenario: the binary crashes. You can unwind and catch panics in Rust, but I'm not sure that would have helped in this scenario, as it might have immediately run into the faulty code again.
However, I would assume that the presence of an «unwrap» would have been caught in code review, whereas it’s much harder to be aware of which pointers can be null in Java/C++.
> In most situations, panicking and dereferencing a null pointer lead to the exact same scenario: the binary crashes.
This is a false and dangerous assumption that people make a lot of the time. There's no guarantee a crash happens, especially when working in C, where pointers are often subscripted: indexing a null pointer with a large enough offset can land on validly mapped memory instead of trapping.
A trap is the common behavior, but nothing dictates that's actually what will happen.