The basic concept plus a lot of money spent on compute and training data gets you pretraining. After that, to get a really good model, there are a lot more fine-tuning / RL steps that companies are pretty secretive about. That is where the “smart decisions” and the knowledge gained from training previous generations of SOTA models come in.
We’d probably see more companies training their own models if it was cheaper, for sure. Maybe some of them would do very well. But even having a lot of money to throw at this doesn’t guarantee success, e.g. Meta’s Llama 4 was a big disappointment.
That said, it’s not impossible to get close to the state of the art, as DeepSeek showed.
I’d also add that no one predicted the emergent properties of LLMs as they followed the scaling-laws hypothesis. GPT showed all kinds of emergent capabilities like reasoning and sentiment analysis when we went up an order of magnitude in parameter count. We don’t actually know what would emerge if we trained a quadrillion-parameter model. SOTA will always be mysterious until we reach those limits, so, no, companies like Cursor will never be on the frontier. It takes too much money and requires seeking out things we haven’t ever seen before.
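The scaling-law part is at least easy to write down, even if the emergence isn't: here's a minimal sketch of fitting the usual power-law form (loss ~ a * N^-alpha) to a few (parameter count, loss) points. The numbers and constants are made up for illustration, not from any real training run.

    # Minimal sketch: fit a power-law scaling curve loss ~ a * N^(-alpha).
    # The data points below are illustrative, not real benchmark numbers.
    import numpy as np
    from scipy.optimize import curve_fit

    def power_law(n_params, a, alpha):
        # Loss predicted to fall as a power of parameter count N
        return a * n_params ** (-alpha)

    n = np.array([1e8, 1e9, 1e10, 1e11])     # model sizes (parameters)
    loss = np.array([3.9, 3.3, 2.8, 2.4])    # hypothetical eval losses

    (a, alpha), _ = curve_fit(power_law, n, loss, p0=[10.0, 0.05])
    print(f"fit: loss ~ {a:.2f} * N^(-{alpha:.3f})")

    # Extrapolating to 1e15 (a quadrillion) parameters predicts a loss value,
    # but says nothing about which capabilities would emerge at that scale.
    print(f"extrapolated loss at 1e15 params: {power_law(1e15, a, alpha):.2f}")

The curve tells you roughly how much compute buys you on the loss axis; the point above is that it tells you nothing about what new behaviors show up along the way.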
There are plenty of people theoretically capable of doing this; I secretly believe some of the most talented people in this space are randos posting on /r/LocalLlama.
But the truth is that getting experience building models at this scale requires working in a senior role at a major FAANG/LLM provider. Building what Meta needs is not something you can do in your basement.
The reality is the set of people who really understand this stuff and have experience working on it at scale is very, very small. And the people in this space are already paid very well.
It's a staggeringly bad deal. It's a hugely expensive task where, unless you are the literal best in the world, you would never even see any usage. And even those who are BOTH the best and well known have to be willing to lose billions on repeat with no end in sight.
It's very, very rare to see winner-takes-all dynamics to such an extreme degree as with coding LLMs.
I don't think it's literally "winner takes all" - I regularly cycle between Gemini, DeepSeek and Claude for coding tasks. I'm sure any GPT model would be fine too, and I could even fall back to Qwen in a pinch (exactly what I did when I was in China recently with no ability to access foreign servers).
Claude does have a slight edge in quality (which is why it's my default) but infrastructure/cost/speed are all relevant too. Different providers may focus on one at the expense of the others.
One interesting scenario we could end up in is using large hosted models for planning/logic and handing off to local models for execution.
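A rough sketch of what that handoff might look like, assuming both sides expose OpenAI-compatible chat endpoints; the URLs, model names, prompts, and API key below are placeholders, not any specific provider's setup.

    # Sketch of a planner/executor split: a large hosted model writes the plan,
    # a cheap local model carries out each step. Endpoints, model names, and
    # prompts are placeholders; any OpenAI-compatible servers would do.
    import requests

    HOSTED_URL = "https://api.example.com/v1/chat/completions"  # hypothetical hosted endpoint
    LOCAL_URL = "http://localhost:11434/v1/chat/completions"    # e.g. a local Ollama/llama.cpp-style server

    def chat(url, model, prompt, api_key=None):
        headers = {"Authorization": f"Bearer {api_key}"} if api_key else {}
        body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
        resp = requests.post(url, json=body, headers=headers, timeout=120)
        resp.raise_for_status()
        return resp.json()["choices"][0]["message"]["content"]

    def solve(task):
        # Expensive hosted model does the planning/logic...
        plan = chat(HOSTED_URL, "big-hosted-model",
                    f"Break this task into numbered steps:\n{task}", api_key="...")
        # ...then each step is handed off to the local model for execution.
        results = []
        for step in [s for s in plan.splitlines() if s.strip()]:
            results.append(chat(LOCAL_URL, "small-local-model",
                                f"Carry out this step and show the result:\n{step}"))
        return plan, results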
I'd recommend reading some of the papers on what it takes to actually train a proper foundation model, such as the Llama 3 Herd of Models paper. It is a deeply sophisticated process.
Coding startups also try to fine-tune OSS models to their own ends. But this is also very difficult, and usually just done as a cost optimization, not as a way to get better functionality.
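As a rough idea of what such a fine-tune looks like in practice, here's a minimal LoRA-style sketch with transformers + peft. The base model name, dataset file, and hyperparameters are placeholders for illustration, not what any particular startup actually does.

    # Minimal LoRA fine-tuning sketch with transformers + peft.
    # Base model, dataset, and hyperparameters are placeholders.
    from datasets import load_dataset
    from peft import LoraConfig, get_peft_model
    from transformers import (AutoModelForCausalLM, AutoTokenizer, Trainer,
                              TrainingArguments, DataCollatorForLanguageModeling)

    base = "some-open-weights-model"  # any open-weights causal LM
    tokenizer = AutoTokenizer.from_pretrained(base)
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token
    model = AutoModelForCausalLM.from_pretrained(base)

    # Train a small set of low-rank adapters instead of all weights -- which is
    # why this is mostly a cost optimization rather than a capability jump.
    model = get_peft_model(model, LoraConfig(r=16, lora_alpha=32,
                                             target_modules=["q_proj", "v_proj"],
                                             task_type="CAUSAL_LM"))

    ds = load_dataset("json", data_files="company_code_samples.jsonl")["train"]
    ds = ds.map(lambda ex: tokenizer(ex["text"], truncation=True, max_length=1024),
                remove_columns=ds.column_names)

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="lora-out", per_device_train_batch_size=4,
                               num_train_epochs=1, learning_rate=2e-4),
        train_dataset=ds,
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    )
    trainer.train()
    model.save_pretrained("lora-out")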
You need a person who can hit the ground running. Compute for LLMs is extremely capital intensive, and you're always racing against time. Missing performance targets can mean life or death for the company.
The basic concept is out there.
Lots of smart people are studying hard to catch up and could also be poached. No shortage of those, I assume.
Good training data still seems the most important thing to me.
(and lots of hardware)
Or does the specific training still involve lots of smart decisions all the time?
And do those small or big decisions make all the difference?