
fyi Yann LeCun, Chief AI Scientist at Meta, said:

“To people who see the performance of DeepSeek and think: ‘China is surpassing the US in AI.’ You are reading this wrong. The correct reading is: ‘Open source models are surpassing proprietary ones.’ DeepSeek has profited from open research and open source (e.g., PyTorch and Llama from Meta). They came up with new ideas and built them on top of other people’s work. Because their work is published and open source, everyone can profit from it. That is the power of open research and open source.”

[1] https://www.forbes.com/sites/luisromero/2025/01/27/chatgpt-d...



Lol this is almost comical.

As if anyone riding this wave and making billions is not sitting on top of thousands of papers and millions of lines of open source code. And as if releasing llama is one of the main reasons we got here in AI…


I’m almost shocked this spooked the market as much as it did, as if the market was so blind to past technological innovation as to not see this coming.

Innovation ALWAYS follows this path. Something is invented in a research capacity. Someone implements it for the ultra rich. The price comes down and it becomes commoditized. It was inevitable that “good enough” models became ultra cheap to run as they were refined and made efficient. Anybody looking at LLMs could see they were a brute-forced result, wasting untold power, because they “worked” despite how much overkill they were to get to the end result. Them becoming lean was the obvious next step, now that they had gotten pretty good and hit some diminishing returns.


sure, but what nobody expected was how QUICKLY the efficiency gains would come - aviation took about 30 years to go from "the rich" to "everybody", personal computers about 20 (from the 1980s to the 2000s). I think the market expected at least 10 years of "rich premium" - not 2 years before getting taken to the cleaners by the economic archenemy, China


The Google transformer paper was 2017. ChatGPT was the “we can give a version of this away for free.” Llama was “we can afford to give the whole product away for free to even the playing field.” Every tech giant comes out with a comparable product simultaneously. And now a hedge fund, not even a megacap company, can churn out a clone by hiring a small or medium-size engineering team.

Really this should be an indictment of corporate bloat: hundred-thousand-headcount companies distracted by performance reviews, shareholders, marketing, and rebuilding the same product they launched two years ago under a new name.


Transformer paper was 2017


>Really this should be an indictment of corporate bloat: hundred-thousand-headcount companies distracted by performance reviews, shareholders, marketing, and rebuilding the same product they launched two years ago under a new name.

Yeah.

There are some shorter words or acronyms for it though, roughly equivalent to your ~30-word paragraph above:

IBM, DEC, Novell, Oracle, MS, Sun, HP ... MBA, all in their worse days or incarnations or ...


Anyone who's ever read Kurzweil isn't surprised.


The notion I now believe more fully is that the money people - managers, executives, investors and shareholders - like to hear about things in units they understand (i.e., money). They don't understand the science or the maths, and insofar as they might acknowledge it exists, it's an ambient concern: those things happen anyway (as far as they can tell), so they don't know how to value them (or don't value them).

We saw, what, a week ago the leading indicator that the money people were feeling happy they were in charge: that weird not-quite-government US$500 billion AI investment announcement. And we saw the same breathless reporting when Elon Musk founded xAI and had "built the largest AI computer cluster!"... as though that statement actually meant anything?

There was a whole heavily implied analogy going on of "more money (via GPUs) === more powerful AIs!" - ignoring any reality of how those systems worked, their scaling rules, or the fact that inference tended to run on exactly 1 GPU.

Even the internet activist types bought into this, because people complaining about image generators just could not be convinced that the Stable Diffusion models ran locally on extremely limited hardware (I lost count of the arguments where people would imply some gate while I was sitting there with the web GUI open in another window on my 4-year-old PC).


I would generally agree, but the market isn't rational about the future prospects of a company. It's rational about "can I make money off this stock" and nothing else matters in the slightest.

Riding hype, and dumping at the first sign of issues, follows that perfectly well.


> I’m almost shocked this spooked the market as much as it did, as if the market was so blind to past technological innovation as to not see this coming.

Regulatory capture only benefits you nationally. You might even get used to it.


Sure, but it's good to recognize that Meta never stopped publishing even after OpenAI and DeepMind, most notably, stopped sharing the good sauce. From CLIP to DINOv2 and the Llama series, it's a serious track record to be remembered.


But there is a big difference: Llama is still way behind ChatGPT, and one of the key reasons to open source it could have been to use the open source community to catch up with ChatGPT. DeepSeek, by contrast, is already at par with ChatGPT.


Llama is worse than GPT-4 because they are releasing models 1/50th to 1/5th the size.

R1 is a 671B monster no one can run locally.

This is like complaining that an electric bike only goes up to 80 km/h.


R1 distills are still very, very good. I've used Llama 405B, and I would say dsr1-32b is about the same quality, or maybe a bit worse (subjectively, within error), and the 70B distill is better.


What hardware do you need to be able to run them?


The distills run on the same hardware as the Llama models they are based on anyway.

The full version... If you have to ask, you can't afford it.
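
If you want to try one of the distills, here's a minimal sketch using the Hugging Face transformers library (the model ID is the official deepseek-ai repo; the dtype and memory figures are my rough assumptions, not gospel):

    # pip install transformers torch accelerate
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # 32B distill: roughly ~64 GB of VRAM/RAM in bf16; quantized builds fit far smaller cards
    model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-32B"
    tok = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.bfloat16, device_map="auto"
    )

    inputs = tok("Why is the sky blue?", return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=256)
    print(tok.decode(out[0], skip_special_tokens=True))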


Yea, no shit, that's because Meta is behind and no one would care about them if it wasn't open source.


Right, so it sounds like it's working then given how much people are starting to care about them in this sphere.

We can laugh at that (like I like to do with everything from Facebook's React to Zuck's MMA training), or we can see how others (like DeepSeek and, to a lesser extent, Mistral, and to an even lesser extent, Claude) are doing the same thing to help themselves (and each other) catch up. What they're doing now, by opening these models, will be felt for years to come. It's draining OpenAI's moat.


How's that old chestnut go? "First they laugh at us..."?


There's no need to read it uncharitably. I'm the last person you can call a FB fan, I think overall they're a strong net negative to society, but their open source DL work is quite nice.


Just to add on the positive side: their quarterly meta threats report is also quite nice.


This. Even their less-known work is pretty solid[1] (used it the other day and was frankly kinda amazed at how well it performed under the circumstances). Facebook/Meta sucks like most social media does, but, not unlike Elon Musk, they are on the record as having some contributions to society as a whole.

[1] https://github.com/facebook/zstd
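
For anyone who hasn't tried it, the Python bindings make it about as simple as compression gets. A minimal sketch using the third-party zstandard package (level 19 is just an aggressive setting I picked for illustration):

    # pip install zstandard
    import zstandard as zstd

    data = b"some highly repetitive payload " * 1000

    # higher level = smaller output, slower compression
    compressed = zstd.ZstdCompressor(level=19).compress(data)

    # decompress and verify the round-trip
    assert zstd.ZstdDecompressor().decompress(compressed) == data
    print(f"{len(data)} -> {len(compressed)} bytes")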


<< And as if releasing llama is one of the main reasons we got here in AI…

Wait... are you saying it wasn't? Just releasing it in that form was a big deal (and heavily discussed on HN when it happened). Not to mention, a lot of the work that followed built on Llama, partly because it let researchers and curious people dig deeper into the internals.


Yann LeCun also keeps distorting what open source is. Neither Llama nor DeepSeek are open source, and they never were. Releasing weights is not open source - that’s just releasing the final result. DeepSeek does use a more permissive license than Llama does. But they’re not open source because the community does not have the necessary pieces to reproduce their work from scratch.

Open source means we need to be able to reproduce what they’ve built - which means transparency on the training data, training source code, evaluation suites, etc. For example, what AI2 does with their OLMo model:

https://allenai.org/blog/olmo2


DeepSeek R1 is the closest thing we have to fully open source currently. Open enough that Hugging Face is recreating R1 completely out in the open: https://github.com/huggingface/open-r1


What they’re recreating is the evidence that some of the techniques work. But they’re starting with R1 as the input into those steps, not starting from scratch. I don’t think their work includes creating a base model.


The fundamental problem is that AI depends on massive amounts of IP theft. I’m not going to argue if that’s right or wrong, but without it we won’t even have open weights models.


IPv4 or IPv6?


I don’t buy this at all. If DeepSeek can surpass proprietary models by “profiting” from open research and open source, why couldn’t the proprietary models do the same? Companies making proprietary models have the advantage of using w/e is out there from the open source community AND the proprietary research they have been working on for years.


> If DeepSeek can surpass proprietary models by “profiting” from open research and open source, why couldn’t the proprietary models do the same?

They can “profit” (benefit in product development) from it.

They just can't profit (return gains to investors) much from it, because that requires a moat rather than a market free for all that devolves into price competition and drives market clearing price down to cost to produce.


Yes but in proprietary research you've got fewer peers to bounce ideas off of, and you've got extra constraints to deal with re: coming up with something that's useful in tandem with whatever other proprietary bits are in your stack.

All that cloak and dagger stuff comes at a cost, so it's only worth paying if you think you can maintain your lead while continuing to pay it. If the open source community is able to move faster because they are more focused on results than you are, you might as well drop the charade and run with them.

It's not clear that that's what will happen here, but it's at least plausible.


> If DeepSeek can surpass proprietary models by “profiting” from open research and open source, why couldn’t the proprietary models do the same?

DeepSeek did something legitimately innovative with their addition of Group Relative Policy Optimization. Other firms are certainly free to innovate as well.
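
The core trick, as I read the paper: instead of training a separate value network as a critic, you sample a group of completions per prompt and use each sample's reward relative to its group as the advantage. A rough sketch of just that step (my paraphrase, not DeepSeek's actual code):

    import torch

    def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-4) -> torch.Tensor:
        # rewards: (num_prompts, group_size), one scalar reward per sampled completion
        mean = rewards.mean(dim=1, keepdim=True)
        std = rewards.std(dim=1, keepdim=True)
        # group-relative advantage: no learned critic/value network needed
        return (rewards - mean) / (std + eps)

    # e.g. 2 prompts with 4 sampled completions each
    print(grpo_advantages(torch.tensor([[1., 0., 0., 1.],
                                        [0., 0., 0., 1.]])))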


That argument doesn't go anywhere. It's like asking, if the Chinese could do it, why couldn't the Americans?

They just didn't.


But from that quoted statement, it sounds like LeCun of Meta thinks “open sourced work” is why China was able to surpass (or at least compete with) American AI. Which sounds like a lame excuse for Meta.


Putting too much thought into the statement Meta's chief AI scientist made about how the new AI innovation is actually because of Meta is probably not going to be fruitful.


I think we should hold ourselves to a higher standard than this. I don’t see why we couldn’t apply reasoning to this question just like any other.


sunk cost fallacy / tunnel vision of their existing approaches.


If training runs are now on the $6MM/run for SOTA model scale, I think on the contrary: closed labs are screwed, in the same way that Linux clobbered Windows for server-side deployments. Why couldn't Windows just copy whatever Linux did? Well, the codebases and research directions diverged, and additionally MS had to profit off of licensing, so for wide-scale deployments Linux was cheaper and it was faster to ship a fix for your problem by contributing a patch than it was to beg and wait for MS... Causing a virtuous cycle (or, for Microsoft, a vicious cycle) where high-tech companies with the skills to operate Linux deployments collaborated on improving Linux, and as a result saw much lower costs for their large deployments, while also having improved flexibility, which then incentivized more companies to do the same. The open models are becoming much cheaper, and if you want something different you can just run your own finetune on your own hardware.

Worse for the proprietary labs is how much they've trumpeted safety regulations. They can't just release a model without extensive safety testing, or else their entire regulatory push falls apart. DeepSeek can just post a new model to Hugging Face whenever they feel like it — most of their Tiananmen-style filtering isn't at the model level, it's done manually at their API layer. Ditto for anyone running finetunes. In fact, circumventing filtering is one of the most common reasons to run a finetune... A week after R1's release, there are already uncensored versions of the Llama and Qwen distills published on HF. The open source ecosystem publishes faster.

With massively expensive training runs, you could imagine a world where model development remained very centralized and the few big labs easily fended off open-source competition: after all, who would give away the results of their $100MM investment? Pray that Zuck continues? But if the training runs are cheap... Well, there are lots of players who might be interested in cutting the legs out from under the centralized big labs. High-Flyer — the quant firm that owns DeepSeek — is no longer dependent on OpenAI for any future trading projects that use LLMs, for the cost of $6MM... Not to mention being immune from any future U.S. export controls on access to LLMs. That seems very worthwhile!

As LeCun says: DeepSeek benefitted from Llama, and the next version of Llama will likely benefit from DeepSeek (i.e. massively reduced training costs). As a result, there's incentive for both companies to continue to publish their results and techniques, and that's bad news for the proprietary labs who need the LLMs themselves to be profitable and not just the application of LLMs to be profitable... Because the open models will continue eating their margins away, at least for large-scale deployments by competent tech companies (i.e. like Linux on servers).


> Why couldn't Windows just copy whatever Linux did?

They kinda did: https://en.wikipedia.org/wiki/Azure_Linux


Azure Linux is Linux. Microsoft is one of the biggest contributors to Linux in general, in terms of commits per release, and has been for many years now. That doesn't mean Windows is doing what Linux did - Windows is still largely entirely different from Linux at both the kernel and userspace level, and improvements in one have little to no bearing on the other.


I'm still not sure why they keep LeCun at Facebook; his single most-cited contribution to the field in 2024 has been with NYU[0], not Facebook. What is his role at Facebook exactly? Has he explained it? I recall him making all the wrong predictions in 2023; what's changed? Chollet is similarly a mystery to me; it feels like these guys were busy riffing on CNNs when the Transformer came about and have since been heading far out in search of gold.

[0]: https://arxiv.org/abs/2406.16860


Muddling the term 'open source' is one of his latest achievements, for example.


I'm also a bit unclear on why LeCun is so well regarded. I've nothing against him, and his opinions shared on Twitter seem eminently sensible, but at the end of the day it seems his main accomplishment (and/or credit assignment) was inventing CNNs back in the 80's and using them for reading handwriting on checks.

Looking back at the PDP handbook, it's not even clear that LeCun deserves the credit for CNNs, and he himself gives credit for the core "weight sharing" idea to Rumelhart.

Chollet's claim to fame seems to be more as creator of Keras than researcher, which has certainly been of great use to a lot of people. He has recently left Google and is striking out to pursue his own neuro-symbolic vision for AGI. Good luck to him - seems like a nice and very smart guy, and it's good to see people pursuing their own approaches outside of the LLM echo chamber.


What makes "open source" DeepSeek fundamentally different that is a marvel that it surpassed proprietary models?


It's not and it hasn't surpassed GPT. A lot of that is headline hype.

They literally used GPT and Llama to help build DeepSeek; it responds thinking that it's GPT in countless queries (which people have been posting screenshots of). They 'cheated' exactly as Musk did to build xAI's models. So much of this is laughable scaremongering, and it's absolutely not an accomplishment of large consequence.

It's a synth LLM.


Though it is still a fascinating result that shows that the giant frontier models could be made much more efficient, and how to do so.


honestly reads like someone trying to justify his massive salary to his boss who is realizing he can just hire someone for 30x less money.

isn't LeCun basically admitting that he and his team didn't have the creative insight to utilize current research and desperately trying to write off the blindside with exceptionalism?

not a good look tbh


It's like saying that a diesel engine is 6x more efficient than a steam engine, so the guys who spent time working on steam engines just wasted their time and money.

The thing is that the steam engine guys researched thermodynamics and developed the mechanics and tooling which allowed the diesel engine to be invented and built.

Also, for every breakthrough like DeepSeek which is highly publicized, there are dozens of fizzled attempts to explore new ideas which mostly go unnoticed. Are these wasted resources, too?


> Are these wasted resources, too?

Given your take, this is a meaningless question, no?

As you point out, all the resource usage that led up to the creation of the diesel engine was a necessary precondition. While one might imagine a parallel universe where the diesel engine was created another way, without all the in-between steps that might feel like a waste, that is not this universe. In this one, it took what it took.

Same goes for AI. That AI researcher had to eat that sandwich double wrapped in plastic, subsequently placed in another plastic bag in order to get to where he got. Which might feel like a "waste of resources". I am sure you can easily imagine a parallel universe where he didn't eat something that used up so much plastic. But that was the precondition necessary in this universe.

So, ultimately, either everything is a waste of resources or nothing is. And there is no meaning in trying to find a distinction between those two.


Yes.


Would this extrapolate to the thousands of lightbulb prototypes it took to arrive at the first working one? Rinse repeat for your preferred innovation.

Resource allocation in this context isn’t at all binary.


LeCun has nothing to do with Llama ... that was built by Meta's GenAI group.

LeCun is in a different part of the organization - FAIR (Facebook AI Research) - and isn't even the head of that. He doesn't believe that LLMs will lead to AGI, and is pursuing a different line of research.


Meh. It's not as if OpenAI is unable to access open source. The delta is not in open source but in DeepSeek's talent.



