It's a 7B "unified model" LLM/VLM (not a diffusion model!) that out-benchmarks D...

lenerdenator · 2025-01-27T17:34:20 1737999260

> restricts military use

I'm sure the powers-that-be will absolutely pay attention to that clause.

operator-name · 2025-01-27T19:50:08 1738007408

You could say the same for the GPL, yet its wording is enough to curb adoption by corporations.

Large organisations like the military have enough checks and balances that they won't touch these kinds of licences with a 10ft pole.

qwertox · 2025-01-27T18:17:40 1738001860

Yeah, they should! Otherwise the missile might make a 180° turn to "return to sender" once it notices that the target is a Chinese military base.

culi · 2025-01-27T18:28:12 1738002492

The code is open sourced

jprete · 2025-01-27T18:32:32 1738002752

There's no meaningful inspection of LLM code, because the real code is the model weights.

mschoening · 2025-01-27T18:32:28 1738002748

See Sleeper Agents (https://arxiv.org/abs/2401.05566).

cosmojg · 2025-01-27T20:17:29 1738009049

Who in their right mind is going to blindly take the code output by a large language model and toss it on a cruise missile? Sleeper agents are trivially circumvented by even a modicum of human oversight.

carimura · 2025-01-27T18:32:21 1738002741

but what about training data?

culi · 2025-01-28T07:23:37 1738049017

The weights and data pipeline are open sourced and described explicitly in the paper they published. The non-reasoning data isn't nearly as interesting as the reasoning data, though.

Aaronstotle · 2025-01-27T17:36:04 1737999364

How are these licenses enforceable?

reissbaker · 2025-01-27T17:39:43 1737999583

Lawsuits, but it's mainly just CYA for DeepSeek; I doubt they are truly going to attempt to enforce much. I only mentioned it because it's technically not FOSS due to the content restrictions (but it's one of the most-open licenses in the industry; i.e. more open than Llama licenses, which restrict Meta's largest competitors from using Llama at all).

jiggawatts · 2025-01-28T00:51:18 1738025478

I've always wondered why nobody has tried to scale image-generation models to modern LLM sizes, such as 200-500B parameters instead of 1-7B...