
This is basically a diffusion model: start from a random seed, then use a generative process to transform it into something meaningful.
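To make the analogy concrete, here is a minimal Python sketch of that sampling loop; denoise_step is a hypothetical stand-in for a trained denoiser, not any particular model:

    import numpy as np

    def denoise_step(x, t):
        # Hypothetical stand-in for a learned denoiser: a real model
        # would predict and subtract the noise remaining at step t.
        return 0.9 * x

    x = np.random.randn(64)        # random seed: pure Gaussian noise
    for t in reversed(range(50)):  # generative process: iterative refinement
        x = denoise_step(x, t)
    # x is now the "generated" sample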

Mine too, but none was such a dick. Also, anything related to school (particularly at a young age) is not viewed as something to boast about (at least in my experience in Italy, Serbia, and Portugal).


Also, they have all the infra to actually use that TPU advantage (as well as actual researchers, unlike OpenAI).


That will be less of a problem, since OpenAI can spill over to other providers as needed if their own capacity is under high utilization. They already use CoreWeave, AWS, Azure, etc. Google doesn't do that as far as I know, and I don't see why they would, so they are stuck eating the capacity planning themselves.


What type of experiments did you run in less than a week to be so dismissive? (Seriously curious.)


JEPA has been around for quite a while now, so many labs have had time to assess its viability.


JEPA wasn't born last week.


Not OP, but I guess based on your comment:

> But the extent and the way in which Zig specifically puts it to use -- which includes, but is not limited to, how it is used to replace other features that can then be avoided (and all without macros) -- is unprecedented.

That MrWhite wanted an example of Zig's comptime being used not merely as a "macro", but rather as a replacement for other language features (something more complex, I guess).

PS: just interested in Zig, I'd like some pointers to these cool features :)


Wow, thanks for this long explanation.


So are they basically using an idea similar to a Stirling engine in a thermoelectric generator, or do they use a different mechanism to produce energy?


Two materials (often n-type and p-type semiconductors) are joined at two junctions; one junction is heated and the other cooled. The temperature difference makes charge diffuse from the hot side toward the cold side, and this charge is what produces the Seebeck voltage they describe. It was just very hard to get anything meaningful out of this, because you can't easily maintain such a temperature difference. If you've read about the Peltier effect, it's the same thing, just in reverse.
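Back of the envelope, the open-circuit voltage is just the Seebeck coefficient difference times the temperature difference. A rough Python sketch (the ~±200 µV/K figures are typical of bismuth telluride legs; my assumption, not numbers from the article):

    # Open-circuit voltage of one p-n thermocouple: V = (S_p - S_n) * dT
    S_p = 200e-6    # p-type Seebeck coefficient, V/K (typical Bi2Te3)
    S_n = -200e-6   # n-type Seebeck coefficient, V/K
    dT = 50.0       # temperature difference across the junctions, K

    V = (S_p - S_n) * dT
    print(f"per-couple open-circuit voltage: {V * 1e3:.1f} mV")  # ~20 mV

At ~20 mV per couple, you can see why practical modules wire hundreds of couples in series.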


They both use heat. But a Stirling engine converts it mechanically, whereas STEM converts heat to electricity «directly».


No open model/weights?


Not only do they not release models/weights, they don't even tell you the size of the models!

The linked whitepaper is pretty useless, and I say that as a big fan of the diffusion-transformers-for-not-just-images-or-videos approach.

Also, Gemini Diffusion ([1]) is way better at coding than Mercury's offering.

1. https://deepmind.google/models/gemini-diffusion/


I don't have much experience with ROCm for large training runs, but NVIDIA is still a mess with driver + CUDA version + other things. The only simplification comes from Ubuntu and other distros that already do the heavy lifting by installing all the required components without much configuration.


Oh, I'm sure. The thing is that with AMD I have the same luxury, and the wretched thing still doesn't work, or has regressions.


On Ubuntu, in my experience, installing the .deb version of the CUDA toolkit pretty much "just works".


Why don't you ask the model about the shrunken system prompt and the original system prompt? That way you can infer whether the same relevant information is "stored" in the hidden state of the model.

Or better yet, directly check the hidden-state difference between a model fed the original prompt and one fed the shrunken prompt.

This should remove the randomness from the results.
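A rough sketch of that second idea with HuggingFace transformers (the model name, the placeholder prompts, and the choice to compare the final layer's last-token state are all my assumptions, not OP's setup):

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    name = "gpt2"  # placeholder; substitute the model actually under test
    tok = AutoTokenizer.from_pretrained(name)
    model = AutoModelForCausalLM.from_pretrained(name)

    original_prompt = "..."  # the full system prompt
    shrunk_prompt = "..."    # the compressed system prompt

    def last_hidden(prompt):
        # Final layer's hidden state at the last token position
        inputs = tok(prompt, return_tensors="pt")
        with torch.no_grad():
            out = model(**inputs, output_hidden_states=True)
        return out.hidden_states[-1][0, -1]

    sim = torch.nn.functional.cosine_similarity(
        last_hidden(original_prompt), last_hidden(shrunk_prompt), dim=0)
    print(f"cosine similarity of final hidden states: {sim.item():.3f}")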

