
This is basically a diffusion model: start from a random seed, then use a generative process to transform it into something meaningful.
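To make the analogy concrete, here is a minimal Python sketch of that sampling loop; denoise_step is a hypothetical stand-in for a trained denoiser, not any particular model:

    import numpy as np

    def denoise_step(x, t):
        # Hypothetical stand-in for a learned denoiser: a real model
        # would predict and subtract the noise remaining at step t.
        return 0.9 * x

    x = np.random.randn(64)        # random seed: pure Gaussian noise
    for t in reversed(range(50)):  # generative process: iterative refinement
        x = denoise_step(x, t)
    # x is now the "generated" sample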

Mine too, but none was such a dick. Also, anything related to school (particularly at a young age) is not viewed as something to boast about (at least in my experience in Italy, Serbia, and Portugal).


Also, they have all the infra to actually use that TPU advantage (as well as actual researchers, unlike OpenAI).


That will be less of a problem, since OpenAI can spill over to other providers as needed if their own capacity is under high utilization. They already use CoreWeave, AWS, Azure, etc. Google doesn't do that as far as I know, and I don't see why they would, so they are stuck eating the capacity planning themselves.


What type of experiments did you run in less than a week to be so dismissive? (Seriously curious.)


JEPA has been around for quite a while now, so many labs have had time to assess its viability.


JEPA wasn't born last week.


Not OP, but I guess based on your comment:

> But the extent and the way in which Zig specifically puts it to use -- which includes, but is not limited to, how it is used to replace other features that can then be avoided (and all without macros) -- is unprecedented.

That MrWhite wanted an example of Zig's comptime being used not merely as a "macro", but rather as a replacement for other language features (something more complex, I guess).

PS: just interested in Zig, I'd like some pointers to these cool features :)


Wow, thanks for this long explanation.


So are they basically using an idea similar to a Stirling engine in a thermoelectric generator, or do they use a different mechanism to produce energy?


Two materials (often n-type and p-type semiconductors) are joined at two junctions; one junction is heated and the other cooled. The temperature difference makes charge diffuse from the hot side toward the cold side, and this charge is what produces the Seebeck voltage they describe. It was just very hard to get anything meaningful out of this, because you can't easily maintain such a temperature difference. If you've read about the Peltier effect, it's the same thing, just in reverse.
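Back of the envelope, the open-circuit voltage is just the Seebeck coefficient difference times the temperature difference. A rough Python sketch (the ~±200 µV/K figures are typical of bismuth telluride legs; my assumption, not numbers from the article):

    # Open-circuit voltage of one p-n thermocouple: V = (S_p - S_n) * dT
    S_p = 200e-6    # p-type Seebeck coefficient, V/K (typical Bi2Te3)
    S_n = -200e-6   # n-type Seebeck coefficient, V/K
    dT = 50.0       # temperature difference across the junctions, K

    V = (S_p - S_n) * dT
    print(f"per-couple open-circuit voltage: {V * 1e3:.1f} mV")  # ~20 mV

At ~20 mV per couple, you can see why practical modules wire hundreds of couples in series.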


They both use heat. But a Stirling engine converts it mechanically, whereas STEM converts heat to electricity «directly».


No open model/weights?


Not only do they not release models/weights, they don't even tell you the size of the models!

The linked whitepaper is pretty useless, and I say that as a big fan of the diffusion-transformers-for-not-just-images-or-videos approach.

Also, Gemini Diffusion ([1]) is way better at coding than Mercury's offering.

1. https://deepmind.google/models/gemini-diffusion/


I don't have much experience with ROCm for large training runs, but NVIDIA is still a mess with driver + CUDA version + other things. The only simplification comes from Ubuntu and other distros that already do the heavy lifting by installing all the required components without much configuration.


Oh, I'm sure. The thing is that with AMD I have the same luxury, and the wretched thing still doesn't work, or has regressions.


On Ubuntu, in my experience, installing the .deb version of the CUDA toolkit pretty much "just works".


Why don't you ask the model about the shrunken system prompt and the original system prompt? That way you can infer whether the same relevant information is "stored" in the hidden state of the model.

Or better yet, directly check the hidden-state difference between a model fed the original prompt and one fed the shrunken prompt.

This should remove the randomness from the results.
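A rough sketch of that second idea with HuggingFace transformers (the model name, the placeholder prompts, and the choice to compare the final layer's last-token state are all my assumptions, not OP's setup):

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    name = "gpt2"  # placeholder; substitute the model actually under test
    tok = AutoTokenizer.from_pretrained(name)
    model = AutoModelForCausalLM.from_pretrained(name)

    original_prompt = "..."  # the full system prompt
    shrunk_prompt = "..."    # the compressed system prompt

    def last_hidden(prompt):
        # Final layer's hidden state at the last token position
        inputs = tok(prompt, return_tensors="pt")
        with torch.no_grad():
            out = model(**inputs, output_hidden_states=True)
        return out.hidden_states[-1][0, -1]

    sim = torch.nn.functional.cosine_similarity(
        last_hidden(original_prompt), last_hidden(shrunk_prompt), dim=0)
    print(f"cosine similarity of final hidden states: {sim.item():.3f}")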

