At least for R1, folks more technical than me said the optimizations DeepSeek made don't make sense unless they were gimped by limited hardware. The caveat being that the limited hardware wasn't actually all that limited: NVIDIA exported deliberately throttled but still powerful hardware that was considered legal under export controls, and DeepSeek engineers found optimizations that basically unlocked the full compute power. Maybe something similar is going on here.
A quick search shows an 8-GPU A100 80GB server can easily be $120-150k a pop. So you are looking at a few million in hardware costs if you wanted these on-prem. The energy cost for the training is insignificant by my calculations.
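For a rough sanity check on that energy claim, here's a back-of-the-envelope sketch. The cluster size, training duration, overhead factor, and electricity rate are all illustrative assumptions, not figures from any actual deployment; only the 400 W TDP is the A100 80GB's rated spec:

```python
# Back-of-the-envelope: on-prem hardware cost vs. training energy cost.
# All inputs are rough assumptions for illustration, not measured figures.

SERVER_PRICE_USD = 135_000      # midpoint of the $120-150k per 8x A100 server quote
NUM_SERVERS = 20                # assumed cluster size (160 GPUs total)
GPUS_PER_SERVER = 8

GPU_POWER_KW = 0.4              # A100 80GB rated TDP is 400 W
OVERHEAD_FACTOR = 1.5           # assumed extra draw for CPUs, cooling, networking
TRAINING_DAYS = 30              # assumed length of one training run
ELECTRICITY_USD_PER_KWH = 0.12  # assumed commercial electricity rate

hardware_cost = SERVER_PRICE_USD * NUM_SERVERS

total_kw = NUM_SERVERS * GPUS_PER_SERVER * GPU_POWER_KW * OVERHEAD_FACTOR
energy_kwh = total_kw * 24 * TRAINING_DAYS
energy_cost = energy_kwh * ELECTRICITY_USD_PER_KWH

print(f"Hardware:     ${hardware_cost:>12,.0f}")
print(f"Energy (30d): ${energy_cost:>12,.0f}")
print(f"Energy as % of hardware: {100 * energy_cost / hardware_cost:.1f}%")
```

With these numbers the 30-day training run draws about 69 MWh, roughly $8k of electricity against $2.7M of hardware, i.e. well under 1%, which is why the energy cost barely registers.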
So yeah, I imagine this is not a big deal for large, well-funded universities.
Biggest issue with these is ROI (obviously not "real" ROI, since a university isn't selling compute): GPUs have been progressing so fast for AI use cases that the hardware depreciates quickly, so unless you are running them close to 24/7, what's the point of having them on-prem instead of renting?
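To make the utilization point concrete, here's a toy break-even sketch. The purchase price, write-off window, and cloud rental rate are all illustrative assumptions, not quotes from any provider:

```python
# Toy break-even: owning an 8x A100 server vs. renting equivalent cloud GPUs.
# All rates below are illustrative assumptions.

SERVER_PRICE_USD = 135_000    # assumed purchase price per 8-GPU server
USEFUL_LIFE_YEARS = 3         # assumed write-off window, given how fast GPUs are progressing
CLOUD_USD_PER_GPU_HOUR = 2.0  # assumed on-demand rate for one A100 80GB
GPUS = 8

hours_owned = USEFUL_LIFE_YEARS * 365 * 24
owning_usd_per_gpu_hour = SERVER_PRICE_USD / (hours_owned * GPUS)

# Fraction of the time the GPUs must stay busy for owning to beat renting:
break_even_utilization = owning_usd_per_gpu_hour / CLOUD_USD_PER_GPU_HOUR

print(f"Effective cost if owned: ${owning_usd_per_gpu_hour:.2f}/GPU-hour at 100% utilization")
print(f"Break-even utilization:  {break_even_utilization:.0%} vs ${CLOUD_USD_PER_GPU_HOUR:.2f}/GPU-hour cloud")
```

Under these assumptions owning breaks even at roughly a third utilization, but the sketch ignores power, cooling, admin staff, and the risk that next-gen hardware cuts cloud prices mid-window, all of which push the real break-even point higher.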