
> The whole training process took about 7/14 days on a cluster of 16/32 nodes for 1.5B/7B model, each equipped with 8 Nvidia A100 (40GB) GPUs.



For reference, that works out to roughly $110k for a training run to beat Dall-E 3, assuming a (somewhat expensive) $1.30/hr for an A100 40GB.

The former CEO of Stability estimated the Dall-E 2 training run cost at about $1MM: https://x.com/EMostaque/status/1547183120629342214
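A rough back-of-the-envelope check of that figure, using the 7B numbers from the quote (14 days, 32 nodes of 8 A100s each) and the assumed $1.30/GPU-hr rate:

    # Back-of-the-envelope check of the ~$110k estimate above.
    # 7B run: 14 days on 32 nodes x 8 A100 40GB; $1.30/GPU-hr is an assumed rate.
    gpus = 32 * 8                      # 256 GPUs
    gpu_hours = gpus * 14 * 24         # 86,016 GPU-hours
    cost = gpu_hours * 1.30
    print(f"{gpu_hours:,} GPU-hours -> ${cost:,.0f}")   # ~$111,821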


How does generating images with 90% fewer pixels count as beating DALL•E?
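For scale, a rough pixel-count comparison (the 384x384 figure is mentioned downthread; the DALL-E 3 sizes are its standard output resolutions):

    # Pixel-count gap, assuming 384x384 output vs. DALL-E 3's standard sizes.
    small = 384 * 384                          # 147,456 pixels
    for w, h in [(1024, 1024), (1024, 1792)]:
        reduction = 1 - small / (w * h)
        print(f"vs {w}x{h}: {reduction:.0%} fewer pixels")   # ~86% and ~92%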


There are plenty of models around that will reliably upscale an image. That's not the hard part.


Even the latest AI upscalers will leave a 384x384 image looking pretty terrible when put against e.g. SDXL @ 1024x1024 native. It's just too little to work with.


I think they're referring to specific benchmarks


I'm only following this whole story lightly, but is there reason to believe (or doubt) this data, given it comes from them?


At least for R1, folks more technical than me said the optimizations DeepSeek made don't make sense unless they were gimped by limited hardware. The caveat being that the limited hardware wasn't actually all that limited - NVIDIA exported cut-down but still powerful hardware that was considered legal under export controls - and DeepSeek engineers found optimizations to basically unlock its full compute power. Maybe something similar here.


I used more compute to fine-tune SDXL and it looked horrible.


I believe that is university-lab level of compute, right?

It's so nice to see that you don't need tech-oligarch levels of compute for stuff like this.


Quick research shows an 8-GPU A100 80GB server can easily be $120-150k a pop, so you are looking at a few million in hardware costs if you wanted these on-prem. The energy cost for the training is insignificant from my calculations (rough numbers sketched below).

So yeah, I imagine this is not a big deal for large, well-funded universities.

The biggest issue with these is ROI (obviously not real ROI): GPUs have been progressing so fast recently for AI use cases that unless you are running them 24/7, what's the point of having them on-prem?
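A rough sketch of that tradeoff, using the server price above and assumed power numbers (~400 W per A100, $0.12/kWh, GPU power only, ignoring cooling and node overhead):

    # On-prem hardware cost vs. energy cost for the 14-day 7B run.
    # Server price from the comment above; power draw and electricity rate are assumptions.
    nodes = 32
    hw_low, hw_high = nodes * 120_000, nodes * 150_000   # $3.84M - $4.8M
    kwh = 256 * 0.4 * 14 * 24          # 256 GPUs x ~0.4 kW x 336 h ~= 34,406 kWh
    energy = kwh * 0.12                # ~$4,100 at $0.12/kWh
    print(f"hardware ${hw_low:,}-${hw_high:,}, energy ~${energy:,.0f}")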


Yeah, I mean, you don't have to do it; just the fact that you can, can be enough.



