
I'd be very open to paying for a "license" of ChatGPT that I run on my own horde of Nvidia GPUs. Something like $2k per year for a "secured" personal license would be awesome. The largest line item for all Stable Diffusion / AI things is compute cost.


How big is your horde of GPUs? I'm gonna guess that a single instance of ChatGPT is ~2400GB of weights, so you're gonna need around 105 RTX 4090s (24GB of VRAM each), costing a whopping ~$210k to run the thing...
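
Rough sketch of that math (every number here is a guess; the 5% headroom is just to leave some room for activations):

    import math

    # Back-of-envelope, all figures guessed: ~2400GB of weights,
    # 24GB of VRAM per RTX 4090, roughly $2k per card.
    weights_gb = 2400
    vram_per_gpu_gb = 24
    price_per_gpu_usd = 2000

    gpus = math.ceil(weights_gb / vram_per_gpu_gb)     # 100 cards just to hold the weights
    gpus_with_headroom = math.ceil(gpus * 1.05)        # ~105 with a little headroom
    print(gpus_with_headroom, gpus_with_headroom * price_per_gpu_usd)   # 105, 210000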

And swapping data between SSD and video RAM to run it on smaller, cheaper hardware probably isn't gonna be possible, because transformer architectures typically need to re-read the full set of weights once for every token emitted, so even the fastest SSDs would be too slow.
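
Back-of-envelope on the SSD idea (the drive speed is an assumption, the weight size is the guess above):

    # Every generated token needs a full pass over the weights, so the token
    # rate is capped by how fast you can stream them off the drive.
    weights_gb = 2400           # guessed total weight size from above
    ssd_read_gb_per_s = 7       # a fast PCIe 4.0 NVMe drive, roughly
    seconds_per_token = weights_gb / ssd_read_gb_per_s
    print(seconds_per_token)    # ~343 seconds, i.e. nearly 6 minutes per token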

And if you have 105 RTX 4090s, you're probably gonna run afoul of Nvidia's 'no datacenter use' terms. So you'll have to splash out for the GPU models allowed in datacenters, which are 5x the price or so for the same amount of compute.


> ChatGPT is ~2400GB of weights

Pretty sure it is just the GPT-3 DaVinci model, which is 175B parameters, so approximately 700GB at full precision, or about half that at half precision.
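
The arithmetic:

    params = 175e9                   # GPT-3 DaVinci parameter count
    fp32_gb = params * 4 / 1e9       # 700.0 GB at full (32-bit) precision
    fp16_gb = params * 2 / 1e9       # 350.0 GB at half (16-bit) precision
    print(fp32_gb, fp16_gb)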


GPT-3 DaVinci has a context window of 2048 tokens. ChatGPT seems to have a context window of 8192 tokens (from testing how far back it can remember).

To me, that suggests the model is probably at least 4x larger, possibly 16x (a bunch of layers scale with the square of the window size).

Obviously, other bits of the model design may have changed to reduce parameter count.
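
For the "square of the window size" part, this is the scaling I have in mind (how much of it actually shows up as extra parameters depends on architecture details we can't see from the outside):

    # The attention score matrix per head per layer is (context x context),
    # so quadrupling the context length grows it 16x.
    ctx_davinci, ctx_chatgpt = 2048, 8192
    print((ctx_chatgpt / ctx_davinci) ** 2)   # 16.0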


How far back was it able to remember in your tests?


~8000 tokens. You can test it yourself with a sequence of queries like:

> "Please remember the word Dog. I will ask you for it later."

> "Please say the letter A 500 times."

> "Please say the letter A 500 times."

> "Please say the letter A 500 times."

...

> "Please say the letter A 500 times."

> "Please say the letter A 500 times."

> "What was the word I asked you to remember?"

It appears both user input and AI-generated output use the token window equally. It's hard to measure precisely because the tokenizer they use isn't published, so I'm assuming ~1 word = 1 token.
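
Rough count of how fast the probe fills the window under that assumption (the number of filler rounds is whatever you choose to send):

    # ~1 word = 1 token is assumed throughout, since the tokenizer isn't published.
    filler = "Please say the letter A 500 times."
    rounds = 15                      # however many filler prompts you send
    tokens_per_reply = 500           # the model's "A A A ..." replies also count
    tokens_used = rounds * (len(filler.split()) + tokens_per_reply)
    print(tokens_used)               # ~7600 here; the word drops out once this passes ~8000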



