Nvidia's latest AI PC boxes sound great – for data scientists with $3k to spare (theregister.com)
44 points by rntn 18 days ago | 55 comments



Memory bandwidth is quite low for the price. $3,000 for 128GB @ 273GB/s.

You can buy a Mac Studio with an M4 Max for $3,500. 128GB unified memory @ 546 GB/s.

You're also getting a much faster CPU and a more usable daily computer.

I suppose if you're a CUDA developer, this thing is probably better, though I doubt you'd be training anything worthwhile on a computer this weak. Nvidia advertises the DGX Spark as a machine that mimics very large DGX clusters, so the environment is the same. But in terms of hardware specs, it's very disappointing for $3,000.

DGX Station is another beast. It's Blackwell Ultra with 288GB HBM3e and 496GB LPDDR5X. I'm guessing $150k - $200k.


I love MacBooks and Mac machines, but they aren't cut out for specialised work.

I have an M4 Max with 128 GB of RAM. If I play Civ VI (a game released in 2016) on it for a few hours without limiting the FPS, it will heat up until it shuts itself off.

It's not cut out for sustained heavy loads like gaming or crypto mining. It's cut out for bursts of heavy load, like an Xcode compile for a few minutes, then back to editing text.

My gaming machine, which has poorer performance (in FPS terms) and is equipped with an AMD 3700 and a 2080 Super, can play Civ VI indefinitely without breaking a sweat.


Couple of things: This must be new to the M4? I've played Civ VI for hours and hours on an M1 Max without issue.

If someone is looking at this Nvidia box, they're likely fine with a desktop footprint, in which case they'd be looking at the Mac Studio which should not have any thermal issues whatsoever. I'm guessing you're on a laptop?

If they are insistent on a laptop format, you can alleviate overheating issues pretty easily with some thermal pads and running the laptop on a cooling base when you're running heavy operations:

https://www.youtube.com/watch?v=IACHo5y9Los


That's pretty surprising and breaks my mental model of how these chips perform. Do you think that's because they don't have the raw FLOPs, something inefficient in the Apple/Metal rendering pipeline, something about Civ VI and how it was converted (i.e. x86 or DirectX emulation), or something else entirely?

I've had no trouble with a (base) M4 mini for regular dev work, though I compile remotely and haven't played any games on it.


FWIW, I also just started seeing comments today from other devs on my team about their M4 minis frequently shutting down from thermals during Android Studio compile runs.

Shocking to say the least imo.


Did you change the fan settings? Default is for fans to be quiet. If you go to energy management in settings you can choose “more power” or something instead of automatic. Your Mac will be louder but throttle less.


I’m curious if Civilization VII has the same issue. Baldur’s Gate 3 was released around the same timeframe as Civ 6 and has some pathological behavior on higher-core-count machines.


They're not playing Civ VI. They're using the GPU and memory exclusively. When I'm running DeepSeek on my M1 Max it barely heats up.


Only looking at memory bandwidth as a measure of performance gives you an incomplete picture. You also need to look at how much of that bandwidth your processor (CPU, GPU, NPU, etc.) can actually consume, because it can be far less than what the memory modules are capable of.


You can also get an Epyc 9115 for $800, a motherboard for $640, and 12 16-GiB DDR5-6400 DIMMs for $1,400. That gives you 614.4 GB/s for around $2,800. You may also want to add a small GPU to do prompt processing (inference on a CPU is memory-bandwidth bound; prompt processing is compute bound).
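For reference, a minimal sketch of the napkin math behind that figure, in Python (theoretical peak only; it assumes all 12 channels are populated and ignores CCD/fabric limits):

    # Theoretical peak DDR5 bandwidth for a 12-channel Epyc setup.
    # Napkin math only: assumes every channel is populated and ignores
    # CCD / Infinity Fabric limits that cap real-world throughput.
    channels = 12
    transfers_per_sec = 6400e6   # DDR5-6400: 6400 MT/s per channel
    bytes_per_transfer = 8       # 64-bit channel width

    peak_bytes_per_sec = channels * transfers_per_sec * bytes_per_transfer
    print(f"{peak_bytes_per_sec / 1e9:.1f} GB/s")   # 614.4 GB/s (decimal)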


How does CPU-based inference compare to GPU-based inference, performance-wise? And aren’t these machines likely to be used for training?


In which world do you get 614 GB/s of memory bandwidth with an Epyc?

I think the best you can dream of is 480.0 GB/s, so 447 GiB/s.


I was going by the number of memory channels the CPU spec says it supports (12). But apparently I was wrong, as that gets bottlenecked by the number of CCDs on the chip. In that case you would need to go with a much higher-end Epyc processor, and then there are other limits. So much for napkin math.


The Mac Studio is virtually useless if you work with long contexts. You will end up having to wait minutes before the first token comes out.


As someone who doesn’t know that much about AI performance, why is that?


The M-series chips have very good memory bandwidth and capacity, which lets them stream the billions of weights of a large LLM quickly.

Because the bottleneck for producing a single token is typically the time taken to get the weights into the FPU, Macs perform very well at producing additional tokens.

Producing the first token means processing the entire prompt first. With the prompt, you don't need to process one token before moving on to the next, because they are all given to you at once. That means loading the weights into the FPU only once for the entire prompt, rather than once for every token. So the bottleneck isn't the time to get the weights to the FPU; it's the time taken to process the tokens.

Macs have comparatively low compute performance (M4 Max runs at about 1/4 the FP16 speed of the small nvidia box in this article, which itself is roughly 1/4 the speed of a 5090 GPU).
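To make that concrete, here's a rough back-of-envelope model of the two phases, reusing the 546 GB/s and ~30 TF FP16 figures quoted for the M4 Max elsewhere in the thread; the model size, prompt length, and FLOPs-per-token factor are illustrative assumptions, not measurements:

    # Back-of-envelope: decode is bandwidth-bound, prefill is compute-bound.
    # All inputs are rough assumptions for illustration only.
    params = 70e9              # assumed 70B-parameter dense model
    bytes_per_param = 2        # FP16 weights
    mem_bandwidth = 546e9      # ~M4 Max unified memory, bytes/s
    fp16_flops = 30e12         # ~M4 Max FP16 throughput, FLOP/s

    # Decode: every new token streams all the weights through the FPU once.
    decode_s_per_token = params * bytes_per_param / mem_bandwidth

    # Prefill: the whole prompt is processed in one pass over the weights,
    # so the cost scales with FLOPs (~2 * params per token), not bandwidth.
    prompt_tokens = 8000
    prefill_s = prompt_tokens * 2 * params / fp16_flops

    print(f"decode:  ~{1 / decode_s_per_token:.1f} tokens/s")     # ~3.9 tok/s
    print(f"prefill: ~{prefill_s:.0f} s before the first token")  # ~37 s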


Time point 1 is processing all the tokens from the original prompt.

Time point 2 is replying.


Next-token generation is mostly bandwidth-bound; prefill/ingest can process tokens in parallel and becomes more compute-heavy. Next-token generation with speculative decoding (a draft model) also becomes compute-heavy, since it processes several tokens in parallel and only rolls back on a mispredict.
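A toy sketch of that speculative-decoding loop, with hypothetical draft_model / target_model objects standing in for real implementations (greedy acceptance only):

    # Toy speculative decoding step (greedy): a small draft model proposes
    # k tokens cheaply, the big target model checks them all in one
    # parallel (compute-heavy) pass, and we roll back at the first mismatch.
    def speculative_step(target_model, draft_model, context, k=4):
        proposed = []
        for _ in range(k):
            proposed.append(draft_model.next_token(context + proposed))

        # One parallel pass of the target model over all k positions.
        verified = target_model.tokens_after(context, proposed)

        accepted = []
        for draft_tok, target_tok in zip(proposed, verified):
            accepted.append(target_tok)
            if draft_tok != target_tok:   # mispredict: discard the rest
                break
        return accepted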


Why do all articles say "with 784 GB of unified memory" for the Station when on the official spec page it's not listed as unified but as "GPU Memory: Up to 288GB HBM3e | 8 TB/s" and "CPU Memory: Up to 496GB LPDDR5X | Up to 396 GB/s"?


Some discussion of this box here: https://news.ycombinator.com/item?id=43425935

TL;DR: AMD machines based on the HX395 offer comparable memory bandwidth and size at lower cost ($2,000), but lack the high-speed networking and compute power (60 TF FP16 on AMD vs. 120 TF on NVIDIA).

The Apple M4 Max costs 23% more with equivalent memory and has twice the memory bandwidth, but again no fast networking and significantly less compute (30 TF FP16).

For inference on large language models, the extra memory bandwidth probably means the M4 Max is the fastest option, unless you use large batches or long contexts. For training large models needing more than 128 GB of RAM, two of these Nvidia boxes are probably fastest.
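As a rough proxy (figures as quoted above; assumes a hypothetical 70B FP16 model, i.e. 140 GB of weights, with decode speed scaling with bandwidth and prefill with compute):

    # Rough proxy from the numbers in this sub-thread; all approximate.
    weights_gb = 140   # assumption: 70B params at FP16
    boxes = {
        "DGX Spark":       {"bw_gbs": 273, "fp16_tf": 120, "price": 3000},
        "AMD HX395 box":   {"bw_gbs": 273, "fp16_tf": 60,  "price": 2000},  # "comparable" bandwidth
        "M4 Max (128 GB)": {"bw_gbs": 546, "fp16_tf": 30,  "price": 3700},
    }
    for name, spec in boxes.items():
        decode_tps = spec["bw_gbs"] / weights_gb   # decode ~ bandwidth-bound
        print(f'{name:16} ~{decode_tps:.1f} decode tok/s, '
              f'{spec["fp16_tf"]} TF FP16 for prefill, ${spec["price"]:,}')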


Another thing lacking on AMD machines is software support. Not only is a lot of stuff still CUDA-only, but AMD themselves struggle to support their own hardware in ROCm. According to their docs, ROCm for the HX395 is supported on Windows[0] but not Linux[1], though further reading indicates the Windows version is just a subset of Linux ROCm that doesn't seem to support PyTorch or much else. I really don't know what they're doing over there, but I hate it.

[0] https://rocm.docs.amd.com/projects/install-on-windows/en/lat...

[1] https://rocm.docs.amd.com/projects/install-on-linux/en/lates...


It's a huge mess.

I think people are running LLMs on the HX395 using the Vulkan backend in llama.cpp, which works on Linux. No ROCm or PyTorch, though.


Is it possible to install the DGX stack on distros other than Ubuntu? Nvidia says it's possible, but has anyone actually done it?

>You also have the option to install the NVIDIA DGX Software Stack on a regular Ubuntu 22.04 while still benefiting from the advanced DGX features. This installation method supports more flexibility, such as custom partition schemes.

https://docs.nvidia.com/dgx/dgx-os-6-user-guide/introduction...


This reminds me of the (now discontinued) Lambda Tensorbook that I bought a few years back. AI computers are interesting because they are extremely similar to really good gaming computers, but the marketing (and some of the hardware) is different.

I don't fully understand why they are separate from gaming computers other than marketing, but I am very happy with my Tensorbook. I got it when I was very young and impulsive, and I don't know if I would get it again, although that alone speaks to the computer's longevity.


Gaming computers usually run Windows and games. AI computers often run Linux and AI software. The differences require different configurations of both hardware and software.

So, they make products preloaded with configurations for each market segment. Customers pay to be saved the headaches of a custom setup.


Lambda collaborated with Razer on their last Tensorbook in 2022. It was literally a then-current Razer Blade running Ubuntu.


Good for training, definitely a bad idea for inference. But if you are spending that much money, why not just buy the equivalent in GPUs? You could buy ten 12 GB 3060s for that price.


Powering ten 3060s and having a computer that can accept ten GPUs becomes a non-negligible hurdle.
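Ballpark for that, assuming ~170 W of board power per card (an assumption; actual draw varies by model and load):

    # Rough feasibility math for a ten-card 3060 build.
    cards = 10
    vram_gb_each = 12
    board_power_w = 170   # assumed per-card board power

    print(cards * vram_gb_each, "GB of VRAM in total")     # 120 GB
    print(cards * board_power_w, "W for the GPUs alone")   # 1700 W, before CPU/overhead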


For LLM developers, is there really no advantage to having a big block of unified memory, rather than a bunch of devices with a small amount of memory each?


MoE inference wouldn't be terrible. That being said, there's not a good MoE model in the 70-160B range as far as I'm aware.


As with buying consumer GPUs from Nvidia now, these prices mean nothing, because it will be impossible to buy one of these at MSRP.

The actual price will be 2-3x this from some scalper who has a giant pile of them while people who want to do actual work with them pay through the nose.

That's assuming the connectors don't catch on fire, no cores are missing the way ROPs have been missing with 5090s, etc.


They've offered reservations that, when selected, give the buyer a 4-day option to purchase. The reservations are accessible enough that I was able to grab one without much effort.

So it would seem like they're managing the allocations pretty carefully, or at least trying harder than they have before.


Too bad it won't be available at scale. We had a customer who asked us if he could get 200 of those boxes for his AI scientists. But they're apparently only sold in single quantities on the Nvidia store or something, not through the official distribution channels.


$3k is already close to the price of just a 5090, so a machine that is reasonably fast but has way more (equivalent of) VRAM (128 GB) sounds great for a hobbyist, assuming you can easily run and train Llama, Stable Diffusion, and similar VRAM-hungry projects.


Nvidia completely skipped the middle tier. The DGX Spark is very underpowered for $3,000, and the DGX Station is likely $150k - $200k. There is nothing in between.

A middle tier could be something like an RTX 5090-like chip connected to 128GB - 256GB of GDDR7 RAM for $15k - $30k.


Well, there is the RTX PRO 6000. Maybe you could double those up, no NVLink though.


Nvidia should join with Valve and make it a high-end gaming console.

It would be a nice dual-use system.


I would support this if it meant that Nvidia would improve their Linux drivers. As it stands, though, a device running SteamOS on Nvidia hardware sounds painful.


It'll take a long time for them to rebuild their reputation on that front. I threw away a pile of perfectly serviceable NVIDIA desktop boards a while back because they were "supported", but they couldn't do stuff like run modern compositors or even XScreenSaver.

These were high-end boards that worked fine in Linux when I bought them (as long as you were willing to periodically patch + recompile the kernel from a text console).

Anyway, once they have feature parity with Windows + current CUDA with a 100% open source stack going back a decade, then I'll consider them again. Until then, I'll happily buy an AMD that's 90% as fast for the same price.


There are already a ton of high-end gaming consoles you can buy today with nvidia chips. In fact, this is exactly what I did for my own AI machine (dual 3090s).


A console would be a device like an XBox, PS5 or Switch. Of those, only the Switch (current + next gen) are NVIDIA devices. The Steam Deck and the other current generation consoles are all AMD.


It would be nice if Valve's Steam Big Picture mode actually worked with Nvidia. That's still not the case.


Their last console probably inspired them to not try again for a while.


The steam deck? I would say it was fairly successful.


I got my products mixed up in my head. I retract that. Thanks.


Am I the only one who got curious, only to become more interested in the larger DGX Station workstation model they're offering, which had an unannounced price tag at the time?


No, the station is simply unobtainable for mortals.


$3K, for what? The "NVIDIA Inside" sticker?

Any org with less than several hundred thousand dollars to spare isn't anyone that NVIDIA cares about at all...


Whatever happened to the promised tech of SSDs as fast as RAM? Seems like a good time for it to go into production.


3D Xpoint aka Optane? It never really caught on and has since been discontinued…


Still one of my fastest drives, and extremely durable. It's too bad.


Has anyone benchmarked the Spark against a comparably priced Mac?


The comparably priced Mac comes with half the RAM, so if that's a constraint, it's already lost. For applications that aren't memory-constrained, it would be very interesting to see some benchmarks, although I suspect it's going to come down to the quality of the relevant CUDA vs Metal libraries as much as anything else.


How are you arriving at double the price? 16-CPU-core Mac Studio with storage upgrade to 1TB and RAM upgrade to 128GB comes out to $3,700. Add $700 if you want to upgrade to the 28-core CPU (there’s nothing directly comparable to the DGX Spark’s 20-core CPU).

None of this speaks to the actual GPU performance. The only spec the DGX Spark webpage mentions is “1,000 FP4 TOPS”.


I'd love to see a head-to-head of the DGX, the Mac Studio, and the Framework models to see where the strengths and weaknesses are for each.



