AFAIK this doesn't really help for interactive use, because autoregressive LLMs generate one token at a time: with the layers split across cards, each new token's forward pass has to traverse every GPU in sequence, so only one card is doing useful work at any moment and every hop adds PCIe transfer latency. Better than nothing, but it only really pays off if you can batch requests so that all the GPUs stay busy at once instead of taking turns.
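To make that concrete, here's a rough PyTorch sketch (toy layer split, made-up sizes, assumes two CUDA devices) of what a pipeline-split decode step looks like for a single request: stage 1 sits idle while stage 0 runs, then the roles swap, with an activation transfer over PCIe in between.

```python
import torch
import torch.nn as nn

# Toy "LLM": 8 transformer-style blocks split evenly across 2 GPUs.
blocks = [nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True)
          for _ in range(8)]
stage0 = nn.Sequential(*blocks[:4]).to("cuda:0")
stage1 = nn.Sequential(*blocks[4:]).to("cuda:1")

@torch.no_grad()
def decode_step(hidden):                    # hidden: [batch, seq, 512]
    h = stage0(hidden.to("cuda:0"))         # GPU 0 works, GPU 1 idle
    h = h.to("cuda:1")                      # activations cross PCIe
    return stage1(h)                        # GPU 1 works, GPU 0 idle

# Single interactive request: one token at a time, so the stages never
# overlap and every generated token pays the full serial traversal.
hidden = torch.randn(1, 1, 512)             # stand-in for a token embedding
for _ in range(4):                          # stand-in for a 4-token decode loop
    out = decode_step(hidden)
```

With a batch of requests in flight, stage 0 can be working on one request's token while stage 1 handles another's, which is what keeps all the cards busy.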