Hacker News

Some things I'd like to see solved in this space:

    - Versioning. Don't change the LLM model behind my application's back. Always provide access to older versions.
    - Freedom. Allow me to take my business elsewhere, and run the same model at a different cloud provider.
    - Determinism. When called with the same random seed, always provide the same output.
    - Citation/attribution. Provide a list of sources on which the model was trained. I want to know what to expect, and I don't want to be part of an illegal operation.
    - Benchmarking. Show me what the model can and cannot do, and allow me to compare with other services.


All of these things are there right out of the box with the HuggingFace toolset.

(Determinism does depend more on the exact software running the model. In general it works now, but there are occasional exceptions, like PyTorch on M1 not being deterministic the first time you initialize it, or something similarly weird.)
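For what it's worth, a minimal sketch of the usual PyTorch reproducibility knobs (assuming torch is installed; the cuDNN flag only matters on GPU, and some CUDA ops additionally need `CUBLAS_WORKSPACE_CONFIG` set):

```python
import torch

def make_deterministic(seed: int = 0) -> None:
    torch.manual_seed(seed)                    # seed the CPU (and CUDA) RNGs
    torch.use_deterministic_algorithms(True)   # raise on known-nondeterministic ops
    if torch.cuda.is_available():
        torch.backends.cudnn.benchmark = False # autotuned kernels can vary between runs

make_deterministic(42)
a = torch.randn(3)
make_deterministic(42)
b = torch.randn(3)
assert torch.equal(a, b)  # same seed, same settings -> bit-identical draws
```

This covers the software side; it can't help with the hardware-level ordering issues discussed below.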


Determinism is largely impossible due to the arbitrary ordering of GPU threads and the non-associativity of floating-point operations.
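The non-associativity point is easy to demonstrate in plain Python: reorder a sum and the result can change, which is exactly what happens when GPU threads combine partial sums in whatever order they finish.

```python
# Floating-point addition is not associative: grouping changes the result.
a, b, c = 1e16, -1e16, 1.0
left = (a + b) + c    # the huge terms cancel first, so c survives -> 1.0
right = a + (b + c)   # c is absorbed by b (below its precision) -> 0.0
assert left != right
print(left, right)    # prints: 1.0 0.0
```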


Is this true for LLMs but not for Stable Diffusion? Stable Diffusion is largely deterministic; the main issues arise when switching between software or hardware: torch versions, GPU architectures, CUDA/cuDNN, etc.

Or perhaps I'm wrong about Stable Diffusion too?


I thought so too, but I run a Stable Diffusion service, and we see small differences between generations with the same seed and the same hardware class on different machines with the same CUDA drivers running in parallel. It's really close, but there will be subtle differences (which a downstream upscaler sometimes magnifies), and I haven't had the time to debug/understand this.
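A toy illustration of the magnification effect (not the service's actual pipeline): two "renders" differing by sub-visible per-pixel noise, pushed through a naive unsharp-style kernel of the kind an upscaler might apply. Because the filter amplifies high-frequency content, it amplifies the noise too.

```python
import numpy as np

rng = np.random.default_rng(0)
img_a = rng.random((64, 64))
img_b = img_a + rng.normal(scale=1e-6, size=img_a.shape)  # ~1e-6 per-pixel drift

def sharpen(img):
    # 3x3 unsharp-style kernel: 5*center minus the 4-neighbourhood
    out = 5 * img
    out[1:, :] -= img[:-1, :]
    out[:-1, :] -= img[1:, :]
    out[:, 1:] -= img[:, :-1]
    out[:, :-1] -= img[:, 1:]
    return out

before = np.abs(img_a - img_b).max()
after = np.abs(sharpen(img_a) - sharpen(img_b)).max()
assert after > before  # the sharpening step magnifies the tiny differences
```

Since the filter is linear, the output difference is just the filter applied to the input noise, so each pass of such a kernel grows the discrepancy further.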


Ah okay, that makes sense. In my experience I've only noticed differences when the entire composition changes, so I'm guessing it's near pixel level or something?

I assume they're most noticeable with the ancestral samplers like Euler a and DPM2 a (and variants)?


IIUC, versioning isn't really possible with a neural net; the training process influences the generation pathway.

caveat: I have an incredibly superficial understanding of any of this.


It is definitely possible. At any point, you can just take a snapshot of the weights. Together with a description of the architecture, this is a complete description of a model.
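A minimal sketch with a toy PyTorch model (assuming torch is installed): the state_dict plus the code that builds the architecture fully determine the model, so a snapshot taken at any point is a versionable artifact.

```python
import io
import torch
import torch.nn as nn

def build_model() -> nn.Module:       # the "description of the architecture"
    return nn.Linear(4, 2)

model = build_model()
buf = io.BytesIO()
torch.save(model.state_dict(), buf)   # snapshot of the weights ("v1")

restored = build_model()              # same architecture, fresh random weights
buf.seek(0)
restored.load_state_dict(torch.load(buf))

x = torch.randn(1, 4)
assert torch.equal(model(x), restored(x))  # snapshot reproduces the model exactly
```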


This is completely wrong.

A model is a binary artifact that can be versioned like any other binary asset.
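In practice that can be as simple as content-addressed versioning: treat the serialized weights as an opaque blob and key each release by its digest, like any other binary asset. A hedged sketch with placeholder blobs standing in for real weight files:

```python
import hashlib

def version_of(weights_blob: bytes) -> str:
    # A short content hash works as a version identifier for the artifact.
    return hashlib.sha256(weights_blob).hexdigest()[:12]

v1 = version_of(b"fake weights, release 1")
v2 = version_of(b"fake weights, release 2")
assert v1 != v2                                      # retraining -> new version
assert v1 == version_of(b"fake weights, release 1")  # same bytes -> same version
```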


LLMs are pretrained and then released for use.

So you can have v1, v2, etc.



