- Versioning. Don't change the LLM model behind my application's back. Always provide access to older versions.
- Freedom. Allow me to take my business elsewhere, and run the same model at a different cloud provider.
- Determinism. When called with the same random seed, always provide the same output.
- Citation/attribution. Provide a list of sources on which the model was trained. I want to know what to expect, and I don't want to be part of an illegal operation.
- Benchmarking. Show me what the model can and cannot do, and allow me to compare with other services.
All of these come right out of the box with the HuggingFace toolset.
(Determinism does depend more on the exact software stack running the model. In general it works, but there are occasional exceptions, like PyTorch on M1 not being deterministic the first time you initialize it, or similar oddities.)
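For the versioning and determinism points, a minimal sketch with the transformers library; the model id, revision, and seed here are placeholders, not anything from this thread:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, set_seed

# Versioning: pin an exact revision (a git tag or commit hash on the Hub)
# so the model can't change behind your back. "gpt2" and "main" are placeholders;
# use a specific commit hash to fully pin.
model = AutoModelForCausalLM.from_pretrained("gpt2", revision="main")
tokenizer = AutoTokenizer.from_pretrained("gpt2", revision="main")

# Determinism: fix the Python, NumPy, and torch RNGs in one call.
set_seed(42)

inputs = tokenizer("The quick brown fox", return_tensors="pt")
outputs = model.generate(**inputs, do_sample=True, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```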
Is this true for LLMs but not for Stable Diffusion, at least? Stable Diffusion is largely deterministic, with issues arising mainly when switching between torch versions, GPU architectures, CUDA/cuDNN versions, etc.
I thought so too, but I run a Stable Diffusion service, and we see small differences between generations with the same seed and the same hardware class on different machines with the same CUDA drivers running in parallel. The outputs are really close, but there are subtle differences (which a downstream upscaler sometimes magnifies), and I haven't had the time to debug/understand this.
Ah okay, that makes sense. In my experience I've only noticed differences when the entire composition changes, so I'm guessing it's near pixel level or something?
I assume they're most noticeable with the ancestral samplers, like Euler a and DPM2 a (and variants)?
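For what it's worth, here's a rough sketch of how seeded generation and deterministic kernels are typically set up with diffusers; the checkpoint id, prompt, and seed are placeholders, and as discussed above this still doesn't guarantee bit-identical outputs across different GPUs or CUDA/cuDNN builds:

```python
import os

# CuBLAS needs this env var for deterministic matmuls on recent CUDA;
# it must be set before any CUDA work happens.
os.environ.setdefault("CUBLAS_WORKSPACE_CONFIG", ":4096:8")

import torch
from diffusers import StableDiffusionPipeline

# Ask PyTorch for deterministic kernels where available (some ops get
# slower; with warn_only=True, ops without a deterministic version warn
# instead of erroring).
torch.backends.cudnn.benchmark = False
torch.use_deterministic_algorithms(True, warn_only=True)

# Placeholder checkpoint id; any SD checkpoint on the Hub behaves the same way.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# "Same seed" for Stable Diffusion means pinning the generator that draws
# the initial latent noise (and the extra noise used by ancestral samplers).
gen = torch.Generator(device="cuda").manual_seed(1234)
image = pipe("a photo of an astronaut riding a horse", generator=gen).images[0]
image.save("out.png")
```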
It is definitely possible. At any point, you can just take a snapshot of the weights. Together with a description of the architecture, this is a complete description of a model.
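A minimal sketch of what such a snapshot looks like with transformers (the model id and output path are just examples): save_pretrained writes the weight files plus a config.json describing the architecture, which together are a complete, portable description of the model.

```python
from transformers import AutoModelForCausalLM

# Placeholder model id and path.
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Writes the weights and a config.json (the architecture description)
# to a local directory.
model.save_pretrained("./gpt2-snapshot")

# Later, on any machine or provider, the snapshot alone is enough:
restored = AutoModelForCausalLM.from_pretrained("./gpt2-snapshot")
```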