Ollama (also wrapping llama.cpp) has GPU support; unless you're really in love with the idea of bundling weights into the inference executable, it's probably a better choice for most people.



Ollama is great if you're really in love with the idea of having your multi-gigabyte models (likely the majority of your disk space) stored under obfuscated UUID filenames. Ollama also still hasn't addressed the license violations I reported to them back in March. https://github.com/ollama/ollama/issues/3185


I wasn't aware of the license issue, wow. Not a good look, especially considering how simple that is to resolve.

The model storage doesn't bother me but I also use Docker so I'm used to having a lot of tool-managed data to deal with. YMMV.

Edit: Removed question about GPU support.


I think this is a problem in a lot of tools, and one that is never talked about.

Even I haven’t thought about this very deeply, even though I am also very concerned about honoring other people’s work and making sure licenses are followed.

For example, I have some command-line tools that I’ve written in Rust that depend on various libraries. But because I mostly distribute my software in source form, I haven’t really paid attention to how a command-line tool that is distributed as a compiled binary would make sure to include attribution and copies of the licenses of its dependencies.
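As a rough illustration (shown in C for concreteness; Rust can do the same thing with include_str!), one common pattern is to bake the dependency license texts into the binary at build time and expose them behind a flag. Everything here, the flag name and file layout included, is just a sketch:

    #include <stdio.h>
    #include <string.h>

    /* In a real build this string would be generated by concatenating the
     * LICENSE files of every vendored dependency; it is typed out inline
     * here only to keep the sketch self-contained. */
    static const char third_party_licenses[] =
        "=== zlib ===\n"
        "Copyright (C) Jean-loup Gailly and Mark Adler\n"
        "(full license text here)\n"
        "\n"
        "=== some-other-dependency (MIT) ===\n"
        "(full license text here)\n";

    int main(int argc, char **argv) {
        /* A --licenses flag lets users and packagers read the attributions
         * without needing the source tree. */
        if (argc > 1 && strcmp(argv[1], "--licenses") == 0) {
            fputs(third_party_licenses, stdout);
            return 0;
        }

        /* ... the tool's normal behavior would go here ... */
        return 0;
    }

For Rust projects specifically, I believe there is tooling such as cargo-about that can generate that concatenated license text from Cargo metadata.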

And so the main place where I’ve given more thought to those concerns is in full-blown GUI apps, for example. They usually have an About menu that includes info about their dependencies. The other place where I’ve thought about it is commercial electronics that make use of open source software in their firmware. Those physical products usually include some printed documents alongside the product where attributions and license texts are sometimes found, and sometimes, if the product has a display or a display output, there’s a menu you can find somewhere with that sort of info.

I know that Debian, for example, is very good at being thorough with details about licenses, but I’ve never looked at what they do with command-line tools that compile third-party code into them. Do the Debian package maintainers dig up copies of the licenses from the source and dependencies and put them somewhere in /usr/share/ as plain text files? Or do the .deb files themselves contain copies of the license texts that you can view but which are not installed onto the system? Or do they work with software authors to add a flag that shows the licenses? Or something else?


It's really something that should be abstracted by the linker. Codebases like zlib, for example, will just put a `const char notice[] = "Copyright Adler et al";` in one of their files, so the license issue with zlib is solved simply by using zlib. However, modern linkers have gotten so good that -fdata-sections -Wl,--gc-sections will strip that away, and LTO probably will too.

In Cosmopolitan Libc I used to use the GNU assembler `.ident` directive in an asm() statement at the tops of .c files to automate license compliance. But some changes to ld.bfd ended up breaking that. Now I have to use custom defines like https://github.com/jart/cosmopolitan/blob/706cb6631021bbe7b1... and https://github.com/jart/cosmopolitan/blob/706cb6631021bbe7b1... and https://github.com/jart/cosmopolitan/blob/706cb6631021bbe7b1... to get the job done.

It really should be a language feature, so that library authors can make it as simple as possible for users to comply with their license. I just don't think I've ever seen anyone think about it this way, except for maybe Google's JavaScript minifiers, which is where I got the idea.
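For anyone who hasn't seen the trick, here's roughly what those approaches look like in plain C. The symbol names are made up, and whether the `used`/`retain` attributes actually survive --gc-sections depends on your compiler and linker versions, so treat this as a sketch rather than a guarantee:

    #include <stdio.h>

    /* zlib-style attribution: an exported string constant that ends up in
     * the binary, so `strings ./a.out | grep Copyright` shows it. */
    const char notice[] = "Copyright Adler et al";

    /* With -fdata-sections -Wl,--gc-sections (or LTO) the linker may drop
     * that string as unreferenced. `used` stops the compiler from
     * discarding it; `retain` (newer GCC/Clang, ELF targets) additionally
     * marks the section so the linker keeps it during garbage collection. */
    __attribute__((used, retain))
    static const char mylib_license[] = "mylib (MIT) Copyright Example Author";

    /* The .ident route mentioned above: emits the string into the .comment
     * section, which linkers traditionally merge and keep. */
    asm(".ident \"mylib (MIT) Copyright Example Author\"");

    int main(void) {
        /* The program never references the notices; they just ride along
         * in the binary as data. */
        puts("hello");
        return 0;
    }

None of these is a portable guarantee, which is really the point above: there's no linker-blessed way to say "keep this attribution."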


Llamafile is great if you don't want to run any meaningful models because it's limited to 4GB.


That's a Windows limitation, though.


Even on Windows, you can just run the binary separately from the model file. I actually run a single binary separate from the model files, because I use it with several different models, so I'd kind of forgotten that bundling them together is even the default way it expects you to use it.


When I said

> such great performance that I've mostly given up on GPU for LLMs

I meant that I used to run ollama on GPU, but llamafile gave approximately the same performance on CPU alone, so I switched. That might just be because my GPU is weak by current standards, but that is in fact the comparison I was making.

Edit: Though to be clear, ollama would easily be my second pick; it also has minimal dependencies and is super easy to run locally.



