Hacker News
Mlx-community/OLMo-2-0325-32B-Instruct-4bit (simonwillison.net)
64 points by mdp2021 41 days ago | hide | past | favorite | 23 comments



OLMo uses open datasets such as CommonCrawl and StackOverflow for training, about 5 TB worth of text. I wonder how well it would perform if it were also trained on Anna's Archive/LibGen (>600 TB).


A possibly better question is how well it would perform if it were trained on carefully selected material - see the efforts of Mortimer Adler in the USA, or the work any good publishing house puts into defining its editorial collections.

But I remain skeptical that, without "critical thinking as a condition to write into 'conscious' memory", the barrier of "conformism" will ever be broken.


Not a lawyer, but I would assume downloading material from LibGen is, in the vast majority of cases, illegal because it's a breach of copyright or similar. That's gotten Meta into quite a spectacle of late [1]

[1] https://www.loeb.com/en/insights/publications/2023/12/richar...


CommonCrawl is composed of copyrighted content too. You gain copyright on your work automatically the moment you create it, including this very comment.


What if I repost your comment without your permission?


One could argue that using copyrighted content in LLMs, much like reposting, should fall under fair use. This is also Microsoft's claim in the GitHub Copilot lawsuits. It's up to the court to decide though. (IANAL)


In many jurisdictions it's just sharing that is illegal, not obtaining.


Yes. The interesting legal question is whether you are still sharing the original work if you've transformed it by teaching it to an AI.

https://www.reuters.com/legal/litigation/ai-companies-lose-b... reports on the ongoing case on the image generation side of the fence.


That is called copyright laundering, FYI.


It’s a catchy term, but loaded. Copyright protects only original expression, not ideas and information. So if a computer algorithm reads the former and outputs the latter, arguably copyright isn’t involved at all.

There are plenty of good counterarguments to this as well, when you consider the effects of automation and scale. I’m definitely interested in seeing how the jurisprudence develops as these cases go through the courts.


I have struggled with SVG generation with just about all models; the SVG demo for this model is more or less what I get from much larger models.

Am I doing something wrong? Everyone seems to say how well models work at producing SVGs, but I get shapes in all sorts of wrong places. SVG documents are quite low level (versus editing them in Inkscape or Illustrator), so it's tricky to modify them beyond very simple shapes.
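To illustrate what "low level" means here (a hypothetical sketch, not anyone's actual code): every SVG shape is raw coordinates, so a model has to reason about geometry, not just syntax, to put parts in the right place.

```python
# Hypothetical sketch: building an SVG by hand means computing every
# coordinate yourself. Making the frame line actually start at the
# wheel's center requires geometric reasoning, not just valid markup.
circle = '<circle cx="40" cy="80" r="20" fill="none" stroke="black"/>'
frame = '<line x1="40" y1="80" x2="90" y2="50" stroke="black"/>'
svg = (
    '<svg xmlns="http://www.w3.org/2000/svg" width="120" height="120">'
    f'{circle}{frame}'
    '</svg>'
)
print(svg)
```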


The models are mostly terrible at SVG output, at least if you ask for something that's hard (or impossible?) to draw like a pelican riding a bicycle. That's why I use it as a benchmark, I think it's amusing: https://simonwillison.net/tags/pelican-riding-a-bicycle/

Some of them can do good SVGs for things that make sense, like simple diagrams.


These work well for some SVGs that are simple and already in the training data, but not for harder ones - even simple ones, if they are out of the training data's distribution.

In Simon's example, the whole purpose is to make the model draw something it has not seen before but can easily infer from geometry and spatial arrangement. I think it makes a fun problem.


Sorry, I realize I should also provide links to resources published by the model maker, the Allen Institute for Artificial Intelligence ("Ai2"):

https://allenai.org/blog/olmo2-32B

https://allenai.org/blog

https://github.com/allenai/olmo

https://allenai.org/papers


I think it’s a big deal to see a fully open LLM now achieving this level of quality. While the partially open releases we’ve seen from the big labs are quite valuable, models like OLMo-2 are the only way that researchers can truly study this technology to answer questions about how the models’ capabilities are shaped by their training data and training process.

The closed and partly-closed models rely on a lot of secret sauce, so it’s also just really impressive to see their results being replicated in the open.


Hear, hear.

Among the paramount tasks is to understand the internals of the "black box", gain knowledge, and engineer better. Of course, having "fully open" projects should help with that.


You should link to the original model too: https://huggingface.co/allenai/OLMo-2-0325-32B-Instruct

Kudos to Allen AI for their great work on a fully-open LLM!


100%! Allen AI OLMo. Thank you.

I was wondering whether there was some specific reason for MLX behind this model, but (thankfully, in the spirit of openness) it has nothing to do with the original model.

(*) https://allenai.org/olmo


Here's the huggingface link from that article: https://huggingface.co/mlx-community/OLMo-2-0325-32B-Instruc...

How do mlx quants compare to gguf?

Edit: the thread below says that MLX is faster, but GGUF quantisations tend to maintain better text quality.

https://www.reddit.com/r/LocalLLaMA/comments/1gc0t0c/how_doe...
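For intuition about what a 4-bit quant actually does, here's a rough sketch of group-wise affine quantization - not the actual MLX or GGUF kernels (both use more elaborate group formats), just the basic idea of a per-group scale/offset with weights rounded to 4-bit integers:

```python
import numpy as np

def quantize_4bit(w, group_size=32):
    """Group-wise affine 4-bit quantization: store a per-group min and
    scale, and round each weight to an integer in [0, 15]."""
    w = w.reshape(-1, group_size)
    lo = w.min(axis=1, keepdims=True)
    hi = w.max(axis=1, keepdims=True)
    scale = (hi - lo) / 15.0  # 4 bits -> 16 quantization levels
    q = np.round((w - lo) / scale).astype(np.uint8)
    return q, scale, lo

def dequantize(q, scale, lo):
    return q * scale + lo

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 32)).astype(np.float32)
q, scale, lo = quantize_4bit(w)
max_err = np.abs(dequantize(q, scale, lo) - w.reshape(-1, 32)).max()
print(f"max abs error: {max_err:.3f}")
```

Smaller groups give lower error but more per-group overhead; the quality differences between formats come down to choices like this.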


The download-model step breaks with:

  max_rec_size = mx.metal.device_info()["max_recommended_working_set_size"]
  RuntimeError: [metal::device_info] Cannot get device info without metal backend

Because it is accompanied by a huge stack trace, it makes me think this is a genuine bug, and I hope Simon will fix it.


What hardware are you using?

My plugin only works on Apple Silicon.


"refreshingly abstract" is just another term for wrong... not particularly helpful.


My joke there makes more sense in the context of the series: https://simonwillison.net/tags/pelican-riding-a-bicycle/



