The 'point' skill is trained on a ton of UI data; we've heard of a lot of people using it in combination with a bigger driver model for UI automation. We are also planning on post-training it to work end-to-end for this in an agentic setting before the final release -- this was one of the main reasons we increased the model's context length.
Re: chart understanding, there are a lot of different types of charts out there but it does fairly well! We posted benchmarks for ChartQA in the blog but it's on par with GPT5* and slightly better than Gemini 2.5 Flash.
* To be fair to GPT5, it's going to work well on many more types of charts/graphs than Moondream. To be fair to Moondream, GPT5 isn't really well suited to deploy in a lot of vision AI applications due to cost/latency.
Cool project! The codebase is simple and well documented, a good starting point for anyone interested in how to implement a high-performance inference engine. The prefix sharing is very relevant for anyone running batch inference to generate RL rollouts.
Yes, they even do at $1/GPU/hr. However, an 8xH100 cluster at full utilization draws ~8 kW of electricity and costs almost ~$0.5M; a 16xH100 cluster is probably 2x that. How many years before you break even at ~$24/GPU/day of income?
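For concreteness, here's a rough break-even sketch using the ballpark figures above (16 GPUs, ~$1M capex, ~1 kW per GPU); the electricity price of $0.10/kWh is my own assumption, not from the comment.

```python
# Break-even sketch for a 16xH100 cluster rented out at $1/GPU/hr.
# Capex, power draw, and income are the ballpark figures from the
# comment; the electricity price is an assumed $0.10/kWh.
gpus = 16
capex = 1_000_000            # ~2x the ~$0.5M quoted for 8xH100
income_per_gpu_day = 24      # $1/GPU/hr * 24 hr
power_kw = 16                # ~1 kW per GPU at full utilization
electricity_per_kwh = 0.10   # assumption

daily_income = gpus * income_per_gpu_day                 # $384/day
daily_power_cost = power_kw * 24 * electricity_per_kwh   # ~$38/day
daily_net = daily_income - daily_power_cost
years_to_break_even = capex / (daily_net * 365)
print(round(years_to_break_even, 1))  # roughly 7.9 years
```

And that's before cooling, networking, real estate, and depreciation, all of which push the break-even point further out.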
You should care about counterparty risk. If your business model depends on unsustainable third-party prices powered by VC largesse and unrealizable dreams of dominance, the very least you can do is plan for the impending reckoning, after which GPU prices will be determined by costs.
Look, I understand that some people are short-sighted and can hardly think outside the box, and that's totally fine by me. I don't judge you for it, so I kindly ask you not to judge my question. Learn to give some benefit of the doubt.
That's the point of MoE: sacrificing VRAM to save compute and RAM bandwidth, which makes it a harder sell for consumer devices but an easier one for server deployments, where workloads are more likely to be compute- or bandwidth-bound.
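A toy illustration of that tradeoff: every expert has to be resident in memory, but only the top-k routed experts are actually read per token, so memory footprint scales with total experts while bandwidth scales with active ones. All numbers below are made up for illustration, not taken from any specific model.

```python
# Toy MoE memory-vs-bandwidth tradeoff. All experts must sit in
# (V)RAM, but only the routed ones are read per token. Sizes are
# hypothetical, purely for illustration.
n_experts = 8
active_experts = 2             # top-2 routing
expert_params = 1_000_000_000  # 1B params per expert (made up)
bytes_per_param = 2            # fp16/bf16

memory_resident = n_experts * expert_params * bytes_per_param
bandwidth_per_token = active_experts * expert_params * bytes_per_param

print(memory_resident // 2**30, "GiB must be resident")
print(bandwidth_per_token // 2**30, "GiB read per token")
```

So relative to a dense model with the same per-token compute, this hypothetical MoE needs 4x the memory capacity but no extra bandwidth, which is cheap on servers and expensive on consumer hardware.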
The training technique used here (fitting something similar to a NeRF to different views of the same image) closely resembles this paper, which applies the same idea to denoise (instead of upscale) output features: https://arxiv.org/abs/2401.02957
Yeah, but in "Who is hiring?" threads the company is supposed to read your skills and respond if you are an actual fit, not just send it to every single person who posted. I'd bet dollars to donuts that half the candidates they sent it to don't even qualify for the position.
I think the assumption here is that the company claims they looked at your qualifications and decided it might be worth your effort to apply, when in fact they didn't.
If that's what's happening, it's a form of fraud (but legal, I imagine).
You're right. You are confused. He or she didn't get an email like that. It was an email that said, "Saw your profile on HN and we think your skills look like a good fit for our team." No one saw the profile and thought that. Re "wondered if you'd be interested in our YC company," no one wondered that.
This was a deceptive email meant only to advertise the fact that Anima 1) exists, and 2) is accepting applications. It should have been posted in the "Who is hiring?" thread.
Do outlier features emerge in sub-100M parameter models? I haven't seen any research discuss it below the 124M scale (bert-base). At that scale training a model takes ~4 days on an 8xA100 node.
That is a fair question, and in addition I'm unsure that a simple metric like perplexity is likely to pick it up.
However, I do think that if perplexity showed a smaller drop-off under quantization with this modified softmax, that would be an exciting finding and enough to justify further experiments.
But you are right - if it doesn't show an improvement, that doesn't necessarily rule out that it could be helping.
Edit: In the Qualcomm AI paper mentioned in this post, they experiment on BERT uncased (109M param) and OPT 125M and are able to show the effects using perplexity.
I hadn't read the paper when I suggested the same approach, so I guess that is good validation it is worth trying.
Edit2: Actually they also test on ViT 22M, which would be even quicker to try I think.