> From my (just a user) perspective, GPUs are expensive, they shouldn't be left standing if they're not being used.
How much power does an idling GPU actually draw when there is no monitor attached and no activity on it? My monitor turns off after about 10 minutes of inactivity, and at that point I'd expect the power draw to be really small (but I haven't verified it myself).
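For anyone on an NVIDIA card who wants to verify this, here's a rough sketch (assuming nvidia-smi is on PATH; the sample count and interval are arbitrary):

```python
# Rough sketch: sample the reported power draw via nvidia-smi while the GPU is idle.
# Assumes an NVIDIA GPU with nvidia-smi available on PATH.
import subprocess
import time

samples = []
for _ in range(10):
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=power.draw", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout.strip()
    samples.append(float(out.splitlines()[0]))  # first GPU only
    time.sleep(1)

print(f"average idle draw: {sum(samples) / len(samples):.1f} W")
```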
I'm building a local medical AI app for Mac, which I recently published on the App Store: https://apple.co/4mlYANu
It uses MedGemma 4B to analyze medical images and generate diagnostic insights and reports. Of course it must be used with caution: it's not meant for real diagnostics, but it can be a way to get a second opinion.
Currently it supports chat and report generation, but I'm stuck on what other features to add beyond these. I'm also experimenting with integrating the 27B model; even with 4-bit quantization it looks better than the 4B.
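For reference, this is roughly what loading a 4-bit quantized checkpoint looks like with mlx-lm on Apple silicon; the repo name below is a placeholder (not necessarily what the app uses), and this sketch only covers the text side, not image input:

```python
# Sketch only: load a 4-bit quantized Gemma-family checkpoint with mlx-lm
# on Apple silicon. The model repo is a placeholder, not the app's actual model.
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/some-medgemma-27b-4bit")  # placeholder repo
reply = generate(
    model,
    tokenizer,
    prompt="Summarize the key findings in this radiology report: ...",
    max_tokens=256,
)
print(reply)
```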
Sorry, but it's garbage for now, at least in my case. From an asynchronous agent, I would expect it to get more done in 10 minutes than a regular agent like Claude Code. Instead, you give it a task, wait 10 minutes, and get garbage code. Then you provide feedback, wait another 10 minutes, and still get something that doesn't compile. Meanwhile, Claude Code does this in 10 seconds and usually produces runnable code.
That's the problem with remote async agents: you can't steer the ship effectively anymore. Sure, I have 10 pull requests after 10 minutes, and after spending about an hour reviewing them I'm throwing all of them out because they're crap.
It’s great news that they provide it for free. It’s hard to subscribe to all the LLM providers. Even with a pro subscription, you need to buy credits to use them with the editors, which gets very expensive if you use them a lot.
On the other hand, I really like the experience of coding with GitHub Copilot. It suggests code directly in your editor, so there's no need to switch tabs or ask separately. That feels much more natural and faster than requesting changes from an AI in another window, which can slow down the coding process.
I believe that if Mistral is serious about advancing open source, they should consider sharing the corpus used to train their models, or at least the pretraining data for the base models.
I doubt they could. Their corpus is almost certainly composed mostly of copyrighted material they don't have a license for. It's an open question whether that's an issue for model training, but it's obvious they wouldn't be allowed to distribute it as a corpus. That would just be regular copyright infringement.
Maybe they could share a list of the contents of their corpus. But that wouldn't be very helpful, and it would make it much easier for all affected parties to sue them for using their content in model training.
No, not the actual content, just the titles of what's in the corpus,
like "book title" by "author". The tool simply can't be taken seriously by anyone until they release that information. This goes for all of these models. It's ridiculous, almost insulting.
I am curious how they gained access to such a huge cluster to train a 7B model from scratch. It seems like they used the EuroHPC cluster Leonardo, but I am not sure whether it was provided by EuroHPC or whether they paid for it. If it was provided for free, what qualified them for access to such a large cluster?
I re-implemented Mamba myself, and it was the first time I had ever worked with einops/einsum. I'm 50/50 on them after this. I found them relatively easy to look at and understand the intent of (possibly more so than other representations), but it takes extra time to translate them into other primitives (loops, multiplication, etc.). I believe torch.einsum is also generally well optimized compared to naively looping. All said, I don't know if I'd use them myself when working from scratch, but they're interesting to know, and if I were working in Python I might try comparing the speed of einops/einsum against other approaches.
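For anyone who hasn't seen it, here's a minimal sketch (shapes and names are my own, not taken from any Mamba code) of the kind of contraction where torch.einsum reads compactly next to the equivalent loop:

```python
# Minimal sketch comparing torch.einsum with an explicit loop for a
# batched matrix product; shapes are made up for illustration.
import torch

B, L, D, N = 4, 16, 32, 8          # hypothetical batch, length, dim, state sizes
x = torch.randn(B, L, D)
A = torch.randn(D, N)

# einsum: contract over D, keep the batch and length dims
y_einsum = torch.einsum("bld,dn->bln", x, A)

# the same contraction written as an explicit loop over the batch
y_loop = torch.stack([x[b] @ A for b in range(B)])

assert torch.allclose(y_einsum, y_loop, atol=1e-5)
```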