I'm downloading DeepSeek-V3.2-Speciale now at FP8 (reportedly gold-medal performance at the 2025 International Mathematical Olympiad and the International Olympiad in Informatics).
It will fit in system RAM, and since it's a mixture-of-experts model whose individual experts aren't too large, I can at least run it. Tokens/second will be slower, but with system memory bandwidth somewhere around 500-600 GB/s it should feel OK.
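A quick back-of-the-envelope check on "it should feel OK": decode speed on a bandwidth-bound setup is roughly bandwidth divided by bytes read per token. The ~37B active-parameter figure below is an assumption (typical of DeepSeek-V3-class MoE models), not something stated in the thread; at FP8 that's about 1 byte per parameter, giving roughly 15 tokens/s as an upper bound at 550 GB/s.

```python
# Rough upper bound on decode speed for a memory-bandwidth-bound MoE model.
# Assumptions (mine, not the commenter's): ~37B active parameters per token,
# FP8 weights (~1 byte/param), ~550 GB/s usable system memory bandwidth.
active_params = 37e9       # parameters actually touched per generated token
bytes_per_param = 1.0      # FP8
bandwidth_bytes = 550e9    # ~550 GB/s

bytes_per_token = active_params * bytes_per_param
tokens_per_second = bandwidth_bytes / bytes_per_token
print(f"~{tokens_per_second:.0f} tokens/s upper bound")
```

Real throughput will be lower (KV-cache reads, imperfect bandwidth utilization), but it confirms the "slower but usable" expectation.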
Check out "--n-cpu-moe" in llama.cpp if you're not familiar. It lets you keep the expert tensors of a given number of layers in system RAM while everything else (including the context cache and the parts of the model that every token touches) stays in VRAM. You can do something like "-c 128k -ngl 99 --n-cpu-moe <tuned_amt>", where you find a number that maximizes VRAM usage without OOMing.
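A sketch of what that tuning loop looks like in practice. The model filename and the value 40 are placeholders, not a verified working command; the usual approach is to start with a high `--n-cpu-moe` (most experts on CPU) and lower it until you OOM, then back off one step.

```shell
# -c 131072      : 128k context (KV cache stays in VRAM)
# -ngl 99        : offload all layers to the GPU where possible
# --n-cpu-moe 40 : keep the MoE expert tensors of the first 40 layers in system RAM
llama-server -m deepseek-v3.2-fp8.gguf -c 131072 -ngl 99 --n-cpu-moe 40
```
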
But this is about more than cost. I can run 600B+ models at home. Today my wife and I asked ChatGPT a quick question, and it refused because it wouldn't generate results broken down by race. I tried to prompt around it and it absolutely refused. My local model, the latest Mistral-Large3-675B, gave me the answer I was looking for. What's the cost of that?
The author was running a quantised version of GLM 4.5 _Air_, not the full-fat version. API pricing for that is closer to $0.2/$1.1 at the top end from z.ai themselves, and half that price from Novita/SiliconFlow.
I think there are probably law firms/doctors' offices that would gladly pay ~3-4K euro a month to have this thing delivered and run truly "on-prem", to work with documents they can't risk leaking (patent filings, patient records, etc.).
For a company with 20-30 people, the legal and privacy protection is worth the small premium over using cloud providers.
Just a hunch though! The hardware would pay for itself in 3-4 months?
They are based in the UK. That's technically Europe, but I believe for privacy regulation it isn't the same as an EU country, though I could be very wrong. Would love to be educated on this by someone.
Currently at the millions stage with https://mailpace.com relying mostly on Postgres
Tbh this terrifies me! We don't just have to log the requests but also store the full emails for a few days, and they can be up to 50 MiB in total size.
My favourite band (King Gizzard) removed all their music from Spotify. I took the opportunity to switch to Navidrome with Tailscale and started obtaining music via Bandcamp and ripping old CDs. It works much better than I expected, even transcoding from FLAC to MP3 on the fly for my phone app.
Investing the Spotify fee every month into my own music collection is a great trade, and it has meant I actually listen to the music rather than just playing the same songs off a Spotify playlist over and over.
For the dead comment asking whether this is a VS Code fork: it's not. It's a completely new, custom word processor written in Rust from the ground up.
Sounds expensive. Amazon SES includes 1k emails/month for free (if you send via the API). That quota doesn't apply when sending via SMTP, but even then 1k emails cost just $0.10 (yes, 10 cents). I don't use any AWS services besides SES for my emails, because of the pricing; I host everything else on Hetzner.
That doesn't seem even close to true, or Amazon SES would have no business. I use it myself in my web app to deliver signup verification emails and haven't had a single complaint so far.
Thanks for recommending MailPace: £7.50/month for 10,000 emails is very reasonable, _and_ they support idempotency! Definitely makes me consider switching to them.
How long would it take to recoup the cost if you made the model available for others to run inference at the same price as the big players?