What inference performance are you getting on this with llama.cpp?

How long would it take to recoup the cost if you made the model available for others to run inference at the same price as the big players?


He has GLM 4.5 running at ~100 tokens per second.

Assumptions:

Batch 4x to get 400 tokens per second, pushing power consumption to 900W instead of the underutilized 300W.

Electricity at around €0.20/kWh.

Output tokens valued at €1 per 1M.

Assume ~70% utilization.

Result:

You get ~1M tokens per hour, which is a net profit of ~€0.8/hr. That puts the payoff time at a bit over a year on the €9K investment.
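
Here's that arithmetic as a quick Python sketch (every input is one of the assumptions above, not a measurement):

    # Back-of-envelope payback estimate; all inputs are assumptions, not measurements.
    tokens_per_sec = 400            # assumed 4x batching over the ~100 tok/s observed
    utilization = 0.70              # fraction of capacity actually sold
    power_kw = 0.9                  # assumed draw under batched load
    electricity_eur_per_kwh = 0.20
    price_eur_per_mtok = 1.0        # assumed rate per 1M output tokens
    hardware_cost_eur = 9000

    tokens_per_hour = tokens_per_sec * 3600 * utilization           # ~1.0M
    revenue_per_hour = tokens_per_hour / 1e6 * price_eur_per_mtok   # ~€1.00
    power_cost_per_hour = power_kw * electricity_eur_per_kwh        # ~€0.18
    net_per_hour = revenue_per_hour - power_cost_per_hour           # ~€0.83

    print(f"payback: {hardware_cost_eur / net_per_hour / 24:.0f} days")
    # -> ~453 days, i.e. "a bit over a year"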

Honestly, though, there is a lot of handwaving here. The most significant unknown is whether you can sustain high utilization with aggressive batching under 24/7 load.

Also, demand for privacy can make the tokens worth much more than typical API prices for open-source models.

On a somewhat orthogonal note: renting 2 H100s costs around $6 per hour, so if the rig replaces that rental, the payback time drops to a bit over a couple of months.


> He has GLM 4.5 running at ~100 tokens per second.

GLM 4.5 Air, to be precise. It's a smaller 106B model, not the full 355B one.

Worth mentioning when discussing token throughput.


I'm downloading DeepSeek-V3.2-Speciale now at FP8 (reportedly Gold-medal performance in the 2025 International Mathematical Olympiad and International Olympiad in Informatics).

It will fit in system RAM, and since it's a mixture-of-experts model and the experts are not too large, I can at least run it. Tokens per second will be slower, but with system memory bandwidth somewhere around 500-600 GB/s, it should feel OK.
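
A rough way to sanity-check that: on a memory-bandwidth-bound setup, decode speed is at best the bandwidth divided by the bytes of active weights read per token. A minimal sketch, assuming FP8 and ~37B active parameters (a placeholder; check the model card for the real figure):

    # Crude upper bound on decode speed for a memory-bound MoE model.
    bandwidth_gb_s = 550      # assumed ~500-600 GB/s system memory bandwidth
    active_params_b = 37      # billions of params active per token (assumption)
    bytes_per_param = 1       # FP8

    gb_read_per_token = active_params_b * bytes_per_param
    print(f"~{bandwidth_gb_s / gb_read_per_token:.0f} tokens/sec, best case")
    # -> ~15 tok/s; real throughput will be lower due to overheads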


Check out "--n-cpu-moe" in llama.cpp if you're not familiar with it. It lets you force a certain number of MoE expert layers to stay in system memory while everything else (including the context cache and the parts of the model that every token touches) is kept in VRAM. You can do something like "-c128k -ngl 99 --n-cpu-moe <tuned_amt>", where you find a number that maximizes VRAM usage without OOMing.
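
The tuning itself is just a budget calculation. A toy sketch of how you might pick a starting value, with all sizes as hypothetical placeholders (measure your actual model, quant, and context cache):

    # Toy calculator for a starting --n-cpu-moe value: the smallest number
    # of MoE layers kept in system RAM such that the rest fits in VRAM.
    # All sizes below are hypothetical placeholders.
    vram_gb = 24.0
    dense_and_kv_gb = 8.0        # attention weights + context cache in VRAM
    n_moe_layers = 46            # MoE layers in the model (placeholder)
    expert_gb_per_layer = 1.2    # expert weights per layer (placeholder)

    for n_cpu_moe in range(n_moe_layers + 1):
        gpu_moe_gb = (n_moe_layers - n_cpu_moe) * expert_gb_per_layer
        if dense_and_kv_gb + gpu_moe_gb <= vram_gb:
            print(f"try --n-cpu-moe {n_cpu_moe}")   # then nudge down until OOM
            break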


This is about more than cost. I can run 600B+ models at home. Today my wife and I asked ChatGPT a quick question, and it refused because it can't generate results based on race. I tried to prompt around the refusal and it absolutely would not budge. I asked my local model, the latest Mistral-Large3-675B, and got the answer I was looking for. What's the cost of that?


About the cost of your hardware, lol.


The author was running a quantised version of GLM 4.5 _Air_, not the full-fat version. API pricing for that is closer to $0.2/$1.1 (input/output per 1M tokens) at the top end from z.ai themselves, and half that price from Novita/SiliconFlow.


Selling LLM inference directly might not be the most effective approach.

I think there are probably law firms and doctors' offices that would gladly pay ~€3-4K a month to have this thing delivered and run truly "on-prem" to work with documents they can't risk leaking (patent filings, patient records, etc.).

For a company with 20-30 people, the legal and privacy protection is worth the small premium over using cloud providers.

Just a hunch, though! That would have it paid off in 3-4 months?


https://mailpace.com is fully European based and independent


They are based in the UK. That is technically Europe, but I believe that for privacy regulations it isn't the same as an EU country, though I could be very wrong. Would love to be educated on this by someone.


The UK inherited the same GDPR from the EU, so in practice it remains the same.

MailPace data is also hosted in the EU only.


Currently at the millions stage with https://mailpace.com, relying mostly on Postgres.

Tbh this terrifies me! We don't just have to log the requests but also store the full emails for a few days, and they can be up to 50 MiB in total size.

But it will be exciting when we get there!


My favourite band (King Gizzard) removed all their music from Spotify. I took the opportunity to switch to Navidrome with Tailscale and started obtaining music via Bandcamp and ripping old CDs. It works much better than I expected, even transcoding from FLAC to MP3 on the fly for my phone app.

Putting the monthly Spotify fee into my own music collection instead has been a great investment, and it means I actually listen to the music rather than just playing the same songs off a Spotify playlist every now and then.


For the dead comment asking whether this is a VS Code fork: it's not. It's a completely new, custom word processor written in Rust from the ground up.


Try https://mailpace.com

The lowest plan, $40/year for 1k emails/month, isn't on the Pricing page, but you can select it when signing up.


Sounds expensive. Amazon SES includes 1k emails/month for free (if you send via the API). That quota doesn't apply when sending via SMTP, but even then 1k emails costs just $0.10 (yes, ten cents). I don't use any AWS service other than SES, which I keep for email because of the pricing; I host everything else on Hetzner.


Yes, but AWS SES emails don't get delivered to inboxes.


That doesn't seem anywhere close to the truth; otherwise Amazon SES would have no business. I use it myself in my web app to deliver signup verification emails and haven't gotten a single complaint so far.


Thanks for recommending MailPace. £7.50/month for 10,000 emails is very reasonable, _and_ they support idempotency! Definitely makes me consider switching to them.
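
For anyone curious what that looks like in practice, here's a minimal Python sketch of an idempotent send. The endpoint and auth header are from MailPace's public docs; the "Idempotency-Key" header name is my assumption, so check the current docs before relying on it:

    # Minimal sketch of an idempotent send via the MailPace HTTP API.
    # NOTE: the idempotency header name is an assumption; verify in the docs.
    import uuid
    import requests

    resp = requests.post(
        "https://app.mailpace.com/api/v1/send",
        headers={
            "MailPace-Server-Token": "YOUR_API_TOKEN",  # placeholder
            "Idempotency-Key": str(uuid.uuid4()),       # reuse the same key on retries
        },
        json={
            "from": "you@example.com",
            "to": "user@example.com",
            "subject": "Hello",
            "textbody": "Retries with the same key should only send once.",
        },
    )
    resp.raise_for_status()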


Been using MailPace for a few years.

It has been a 10/10 experience: rock solid and extremely good deliverability.

I do wish the pricing scaled sublinearly at higher volumes, though.


Thanks. It's not very smart not to list that plan on the Pricing page, IMO.


Or Migadu for 19/yr.


Migadu is more for personal email; they aren't meant for transactional emails at all.


Here's an example: https://contextsync.dev/


This is what you’re looking for: https://tritium.legal/


I saw this on HN before, but how is it for litigation?


Another source backing up the first claim: https://carnegieuk.org/blog/online-safety-and-carnegie-uk/

I would like to see much more thorough journalism on the origin of these laws.


3-year-old M1 MacBook Pro (32 GB): 42 tokens/sec in LM Studio.

Very much usable.

