Hacker News | indigodaddy's comments

Hi, I'm looking for an STT setup that can run on a server via cron, using a small local model (the server is a 4 vCPU Threadripper, CPU only, with 20 GB RAM), and that can transcribe from remote audio URLs. Ideally it would read the URL directly, but I know local models probably don't support that, so I'd likely do something roughly like the sketch below: curl the audio down to memory or /tmp, transcribe it, and then remove the file.
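
For concreteness, here's roughly the flow I mean (a minimal sketch assuming faster-whisper and its "small" model on CPU; the URL is just a placeholder):

    import os
    import tempfile
    import urllib.request

    from faster_whisper import WhisperModel

    AUDIO_URL = "https://example.com/episode.mp3"  # placeholder remote audio URL

    # Small model, int8-quantized for CPU; fits comfortably in 20 GB RAM.
    model = WhisperModel("small", device="cpu", compute_type="int8", cpu_threads=4)

    # Download to a temp file, transcribe, then clean up.
    with tempfile.NamedTemporaryFile(suffix=".mp3", delete=False) as tmp:
        urllib.request.urlretrieve(AUDIO_URL, tmp.name)
        path = tmp.name

    try:
        segments, _info = model.transcribe(path)
        print(" ".join(segment.text.strip() for segment in segments))
    finally:
        os.remove(path)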

Have any thoughts?


I’ve no thoughts on that unfortunately.


How would IPv6 solve this specific problem?

They're aiming to reduce complexity for the user wherever they can, so their efforts around this make total sense for the platform they've created.

If anyone wants to see how exe.dev works, a Marimo dev put together a really nice video:

https://www.youtube.com/watch?v=bV60dwhL2x4


I used Opus 4.5 for 99.9% of this project. Take that as you will.

https://github.com/jgbrwn/vibebin

vibebin is an Incus/LXC-based platform for self-hosting persistent AI coding agent sandboxes, with a Caddy reverse proxy and direct SSH routing to containers (suitable for VS Code Remote SSH). Create and host your vibe-coded apps on a single VPS/server.
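
For the curious, under the hood it's roughly this kind of plumbing per app (an illustrative sketch only, not vibebin's actual code; the container name, image, domain, and port are placeholders):

    import json
    import subprocess

    NAME = "myapp"                  # placeholder container name
    DOMAIN = "myapp.example.com"    # placeholder domain
    APP_PORT = 3000                 # placeholder port the app listens on

    # Launch an Incus container, then read its IPv4 address from `incus list`.
    # (In practice you'd wait for the container's network to come up first.)
    subprocess.run(["incus", "launch", "images:debian/12", NAME], check=True)
    info = json.loads(subprocess.check_output(["incus", "list", NAME, "--format", "json"]))
    addresses = info[0]["state"]["network"]["eth0"]["addresses"]
    ipv4 = next(a["address"] for a in addresses if a["family"] == "inet")

    # Append a Caddy site block reverse-proxying the domain to the container,
    # then reload Caddy to pick it up.
    with open("/etc/caddy/Caddyfile", "a") as caddyfile:
        caddyfile.write(f"\n{DOMAIN} {{\n    reverse_proxy {ipv4}:{APP_PORT}\n}}\n")
    subprocess.run(["systemctl", "reload", "caddy"], check=True)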

If anyone wants to test it or provide some feedback, that would be great. Core functionality works, but there are likely to be bugs.

My intent for the project was for a tinkerer/hobbyist, or even a not-super-technical person, to put this on a VPS and just start doing their own thing: experimenting, tinkering, learning, etc.

I had so much fun working on this project; it completely reinvigorated me, tbh.

I am just a Linux sysadmin and not a programmer at all (*just* smart enough to figure stuff out, though :) ), and I have to say the excitement and energy that working on this project brought me was unlike anything I've experienced before. It makes me so optimistic about this future that we are either embracing or fending off (depending on your mindset).


Simon, how do you think this would perform on CPU only? Let's say a Threadripper with 20 GB RAM. (Voice cloning in particular.)

No idea at all, but my guess is it would work but be a bit slow.

You'd need to use a different build of the model, though; I don't think MLX has a CPU implementation.


The old voice cloning and/or TTS models were CPU only, and they weren't real-time, but no worse than about 2:1: 30 seconds of audio would take roughly 60 seconds to generate. By 2021, one-shot TTS/cloning on GPUs was getting there, and that was close enough to real-time; one could, if one was willing to deal with it, wire microphone audio into the model, speak, and have the model modify the voice in real time. Phil Hendrie is jealous.

Anyhow, with faster CPUs and optimizations, you won't be waiting too long. Also, 20 GB is overkill for an audio model; only text models (LLMs) are huge and eat seemingly unbounded memory. SD/FLUX models stay under 16 GB of RAM (uh, mine do, at least!), for instance.


What happens to all that content/work when this sort of thing happens? Is it just gone for good, or...?

Usually gone, or retooled into something else. Even just within the Prince of Persia series, they had a game called Prince of Persia: Assassins that was cancelled and turned into Assassin's Creed. There was also a sequel to Prince of Persia (2008) in development that was cancelled and never showed up again. You can even find footage from both of these games online, but they will never see the light of day unless someone leaks them (which does happen sometimes).

Thanks

How does the cloning compare to Pocket TTS?

It's uncannily good. I prefer it to Pocket, but then again Pocket is much smaller and built for real-time streaming.

Ah right, I guess I meant for instant cloning, which I assume Qwen can't do.

Pocket TTS is much smaller: 100M parameters versus 600–1800M.

Ah right, so I guess qwen3-tts isn't going to work CPU-only like Pocket TTS can(?)

The current code doesn't appear very optimized. Running CPU-only, it uses just four threads, for example, nowhere close to saturating all my cores.

As a result it's dog slow on CPU only, like 3-4 minutes to produce a 3-second clip, and it's still significantly slower than real-time on my 5090, which it only pushes to about 30% utilization.
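
If it's PyTorch under the hood, it may just be running with the default CPU thread settings; a quick thing to try before the model loads (a hedged sketch, assuming a PyTorch CPU run):

    import os
    import torch

    # PyTorch's intra-op thread count can default low on some setups; bump it
    # to the machine's core count. (Setting OMP_NUM_THREADS in the environment
    # before launching the process is the other common lever.)
    torch.set_num_threads(os.cpu_count())
    print("intra-op threads:", torch.get_num_threads())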


Stumbled on this, not affiliated

It's not exactly sandboxing so much as just working within LXC containers, but I've been building this:

https://github.com/jgbrwn/vibebin

