Hacker News | reitzensteinm's comments

Parakeet V3 is over twice the parameter count of Moonshine Medium (600m vs 245m), so it's not an apples to apples comparison.

I'm actually a little surprised they haven't added model size to that chart.


Parakeet V3 has a much better RTFx than Moonshine; it's not just about parameter count. It runs faster.

https://huggingface.co/spaces/hf-audio/open_asr_leaderboard


So I'm kinda new to this whole Parakeet and Moonshine stuff. I'm able to run Parakeet on a low-end CPU without issues, so I'm curious how much that extra savings in parameters actually translates in practice.

Oh, and I'm typing this in Handy with just my voice and Parakeet version three, which is absolutely crazy.


Yeah, I've got a 7950x and 64gb memory. My vibe coding setup for Bevy game development is eight Claude Code instances split across a single terminal window. It's magical.

I tried the desktop app and was shocked at the performance. Conversations would take a full second to load, making rapid switching intolerable. Kicking off a new task seems to hang for multiple seconds while, I assume, the process spins up.

I wanted to try a disposable-conversation-per-feature workflow with git worktree integration for an hour to see how it contrasted, but I couldn't even make it ten minutes without bailing back to the terminal.


God, the number of ghastly survival-crafting LLM slop games that are gonna appear on Steam 6 months from now...

I'm already dreading it. Steam was already full of junk being released by the dozens every single day. It's hard to think it could be worse.

I also think Steam does a great job at hiding it, and the new recommendation page is really great IMO. Other than some generic AAA titles, it introduced me to really great games I enjoyed based on my play history.

The more content is available, the more curation matters, and IMO their algorithm currently does a good job of it.


> I also think Steam does a great job at hiding it

Steam kept pushing a game as "recommended for you" with 99% negative reviews.

In what world would I possibly want to buy a game with a <1% approval rating?


There are some odd cases like that, but you can always "Ignore" a game and it'll never show up again. That also feeds into Steam's curation for you based on your interests.

All stores in general are going to suffer. There needs to be a new model.

The field will spread. I'm working on what I intend to be the best game of my career, but you can ship barely functional slop in a few days.

Don't the cli panes flicker like crazy?

No, they're generally pretty solid. Once an hour one will crash, and sometimes there are performance problems, but it's a very workable setup.

> Once an hour one will crash, and sometimes there are performance problems

> pretty solid

Huh?


Claude Code would have been science fiction five years ago. I'm not looking a gift horse in the mouth over a few papercuts.

There is an issue on their GitHub about flickering that they don't seem to care much about. I think most AI CLIs use the same React-ish CLI library called Ink, and all are having the same problems. opencode moved to a different library (OpenTUI?) and their client seems to be doing much better. Although I must say I like to run the opencode CLI locally with the web option and connect to it with a web browser. It's very nice. Plus you can code in bed :)

It's a CLI app that connects to an API. It's no more advanced than a terminal IRC client or a MUD. Get better standards.

What have you shipped so far?

I think part of the issue is that in production deployments, you're batching high enough that you'll be paging in those long-tail experts constantly.

Unless you're handling that in some kind of fancy way, you'll be holding up the batch while waiting for host memory, which will kill your throughput.

It makes much more sense for non batched local inference, especially if you can keep the MoE routing stable like you say, but most folks aren't optimising for that.


Ideally, you should rearrange batches so that inference steps that rely on the same experts get batched together, then inferences that would "hold up" a batch simply wait for that one "long tail" expert to be loaded, whereupon they can progress. This might require checkpointing partial inference steps more often, but that ought to be doable.
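The core of that rearrangement idea can be sketched in a few lines. This is a hypothetical scheduler fragment (the function and names are illustrative, not from any real serving stack): collect the pending inference steps, then batch together the ones that route to the same expert, so each expert is paged in at most once per round.

```rust
use std::collections::HashMap;

/// Group pending inference steps (request id, expert id) by the expert
/// they need, so steps sharing an expert can run as one batch.
fn group_by_expert(steps: &[(usize, u32)]) -> HashMap<u32, Vec<usize>> {
    let mut batches: HashMap<u32, Vec<usize>> = HashMap::new();
    for &(request_id, expert_id) in steps {
        batches.entry(expert_id).or_default().push(request_id);
    }
    batches
}

fn main() {
    // Requests 0 and 2 both need expert 7, so they share one load.
    let steps = [(0, 7), (1, 3), (2, 7)];
    let batches = group_by_expert(&steps);
    assert_eq!(batches[&7], vec![0, 2]);
    assert_eq!(batches[&3], vec![1]);
}
```

The hard part in practice is the checkpointing mentioned above: a step whose expert isn't resident has to park its partial activations until that expert is loaded.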

I think this is doable for very long tail experts that get swapped in for specialised topics - say, orbital mechanics.

But for experts that light up at, say, 1% frequency per batch, you're doing an awful lot of transfers from DRAM which you amortize over a single token, instead of reads from HBM which you amortize over 32 tokens.
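A back-of-the-envelope sketch of that amortization gap, using illustrative (not measured) numbers: a 100 MB expert, a 25 GB/s host-to-device link, and 2 TB/s of HBM bandwidth. The host transfer is paid by a single token, while an HBM read is shared across the whole batch.

```rust
/// Ratio of per-token cost when paging an expert from host memory
/// (paid by 1 token) vs reading it from HBM (shared by `batch` tokens).
fn paging_penalty(expert_bytes: f64, link_bps: f64, hbm_bps: f64, batch: f64) -> f64 {
    let dram_per_token = expert_bytes / link_bps; // amortized over 1 token
    let hbm_per_token = expert_bytes / hbm_bps / batch; // amortized over the batch
    dram_per_token / hbm_per_token
}

fn main() {
    let penalty = paging_penalty(100.0e6, 25.0e9, 2000.0e9, 32.0);
    println!("{penalty:.0}x worse per token"); // ≈ 2560x with these numbers
}
```

The exact figures depend entirely on hardware, but the shape of the result is why paging experts mid-batch is so painful at high batch sizes.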


I think your analysis is right; this would make sense mostly for the 30B-3A style models that are mostly for edge/hobbyist use, where context length is precious so nobody is batching.

Given that experts live per layer, I don't think it makes sense to have orbital mechanics experts, but … I have wondered about swapping out the bottom 10% of layers per topic, given that that is likely where the highest-order concepts live. I've always wondered why people bother with LoRA on all layers, given that the early layers are more likely to be topic-agnostic and focused on more basic pattern assembly (see the recent papers on how LLMs count on a manifold).


Sure. I'll bet you that in calendar year 2028 Waymo has more paid passenger trips than Tesla. Loser donates $1k USD to Doctors Without Borders.

I'm not a camera-only doomer, and I expect that in ten years Waymo will also not use lidar, or that the units will be incredibly cheap and well integrated.

But I think the pro-Tesla camp is exaggerating how quickly the march of 9s will happen for them, and underestimating how quickly Waymo will expand in the next few years.


Will happily take this bet, and also happy to 10x it if you want. Would prefer we just pay each other, though. You can donate my money if you want.

Proceeds going to charity is a must for me - it keeps it friendly and significantly reduces the impact of counterparty risk.

There's a reason Long Now uses this format, and I'm happy to use their platform and pay the fee: https://longbets.org/rules/

Email is reitzensteinm@gmail.com if you're interested.


I am in on this as well and will do 10-1: I will donate $10k for any $1k pledged (would easily do 100-1, as I am sure Tesla will kill robotaxi within the next 2 years "to focus on alien robots in Moon data centers").

OK, I'm happy to book $10k on this at 10-1. Or whatever the max is you're OK with betting. Prefer just to give to each other rather than donate. Please respond if you're serious and we can trade details.

Martin Kleppmann has this tool that's quite relevant: https://martin.kleppmann.com/2014/11/25/hermitage-testing-th...

Oh that is super cool. Great prior art to study in combo with Loom. Very excited to dig in - imagine if there was an easy-to-use data race tester where you didn't have to figure out the interleaving points up front? Just point it at your code and let it find them. Exciting.

Loom does exhaustive search, with clever methods to prune it. On real world programs, you have to set a limit to that because it obviously grows extremely quickly even with the pruning.

I've built something similar to Loom, except it's more focused on extensively modeling the C++11/Rust memory model (https://github.com/reitzensteinm/temper). My experience is that fairly shallow random concurrent fuzzing yields the vast majority of all concurrency bugs.

Antithesis (https://antithesis.com/) are probably the leaders of the pack in going deeper.


Black Mesa is an almost exact copy of Half Life at the start, and where that's true it's incredibly well done. Feels very much like a remaster.

Unfortunately, by the end of the Earth levels and certainly on Xen, the levels switch over to original designs. They become massive and sprawling, boring and confusing. They really should have stuck to doing a like-for-like reimplementation.

I grew up on Half-Life, so playing the first half of Black Mesa a few years ago was one of my favorite adult gaming experiences. But I gave up who knows how close to the finish line after finding Xen insufferable.


The original Xen is rubbish. I’m glad they remade it. Just like they did with “On a rail”.

Black Mesa is a masterpiece.


There is one Xen episode close to the end that is indeed way too big for its own good and quite boring, I will give you that, but otherwise I found the new Xen levels very well made and fleshed out. Let's be honest, the original Xen was quite lackluster...


Well, even if the levels are well made and I've just got poor taste, Half-Life was such a tightly designed package, introducing new weapons, things to play with (like the trains), enemies, and environment modifiers at a steady pace.

Replacing a 5 minute level with a 20 minute level, even if it's better, ruins that pacing. There's just not enough content in the game to support it.

I agree Xen was by far the weakest of the original levels, but I don't think it's a coincidence that it was also pretty short. I think they knew it had novelty but no staying power and probably cut it to the bone.


I'm confused after reading this thread; are the Xen levels long in the original Half-Life, or in Black Mesa?

> Unfortunately, by the end of the earth levels and certainly on Xen, the levels switch over to original designs

By original designs, do you mean the HL1 versions, or pre-HL1 designs that didn't get implemented in HL1?


Black Mesa has a longer Xen, and its original designs took inspiration from retail Half-Life but weren't exact copies of the map layouts etc.

Do you mean the later levels are bigger and more confusing in Black Mesa?

Genuine question, as I own both versions and don't know which to play.


But coding is largely trained on synthetic data.

For example, Claude can fluently generate Bevy code as of the training cutoff date, and there's no way there's enough training data on the web to explain this. There's an agent somewhere in a compile test loop generating Bevy examples.

A custom LLM language could have fine grained fuzzing, mocking, concurrent calling, memoization and other features that allow LLMs to generate and debug synthetic code more effectively.

If that works, there's a pathway to a novel language having higher quality training data than even Python.


I recently had Codex convert a script of mine from bash to a custom, Make-inspired language for HPC work (think Nextflow, but an actual language). The bash script submitted a bunch of jobs based on some inputs. I wanted this converted to use my pipeline language instead.

I wrote this custom language. It's on GitHub, but the example code that would have been available would be very limited.

I gave it two inputs -- the original bash script and an example of my pipeline language (unrelated jobs).

The code it gave me was syntactically correct, and was really close to the final version. I didn't have to edit very much to get the code exactly where I wanted it.

This is to say -- if a novel language is somewhat similar to an existing syntax, the LLM will be surprisingly good at writing it.


Yet 2025 power sector emissions fell, which is quite discordant with the picture you're attempting to paint.

https://ember-energy.org/data/electricity-data-explorer/?dat...


Hill climbing a password would only be possible if intermediate KV cache entries were stored. To hillclimb "hunter2", you're going to try "a", "b", "c", etc, until you notice that "h" comes back faster. Then you try "ha", "hb" and so on.

But that's only going to work if the cache looks like: "h", "hu", "hun", ..., "hunter2"

If just "hunter2" is in the cache, you won't get any signal until you stumble on exactly that password. And that's before getting into the block size granularity of the caches discussed elsewhere in this thread.
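The difference can be made concrete with a toy model (nothing here is from a real serving stack). A timing oracle effectively tells the attacker the length of the longest cached prefix of their guess; with per-token caching that gives a per-character signal, while with whole-prompt caching every wrong guess looks identical:

```rust
use std::collections::HashSet;

/// Length of the longest prefix of `guess` present in the cache --
/// a stand-in for what a cache-hit timing side channel would reveal.
fn longest_cached_prefix(guess: &str, cache: &HashSet<String>) -> usize {
    (1..=guess.len())
        .rev()
        .find(|&i| cache.contains(&guess[..i]))
        .unwrap_or(0)
}

fn main() {
    let secret = "hunter2";

    // Per-token caching stores every intermediate prefix...
    let per_token: HashSet<String> =
        (1..=secret.len()).map(|i| secret[..i].to_string()).collect();
    // ...while whole-prompt caching stores only the full string.
    let whole_prompt: HashSet<String> =
        std::iter::once(secret.to_string()).collect();

    // Per-token cache: "hun..." scores higher than "ham...", so hill
    // climbing gets a signal on every character.
    assert!(longest_cached_prefix("hunter", &per_token)
        > longest_cached_prefix("hamster", &per_token));

    // Whole-prompt cache: no signal until the exact password is guessed.
    assert_eq!(longest_cached_prefix("hunter", &whole_prompt), 0);
    assert_eq!(longest_cached_prefix("hamster", &whole_prompt), 0);
}
```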

That's not to say timing attacks aren't possible. I haven't looked at Claude Code's prompt generation, but there's no intrinsic reason why you couldn't do things like figure out what open source code and research papers your competitors are loading into context.

Sharing caches between orgs would be an incredible misstep.


Right, you can't actually guess a letter (byte) at a time, but you can guess a token at a time (I believe the vocabulary is 200,000 possible tokens in GPT-5). So you could send each of the 200,000 possible tokens, see which is cached, and then send 200,000 more tokens to find the next cached token. Certainly less efficient, but well within the realm of a feasible attack.


It's a good call out re: tokens vs letters, but I think you might have misunderstood my point - you can't do it a token at a time unless the intermediate KV cache is stored after each token is generated.

This won't be the case in any non-toy implementation, as it would be unnecessary and slow.


Ah, fair enough. Anthropic caches at a block level (basically a single message) so for non-trivial messages this is really less of a concern, although I definitely understand why they still scope cache to a single tenant


I'm a little nervous about the correctness of the memory orderings in this project, e.g.

Two acquires back to back are unnecessary here. In general, fetch_sub and fetch_add should give enough guarantees for this file in Relaxed. https://github.com/frostyplanet/crossfire-rs/blob/master/src...
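As a standalone sketch of that point (this is illustrative code, not taken from crossfire-rs): Relaxed is sufficient for pure counting because atomic read-modify-writes never lose updates; the stronger orderings only matter when the counter is used to publish other memory to another thread.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

fn main() {
    let hits = Arc::new(AtomicUsize::new(0));
    let handles: Vec<_> = (0..4)
        .map(|_| {
            let hits = Arc::clone(&hits);
            thread::spawn(move || {
                for _ in 0..1000 {
                    // Relaxed RMW: no ordering guarantees, but still atomic.
                    hits.fetch_add(1, Ordering::Relaxed);
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    // Exactly 4000, never less: Relaxed never drops increments.
    assert_eq!(hits.load(Ordering::Relaxed), 4000);
}
```

The classic exception is the Arc-style refcount, where the final decrement must synchronize before the destructor runs; that's where Release/Acquire earns its keep.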

Congest is never written to with release, so the Acquire is never used to form a release chain: https://github.com/frostyplanet/crossfire-rs/blob/dd4a646ca9...

The queue appears to close the channel twice (once per rx/tx), which is discordant with the apparent care taken with the fencing. https://github.com/frostyplanet/crossfire-rs/blob/dd4a646ca9...

The author also suggests an incorrect optimization to Tokio here which suggests a lack of understanding of what the specific guarantees given are: https://github.com/tokio-rs/tokio/pull/7622

The tests do not appear to simulate the queue in Loom, which would be a very, very good idea.

This stuff is hard. I almost certainly made a mistake in what I've written above (edit: I did!). In practice, the queue is probably fine to use, but I wouldn't be shocked if there's a heisenbug lurking in this codebase that manifests something like: it all works fine now, but in the next LLVM version an optimization pass is added which breaks it on ARM in release mode, and after that the queue yields duplicate values in a busy loop every few million reads which is only triggered on Graviton processors.

Or something. Like I said, this stuff is hard. I wrote a very detailed simulator for the Rust/C++ memory model, have implemented dozens of lockless algorithms, and I still make a mistake every time I go to write code. You need to simulate it with something like Loom to have any hope of a robust implementation.

For anyone interested in learning about Rust's memory model, I can't recommend enough Rust Atomics and Locks:

https://marabos.nl/atomics/


> The tests do not appear to simulate the queue in Loom, which would be a very, very good idea.

Loom is apparently this: https://github.com/tokio-rs/loom — I've used Tokio a bit in the past but wasn't aware of that tool at all. It looks really useful, and I'm probably not alone in never having heard of it before. Any tips & tricks or gotchas one should know beforehand?

