
For those that have tried this, what kind of time-to-first-token latency are you seeing?


I saw 9 seconds earlier with Cline. That said, the output file I had asked it to generate was over 122KB and took 58.690 seconds, so I was getting about 2KB per second even factoring in the high TTFT.
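For anyone who wants to check that arithmetic, here is a rough sketch. It assumes the 58.690 seconds is wall-clock time that already includes the 9-second TTFT, which the comment implies but does not state outright; the function name is made up for illustration.

```python
def throughput_kb_per_s(size_kb: float, total_s: float, ttft_s: float = 0.0) -> float:
    """Return KB/s over the window that remains after subtracting ttft_s."""
    return size_kb / (total_s - ttft_s)


# Figures quoted in the comment above.
SIZE_KB = 122.0
TOTAL_S = 58.690
TTFT_S = 9.0  # assumed to be part of TOTAL_S

# Throughput over the whole request, waiting time included.
end_to_end = throughput_kb_per_s(SIZE_KB, TOTAL_S)
# Throughput once tokens actually start arriving.
streaming = throughput_kb_per_s(SIZE_KB, TOTAL_S, TTFT_S)

print(f"end-to-end:        {end_to_end:.2f} KB/s")
print(f"after first token: {streaming:.2f} KB/s")
```

Under that assumption the end-to-end figure comes out a little above 2 KB/s, and the rate after the first token is noticeably higher.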


The high TTFT (around 5-6 seconds) is what kills the excitement for this for me. Sure, when it starts outputting it’s crazy fast, so it’s good for generating single-file prototypes, but as soon as you try to use it in Cline or any other agentic loop you’ll be waiting for API requests constantly, and it’s a real bottleneck.


TTFT == time to first token.

(I would've just said, "the throughput is fantastic, but the latency is about 3 times higher than other offerings".)
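If you want to put numbers on both halves of that (latency and throughput) for your own setup, a minimal sketch follows. It times any iterator of text chunks; `fake_stream` is a hypothetical stand-in for a real streaming API response, so swap in whatever chunk iterator your client gives you. It counts characters rather than real tokens, which is a simplification.

```python
import time
from typing import Iterable, Iterator, Tuple


def measure_stream(chunks: Iterable[str]) -> Tuple[float, float, int]:
    """Return (ttft_seconds, total_seconds, total_chars) for a chunk stream."""
    start = time.perf_counter()
    ttft = None
    n_chars = 0
    for chunk in chunks:
        if ttft is None:
            # The first chunk has arrived: this is the time to first token.
            ttft = time.perf_counter() - start
        n_chars += len(chunk)
    total = time.perf_counter() - start
    if ttft is None:
        # Empty stream: nothing ever arrived, so report the full wait.
        ttft = total
    return ttft, total, n_chars


def fake_stream(delay_s: float = 0.2, n_chunks: int = 5) -> Iterator[str]:
    """Hypothetical stand-in for a streaming response (not a real API)."""
    time.sleep(delay_s)  # simulated wait before the first chunk
    for _ in range(n_chunks):
        yield "token "


ttft, total, n_chars = measure_stream(fake_stream())
print(f"TTFT: {ttft:.3f}s, total: {total:.3f}s, chars: {n_chars}")
```

The clock starts when iteration begins, so with a real client you would want to build the request inside the timed region as well, otherwise connection setup is not counted.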


feels very low compared to claude/gpt for me



