I tried the new qwen model in Codex CLI and in Roo Code and I found it to be pretty bad. For instance I told it I wanted a new vite app and it just started writing all the files from scratch (which didn’t work) rather than using the vite CLI tool.
Is there a better agentic coding harness people are using for these models? Based on my experience I can definitely believe the claims that these models are overfit to Evals and not broadly capable.
I've noticed that open-weight models tend to hesitate to use tools or commands unless those appeared often in their training data, or you tell them very explicitly to do so in your AGENTS.md or prompt.
They also struggle at translating very broad requirements to a set of steps that I find acceptable. Planning helps a lot.
Regarding the harness, I have no idea how much they differ but I seem to have more luck with https://pi.dev than OpenCode. I think the minimalism of Pi meshes better with the limited capabilities of open models.
+1 to this, anecdotally I’ve found in my own evaluations that if your system prompt doesn’t explicitly declare how to invoke a tool and e.g. describe what each tool does, most models I’ve tried fail to call tools or will try to call them but not necessarily use the right format. With the right prompt meanwhile, even weak models shoot up in eval accuracy.
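To illustrate what I mean by "explicitly declare how to invoke a tool", here's a rough sketch using the OpenAI-style function-calling schema. The tool name `run_shell` and all the wording are made up for illustration, not from any particular harness:

```python
# Hypothetical tool declaration in the OpenAI-style function-calling format.
# In my experience, weaker models do markedly better when the description
# spells out both WHEN to call the tool and WHAT the arguments look like.
tools = [{
    "type": "function",
    "function": {
        "name": "run_shell",
        "description": (
            "Execute a shell command in the project root and return stdout. "
            "Prefer this for scaffolding (e.g. `npm create vite@latest`) "
            "instead of writing generated files by hand."
        ),
        "parameters": {
            "type": "object",
            "properties": {
                "command": {
                    "type": "string",
                    "description": "The exact shell command to run.",
                },
            },
            "required": ["command"],
        },
    },
}]

# Restating the calling convention in the system prompt as well seems to
# help models that otherwise emit malformed tool calls.
system_prompt = (
    "You have one tool, run_shell. Call it with a JSON object like "
    '{"command": "ls -la"}. Use existing CLI scaffolding tools rather '
    "than writing boilerplate files yourself."
)
```

Redundant, sure, but the belt-and-suspenders approach (schema plus a prose restatement in the system prompt) is what moved the needle most in my evals.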
Have a frontier model do the plan, which is the most time-consuming part anyway, and then have a local LLM do the implementation.
The frontier model can orchestrate your tickets, write a plan for each, and dispatch local LLM agents to implement at about 180 tokens/s; vLLM can probably manage something like 25 concurrent sessions on an RTX 6000.
Do it all in worktrees and then have the frontier model do the review and merge.
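The worktree-per-ticket part of this flow can be sketched roughly like so. This is just an illustration of the shape, assuming plain `git worktree`; `dispatch_ticket` and the `PLAN.md` convention are placeholders I made up, not any real harness's API:

```python
# Sketch: one git worktree per ticket, so concurrent local agents
# never step on each other's working copies.
import subprocess
from pathlib import Path

def dispatch_ticket(repo: Path, ticket_id: int, plan: str) -> Path:
    """Create an isolated worktree + branch for one ticket and drop the
    frontier model's plan into it for the local agent to pick up."""
    branch = f"ticket-{ticket_id}"
    worktree = repo.parent / f"wt-{ticket_id}"
    subprocess.run(
        ["git", "-C", str(repo), "worktree", "add", "-b", branch, str(worktree)],
        check=True,
    )
    # The implementing agent reads this file as its spec (a convention
    # I'm assuming here, not a standard).
    (worktree / "PLAN.md").write_text(plan)
    return worktree
```

Once a ticket's agent finishes, the reviewer looks at the branch and the frontier model merges or bounces it; `git worktree remove` cleans up.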
I am just a retired hobbyist, but that's my approach. I run everything through Gitea issues; each issue gets launched by the orchestrator in a new tmux window, and the two main agents (implementer and reviewer) get their own panes so I can see what's going on. I think Claude Code has now somewhat streamlined this too, but I've seen no need to change my approach yet since I'm just tinkering on my personal projects. Right now I use Claude Code subagents, but I've been thinking of replacing them with some of these Qwen 3.5 models, because they do seem capable and I have the hardware to run them.
In my experience Qwen3.5/Qwen3-Coder-Next perform best in their own harness, Qwen-Code. You can also crib the system prompt and tool definitions from there. One caveat though: even though the Qwen models are state of the art for local models, they are about a year behind anything you can pay for commercially, so asking one to build a new app from scratch might be a bit much.
It was originally the eternally-on-the-horizon Semantic Web, before somebody decided to repurpose the name for something to do with crypto (perhaps without bothering to search for "web 3" beforehand).
Thinking about what Jony Ive said about "owning the unintended consequence" of making screens ubiquitous, and how a voice-controlled, completely integrated service could be that new computing paradigm Sam was talking about when he said: "You don't get a new computing paradigm very often. There have been like only two in the last 50 years. … Let yourself be happy and surprised. It really is worth the wait."
I suspect we’ll see stronger voice support, and deeper app integrations in the future. This is OpenAI dipping their toe in the water of the integrations part of the future Sam and Jony are imagining.
And remember Quibi [1]? Short-form video in vertical format specifically for mobile devices? They didn't have every aspect nailed, but they were definitely trailblazers on that front.
Quibi launched in April 2020. TikTok by that point had around 2 billion downloads [1]. It's difficult to argue they were trailblazers here. I might even say a component of their failure was that free mobile video was already widely accessible by then.
Didn't have every aspect nailed? Definitely trailblazers? Quibi is a prime example of an absolute business wipeout. They got a bunch of investor money together, showed no interest in what viewers actually want, and then went down in flames immediately upon public release of the product. The whole thing was a disaster that didn't accomplish anything beyond putting a bunch of capital in the pockets of C grade C suite players.
I thought they kept incentivizing longer content so they could cram more ads into the videos. It's hard to get someone to watch a 20-second ad for a 2-minute video, but if you can convince everyone to pad that thing up to 10 minutes, you can stuff at least 2 ads in there.