Simply put, it's when your output embedding matrix is the same matrix as your input embedding.
You save vocab_dim * model_dim parameters (for GPT-3 that's roughly 50,257 × 12,288 ≈ 617M).
But through the residual stream there's a direct path from input embedding to output logits that's roughly a single matmul, W_E W_E^T, and since that matrix is symmetric it struggles to encode bigrams (the score for "A then B" is forced to equal "B then A").
Attention + MLPs add nonlinearity on top of that path, but tying still costs some expressivity.
Which is why tied embeddings aren't SOTA, but they're useful in smaller models, where the embedding matrix is a much larger fraction of the total parameters.
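To make that concrete, here's a minimal PyTorch sketch of weight tying (the module name TinyLM and the toy sizes are illustrative, not anything from this thread); the check at the end shows why the direct path W_E W_E^T can't tell "A then B" apart from "B then A":

    import torch
    import torch.nn as nn

    class TinyLM(nn.Module):
        def __init__(self, vocab_size=1000, d_model=64):  # GPT-3 scale would be ~50257 x 12288
            super().__init__()
            self.embed = nn.Embedding(vocab_size, d_model)             # input embedding W_E
            self.lm_head = nn.Linear(d_model, vocab_size, bias=False)  # output projection
            self.lm_head.weight = self.embed.weight                    # weight tying: output matrix = input matrix

        def forward(self, tokens):
            h = self.embed(tokens)   # (transformer blocks omitted for brevity)
            return self.lm_head(h)   # logits = h @ W_E^T

    model = TinyLM()
    # Savings: one vocab_size x d_model matrix instead of two.
    # At GPT-3 scale that's 50257 * 12288 ≈ 617.6M parameters.

    # Direct path through the residual stream (embed -> unembed, skipping
    # attention and MLPs): logits are W_E @ W_E^T, which is symmetric, so the
    # score for "token i followed by token j" equals "token j followed by token i".
    W = model.embed.weight
    direct = W @ W.T
    assert torch.allclose(direct, direct.T)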
There isn't a specific place, it's the general aesthetic. Maybe you do sound like an LLM :P I guess it's not unlikely to pick up some mannerisms from them when everyone is using them.
I guess I don't really mind whether an LLM was used or not; it's more that the style sounds very samey with everything else. Whether it's an LLM or not isn't very relevant, I guess.
Try Cursor Composer! It's the most natural transition. Exactly what you're currently doing, but it inserts the code snippets for you from within your IDE.