Hacker News | SilenN's comments

Simply, it's when your output (unembedding) matrix is shared with your input embedding matrix.

You save vocab_dim * model_dim params (e.g. ~617M for GPT-3).
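A quick sanity check of that number, assuming GPT-3's published sizes (vocab 50,257; d_model 12,288):

```python
# Params saved by tying = one full vocab_dim x model_dim matrix.
vocab_dim, model_dim = 50_257, 12_288   # GPT-3's published dimensions
saved = vocab_dim * model_dim
print(saved)  # 617558016, i.e. ~617M
```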

But through the residual stream, the input and output matrices are roughly connected by a single matmul on the direct path, and with tying that product is W_E·W_Eᵀ, which is symmetric. So they struggle to encode asymmetric bigrams (the score for "A then B" is forced to equal "B then A").
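You can see the symmetry in a toy example (pure Python, hypothetical 3-token vocab with a 2-dim residual stream); the direct-path bigram logits under tying are W_E·W_Eᵀ:

```python
def matmul(A, B):
    """Naive matrix multiply for lists of lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

# Hypothetical tiny embedding matrix: 3 tokens, 2-dim residual stream.
W_E = [[0.5, -1.0],
       [2.0,  0.3],
       [-0.7, 1.2]]

# Tied unembedding = transpose of the input embedding.
W_U = [list(col) for col in zip(*W_E)]

# Direct-path bigram logits: score for token i followed by token j.
bigram_logits = matmul(W_E, W_U)

# Symmetric: score(i -> j) == score(j -> i), so asymmetric bigram
# statistics (e.g. "Barack" -> "Obama" but rarely the reverse)
# can't live on this path alone.
assert all(bigram_logits[i][j] == bigram_logits[j][i]
           for i in range(3) for j in range(3))
```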

Attention + MLPs add nonlinearity that partially compensates, but it still means less expressivity.

Which is why tied embeddings aren't used in SOTA models, but are useful in smaller ones.




This is probably true and not emphasized enough. Many people are forming their value judgements based on token limits (which is fair; it is expensive).


Appreciate it! And agree. I added a longer comment above saying basically this.

Outside of the story at the start I intentionally tried to make it information dense. Would appreciate feedback!

The story itself is fine; it's more the trained response to LLM voice that causes problems.

Would appreciate feedback, see my comment above!

Which sections were you most put off by?

I used Claude to expand on my ideas for a few of the purely informational things, and for formatting, but this article is largely written by hand.

For example "Interface tests are the ability to know what's wrong and explaining it." is in hindsight a confusing sentence. Many such cases.


It's things like this "super impactful!!!" style it has:

> Enter Claude Code 2.0.

> The UX had evolved. The harness is more flexible and robust. Bugs are fixed. But that's all secondary.

It's OK for emphasis on some things, but when you see it on every blog, it's a bit much.

Plus, I dislike that everything is lists with LLMs, it's another thing that you just start seeing everywhere.


That section doesn't include a lick of AI writing. One tell (maybe?) is that I switch from past tense to present tense mid-sentence.

Either a) I sound like an LLM when I'm writing articles (possible) or b) Turing test AGI something something.

Lists point is fair, I did use Claude for formatting. Where did it put you off here?


There isn't a specific place, it's the general aesthetic. Maybe you do sound like an LLM :P I guess it's not unlikely to pick up some mannerisms from them when everyone is using them.

I guess I don't really mind whether an LLM was used or not; it's more that the style sounds very samey with everything else.


Good point. I'll consciously be trying to write more "out of distribution" now.

> a) I sound like an LLM when I'm writing articles (possible) or b) turing test AGI something something.

We've entered the machinable culture. We spent many years trying to make the machine mimic humans; now humans are mimicking the machine :)


Try Cursor Composer! It's the most natural transition. Exactly what you're currently doing, but it inserts the code snippets for you from within your IDE.
