Hacker News | shpongled's comments

I looked through their torch implementation and noticed that they are applying RoPE to both query and key matrices in every layer of the transformer - is this standard? I thought positional encodings were usually just added once at the first layer

No, they’re usually applied at each attention layer.

Do you know when this was introduced (or which paper)? AFAIK it's not that way in the original transformer paper, or BERT/GPT-2

All the Llamas have done it (well, 2 and 3, and I believe 1, I don't know about 4). I think they have a citation for it, though it might just be the RoPE paper (https://arxiv.org/abs/2104.09864).

I'm not actually aware of any model that doesn't do positional embeddings on a per-layer basis (excepting BERT and the original transformer paper, and I haven't read the GPT2 paper in a while, so I'm not sure about that one either).


Thanks! I'm not super up to date on all the ML stuff :)

Should be in the RoPE paper. The OG transformers used additive sinusoidal embeddings (added to the token embeddings once, at the input), while RoPE applies a pairwise rotation to the queries and keys.
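For anyone who wants to see the pairwise rotation concretely, here is a minimal pure-Python sketch. It is not the implementation from the repo being discussed; the function names `rope` and `score`, the example vectors, and the `base=10000.0` default are assumptions chosen for illustration. It rotates each consecutive pair of features by an angle proportional to the token position, and it shows why the rotation is applied to q and k inside attention: the resulting dot product depends only on the relative offset between the two positions.

```python
import math

def rope(vec, pos, base=10000.0):
    """Rotate consecutive feature pairs of `vec` by angles proportional to `pos`.

    `vec` must have even length. Pair number i/2 uses frequency base**(-i/dim).
    """
    dim = len(vec)
    out = []
    for i in range(0, dim, 2):
        theta = pos * base ** (-i / dim)  # one rotation frequency per pair
        c, s = math.cos(theta), math.sin(theta)
        a, b = vec[i], vec[i + 1]
        # Standard 2D rotation of the pair (a, b) by theta.
        out += [a * c - b * s, a * s + b * c]
    return out

def score(q, k, q_pos, k_pos):
    """Unscaled attention logit for one query/key pair, with RoPE on both."""
    q_rot = rope(q, q_pos)
    k_rot = rope(k, k_pos)
    return sum(a * b for a, b in zip(q_rot, k_rot))

# Hypothetical query/key vectors for a single head with 4 features.
q = [0.3, -1.2, 0.7, 0.5]
k = [1.0, 0.4, -0.6, 0.9]

# Both pairs of positions are 3 apart, so the two logits agree
# up to floating-point error.
near = score(q, k, 5, 2)
far = score(q, k, 105, 102)
print(abs(near - far) < 1e-9)
```

Because the position enters only through this rotation of q and k, there is nothing to add to the residual stream at the first layer; each attention layer has to rotate its own queries and keys, which is why it shows up in every layer.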

There's also NoPE, I think SmolLM3 "uses NoPE" (aka doesn't use any positional stuff) every fourth layer.


This is normal. RoPE was introduced after BERT/GPT-2.

I think you cropped out the important part of the quote:

> It’s rare that I have more than a drink or two in one night.

I don't drink that often any more, but 2-3 drinks in a night, done occasionally is not a problem. I've had weeks where I drink a beer (or two!) every night, and also don't struggle with any alcohol problems.

2 drinks every single night? Leaning that way - and not great for you just from a health/caloric perspective.


I always wonder why people would make such obvious selective edits that completely change the meaning of a sentence and quote it as if it was what the author intended.

Do they not think people will notice? Or do they not notice that they've even done it?


Maybe they got really excited while reading...


I would pay $5000 to never have to read another LLM-authored piece of text ever again.


One possibility is trying a different form of cardio. I personally don't enjoy running at all... but I love cycling. Running for 30 minutes is super boring, but I can go do a 4-hour ride no problem. If you can't go outside at all, then this won't really help you though.


Same here: I discovered the fun of rollerblading in skate parks at 34.

I never did any sport in my whole life and am officially obese, but now I’m taking a group class at a skatepark every week, and I’m having so much fun that I’m forcing myself to do more sessions even when I don’t feel like it. Even though I’m still pretty "bad" at it, it’s amazingly liberating.

I guess you just have to find your thing?


> If you can't go outside at all, then this won't really help you though.

If going outside is not an option, stationary bicycles are a thing, though there won't be any nice outdoor scenery to go with your cycling.


My best cardio is without a doubt the stair machine, while doing Anki with headphones on and a small Bluetooth gamepad in hand.


"It doesn't get any easier, you just get faster"

- Greg LeMond


N,N-DMT is very intense and not to be taken lightly - but you could say the same of LSD, psilocybin, etc. Personally, I am much more wary of large doses of LSD/psilocybin than DMT, in part due to the substantially longer duration of the former. Ego death and the complete dissolution of reality make it harder to have a bad trip.


I'd generally agree with you, but:

  > substantially longer duration of the former
When time stops until the end of this universe gives way to the beginning of this universe and the snake eats its tail, "longer" doesn't hold much meaning...


True, I should have qualified "actual" duration, not perceived duration!


user name checks out


Yep, I would love anonymous record types, à la Standard ML/OCaml


Most car "accidents" are intentional - drivers are choosing to speed (35%), drive intoxicated (30%), or scroll TikTok (15%) rather than pay attention to the road.


Interesting, could all those things be prevented using modern tech? E.g. eye tracking

(Not abruptly stopping the car because the driver is looking at TikTok, but ... Fines? Limiting speed? Withdrawn license, if repeated?)

And building more railways, subways! Also creates jobs.

Edit: Where did you get the numbers from? I googled for "solo car crash reason statistics", and sleepiness seemed to be one main reason (too?)


I just googled "car fatalities [alcohol/distracted]" etc, and found either NHTSA or California statistics. All of them could be prevented using modern tech, but considering that we can't even get speed cameras socially/legally accepted, I don't see it happening. We just don't have a culture of caring about this stuff in the US, despite >40,000 people dying each year in traffic crashes.

[1] https://www.nhtsa.gov/risky-driving/drunk-driving

[2] https://www.nhtsa.gov/risky-driving/speeding

[3] https://www.nhtsa.gov/risky-driving/distracted-driving


Firearms are a constitutionally enshrined right - driving is not. For the vast majority of Americans, cars represent a significantly higher threat than assault by gun [1]. We also let drivers flagrantly and repeatedly break the law and negligently kill people with essentially near-impunity. The same cannot be said for firearms.

[1] https://injuryfacts.nsc.org/all-injuries/preventable-death-o...


Writing my own "OS" (read: mostly copying from the OG Bran's Kernel Development Tutorial [1]) was also a formative experience for me as a teenager. Really great way to learn systems programming and what goes on under the hood!

[1] http://www.osdever.net/bkerndev/Docs/gettingstarted.htm

