I looked through their torch implementation and noticed that they are applying RoPE to both query and key matrices in every layer of the transformer - is this standard? I thought positional encodings were usually just added once at the first layer
All the Llamas have done it (well, 2 and 3, and I believe 1, I don't know about 4). I think they have a citation for it, though it might just be the RoPE paper (https://arxiv.org/abs/2104.09864).
I'm not actually aware of any model that doesn't do positional embeddings on a per-layer basis (excepting BERT and the original transformer paper, and I haven't read the GPT2 paper in a while, so I'm not sure about that one either).
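For anyone wondering why RoPE ends up inside every attention layer rather than being added once to the input: it is a rotation applied to the query and key vectors just before their dot product, so each layer's attention has to apply it itself. Here is a minimal pure-Python sketch of that idea, not the code from any particular model. The function names (`rope`, `attention_score`) and the `base=10000.0` default are my own assumptions for illustration, and I pair adjacent dimensions as in the RoPE paper; real implementations may pair dimensions differently.

```python
import math

def rope(vec, pos, base=10000.0):
    """Rotate each adjacent pair (vec[i], vec[i+1]) by an angle proportional to pos.

    Hypothetical helper for illustration; `base` is an assumed default.
    """
    d = len(vec)
    assert d % 2 == 0, "RoPE needs an even head dimension"
    out = []
    for i in range(0, d, 2):
        # Pair j = i/2 uses frequency base**(-2j/d) = base**(-i/d).
        theta = pos * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out += [x * c - y * s, x * s + y * c]
    return out

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attention_score(q, k, q_pos, k_pos):
    """Unnormalized attention logit with RoPE applied to both query and key."""
    return dot(rope(q, q_pos), rope(k, k_pos))

q = [0.3, -1.2, 0.7, 0.5]
k = [1.1, 0.4, -0.6, 0.9]

# Shifting both positions by the same offset leaves the score unchanged
# (up to floating-point error), because only q_pos - k_pos matters.
near = attention_score(q, k, 5, 2)
far = attention_score(q, k, 105, 102)
```

Since the rotation only touches q and k (not the values or the residual stream), nothing positional is carried forward explicitly, which is why each layer redoes it.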
I think you cropped out the important part of the quote:
> It’s rare that I have more than a drink or two in one night.
I don't drink that often anymore, but 2-3 drinks in a night, done occasionally, is not a problem. I've had weeks where I drink a beer (or two!) every night, and also don't struggle with any alcohol problems.
2 drinks every single night? Leaning that way - and not great for you just from a health/caloric perspective.
I always wonder why people would make such obvious selective edits that completely change the meaning of a sentence and quote it as if it was what the author intended.
Do they not think people will notice? Or do they not notice that they've even done it?
One possibility is trying a different form of cardio. I personally don't enjoy running at all... but I love cycling. Running for 30 minutes is super boring, but I can go do a 4-hour ride no problem. If you can't go outside at all, then this won't really help you though.
Same here: I discovered the fun of rollerblading in skate parks at 34.
Never did any sport in my whole life, officially obese, but now I’m taking a collective course in a skatepark every week and I’m having so much fun that I’m forcing myself to do more sessions even when I don’t feel like it. And even if I’m still pretty "bad" at it, it’s just amazingly liberating.
N,N-DMT is very intense and not to be taken lightly - but you could say the same of LSD, psilocybin, etc. Personally, I am much more wary of large doses of LSD/psilocybin than DMT, in part due to the substantially longer duration of the former. Ego death and the complete dissolution of reality make it harder to have a bad trip.
When time stops until the end of this universe gives way to the beginning of this universe and the snake eats its tail, "longer" doesn't hold much meaning...
Most car "accidents" are intentional - drivers are choosing to speed (35%), drive intoxicated (30%), or scroll tiktok (15%) rather than pay attention to the road.
I just googled "car fatalities [alcohol/distracted]" etc, and found either NHTSA or California statistics. All of them could be prevented using modern tech, but considering that we can't even get speed cameras socially/legally accepted I don't see it happening. We just don't have a culture of caring about this stuff in the US, despite >40,000 people dying each year in traffic crashes.
Firearms are a constitutionally enshrined right - driving is not. For the vast majority of Americans, cars represent a significantly higher threat than assault by gun [1]. We also let drivers flagrantly and repeatedly break the law and negligently kill people with near-impunity. The same cannot be said for firearms.
Writing my own "OS" (read: mostly copying from the OG Bran's Kernel Development Tutorial [1]) was also a formative experience for me as a teenager. Really great way to learn systems programming and what goes on under the hood!