MIT 6.S191: Recurrent Neural Networks, Transformers, and Attention [video]

matt_langston · on April 2, 2023

I wonder why that parameter is called "h"? Hmmm ...

Why not just say the word "hysteresis" and bring some magnets to class for show-and-tell to help your students develop an intuition for the "h" parameter in RNNs.

tysam_and · on April 2, 2023

Like jimsimmons said below, I believe it traditionally refers to 'hidden'. which was in vogue at the time for both feedforward nets and RNNs as well as any other other neural networks in the 90's or so and on. This trend actually continued for a while and I learned it in one of Hintons' main online classes which was made somewhere between 2012-2015 or so IIRC (though I opted to switch to reading and trying to implement raw papers instead as my brain works intuitively strangely, on the whole).

You can think of it as everything the RNN knows about what you're doing and a thing that evolves from place to place as you go. Because it is iterated on itself as a map, it abides by some very interesting properties that let it represent some very difficult functions, though actually attaining a representation of those functions is rather difficult indeed in my experience from what I've seen.

There are one or two rather successful projects trying to keep RNNs both alive and competitive with transformers. I think they do very well on the whole, though the transformers seem to have slightly improved parameter efficiency, generally speaking.

I hope this helps you with your question, please do let me know if you have any other follow up questions on this topic/matter. (: (: :) :)

kalimanzaro · on April 2, 2023

Hmm i read a tonne of RNN lit before 2020 and 'd never come across the term "hysteresis parameter" standing in for the hidden units. is it a recent trend? Google seem to suggest so

tysam_and · on April 2, 2023

I didn't mention anything at all about a hysteresis parameter.

jimsimmons · on April 2, 2023

Hidden state?

totetsu · on April 2, 2023

hysteresis is also important to understand for working with radio networking.

zxexz · on April 2, 2023

And for analog electronics in general!

ajdegol · on April 2, 2023

And for non linear forcing of plasmas… but it’s been many years since my phd

matt_langston · on April 2, 2023

As defined by the lecturer herself, "h(t) = h(t-1)", the very definition of hysteresis.

My point is that the lecturer missed a golden opportunity to give her students a natural intuition of "h" that they can see, feel and touch and that will serve them well for their entire careers.

The only thing "hidden" about "h" is that hysteresis is hidden in plain site in her lecture - maybe the lecturer did not know herself.

Neural networks have an undeserved reputation for being mysterious, and maybe that is partly due to a lack of basic physics knowledge.

lupire · on April 3, 2023

> As defined by the lecturer herself, "h(t) = h(t-1)", the very definition of hysteresis.

How is that a definition of hysteresis?

Hysteresis is when state is a function of previous state, not identical to previous state.

matt_langston · on April 3, 2023

It's just simplified pseudoscope using the lecturer's own notation from her slides to make my point.

The following is the lecturer's full TeX form if that helps:

h(t) = \tanh \left(h(t-1) W_{\text{hh}}^T+x(t) W_{\text{hx}}^T\right)

However, I don't want our readers to get distracted by line noise; h(t) = h(t - 1) makes my point.

woodson · on April 2, 2023

Back in the day, having taken some kind of statistical signal processing course would have been common before getting into neural networks. That would likely have covered a lot of intuitions.

ramraj07 · on April 2, 2023

Which is the best course or set of videos to learn the basics of neural networks and deep learning? Something that really gets the best explanation of things like backprop?

seer · on April 2, 2023

I’ve been watching https://karpathy.ai/zero-to-hero.html and they seem amazing so far, literally go to gpt from basic math and programming - he first codes the libs from scratch to show you how the internals work, and only then uses an off the shelf production lib.

nightski · on April 2, 2023

It's a great resource, but very focused on language models. I'd say if you are looking for a general overview of deep learning that fastai is fantastic.

quickthrower2 · on April 2, 2023

Yes this is a great course, on every level. I recommend to do the exercises after each lecture to cement the concepts too. The exercises are good too. Stretching but achievable.

j1br · on April 2, 2023

what do you mean by exercises are there any specific exercises or the ones who is solving while he's explaining the material?

quickthrower2 · on April 2, 2023

under the first video there is a link to a google colab document with exercises - different but similar to what is in the video.

joshvm · on April 2, 2023

http://cs231n.stanford.edu/

cs231n is still a really solid course, despite the more recent lecture videos not being publicly available.

armcat · on April 2, 2023

Not a course, but I would highly recommend Deep Learning with Python, by Francois Chollet, creator of Keras. Incredibly approachable book that covers everything from tensors and backprop, to mixed precision and multi gpu scaling, and includes time series, language, vision and audio in between

derangedHorse · on April 2, 2023

Andrew Ng's machine learning course on Coursera is still my favorite introduction to neural networks

tysam_and · on April 2, 2023

Karpathy's zero to hero series is excellent, and I really recommend it.

I also made a few repos that are geared around readability and being a good 'working code demonstration' of certain best-practices in neural networks. If you're like me and you grok code better than symbols, this could be a helpful adjunct as well if you're wanting to dig deep a bit.

https://github.com/tysam-code/hlb-CIFAR10 (convolutional networks and imaging stuff. it's fast. very fast.)

https://github.com/tysam-code/hlb-gpt (pruned-down base of nanoGPT with training-speed-focused changes built on top of it. Check out the 0.0.0 tag from the repo if you want the barest of bare-bones implementations. Thanks!)

Both of these implementations are pretty straightforward for what they do but CIFAR-10 has less dynamic scheduling and stuff so it might be easier to fit in your head. However, both are meant to be simple (and extremely hackable if you want to poke around and take apart some pieces/add different watchpoints to see how different pieces evolve, etc. I am partially inspired by, among many things, one of those see-through engine kits that I saw in a magazine growing up as a child that I thought was a very cool, dynamic, and hands-on way to just watch how the pieces moved in a difficult topic. Sometimes that is the best way that our brains can learn, though we are all different and learn best differently through different mediums in my experience).

Feel free to let me know if you have any specific questions and I'll endeavor to do my best to help you here. Welcome to an interest in the field!

I guess to briefly touch on one topic -- some people focus on the technical only first, like backprop, and though math is required heavily for more advanced research, I don't learn concepts very well through details only. Knowing that backprop is "Calculate the slope for the error in this high-dimensional space for how a neural network was wrong at a certain point, then take a tiny step towards minimizing the error. After N steps, we converge to a representation that is like a zip file of our input data within a mathematical function" is probably enough for 90-95% of the usecases you will do as a ML practitioner, if you do so. The math is cool but there are more important things to sweat over IMO, and I think messaging to the contrary raises the barrier to entry to the field and distracts from the important things, which we do not need as much. It's good to learn after you have space in your brain for it after you understand how the whole thing works together, though that is just my personal opinion after all.

Much love and care and all that and again feel free to let me know if you have any questions please. :) <3

ramraj07 · on April 2, 2023

Thanks, will check these out for sure. Already digging Karpathys code first approach in the first few hours!

tysam_and · on April 3, 2023

Yeah, he is great. Hard to really estimate the impact he's had on the deep learning community on the whole/at large.

We could always use more people sharing and spreading knowledge around like him! I hope to find a similar brand to that someday for myself, though I'm constantly growing into those boots, I think. :') <3

If you have any questions on that too feel free to let me know here and I can try to answer them. It's certainly a very interesting field! <3 :DDDD :)))) <3

hnarayanan · on April 2, 2023

Look for Karpathy’s CS231n lectures on YouTube. They are really good.

wslh · on April 2, 2023

Since the topic of this thread is this MIT course, Is this video great for you? I liked it.

kevmo314 · on April 2, 2023

Interesting that they're using TensorFlow. Is TensorFlow still common for new projects or is it because Google is sponsoring the class?

fnbr · on April 2, 2023

It is not common for new projects. The vast majority of new projects use PyTorch, with some using tensorflow and some using JAX.

rlt · on April 2, 2023

Can PyTorch be used to implement LLMs / GPT?

visarga · on April 2, 2023

PyTorch is the most used neural net framework in research. Almost everything appears in PyTorch first.

obscur · on April 2, 2023

Yes, see also the transformers library containing many transformer models in pytorch

tysam_and · on April 2, 2023

quite easily, yes

rattray · on April 3, 2023

Are these the sorts of Transformers and "Attention" heads that play a prominent role in LLMs like GPT3 et al? Would this lecture help build a foundational understanding of those technologies?

ramblerman · on April 2, 2023

Bit of a side tangent, but why does MIT upload (or allow the upload) of these videos under the staff member's youtube. Alexander Amini in this case.

It makes it hard to find and subscribe to. And also a bit weird from an ownership perspective.

For better or worse I think it's how Lex Fridman got his initial boost, I believe his personal youtube channel contained some popular MIT lectures of him at the start.

jessemcbride · on April 2, 2023

When I worked in higher ed, I learned that professors retain a lot more ownership of their content than I expected. I don’t know if this is how it works everywhere, but I wouldn’t be surprised if that was the case here.

joshvm · on April 2, 2023

In basically all cases course material is developed by professors and TAs, though the current lecturer might have adapted from several years of previous work. Who owns the IP exactly is tricky (likely the university has at least an equal stake?), but I expect that if lectures are recorded, but not open-access then it's the lecturer's decision. It might be copyright issues, re-use of someone else's slides, not wanting to let students see past years' work, etc.

For example, I work at ETH and we have a large internal video archive of lectures (as do many universities pre- and post-covid), but some lecturers choose to post material on YouTube too. It's not a blanket yes/no policy at the institutional level, as far as I'm aware.

fshbbdssbbgdd · on April 2, 2023

I thought MIT had to remove a huge catalogue of institutional content (under the opencourseware label) because of an ADA lawsuit. The problem was they didn’t hire people to subtitle it, which was discrimination.

Foobar8568 · on April 2, 2023

It was Berkeley, I don't think MIT was impacted.

schizo89 · on April 2, 2023

[flagged]

credit_guy · on April 2, 2023

> They have blood on their hands and history of silencing and oppressing dissidents such as Aaron Schwartz.

I guess what you have in mind is that Aaron Swartz was arrested by the MIT Campus Police (together with a Secret Service agent). This is not oppression by MIT. The MIT Campus Police received the arrest warrant from a federal prosecutor (Carmen Ortiz) and they needed to carry out the arrest. They did not have a choice in the matter.

If anything MIT was the opposite of what you portray them to be. They had an open campus policy at the time (they stopped that during Covid). Anyone could wander inside the campus and walk through the MIT buildings, and in some cases could access computers, which Aaron Swartz did in order to download articles. Actually, MIT helped Aaron in carrying out his mission.

As for MIT helping the US military, yes, they did and they do that. I'm proud of it. If you think the military is bad, you haven't been paying attention lately, especially in the last one year and a bit.

sudosysgen · on April 2, 2023

Just because other militaries are bad doesn't make the US military good. It's perfectly analogous in its wars of agression, and supporting it is morally bad, no matter how hard you whatabout it.

credit_guy · on April 2, 2023

Ukraine has great fighters and great spirit. But fighters and spirit can take you only that far. Without the Javelins and Himars, Ukraine would have collapsed in a matter of weeks. Without a doubt some of the tech that goes in those Javelins and Himars was invented at MIT.

This is not whataboutism. This is how you stop tyranny.

TylerLives · on April 2, 2023

Do you think the outcome of the war matters when evaluating the actions of the international community in Ukraine? For example, if millions of people die in Ukraine as a consequence of war that was only possible because NATO decided to arm Ukrainians, will it be justified, because they were fighting for democracy? What if Ukraine loses anyway?

sudosysgen · on April 2, 2023

The weapons you're talking about killed over a million people in tyrannical wars and terrorism. They are weapons of tyranny.

credit_guy · on April 2, 2023

I understand that you'd like to live in a world where everyone loved each other and there are no weapons, and people walk around carrying roses and tulips.

That world does not exist. The world we are living in is a world where Putin exists. A world where a disarmed US and Western Europe would be conquered in less time than it would take you to sing Kumbaya.

I prefer to live in a world where MIT helps the US military complex come up with better missiles than a world where Putin kills millions of men, women and children in the name of "denazification".

sudosysgen · on April 2, 2023

I'm telling you that your argument if it was valid would justify the people at the Moscow Institute of Technology (MIT) making weapons for the Russian military because they were used by {choose country criminally invaded by the US} to defend themselves against invaders.

Your position is hypocritical unless you think that it's morally good to make weapons for Putin.

neilv · on April 2, 2023

MIT has a lot of great students, staff, and faculty. The MIT Open Courseware and other publicly shared instruction videos are of great benefit to the world. Boycotting any of that sounds silly.

What happened to Aaron Swartz is tragic. Afterwards, there was an internal push to investigate what happened, and the report was made public. I think you'll find a lot of MIT affiliates who share good qualities with Swartz.

Marvin Minsky sadly died in 2016. People interested in the big picture and history of AI might do well to read some of his writings.

There are many valid criticisms, concerns, and opinions about MIT. But MIT is a big place, which attracts people from around the world, for various reasons. If you boycott people due to association with MIT, you're depriving yourself, and also depriving the world of the benefit of collaborations/synergies.

albntomat0 · on April 2, 2023

Might as well be complete, and filter out websites based on MIT developed programming languages

https://en.wikipedia.org/wiki/Hacker_News (Note the programming language, and find what it's based on)

https://en.wikipedia.org/wiki/Lisp_(programming_language)

schizo89 · on April 2, 2023

[flagged]

748 · on April 2, 2023

Your nickname is lit. We live in a world where we are not the only superpower. If we will stop developing capable tools for dealing with adversaries, adversaries will use that to their benefit and take advantage of our inferior toolings. Please try to understand what I am trying to say: no-one wants to have blood on their hands, but the reality is that we are not the ones who decide, whether it will be spilled, all we can do is make sure it is not going to be ours. And on a sidenote, the companies you mentioned have achieved some astonishing from the engineering point of view achievements, which I personally admire just because it is state of the art in so many ways.

flangola7 · on April 2, 2023

Someone can develop it, but it won't be me.

spiderPig · on April 2, 2023

That user name though

schizo89 · on April 2, 2023

Exactly, and it has something to do with Ken Kesey. Learn about what they did to Aaron, and you will see the parallels

kossTKR · on April 2, 2023

Exactly. Unfortunately hackernews these days is huge fans of the military and the richest classes without much question.

Psyops don't exist, research journalism is tinfoilhattery and all science is awesome science.

Question one state and you're apparently fan of another, it's become the perfect setup for the status quo.

In reality the US intelligentsia is tiny and unfathomably powerful, an octopus with its arms deep in everything "academia" and technology, this is apparently mind boggling even after a myriad of leaks over the last 30 years, and no one dares to do even the simplest network analysis because of an enormous amount of media controlled shitcoating.

There's absolutely nothing "hacker" in hackernews anymore.

KyeRussell · on April 2, 2023

Leave then.