naturalgradient's comments

The guy said 'allow me to disagree slightly' after complimenting her on her thread and she lashed out at him in a highly unprofessional manner.

Thinking you can interact with customers of your employer like this in public, even on a private account, is naive.

Trying to turn it into an oppressor/sexism shitshow is being a toxic employee and a liability.

Hard to see a scandal.


> They try hard to let in disadvantaged kids, but it's a crap shoot deciding who has potential and who doesn't. They rarely care about anything other than academics. (I'm not sure how they recruit the rowers.)

They don't try that hard.

Being vaguely involved with the Oxbridge undergrad admissions process in CS/Math, I can tell you there is very little trying. A fuss is made about coming from a disadvantaged background, but in practice, sadly, the people running it care about only one thing: how well you can grind out an answer to a math-olympiad-style question in 15 minutes. Yes, extracurriculars and well-roundedness don't matter, which I think is a good thing because I believe in focusing on being great at one thing.

What it comes down to nonetheless is preparation and school support, e.g. via training for math competitions. Saying the interviews are about 'evaluating the thinking process' of the applicant is a fantasy when most applicants come from schools where they have been trained for them for years. Oxbridge are not forthcoming about this, but ultimately they take people who are already well-groomed math olympiad winners, not raw potential.

It's probably still better than opaquely selecting for race and likeability, and if this means many math undergrads are Asian, why should that be a problem? It's still unfair to disadvantaged children and that sucks, but at least the criteria are clear.

PS: On your question of how they recruit rowers: they let them study Land Economy; that's the joke, at least.


I have interviewed maths applicants at Oxford, with the caveat that it was 20 years ago.

The thing that struck me was that a lot depends on the individual interviewers. I tried hard to look for potential, as did most of the interviewers I spoke to, but that can be easier said than done. The exam wasn't a big factor in the decision making, I found, unless somebody did really well or really badly, as the marks were all very bunched up and we couldn't see the scripts to see what people were getting right and wrong.

Also, the course is really hard for somebody who hasn't done double maths A-level, so if somebody only has single maths, you have to be pretty confident that they will be able to basically teach themselves further maths A-level. There is obviously some support from tutors with that, but the set-up is aimed at people with double maths.

They definitely don't only take groomed Maths Olympiad winners, and the questions are much easier than Olympiad questions (in my undergraduate year there were only two of us who had made it to the top 20 in the country in the British Maths Olympiad, for example - Cambridge filters them all off!)

I also don't think even the top schools train up for the interviews as much as you say - I went to a school that was top ten in the A-level league tables and only had one mock interview.


The questions in the admissions exam and interviews are fairly straightforward; they are nowhere near math olympiad level. Math olympiads normally require a host of advanced techniques, each of which is quite accessible but never gets taught in schools. Interview questions just need what gets taught, plus some understanding of what a proof is.


I am aware; PhD students design and mark these questions, and can interview :)

The point is that if you go to a target school doing math olympiads throughout your school life, the admissions exam and interview are a walk in the park. The applicants who didn't have any of this preparation can still do well but will fare relatively worse against that group, and I think this is very obvious in the ultimate intake.


It's incredibly hard (impossible?) to come up with a system which cannot be prepared for by a subset of students with access to top-tier tuition. You can do some score normalisation based on background, but I expect you'd agree that isn't going to help if people can't get through the first year's course content?

I don't believe anyone wants the course to be easier. So how do you enable people who are underprivileged to catch up with those who have been offered a shitload of additional tuition? How do you distinguish effectively between the two, whilst also accepting high-aptitude students?

It's not that I think any elite university does this brilliantly - but as someone with a decent level of exposure, do you have a feeling on whether there are complete solutions?


Are those interviews by PhD students in addition to, or a replacement for, interviews by proper dons?


Each applicant receives multiple interviews, and I can perform one of these as a PhD student, having received the same training any faculty member would have.

The interview process is relatively standardized, and if my results were to differ starkly from those of more experienced interviewers, they would be disregarded, and the director of studies for that college would simply not invite me back to help.

I would add that asking PhD students to do this is not the worst thing because, via supervisions and other teaching efforts, we have a good picture of what undergrads here need to be able to do. An interview is like a short supervision.


Thanks - that actually sounds fairly sensible.


> Being vaguely involved with the Oxbridge undergrad admissions process in CS/Math, I can tell you there is very little trying. A fuss is made about coming from a disadvantaged background, but in practice, sadly, the people running it care about only one thing: how well you can grind out an answer to a math-olympiad-style question in 15 minutes.

It varies a lot between colleges/interviewers. I initially interviewed at Trinity (Cambridge), where there was an admissions exam (roughly STEP I/II level) and the interview was solely focused on that. I was then pooled to Selwyn, where there was no admissions exam and the interview was much more general, split into three parts:

- NatSci DoS: barely even about the subject, more about yourself generally and why Cambridge.

- CompSci DoS: more general questions; it was more about coming up with an idea, or how you would implement something, and broader CS concepts, than being able to chase down a specific answer to a set question.

- Physics DoS: focused around a particular question, but more interested in you being able to come up with a viable method and explain your reasoning than being able to do every single step there and then (I needed _a lot_ of hints, and was still admitted for CompSci; that DoS later said it wasn't an issue - I could see the broader idea, and would have got there with time/being able to look things up).

Someone who interviewed directly with Selwyn said they had the same experience.


> What it comes down to nonetheless is preparation and school support, e.g. via training for math competitions. Saying the interviews are about 'evaluating the thinking process' of the applicant is a fantasy when most applicants come from schools where they have been trained for them for years.

I'm from the U.S., so take my input with a grain of salt with regards to applying it to the English school system, but in my experience the majority of mathematical talent comes from outside work, with next to zero support from schools.


That's because you didn't go to an elite math school.


What kind of school are you referring to when you mention “elite math school”?


There are schools in various countries, e.g. Romania, which are known for producing extremely well-prepared applicants. They select for these schools from the entire country and have the students focus on sciences, math, and computer science early on. They practice Oxbridge-style interview/math olympiad questions to death.

This results in, for instance, there being proportionally many more Romanians in Oxbridge CS/Math/Physics than you'd expect by population.


Ok, sure, but there are very few schools like that in the U.S. outside of homeschools.


Most recruited rowers do one of the cash-cow master's courses.


Just pointing out that, interestingly, Cambridge, where Fokas is a professor, has not released anything.

He is merely visiting USC, so it strikes me as weird that they would claim this PR so quickly.

Also, 'Mathematician-MD' somehow makes it sound like the MD means he is a lesser mathematician, or not a full mathematician. Fokas is a well-respected professor at one of the top applied maths departments in the world. A better and less biased title would be 'Math professor claims...' or 'Cambridge math professor claims...'.


When you actually read it, being a mathematician and an MD is more impressive, not less. The interesting thing about him is his cross disciplinary knowledge. I don't see the insult.


Isn't MD for, like, medical doctors?

If he's an MD but without a medical degree (which I guess is the case), then what's the difference between MD, PhD, or 'professor'?


If you inspect [0], you'll find that his MD was, in fact, in medicine. What came before that, though, was a PhD in applied math from Caltech, and that immediately after receiving his MD he became chair of Clarkson University's Math and Computer Science department.

Mentioning his MD is a distraction and, as the previous poster commented, suggests that he's something of an amateur. This is very far from the case.

[0] http://www.damtp.cam.ac.uk/user/tf227/


So this goes off on a tangent but I feel it relates to noncentrality [0]. Fokas has a PhD in maths. Being an MD or having gotten an MD 40 years ago is clearly entirely non-central to his career. Calling him Mathematician-MD seems like it is meant to make him seem a lesser mathematician, e.g. by insinuating that this is just something he does part time, and that he can hence be taken less seriously.

I don't know what the poster meant by suggesting 'Mathematician-MD', but it reads weirdly to me for that reason. It's highlighting an attribute of a person that is entirely unrelated to his career or this article. Why, if not to denigrate him? The title should be changed to neutrally reflect his position.

[0] https://www.lesswrong.com/posts/yCWPkLi8wJvewPbEp/the-noncen...


I actually believe the intent is the opposite.

I think the PR people are just trying to sell Fokas as a polymath genius. "Look, he is not only a mathematician but also an MD, wow!"


Thank you, I had not considered that. I think researchers or an HN audience may perceive this differently than the average reader.


FWIW: my initial instinctive reaction to the MD part was most definitely negative.


An MD is a medical degree. It's possible to have an MD but not practice medicine professionally, in the same way someone can earn a law degree (a JD) but choose not to be a practicing lawyer. "Professor" is an academic title for people at a university or advanced teaching institution. Most professors do have terminal degrees in their respective fields, like PhDs, MDs, or JDs, so you could also call a professor with a PhD "Dr. X" instead of "Prof. X". But "Professor" is generally considered the more prestigious title, since it's much rarer and harder to get a professorship than to earn an advanced degree.


I think you should refrain from making such comments and follow the best-intention assumption on HN.


I would just like to comment that while this is true in principle, it's also slightly misleading, because it does not include how much tuning and testing is necessary until one gets to this result.

Determining the scale needed, fiddling with the state/action/reward model, massively parallel hyper-parameter tuning.

I may be overestimating, but I would reckon that with hyper-parameter tuning and all, this was easily in the 7-8 figure range at retail cost.

This is slightly frustrating in an academic environment when people tout results for just a few days of training (even with much smaller resources, say 16 GPUs and 512 CPUs) when the cost of getting there is just not practical, especially for timing reasons. E.g. if an experiment runs 5 days, it doesn't matter that it doesn't use large-scale resources, because realistically you need 100s of runs to evaluate a new technique and get it to the point of publishing the result, so you can only do that on a reasonable time scale if you actually have at least 10x the resources needed to run it.
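A rough back-of-envelope with made-up but representative numbers (a sketch, not figures from any paper):

    # Illustrative numbers only: wall time to evaluate a new technique end to end.
    runs_needed   = 200   # sweeps x seeds x ablations; easily in the hundreds
    days_per_run  = 5     # single-experiment wall time
    parallel_runs = 10    # i.e. having 10x the resources one run needs
    print(runs_needed * days_per_run / parallel_runs, "days")  # -> 100.0 days

Even with 10x the resources, that's months of wall time per technique.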

Sorry, slightly off topic, but it's becoming a more and more salient point from the point of view of academic RL users.


I hear you. I would say that this work is tantamount to what would normally be a giant NSF grant.

Depending on your institution, this is precisely why we (and other providers) give out credits though. Similar to Intel/NVIDIA/Dell donating hardware historically, we understand we need to help support academia.


Yes, thank you for that, by the way; I did not want to diminish your efforts. Just wanted to point out that papers are often misleading about how many resources are needed to get to the point of running the result. I have received significant amounts of money from Google, full disclosure.


That's so awesome. Thanks for the exchange you two had. I love seeing the technology permeate through its different causeways to become a useful and tangible product for more and more people. It's a thing of beauty to watch unfold each and every time, to me.


This is a very good point. While the final model might be a weekend of training, getting there takes a lot more iterations/work.


So as someone working in reinforcement learning who has used PPO a fair bit, I find this quite disappointing from an algorithmic perspective.

The resources used for this are almost absurd, and my suspicion is, especially considering [0], that this comes down to an incredibly expensive random search in the policy space. Or rather, I would want to see a fair bit of analysis to be convinced otherwise.

Especially given all the work in intrinsic motivation, hierarchical learning, subtask learning, etc., the sort of intermediate summary of most of these papers from 2015-2018 is that so many of these newer heuristics are too brittle/difficult to make work, so we resort to slightly-better-than-brute-force.

[0] https://arxiv.org/abs/1803.07055


(I work at OpenAI on the Dota team.)

Dota is far too complex for random search (and if that weren't true, it would say something about human capability...). See our gameplay reel for an example of some of the combos that our system learns: https://www.youtube.com/watch?v=UZHTNBMAfAA&feature=youtu.be. Our system learns to generalize behaviors in a sophisticated way.

What I personally find most interesting here is that we see qualitatively different behavior from PPO at large scale. Many of the issues people pointed to as fundamental limitations of RL are not truly fundamental, and are just entering the realm of practical with modern hardware.

We are very encouraged by the algorithmic implication of this result — in fact, it mirrors closely the story of deep learning (existing algorithms at large scale solve otherwise unsolvable problems). If you have a very hard problem for which you have a simulator, our results imply there is a real, practical path towards solving it. This still needs to be proven out in real-world domains, but it will be very interesting to see the full ramifications of this finding.


Thank you for taking the time to respond, I appreciate it.

Well, I guess my question regarding the expense comes down to wondering about the sample efficiency, i.e. are there not many games that share largely similar state trajectories that can be re-used? Are you using any off-policy corrections, e.g. IMPALA-style?

Or is that just a source of noise that is too difficult to deal with, and/or is the state space so large and diverse that that many samples are really needed? Maybe my intuition is just way off; it just feels like a very, very large sample size.

Reminds me slightly of the first version of the non-hierarchical TensorFlow device placement work, which needed a fair number of samples, and the large sample-efficiency improvement in the subsequent hierarchical placer. So I recognise there is large value in knowing the limits of a non-hierarchical model now, and subsequent models should rapidly improve sample efficiency by doing similar task decomposition?


The best way we know to think of it is in terms of variance of the gradient.

In a hard environment, your gradients will be very noisy — but effectively no more than linear in the duration you are optimizing over, provided that you have a reasonable solution for exploration. As you scale your batch size, you can decrease your variance linearly. So you can use good ol' gradient descent if you can scale up linearly in the hardness of the problem.

This is a handwavy argument admittedly, but seems to match what we are seeing in practice.
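As a toy numerical sketch of the batch-size point (my illustration, not our actual training setup): averaging n independent noisy gradient samples shrinks the variance of the estimate roughly like 1/n.

    import numpy as np

    # Toy model: each "gradient sample" is the true gradient plus heavy noise.
    rng = np.random.default_rng(0)
    true_grad, noise_std = 1.0, 10.0
    for n in [1, 10, 100, 1000]:
        # 10000 independent batches of size n; the batch means have
        # variance ~ noise_std**2 / n.
        means = rng.normal(true_grad, noise_std, size=(10000, n)).mean(axis=1)
        print(n, round(means.var(), 3))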

Simulators are nice because it is possible to take lots of samples from them — but there's a limit to how many samples can be taken from the real world. In order to decrease the number of samples needed from the environment, we expect that ideas related to model-based RL — where you spend a huge number of neural network flops to learn a model of the environment — will be the way to go. As a community, we are just starting to get fast enough computers to test out ideas there.


Yo, this probably isn't the type of HN comment you're used to, but I just wanted to say thanks for enriching the dota community. I know that's not really why you're doing any of this, but as someone who's deeply involved with the community, people get super hyped about what you guys have been doing.

They also understand all of the nuances, similar to HN. Last year when you guys beat Arteezy, everyone grokked that 5v5 was a completely different and immensely difficult problem in comparison. There's a lot of talent floating around /r/dota2, amidst all the memes and silliness. And for whatever reason, the community loves programming stories, so people really listen and pay attention.

https://imgur.com/Lh29WuC

So yeah, we're all rooting for you. Regardless of how it turns out this year, it's one of the coolest things to happen to the Dota 2 scene, period! Many of us grew up with the game, so it's wild to see our little mod suddenly be a decisive factor in the battle for worldwide AI dominance.

Also 1v1 me scrub


Agreed! Can't wait to not have to play Dota 2 with humans :p


> Also 1v1 me scrub

I wanted to play SF against the bot so badly - even knowing I'd get absolutely destroyed over and over again.


EDIT (I work at OpenAI and wrote the statement about the variance of the gradient being linear): Here's a more precise statement: the variance is exponential in the "difficulty" of the exploration problem. The harder the exploration, the worse the gradient. So while it is correct that things become easy if you assume that exploration is easy, the more correct way of interpreting our result is that the combination of self-play and our shaped reward made the gradient variance manageable at the scale of the compute that we've used.


> In order to decrease the number of samples needed from the environment, we expect that ideas related to model-based RL — where you spend a huge number of neural network flops to learn a model of the environment — will be the way to go.

Will those models be introspectable / transferable? One thing I'm curious about is how AIs learn about novel actions / scenarios which are "fatal" in the real world. Humans generally spend a lot of time being taught these things (rather than finding out for themselves, obviously) and eventually come up with a fairly good set of rules about how not to die in stupid ways.


Transferability depends on the way the model is set up, and moves on a scale.

Introspectable: given that you can ask models unlimited "what if" questions, we should be able to get a lot of insight into how they work internally. And you can often design them to be introspectable at some performance or complexity cost (if that's what you meant by introspectable).


Can you clarify why variance only scales linearly in the duration you are optimizing over? I would have expected it to be exponential, since the size of the space you are searching is exponential in the duration.


Re variance, the argument is not entirely bulletproof, but it goes like this: we know that the variance of the gradient of ES grows linearly with the dimensionality of the action space. Therefore, the variance of the policy gradient (before backprop through the neural net) should similarly be linear in the dimensionality of the combined action space, which is linear in the time horizon. And since backprop through a well-scaled neural net doesn't change the gradient norm too much, the absolute gradient variance of the policy gradient should be linear in the time horizon also.

This argument is likely accurate in the case where exploration is adequately addressed (for example, with a well chosen reward function, self play, or some kind of an exploration bonus). However, if exploration is truly hard, then it may be possible for the variance of the gradient to be huge relative to the norm of the gradient (which would be exponentially small), even though the absolute variance of the gradient is still linear in the time horizon.
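As a quick Monte Carlo illustration of the first claim (my toy setup, not anything rigorous: a linear objective evaluated at a point where f(x) = 0, so only the noise term remains), the summed variance of the ES estimator g = f(x + sigma*eps) * eps / sigma grows roughly linearly with the dimension d:

    import numpy as np

    rng = np.random.default_rng(0)
    trials = 10000
    for d in [10, 100, 1000]:
        w = np.ones(d) / np.sqrt(d)       # f(x) = w.x, with ||w|| = 1
        eps = rng.normal(size=(trials, d))
        g = (eps @ w)[:, None] * eps      # x = 0, so the sigmas cancel
        print(d, round(g.var(axis=0).sum(), 1))  # total variance ~ d + 1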



That makes sense, thanks for clarifying!


> Dota is far too complex for random search

Why? We know that random search is smart enough to find a solution if given arbitrarily large computation. So it is not obvious that random search is not smart enough for Dota with the computational budget you used. Maybe random search would work with 2x your resources? Maybe something slightly smarter than random search (simulated annealing) would work with 2x your resources?

> and if that weren't true, it would say something about human capability

No it would not. A human learning a game by playing a few thousand games is a very different problem than a bot using random search over billions of games. The policy space remains large, and the human is not doing a dumb search, because the human does not have billions of games to work with.

> See our gameplay reel for an example of some of the combos that our system learns

> Our system learns to generalize behaviors in a sophisticated way.

You're underestimating random search. It's ironic, because you guys did the ES paper.


> If you have a very hard problem for which you have a simulator, our results imply there is a real, practical path towards solving it.

Are there that many domains for which this is relevant?

Game AI seems to be the most obvious case and, on a tangent, I did find it kind of interesting that DeepMind was founded to make AI plug and play for commercial games.

But unless Sim-to-Real can be made to work it seems pretty narrow. So it sort of seems like exchanging one research problem (sample-efficient RL) for another.

Not to say these results aren't cool and interesting, but I'm not sold on the idea that this is really practical yet.


Simulation-to-real learning seems to be slowly and steadily improving? E.g. as seen in https://ai.googleblog.com/2017/10/closing-simulation-to-real...

Transfer learning, which seems more widely researched, has also been making progress at least in the visual domain.


There seems to be a bunch of work in this area, but I have no idea how you measure progress; it's not like you can do evaluations on a shared task.

And it's clearly not solved yet either - 76% grab success doesn't really seem good enough to actually use, and that with 100k real runs.

I don't really know how to compare the difficulty of sim-to-real transfer research to sample efficient RL research, and it's good to have both research directions as viable, but neither seems solved, so I'm not really convinced that "just scaling up PPO" is that practical.

I'm hoping gdb will be able to tell me I'm missing something though.


>> Our system learns to generalize behaviors in a sophisticated way.

Could you elaborate? One of the criticisms of RL and statistical machine learning in general is that models generalise extremely poorly, unless provided with unrealistic amounts of training data.


Why Dota and not something like adverse-weather helicopter flying, which is more "useful"?


If I had to guess, I would say that Dota is a very complex environment, akin to real-world complexity, that is simulatable to the point that the simulation and the real game work identically. The real world isn't nearly as clean. However, as we get better and better at these "toy" examples, we likely could learn more efficiently on real-world problems.


I think the "simple random search" algorithm in the paper you linked is not so simple -- it's basically using numerical gradient descent with a few bells and whistles invented by the reinforcement learning community in the past few decades. So perhaps it would be more fair to say that gradient descent (not random search) has proven to be a pretty solid foundation for model-free reinforcement learning.


Yes, I am aware; I did not mean random search as in random actions, but random search with improved heuristics to find a policy.

The point being that the bells and whistles of PPO and other relatively complicated algorithms (e.g. Q-PROP), namely the specific clipped objective, subsampling, and a (in my experience) very difficult to tune baseline using the same objective, do not significantly improve over gradient descent.

And I think Ben Recht's arguments [0] expand on that a bit in terms of what we are actually doing with policy gradient (not using a likelihood ratio model like in PPO), but still conceptually similar enough for the argument to hold.

So I think it comes down to two questions: how much do 'modern' policy gradient methods improve on REINFORCE, and how much better is REINFORCE really than random search? The answer thus far seemed to be: not that much better, and I am trying to get a sense of whether this was a wrong intuition.

[0] http://www.argmin.net/2018/02/20/reinforce/
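To make that comparison concrete, here is a minimal REINFORCE sketch on a toy bandit (my own illustration, nowhere near Dota scale; all numbers are made up):

    import numpy as np

    # REINFORCE with a softmax policy on a 5-armed Gaussian bandit.
    rng = np.random.default_rng(0)
    true_means = np.array([0.1, 0.3, 0.5, 0.2, 0.9])
    theta = np.zeros(5)                     # policy logits
    lr, baseline = 0.1, 0.0
    for _ in range(5000):
        probs = np.exp(theta - theta.max())
        probs /= probs.sum()
        a = rng.choice(5, p=probs)
        r = rng.normal(true_means[a], 1.0)  # noisy reward
        baseline += 0.01 * (r - baseline)   # running-average baseline
        grad_logp = -probs                  # softmax: d log pi(a)/d theta = e_a - probs
        grad_logp[a] += 1.0
        theta += lr * (r - baseline) * grad_logp
    print(probs.round(3))                   # mass should concentrate on the 0.9 arm

Random search over the same five logits would also find the best arm here; the question in the thread is how much faster the gradient signal gets you there as the policy grows.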


When optimizing high-dimensional policies, the gap in sample complexity between PPO (and policy gradient methods in general) and ES / random search is pretty big. If you compare the Atari results from the PPO and ES papers from OpenAI, PPO after 25M frames is better than ES after 1B frames. In these two papers, the policy parametrization is roughly the same, except that ES uses virtual batchnorm. For DOTA, with a much bigger policy, I'd expect the gap between ES and PPO to be much bigger than for Atari.

My takeaway from [0] and Rajeswaran's earlier paper is that one can solve the MuJoCo tasks with linear policies after appropriate preprocessing, so we shouldn't take them too seriously. That paper doesn't do an apples-to-apples comparison between ES and PG methods on sample complexity.

All of that said, there's not enough careful analysis comparing different policy optimization methods.

(Disclaimer: I am an author of PPO)


Me too. At work (I'm a PhD student) I take enormous pleasure in writing end-to-end, fully automated pipelines. Pretty much everything that needs to be done more than once is automated. At home, I would never buy or use an Alexa/Google Assistant/Siri or any 'smart' device; I simply don't want to bother with it.


Exactly. Anything that needs even very minimal human interaction, like pressing a "y" key, is not automated at all. It is very common in corporate environments to see scripts that need some editing by hand in an editor partway through their run, which could easily be automated using sed.
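As a hedged sketch of that kind of fix (the file name and key are made-up examples; the sed equivalent would be a one-liner like sed -i 's/^retries=.*/retries=5/' job.conf):

    import pathlib, re

    # Replace the manual mid-run editor step: bump a config value in place.
    cfg = pathlib.Path("job.conf")   # hypothetical config file; must already exist
    text = cfg.read_text()
    cfg.write_text(re.sub(r"^retries=.*$", "retries=5", text, flags=re.M))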


An interesting effect I have noticed in my own research is that even if it only takes one extra click or one changed parameter to generate a plot or run some extra analysis, I will only do it selectively, and start rationalising to myself that I know when a certain plot or analysis will help and when it won't.

This becomes especially egregious over month to year long experiments where I run the same experiment every day on end.

There was really no reason not to auto-generate every possible plot and every possible analysis every time (and I cannot use IPython notebooks or things like that, because it's many distributed things chained together with lots of scheduling).

The productivity gains have been enormous and are hard to overstate. I don't dread any experiment any more, because even in a large, complicated distributed setup, everything from initialising Kerberos tickets to generating tons of config files, restarting services, running multiple experiments dependent on each other, generating plots and summaries, and committing them to a repo is one command. Anything that's analysed once is evaluated always.

I now almost look forward to setting up new experiments because of the pleasure I get from just chaining together calls from my control utilities.

All I have to do is pull on my laptop, and all the results are there, pre-generated and paper-ready. I think a lot of people do this in experiments where everything is on a single machine, but I haven't seen it done this extensively by other PhD students doing complicated distributed stuff. There is always a lot of manual command-line argument passing, manually changing some config instead of just creating dedicated scripts, etc.
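For illustration, a hedged sketch of what that single command can look like (every script name here is a placeholder for whatever your setup actually needs):

    import subprocess

    # Chain every step so no human is in the loop; fail fast on any error.
    steps = [
        ["kinit", "-R"],                       # renew the Kerberos ticket
        ["python", "generate_configs.py"],     # hypothetical: write all config files
        ["python", "launch_experiments.py"],   # hypothetical: schedule dependent runs
        ["python", "make_plots.py", "--all"],  # hypothetical: regenerate every figure
        ["git", "add", "-A"],
        ["git", "commit", "-m", "auto: refresh results"],
    ]
    for cmd in steps:
        subprocess.run(cmd, check=True)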


And as an added bonus, your experiments are more reproducible. Kudos!


I came here to post this!

And of course the (sort-of) French equivalent about life in modern times: In Search of Lost Time.


This is sort of what TheReportOfTheWeek ('reviewbrah' [0]) is doing, no? Reviewing fast food in a quirky manner. A million+ subscribers on YouTube; it's been really interesting to follow this for 5+ years, since he had a few hundred subscribers.

Doing the exact same thing with very small refinements, over and over. It's not that glamorous, but I reckon at 1M+ subscribers he is doing quite a lot better than many obscure luxury-lifestyle personalities.

[0]: https://en.wikipedia.org/wiki/TheReportOfTheWeek


I'm from Nepal and I've been subscribed to reviewbrah since 2016. I suspect his viewership is much more diverse and widespread than you'd think. It's the internet after all.


So, given the choice of a comfortable, safe, environmentally responsible mode of transportation, you chose the option where you get to burn lots of fossil fuel and move in a metal box at a speed at which you can no longer control it if anything happens, and which you had no experience with, putting your own and other people's lives at risk.

As a German, I wish we would impose strict speed limits on the Autobahn to keep tourists, or anyone for that matter, from taking such wasteful, dangerous joyrides.

Note: I have been in a car accident, so appreciate that I am very, very biased against how offhandedly and casually people go about such behaviour, and how lives are destroyed for no reason.


I’m a believer in the cost of something being a representation of the resources used up in providing that product/service.

Even if the train itself produces less CO2 than a car, somewhere along the line those extra euros are being used to buy things that produce CO2. And Germany's gas taxes keep that in mind.

Or maybe the car company just needed to relocate cars out of Munich (no surprise during Oktoberfest) and there weren’t many train tickets left.


> Even if the train itself produces less CO2 than a car

Deutsche Bahn advertises their long-distance trains as running on 100% renewable energy.


Which is just that, marketing.

Almost half of German electricity is from coal.

It’s all the same grid. If DB earmarks some renewable electricity for itself, then the non-renewable mix is increased for another customer.


Deutsche Bahn operates their own grid and gets about two thirds of their energy from power plants directly connected to it on long-term contracts (the rest is fed in from the public grid through converters).

