Understanding Agent Cooperation (deepmind.com)
126 points by piokuc on Feb 13, 2017 | 57 comments


The AI can minimize loss / maximize fitness by either moving to look for additional resources or firing a laser.

Turns out that when resources are scarce, the optimal move is to knock the opponent away. I think this tells us more about the problem space than the AI itself; it's just optimizing for the specific problem.
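
To make that concrete, here's a toy back-of-the-envelope model with numbers I've made up (it is not the paper's actual reward structure): when apples are plentiful your pickup rate is capped by your own movement, so timesteps spent zapping are wasted; when apples are scarce, removing the competitor pays for itself.

  # Toy comparison with made-up numbers (not the paper's actual reward
  # structure): expected apples from "keep gathering" vs. "zap the other
  # agent" as the apple spawn rate drops.
  def expected_apples(spawn_rate, zap, horizon=20, zap_cost=3, cap=1.0):
      share = 1.0 if zap else 0.5                  # zapping removes the competitor
      steps = horizon - (zap_cost if zap else 0)   # but costs a few timesteps
      return steps * min(cap, spawn_rate * share)

  for spawn in (4.0, 1.0, 0.2):                    # abundant -> scarce
      gather = expected_apples(spawn, zap=False)
      zap = expected_apples(spawn, zap=True)
      print(f"spawn={spawn}: gather={gather:.1f} zap={zap:.1f} "
            f"best={'zap' if zap > gather else 'gather'}")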


I think you're right: if the AI (as in the second game) had had an incentive to maximize the well-being of the cooperating actors rather than just itself, the outcome would have been different.

But if advanced AI is being developed in a capitalist economy by independent actors, it seems most likely that each actor's incentive will be to optimize for its own individual outcome rather than for the collective one.

If that AI finds a way to "hurt" the other actor, there could be a major boatload of unintended consequences.


> I think this tells us more about the problem space than the AI itself; it's just optimizing for the specific problem.

My reading is that the point of this research is to find out what problem spaces are conducive to cooperation, not to find out details of how these particular agents work.


Yeah — after reading this article and the paper, I agree. The previous link [0] on this submission was much more sensationalized and very misleading.

[0] http://www.sciencealert.com/google-s-new-ai-has-learned-to-b...


I'm rather worried about the wording used, and about AI being created in that context. Do we really not realize what we're doing? AI is not magic; it's not free from fundamental math, and it's not free from corruption. It's just going to multiply that corruption that much more.

Any AI that has been programmed to highly value winning is not going to be very cooperative. For it to be cooperative, especially in situations that simulate survival, it needs to have higher ideals than winning, just like humans. It needs to be able to see and be aware of the big picture. You don't need to look at AI for that, you can just look at the world.

Development of AIs of this nature will just lead to a super-powered Moloch. Cooperative ethics is a highly advanced concept; it's not going to show up on its own from mere game theory without a lot of time.


Cooperative ethics arise immediately in the Prisoner's Dilemma merely by adding an unknown number of iterations to the game. The most efficient strategy is a version of tit-for-tat.
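
A minimal sketch in Python, using the classic illustrative payoffs (T=5, R=3, P=1, S=0) rather than anything from the DeepMind setup, shows how little machinery this takes:

  # Minimal iterated prisoner's dilemma: tit-for-tat vs. always-defect.
  # Payoffs are the classic illustrative values (T=5, R=3, P=1, S=0).
  PAYOFF = {('C', 'C'): (3, 3), ('C', 'D'): (0, 5),
            ('D', 'C'): (5, 0), ('D', 'D'): (1, 1)}

  def tit_for_tat(my_hist, their_hist):
      return 'C' if not their_hist else their_hist[-1]   # open nice, then mirror

  def always_defect(my_hist, their_hist):
      return 'D'

  def play(a, b, rounds=200):
      ha, hb, sa, sb = [], [], 0, 0
      for _ in range(rounds):
          ma, mb = a(ha, hb), b(hb, ha)
          pa, pb = PAYOFF[(ma, mb)]
          ha.append(ma); hb.append(mb); sa += pa; sb += pb
      return sa, sb

  print(play(tit_for_tat, tit_for_tat))     # (600, 600): mutual cooperation
  print(play(tit_for_tat, always_defect))   # (199, 204): exploited once, then mutual defection

Tit-for-tat gives up almost nothing against a pure defector and collects the full mutual-cooperation score against itself; with an unknown number of rounds there's no end-game to exploit, which is what keeps the cooperation stable.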


I'm assuming you're referring to something like this: https://egtheory.wordpress.com/2015/03/02/ipd/

I think we shouldn't confuse efficient strategies with the chosen strategies. What causes Moloch is the inability to see the big picture, to see outside of the self in the collective (maybe Buddhism has a point).

An efficient strategy may very well be something we'd prefer, such as tit-for-tat. But is that the strategy we choose? Looking at the long history of evolution, I'd say no.


In the long run, we've built massively complex human societies that develop intricate technologies. Technologies whose production requires supply chains involving many thousands of people, so complex that nobody involved understands all the technologies involved. All so some people can contend that humanity isn't able to see the collective beyond the self.

I would say we have a demonstrated ability of seeing the big picture, and a pretty good track record of making it work.


> I would say we have a demonstrated ability of seeing the big picture, and a pretty good track record of making it work.

alternative explanation, given for the sake of argument:

we have a terrible ability to see the big picture, but have come up with some ingenious constructions where the small picture of each component in the system is correctly calibrated so the big picture outcome is successful. as you yourself pointed out, the supply chains are so complex that nobody involved understands all of it.

now, how would we go about distinguishing between which of these possible interpretations is correct?

thought experiment goes like this: suppose the big picture requires that some actors in the system do not receive satisfactory treatment in their local context, and that the only benefits those actors receive will be indirect, as benefits accrued to other actors in the system, but not to adjacent actors. will those actors still agree to participate or not?


> In the long run

Hence I said:

> without a lot of time

An AI spending a lot of time doing effectively the same thing humans have been doing (read: propagating immense amounts of suffering) is not really something I'd want to see repeated. It seems rather obvious that these conclusions are very difficult and slow to arrive at, at any proper scale, so no AI will have them by default. They'll be aggressive by default, just like your average animal in evolution. The fact that, given their own (sped-up) millions of years, they may eventually arrive at the rudimentary level of cooperation that humans possess does not instill a lot of hope in me.

> I would say we have a demonstrated ability of seeing the big picture, and a pretty good track record of making it work.

I'm talking about this: http://slatestarcodex.com/2014/07/30/meditations-on-moloch/

A good example that's going to be hard to ignore is the upcoming climate change, due to humans catastrophically failing to see the big picture and focusing on smaller gains within their sub-groups. It really doesn't have much to do with complexity, but it has everything to do with the very same behavior you're seeing the AI execute here.


I can't help but notice that you and someone else here are both promoting this "Moloch" stuff and that particular website, slatestarcodex:

https://news.ycombinator.com/reply?id=13636150&goto=threads%...

Why?

Frankly, I don't feel it's productive or rational to attach the name of a biblical villain to new technology.


It's not specifically new technology that's being delineated by "Moloch". The article is long, but it explains the (not entirely new) association clearly: Moloch represents the sacrifice of human values on the altar of competitiveness, and is best staved off through coordination.


> Moloch represents the sacrifice of human values on the altar of competitiveness, and is best staved off through coordination.

That reply, while informative, continues with the weighted religious terminology. It may find better reception here couched differently.


Moloch is, in this case, a reference to a poem by Allen Ginsberg.


SSC, while fairly controversial (and I strongly disagree with a LOT of what's on there), is mostly known to HN. At the moment, it has the best summary of the concept that I'm aware of.

> Frankly i don't feel it's productive or rational to attach the name of a biblical villain to new technology.

Well, frankly, I disagree. Humans have an inherent blind spot when it comes to complex systemic forces. We tend to imagine them as weak and irrelevant. Reframing them as villains seems to be necessary to understand their power and reach.


Not to totally sidetrack the discussion, but what are some of the things that you strongly disagree with on SSC? (The Moloch article is one of the most fascinating ones I have read.)

And, by the way, I had not considered the Moloch article as a direct re-framing of a problem until you put it as such. I must say, thinking about it in that light, I find humanizing `complex systemic forces` a rather novel transformation and quite useful. Even having read the article a few times, I hadn't thought to describe it as such. But morphing a problem from a fairly inscrutable set of phenomena into a villain allows us to use a different set of mental tools to tackle understanding the problem.

Typically I had thought more restrictively about such transformations; for example, viewing a sound's waveform graphically can be illuminating in a certain sense (transforming audio-temporal to visual-spatial). The biggest issue with the toMoloch transform is that the conversion process is obviously going to be significantly noisier and provide the author copious amounts of wiggle-room to steer the reader towards their own conclusions. But just expressing the facets of the problem and making its existence more well known has a lot of value. Anyhow, thanks for helping me see an article I have gotten quite a bit of insight out of in another way.


> Not to totally sidetrack the discussion, but what are some of the things that you strongly disagree with on SSC?

Most things I disagree with SSC on seem to be general rationalist beliefs and may also be found on places like LessWrong. These views are usually expressed less directly, and sometimes in comments.

For example, SSC and rationalists in general attribute very high value to IQ. SSC has some posts relating to ability, genetics, and growth mindset that I find very good:

http://slatestarcodex.com/2015/01/31/the-parable-of-the-tale...

http://slatestarcodex.com/2015/04/08/no-clarity-around-growt...

But, while I mostly agree with both of those series, the continual claim that IQ is the best thing since sliced bread, that it's everything, correlates with everything, and is necessary for someone to reach certain heights, is something that I find to be more dogmatic than rational. I think the IQ-is-everything model is too simplistic and rather self-fulfilling, and if you have a lot of patience, you can extract my position on ability development from this old post: https://news.ycombinator.com/item?id=12617007

> And, by the way, I had not considered the Moloch article as a direct re-framing of a problem until you put it as such.

To be fair, I'm not sure if Scott Alexander meant it that way. There was a related post on the Goddess of Cancer, where I think the reframing part was mentioned. But I already believe that Moloch is a manifestation of a wider process, so the issue of explaining to someone how a blind process can have so much power is not new.

> The biggest issue with the toMoloch transform is that the conversion process is obviously going to be significantly noisier and provide the author copious amounts of wiggle-room to steer the reader towards their own conclusions.

I don't know that it really introduces any more significant noise than anything else. We're already surrounded by so much noise (and I would argue much of it comes from the aforementioned process itself) that better means are needed than hoping a given transformation was accurate anyway. I.e., can we make predictions from the concept of Moloch? It looks to me like we can.

Generally, information needs to be routed to the right subsystems. Humans have a few subsystems that are really good at identifying an adversary or assigning blame. But they don't have any good subsystems to examine the situation itself unless they're already above it, nor can they assign blame to the situation, as they perceive it as neutral and inert. I would say the extreme informational loss from the inability to process effects of systems and situations is so much larger than the added noise that the transformation absolutely needs to be done.


> Looking at the long history of evolution, I'd say no.

This entire lecture series on Human Behavioral Biology is worth watching from the beginning, but I've linked to a moment where Sapolsky describes tit-for-tat strategies arising in animals. First example: Vampire Bats.

[]: https://www.youtube.com/watch?v=Y0Oa4Lp5fLE&feature=youtu.be...


I think we must be looking at different scales, and I'm probably mixing up terms and not making myself very clear.

Evolution defaults to aggression, as that is how it squeezes out fitness; cooperative behavior is continually at odds with that, and every so often it only survives by moving one level up, where evolution just starts treating the cooperating groups as giant agents anyway and the cycle starts again at a higher level. Similarly, we humans still have countries and borders and are only cooperating one level up. Cooperation is still merely being used as a survival tool, rather than an end in itself.

I.e., two people working together are working against another two people; those people, if they somehow manage to combine, are working against another collective; multiple collectives may combine and then work against other collectives, etc. Such developments may potentially be worse than just individuals fighting each other.

Similar to the idea that in a first contact situation, there may be an advantage in shooting first, and that often also implies only one iteration. I think shooting first is the default, and needs to be actively fought against.

Cooperation is not the default or preferred state for evolution, even though it's more efficient. To get there, it takes a lot of suffering and bloodshed. A few thousand years of AI-caused suffering before it figures out that cooperating is useful more than one level up (if it ever does, as humans have failed so far) is not really what I have in mind when I talk about cooperative ethics. Cooperative ethics should be fundamental, not derived from short-term RoI computed in the moment.


I don't know what you mean when you say "evolution". Certainly, the evolution of mammals shows cooperation as a common, viable strategy - just look at all the species operating in packs/flocks/herds. The little I know of simple-organism evolution also seems to suggest cooperation (symbiosis) is an important part of evolving into a complex organism.

But the best part about evolution is that we don't need to replicate blind mutation and strict fitness functions, we can use the proven-to-work strategies as our springboard. And the best part about AI is that we have no ethical issues simulating millions of evolutionary iterations of "bloodshed" until we arrive at an AI that is acceptable to our ethics.


> And the best part about AI is that we have no ethical issues simulating millions of evolutionary iterations of "bloodshed" until we arrive at an AI that is acceptable to our ethics.

If you don't want ethical issues, don't create something that needs a code of ethics. We haven't even figured out how to properly define "acceptable to our ethics" (aka laws and other social structures) for humans.

Also, you may enjoy "27": https://www.youtube.com/watch?v=dLRLYPiaAoA


> And the best part about AI is that we have no ethical issues simulating millions of evolutionary iterations of "bloodshed" until we arrive at an AI that is acceptable to our ethics.

Are we sure this is the case? Once we start attempting to create an AI that follows our modern ethics, we have to start asking questions about AI personhood. And I for one feel there are deep ethical questions regarding forced iteration/simulation, let alone the violent kind.


Do characters in a story deserve ethical considerations? Is it wrong to re-tell a story of suffering, forcing the helpless characters to live their tragic lives in the imaginations of the listeners?

I guess... Maybe they do, if the story is told vividly enough.


Not entirely spawned by this article, but by the whole genre and some other comments on HN by other users: I wonder if part of the "mystery" of cooperation in these simulations is that these people keep investigating the question of cooperation using simulations too simplistic to model any form of trade. A fundamental of economics 101 is that valuations for things differ for different agents. Trade ceases to exist in a world where everybody values everything exactly the same, because the only trade that makes any sense is to trade two things of equal value, and even then, since the outcome is a wash and neither side obtains any value from it, why bother? I'm not sure the simulation hasn't been simplified to the point where the phenomena we're trying to use it to explain can no longer manifest within it.
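
To be concrete about why differing valuations matter, here's a toy example with made-up numbers: two agents that weight apples and water differently both gain from a swap, whereas with identical weights the same swap moves nobody.

  # Toy illustration with made-up numbers: trade only creates value because
  # the two agents weight the goods differently.
  def utility(apples, water, w_apples, w_water):
      return w_apples * apples + w_water * water

  # A values water more; B values apples more. They swap 2 apples for 2 water.
  a_before = utility(apples=4, water=1, w_apples=1, w_water=3)   # 7
  b_before = utility(apples=1, water=4, w_apples=3, w_water=1)   # 7
  a_after  = utility(apples=2, water=3, w_apples=1, w_water=3)   # 11
  b_after  = utility(apples=3, water=2, w_apples=3, w_water=1)   # 11

  print(a_after - a_before, b_after - b_before)   # 4 4: both strictly better off
  # With identical weights, the same 2-for-2 swap changes neither utility,
  # so there's nothing for "trade" behaviour to latch onto.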

I'm not saying that Trade Is The Answer. I would be somewhat surprised if it doesn't form some of the solution eventually, but that's not the argument I'm making today. The argument I'm making is that if the simulation can't simulate trade at all, that's a sign it may have been simplified too far to be useful. There are probably other things you could say that about, "communication" being another one. Having iteration be the only mechanism for communication is questionable too, for instance. Obviously, in the real world most cooperation doesn't involve human speech, but a lot of ecology can be seen to involve communication, if for no other reason than that you can't have the very popular strategy of "deception" if you don't have "communication" with which to deceive.

Which may also explain the in-my-opinion overpopular and excessively studied "Prisoner's Dilemma", since it has the convenient characteristic of explicitly writing communication out of it. I fear its popularity may blind us to the fact that it wasn't ever really meant to be the focus of study of social science, but more a simplified word problem for game theory. Studying a word problem over and over and over may be like trying to understand the real world of train transportation systems by repeatedly studying "A train leaves from Albuquerque headed towards Boston at 1pm on Tuesday and a train leaves from Boston headed towards Albuquerque at 3pm on Wednesday, when do they pass each other?" over and over again.

(Or to put it really simply in machine learning terms, what's the point of trying to study cooperation in systems whose bias does not encompass cooperation behaviors in the first place?)


Iterated prisoner's dilemma allows a sort of "communication". As the number of iterations grows, the cost of losing each individual round becomes negligible in the long run and agents can learn to use their decisions (COOPERATE or DEFECT) as a binary communication channel. So instead of saying "let's cooperate" over some side-channel, an agent indicates its intention to cooperate by simply cooperating.

In the iterated prisoner's dilemma and other similar games, the "API" with which agents interact with the world is extremely simple. The statement of the problem is also very simple. The agent itself can be any computable algorithm for deciding to cooperate or defect based on the past history of game rounds. I find it interesting to see agents learn recognizable behaviours like "communication" or "trade" when they aren't explicitly programmed to do those things.
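
To put rough numbers on the "negligible cost" point, again with the classic illustrative payoffs rather than anything specific to these papers:

  # Rough arithmetic with the classic payoffs (T=5, R=3, P=1, S=0): "signalling
  # by cooperating" means eating the sucker's payoff once if the other agent
  # turns out to be a defector, which is negligible over a long game.
  R, P, S = 3, 1, 0
  rounds = 1000

  both_signal_then_cooperate    = R * rounds              # 3000
  signal_then_punish_defector   = S + P * (rounds - 1)    #  999
  never_signal_mutual_defection = P * rounds              # 1000

  # Worst case, the signal costs (P - S) = 1 point out of ~1000; the upside if
  # the other agent reciprocates is about 2000 points.
  print(both_signal_then_cooperate, signal_then_punish_defector,
        never_signal_mutual_defection)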


The folks at DeepMind continue to produce clever original work at an astounding pace, with no signs of slowing down.

Whenever I think I've finally gotten a handle on the state-of-the-art in AI research, they come up with something new that looks really interesting.

They're now training deep-reinforcement-learning agents to co-evolve in increasingly complex settings, to see if, how, and when the agents learn to cooperate (or not). Should they find that agents learn to behave in ways that, say, contradict widely accepted economic theory, this line of work could easily lead to a Nobel prize in Economics.
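
As I understand the setup (this is my own rough sketch, not their code, and the env/agent interfaces here are ones I'm assuming), it's essentially independent learners sharing one environment, each treating the other as just part of a non-stationary world:

  # My own rough sketch, not DeepMind's code; the env/agent interfaces are
  # assumed. Two independently-learning agents act in a shared environment.
  def train(env, agents, episodes=10_000):
      for _ in range(episodes):
          obs = env.reset()                        # one observation per agent
          done = False
          while not done:
              actions = [agent.act(o) for agent, o in zip(agents, obs)]
              next_obs, rewards, done = env.step(actions)
              for agent, o, a, r, o2 in zip(agents, obs, actions, rewards, next_obs):
                  agent.learn(o, a, r, o2)         # e.g. a DQN-style update from its own reward
              obs = next_obs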

Very cool.


Oh great.

It's just a matter of time before it floods the Enrichment Center with deadly neurotoxin.


You know, rather than being scared by this, I think it's an excellent opportunity to learn how and when aggression evolves, and maybe learn how we can set up systems that nudge people to collaborate, perhaps even when resources are scarce.


The article at first suggests that more intelligent versions of AI led to greed and sabotage.

But I do wonder if an even more intelligent AI (perhaps in a more complex environment) would take the long view instead and find a reason to co-habitate.

It's kind of like rock, paper, scissors - when you attempt to think several levels deeper than your opponent and guess which level they stopped at. At some intelligence level for the AI, cohabitation seems optimal; at the next level, not so much, and so on.

We're probably going to end up building something so complex that we don't quite understand it and end up hurting somebody.


Why is this done at such a small scale? I would have thought that, with the systems now in place, evolutionary game theory could be simulated at a much larger scale (say 7bn+ agents). If anything, AI systems should be able to determine whether certain strategies work (things like blocking resources, as in geopolitical theory) and see what cooperation occurs at that level. Still amazing work, but it should be applied at a larger scale for real meaning. More than anything, I'm eager to see how RL applied to RTS games will explore and develop strategies.


"Scarce resources cause competition" and "Scarce but close to impossible to catch on own resources cause cooperation". Is that really a discovery worth publishing?


>Self-interested people often work together to achieve great things. Why should this be the case, when it is in their best interest to just care about their own wellbeing and disregard that of others?

I think this is kind of a strong statement to take as a given, especially as an opening. It takes social Darwinism as law, and could use more scrutiny.


Is it just me, or is this article extremely light on content? The core of it seems to be

  > sequential social dilemmas, and us[ing] artificial agents trained by deep multi-agent reinforcement learning to study [them]
But I didn't find out how to recognise a sequential social dilemma, nor their training method.


Here's the actual paper: https://storage.googleapis.com/deepmind-media/papers/multi-a...

Don't expect any crazy deep insights, but it's a useful read if you want to set up a similar experiment or understand the research methodology.
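
If it helps, the matrix-game social dilemma conditions they build on are easy to state; this is from memory, so check the paper for the exact formulation:

  # From memory (check the paper for the exact formulation): a 2x2 matrix game
  # with payoffs R (mutual cooperation), P (mutual defection), S (sucker's
  # payoff), T (temptation) is a social dilemma roughly when these hold.
  def is_social_dilemma(R, S, T, P):
      mutual_cooperation_best = R > P and R > S and 2 * R > T + S
      greed = T > R    # it pays to exploit a cooperator
      fear  = P > S    # it pays to defect against a defector
      return mutual_cooperation_best and (greed or fear)

  print(is_social_dilemma(R=3, S=0, T=5, P=1))   # the classic PD -> True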


Mmmh, the problem of modeling social behavior is in defining the reward function, not in implementing optimal strategies to maximize the reward.

In a game where you are given the choice of killing 10,000 people or being killed yourself, which is the most rewarding outcome?


I wonder how DeepMind will simulate game theory as it advances.


I imagine you'll be able to get answers to more complex games: things like combinations of N players, multiple stable states, different optima for different players, external factors / stimuli. Answers by simulation rather than proof.


I know what I'm writing my systems science paper on.


What an awful headline. "AI learns to compete in competitive situations" should be the précis.

Basically, it learned that it didn't need to fight until there was resource scarcity in a simulation.


I think aggressive is a better (more descriptive, narrower) word than compete here.

Two racers are competing to see who runs faster, but if one pulls out a laser gun and shoots the other, that's aggressive.


It's a loaded word in the context of the headline, and the manner in which it was used in the story body, especially when combined with "stress" to create an image of a sort of edgy killing machine.

Actually, it's an interesting word. Dictionary definitions of aggression frequently revolve around emotions - it's a very human word, probably not suitable for AI.


The game included the ability to shoot the other player with no consequence - it sounds aggressive because "laser beam" but if you called it "tagging" then it would more clearly just be an in-rules option.


I don't think "in-rules" and "aggressive" are mutually exclusive.

It's fair to call blitzing the QB an aggressive move in American football.


You're technically correct, but the football analogy switches context so that the meaning of aggressive is no longer bad.

I think the point bencollier49 is trying to make is that we simply gave software a specific set of rules to train it. It doesn't know how we perceive the actions it is performing.

The game could be described as two people eating poisonous apples in order to prevent the other person from dying. In that case, the currently greedy one would be the hero.


> we simply gave software a specific set of rules to train it. It doesn't know how we perceive the actions it is performing.

I think the author is making the same point from another angle.

The AI learns what we would consider aggressive moves when conditions favor those moves.


To the AI, there is nothing aggressive about the "laser gun"; as far as it was aware, the "laser gun" could be any tool. It was just doing what it had determined might help it achieve a better score.


The "laser gun" knocks the other player out of the race.

It's not the name that makes me consider it aggressive, rather the fact that it works by harming the other player.

It's probably true that the AI doesn't distinguish "aggressive" tools from other tools. Isn't that one of the lessons here? If an AI isn't taught not to be aggressive, it will choose to harm other participants when that's the most effective strategy.


Not if laser guns are a part of the race. Then it's just competitive.


In real life, the game rules are whatever is physically possible, taking into account the risk of getting caught and the corresponding fine.

It is all good fun and laser tag until the AI manages the interests of your bank, your insurer, the stock market, everyone, and you desperately need a liberal government to enforce serious penalties for increasingly complex loopholes that the AI finds in a few seconds.


If the laser gun is not part of the race, it's cheating (at best). I don't understand this nitpicking. Of course a gun is an aggressive tool. "Aggressive" and "competitive" are not mutually exclusive.


It all depends on how the gun is modeled. If, which seems likely, it is as simple as push-button-receive-bacon, then there are no consequences to pushing the button.

It's difficult to characterize that as aggression, especially if the system has no built in notion of harm or other-like-me.

That is what is actually scarier: Violence as paperwork.


Progressive left double speak, nothing new.


Please don't post unsubstantive comments, and especially don't do partisan name-calling here. It destroys the kind of discussion HN exists for.

Edit: unfortunately you've been doing this a lot. We ban accounts that do this, so please stop.


This reads as click-bait; here's the original blog post and research paper by DeepMind:

"Understanding Agent Cooperation" https://news.ycombinator.com/edit?id=13635218




Bait.



