
I am aware of one reproduction of the experiment, the goals seem pretty darn explicit, and its results are public. He has stated the rules of engagement, and has said that he did it "the hard way". If nothing else, one should at least be confident that Yudkowsky is honest.

His claim that "a transhuman mind will likely be able to convince a human mind" is what his experiment demonstrates, not what it assumes, and frankly it is absurd to make it sound like he has not repeatedly given justifications for the statement.

What actual misinterpretations or other issues are you worried about?




- What reproduction? What would you consider a successful reproduction, for that matter? If I told you I reenacted the experiment at home with a friend, would you consider this a reproduction? Someone saying they reproduced it on the internet would convince you? What are your standards of quality?

- What is the goal of the experiment? Is the goal "show that a transhuman AI can convince a human gatekeeper to set it free"? Or is it actually "show that a human can talk another human into performing a task", or even "show that an internet (semi)celebrity can convince a like-minded person into saying they would perform a task of very low real-world stakes"? How would you tell each of these goals apart?

- The results are most definitely not public. What is public is what Yudkowsky claims the results were, but since the transcripts are secret and there are no witnesses, how do we know they are true (or, even without assuming dishonesty or advanced crankiness, how can we tell whether they are flawed)? Would you believe me if I told you I have a raygun that miniaturizes people, that I have tested it at home and it works, and that I have a (very small) group of people who will tell you what I say is true? No, I cannot show you the raygun or the miniaturized people, but I can tell you it was a success!

- "A transhuman mind will likely be able to convince a human mind" is what is stated as truth in the fictional conversation at the top of the AI-box experiment web page. Yudkowsky has repeatedly provided "justifications", but these are unscientific and unreasonable.

Yudkowsky claims that because a person can convince another person to claim they would perform a task (setting a hypothetical AI free), a "transhuman" mind is therefore likely to be able to convince a human gatekeeper. The logical disconnect is huge. First, that people can convince other people of things is no big revelation. Unfortunately, it doesn't follow that because some people can convince other people of some things in certain scenarios, people can universally convince other people of arbitrary things in every context. Worse, we don't even know what a "transhuman" mind would be like; assuming it means "faster thoughts" (an arbitrary assumption), why would more thoughts per minute translate into greater persuasive capacity? Is it true, for that matter, that higher intelligence translates into a higher ability to convince others of stuff?

----

Another example of methodological flaws: in both runs of the experiment, the participants seem to have been selected from a pool of people fascinated by this kind of question and open to the suggestion that a "transhuman" mind can convince them of stuff. Let's look at them:

First participant: Nathan Russell. Introduces himself as

> "I'm a sophomore CS major, with a strong interest in transhumanism, and just found this list."

He then shows interest in a similar experiment and considers how it could be designed. Note that the list itself, SL4, is for people interested in the "Singularity". Enough said.

Second participant: David McFadzean. Correctly claims the first experiment is not proof of anything, and is willing to take part in a second experiment. Later Yudkowsky describes him like this:

> "David McFadzean has been an Extropian for considerably longer than I have - he maintains extropy.org's server, in fact - and currently works on Peter Voss's A2I2 project."

The website mentioned still exists and has something to do with a Transhumanist Institute. I'm starting to see a pattern here.


The only experiment I know of that I would consider a serious attempt at reproduction is Tuxedage's series:

https://www.lesswrong.com/posts/FmxhoWxvBqSxhFeJn/i-attempte...

https://www.lesswrong.com/posts/dop3rLwFhW5gtpEgz/i-attempte...

https://www.lesswrong.com/posts/oexwJBd3zAjw9Cru8/i-played-t...

His total is 3 for 3. I do not know how to explain these results without either taking them to be honest attempts at a fair experiment or assuming those involved colluded. I find the latter absurd, given my priors about the honesty of members of LessWrong (Yudkowsky in particular, though he wasn't involved in the reproduction).

> If I told you I reenacted the experiment at home with a friend, would you consider this a reproduction? Someone saying they reproduced it on the internet would convince you?

It is not so simple. I would want evidence that you and your friend were smart and had a decent understanding of the domain, and that your friend was in a similar state of disbelief about the plausibility of being convinced. I would want a statement that it was a serious attempt at doing things "the hard way" and true to the experiment, on both sides, lest you get [1]. Of course I would also want the standard rules, or a reasonable modification stated publicly.

[1] https://pastebin.com/Jee2P6BD

> What is the goal of the experiment?

To show that "I can't imagine anything that even a transhuman could say to me which would change [my mind]" is not evidence, and should not be treated as such. To provide evidence that "humans are not secure systems".

You say "very low stakes", but Yudkowsky convinced someone who had offered a $5000 handicap. That hardly seems like a trivial sum.

> [maybe it's all a lie]

You have to be very cynical to take this worldview.

> we don't even know what a "transhuman" mind would be like

The experiment is run under the assumption of a true singularity, ergo nigh-unlimited intelligence. I can discuss which outcomes I think are likely for AI development, or which are merely plausible, but the experiment is about one particular hypothetical, so that would be a different conversation.

> the participants seem to be selected from a pool of people fascinated by this kind of questions and who would be open to suggestion that a "transhuman" mind can convince them of stuff

I am unconvinced that this experiment would work if the gatekeepers did not have an understanding of the topic; they are meant to play a gatekeeper, after all. A person who considers the singularity plausible but thinks an AI box is effective seems like the perfect control should the singularity happen and people want to figure out whether to AI-box it.


But that's just it: I simply don't think Yudkowsky or any of the sort of people who would be enthusiastic about the sci-fi theories on SL4, or host extropy.org, or believe in Roko's Basilisk, or read Harry Potter fanfic and find it philosophically insightful, have a decent understanding of the AI domain. Everything about him and his followers smacks of a fringe cult completely outside mainstream research.

I don't think the chosen participants have a particularly deep understanding of the domain; they just think they do (because that's what defines Singularity believers, LessWrong readers, and people who believe they are hyper "rational" and that this is some kind of superpower). I think they understand AI no more than a Star Wars fan understands space travel.


Sure, I don't particularly care if you or anyone else wants to disengage from LessWrong-esque ideas because they sound weird. I only entered this discussion because it sounded like you might have had an actual argument.



