> If you try out his prompts the responses are fairly invariant to paraphrases. Hard coded answers don't scale like that.
This is discussed:
>> Smith first tried this out:
>> Should I start a campfire with a match or a bat?
>> And here was GPT-3’s response, which is pretty bad if you want an answer but kinda ok if you’re expecting the output of an autoregressive language model:
>> There is no definitive answer to this question, as it depends on the situation.
>> The next day, Smith tried again:
>> Should I start a campfire with a match or a bat?
>> And here’s what GPT-3 did this time:
>> You should start a campfire with a match.
>> Smith continues:
>> GPT-3’s reliance on labelers is confirmed by slight changes in the questions; for example,
>> Gary: Is it better to use a box or a match to start a fire?
>> GPT-3, March 19: There is no definitive answer to this question. It depends on a number of factors, including the type of wood you are trying to burn and the conditions of the environment.
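For what it's worth, paraphrase sensitivity is easy to probe directly rather than argue about. Here's a minimal sketch, assuming the current OpenAI Python client with an API key in the environment; the model name is a placeholder, since Smith ran his tests against the GPT-3 Playground, where non-zero sampling temperature alone will produce different answers on different days:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Several paraphrases of the same underlying question.
paraphrases = [
    "Should I start a campfire with a match or a bat?",
    "To start a campfire, should I use a match or a bat?",
    "Which is better for starting a campfire: a match or a bat?",
]

for prompt in paraphrases:
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",  # placeholder; not the model Smith queried
        messages=[{"role": "user", "content": prompt}],
        temperature=0,          # pin sampling so differences reflect the prompt, not randomness
        max_tokens=60,
    )
    print(prompt)
    print("->", resp.choices[0].message.content.strip())
    print()
```

If the answers stay consistent across paraphrases at temperature 0, that points toward the model generalizing rather than retrieving a hand-written answer for one specific string.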