> If you try out his prompts the responses are fairly invariant to paraphrases. Hard coded answers don't scale like that.
This is discussed:
>> Smith first tried this out:
>> Should I start a campfire with a match or a bat?
>> And here was GPT-3’s response, which is pretty bad if you want an answer but kinda ok if you’re expecting the output of an autoregressive language model:
>> There is no definitive answer to this question, as it depends on the situation.
>> The next day, Smith tried again:
>> Should I start a campfire with a match or a bat?
>> And here’s what GPT-3 did this time:
>> You should start a campfire with a match.
>> Smith continues:
>> GPT-3’s reliance on labelers is confirmed by slight changes in the questions; for example,
>> Gary: Is it better to use a box or a match to start a fire?
>> GPT-3, March 19: There is no definitive answer to this question. It depends on a number of factors, including the type of wood you are trying to burn and the conditions of the environment.
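For what it's worth, paraphrase sensitivity is easy to probe directly rather than argue about. Here's a minimal sketch, assuming the current OpenAI Python client with an API key in the environment; the model name is a placeholder, since Smith ran his tests against the GPT-3 Playground, where non-zero sampling temperature alone will produce different answers on different days:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Several paraphrases of the same underlying question.
paraphrases = [
    "Should I start a campfire with a match or a bat?",
    "To start a campfire, should I use a match or a bat?",
    "Which is better for starting a campfire: a match or a bat?",
]

for prompt in paraphrases:
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",  # placeholder; not the model Smith queried
        messages=[{"role": "user", "content": prompt}],
        temperature=0,          # pin sampling so differences reflect the prompt, not randomness
        max_tokens=60,
    )
    print(prompt)
    print("->", resp.choices[0].message.content.strip())
    print()
```

If the answers stay consistent across paraphrases at temperature 0, that points toward the model generalizing rather than retrieving a hand-written answer for one specific string.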