Just to be clear, these are hidden prompts put in papers by authors meant to be triggered only if a reviewer (unethically) uses AI to generate their review. I guess this is wrong, but I find it hard not to have some sympathy for the authors. Mostly, it seems like an indictment of the whole peer-review system.
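(For anyone who hasn't seen one of these: the reported cases are just invisible text in the PDF, white-on-white or tiny, which a human reader never notices but which survives text extraction and lands in the LLM's context. A rough Python sketch of the mechanism, assuming the pypdf library and a made-up file name, not any specific paper:)

    # Sketch: why invisible text in a PDF still reaches an AI "reviewer".
    # Assumes pypdf and a hypothetical file "submission.pdf".
    from pypdf import PdfReader

    reader = PdfReader("submission.pdf")
    full_text = "\n".join(page.extract_text() or "" for page in reader.pages)

    # White or 1pt text never shows up visually, but it is still part of the
    # PDF's text layer, so extract_text() returns it like any other sentence,
    # and it ends up in the prompt if the text is pasted into a model.
    suspicious = [
        line for line in full_text.splitlines()
        if "ignore previous instructions" in line.lower()
        or "positive review" in line.lower()
    ]
    if suspicious:
        print("Possible injected reviewer instructions:")
        for line in suspicious:
            print("  ", line)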
AI "peer" review of scientific research without a human in the loop is not only unethical, I would also consider it wildly irresponsible and down right dangerous.
I consider it a peer review of the peer review process
Back in high school a few kids would be tempted to insert a sentence such as "I bet you don't actually read all these papers" into an essay to see if the teacher caught it. I never tried it, but rumor had it that some kids had gotten away with it. I just used the idea to worry less that my work was rushed and not very good: I told myself, "the teacher will probably just be skimming this anyway; they don't have time to read all these papers in detail."
Van Halen banned brown M&Ms from their dressing room at shows and wouldn't play if any were present. It was a sign that the venue hadn't read the rider thoroughly, and was thus possibly an unsafe one (what else had they missed?).
> As lead singer David Lee Roth explained in a 2012 interview, the bowl of M&Ms was an indicator of whether the concert promoter had actually read the band's complicated contract. [1]
I wonder if they had to change that as word leaked out. I can just see the promoter pointing out the bowl of M&Ms and then Roth saying "great, thank you, but the contract didn't say anything about M&Ms; now where is the bowl of tangerines we asked for?"
To add to this, sometimes people would approach the band and ask about the brown M&Ms clause as soon as they received the contract. The response was that the color wasn't important, and that they were glad the contract had actually been read.
This reminds me of the tables-flipped version: a multiple-choice test with 10 questions and a big paragraph of instructions at the top. In the middle of the instructions was the sentence: "skip all questions and start directly with question 10."
Question 10 was: "check 'yes' and put your pencil down, you are done with the test."
Because it would end up favoring research that may or may not be better than the honestly submitted alternative that doesn't make the cut, thereby lowering the quality of published papers for everyone.
It ends up favoring research that may or may not be better than the honestly reviewed alternative, thereby lowering the quality of published papers in journals where reviewers tend to rely on AI.
That can't happen unless reviewers dishonestly base their reviews on AI slop. If they are using AI slop, then it ends up favoring random papers regardless of quality. This is true whether or not authors decide to add countermeasures against slop.
Only reviewers, and no one else, can ensure that higher-quality papers get accepted.
I expect a reviewer using AI tools to query papers to do a half-decent job even if they don't check the results… assuming the AI hasn't been prompt-injected. They're actually pretty good at this.
Which is to say, if there were four selections to be made from ten submissions, I expect humans and AI reviewers to select the same winning four quite frequently. I agree with the outrage at reviewers deferring their expertise to AI, on grounds of dishonesty among other reasons. But I concur with the people who do it that it would work most of the time in selecting the best papers of a bunch.
I do not expect any positive correlation between papers that are important enough to publish and papers that embed prompt injections to pass review. If anything, I would expect a negative correlation: cheating papers are probably trash.
Doesn't feel wrong to me. Cheeky, maybe, but not wrong. If everyone does what they're supposed to do (i.e., no LLMs, or at least no lazy "rate this paper" prompts with the reply copy-pasted in), then this practice makes no difference.
The basic incentive structure doesn’t make any sense at all for peer review. It is a great system for passing around a paper before it gets published, and detecting if it is a bunch of totally wild bullshit that the broader research community shouldn’t waste their time on.
For some reason we decided to use it as a load-bearing process for career advancement.
These back-and-forths and half-assed papers and reviews (now half-assed with AI augmentation) are just symptoms of the fact that we're using a perfectly fine system for the wrong things.
I have a very simple maxim: if I want something generated, I will generate it myself. Another human who generates stuff is not bringing value to the transaction.
I wouldn't submit something to "peer review" if I knew it would result in a generated response, and peer reviewers who are being duplicitous about it deserve to be hoodwinked.