Which is baseless hyperbole. We get it, blog spam is annoying. That doesn't change the fact that humans generate a ton of data just by interacting with one another online.
As a forum moderator, I have transitioned to relying heavily on AI-generated responses to users.
These responses can range from short and concise ("Friendly reminder: please ensure that all content posted adheres to our rules regarding hate speech. Let's work together to maintain a safe and inclusive community for everyone") to lengthy explanations of underlying issues.
By using AI-generated content, a small moderation team can efficiently manage a large group of users in a timely manner.
This approach is becoming increasingly common, as evidenced by the rise in AI-generated comments on popular sites such as HN, Reddit, Twitter, and Facebook.
Many users are also using AI tools to fix grammar issues and add extra content to their comments, which can be tempting but may result in unintentional changes to the original message.
In fact, I myself have used this technique to edit this very comment to provide an example.
---- Original comment:
As an online forum mod, I switched to mainly using AI to generate replies to users. Some are very short ("Hey! Remember the rules.") and some are long paragraphs explaining underlying issues. Someone training on my replies would pretty much train on AI generated content without knowing. It allows a small moderation team to moderate a large group quickly. I know that I am not alone in this.
There is also a raise in AI generated comments on sites like HN, Reddit, Twitter and Facebook. It's tempting to copy-paste a comment in AI for it to fix grammar issues, which often results in extra content being added to text. In fact, I did it for this comment.
The original comment is much better, please stop rewriting your comments using OpenAI.
> In fact, I did it for this comment.
Yes, it was obvious from the second sentence. The way ChatGPT structures text by default is very different from how most humans write: always the same "By using", "These X can range from", etc.
Padding your text with more words doesn't make it better; it makes it worse. This isn't school.
Interesting, the "By using" was my own addition to shorten a long sentence it had generated that distracted from the example.
To be clear, using AI to rewrite comments such as this one is not something I do often. My personal use of it for moderation is more prompt-based than pasting in a long comment for grammar and spelling corrections.
What I did here was an example, and that example contained the same criticism that you wrote here as a reply ("which can be tempting but may result in unintentional changes to the original message"). In other words, it makes the text more verbose and sanitizes the writing style.
The prompt we use for moderation contains our site's rules and some added context. So using ChatGPT, we can paste in someone's comment and ask the bot to write a short text explaining how that comment does not follow our rules and what the user can do.
"Using the rules above, write a very short message for a user that wrote a rule breaking comment. Show empathy. Use simple English. Explain the rules that were broken. The comment is [comment here]"
Using this saves a lot of time. Is the quality of the message not as good as if a human had written it? Absolutely. However, using AI lets us change the user:mod ratio. Automoderators are nothing new; what is new is that the automoderator can now take context into account and provide a customized message.
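For illustration, the workflow described above can be sketched as a small prompt-builder. The rules text and function name here are hypothetical (the actual prompt and model call, e.g. through the OpenAI API, are not shown in the thread):

```python
# Hypothetical stand-in for the site's actual rules document.
RULES = "1. No hate speech.\n2. Be respectful.\n3. Stay on topic."

def moderation_prompt(comment: str, rules: str = RULES) -> str:
    """Assemble the moderation prompt: site rules first, then the
    instruction from the thread, then the offending comment. The model
    call itself is deliberately left out of this sketch."""
    return (
        f"{rules}\n\n"
        "Using the rules above, write a very short message for a user "
        "that wrote a rule breaking comment. Show empathy. Use simple "
        "English. Explain the rules that were broken. "
        f"The comment is: {comment}"
    )
```

The prompt is then sent to the model once per flagged comment, so the per-comment moderator effort is reduced to reviewing the generated reply.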
OpenAI, at least, can track the hashes of all content it has ever output and filter that content out of future training data. Of course it won't be able to do this for the output of other LLMs, but maybe we'll see something like a federated Bloom index emerge.
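A minimal sketch of how such a shared index could work (hypothetical, not OpenAI's actual mechanism): each provider inserts a hash of every text it generates into a Bloom filter, and a training pipeline queries the filter before ingesting a document. False positives (dropping a human-written document) are possible; false negatives are not.

```python
import hashlib

class BloomIndex:
    """Toy Bloom filter over generated texts."""

    def __init__(self, size_bits: int = 1 << 20, num_hashes: int = 4):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, text: str):
        # Derive k independent bit positions by salting one hash function.
        for salt in range(self.num_hashes):
            digest = hashlib.sha256(f"{salt}:{text}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, text: str) -> None:
        for pos in self._positions(text):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def probably_contains(self, text: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(text))

# Provider side: record everything the model emits.
index = BloomIndex()
index.add("Friendly reminder: please ensure that all content posted adheres to our rules.")

# Training side: drop documents the index claims to have generated.
corpus = [
    "a human comment",
    "Friendly reminder: please ensure that all content posted adheres to our rules.",
]
clean = [doc for doc in corpus if not index.probably_contains(doc)]
```

A federated version would union the bit arrays from multiple providers, which Bloom filters support with a plain bitwise OR; the obvious weakness is that any edit to the generated text changes its hash and defeats the lookup.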
Agreed there is no perfect solution, though, and finding high-quality training data will definitely be a problem in the future.
AI content will be associated with a user or organization in the trust graph. If someone you trust trusts a user or organization who posts AI content, you're free to revoke your trust in that person or blacklist the specific users/organizations you don't want to see anymore.
We've been pretending to be just about to do this for decades. The fact is that internet companies will not develop a network of trust, because they are primarily advertisers looking for better ways to abuse trust.