How to Make Custom AI-Generated Text with GPT-2

minimaxir · on Sept 14, 2019

If you want examples of AI-generated Hacker News titles (made pretty much the same way as the Reddit example in this post), check out this repo: https://github.com/minimaxir/hacker-news-gpt-2

georgewsinger · on Sept 14, 2019

Wolfram has a GPT-2 interface that can be beckoned with two lines of code: https://www.wolframcloud.com/obj/user-900a994f-78ab-4931-b18...

tyingq · on Sept 14, 2019

I'm curious what Google might be doing to combat content/link farms created for SEO purposes using this.

conse_lad · on Sept 14, 2019

There's a tool for that: http://gltr.io/dist/index.html

"Each text is analyzed by how likely each word would be the predicted word given the context to the left. If the actual used word would be in the Top 10 predicted words the background is colored green, for Top 100 in yellow, Top 1000 red, otherwise violet."

Viliam1234 · on Sept 14, 2019

What if I find out what is the supposedly "non-fake" distribution of green/yellow/red/violet words, and generate my text accordingly?

Generally, whenever your detection strategy is "a spam generator would never do X", I simply update my spam generator to do X. (Note that "X" must be something that is relatively easy to calculate, not things like "this actually makes sense", because it must be something your detector can recognize.)

Also, if Google suddenly started penalizing all text that doesn't "seem natural", there would be tons of false positives: languages other than English, jargon-heavy websites, dyslectic people, etc.

Avamander · on Sept 14, 2019

Non-native speakers writing text also comes to mind.

minimaxir · on Sept 14, 2019

Since we're on the topic, this detector assumes generated text is from the default GPT-2 models; it won't work as a well on finetuned GPT-2.

jcims · on Sept 14, 2019

I need something like this for audit logs.

onemoresoop · on Sept 14, 2019

With the risk of being downvoted, what’s the point of ai generating text knowing that it will never understand it like we do and having in mind that poor non AI generated text is already being used for mostly nefarious reasons? As a problem I understand the fascination with it, human languages are extremely complex things and mimmicking it has been proven complicated, but beyond that, what is a practical purpose for this?

ganeshkrishnan · on Sept 14, 2019

We tried this for e-commerce products description. The model asked us to "dip the laundry nuts in chocolate and feed it to the kids"

They are grammatically correct but logically nonsense. The examples shown are also curated from thousands of generated text.

The only use for these is the seo landfill to get in top of Google

minimaxir · on Sept 14, 2019

Because it's fun. Which is practical enough!

I do have ideas for more practical content generation but those do not currently have plans to be made public.

onemoresoop · on Sept 15, 2019

Enjoy if it is fun. I simply hope you won’t use it or enable others to use it for any nefarious means. I once entertained the thought that one day we may be able to ai generate novels. And the more i thought about it the more pointless it seemed.

darawk · on Sept 14, 2019

> With the risk of being downvoted, what’s the point of ai generating text knowing that it will never understand it like we do

In part because we're trying to probe what it even means to "understand" something. If an AI is capable of carrying on a cogent conversation with you, does it understand what it's saying? How do you know?

onemoresoop · on Sept 15, 2019

You simply ask it (the AI entity) more and more questions, give it more facts and see if the answers make sense and it gets closer to what you consider cogent. Rinse and repeat. So far we’re far from it. It’s a very complicated problem that may not need to be solved. We have humans doing just that

drongoking · on Sept 14, 2019

Amen!

It is slightly better now than it used to be, i.e. you now have to read two sentences to realize it's incoherent gibberish, rather than just one. But thinking you're getting closer to understanding this way is like the metaphor of climbing a tree to reach the moon: you can keep reporting progress until you run out of tree.

As for it being entertaining, sure, the first attempts were entertaining and instructive. Now, years later, reading more gibberish from a neural net on HN, not so much.

dlphn___xyz · on Sept 14, 2019

the bulk of ML research is content generation which will eventually be used for nefarious reasons

amrrs · on Sept 14, 2019

minimaxir's gpt-2-simple is a very nice work to get anyone start with Text Generation from the models released by OpenAI. I often wonder how can one keep up so much with the fast-paced NLP and produce things that abstracts the pain and exposes simple functions for developers to on it.

Hugging Face is another such company making massive impact among non-NLP developers to use such resources.

Kudos to these guys!

minimaxir · on Sept 14, 2019

tl;dr I just keep an eye on GPT-2-related news on Twitter.

I have no advance knowledge of any GPT-2 related happenings, which has become inconvenient as OpenAI likes to release things when I am on vacation. :P

bigmit37 · on Sept 14, 2019

Thanks for sharing. I have been hearing a lot of buzzword regarding this archtecture, so I checked out the blog.

I quickly glanced through it but will do a deeper dive later on.

The outputted paragraphs don’t seem to realistic. However you point out that the text needs to be longer for a more a natural output.

I am genuinely curious how relalistic it can get.

Avamander · on Sept 14, 2019

Check out https://www.reddit.com/r/SubSimulatorGPT2Meta/top/

In my personal experience I get fooled by the bots far too often, it's actually really scary, I can imagine it being used nefariously to push a specific political agenda, spam places, do SEO fraud etc. etc.

yowlingcat · on Sept 14, 2019

Very nice. I'm going to have some fun taking a crack using this with some of the funnier stuff I've seen on twitter.