They probably fear that people wouldn’t use the API otherwise, I guess. They could have different tiers though where you pay extra so your data isn’t used for training.
If they're bold enough to say they train on data they do not own, I am not optimistic when they say they don't train on data people willingly submit to them.
>Why does that indicate they would lie about a worse thing?
Because they know their audience. It's an audience that also doesn't care for copyright and would love for them to win their court cases. They are fine making such an argument to those kinds of people.
Meanwhile, when legal ran a very typical subpoena process on said data, data those same people chose to submit to an online server of their own volition, that same audience completely freaked out. Suddenly, they felt like their privacy was invaded.
It doesn't make any logical sense in my mind, but a lot of the discourse over this topic isn't based on logic.
If it ever leaked that OpenAI was training on the vast amounts of confidential data being sent to them, they'd be immediately crushed under a mountain of litigation and probably have to shut down. Lots of people at big companies have accounts, and the bigcos are only letting them use those accounts because of that "Don't train on my data" checkbox. Not all of those accounts are necessarily tied to company emails either, so it's not as if OpenAI could even tell which data to exclude.