I'm not suggesting the concerns aren't valid, but I guess I don't understand why this same principle isn't applied to other internet-connected / cloud software? Do these companies worry that web browsers like Chrome could leak data or applications like Google Docs?
What is it about AI chat bots that makes the risk of a data leak so much higher? Is it something about OpenAI's ToS? Or its relative infancy?
That’s because OpenAI can use any data that you send to ChatGPT for training purposes. [0] They don’t do it with their APIs btw.
“(c) Use of Content to Improve Services. We do not use Content that you provide to or receive from our API (“API Content”) to develop or improve our Services. We may use Content from Services other than our API (“Non-API Content”) to help develop and improve our Services. You can read more here about how Non-API Content may be used to improve model performance.”
Honestly, from an executive's point of view, why should Samsung trust OpenAI? It exists in a community which is famous for skirting rules; the whole move-fast-and-break-things mentality isn't just at Facebook. Their IP is incredibly valuable and they also have access to a bunch of customer IP. OpenAI needs to establish trust, and that takes time. Microsoft spent years working with these companies to get them to adopt Azure and the like.
"We do not use Content that you provide to or receive from our API ("API Content") to develop or improve our Services."
That sentence does not claim OpenAI does not use such Content for other purposes besides "developing and improving the Services". For example, using the Content in a manner that potentially harms Samsung's business.
What does "develop or improve our Services" even mean? There is no definition.
Third, how would anyone outside of OpenAI know how OpenAI uses the Content? For example, what if OpenAI was using Content from Services other than the API for purposes other than "to develop and improve our Services" (assuming anyone could prove what that even means)? How would anyone outside of OpenAI discover this was happening?
If we search these "ToS" for phrases like "You will" or "You will not", we see that users make promises to OpenAI. However, if we search for phrases like "We will" or "We will not", we see that there are no instances where OpenAI promises anything. IMHO, these "ToS" are better characterised as "ToU". Not to mention being found at "/policies/".
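For anyone who wants to repeat that check, a rough sketch in Python (assuming you've saved the ToS text locally as terms.txt; the phrase list just mirrors the ones above):

    # Rough tally of who makes promises in the ToS text.
    # Note: "you will" also counts the "you will not" occurrences.
    text = open("terms.txt", encoding="utf-8").read().lower()
    for phrase in ("you will", "you will not", "we will", "we will not"):
        print(f"{phrase}: {text.count(phrase)}")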
As a ChatGPT user, OpenAI does not owe you anything, unless perhaps you are Microsoft. For you, the "terms" can change at any time, for any reason, without any prior notice.
Let's imagine some far-fetched scenario where someone inside the company leaks information that suggests OpenAI is using Content from Services other than the API for purposes other than "improving or developing the Services". Then what?
OpenAI has not promised to refrain from using Content for certain purposes. There is no breach of these ToS if OpenAI uses the Content for whatever purposes it desires.
Maybe Samsung could claim something like (a) OpenAI misrepresented facts in their ToS, (b) that induced Samsung into using OpenAI, and (c) as a result Samsung suffered harm. Needless to say, claims like that are difficult to prove and any recovery is limited. Whatever creative legal claims Samsung could come up with, none of them would fix damage already done to Samsung from its employees having used OpenAI.
>These Terms of Use apply when you use the services of OpenAI, L.L.C. or our affiliates, including our application programming interface, software, tools, developer services, data, documentation, and websites (“Services”).
Though, they don't define "developing and improving".
Based on the way things have gone with them, it kind of leaves one with the impression that they are building the Dolores Umbridge of AIs, and if that becomes AGI first, it will be complete hell to live on this planet, or anywhere the AI can find you really.
There is an option in ChatGPT that you can use to turn that off.
> Chat History & Training Save new chats to your history and allow them to be used to improve ChatGPT via model training. Unsaved chats will be deleted from our systems within 30 days.
The default is your chats are their property to train on and use as they wish.
If you dig into the Settings you can disable sharing chats with OpenAI, but you lose access to just about every feature including saving chats. You can only have one chat at a time, and if the window is closed or refreshed your chat is wiped. It's kind of like if opening a "Private Browsing" window prevented you from having a regular browsing window open and also had no tabs.
For some reason, they still retain your chat for "up to 30 days" despite not letting you save or access it after the page is refreshed.
The 30 days part is most likely a legal compliance bit to cover their asses if their backing data systems ever take a big dump.
They likely use an asynchronous system that tracks when your chat becomes "finished" and then queues it up for a system that propagates deletes. They have to choose some kind of SLA for that, and probably went with a common data-privacy user-data deletion window of 30 days.
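A minimal sketch of what that kind of pipeline could look like (the names, queue shape, and 30-day window are my assumptions, not anything OpenAI has documented):

    import queue
    import time
    from dataclasses import dataclass

    RETENTION_SECONDS = 30 * 24 * 3600  # assumed 30-day deletion SLA

    @dataclass
    class DeleteJob:
        chat_id: str
        finished_at: float  # when the chat was marked "finished"

    delete_queue: "queue.Queue[DeleteJob]" = queue.Queue()

    def mark_chat_finished(chat_id: str) -> None:
        # Instead of deleting inline, enqueue a job for later propagation.
        delete_queue.put(DeleteJob(chat_id, time.time()))

    def deletion_worker(hard_delete) -> None:
        # Drain the queue and hard-delete each chat once the retention
        # window has elapsed (jobs arrive roughly in finish order, so
        # sleeping on the head of the queue is good enough for a sketch).
        while True:
            job = delete_queue.get()
            remaining = job.finished_at + RETENTION_SECONDS - time.time()
            if remaining > 0:
                time.sleep(remaining)
            hard_delete(job.chat_id)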
But hasn’t Google literally admitted to using user data for training purposes? I remember this being a big deal with Gmail and personalized ads a while back. Personally, it sounds to me like another case of “We think new technology is scary, therefore it’s banned”.
I have seen companies have rules about (or against) cloud computing in general. I remember when decent web-based translation services first came out, and the guidance from BigCorp was to not use them for anything work-related.
From what I personally have seen, this sort of guidance remains. When companies do use things like Google Docs or Microsoft Office365, they likely have some specific contract in place with Google / Microsoft / etc., that the company's legal team has decided they are happy with.
I anticipate that the same will eventually be true of ChatGPT and such, that there will be some paid corporate offering with contract terms that make the company lawyers happy.
Most of my career has been with larger companies, often with high data sensitivity; I can easily imagine that some smaller and/or less data-sensitive companies might not care about any of this.
I work for a company that for a very long time was strongly opposed to employees using any cloud-based infrastructure, including OS or programming-language package managers (e.g. apt-get, pip for Python, etc.), opting to host their own instance if possible and disallowing usage if not. IT did finally cave and switch to Office365, which has been slowly opening the floodgates to other services being allowed.
This gets into some really interesting corporate governance issues.
The cloud is a terrible bet for many large companies. The benefits are minimal while the risks are huge; however, what’s in the best interest of the company is only tangentially related to what actually happens.
It’s really difficult to ensure companies actually take low probability risks seriously. A 1% chance to lose 10 billion dollars is an easy bet for upper management to make when their personal risks and rewards don’t line up with the company’s risks and rewards.
As somebody who is completely puzzled why you could think that the cloud is a terrible bet for many companies - are these companies allowing their employees to use Google Search or other parts of the internet or do they have to look up things in paper books in the corporate library?
Let’s not pretend Google search is what people mean when they say “cloud.” It’s about running internal processes on external systems.
As to the risks, many companies live and die by their internal secrets. These range from private keys, customer lists, trading strategies, and similar trade secrets to actual serious R&D efforts.
Sometimes the damage is obvious, such as crypto exchanges suddenly finding themselves broke, but corporate espionage can also be kept quiet. Losing major government bids because a competitor read some internal memos is a serious risk, and you may never know.
It’s much harder to reconstruct actionable intelligence from a huge stream of people using Google search, even if they’re using it to perform sensitive calculations.
While theoretically possible for something bad to happen, short phrases are much less likely to have actionable/sensitive info in them than the full document corpus of a company.
This is already in the works. Microsoft offers an Azure GPT-4 service for business, and they already offer a business version of Copilot as well. I have not gone over the details, but I imagine the usual business and support agreements will be in place for the business tiers.
> Do these companies worry that web browsers like Chrome could leak data or applications like Google Docs?
Yes they do. Where I work the whole google office suite is blocked from inside the network (you have to use MS Office). ChatGPT is blocked. Most web apps that you can copy text or data into are either blocked, or we have an agreement with the provider, or (for open source) we have an internal on-prem fork.
From TechCrunch on March 1, 2023: "Starting today, OpenAI says that it won’t use any data submitted through its API for “service improvements,” including AI model training, unless a customer or organization opts in."
So prior to that, they were willing to use your data for model training. Every service may have leaks/security issues, but few say they'll purposely use your data. OpenAI probably should've promised not to use your data from the beginning; it'll be a hard perception to change now.
Using their API is different from using their web-based interface, where you specifically have to opt out of allowing them to use your chat for training.
It is. Firstly, there are legal agreements when you’re an enterprise user of such a solution; then there are various tools like DLP solutions that integrate with cloud/SaaS services such as Google Docs or Office 365; and lastly there are CASB solutions that allow you to control how corporate users use those solutions in the first place.
E.g. you’ll probably be able to use the corporate account to sign into the corporate Google Docs or O365 instance, but if you try to sign into your own it would be blocked and likely also reported, so you might get a call from SecOps down the line.
OpenAI currently offers none of it and more importantly it openly uses the data that users submit to it as well as the responses for additional training and any other purpose they might come up with.
As for browsers, these are also often configured not to send data outside of the company, and yes, it’s possible. Windows 11 web search and other features would also likely be disabled on your corporate device.
This probably isn't at the top of the list of serious concerns, but one problem that's kind of unique to AI is that in general, AI-generated content isn't eligible for copyright protection. Companies might worry about losing copyright on certain things if someone finds out that a lazy employee didn't actually create the content themselves.
Companies generally tend to be wary of cloud services due to data leak concerns. At the very least, they like to be in control of the decision about which services are approved and which are not.
It's about leaking private/proprietary company information. It's not about features.
How will that private/proprietary information be used by OpenAI? Does it include NDA information from another company that they don't have the right to share? How securely is the information stored (think industrial espionage)? There is a lot that needs to be taken into account that goes even beyond this.
With Google Docs, MS Office, Atlassian etc you get a real software product with engineers paid ~$300k per year to fix bugs.
With ChatGPT, you get researchers paid over $1m per year [1] to use you as a training data source and ship stuff with basic bugs and then "feel sorry" when stuff breaks:
https://www.theregister.com/2023/03/23/openai_ceo_leak/
Another position for those like Samsung: preventing ChatGPT use encourages incubation of internal competing solutions.
It’s shadow IT. You are not supposed to leak confidential company data to any other company unless you have the appropriate vendor agreements in place. It’s like uploading the source code to Dropbox when you are supposed to use Bitbucket.
Don't know about Chrome, because it's an application and not itself harmful, but yes, they think Google Docs will steal their data. Or any other cloud service. They are all banned for employees. I'm surprised that anyone is surprised by this.
I don't think it is at all clear that OpenAI won't use data that is put into it for unclear purposes, and I don't think they have a corporate account feature to guarantee prompt privacy.
We've seen demonstrations of models being tricked into sharing details about their setup or training data. If they are to be trained on what is shared with them then that data could be procured by an attacker.
I would have concerns about Google using my data but I wouldn't be concerned that the data I enter could easily appear in someone else's spreadsheet.
My company explicitly blocks the use of non-corporate controlled cloud products for obvious reasons. All it takes is one person to post an Excel document incorrectly to cause a major incident.