A common theory is that increased temperatures end up causing an ice age. Bill Bryson discusses it in one of his books; I can't remember the exact reasoning, but maybe it's something like
Increased heat -> more moisture -> more clouds -> less heat -> ice age
I'm not sure. But it was fairly convincing, and it has apparently happened on Earth before humanity, according to the records.
There are many relevant things that already exist in the physical world and are not currently considered dangers: e-commerce, digital payments, DoorDash-style delivery, cross-border remittances, remote gig work, social media fanning extreme political views, event organizing.
However, these are constituent elements that could be aggregated and weaponized by a maleficent AI.
Maleficent humans are constantly trying to use these elements for their own gain, often with little to no regard for other humans (especially out-groups). This happens individually, in small groups, in large organizations, and even as multiple organizations colluding: criminal groups, terrorists, and groups at war, along with legal organizations such as exploitative companies and regressive interest groups, etc. And we have tools and mechanisms in place to keep the level of abuse at bay. Why and how are these mechanisms unsuitable for protecting against AI?
>Why and how are these mechanisms unsuitable for protecting against AI?
The rule of law prevented WWI and WWII, right? Oh, no, it did not; tens to hundreds of millions died due to human stupidity and violence in that era, depending on what exactly you count.
> criminal groups, terrorists, and groups at war
Human organizations, especially criminal organizations, have deep trust issues between agents in the organization. You never know if anyone else in the system is a defector. This reduces the openness and quantity of communication between agents. In addition, you have agents that want personal gain rather than to benefit the organization itself. This is why Apple is a trillion-dollar company following the law... mostly. Smart people can work together and 'mostly' trust that the other person isn't going to screw them over.
Now imagine a superintelligent AI with the mental processing bandwidth of hundreds of the best employees at a company. Assuming it knows and trusts itself, the idea of illegal activities being an internal risk disappears. You have something that operates more on the level of a hivemind toward a goal (what the limitations of a hivemind versus selfish agents are is another very long discussion). What we ask here is: if all the world's best hackers got together, worked together unselfishly, and instigated an attack against every critical point they could find on the internet and in real-world systems at once, how much damage could they cause?
Oh, let's say you find the server systems the superintelligence is on, but the controller shuts it off and all the data has some kind of homomorphic encryption, so that's useless to you. It's dead, right? Nah, they just load up the backup copy they have a few months later and it's party time all over again. Humans tend to remain dead after dying; AI? Well, that is yet to be seen.
Those tangible elements would conceivably become the danger, not the AI using those elements. Again, the "guns don't kill people, people kill people" take is all well and good, but well outside of this discussion.
I can see how steganography applied to images can result in hard-to-detect watermarks or provenance identifiers.
But I don't see how these can be effectively used in text content. Yes, an AI program can encode provenance identifiers by length of words, starting letters of sentences, use of specific suffixes, and other linguistic constructs.
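For concreteness, here's a toy sketch of the kind of encoding I mean (purely illustrative; the 16-letter alphabet and 4-bits-per-sentence scheme are made up, not any real watermark):

```python
# Toy scheme: each sentence's first letter encodes 4 bits of an identifier
# via a fixed 16-letter alphabet. Any rewording of the sentence openings
# destroys it, which is exactly the weakness discussed below.
ALPHABET = "abcdefghijklmnop"  # 16 letters -> 4 bits per sentence

def decode_id(text: str) -> str:
    bits = ""
    for sentence in text.split(". "):
        first = sentence.strip().lower()[:1]
        if first in ALPHABET:
            bits += format(ALPHABET.index(first), "04b")
    return bits

print(decode_id("Apples are nice. Bears eat them. Cats do not."))
# a -> 0000, b -> 0001, c -> 0010  =>  "000000010010"
```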
However, say that I am a student with an AI-generated essay and want to make sure my essay passes the professor's plagiarism checker. Isn't it pretty easy to re-order clauses, substitute synonyms, and add new content? In fact, I think there is even a Chrome extension that does something like that.
Or maybe that is too much work for the lazy student who wants to completely rely on ChatGPT or doesn't know any better.
The point of steganography (as discussed in the paper) is not unerasable watermarks but undetectable (to the adversary) messages in innocent-looking communication.
I'm confused why you focus on plagiarism detection. That being said, your scenario is very briefly mentioned in the conclusion and requires augmenting the approach (entropy coding) with error correction.
The result would be that as long as your modifications (reordering clauses, etc.) reasonably closely follow a known distribution with limited entropy (which I think they clearly do, although specifying this distribution and dealing with the induced noisy channel might be very hard), there will be a way to do covert communication despite it, though probably only a very small amount of information can be transmitted reliably.

For plagiarism detection, you only need a number of bits that scales like -log[your desired false positive rate], so it would seem theoretically possible. Theoretically it also doesn't matter whether you use text or images, though in practice increasing the amount of transmitted data should make the task a lot easier. However, I'm not sure if something like this can be practically implemented using existing methods.
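To put rough numbers on that -log[false positive rate] scaling (this is just the arithmetic, nothing taken from the paper): if the detector checks k embedded bits against a keyed reference, unrelated text matches all of them with probability about 2^-k.

```python
import math

# k watermark bits checked against a keyed reference: a random,
# non-watermarked text agrees with all k bits with probability ~2^-k,
# so k ~ -log2(desired false positive rate).
def bits_needed(false_positive_rate: float) -> int:
    return math.ceil(-math.log2(false_positive_rate))

for fpr in (1e-3, 1e-6, 1e-9):
    print(f"false positive rate {fpr:g}: ~{bits_needed(fpr)} bits")
# -> ~10, ~20, and ~30 bits respectively
```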
Instead of manually reordering clauses, etc, you could just run the original essay through another LLM without watermarking capability and ask it to write a new essay based on the original.
Then test the result against your own plagiarism detector and keep iterating through the watermark-less LLM until the resulting essay passes (a rough sketch of that loop is below).
Or just proactively run it through a bunch of times.
Or just use the watermark-less LLM to begin with... personal, unshackled, powerful LLMs are definitely where we're headed.
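In rough pseudocode, the test-and-iterate loop from the first two paragraphs would look something like this (both helper functions are hypothetical stand-ins, not real APIs):

```python
import random

def paraphrase(text: str) -> str:
    return text  # placeholder: call a watermark-free LLM here

def looks_watermarked(text: str) -> bool:
    return random.random() < 0.5  # placeholder: run whatever detector you have

def launder(essay: str, max_rounds: int = 5) -> str:
    text = essay
    for _ in range(max_rounds):
        if not looks_watermarked(text):
            return text          # passes the local check, stop rewriting
        text = paraphrase(text)  # otherwise rewrite and test again
    return text                  # give up, or switch to another model
```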
This is fairly theoretical work. It assumes that the parties (and the adversary) know the distribution precisely.
Its direct practical applicability may be somewhat limited, because people aren't going around communicating randomly selected LLM outputs... and if you use LLM output in a context where human-written text would be expected, it could be distinguished.
It's not useful for watermarking, as the first change will destroy the rest of the embedding.
I can make a contrived example where it's directly useful: imagine you have agents in the field; you could send out LLM-generated spam to communicate with them. Everyone expects the spam to be LLM-generated, so the fact that it's detectable as such reveals nothing. This work discusses how you can make the spam carry secret messages to the agents in a way that is impossible to detect (without the key, of course), even by an attacker that has the exact spam LLM.
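Roughly, the intuition for why that can be undetectable (just the intuition, not the paper's actual construction): encrypt the message with a shared key so its bits look uniformly random, then let those bits play the role of the sampler's randomness when generating the spam. Driving ordinary sampling with uniform-looking bits reproduces the model's output distribution, so even an attacker with the same spam LLM sees nothing unusual; recovering the message on the other end is the entropy-coding part the paper actually works out.

```python
import hashlib

# Toy sketch (NOT the paper's scheme): keyed pseudorandom bits XOR the
# message, and the resulting uniform-looking bits drive inverse-CDF
# sampling from the model's next-token distribution.

def keystream_bits(key: bytes, n: int) -> list[int]:
    bits, counter = [], 0
    while len(bits) < n:
        block = hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        bits.extend((byte >> i) & 1 for byte in block for i in range(8))
        counter += 1
    return bits[:n]

def sample_token(dist: dict[str, float], rand_bits: list[int]) -> str:
    # Interpret the bits as a number u in [0, 1) and invert the CDF;
    # with uniform bits this samples from `dist` like a normal sampler.
    u = sum(b / 2 ** (i + 1) for i, b in enumerate(rand_bits))
    acc = 0.0
    for token, p in dist.items():
        acc += p
        if u < acc:
            return token
    return token  # guard against floating-point rounding

# Stand-in "spam model": a fixed next-token distribution.
dist = {"buy": 0.5, "cheap": 0.3, "pills": 0.2}
message_bits = [1, 0, 1, 1, 0, 1, 0, 0]
cipher_bits = [m ^ k for m, k in zip(message_bits, keystream_bits(b"shared key", 8))]
print(sample_token(dist, cipher_bits))  # looks like an ordinary sample
```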
Less contrived: a sufficiently short message from a sufficiently advanced LLM is probably indistinguishable from a real post in practice, but that's outside the scope of the paper. It's hard (impossible?) to rigorously analyze that security model because we can't say much about the distribution of "real text", so we can't say how far from it LLM output is. The best models we have of the distribution of real text are these LLMs; if you take them to BE the distribution of real text, then that approach is perfectly secure by definition.
But really, even if the situation it solves is too contrived to be a gain over alternatives, the context provides an opportunity to explore the boundary of what is possible.
A higher level of abstraction that generative AI will soon be able to apply is the design of algorithms that humans cannot possibly comprehend, going well beyond rudimentary newspaper want-ad spycraft.
An example is smart contracts on the Corda blockchain, which are programmed in Kotlin. It may just be me, but I have seen more Kotlin outside of Android than on it.
My business partner is an attractive young woman living in Seoul. I am based in California. She and I have regular phone meetings at 2 am her time. She uses our meeting time to jog around the city while talking. She's been doing this routine for 2 years and never had a problem.
When I visited Seoul, we went all over the city, and I realized after 4 days that I had not seen a single policeman nor heard a single police siren. Seoul is 25 million people. I live in a "safe" California town of 60,000 people and see policemen all the time, and hear sirens regularly. I would caution any friend against jogging at 2 am here.
In coffee shops in Seoul, people leave their wallets and laptops unattended while they go to the bathroom. By contrast, in my hometown in California, thieves have walked in, punched customers, grabbed laptops out of customers' hands, and run.
A very different world -- not just Seoul, but other cities in Asia.
I consider myself somewhat plugged into social media (Twitter, Insta, TikTok, etc). But I admit I didn't know what the Fediverse was and thought it was some kind of government project, maybe for contractors to share documents relating to grants.
Still don't understand the connection between "Fed" and "Mastodon". I think a Mastodon-based social network should be called "Tuskverse" or "Pachyverse" or "Stompverse"
The connection is that Mastodon is just one of the "applications" built on top of the ActivityPub protocol. Others are Pixelfed, Pleroma, PeerTube, WriteFreely, and several more.
Being based on the same protocol, they are all to some extent interoperable - e.g. you can follow Pixelfed users from your Mastodon account.
"Fediverse" is the umbrella term for all of these applications working together. It is a portmanteau for "federated" and "universe". The "federal police" association is unfortunate, but it is something USA only - rest of the world doesn't really see this.
It doesn't hold a monopoly on the concept of a federation, but it does (as far as I know) hold a monopoly on the association of "fed" = (federal) police in people's minds.
Intriguingly, "fed" is British youth slang for the police despite Britain not being a federal state and there being very few national police. A US cultural import, presumably.