
AI systems are trained on private, PII-containing, and copyrighted data all the time without explicit consent. For example, consider the spellcheck in Google Search: every query you make goes into the training data for that system, along with your preferred language and country of origin.
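
To make that concrete, here's a minimal sketch of a query-log-driven spell corrector in the style of Norvig's classic example. The query log, counts, and single-edit candidates are illustrative assumptions, not how Google's actual system works:

  import re
  from collections import Counter

  # Pretend "query log": in reality this would be billions of user queries.
  QUERY_LOG = "restaurant near me resturant near me restaurant reviews weather"
  WORD_COUNTS = Counter(re.findall(r"[a-z]+", QUERY_LOG.lower()))

  def edits1(word):
      """All strings one edit (delete/transpose/replace/insert) away."""
      letters = "abcdefghijklmnopqrstuvwxyz"
      splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
      deletes = [L + R[1:] for L, R in splits if R]
      transposes = [L + R[1] + R[0] + R[2:] for L, R in splits if len(R) > 1]
      replaces = [L + c + R[1:] for L, R in splits if R for c in letters]
      inserts = [L + c + R for L, R in splits for c in letters]
      return set(deletes + transposes + replaces + inserts)

  def correct(word):
      """Pick the candidate the query log makes most frequent."""
      candidates = {w for w in edits1(word) if w in WORD_COUNTS} or {word}
      return max(candidates, key=WORD_COUNTS.get)

  print(correct("resturant"))  # -> "restaurant"

The point is that the "training data" here is just what users typed, collected as a side effect of using the product.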

You can certainly argue that generative AI systems are different from previous AI systems and should be treated differently. But the current situation is basically that you are allowed to train an AI on any data you have, regardless of copyright or consent. I wouldn't be surprised if that ends up being considered legally and ethically okay, both because it's the status quo and because it's hard to define what counts as "AI".



The point about spellcheck is a very good one. I think one big differentiator is that the attack/risk surface of something as complex as an LLM is much larger, and that much, much more information is encoded in an LLM than in a spellcheck dictionary.

For example, it's possible to extract training data from an LLM—which could include PII/medical data/etc. Those risks don't exist with spellcheck as far as I'm aware.
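
Roughly, such extraction probes work by feeding the model a plausible prefix and sampling many continuations, then checking whether verbatim training text (possibly containing PII) comes back. A minimal sketch, assuming a Hugging Face causal LM; the model name, prefix, and sampling settings below are illustrative placeholders, not a recipe for any specific deployed system:

  from transformers import AutoModelForCausalLM, AutoTokenizer

  tok = AutoTokenizer.from_pretrained("gpt2")
  model = AutoModelForCausalLM.from_pretrained("gpt2")

  # A prefix that might precede memorized sensitive text in the training set.
  prefix = "Patient name:"
  inputs = tok(prefix, return_tensors="pt")

  outputs = model.generate(
      **inputs,
      do_sample=True,            # sample rather than greedy decode
      max_new_tokens=40,
      num_return_sequences=5,    # draw several candidate continuations
      top_k=40,
      pad_token_id=tok.eos_token_id,
  )
  for seq in outputs:
      print(tok.decode(seq, skip_special_tokens=True))

Real attacks then filter the generations (e.g. by perplexity or duplication against known sources) to find likely memorized strings; the sketch only shows the sampling step.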

To your point about what is "AI", I'd say "AI" is a misnomer here. What we're really talking about are generative large language models (LLMs). What _can_ be considered an LLM is definitely up for debate, but if you were to describe one in general terms, I think we could reasonably say that most (or all) things we consider LLMs are:

  1. A probabilistic model of a natural language
  2. Able to interpret and generate natural-language text
  3. Typically encoding a large volume of training data

I'd love to hear other thoughts on how one would define an LLM in practical, simple language; I imagine doing so would be a prerequisite to any effective legislation. A toy sketch of those three properties is below.
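
For illustration only, here's a word-bigram model rather than a real transformer (the corpus and code are purely made up) that is "probabilistic", "generates text", and "encodes its training data" in miniature:

  import random
  from collections import Counter, defaultdict

  CORPUS = "the cat sat on the mat . the dog sat on the rug .".split()

  # 3. "Encode" the training data as bigram counts.
  bigrams = defaultdict(Counter)
  for w1, w2 in zip(CORPUS, CORPUS[1:]):
      bigrams[w1][w2] += 1

  # 1. A probabilistic model: P(next word | current word).
  def next_word_probs(word):
      counts = bigrams[word]
      total = sum(counts.values())
      return {w: c / total for w, c in counts.items()}

  # 2. Generate text by repeatedly sampling from that distribution.
  def generate(start, length=8):
      out = [start]
      for _ in range(length):
          probs = next_word_probs(out[-1])
          if not probs:
              break
          words, weights = zip(*probs.items())
          out.append(random.choices(words, weights=weights)[0])
      return " ".join(out)

  print(next_word_probs("the"))  # {'cat': 0.25, 'mat': 0.25, 'dog': 0.25, 'rug': 0.25}
  print(generate("the"))

An LLM is obviously vastly more capable, but the same three properties scale up, which is exactly why so much more information ends up encoded in it.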



