This has been the thesis behind our product since the beginning (~3 years ago), before much of the current hype took hold. I'm excited to see it gain more recognition.
Chat is single threaded and ephemeral. Documents are versioned, multi-threaded, and a source of truth. Although chat is not appropriate as the source of truth, it's very effective for single-threaded discussions about documents. This is how people use requirements documents today. Each comment on a doc is a localized chat. It's an excellent interface when targeted.
Karpathy recently tweeted about the importance of being close to the data when training LLMs, and I think a similar level of rigor and transparency is essential when judging evaluations. Thanks for bringing us this insight from the data.
Building hierarchical abstractions on top of code is IMO the only way to truly enable AI to write code beyond better autocomplete. Kevin's work with DreamCoder shows that hierarchical abstractions can be built automatically in the code domain.
Once these abstractions exist side-by-side with natural language (essentially a code-natural language world model), it'll enable arbitrarily complex code generation from descriptions of the outcomes/results.
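To make that concrete, here's a toy sketch (purely illustrative, nothing to do with DreamCoder's actual internals) of what a learned abstraction living next to a natural-language description might look like:

    # Hypothetical library entry produced by automatic abstraction discovery.
    # The natural-language description sits beside the code, so a model can
    # map "count the items matching a condition" straight to the abstraction.
    def count_where(items, predicate):
        """Count the items for which the predicate holds."""
        return sum(1 for item in items if predicate(item))

    # Higher-level code is then composed from named abstractions instead of
    # being re-derived from raw primitives every time:
    print(count_where(range(10), lambda n: n % 3 == 0))  # -> 4

The interesting part is the pairing: once the description and the abstraction are stored together, generating code from a description of the outcome becomes mostly a retrieval-and-composition problem.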
Doesn't the existence of this nice, encapsulated, hierarchical code make AI coding unnecessary rather than easy? You could just use those existing abstractions yourself.
The reason AI coding systems exist is because it's hard, with our current programming languages and libraries, to do things even if they've been done thousands of times before. If you can build new languages or new libraries/corpora where that's not a problem, you no longer need AI.
"The difference, when it comes to AI, is one of scale. ChatGPT can “read” more published words in a few seconds than I could in several lifetimes and, unlike me, that data isn’t immediately replaced in my human-limited short-term memory by whatever I’m thinking of next."
I think this misses the point. The issue of scale isn't on the ingest side, it's on the output side. Once you train an LLM on a book (however long that takes), then the LLM can be the interface to that book for an unlimited number of users. That scales very differently to, say, a person reading a book and writing something influenced by it.
In the case of the LLM, it's a complete interface to the contents of the book. It lets you "talk to the book". If that exists, why would anyone buy the book? If I could ask ChatGPT to "summarize the new book by XYZ", then spend an hour or two asking it the questions _I_ have about the book, then buying the book would be a net negative.
If we don't solve attribution (like BMI solved for music), then the financial upside of publishing might be majority-captured by whoever trains LLMs on the copyrighted material.
By your argument, writing summaries of books, "explain <book> in 3 minutes" youtube videos, and commentaries on books should be made illegal too.
Or more precisely, they should be made illegal if and only if they achieve "scale" of maybe at least a couple million viewers.
The fundamental premise of copyright is flawed. Taking medieval concepts built around censorship of the printing press and extending them into the 21st century is bound to produce awkward results. I'm not hopeful that copyright will be reconsidered from the ground up during this AI shock, but at the very least we shouldn't pretend that arguments about copyright have to be reasonable or make sense. I honestly believe a "realpolitik" approach is more helpful: at least we know that those with more political influence who spend more effort lobbying will probably "win" in the end...
I appreciate the example, but here's where I think it differs as an analogy of what LLMs do:
A summary doesn't have infinite or variable depth. If you read the summary of a non-fiction book (I'll limit my argument to that, as another poster pointed out) and either aren't convinced or want to learn more about the matter, you'd have to purchase the book.
An LLM that has been trained on the book, if somehow designed not to hallucinate, would be able to answer any question you have about the book at any depth, seamlessly blending in material from other books to answer a question or explain a concept. That seems like an entirely better experience than reading the book from start to finish. I don't see how the original can compete.
LLMs will never be able to not hallucinate. Also, it's insane to me that people like you would prefer to ask a chatbot about a book rather than read the book itself. Part of the value of books is the voice of an author.
This is a very basic, naive, poor scenario. Words are in the public domain, but somehow their arrangement makes all the difference. Can AI just solve this "arrangement" problem better than humans do? Arrangements can be likened to a series of moves in chess, and AlphaGo solves for this through self-play given only the rules.
Assuming you know the right questions to ask. Most people don't know what they don't know. I've tried this. I'd prefer to pay a small amount to read the book.
Is that really that much of a barrier? Off the top of my head: you could start with a prompt like "write a summary of book XYZ, followed by a summary of each chapter". Then dive deeper into each one from there using the same prompt recursively, etc.
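Roughly like this, as a sketch (assumes the OpenAI Python client; the model name and prompts are placeholders, and there's no caching or error handling):

    # Sketch of the recursive drill-down described above.
    from openai import OpenAI

    client = OpenAI()

    def ask_llm(prompt: str) -> str:
        resp = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder model name
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content

    def drill_down(book: str, topic: str = "the whole book", depth: int = 0) -> str:
        summary = ask_llm(f"Write a summary of {topic} in the book '{book}', "
                          "followed by a summary of each chapter or section.")
        if depth >= 2:  # stop after a couple of levels
            return summary
        parts = ask_llm("List, one per line, the chapters or sections covered in:\n"
                        + summary)
        deeper = [drill_down(book, line.strip(), depth + 1)
                  for line in parts.splitlines() if line.strip()]
        return "\n\n".join([summary, *deeper])

Whether the answers stay faithful to the actual book is a separate question, of course.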
Yes, it’s a tremendous barrier, which is why academic fields tend to have introductions or surveys of the domain, stepwise instruction, with more in-depth or specialist knowledge being premised on having this understanding. I feel like this should be obvious to everyone? Did your education not follow such a progression?
We know this isn't possible at the moment. Are we going to legislate for something that is not yet technologically possible? Should judges decide cases because maybe ML researchers will figure out how to reliably stop models from hallucinating?
Even for source code, the US does not offer copyright to programs that are simple enough to have effectively one way to accomplish the desired function rather than requiring creative (aka artistic) choices by the programmer.
Example program specification for which the straightforward implementation in any common programming language would not be copyrightable by itself without adding additional scope: “When executed, output ‘Hello, world!’ plus a new line character to standard output, and then exit returning exit code 0.”
> Even for source code, the US does not offer copyright to programs that are simple enough to have effectively one way to accomplish the desired function
I think the fact you use the word "function" here is extremely telling. Writing code is obviously in a closer intellectual domain to designing a car engine, than it is to drawing a picture.
Maybe you have ground to stand on when talking about things like code golf which could be analogous to poetry. But no, the vast majority of code is not the product of artistic expression. It is the product of functional desires.
Not sure why you're trying to make an argument about trivial software. The same is true about trivial art: draw a black square on a white canvas. Good luck claiming copyright for that.
I disagree that the vast majority of code lacks artistic expression, especially when using the inclusive sense of the word “artistic” (or often “creative”) that the law uses to determine copyrightability.
There are so many different styles and designs when implementing any nontrivial underlying functional specification, and the preferences, choices, skill, and aesthetic of individual programmers definitely shine through. The way you would find it straightforward to write a given program and the way I would write the same program are very probably recognizably different, beyond purely functional programs like the one I gave.
The existence of an underlying functional desire does not change the necessary artistic element in how to achieve that desire. Even in the traditional art world, an underlying functional desire is often more present than you think. Many artworks throughout history and even today are in fact commissioned, whether explicitly per-piece or through a patronage or employment relationship. A commissioned artwork is trying to satisfy either the specifications or the desires of the client. And among those which aren’t commissioned, like personal photographs, the underlying desire is often a functional one of remembering an occasion, despite the many clearly copyrightable artistic choices and skill required to create the work.
The black square on a white canvas example could very well be copyrightable, and I’d even guess that it usually is. Your functional specification still leaves the artist much freedom to choose the dimensions, relative positions and angles, exact shades of color, and materials of both the black square and the white canvas, as well as the shape of the canvas. Many ways to do it - and, importantly, no obvious one straightforward way to do it as there is in my trivial programming example.
> I disagree that the vast majority of code lacks artistic expression
I disagree that I made a claim that you're disagreeing with here. I said the vast majority of code is the product of functional desire. Building an engine is the product of functional desire. Building a birdhouse is the product of functional desire. Building a bridge is the product of a functional desire. Drawing a portrait is the product of artistic expression. All of these require creative thinking. One of them is copyrightable. Software is definitely closer in intellectual domain to what is not copyrightable than to what is.
I don't think copyright should exist at all, but I also 100% think you're kidding yourself if you think most software is an artistic endeavor.
It can't be solved, by design. We want LLMs to behave naturally. Humans, naturally, don't provide any attribution, unless it really matters for the conversation.
No one (except for the copyright holders) wants LLMs to be a marketing department's dream, something straight out of cyberpunk novels, spewing brand names(tm) non-stop.
> then buying the book would be a net negative
Surely this is not true. At least for fiction, people read books instead of their short summaries because they want to spend time enjoying the story. That's why people are so against any spoilers.
> It lets you "talk to the book". If that exists, why would anyone buy the book?
Interactive and non-interactive experiences are two different things. Although, for sure, after a good book I'd enjoy a "what-if" or "explain that" chat with an LLM (here, a possible business model for rights holders). But a chat cannot replace a story.
For non-fiction, I might enjoy a brief summary first. That's why science papers start with an abstract, anticipating the reader's needs. But even then, if I'm interested, I will probably need the full unabridged text to get into the exact details (without LLMs hallucinating anything at me).
I’m confused why you claim attribution is somehow “unnatural”? Every actually useful lecture, essay, report, etc. I’ve encountered included things like footnotes, references, or a bibliography. So much so, in fact, that I tend to disregard things that don’t include them. So-and-so claims X. What are their sources? There are none? Who cares. Life is too short to engage with arguments that lack rigor or support, even though these things themselves require verification!
Life is too short for me to engage with your argument, because you've failed to attribute the first writers of sentences / ideas semantically similar to each of the lines in your comment.
Ah, you’re right. My mistake. I should’ve simply claimed it’s natural to cite sources instead. After all, there is no debating what is natural or those who are simple.
My apologies, my perception of LLMs is somewhat skewed, because I primarily think of conversation agents.
It's unnatural in a conversation. When we're talking about, say, Superman, we don't ever say that it's "a registered trademark of DC Comics, Inc." With obligatory exceptions for comical or satirical effects, or if we're specifically talking about trademarks or copyrights, etc. And of course when we're talking about robots we don't normally give any nods to Karel Čapek.
I believe that, same as humans, LLMs already try to provide references when requested, or if the style/format (such as a lecture) prompts for having them. Just remember that famous anecdote where a lawyer used ChatGPT and it wrote a brief with believable-looking references (the judge then threw it out of court, because quality/reliability is another problem, which is out of scope here).
You're right. I think it's fair to carve fiction out of my argument. For that, I would surely go to the source material, up until the point where an LLM could come up with better long-form fiction de novo. But for non-fiction, which I'd guess is the economically and intellectually more important category to protect, the effects may be devastating.
I also agree that attribution can't be solved easily in the current paradigm. Perhaps, during training, one could deduce how much of the net gradient on a particular weight was derived from the batches covering some book, and then during inference, assign attribution based on the effect of that weight on the output. All of this is very expensive to do, and I don't have strong intuitions for whether the resulting attributions would be in any way meaningful.
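To gesture at what that could look like mechanically, here's a toy PyTorch sketch (completely unvalidated; the open question is precisely whether numbers produced this way mean anything):

    # Toy sketch: accumulate per-source |gradient| for every weight during
    # training, then split credit for an output using the current saliency
    # of each weight (|grad of output| times that weight's per-source share).
    import torch
    import torch.nn as nn

    model = nn.Linear(8, 1)
    sources = ["book_a", "book_b"]
    credit = {s: [torch.zeros_like(p) for p in model.parameters()] for s in sources}

    def train_step(x, y, source):
        loss = ((model(x) - y) ** 2).mean()
        model.zero_grad()
        loss.backward()
        for acc, p in zip(credit[source], model.parameters()):
            acc += p.grad.abs()
        # (an optimizer step would go here in real training)

    for s in sources:  # imagine many batches drawn from each source
        train_step(torch.randn(32, 8), torch.randn(32, 1), s)

    def attribute(x):
        model.zero_grad()
        model(x).sum().backward()
        scores = {}
        for s in sources:
            total = 0.0
            for i, p in enumerate(model.parameters()):
                share = credit[s][i] / (sum(credit[t][i] for t in sources) + 1e-8)
                total += float((share * p.grad.abs()).sum())
            scores[s] = total
        norm = sum(scores.values()) or 1.0
        return {s: v / norm for s, v in scores.items()}

    print(attribute(torch.randn(1, 8)))

Even if the bookkeeping were tractable at LLM scale (it probably isn't as written), it still wouldn't tell you whether the resulting attribution is semantically meaningful.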
To your point about hallucinations, if there's not a solution to that, then perhaps the whole point is moot when, after a while, the hype dies down. But if somehow hallucinations are solved (I don't see a technical way this can happen now, but who knows?), then I think we'll need to address attribution for non-technical material.
My impression is that attribution on limited datasets isn't terribly hard. If you can prompt the LLM to say a sentence that is approximately in the source material, then the nearest sentence vector in the source material can be looked up in a vector DB, which can attribute it in context.
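A minimal sketch of that idea, using sentence-transformers and brute-force cosine similarity in place of an actual vector DB (the model name is just a common default, and the source sentences are made up):

    # Embed the source sentences once, then attribute a generated sentence
    # to its nearest neighbour in the source material.
    import numpy as np
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("all-MiniLM-L6-v2")

    source_sentences = [
        "The mitochondria is the powerhouse of the cell.",
        "Photosynthesis converts light into chemical energy.",
    ]
    source_vecs = model.encode(source_sentences, normalize_embeddings=True)

    def attribute(generated: str):
        vec = model.encode([generated], normalize_embeddings=True)[0]
        sims = source_vecs @ vec  # cosine similarity, since vectors are unit length
        best = int(np.argmax(sims))
        return source_sentences[best], float(sims[best])

    print(attribute("Cells get their energy from mitochondria."))

In a real system the nearest-neighbour search would be an index lookup (FAISS or any vector DB), but the attribution logic is the same.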
I think this might be one of the few places where LLMs can provide straightforward value, since it can work as a search engine that can accept vague queries, create approximate answers, fetch the real answers, translate the source material into layman's terms with citations, and allow the newly informed user to refine or dig deeper with that context. The most dangerous part is translation, and the data I've seen show that transformers almost never hallucinate on tasks where no external knowledge is needed.
Because (if the book affords it), reading can be a form of psychic traveling. A reader enters an altered state of consciousness, lives in the world of the book, and comes back changed.
A summary of the information and 'plot points' would seem like a replacement only for those who have never really been absorbed in reading a book.
> why would anyone buy the book? If I could ask ChatGPT to "summarize
A summary is not the same as reading the book. Anyone can read reviews, human-written summaries, or even analyses on the internet instead of asking an AI model or buying the book. An AI model usually cannot reproduce even small fragments. But it can 'creatively' fantasize in the book's universe indefinitely, usually getting facts wrong and mixing it all with other books.
By the way, the model doesn't have to be as big as ChatGPT. Anyone with a good gaming GPU can get a free open-source model and train it for academic research. Mixing fantasy with something else can produce interesting results.
> If I could ask ChatGPT to "summarize the new book by XYZ", then spend an hour or two asking the questions _I_ have about the book from it, then buying the book would be a net negative.
I believe you lose some information when doing so; summarizing is good when you want to get the gist of it, but not when you want the actual details.
I know this sounds very obvious, but some people seriously jump straight to a summary and believe that's enough when doing research.
The addition of many remote positions since COVID has been a great hiring filter for us. For the type of work we do (AI, R&D), and the culture that we find most productive and enjoyable (enthusiasm because we love the work, a sense of working in a team), remote was a real downgrade when we tried it.
We advertise the job as on-site only, and because of that the applicants self-select for those who want in-office work. It's made our interviews more focused on technical ability.
I think this is a better equilibrium overall. Those on either side of the remote/on-site preference can find the right respective jobs and work cultures.
It is possible to build technical solutions that take the work and luck out of social organization by shouldering the heavy lifting, but then get out of the way once people meet IRL.
The best tools for this right now are calendars and email. There's a lot of room for improvement.
Please PM me if you're interested in a deeper discussion. I've thought a bit about what this tool might look like and how to get it off the ground.