So much for "but DeepSeek doesn't do multi-modal..." as a defence of the alleged moats of Western AI companies.
However many modalities do end up being incorporated, it does not change the horizon of this technology, which has progressed only by increasing data volume and variety -- widening the solution class (per problem) rather than the problem class itself.
There is still no mechanism in GenAI that enforces deductive constraints (and compositionality), ie., situations where, when one output (or input) is obtained, the search space for future outputs is necessarily constrained (and where such constraints compose). Yet all the sales pitches about the future of AI require not merely encoding reliable logical relationships of this kind, but causal and intentional ones: ones where hypothetical necessary relationships can be imposed and then suspended; ones where such hypotheticals are given an ordering based on preferences/desires; ones where the actions available to the machine, in conjunction with the state of its environment, lead to such hypothetical evaluations.
An "AI Agent" replacing an employee requires intentional behaviour: the AI must act according to business goals, act reliably using causal knowledge of the environment, reason deductively over such knowledge, and formulate provisional beliefs probabilistically. However there has been no progress on these fronts.
I am still unclear on what the sales pitch is supposed to be for stochastic AI, as far as big business goes or the kinds of mass investment we see. I buy a 70s-style pitch for the word processor ("edit without scissors and glue"), but not a 60s-style pitch for the elimination of any particular job.
The spend on the field at the moment seems predicated on "better generated images" and "better generated text" somehow leading to "an agent which reasons from goals to actions, simulates hypothetical consequences, acts according to causal and environmental constraints.. " and so on. With relatively weak assumptions one can show the latter class of problem is not in the former, and no amount of data solving the former counts as a solution to the latter.
The vast majority of our work is already automated to the point where most non-manual workers are paid for the formulation of problems (with people), social alignment in their solutions, ownership of decision-making / risk, action under risk, and so on.
> [...] this technology which has progressed only by increasing data volume and variety
Sure, if you ignore major shifts after 2022, I guess? Test-time-compute, quantization, multimodality, RAG, distillation, unsupervised RL, state-space models, synthetic data, MoEs, etc ad infinitum. The field has rapidly blown past ChatGPT affirming the (data) scaling laws.
> [...] where, when one output (or input) is obtained, the search space for future outputs is necessarily constrained
It's unclear to me why this matters, or what advantage humans have over frontier sequence models here. Hell, at least the latter have grammar-based sampling, and are already adept with myriad symbolic tools. I'd say they're doing okay, relative to us stochastic (natural) intelligences.
> With relatively weak assumptions one can show the latter class of problem is not in the former
Please do! Transformers et al are models for any general sequences (e.g. protein structures, chatbots, search algorithms, etc). I'm not seeing a fundamental incompatibility here with goal generation or reasoning about hypotheticals.
If your point is that there's a very very wide class of problems whose answer is a sequence (of actions, propositions, etc.) -- then you're quite correct.
But that isn't what transformers model. A transformer is a function of historical data which returns a function of inputs by inlining that historical data. You could see it as a higher-order function: `transformer : Data -> (Prompt -> Answer)`, so that `promptable : Prompt -> Answer = transformer(historical_data)`.
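As a toy sketch of that shape (the "weights" here are just token counts, which is nothing like a real transformer -- it's only to show the higher-order structure):

```typescript
type Data = string[];
type Prompt = string;
type Answer = string;

// Stand-ins for "fit weights to the corpus" and "frequency-guided sampling".
function fitWeights(d: Data): Map<string, number> {
  const m = new Map<string, number>();
  for (const s of d) m.set(s, (m.get(s) ?? 0) + 1);
  return m;
}

function sampleByFrequency(weights: Map<string, number>, prompt: Prompt): Answer {
  // Pick the most frequent historical item that mentions the prompt, else echo it.
  const candidates = [...weights.entries()].filter(([s]) => s.includes(prompt));
  candidates.sort((a, b) => b[1] - a[1]);
  return candidates[0]?.[0] ?? prompt;
}

// transformer : Data -> (Prompt -> Answer) -- the historical data is inlined in the closure.
function transformer(historicalData: Data): (p: Prompt) => Answer {
  const weights = fitWeights(historicalData);
  return (prompt: Prompt): Answer => sampleByFrequency(weights, prompt);
}

// promptable : Prompt -> Answer
const promptable = transformer(["the cat sat", "the dog ran", "the cat ran"]);
console.log(promptable("cat")); // answers come only from the inlined history
```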
It is true that Prompt and Answer both lie within Sequence; but they do not cover Sequence (ie., all possible sequences), nor is the transformer's strategy of computing an Answer from a Prompt even capable of searching the full (Prompt, Answer) space in a relevant way.
In particular, its search strategy (ie., the body of `promptable`) is just a stochastic algorithm which takes in a bytecode (the weights) and evaluates it by biased random jumping. The weights are an inlined subspace of (Prompt, Answer), sampled from this space according to the historical frequencies of the prior data.
This generates Answers which are sequenced according to "frequency-guided heuristic searching" (I guess a kind of "stochastic A* with inlined historical data"). Now this precludes the imposition of any deductive constraints on the answers: eg., (A, notA) should never be sequenced together, yet it can be generated by at least one search path in this space, given a historical dataset in which both A and notA appear.
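A toy way to see this (made-up numbers): as long as the learned distribution gives the contradictory continuation any mass at all, some sampling path reaches it, whereas a deductive constraint would require that mass to be exactly zero.

```typescript
// Toy next-token distribution "learned" from a corpus in which both A and notA occur.
const nextProbs: Record<string, number> = {
  "therefore, A": 0.72,
  "therefore, notA": 0.03, // small but non-zero: reachable on at least one path
  "something else": 0.25,
};

function sampleNext(probs: Record<string, number>): string {
  let r = Math.random();
  for (const [token, p] of Object.entries(probs)) {
    if ((r -= p) <= 0) return token;
  }
  return Object.keys(probs)[0]; // numerical fallback
}

// A deductive constraint would demand nextProbs["therefore, notA"] === 0 exactly;
// nothing in frequency estimation ever puts it there.
console.log(sampleNext(nextProbs));
```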
Now, things get worse from here. What a proper simulation of counterfactuals requires is partitioning the space of relevant Sequences into coherent subsets (A, B, C..); (A', B', C') but NOT (A, notA, A') etc. This is like "super deduction", since each partition needs to be "deductively valid", and there need to be many such partitions.
And so on. As you go up the "hierarchy of constraints" of this kind, you recursively require ever more rigid logical consistency, but this is precluded even at the outset. Eg., consider that a "Goal" is going to require classes of classes of such constrained subsets, since we need to evaluate counterfactuals to determine which class of actions realise any given goal, and any given action implies many consequences.
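To make the first level concrete, the minimal requirement is just a coherence check over candidate "worlds" -- something like this toy sketch (names purely illustrative):

```typescript
// A "world" is a set of literals; it is coherent only if it never contains
// both a proposition and its negation.
type Literal = string; // "A" or "!A"
type World = Set<Literal>;

const negate = (l: Literal): Literal => (l.startsWith("!") ? l.slice(1) : `!${l}`);

const isCoherent = (w: World): boolean => [...w].every(l => !w.has(negate(l)));

// Counterfactual simulation needs many such coherent partitions:
// (A, B, C) and (!A, B', C') are admissible; (A, !A, B) never is.
const worlds: World[] = [
  new Set(["A", "B", "C"]),
  new Set(["!A", "B'", "C'"]),
  new Set(["A", "!A", "B"]),
];
console.log(worlds.map(isCoherent)); // [true, true, false]
```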
Just try to solve the problem, "buying a coffee at 1am" using your imagination. As you do so, notice how incredibly deterministic each simulation is, and what kind of searching across possibilities is implied by your process of imagining (notice, even minimally, you cannot imagine A & notA).
The stochastic search algorithms which comprise modern AI do not model the space of, say, Actions in this way. This is only the first hurdle.
> This generates Answers which are sequenced according to "frequency-guided heuristic searching" (I guess a kind of "stochastic A* with inlined historical data")
This sounds like way too simplistic an understanding. Transformers aren't just heuristically pulling token cards out of a randomly shuffled deck; they sit upon a knowledge graph of embeddings that creates a consistent structure representing the underlying truths and relationships.
The unreliability comes from the fact that within the response tokens, "the correct thing" may be replaced by "a thing like that" without completely breaking these structures and relationships. For example: in the nightmare scenario of a STRAWBERRY, the frequency of letters themselves had very little distinction in relation to the concept of strawberries, so they got miscounted (I assume this has been fixed in every pro model). BUT I don't remember any 2023 models such as claude-3-haiku making fatal logical errors such as saying "P" and "!P" while assuming ceteris paribus, unless you went through hoops trying to confuse it and find weaknesses in the embeddings.
You've just given me the heuristic, and told me the graph -- you haven't said A* is a bad model, you've said it's exactly the correct one.
However, transformers do not sit on a "knowledge graph", since the space is not composed of discrete propositions set in discrete relationships. If it were, then P(PrevState|NextState) = 0 would obtain for many pairs of states -- this would destroy the transformer's ability to make progress.
So rather than 'deviation from the truth' being an accidental symptom, it is essential to its operation: there can be no distinction-making between true/false propositions for the model to even operate.
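One way to see this with toy numbers (nothing to do with any real model's internals): a genuine knowledge graph has hard zeros between unrelated states, while a softmax over similarity scores never does.

```typescript
// A discrete graph: transitions that don't exist have probability exactly 0.
const edges: Record<string, string[]> = {
  socrates: ["is_a_man"],
  is_a_man: ["is_mortal"],
};
const graphProb = (from: string, to: string): number =>
  (edges[from] ?? []).includes(to) ? 1 / edges[from].length : 0;

// A softmax over similarity scores: every option gets non-zero mass,
// so "impossible" continuations are merely improbable, never excluded.
const softmax = (scores: number[]): number[] => {
  const exps = scores.map(Math.exp);
  const z = exps.reduce((a, b) => a + b, 0);
  return exps.map(e => e / z);
};

console.log(graphProb("socrates", "is_mortal")); // 0 -- not an edge
console.log(softmax([2.0, 0.1, -3.0]));          // all entries strictly > 0
```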
> making fatal logical errors such as saying "P" and "!P"
Since it doesn't employ propositions directly, how you interpret its output in propositional terms will determine whether you think it's saying P&!P. This "interpreting-away" effect is common in religious interpretations of texts, where the text is divorced from its meaning and a new one substituted, to achieve apparent coherence.
Nevertheless, if you're asking (Question, Answer)-style prompts where there is a canonical answer to a common question, then you're not really asking it to "search very far away" from its inlined historical data (the ersatz knowledge-graph that it does not possess).
These errors become more common when the questions require posing several counterfactual scenarios derived from the prompt, or otherwise have non-canonical answers which require integrating disparate propositions given in a prompt.
The prompt's propositions each compete to drag the search in various directions, and there is no constraint on where it can be dragged.
I am not going to engage with your A* proposition. I believe it to be irrelevant.
> However, transformers do not sit on a "knowledge graph", since the space is not composed of discrete propositions set in discrete relationships.
This is the main point of contention. By all means, embeddings are a graph: you can use a graph (though not a tree) to represent their data structure. Sure, they are essentially points in space, but a graph emerges as the architecture starts selecting tokens for use according to the learned parameters during inference. It will always be the same graph for the same set of tokens for a given data set which provides "ground truth". I know it sounds metaphoric, but bear with me.
The above process doesn't result in discrete propositions like we have in Prolog, but the point is, it is "relatively" meaningful, and you seed a traversal by bringing tokens to the attention grid. What I mean by relatively meaningful is that inverse relationships are far enough apart that they won't usually be confused, so there is less chance of meaningless gibberish emerging -- which is what we observe.
If I replaced "transformer" in your comment with "human", what changes? That's my point.
Humans are a "function of historical data" (nurture). Meatbag I/O doesn't span all sequences. A person's simulations are often painfully incoherent, etc. So what? These attempts at elevating humans seem like anthropocentric masturbation. We ain't that special!
>There is still no mechanism in GenAI that enforces deductive constraints (and compositionality), ie., situations where, when one output (or input) is obtained, the search space for future outputs is necessarily constrained (and where such constraints compose).
I build these things for a living.
This is a solved problem.
You use multiple different types of models to supervise the worker models and force them to redo the work until you get a result that makes sense, or they fail and you give the resulting dump to a human to figure out what went wrong or ignore it.
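A rough sketch of that supervise-and-retry loop (the model functions here are placeholders, not any particular vendor's API):

```typescript
type Task = string;
type Result = { output: string; ok: boolean };

async function runWithSupervision(
  task: Task,
  worker: (t: Task) => Promise<string>,                 // generates a draft
  critic: (t: Task, draft: string) => Promise<boolean>, // checks the draft makes sense
  maxRetries = 3,
): Promise<Result> {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    const draft = await worker(task);
    if (await critic(task, draft)) return { output: draft, ok: true };
  }
  // The critic kept rejecting: dump it on a human (or ignore it).
  return { output: "escalate: needs human review", ok: false };
}
```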
Inference time compute is through the roof, but when you can save thousands of dollars by spending hundreds it's a no brainer.
Some people want AI to be as infallible as god before they'd consider it useful.
Not sure why people keep falling into these mental traps.
Regardless of whether the system you're deriding is a "Chinese room", "stochastic parrot", "brute force" or whatever other derisive term-du-jour you want to use, if the system performs the required task, the only thing that actually matters is its cost to operate.
And if that cost is less than paying a human, that human, and society at large is in trouble.
Depends what problem you're trying to solve. Have we built something that can replace us completely in terms of reasoning? Not yet.
We have built something that can multiply a single person's productivity and in some constrained scenarios replace people entirely. Even if, say, your customer support bot is only 80% effective (only 20% of interactions require humans to intervene), that still means you can fire 80% of your support staff. And your bots will only get cheaper, faster, better, while your humans require salary increases, hiring staff, can get sick, can't work 24/7, etc.
People so often forget that perfect is the enemy of good.
“The vast majority of our work is already automated to the point where most non-manual workers are paid for the formulation of problems, social alignment in their solutions, ownership of decision making / risk, action under risk, and so on”
Exactly! What a perfect formulation of the problem.
> The vast majority of our work is already automated to the point where most non-manual workers are paid for the formulation of problems (with people), social alignment in their solutions, ownership of decision-making / risk, action under risk, and so on.
I agree. That's why I think the next step is automating trivial physical tasks, i.e. robotics, not automating nontrivial knowledge tasks.
> An "AI Agent" replacing an employee requires intentional behaviour: the AI must act according to business goals, act reliably using causal knowledge of the environment, reason deductively over such knowledge, and formulate provisional beliefs probabilistically. However there has been no progress on these fronts.
This is a great example of how it's much easier to describe a problem than to describe possible solutions.
The mechanisms you've described are easily worth several million dollars. You can walk into almost any office and if you demonstrate you have a technical insight that could lead to a solution, you can name your price and $5M a year will be considered cheap.
Given that you're experienced in the field, I'm excited by your comment because its force and clarity suggest that you have some great insights into how solutions might be implemented but that you're not sharing with this HN class. I'm wishing you the best of luck. Progress in what you've described is going to be awesome to witness.
The first step may be formulating a programming language which can express such things to a machine. We are 60% of the way there; I believe only another 20% is achievable -- the rest is a materials science problem.
Had we an interpreter for such a language, a transformer would be a trivial component.
This assumes one AI replaces one human, but what's much more likely in the short term is one human plus AI replaces four humans. The AI augments the human, and vice versa. A Borg is still better than either of its components.
I agree though, search space constraint is a glaring limitation at the moment. NotebookLM accomplished some amount of focus.
> The vast majority of our work is already automated to the point where most non-manual workers are paid for the formulation of problems (with people), social alignment in their solutions, ownership of decision-making / risk, action under risk, and so on.
There's a lot of pretty trivial shit to automate in the economy, but I think the gist of your comment still stands. Of the trivial stuff that remains to be automated, a lot of it can be done with Zapier and low-code, or custom web services. Of what remains after that, a lot is as you (eloquently) say hugely dependent on human agency; only a small fraction of that will be solvable by LLMs.
As the CTO of a small company, the only opportunities I see for genuinely useful application of LLMs right now are workloads that could've been done by NLU/NLP (extraction, synthesis, etc.). I have yet to see a task where I would trust current models to be agents of anything.
The bulk of the computer work for the “knowledge class” is data mangling and transit: managing a SaaS app for your sales pipeline, inputting results/outcomes of leads, aggregating stuff happening in various other places, uploading lists and connecting other SaaS apps together -- which all then generates other data that gets translated to Excel (because SaaS BI tools are rarely good enough), and humans analyze it and communicate the data.
Even though we have a million web services there’s still tons of work getting the data in and across them all as they are all silos with niche usecases and different formats.
There’s a reason most Zapier implementations are as crazy as connected Excel sheets
> An "AI Agent" replacing an employee requires intentional behaviour: the AI must act according to business goals, act reliably using causal knowledge of the environment, reason deductively over such knowledge, and formulate provisional beliefs probabilistically.
I mean this in the least cynical way possible: the majority of human employees today do not act this way.
> The vast majority of our work is already automated to the point where most non-manual workers are paid for the formulation of problems (with people), social alignment in their solutions, ownership of decision-making / risk, action under risk, and so on.
This simply isn't true. Take any law firm today for example - for every person doing the social alignment, ownership and risk-taking, there is an army of associates taking notes, retrieving previous notes and filling out boilerplate.
That kind of work is what AI is aiming to replace, and it forms the bulk of employment in the global West today.
The illusion you appeal to is so common, it ought to have a name. I guess something like the "repetition-automaton illusion", I don't know, or perhaps "the alienation of the mind in creative labour". Here's a rough definition: the mistaken belief that producing repetitive products employs only repeatable actions (or skills, etc.).
A clear case: acting. An actor reads from a script, the script is pregiven. Presumably nothing could be more repetitive: each rehearsal is a repeat of the same words. And yet Anthony Hopkins isn't your local high schooler, and the former is paid millions and the latter is not.
That paralegals work from the same template contracts, and produce very similar-looking ones, tells you about the nature of what's being produced: that contracts are similar, work from templates, easily repeated, and so on. It really tells you nothing about the work (only under an assumption we could call "zero creativity"). (Consider that if law firms were really paid for their outputs qua repeats, then they'd be running on near 0% profit margins.)
If you ask law firms how much they're employing GenAI here, you'll hear the same ("we tried it, and it didn't work; we don't need our templates repeated with variation, they need to be exact, and filled in with specific details from clients, etc."). And I know this because I've spoken to partners at major law firms on this matter.
The role of human beings in much work today is as I've described. The job of the paralegal is already very automated: templates for the vast majority of their contract work exist, and are in regular use. What's left over is very fine-grained, but very high-value, specialisation of these templates to the given case -- employing the seeking-out of information from partners/clients/etc., and so on.
The great fear amongst people subject to this "automaton" illusion is that they are paid for their output, and since their output is (in some sense) repeated and repeatable, they can be automated away. But these "outputs" were in almost all cases nightmarish liabilities: code, contracts, texts, and so on. They aren't paid to produce these awful liabilities, they are paid to manage them effectively in a novel business environment.
Eg., programmers aren't paid for code, they're paid to formalise novel business problems in ways that machines can automate. Non-novel solutions are called "libraries", and you can already buy them. If half of the formalisation of the business problem becomes 'formulating a prompt', you haven't changed the reason the business employs the programmer.
This is probably the best description of the central issue I've seen. I know even in my own work, which is a very narrow domain in software, I've found it troublesome to automate myself. Not because the code I write is unique or all that difficult, but because the starting conditions I begin with depend on a long history of knowledge that I've built up, an understanding of the business I'm part of, and an understanding of user behavior when they encounter what I've built.
In other words, I can form a prompt that often one-shots the code solution. The hard part is not the code, it's forming that prompt! The prompt often includes a recommendation on an approach that comes from experience, references to other code that has done something similar, and so on. I'm not going to stop trying to automate myself, but it's going to be a lot harder than anyone realized when LLMs first came out.
You're correct, but what can be affected is the number of workers. Considering the example of the acting career, in the old times every major city would have a number of actors and playhouses. Cinema and TV destroyed this need and the number of jobs for local actors is minuscule now.
Great comment. Maybe I'm missing it, but I'm puzzled why I don't see more discussion of the intentionality you refer to.
Things are interesting now but they will be really interesting when I don't tell the agent what problem I want it to solve, but rather it tells me what problems it wants to solve.
MMMU is not particularly high. Janus-Pro-7B is 41.0, which is only 14 points better than random/frequent choice. I'm pretty sure their base DeepSeek 7B LLM would get around 41.0 MMMU without access to images; this is a normal number for a roughly GPT-4-level LLM base with no access to images.
Very balanced thought. The world does run on incentives, and social structure plays a major role. I am not sure how AI can ever replace that. I love your analogy of the 70s word processor. I have always told my folks that AI is nothing but an updated version of Clippy.
You are both right, and that's where it gets interesting.
While the category of tedious work you have described is indeed heavily optimized, it is also heavily incentivized by the structure of our economy. The sheer volume of tedious unnecessary work that is done today represents a very significant portion of work that is done in general. Instead of resulting in less work, the productivity gains from optimization have simply led to a vacuum that is immediately filled with more equivalent work.
To get a sense for the scale of this pattern, consider the fact that wages in general have been stagnant since the mid '70s, while productivity in general has been skyrocketing. Also consider the bullshit jobs you are already familiar with, like inter-insurance healthcare data processing in the US. We could obviously eliminate millions of these jobs without any technical progress whatsoever: it would only require enough political will to use the same single-payer healthcare system every other developed nation uses.
Why is this the case? Why are we (as individual laborers) not simply working less or earning more? Copyright.
---
The most alluring promise of Artificial Intelligence has always been, since John McCarthy coined the term, to make ambiguous data computable. Ambiguity is the fundamental problem no one has been able to solve. Bottom-up approaches including parsing and language abstractions are doomed to unambiguous equivalence to mathematics (see category theory). No matter how flexible lisp is, it will always express precisely the answers to "What?" and "How?", never "Why?". The new wave of LLMs and Transformers is a top-down approach, but it's not substantive enough to really provide the utility of computability.
So what if it could? What if we had a program that could actually compute the logic present in Natural Language data? I've been toying with a very abstract idea (the Story Empathizer) that could potentially accomplish this. While I haven't really made progress, I've been thinking a lot about what success might look like.
The most immediate consequence that comes to mind is that it would be the final nail in the coffin for Copyright.
---
So what does Copyright have to do with all of this? Copyright defines the rules of our social-economic system. Put simply, Copyright promises to pay artists for their work without paying them for their labor. To accomplish this, Copyright defines "a work" as a countable item, representing the result of an artist's labor. The artist can then sell their "work" over and over again to earn a profit on their investment of unpaid labor.
To make this system function, Copyright demands that no one collaborate with that labor, else they would breach the artist's monopoly on their "work". This creates an implicit demand that all intellectual labor be, by default, incompatible. Incompatibility is the foundational anti-competitive framework for monopoly. If we can work together, then neither of us is monopolizing.
This is how Facebook, Apple, Microsoft, NVIDIA, etc. build their moats. By abusing the incompatibility bestowed by their copyrights, they can demand that meaningful competition be made from completely unique work. Want to write a CUDA-compatible driver? You must start from scratch.
---
But what if your computer could just write it for you? What if you could provide a reasonably annotated copy of NVIDIA's CUDA implementation, and just have AI generate an AMD one? Your computer would be doing the collaboration, not you. Copyright would define it as technically illegal, but what does that matter when all of your customers can just download the NVIDIA driver, run a script, and have a full-fledged AMD CUDA setup? At some point, the incompatibility that Copyright depends on will be factored out.
But that begs the question: Copyright is arbitrary to begin with, so what if we just dropped it? Would it really be that difficult to eliminate bullshit work if we, as a society, were simply allowed to collaborate without permission?
"There is still no mechanism in GenAI that enforces deductive constraints (and compositionality), ie., situations where when one output (, input) is obtained the search space for future outputs is necessarily constrained (and where such constraints compose). Yet all the sales pitches about the future of AI require not merely encoding reliable logical relationships of this kind, but causal and intentional ones: ones where hypothetical necessary relationships can be imposed and then suspended; ones where such hypotheticals are given a ordering based on preference/desires; ones where the actions available to the machine, in conjunction with the state of its environment, lead to such hypothetical evaluations."
Everything you said in this paragraph is not just wrong, but it's practically criminal that you would go on the internet and spread such lies and FUD so confidently.
If you think my confidence is misplaced, feel free to offer a counterpoint. I feel as you do about people who would say the opposite of what I am saying; though I'd think them naive, gullible, and credulous rather than criminal.
Stochastic AI, by definition, does not impose discrete necessary constraints on inference. It does not, under very weak assumptions, provide counterfactual simulation of alternatives. And does not provide a mechanism of self-motivation under environmental coordination.
Why? Since [Necessarily]A|B is not reducible to P(A|B, M) for any single model M -- it requires P(not-A|B, M) = 0 for all M. Since P(A|B) and P(B|A) are symmetric (interdefinable via Bayes) in cases where A-causes->B and B-causes->A are not. Since Action = argmax P(A->B | Goal, Environment) is not the distribution P(A, B, Goal, Environment) or any conditioning of it. And since Environment is not Environment(t), and there is no formulation of Goal(t, t'), Environment(t, t'), (A->B)(t, t') I am aware of which maintains the relevant constraints dynamically without prior specification (one aspect of the Frame Problem).
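Put in rough notation (mine, and borrowing Pearl's do() as one way of writing the intervention -- the point doesn't depend on that choice):

```latex
% Necessity quantifies over all models; it is not a probability under one model:
\[ \Box(A \mid B) \;\iff\; \forall M:\; P(\neg A \mid B,\, M) = 0 \]

% Action selection is an optimisation over interventions given a goal and a
% time-indexed environment -- not the joint P(A, B, Goal, Env) or any conditioning of it:
\[ a^{*} \;=\; \arg\max_{a}\; P\big(B \mid \mathrm{do}(a),\; \mathrm{Goal},\; \mathrm{Env}(t)\big) \]
```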
Now if you have a technology in mind which is more than P(A|B), I'd be interested in hearing it. But if you just want to insist that your P(A|B) model can do all of the above, then I'd be inclined to believe you are, if not criminal, then considerably credulous.
> act according to business goals, act reliably using causal knowledge of the environment, reason deductively over such knowledge, and formulate provisional beliefs probabilistically.
I don't know what this means, but it would make a great prompt.
Consider writing a program with types and semi-colons. Now, instead of giving variables a deterministic input, you randomly sample from the allowed values of that type. And instead of `;` meaning "advance one statement", it means "advance to some random statement later on in the program".
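For instance, a toy reconstruction of the kind of program I mean (illustrative only):

```typescript
function A(x: number): void { console.log(`A${x}`); }
function B(x: number): void { console.log(`B${x}`); }
function C(x: number): void { console.log(`C${x}`); }

// Deterministic reading: `;` means "advance one statement"; x is whatever was passed in.
function example(x: number): void {
  A(x);
  B(x);
  C(x);
}

// "Stochastic" reading: x is sampled from the allowed values of its type, and each
// `;` advances to some randomly chosen statement instead of the next one.
function stochasticExample(): void {
  const x = Math.floor(Math.random() * 100);      // random input of the right type
  for (const stmt of shuffle([A, B, C])) stmt(x); // random statement order
}

function shuffle<T>(xs: T[]): T[] {
  const a = [...xs];
  for (let i = a.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));
    [a[i], a[j]] = [a[j], a[i]];
  }
  return a;
}
```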
One run evaluates to `example(63) // C63,A63,B63`; another to `example(21)` with yet another ordering; and so on.
This is something like the notion of "program" (or "reasoning") which stochastic AI provides, though it's a little worse than this, since programs can be composed (ie., you can cut-and-paste lines of code and they're still valid) -- whereas the latent representations of "programs" as weights do not compose.
So what I mean by "deductive" constraints is that the AI system works like an actual program: there is a single correct output for a given input, and this output obtains deterministically: `int` means "an int", `;` means "next statement".
In these terms, what I mean by "causal" is that the program has a different execution flow for a variety of inputs, and that if you hit a certain input necessarily certain execution-flows are inaccessible, and other ones activated.
Again analogously, what I mean by "act according to a goal" is that of a family of all available such programs: P1..Pn, there is a metaprogram G which selects the program based on the input, and recurses to select another based on the output: so G(..G(G(P1..Pn), P2).. where G models preferences/desires/the-environment and so on.
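In the same toy terms, the metaprogram might look something like this (all names mine, and grossly simplified):

```typescript
type Program = (input: string) => string;
type Goal = (output: string) => number; // preference score for an outcome

// G selects, from a family of programs, the one whose output best serves the goal,
// then recurses on that output until the goal is satisfied or the depth runs out.
function G(programs: Program[], goal: Goal, input: string, depth = 3): string {
  if (depth === 0) return input;
  const outputs = programs.map(p => p(input));
  const best = outputs.reduce((a, b) => (goal(a) >= goal(b) ? a : b));
  return goal(best) >= 1 ? best : G(programs, goal, best, depth - 1);
}
```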
In these very rough and approximate terms it may be more obvious why deductive/causal/intentional behaviour is not reliably produced by a stochastic system (ie., why a stochastic-; doesn't get you a deterministic-;). By making the program extremely complex you can get kinda reliable deductive behaviour (consider, eg., many print(A), many print(B), many print(C) -- so that it's rare it jumps out of order). However, as you pile on more deductive constraints, you make out-of-order jumps / stochastic behaviour exponentially more fragile.
Consider trying to get many families of deterministic execution flows (ie., programs which model hypothetical actions) from a wide variety of inputs with a "stochastic semi-colon" -- the text of this program would be exponentially larger than one with a deterministic semi-colon -- and would not be reliable!
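One way to see the fragility claim: if each stochastic step respects a given constraint with probability p < 1, and n constraints must all hold in sequence (assuming independence for the sake of the sketch), then:

```latex
\[ P(\text{all } n \text{ constraints hold}) \;=\; p^{\,n} \;\longrightarrow\; 0
   \quad \text{as } n \to \infty, \text{ for any fixed } p < 1. \]
```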