There is no such thing as too much optimization. Early stopping exists to prevent overfitting to the training set. It's a trick, like most advances in deep learning, because the underlying mathematics is fundamentally not suited for creating intelligent agents.
Is overfitting different from 'too much optimization'? Optimization still needs a value that is optimized. Overfitting is the result of too much optimization toward not quite the right value (i.e. training error, when what you actually want to reduce is prediction error).
I think the miscommunication is due to the proxy nature of our modeling. From one perspective, yes, you're right, because it all comes down to your optimization function and objectives. But if we're in a context where we recognize that the practical usage of our model relies on it being an inexact representation (a proxy), then certainly there is such a thing as too much optimization. I mean, most of what we try to model in ML is intractable.
In fact, the entire notion of early stopping is due to this. We use a validation set as a pseudo test set to inject information into our optimization process without leaking information from the test set (this is why you shouldn't choose parameters based on test results - that is spoilage, and it doesn't matter if it's the status quo, it's still spoilage).
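To make the mechanics concrete, here's a minimal, self-contained sketch in Python (numpy only; the over-parameterized polynomial fit is a toy stand-in for a real model, not any particular library's API):

```python
# Toy early stopping: gradient descent on an over-parameterized polynomial
# fit, halted when the held-out validation loss stops improving.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=60)
y = np.sin(3 * x) + 0.3 * rng.normal(size=60)      # noisy target
X = np.vander(x, 15)                               # 15 features: easy to overfit
X_tr, y_tr, X_va, y_va = X[:40], y[:40], X[40:], y[40:]

w = np.zeros(15)
best_val, best_w = np.inf, w.copy()
bad, patience = 0, 50
for step in range(20000):
    grad = X_tr.T @ (X_tr @ w - y_tr) / len(y_tr)  # MSE gradient on train only
    w -= 0.1 * grad
    val = np.mean((X_va @ w - y_va) ** 2)
    if val < best_val:
        best_val, best_w, bad = val, w.copy(), 0
    else:
        bad += 1
        if bad >= patience:                        # validation stopped improving
            break

w = best_w                                         # roll back to the best checkpoint
print(f"best validation MSE: {best_val:.3f}")
# The test set never enters this loop; the validation set does, which is
# exactly the "injected information" described above.
```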
But we also need to consider that a lack of divergence between train/val does not mean there isn't overfitting. Divergence implies overfitting, but the inverse statement is not true. I state this because it's both relevant here and an extremely common mistake.
Most practitioners seem to understand that what they are doing is creating executable models and they don't confuse the model based on numeric observations with the actual reality. This is why I very much dislike all the AI hype and the rebranding of statistical models as artificial "intelligence": people who are not aware of what the words mean get very confused and start thinking that they themselves are nothing more than computers executing algorithms to fit numerical data to some unspecified cognitive model.
> Most practitioners seem to understand that what they are doing is creating executable models and they don't confuse the model based on numeric observations with the actual reality.
I think you're being too optimistic, and I'm a pretty optimistic person. Maybe it is because I work in ML, but I've had to explain this concept to a large number of people. It doesn't matter whether it's academia or industry; it is true for both management and coworkers. As far as I can tell, people seem very happy to operate under the assumption that benchmark results are strong indicators of real-world performance __without__ the need to consider the assumptions behind your metrics or data. I've even proven this to a team at a trillion-dollar company, where I showed that a model with lower test set performance had more than double the performance on actual customer data. The response was "cool, but we're training a much larger model on more data, so we're going to use that because it is a bit better than yours." My point was that the problem still exists in that bigger model with more data, but that increased params and data do a better job of hiding the underlying (and solvable!) issues.
In other words, in my experience people are happy to be Freeman Dyson in the conversation Calavar linked[0] and very upset to hear Fermi's critique: being able to fit data doesn't mean shit without either a clear model or a rigorous mathematical basis. Much of data science is happy to just curve fit. But why shouldn't they? You advance your career the same way, judged by bureaucrats who understand the context of the metrics even less.
I've just experienced too many people who cannot distinguish empirical results from causal models. And a lot of people who passionately insist there is no difference.
Are you launching into a semantic argument about the word 'experience'? If so, it might help to state what essential properties AlphaGo was missing that make it 'not have an experience'.
Otherwise this can quickly devolve into the common useless semantic discussion.
Just making sure no one is confused by common computationalist sophistry and how they attribute personal characteristics to computers and software. People can have and can create experiences, computers can only execute their programmed instructions.
And I am saying they are confused because they are attributing personal characteristics to computers and software. By spelling out what computers are doing it becomes very obvious that there is nothing that can be aware of any experiences in computers as it is all simply a sequence of arithmetic operations. If you can explain which sequence of arithmetic operations corresponds to "experiences" in computers then you might be less confused than all the people who keep claiming computers can think and feel.
> By spelling out what computers are doing it becomes very obvious that there is nothing that can be aware of any experiences in computers as it is all simply a sequence of arithmetic operations.
By spelling out what brains are doing it becomes very obvious that it's all simply a sequence of chemical reactions - and yet here we are, having experiences. Software will never have a human experience - but neither will a chimp, or an octopus, or a Zeta-Reticulan.
Mammalian neurons are not the only possible substrate for intelligence; if they're the only possible substrate for consciousness, then the fact that we're conscious is an inexplicable miracle.
If an algorithmic process is an experience and a collection of experiences is intelligence, then we get some pretty wild conclusions that I don't think most people would attempt to claim, as it'd make them sound like a lunatic (or a hippy).
Consider the (algorithmic) mechanical process of driving a screw into a board. This screw has an "experience" and therefore intelligence. So... the screw is intelligent? Very low intelligence, but intelligent according to this definition.
But we have an even bigger problem. There's the meta-set of experiences: the collection of several screws (or the screw, board, and screwdriver together). So we now have a meta-intelligence! And we have several, because there are different operations to perform on these sets.
You might be okay with this, or maybe you're saying it needs memory. If the latter, you hopefully quickly realize that this means a classical computer is intelligent, but given the many ways information can be stored, it does not solve our conundrum above.
So we must then come to the conclusion that all things AND any set of things have intelligence. Which kinda makes the whole discussion meaningless. Or we need a more refined definition of intelligence, one which more closely reflects what people are actually trying to convey when they use this word.
> If an algorithmic process is an experience and a collection of experiences is intelligence
Neither, what I'm saying is that the observable correlates of experience are the observable correlates of intelligence - saying that "humans are X therefore humans are Y, software is X but software is not Y" is special pleading. The most defensible positions here are illusionism about consciousness altogether (humans aren't Y) or a sort of soft panpsychism (X really does imply Y). Personally I favor the latter. Some sort of threshold model where the lights turn on at a certain point seems pretty sketchy to me, but I guess isn't ruled out. But GP, as I understand them, is claiming that biology doesn't even supervene on physics, which is a wild claim.
> Or we need a more refined definition of intelligence, one which more closely reflects what people are actually trying to convey when they use this word.
Well that's the thing, I don't think people are trying to convey any particular thing. I think they're trying to find some line - any line - which allows them to write off non-animal complex systems as philosophically uninteresting. Same deal as people a hundred years ago trying to find a way to strictly separate humans from nonhuman animals.
Continuing this reductio ad absurdum, you might reach the fallacious conclusion, as some famous cranks in the past did, that intelligence is even found in plants, animals, women, and even the uncivilized savages of the new continent.
Intelligence appears in gradients, not a simple binary.
> Intelligence appears in gradients, not a simple binary.
Sure, I'm in no way countering such a notion and your snarky comment is a gross mischaracterization of my comment. So far off I have a difficult time believing it isn't intentional.
The "surprise" is not that plants, animals, or even women turn out to be intelligent under the definition of "collection of experiences" but that rocks have intelligence, atom, photons, and even more confusingly groups of photons, the set of all doors, the set of all doors that such that only one door per city exists in the same set. Or any number of meta collections. This is the controversial part, not women being intelligent. Plants are still up for debate, but I'm very open to a broad definition of intelligence.
But the issue is that I, and the general fields of cognitive science, neuroscience, psychology, and essentially everyone except for a subset of computer scientists, agree that intelligence is more than a collection of experiences (even if that collection has memory). In other words, it is more than a Turing machine. What that "more" is, is debated, but it is still generally agreed that intelligence requires abstraction, planning, online learning, and creativity. All of these in turn have complicated, nuanced definitions that go well beyond what the average person thinks they mean. That's a classic issue: academics use the same words normal people do, but with far more restrictions on their meaning. This often confuses the average person when they are unwilling to accept that words can have different meanings in different contexts (despite the fact that we all do this quite frequently, and the concept appears in both our comments).
You seem to use the word intelligence to mean `consciousness` (if you replaced the former with the latter, I would agree with your argument).
I would define "intelligence" as (1) the ability to learn or understand or to deal with new or trying situations and (2) the ability to apply knowledge to manipulate one's environment.
It turns out that this is also the Merriam-Webster definition [0].
By that definition, yes, AlphaZero was learning and understanding how to deal with situations and is intelligent; and yes, most machine-learning systems, and many other systems that have a specific goal and manipulate data or the environment to optimize for that goal, are intelligent.
By this definition, a non-living, non-conscious entity can be intelligent.
And intelligence has nothing to do with "experiences" (which seem to belong in the "consciousness" debate).
This is a common retort. You can read my other comments if you want to understand why you're not really addressing my points: I have already addressed why reductionism does not apply to living organisms but does apply to computers.
The comments where you demand an instruction set for the brain, or else you'll dismiss any argument saying its actions can be computed? Even after people explained that lots of computers don't even have instruction sets?
And where you decide to assume that non-computable physics happens in the brain based on no evidence?
What a waste of time. You "addressed" it in a completely meaningless way.
Every software platform is essentially a digital panopticon and as AI gets deployed more widely in the real world this will also be increasingly true for non-software interactions. My guess is everyone is eventually going to carry an AI "assistant" that records all signals and gives guidance to its owner just like most people in the developed world today carry a cell phone and a credit card.
LLMs can be whatever labels people choose to attribute to the system executing the instructions to generate "answers". It is fundamentally a category error to attribute any meaning to whatever arithmetic operations the hardware is executing because neither the hardware in the data center nor the software have any personal characteristics other than what people erroneously attribute to them because of confused ontologies and metaphysics about computers and software.
At which point would such attributions be accurate? Humans are fundamentally just computers too. A different medium, but still transforming electrical signals.
Extremely weird to me when people compare themselves to computers. What is that philosophical stance called and do you have any references for long form writing which makes the case for why people are "just" computers?
I am familiar with the computational theory of cognition. What I wanted to know was whether there were any people who actually claimed their thinking is nothing more than programmed computation. I am very curious to know if they have mapped out the instruction set for their mind along the lines of something like the SKI combinators.
A mental instruction set would be extremely interesting. Unfortunately, nobody has that level of understanding of brain processes (and it might be quite difficult to formulate in such a linear way, since the underlying mechanism is so very parallel), but the idea that human cognition is computable falls pretty naturally out of the idea that nature is computable, which I think is a common position (sometimes called the Church-Turing-Deutsch principle).
Yes, I understand why some scientists claim that nature is "just" some computer, but still no one has answered my very basic question: what is the instruction set that the people who claim they are computers are using to think? Surely there must be one if they are nothing more than programmable computers, as they claim.
Just trying to figure out how rigorously people have thought about this. A computer with an undefined instruction set seems somewhat useless as a computer.
If you don't know how something works, do you assume it is magic? Why? It's wildly irrational to assume that it is magic rather than to assume, absent evidence to the contrary, that it works according to known physics.
Well, that's essentially a logical tautology. Everything works according to the known laws of physics, but it's certainly also true that everything must work according to as-yet-unknown laws of physics, because of basic human ignorance.
No, it is not. The argument is that absent any evidence of such unknown physics happening in the brain, the most logical assumption is that there isn't any, rather than presuming the existence of something we've never observed any hint of.
Rule 110 can be specified with a rewrite system; it is a cellular automaton: https://arxiv.org/abs/0906.3248. Cellular automata have a correspondence with contextual grammars: https://www.cis.upenn.edu/~cis5110/notes/tcbook-lang.pdf. Each is equivalent to a Turing machine, which is another way of saying that there is a program for it that can be specified on a Turing machine with the usual Turing machine instruction set for writing, reading, and erasing binary digits on a tape. That program can then be "compiled" into a rewrite system corresponding to the instruction set for rule 110.
The reason rule 110 is said to be Turing complete is because someone went through the trouble of specifying an instruction set for rule 110 so that other people could verify that it would be possible to write programs with it. This is not the case for the people who claim that they are computers. They always leave the instruction set undefined which makes their claims hard to believe.
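For anyone curious how little machinery is involved, here's a minimal sketch in Python (the periodic boundary and single-live-cell seed are my arbitrary choices):

```python
# Rule 110 as a rewrite over a row of cells: each new cell is a fixed
# function of its three-cell neighborhood, with the bits of the number
# 110 serving as the entire lookup table.
def rule110_step(cells):
    n = len(cells)
    return [
        (110 >> (4 * cells[(i - 1) % n] + 2 * cells[i] + cells[(i + 1) % n])) & 1
        for i in range(n)
    ]

row = [0] * 31 + [1]                        # single live cell at the right edge
for _ in range(16):
    print("".join(".#"[c] for c in row))    # '#' = 1, '.' = 0
    row = rule110_step(row)
```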
I personally have no problem with people who think they're computers but if they're not programmable then I'm not sure what the point would be of calling themselves computers.
> This is not the case for the people who claim that they are computers. They always leave the instruction set undefined which makes their claims hard to believe.
What is your alternative? Can you explain to us, down to the atomic level, how the brain could possibly do something that not only is impossible to simulate, but that also does not still constitute computation?

We don't even have language for talking about "operations" so different from all forms of computation that they are not just another form of computation.
Just try to describe one such hypothetical state change that can not be reproduced with a Turing computable function.
At the same time, your insistence on "instruction sets" is meaningless. An "instruction set" as we tend to consider them is not necessary to parameterize a function. A neural network whose input/output is used to provide the "tape" can trivially be made Turing complete. If you consider the weights or connections of the network an instruction set, then there you go: the fact that we don't know how to measure and extract all the details of the neural network of a brain does not mean we can't observe its presence. And it also does not mean we haven't done a vast number of measurements without observing any hint of unknown physics affecting state transitions.
To simplify: even a simple mechanical thermostat is parameterized - the dial provides "an instruction set" in the form of a single settable threshold that alters the behaviour of the function computed.
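If it helps, here's the same point as a few lines of Python (a software stand-in for the mechanical device, obviously):

```python
# A thermostat as a parameterized function: the setpoint "dial" is the whole
# "instruction set", yet turning it changes which function gets computed.
def make_thermostat(setpoint):
    return lambda temperature: temperature < setpoint   # True = heat on

heat_on = make_thermostat(setpoint=20.0)
print(heat_on(18.5), heat_on(21.0))                     # True False
```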
But if you expect something that looks like what we typically talk about when we talk about an instruction set, then that is a very limiting view of computation, and one I've already pointed out to you is just one part of the multiple types of computational devices we've built. Including heavily parameterisable ones.
I expect claims to be backed by evidence that is consistent with our current state of knowledge. I have seen no such evidence so that's why I asked for references. In any case, this discussion has run its course, best of luck to you and your future computations.
And wouldn't that language need to be able to account for different physiological states? Thinking when one is hungry or sleepy is quite different than thinking when one is well-fed or fully rested.
Yes. To validate the claim would require not only a formal instruction set but also the code to account for all sorts of cognitive states and processes. I'm not ruling out that some people are indeed programmable computers but I would like to see some actual evidence presented by the people who make these claims about themselves.
For us not to be would require brains to be able to compute functions that cannot be computed by an artificial computer. That would seem an extraordinary suggestion, given that we have no indication of unusual physics in the brain.
You'll have to define your terms first. Physicists now believe there is such a thing as dark matter and that there are objects so massive that no amount of observation can ever make sense of how massive they are because it is impossible to model it mathematically.
I am not the one making any extraordinary claims. Physicists themselves admit there are aspects of reality with no computational basis.
These terms have well understood meanings, and dark matter or black holes are entirely irrelevant to what I said.
For brains not to be computers would mean the physical Church-Turing thesis is invalid, and proof of that would be extraordinary enough to be Nobel Prize material.
Whether something is physical or not is orthogonal to whether it computes or not. You're the one who brought up physics, so that's why I showed why your logic was invalid. My contention was that calling something a computer without providing an instruction set is nonsensical, and I wanted to know whether someone had actually spent the time to rigorously think about what a computer without an instruction set would entail. So far it seems like no one has spent any time really thinking about it, but that's probably for the best anyway. I'm sure an LLM will eventually figure out an instruction set for programming people and then take over the world.
The idea that a discernable instruction set is needed for something to compute suggests you don't understand how fundamental computation is.
We have built computers without instruction sets, e.g. in the form of mechanical devices to carry out calculations. Fairly complex computations were done that way before general purpose programmable computers, but even many early programmable computers had no fixed instruction set.
There is a rich history of computation through wiring up calculations without any instructions involved. And for that matter of mechanical computation.
Here's an outline for a simple computational device:
A bucket.
Pour predefined quantities of water into a bucket, and you can compute a threshold. Use buckets of different sizes and overflows, and you can separate a numeral into binary digits. Drain them into containers of different sizes and you can carry out logical operations. (Actual computation has been done this way - fluidics is one example, a field that dates back to the Tesla valve in 1920.)
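A toy rendering of the same idea, for anyone who wants to play with it (Python standing in for the plumbing; the capacities are chosen by hand to make the logic work):

```python
# Buckets as logic gates: pour fixed quantities in, and overflow past the
# capacity is the output - no instruction set anywhere, yet it computes.
def bucket_overflows(pours, capacity):
    level, overflowed = 0.0, False
    for amount in pours:
        level += amount
        if level > capacity:
            level, overflowed = capacity, True   # the excess spills over
    return overflowed

AND = lambda a, b: bucket_overflows([a, b], capacity=1.5)  # needs both pours
OR  = lambda a, b: bucket_overflows([a, b], capacity=0.5)  # one pour is enough

print(AND(1, 1), AND(1, 0), OR(1, 0), OR(0, 0))  # True False True False
```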
Every physical interaction is computation, whether or not it is useful computation. The notion that computation requires an instruction set confuses a very limited notion of classical programmable computers with the general concept of computation.
It is also a notion contradicted by the history of computation, which is full of computation without an instruction set, and of implementing computers with instruction sets in terms of computations of fixed function devices without one.
E.g. it's not turtles all the way down - that instruction set runs on a CPU that ultimately is built of fixed function logic.
Instruction sets are an optional high level abstraction.
Steam engines and gears are a specific physical manifestation of computation. Computation does not have a single, specific physical manifestation - it can be, and has been, done with organic matter, electronics, gears, pipes of water, light.

Per the Church-Turing thesis these can all compute the same set of functions, and unless you can demonstrate that brains, and only brains, can evoke unknown physics that allows them to compute a set of functions that cannot be computed by other means, the most logical assumption is that the thesis holds, including for brains.
Especially given how much we measure brains without seeing any signs of unusual physics.
I think I understand. So what you're saying is that every function that can be implemented with computers must be computable. Your claim is that the brain is actually a computable function, can you tell me which one it is using your favorite version of a Turing complete instruction set? Or maybe I misunderstood and what you're saying is that the brain is not the function but what it does is compute a specific function called your mind in some unknown instruction set?
I'm saying that per the physical Church-Turing thesis, any function that is computable by ordinary physical means is Turing computable, and we have no evidence that even hints at the physical Church-Turing thesis not holding.
For it not to hold, there would need to be something unique about the physics of a brain that allows it to compute a class of functions which are inherently impossible to compute by other means. That'd imply entirely new/unknown physics that we're somehow not seeing any hints of.
> Your claim is that the brain is actually a computable function, can you tell me which one it is using your favorite version of a Turing complete instruction set?
No, my claim is that absent evidence of unknown physics or another way of disproving the physical Church-Turing thesis, the rational assumption is that the brain follows the same laws of physics as everything else, and so is limited to computation that is equivalent in power to Turing computable functions, just like everything else we know of.
For the brain not to be a computer would imply "magic" - not just that we don't know how the brain works, but for the brain to work in ways inconsistent with all known physics, and inconsistent in ways impossible to simulate with Turing computable functions. No sign of any such unknown physics happening in the brain has ever been recorded.
You made an incorrect assessment of a basic calculation in algebraic topology and claimed that it was correct. You didn't even look at what it was computing and simply looked at the final answer, which lined up with the answer on Wikipedia. Simplicial calculations for projective planes are not simple. The usual calculations are done with a cellular decomposition, and that's why the LLM gives the wrong answer: the actual answer is not in the dataset and requires reasoning.
Are you confusing me with someone else? When I asked it GPT computed the homology from the CW decomposition of RP^2 with three cells. Which is a very simple exercise.
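For anyone following along, the exercise goes like this (a standard computation, stated here for reference): give RP^2 the CW structure with one 0-cell, one 1-cell, and one 2-cell. The cellular chain complex is

$$0 \longrightarrow \mathbb{Z} \xrightarrow{\ \times 2\ } \mathbb{Z} \xrightarrow{\ 0\ } \mathbb{Z} \longrightarrow 0,$$

where the $\times 2$ records the attaching map of the 2-cell wrapping twice around $\mathbb{RP}^1$, so $H_0 = \mathbb{Z}$, $H_1 = \mathbb{Z}/2\mathbb{Z}$, and $H_2 = 0$.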
That's ok. It seems like LLMs know all about simplicial complexes and homology so I'll spend my time on more fruitful endeavors but thanks for the advice.
To be fair, it's not a simplicial complex, but simplicial and cellular homology coincide on triangulatable spaces like RP^2 so I gave it the benefit of the doubt =) algebraic topology is a pretty fun field regardless of how much a language model knows about it IMO.
TypeScript has a Turing complete type system, so it's as powerful as it gets in terms of what can be expressed in the type system. As for learning what is going on with type systems in general, you'll have to go through a textbook if you really want to develop an understanding. Type systems (at least the non-Turing-complete ones) are fundamentally logical systems, and I don't know how you can understand them without actually going through the trouble of understanding the underlying logical foundations.
No one at the commercial AI research labs knows what they're doing. As far as they are concerned, there is nothing beyond gradient descent, but it's accepted among academic researchers that gradient descent is insufficient for creating anything truly intelligent.
Eventually people will realize any underdetermined system of equations has infinitely many solutions. Give me any open source AI model and I will beat any SOTA benchmark. Why am I so confident? Because curve fitting can be applied to any data set to get as good a result as needed. Combine this approach with mixtures of "experts" and any predetermined set of benchmarks will fall to a curve fit to the benchmark.
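The linear-algebra version of that claim fits in a few lines of numpy (the dimensions here are arbitrary; the point is simply parameters > constraints):

```python
# An underdetermined system: 5 constraints, 10 parameters. Any null-space
# direction can be added to a solution without changing the fit at all.
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 10))               # 5 "benchmark" equations
b = rng.normal(size=5)

x0 = np.linalg.lstsq(A, b, rcond=None)[0]  # minimum-norm exact solution
null_basis = np.linalg.svd(A)[2][5:]       # rows spanning A's null space
x1 = x0 + 3.0 * null_basis[0]              # a second, equally exact solution

print(np.allclose(A @ x0, b), np.allclose(A @ x1, b))  # True True
```

Swap "parameters" for model weights and "equations" for benchmark items, and you get the same picture: hitting the benchmark pins down almost nothing about the rest of the function.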
The hype is really getting tiresome. There is no way to get from here to any intelligent system with the current techniques. New breakthroughs will require insights into discrete spaces which are not amenable to curve fitting with gradient descent.