
Everything you have said is completely baseless

> Probability theory, computability theory. There's no other way.

There is no credible evidence that humans learn via statistics and significant evidence against (poverty of stimulus, all languages have hierarchical structure). One other way was proposed by Chomsky, which is that you have built-in machinery to do language (which is probably intimately related to human intelligence) just like a foal doesn't learn to walk via "statistics" in any meaningful sense.

> Neurons in silica are to neurons in vivo ...

Again, not true. Observations about things like inter-neuron communication time suggest that computation is being done within neurons, which undermines the connectionist approach.

You've just presented a bunch of your own intuitions about things which people who actually studied the field have falsified.


> There is no credible evidence that humans learn via statistics and significant evidence against

Except living among humans on planet Earth, that is.

Even the idea of a fixed language is an illusion, of the same kind as, idk, "natural balance", or "solid surface", or discrete objects. There's no platonic ideal of Polish or English that's the one language its speakers speak, that they speak more or less perfectly. There's no singular "the English language" that's fundamentally distinct from the German or the French one. Claiming otherwise, or claiming built-in symbolic machinery for learning this thing, is confusing the map for the territory in a bad way - in the very way that gained academia a reputation for being completely out of touch with actual reality[0].

And the actual reality is, "language" is a purely statistical phenomenon, an aggregate of people trying to communicate with each other, individually adjusting themselves to meet the (perceived) expectations of others. At the scale of (population, years), it looks stable. At the scale of (population, decades), we can easily see those "stable" languages are in fact constantly changing, and blending with each other. At the scale of (individuals, days), everyone has their own language, slightly different from everyone else's.

> poverty of stimulus

More like poverty of imagination, and a dearth of idle minutes during the day in which to ponder this. Well, to be fair, we kind of didn't have any good intuition or framework to think about this until information theory came along, and then computers became ubiquitous.

So let me put this clearly: a human brain is ingesting a continuous stream of multisensory data 24/7 from the day we're born (and likely even earlier). That stream never stops, it's rich in information, and all that information is highly coherent and correlated with physical reality. There's no poverty of language-related stimulus unless you literally throw a baby to the wolves, or have it grow up in a sensory deprivation chamber.

> all languages have hierarchical structure

Perhaps because hierarchy is a fundamental concept in itself.

> you have built-in machinery to do language (which is probably intimately related to human intelligence)

Another way of looking at it is: it co-evolved with language, i.e. languages reflect what's easiest for our brain to pick up on. Like with everything natural selection comes up with, it's a mix of fundamental mathematics of reality combined with jitter of the dried-up shit that stuck on the wall after being thrown at it by evolution.

From that perspective, "built-in machinery" is an absolutely trivial observation - our languages look like whatever happened to work best with whatever idiosyncrasies our brains have. That is, whatever our statistical learning machinery managed to pick up on best.

> like a foal doesn't learn to walk via "statistics" in any meaningful sense.

How do they learn it then? Calculus? :). Developing closed-form analytical solutions for walking on arbitrary terrain?

> Observations about things like inter-neuron communication time suggest that computation is being done within neurons

So what? There's all kinds of "computation within CPUs" too. Cache management, hardware interrupts, etc., which don't change the results of the computation we're interested in, but might make it faster or more robust.

> which undermines the connectionist approach.

Wikipedia on connectionism: "The central connectionist principle is that mental phenomena can be described by interconnected networks of simple and often uniform units."

Sure, whatever. It doesn't matter. Connectionist models are fine, because it's been mathematically proven that you can compute[1] any function with a finite network. We like them not because they're philosophically special, but because they're a simple and uniform computational structure - i.e. it's cheap to do in hardware. Even if you need a million artificial neurons to substitute for a single biological one, that's still a win, because making faster GPUs and ASICs is what we're good at; comprehending and replicating complex molecular nanotech, not so much.
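
For reference: the result being leaned on here, footnote [1]'s "approximate to an arbitrarily high degree", is roughly the universal approximation theorem. For a non-polynomial activation σ, any continuous f on a compact set K, and any tolerance ε > 0, some finite single-hidden-layer network gets within ε:

    \exists N,\ \{v_i, w_i, b_i\}_{i=1}^{N} :\quad \sup_{x \in K} \Big| f(x) - \sum_{i=1}^{N} v_i\, \sigma(w_i^{\top} x + b_i) \Big| < \varepsilon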

Computation is computation. Substrate doesn't matter, and methods don't matter - there are many ways of computing the same thing. Like, natural integration is done by just accumulating shit over time[2], but we find it easier to do it digitally by ADC-ing inputs into funny patterns of discrete bits, then flipping them back and forth according to some arcane rules, and then eventually DAC-ing some result back. Put enough bits into the ADC - digital math - DAC pipeline, and you get the same[1] result back anyway.

--

[0] - I mean, what's next? Are you going to claim Earth is a sphere of a specific size and mass, orbiting the Sun in a specific time, on an elliptical orbit with fixed parameters? Do you expect space probes will eventually discover the magical rails that hold Earth to its perfectly elliptical orbit? Of course you're not (I hope) - surely you're perfectly aware that Earth gains and loses mass, that "perfect elliptical orbits" are neither perfect nor elliptical, etc. - all of that is just an approximation, safe at the timescales we usually consider.

[1] - Approximate to an arbitrarily high degree, which is all we can hope for in physical reality anyway.

[2] - Or hey, did you know that one of the best ways to multiply two large numbers is to... take a Fourier transform of their digits, multiply the transforms pointwise in the frequency domain, and transform back? (Sketch below.)

(Incidentally, being analog, nature really likes working in the frequency domain; between that, accumulation as addition, and not being designed to be driven by a shared clock signal it should be obvious that natural computers ain't gonna look like ours, and conversely, we don't need to replicate natural processes exactly to get the same results.)
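
Here's a minimal sketch of that trick in Python/numpy, purely as an illustration (digits become polynomial coefficients, their FFTs are multiplied pointwise, then carries are propagated; real big-integer libraries use far more careful number-theoretic transforms):

    import numpy as np

    def fft_multiply(a: int, b: int) -> int:
        # Digits become polynomial coefficients, least-significant digit first.
        da = [int(d) for d in str(a)][::-1]
        db = [int(d) for d in str(b)][::-1]
        n = 1
        while n < len(da) + len(db):  # pad so circular convolution equals linear convolution
            n *= 2
        # Convolving the digit sequences == multiplying their FFTs pointwise.
        conv = np.rint(np.real(np.fft.ifft(np.fft.fft(da, n) * np.fft.fft(db, n)))).astype(np.int64)
        # Propagate carries to turn the convolution back into base-10 digits.
        result, carry = 0, 0
        for i, c in enumerate(conv):
            total = int(c) + carry
            result += (total % 10) * 10**i
            carry = total // 10
        return result + carry * 10**len(conv)

    assert fft_multiply(123456789, 987654321) == 123456789 * 987654321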


This is just as silly as claiming that people "moved the goalposts" when, after a computer beat Kasparov at chess, they said it wasn't AGI: it wasn't a good test, and some people only realized this after the computer beat Kasparov but couldn't do much else. In this case the ARC maintainers have specifically stated that this is a necessary but not sufficient test of AGI (I personally think it is neither).


It's not silly. The computer that could beat Kasparov couldn't do anything else so of course it wasn't Artificial General Intelligence.

o3 can do much, much more. There is nothing narrow about SOTA LLMs. They are already General. It doesn't matter what the ARC maintainers have said. There is no common definition of General that LLMs fail to meet. It's not a binary thing.

By the time a single machine covers every little test humanity can devise, what comes out of that is not 'AGI' as the words themselves mean but a General Super Intelligence.


It is silly, the logic is the same: "Only a (world-altering) 'AGI' could do [test]" -> test is passed -> no (world-altering) 'AGI' -> conclude that [test] is not a sufficient test for (world-altering) 'AGI' -> chase new benchmark.

If you want to play games about how to define AGI go ahead. People have been claiming for years that we've already reached AGI and with every improvement they have to bizarrely claim anew that now we've really achieved AGI. But after a few months people realize it still doesn't do what you would expect of an AGI and so you chase some new benchmark ("just one more eval").

The fact is that there really hasn't been the type of world-altering impact that people generally associate with AGI and no reason to expect one.


>It is silly, the logic is the same: "Only a (world-altering) 'AGI' could do [test]" -> test is passed -> no (world-altering) 'AGI' -> conclude that [test] is not a sufficient test for (world-altering) 'AGI' -> chase new benchmark.

Basically nobody today thinks beating a single benchmark and nothing else will make you a General Intelligence. As you've already pointed out, even the maintainers of ARC-AGI do not think this.

>If you want to play games about how to define AGI go ahead.

I'm not playing any games. ENIAC cannot do 99% of the things people use computers to do today and yet barely anybody will tell you it wasn't the first general purpose computer.

On the contrary, it is people who seem to think "General" is a moniker for everything under the sun (and then some) that are playing games with definitions.

>People have been claiming for years that we've already reached AGI and with every improvement they have to bizarrely claim anew that now we've really achieved AGI.

Who are these people? Do you have any examples at all? Genuine question

>But after a few months people realize it still doesn't do what you would expect of an AGI and so you chase some new benchmark ("just one more eval").

What do you expect from 'AGI'? Everybody seems to have different expectations, much of them rooted in science fiction and not reality, so this is a moot point. What exactly is world-altering to you? Genuinely, do you even have anything other than an "I'll know it when I see it"?

If you introduce technology most people adopt, is that world-altering, or are you waiting for Skynet?


> Basically nobody today thinks beating a single benchmark and nothing else will make you a General Intelligence.

People's comments, including in this very thread, seem to suggest otherwise (c.f. comments about "goal post moving"). Are you saying that a widespread belief wasn't that a chess playing computer would require AGI? Or that Go was at some point the new test for AGI? Or the Turing test?

> I'm not playing any games... "General" is a moniker for everything under the sun that are playing games with definitions.

People have a colloquial understanding of AGI whose consequence is a significant change to daily life, not the tortured technical definition that you are using. Again your definition isn't something anyone cares about (except maybe in the legal contract between OpenAI and Microsoft).

> Who are these people? Do you have any examples at all? Genuine question

How about you? I get the impression that you think AGI was achieved some time ago. It's a bit difficult to simultaneously argue both that we achieved AGI in GPT-N and also that GPT-(N+X) is now the real breakthrough AGI while claiming that your definition of AGI is useful.

> What do you expect from 'AGI'?

I think everyone's definition of AGI includes, as a component, significant changes to the world, which probably would be something like rapid GDP growth or unemployment (though you could have either of those without AGI). The fact that you have to argue about what the word "general" technically means is proof that we don't have AGI in a sense that anyone cares about.


>People's comments, including in this very thread, seem to suggest otherwise (c.f. comments about "goal post moving").

But you don't see this kind of discussion on the narrow models/techniques that made strides on this benchmark, do you ?

>People have a colloquial understanding of AGI whose consequence is a significant change to daily life, not the tortured technical definition that you are using

And ChatGPT has represented a significant change to the daily lives of many. It's the fastest-adopted software product in history. In just 2 years, it's become one of the top ten most visited sites on the planet. A lot of people have had the work they do change significantly since its release. This is why I ask: what is world-altering?

>How about you? I get the impression that you think AGI was achieved some time ago.

Sure

>It's a bit difficult to simultaneously argue both that we achieved AGI in GPT-N and also that GPT-(N+X) is now the real breakthrough AGI

I have never claimed GPT-N+X is the "new breakthrough AGI". As far as I'm concerned, we hit AGI some time ago and are making strides in competence and/or enabling even more capabilities.

You can recognize ENIAC as a general purpose computer and also recognize the breakthroughs in computing since then. They're not mutually exclusive.

And personally, I'm more impressed with o3's Frontier Math score than ARC.

>I think everyone's definition of AGI includes, as a component, significant changes to the world

Sure

>which probably would be something like rapid GDP growth or unemployment

There is definitely no broad agreement on what people imagine as "significant change".

Even in science fiction, the existence of general intelligences more competent than today's LLMs is not necessarily a precursor to massive unemployment or GDP growth.

And for a lot of people, the clincher stopping them from calling a machine AGI is not even any of these things. For some, whether it is "sentient" or "cannot lie" is far more important than any spike in unemployment.


> But you don't see this kind of discussion on the narrow models/techniques that made strides on this benchmark, do you ?

I don't understand what you are getting at.

Ultimately there is no axiomatic definition of the term AGI. I don't think the colloquial understanding of the word is what you think it is (i.e. if you had described to people, pre-ChatGPT, today's ChatGPT behavior, including all the limitations and failings and the fact that there was no change in GDP, unemployment, etc., and asked if that was AGI, I seriously doubt they would say yes).

More importantly, I don't think anyone would say their life is much different from a few years ago, while they would separately say that under AGI it would be.

But the point that started all this discussion is the fact that these "evals" are not good proxies for AGI and no one is moving goal-posts even if they realize this fact only after the tests have been beaten. You can foolishly define AGI as beating ARC but the moment ARC is beaten you realize that you don't care about that definition at all. That doesn't change if you make a 10 or 100 benchmark suite.


>I don't understand what you are getting at.

If such discussions are only made when LLMs make strides in the benchmark then it's not just about beating the benchmark but also what kind of system is beating it.

>You can foolishly define AGI as beating ARC but the moment ARC is beaten you realize that you don't care about that definition at all.

If you change your definition of AGI the moment a test is beaten then yes, you are simply moving the goalposts.

If you care about other impacts like "Unemployment" and "GDP rising" but don't give any time or opportunity to see if the model is capable of such then you don't really care about that and are just mindlessly shifting goalposts.

How does such a person know o3 won't cause mass unemployment? The model hasn't even been released yet.


> If such discussions are only made when LLMs make strides in the benchmark then it's not just about beating the benchmark but also what kind of system is beating it.

I still don't understand the point you are making. Nobody is arguing that discrete program search is AGI (and the same counter-arguments would apply if they did).

> If you change your definition of AGI the moment a test is beaten then yes, you are simply moving the goalposts.

I don't think anyone changes their definition, they just erroneously assume that any system that succeeds on the test must do so only because it has general intelligence (that was the argument for chess playing for example). When it turns out that you can pass the test with much narrower capabilities they recognize that it was a bad test (unfortunately they often replace the bad test with another bad test and repeat the error).

> If you care about other impacts like "Unemployment" and "GDP rising" but don't give any time or opportunity to see if the model is capable of such then you don't really care about that and are just mindlessly shifting goalposts.

We are talking about what models are doing now (is AGI here now) not what some imaginary research breakthroughs might accomplish. O3 is not going to materially change GDP or unemployment. (If you are confident otherwise please say how much you are willing to wager on it).


I'm not talking about any imaginary research breakthroughs. I'm talking about today, right now. We have a model unveiled today that seems a large improvement across several benchmarks but hasn't been released yet.

You can be confident all you want, but until the model has actually been given the chance to have (or not have) the effect you think it won't, it's just an assertion that may or may not be entirely wrong.

If you say "this model passed this benchmark I thought would indicate AGI but didn't do this or that so I won't acknowledge it" then I can understand that. I may not agree on what the holdups are but I understand that.

If however you're "this model passed this benchmark I thought would indicate AGI but I don't think it's going to be able to do this or that so it's not AGI" then I'm sorry but that's just nonsense.

My thoughts or bets are irrelevant here.

A few days ago I saw someone seriously comparing a site with nearly 4B visits a month in under 2 years to Bitcoin and VR. People are so up in their bubbles and so assured in their way of thinking they can't see what's right in front of them, never mind predicting future usefulness. I'm just not interested in engaging "I think It won't" arguments when I can just wait and see.

I'm not saying you are one of such people. I just have no interest in such arguments.

My bet? There's no way I would make a bet like that without playing with the model first. Why would I? Why would you?


> I'm not talking about any imaginary research breakthroughs. I'm talking about today, right now.

So was I, explicitly. I said that today we don’t have the large-impact societal changes that people have conventionally associated with the term AGI. I also explicitly said that I don’t believe o3 will change this, and your comments seem to suggest neither do you (you seem to prefer to emphasize that it isn’t literally impossible that o3 will make these transformative changes).

> If however you're "this model passed this benchmark I thought would indicate AGI but I don't think it's going to be able to do this or that so it's not AGI" then I'm sorry but that's just nonsense.

The entire point of the original chess example was to show that in fact it is the correct reaction to repudiate incorrect beliefs about naive litmus tests of AGI-ness. If we did what you are arguing then we should accept AGI having occurred after chess was beaten because a lot of people believed that was the litmus test? Or that we should praise people who stuck to their original beliefs after they were proven wrong instead of correcting them? That’s why I said it was silly at the outset.

> My thoughts or bets are irrelevant here

No, they show you don’t actually believe we have society-transforming AGI today (or will when o3 is released), but you get upset when someone points that out.

> I'm just not interested in engaging "I think It won't" arguments when I can just wait and see.

A lot of life is about making decisions based on predictions about the future, including consequential decisions about societal investment, personal career choices, etc. For many things there isn’t a “wait and see” approach: you are making implicit or explicit decisions even by maintaining the status quo. People who make bad or unsubstantiated arguments are creating a toxic environment in which those decisions are made, leading to personal and public harm. The most important example of this is the decision to dramatically increase energy usage to accommodate AI models, despite impending climate catastrophe, on the blind faith that AI will somehow fix it all (which, by the way, is far from the “wait and see” approach that you are supposedly advocating; this is an active decision).

> My bet? There's no way I would make a bet like that without playing with the model first. Why would I? Why would you?

You can have beliefs based on limited information. People do this all the time. And if you actually revealed that belief it would demonstrate that you don’t actually currently believe o3 is likely to be world transformative


>You can have beliefs based on limited information. People do this all the time. And if you actually revealed that belief it would demonstrate that you don’t actually currently believe o3 is likely to be world transformative

Cool...but i don't want to in this matter.

I think the models we have today are already transformative. I don't know if o3 is capable of causing sci-fi mass unemployment (for white collar work) and wouldn't have anything other than essentially a wild guess till it is released. I don't want to make a wild guess. Having beliefs on limited information is often necessary but it isn't some virtue and in my opinion should be avoided when unnecessary. It is definitely not necessary to make a wild guess about model capabilities that will be released next month.

>The entire point of the original chess example was to show that in fact it is the correct reaction to repudiate incorrect beliefs about naive litmus tests of AGI-ness. If we did what you are arguing then we should accept AGI having occurred after chess was beaten because a lot of people believed that was the litmus test?

Like I said, if you have some other caveats that weren't met then that's fine. But it's hard to take seriously when you don't.


> But you don't see this kind of discussion on the narrow models/techniques that made strides on this benchmark, do you ?

This model was trained to pass this test, it was trained heavily on the example questions, so it was a narrow technique.

We even have proof that it isn't AGI, since it scores horribly on ARC-AGI 2. It overfitted for this test.


>This model was trained to pass this test, it was trained heavily on the example questions, so it was a narrow technique.

You are allowed to train on the train set. That's the entire point of the test.

>We even have proof that it isn't AGI, since it scores horribly on ARC-AGI 2. It overfitted for this test.

ARC 2 does not even exist yet. All we have are "early signs", not that that would be proof of anything. Whether I believe the models are generally intelligent or not doesn't depend on ARC.


> You are allowed to train on the train set. That's the entire point of the test.

Right, but by training on those test cases you are creating a narrow model. The whole point of training questions is to create narrow models, like all the models we did before.


That doesn't make any sense. Training on the train set does not make the model's capabilities narrow. Models are narrow when you can't train them to do anything else even if you wanted to.

You are not narrow for undergoing training and it's honestly kind of ridiculous to think so. Not even the ARC maintainers believe so.


> Training on the train set does not make the model's capabilities narrow

Humans didn't need to see the training set to pass this; the AI needing it means it is narrower than humans, at least on these kinds of tasks.

The system might be more general than previous models, but still not as general as humans, and the G in AGI typically means being as general as humans. We are moving towards more general models, but still not at the level where we call them AGI.


Yes, that's why it is "semi"-private. From the ARC website: "This set is "semi-private" because we can assume that over time, this data will be added to LLM training data and need to be periodically updated."

I presume evaluation on the test set is gated (you have to ask ARC to run it).


You should look up the terms necessary and sufficient.


The real issue is people constantly making up new goalposts to keep their outdated world view somewhat aligned with what we are seeing. But these two things are drifting apart faster and faster. Even I got surprised by how quickly the ARC benchmark was blown out of the water, and I'm pretty bullish on AI.


The ARC maintainers have explicitly said that passing the test was necessary but not sufficient so I don't know where you come up with goal-post moving. (I personally don't like the test; it is more about "intuition" or in-built priors, not reasoning).


Are you like invested in LLM companies or something? You're pushing the agenda hard in this thread.


> No, they didn't. Their methodology didn't adjust for things like highway vs city street miles, although it did adjust for city and state.

Doesn't Waymo only drive in the city? And isn't damage/injury much greater on highways? If so, that pretty much makes the study worthless.


AFAIK they drive on highways for testing, but they don't offer commercial rides on highway routes to customers yet. If that's still the case, I'd expect the majority of their miles to be on city streets. As I already pointed out, that's a big flaw in their methodology.


> Fwiw, I don't agree with Chomsky. Clearly LLMs are extracting structure in language and I think it is obtuse to claim that a system designed for pattern matching won't identify these patterns. One doesn't need reasoning or abstraction to converge to this, one simply needs sufficient sampling and for structure to exist. Clearly structure exists in the language, so we should expect a sufficient pattern matcher to be able to extract these patterns.

Chomsky has never said that LLMs can't extract patterns from language. His point is that humans have trouble processing certain language patterns while LLMs don't, which means that LLMs work differently and therefore can't shed any light on humans.


As I said in another comment, the only relevant synthetic languages that would refute Chomsky's claim are the ones we have human experiments for, specifically those of Moro.

I believe the relevant papers are referenced here on page 4. (Tettamanti et al., 2002; Musso et al., 2003; Moro, 2016)

https://acesin.letras.ufrj.br/wp-content/uploads/2024/02/Mor...


The real problem with the paper is not any of the mathematical details that others have described; it is more fundamental. Chomsky's claim is that humans have a distinctive property: they seem not to be able to process certain synthetic language constructions --- namely linear (non-hierarchical) languages --- as well as synthetic human-like (hierarchical) languages, and they use a different part of the brain to do so. This was shown in experiments (see Moro, Secrets of Words; I think his Nature paper also cites the studies).

Because the synthetic linear languages are computationally/structurally simple, LLMs will, unlike humans, learn them just as easily as real human languages. Since this hierarchical aspect of human language seems fundamental/important, LLMs are therefore not a good model of the human language faculty.

If you want to refute that claim then you would take similar synthetic language constructions to those that were used in the experiments and show that LLMs take longer to learn them.
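
To make the contrast concrete, here is a toy sketch (my own illustration, not Moro's actual stimuli; the "linear" rule is modeled on the fixed-position negation manipulations described in Musso et al. 2003). One could generate corpora under each rule and compare how easily a model learns them:

    # Structure-dependent rule: negation attaches to the main verb, wherever it sits.
    def negate_hierarchical(tokens, verb_index):
        return tokens[:verb_index] + ["not"] + tokens[verb_index:]

    # Linear rule: negation always goes after a fixed word position (here, the third word),
    # ignoring constituent structure entirely.
    def negate_linear(tokens):
        return tokens[:3] + ["not"] + tokens[3:]

    sentence = ["the", "dog", "that", "barks", "chases", "the", "cat"]
    print(negate_hierarchical(sentence, verb_index=4))  # "not" lands before the main verb "chases"
    print(negate_linear(sentence))                      # "not" lands inside the relative clause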

Instead you mostly created an abstraction of the problem that no one cares about: that there exist certain synthetic language constructions that LLMs have difficulty with. But this is both trivial (consider a language that requires you to factor numbers to decode it) and irrelevant (there is no relation to what humans do except in an abstract sense).

The one language that you use that is most similar to the linear languages cited by Moro, "Hop", shows very little difference in performance, directly undermining your claimed refutation of Chomsky.


> Instead you mostly created an abstraction of the problem that no one cares about: that there exist certain synthetic language constructions that LLMs have difficulty with. But this is both trivial (consider a language that requires you to factor numbers to decode it) and irrelevant (there is no relation to what humans do except in an abstract sense).

Thanks for your feedback. I think our manipulations do establish that there are nontrivial inductive biases in Transformer language models and that these inductive biases are aligned with human language in important ways. There's no universal a priori sense in which Moro's linear counting languages are "simple" but our deterministically shuffled languages aren't. It seems that GPT language models do favor real language over the perturbed ones, and this shows that they have a simplicity bias which aligns with human language. This is remarkable, considering that the GPT architecture doesn't look like what one would expect based on existing linguistic theory.

Furthermore, this alignment is interesting even if it isn't perfect. I would be shocked if GPT language models happened to have inductive biases that perfectly match the structure of human language---why would they? But it is still worthwhile to probe what those inductive biases are and to compare them with what humans do. As a comparison, context-free grammars turned out to be an imperfect model of syntax, but the field of syntax benefited a lot from exploring them and their limits. Something similar is happening now with neural language models as models of language learning and processing, a very active research field. So I wouldn't say that neural language models can't shed any light on language simply because they're not a perfect match for a particular aspect of language.

As for using languages more directly based on the Moro experiments, we've discussed this extensively. There are nontrivial challenges in scaling those languages up to the point that you can have a realistic training set, where the control condition is a real language instead of a toy language, without introducing confounds of various kinds. We're open to suggestions. We've had very productive conversations with syntacticians about how to formulate new baselines in future work.

More generally our goal was to get formal linguists more interested in defining the impossible vs. possible language distinction more carefully, to the point that they can be used to test the inductive biases of neural models. It's not as simple as hierarchical vs. linear, since there are purely linear phenomena in syntax such as Closest Conjunct Agreement, and also morphophonological processes can act linearly across constituent boundaries, among other complications.

> The one language that you use that is most similar to the linear languages cited by Moro, "Hop", shows very little difference in performance, directly undermining your claimed refutation of Chomsky.

I wouldn't read much into the magnitude of the difference between NoHop and Hop, because the Hop transformation only affects a small number of sentences, and the perplexity metric is an average over sentences.


> these inductive biases are aligned with human language in important ways.

They aren’t, which is the entire point of this conversation, and simply asserting otherwise isn’t an argument.

> It seems that GPT language models do favor real language over the perturbed ones, and this shows that they have a simplicity bias which aligns with human language. This is remarkable, considering that the GPT architecture doesn't look like what one would expect based on existing linguistic theory.

This is a nonsensical argument: consider if you had studied a made-up language that required you to factor numbers or do something else inherently computationally expensive. LLMs would show a simplicity bias “just like humans”, but it's obvious this doesn't tell you anything, and specifically doesn't tell you that LLMs are like humans in any useful sense.

> There's no universal a priori sense in which Moro's linear counting languages are "simple" but our deterministically shuffled languages aren't.

You are missing the point, which is that humans cannot as easily learn Moro languages while LLMs can. Therefore LLMs are different in a fundamental way from humans. This difference is so fundamental that you need to give strong, specific, explicit justification why LLMs are useful in explaining humans. The only reason I used the word “simple” is to argue that LLMs would be able to learn it easily (without even having to run an experiment) but the same would be true if LLMs learned a non-simple language that humans couldn’t.

Again, it doesn’t matter if you find all the ways that humans and LLMs are the same --- for example, that they both struggle with shuffled sentences or with a language that involves factoring numbers --- what matters is that there exists a fundamental difference between them, exemplified by the Moro languages.

> But it is still worthwhile to probe what those inductive biases are and to compare them with what humans do.

Why? There is no reason to believe you will learn anything from it. This is a bizarre abstract argument that doing something is useful because you might learn something from it. You can say that about anything you do. There is a video on YouTube where Chomsky engages with someone making similar arguments about chess computers. Chomsky said that there wasn’t any self-evident reason why studying chess-playing computers would tell you anything about humans. He was correct; we never did learn anything significant about humans from chess computers.

> As a comparison, context-free grammars turned out to be an imperfect model of syntax, but the field of syntax benefited a lot from exploring them and their limits.

There is a difference between pursuing a reasonable line of inquiry and having it fail versus pursuing one that you know or ought to know is flawed. If someone had pointed out the problems with CFG at the outset it would have been foolish to pursue it, just as it is foolish to ignore the Moro problem now.

> There are nontrivial challenges in scaling those languages up to the point that you can have a realistic training set

I can’t imagine what those challenges are; I don’t remember the details, but I believe Moro made systematic, simple grammar changes. Your Hop is in the same vein.

> where the control condition is a real language

Why does the control need to be a real language? Moro did not use a real language control on humans. (Edit: Because you want to use pre-trained models?).

> More generally our goal was to get formal linguists more interested in defining the impossible vs. possible language distinction more carefully

Again you’ve invented an abstract problem to study that has no bearing on the problem that Chomsky has described. Moro showed that humans struggle with certain synthetic grammar constructions. Chomsky noted that LLMs do not have this important feature. You are now trying to take this concrete observation about humans and turning it into the abstract field of the study of “impossible languages”.

> It's not as simple as hierarchical vs. linear

There are different aspects of language but there is a characteristic feature missing from LLMs which makes them unsuitable as models for human language. It doesn’t make any sense for a linguist to care about LLMs unless you provide justification for why they would learn anything about the human language faculty from LLMs despite that fundamental difference.

> I wouldn't read much into the magnitude of the difference between NoHop and Hop, because the Hop transformation only affects a small number of sentences, and the perplexity metric is an average over sentences

Even if this were true we return to “no evidence” rather than “evidence against”. But it is very unlikely that Moro-languages are any more difficult for LLMs to learn because, as I said earlier, they are very computationally simple, simpler than hierarchical languages.


Why don't you read the article? If you did, you would see things like:

"The door to the plant’s giant casting furnace.. wouldn’t shut, spewing toxins into the air and raising temperatures for workers on the floor to as high as 100 degrees. Hazardous wastewater from production—containing paint, oil and other chemicals—was also flowing untreated into the city’s sewer, in violation of state guidelines... dumped toxic pollutants into the environment near Austin for months."

"Tesla violated air-pollution permits at its Fremont factory 112 times over the past five years and alleged it repeatedly failed to fix equipment designed to reduce emissions, releasing thousands of pounds of toxic chemicals in excess of permissible limits into the surrounding communities."

'One environmental-compliance staffer in the Austin plant claimed that “Tesla repeatedly asked me to lie to the government so that they could operate without paying for proper environmental controls,” '

"Austin Water regulators notified Tesla that it had violated its permit with the city when it discharged to the sewer system more than 9,000 gallons of wastewater that wasn’t properly treated for pH,... TCEQ notified Tesla of five violations, including exceeding its permitted emissions limit for certain air pollutants and not disclosing deviations."

"Tesla employees employed an “elaborate ruse” to hide the issues, adjusting the amount of fuel going into the furnace and temporarily closing the door ... These actions allowed Tesla to pass the important emissions test, according to the memo."

"The pond was filled with toxins, including sulfuric and nitric acids, and the algae-colored water had begun to smell of rotten eggs, former employees said. At one point, employees found a dead deer in the water, they said. For a time, Tesla discharged untreated pond water directly into the sewer system"

"Sometimes during rainstorms, Tesla discharged a sludgy mix of mud and chemicals from occasional spills outside the plant,"

"the company released 259,000 gallons of caustic water into the Austin sewer system"

"Environmental staff notified Austin Water, but one member refused to comply with a request from Tesla managers to lobby the regulator not to consider the violation as a “significant non-compliance,”.. Tesla fired the staffer for “pushing back on their requests” according to the memo."


This is obviously false: consider a (cryptographic) pseudorandom number generator.


Trivial, m is not invertible in that case. By contrast, measuring devices need to be invertible within some domain, otherwise they're not actually measuring, and we wouldn't use them.


You defined "m" as the measuring function which is not the pseudo-random number generator itself. I guess I don't understand your definitions.

In any case it's pretty obvious you can have deterministic chaotic output from which you cannot practically (or even in theory) recover the internal workings of the system that generated it. Take just a regular pseudorandom number generator or a cellular automaton.


> In any case it's pretty obvious you can have deterministic chaotic output from which you cannot practically (or even in theory)

Solomonoff induction says otherwise. Of course it might take a stupendously large number of samples, but as the number of samples goes to infinity, the probability of reproducing the PRNG goes to 1.

