
Each cell of the 2D grid (or "neuron") is connected to (and updated as a function of) its immediate neighbours, which makes it a network/graph.
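For instance, a minimal update step in this style (a sketch assuming a binary grid with wrap-around boundaries, using a Game-of-Life rule purely as an example):

    import numpy as np

    # Sketch: each cell updates as a function of its immediate neighbours,
    # computed here via shifted copies of the grid (toroidal wrap-around).
    grid = np.random.default_rng(0).integers(0, 2, size=(32, 32))

    neighbour_sum = sum(
        np.roll(np.roll(grid, dy, axis=0), dx, axis=1)
        for dy in (-1, 0, 1) for dx in (-1, 0, 1)
        if (dy, dx) != (0, 0)
    )
    # Example rule (Game of Life): alive next step iff 3 neighbours,
    # or 2 neighbours and already alive.
    grid = ((neighbour_sum == 3) | ((grid == 1) & (neighbour_sum == 2))).astype(int)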


> For one, Bayesian inference and UQ fundamentally depends on the choice of the prior, but this is rarely discussed in the Bayesian NN literature and practice, and is further compounded by how fundamentally hard to interpret and choose these priors are (what is the intuition behind a NN's parameters?).

I agree that, computationally, it is hard to justify the use of Bayesian methods on large-scale neural networks when stochastic gradient descent (and friends) is so damn efficient and effective.

On the other hand, the fact that there's a dependence on (subjective) priors is hardly a fair critique: non-Bayesian training of neural networks also depends on the use of (subjective) loss functions with (subjective) regularization terms (in fact, it can be shown that, mathematically, the use of priors is precisely equivalent to adding regularization to a loss function). Non-Bayesian training of neural networks is not "a failed approach" just because someone can arbitrarily choose L1 regularization (i.e., a Laplacian prior) over L2 regularization (i.e., a Gaussian prior).
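To make that equivalence concrete, here is the standard MAP identity (a sketch, written in LaTeX, using a Gaussian prior for the L2 case):

    \hat{\theta}_{\mathrm{MAP}}
      = \arg\max_{\theta} \big[ \log p(D \mid \theta) + \log p(\theta) \big]

    % with a Gaussian prior p(\theta) = \mathcal{N}(0, \sigma^2 I),
    % \log p(\theta) = -\|\theta\|_2^2 / (2\sigma^2) + \mathrm{const}, so:
    \hat{\theta}_{\mathrm{MAP}}
      = \arg\min_{\theta} \big[ -\log p(D \mid \theta) + \lambda \|\theta\|_2^2 \big],
      \qquad \lambda = \tfrac{1}{2\sigma^2}

i.e., negative log-likelihood plus L2 regularization; swapping in a Laplace prior yields an L1 term instead.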

Furthermore, we do have some intuition over NN parameters (particularly when inputs and outputs are properly scaled): a value of 10^15 should be less likely than a value of 0. Note that, in Bayesian practice, people often use weakly-informative priors (see, e.g., http://www.stat.columbia.edu/~gelman/presentations/weakprior...) to encode such intuitive statements while ensuring that (for all practical purposes) the data will effectively overwhelm the prior (again, this is equivalent to adding a minimal amount of regularization to a loss function, to make a problem well-posed when e.g. you have more parameters than data points).
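As a toy illustration of "the data effectively overwhelms a weakly-informative prior" (a minimal sketch, assuming a conjugate normal-normal model with known noise scale; all numbers are made up):

    import numpy as np

    # Sketch: a wide Normal(0, 10^2) prior on a mean parameter is quickly
    # overwhelmed by data (conjugate normal-normal model, known noise sd).
    rng = np.random.default_rng(0)
    true_mean, noise_sd = 2.5, 1.0
    prior_mean, prior_sd = 0.0, 10.0   # weakly informative: wide, centred at 0

    for n in (10, 100, 10_000):
        x = rng.normal(true_mean, noise_sd, size=n)
        post_prec = 1 / prior_sd**2 + n / noise_sd**2
        post_mean = (prior_mean / prior_sd**2 + x.sum() / noise_sd**2) / post_prec
        print(n, round(post_mean, 3))   # approaches the sample mean (~2.5)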


Non-Bayesian NN training does indeed use regularizers that are chosen subjectively, but they are then tested in validation, and the best-performing regularizer is chosen. Thus the choice is empirical, not subjective.

A Bayesian could try the same thing: try out several priors, and pick the one that performs best in validation. But if you pick your prior based on the data, then the classic theory about “principled quantification of uncertainty” doesn’t apply any more. So you’re left using a computationally unwieldy procedure that doesn’t offer theoretical guarantees.


You can, in fact, do that. It's called (aptly enough) the empirical Bayes method. [1]

[1] https://en.wikipedia.org/wiki/Empirical_Bayes_method


Empirical Bayes is exactly what I was getting at. It's a pragmatic modelling choice, but it loses the theoretical guarantees about uncertainty quantification that pure Bayesianism gives us.

(Though if you have a reference for why empirical Bayes does give theoretical guarantees, I'll be happy to change my mind!)


> Non-Bayesian NN training does indeed use regularizers that are chosen subjectively, but they are then tested in validation, and the best-performing regularizer is chosen. Thus the choice is empirical, not subjective.

I'd argue the choice is still subjective, since you are only testing over a limited (subjective) set of options. And if you are doing this properly (i.e., using an independent validation set), then you can apply the same approach to a Bayesian method and obtain the same type of information ("when I use prior A vs. prior B, how does that change the generalization/out-of-sample error properties of my model?"), without violating any properties or theoretical guarantees of "Bayesianism". See the sketch below.
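In code, what I mean is nothing more exotic than ordinary held-out validation over a small, pre-declared set of priors (a sketch; `fit_posterior`, `val_log_lik`, `train_data` and `val_data` are hypothetical stand-ins for whatever inference machinery and data split you use):

    # Sketch: choosing among pre-declared priors on a leakage-free
    # validation split. `fit_posterior` and `val_log_lik` are hypothetical
    # placeholders; `train_data` / `val_data` are disjoint splits.
    candidate_priors = {
        "gaussian_wide":   {"family": "gaussian", "scale": 10.0},
        "gaussian_narrow": {"family": "gaussian", "scale": 1.0},
        "laplace":         {"family": "laplace",  "scale": 1.0},
    }

    scores = {}
    for name, prior in candidate_priors.items():
        posterior = fit_posterior(train_data, prior)     # training data only
        scores[name] = val_log_lik(posterior, val_data)  # held-out evaluation

    best_prior = max(scores, key=scores.get)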

> A Bayesian could try the same thing: try out several priors, and pick the one that performs best in validation. But if you pick your prior based on the data, then the classic theory about “principled quantification of uncertainty” doesn’t apply any more.

If you subjectively define a set of possible priors (i.e., distributions and parameters) to test in a validation setting, then you are not picking your prior based on the training data (again, assuming that you have set up a leakage-free partition of your data into training and validation sets), and you are not doing empirical Bayes. So you are not violating any supposed "principled quantification of uncertainty" (if you believe that applying a standard subjective Bayesian approach provides you with "principled quantification of uncertainty" in the first place).

My point was that, in practice, there are ways of choosing (subjective) priors such that they provide sufficient regularization while ensuring that their impact on the results is minimized, particularly when you can assume certain things about the scale of the data (and, in the context of neural networks, you often can, due to things like "normalization layers" and prior scaling of inputs and outputs). "Subjective" doesn't have to mean "arbitrary".

> So you’re left using a computationally unwieldy procedure that doesn’t offer theoretical guarantees.

I won't argue with the fact that training NNs using Bayesian approaches is computationally unwieldy. I just don't see how evaluating a modelling decision (be it in Bayesian or non-Bayesian modelling) using a proper validation process would violate any specific theoretical guarantees.

If you can explain to me how evaluating the generalization properties of a Bayesian training recipe on an independent dataset violates any specific theoretical guarantees, I would be thankful (note: as far as I am concerned, "principled quantification of uncertainty" is not a specific theoretical guarantee).


It shouldn't be too much extra state. I assume that 4 bits should be enough to cover castling rights (kingside and queenside for each player), whatever is necessary to store the last 3 moves should cover legal en passant captures and threefold repetition, and 12 bits to store two non-overflowing 6-bit counters (time since last capture, and time since last pawn move) should cover the 50-move rule.

So... unless I'm misunderstanding something, something like "the three last moves plus 19 bits of state" (plus the current board state) should be enough to treat chess as a memoryless process. Doesn't seem like too much to track.


Threefold repetition does not require the three positions to occur consecutively. So you could conceivably have a position occur for the first time on the 1st move, a second time on the 25th move, and a third time on the 50th move of a sequence, and then the players could claim a draw by threefold repetition or the 50-move rule at the same time!

This means you do need to store the last 50 board positions in the worst case. Normally you need to store fewer, because many moves are irreversible (pawns cannot go backwards, pieces cannot be un-captured).
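Putting the two comments together, the full Markov state ends up looking something like this (an illustrative sketch, not an engine-grade encoding; field choices follow the discussion above):

    from dataclasses import dataclass, field
    from typing import List, Optional

    # Sketch of the state needed to treat chess as memoryless,
    # per the discussion above (field choices are illustrative).
    @dataclass
    class ChessState:
        board: bytes                    # piece placement (e.g., 64 squares)
        white_to_move: bool
        castling_rights: int            # 4 bits: king/queen-side per player
        en_passant_file: Optional[int]  # set only after a two-square pawn move
        halfmove_clock: int             # plies since last capture or pawn move
        # Positions since the last irreversible move (capture, pawn move,
        # loss of castling rights): threefold repetition does not require
        # the repeats to be consecutive.
        history: List[bytes] = field(default_factory=list)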


Ah... gotcha. Thanks for the clarification.


> It's not so much that as there is the net-negative side effect of having people redo the same work over and over again at their day job or whatever. It horrifies me when I think about the number of hours of people's lives wasted on things like that.

Given that this issue applies to all proprietary/non-open-source software, which is the overwhelming majority of software out there, you can hardly blame the GPL3 for this fact of life.


> R's magrittr

These days, base R already includes a native pipe operator (and it is literally `|>`, rather than magrittr's `%>%`).


> "an effect this large, or larger, should happen by chance 1 time out of 20"

More like "an effect this large, or larger, should happen by chance 1 time out of 20 in the hypothetical universe where we already know that the true size of the effect is zero".

Part of the problem with p-values is that most people can't even parse what they mean (not saying it's your case). P-values are never a statement about probabilities in the real world, but always a statement about probabilities in a hypothetical world where all effects are assumed to be zero.
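In symbols: for a test statistic T with observed value t_obs,

    p = \Pr\left( T \ge t_{\mathrm{obs}} \;\middle|\; H_0 \right)

where the probability is computed entirely under the null hypothesis H_0 ("the true effect is zero"), not under the real-world data-generating process.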

"Effect sizes", on the other hand, are more directly meaningful and more likely to be correctly interpreted by people on general, particularly if they have the relevant domain knowledge.

(Otherwise, I 100% agree with the rest of your comment.)


> You know that in your research field p < 0.01 has importance.

A p-value does not measure "importance" (or relevance), and its meaning is not dependent on the research field or domain knowledge: it mostly just depends on the effect size and the number of replicates (and, in this case, due to the need to apply multiple-comparison correction for effective FDR control, on the number of things you are testing).

If you take any fixed effect size (no matter how small/non-important or large/important, as long as it is nonzero), you can make the p-value be arbitrarily small by just taking a sufficiently high number of samples (i.e., replicates). Thus, the p-value does not measure effect importance, it (roughly) measures whether you have enough information to be able to confidently claim that the effect is not exactly zero.

Example: you have a drug that reduces people's body weight by 0.00001% (clearly an irrelevant/non-important effect, according to my domain knowledge of "people's expectations when they take a weight-loss drug"); still, if you collect enough samples (i.e., weigh enough people who took the drug and people who took a placebo, before and after), you can get a p-value as low as you want (0.05, 0.01, 0.001, etc.), mathematically speaking. Thus, the p-value clearly can't be measuring the importance of the effect, if you can make it arbitrarily low by just taking more measurements (at a fixed effect size/importance).
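A quick back-of-the-envelope version of that example (a sketch using a two-sample z-test; all numbers are made-up assumptions):

    import math
    from scipy import stats

    # Sketch: for a fixed, tiny effect, the p-value can be driven
    # arbitrarily low by increasing n (all numbers illustrative).
    effect = 0.001   # true mean difference (e.g., kg lost on the drug)
    sd = 10.0        # per-group standard deviation

    for n in (1e4, 1e8, 1e12):           # samples per group
        se = sd * math.sqrt(2 / n)       # standard error of the difference
        z = effect / se                  # expected z statistic
        p = 2 * stats.norm.sf(z)         # two-sided p-value
        print(f"n={n:.0e}  z={z:.2f}  p={p:.3g}")
    # p goes from ~0.99 down to (effectively) zero as n grows.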

What is dependent on the research field (or domain knowledge) is the "relevance" of the effect (i.e., the effect size), which is what people should be focusing on anyway ("how big is the effect, and how certain am I about its scale?"), rather than p-values (a statement about a hypothetical universe in which we assume the null to be true).


I get that in general. I was replying to the person talking about bioinformatics and p-values as a filter.


> It seems to me like KANs should be able to find expressions like these given experimental data.

Perhaps, but this is not something unique to KANs: any symbolic regression method can (at least in theory) find such simple expressions. Here is an example of this type of work (using non-KAN neural networks): https://www.science.org/doi/10.1126/sciadv.aay2631

Rephrasing: just because you can reach simple expressions with symbolic regression methods based on neural networks (or KANs) does not necessarily imply that neural networks (or KANs) are inherently interpretable (particularly once you start stacking multiple layers).
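As a toy illustration of what "finding a simple expression from data" means in the symbolic-regression setting (a deliberately minimal sketch over a hand-picked hypothesis space; real methods search vastly larger expression spaces):

    import numpy as np

    # Minimal brute-force "symbolic regression" sketch: recover a simple
    # expression from data by scoring a tiny candidate set by MSE.
    rng = np.random.default_rng(0)
    x = rng.uniform(0.1, 3.0, size=200)
    y = np.sin(x) / x                      # hidden ground-truth expression

    candidates = {
        "sin(x)/x": np.sin(x) / x,
        "cos(x)":   np.cos(x),
        "1/x":      1 / x,
        "exp(-x)":  np.exp(-x),
    }
    best = min(candidates, key=lambda k: np.mean((candidates[k] - y) ** 2))
    print(best)                            # -> "sin(x)/x"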


The lack of semantics associated with DC (and near-DC) components in audio data is important, and a big difference compared to image data, no doubt.

I'm not sure this changes if you look at a cepstral representation (as suggested in the article). In this case, the DC component represents the (white) noise level in the raw audio space (i.e., the spectrum averaged over all frequencies), so it doesn't have strong semantics either (other than... "how noisy is the waveform?").
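A quick numerical check of the "spectrum averaged over all frequencies" interpretation (a sketch using the real cepstrum of white noise):

    import numpy as np

    # Sketch: the zeroth (quefrency-0 / "DC") cepstral coefficient is just
    # the mean of the log-magnitude spectrum, i.e. the overall log level.
    rng = np.random.default_rng(0)
    x = rng.normal(size=4096)                        # white-noise waveform

    log_spectrum = np.log(np.abs(np.fft.fft(x)) + 1e-12)
    cepstrum = np.fft.ifft(log_spectrum).real

    print(np.isclose(cepstrum[0], log_spectrum.mean()))   # True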


> If he did that, it would be just as deplorable [...]

No need for hypotheticals. He did do that (this is an easily-verifiable fact [1][2][3]).

> [...] the censorship they were claiming to oppose.

The thing is that this is clearly an empty claim, when Musk has no problem either complying with similar censorship orders from right-wing governments (Modi, Erdogan) or arbitrarily censoring people for using medically-approved terms (like "cis" or "cisgender") that he simply does not like [4][5].

All of this censorship by Twitter is (legally) 100% within their right to do, as a private entity, but then whatever claims he (or Twitter) has of being a "defender of free speech" ring a bit hollow.

Given these things, the more plausible explanation for Musk's actions is not that he wants to defend free speech (or that he is fundamentally against censorship), but simply that the request comes from a (left-wing) government that is not ideologically aligned with his views.

It's a choice. But choices have consequences.

[1] https://slate.com/technology/2023/05/elon-musk-turkey-twitte...

[2] https://www1.folha.uol.com.br/internacional/en/world/2024/04...

[3] https://theintercept.com/2023/03/28/twitter-modi-india-punja...

[4] https://www.advocate.com/news/cisgender-restriction-x-twitte...

[5] https://nitter.poast.org/elonmusk/status/1719077000483066319...


Regardless of Musk's motivations, would you rather he had complied with this latest request? In other words, would you rather there's more censorship in the world?


The world is not black and white... there are shades of grey. Sometimes censorship is lawful and/or justified, sometimes it is not.

I don't know if, in this case, it is justified or not, but it seems to be lawful (the same way that the censorship requests in India and Turkey were), as far as I can tell (I assume a judge of the Supreme Court knows a bit more about Brazilian law than you and me).

Given that Musk/Twitter seemingly has no problem complying with lawful censorship demands (or engaging in arbitrary censorship even without lawful censorship demands), it seems clear to me that Musk has no problem with "more censorship in the world". That was my only point.

My personal opinion on whether there is higher or lower need for censorship in the world is rather irrelevant (since I have no power or platforms to censor), but I certainly see no problem in actively censoring terrorists, bots, spammers and scammers (for example).


It's not irrelevant to me, which is why I'm asking. I'm asking if you would have preferred Musk to be consistent and ban those accounts instead. And if so, why? Do you agree with censorship if and only if it's legal (whatever that means in a particular jurisdiction)? Or is there some other reason?


As I mentioned, I agree with censorship when it is legitimate (ethically or morally justified), and I agree with the need for rule-of-law. It is not me that is arguing that censorship is ok when it is legal (and not ok otherwise), but Twitter/Musk.

In this particular case, I do not have enough information to state with certainty whether I think this particular case is legitimate or not, but it does seem to be lawful (which is the criterion that is seemingly important for Twitter/Musk).

I have no particular preference with regards to whether Musk chooses to be consistent or not: that's his decision, and he/Twitter is the one that has to endure the consequences of his actions (not me). Since I am not a Twitter user, it does not affect me either way, and I don't see how it will significantly affect Brazilians' capacity to freely communicate (note: there are plenty of other private communication platforms that do comply with Brazilian law... Telegram, WhatsApp, Instagram, Facebook, etc.).

On the other hand, I do think it is hypocritical to claim to be a "defender of free speech", and then both engage in non-state-mandated censorship AND comply with state-mandated censorship (as long as it suits him or Twitter). It's a laughable claim. That was my only point.

