More

njohnson41 · on May 29, 2017

Even with the birthday problem, sqrt(52!) is still about 10^34, which is still huge. It's unlikely that any two (sufficiently random) shuffles in history have been the same.

taneq · on May 30, 2017

> (sufficiently random)

Isn't that a true Scotsman?

Edit: I guess the GP post uses the same criterion, though.

benchaney · on May 30, 2017

No, that isn't what the No True Scottsman Fallacy is. Putting qualifiers on statements is completely valid. A No True Scottsman Fallacy is when you use qualifiers in a hand wavy way to move the goal posts.

taneq · on May 30, 2017

What makes it a No True Scotsman is the term 'sufficently random', which kind of makes it unfalsifiable. After all, if any two shuffles are the same, then (by this logic) they're not sufficiently random.

joshuamorton · on May 30, 2017

Except its unlikely that "sufficiently random" refers to "do not equal each other" in this context. Its likely that "sufficiently random" means "passes one of a variety of proofs of being statistically random (for example: nlogn repetitions of fisher-yates)". That implies randomness, but doesn't imply "doesn't equal any other deck of cards" except with high probability.

benchaney · on May 30, 2017

That doesn't make it unfalsifiable at all. Sufficiently random is not defined as "not the same". you also clearly misunderstood the logic of the post in question. The poster started with a clear definition of random and used that to determine if shuffles are likely repeated. If there were fewer cards in a deck, that post would have reached a different conclusion.

njohnson41 · on March 24, 2017

Or maybe it's just a good excuse to carry fewer cups and pitchers around?

njohnson41 · on March 9, 2017

Still, in any consistent way of assigning probabilities to events, if A implies B, then P(A) <= P(B).

Neural network outputs are not probabilities. I think that's the main lesson here.

thomasdullien · on March 9, 2017

Given that the last layer of a NN is a logistic regression, they are in fact well-calibrated probabilities under the assumption disjoint classes.

The issue at hand is training them on overlapping classes :-)

I will shut up now, sorry for nitpicking.

njohnson41 · on March 13, 2017

You're right (and nitpicking nitpicks seems appropriate to me :P).

But, I'm pretty sure the assumptions of logistic regression are even stronger than just that. The inputs are assumed to be independent given the output class, and the log odds of the output vary as a linear function of each input. The first one is essentially the naive Bayes assumption, and the second one is completely unreasonable for almost any problem ever (roughly equivalent to assuming every dataset has a multivariate normal distribution). If they are both correct, though, you get a perfectly good Bayesian posterior probability of each output class.

I think the lesson is that gradient descent will build a decent function approximation out of pretty much anything powerful enough, which is why neural networks still work even when probability theory has been thrown completely out the window.

njohnson41 · on Jan 19, 2017

I personally liked "Godel's Proof" by Ernest Nagel. It's pretty old, but it's to-the-point and inexpensive.

njohnson41 · on July 12, 2016

Especially if you assume that the scientific method is always valid--that all true statements can be determined empirically--you can't make statements like "Ultimate Meaning doesn't/does exist". If Ultimate Meaning cannot be defined, it cannot be tested for or measured, so no statements about it can be true or false. The answer to meaning would not be false, but null (or maybe 42).

Just like 42, "atoms and the void" here is just a science-flavored attempt to answer a non-question.

squidlogic · on July 12, 2016

Positivism has fallen out of favor precisely because it cannot withstand its own criteria for meaning.

For example, the statement that "all true statements can be determined empirically" is not itself verifiable by empirical means.

njohnson41 · on July 12, 2016

Fair enough; I'll be a little more precise.

I think the correct way to state it is: "all true statements are either tautologies, or can be determined empirically (with arbitrarily high but not necessarily measure-1 probability)". This statement is itself a tautology, because tautologies are by definition true, and because "determined" implies some method of determination, which if it actually can be used to determine truth, means it can be used empricially. Determination and empiricism are secretly defined in terms of each other, basically. The reason this tautology is worth stating is that it gives a simple criterion for discarding non-questions: questions that have no method of determining whether they are true or false (with arbitrarily high probability), are always non-questions.

titzer · on July 12, 2016

Gödel's incompleteness theorem already rules this out for mathematics, so why should we believe it about the universe?

grenoire · on July 12, 2016

What if you do do the determination of all true statements empirically? Is that not the empirical way to determine the truth of that all-inclusive statement?

dragonwriter · on July 13, 2016

> What if you do do the determination of all true statements empirically?

How would you demonstrate that you have done this? Particularly, how would you demonstrate that there is no true statement which you have not empirically demonstrated?

rimantas · on July 12, 2016

But if it cannot be measured it by definition cannot have any effect on reality so can be safely ignored.

Natsu · on July 12, 2016

I assume you mean that in the absolute sense, but one also has to realize that "cannot be measured" can also be a measure of our ability to measure things. So you certainly can have real effects that we're simply not able to measure.

And so in that sense, I'm not sure the premise follows, as some of the things we cannot measure are things like the conditions of the early universe, which certainly did effect[1] us.

[1] It affects us too, but I chose that word deliberately

arthur_pryor · on July 12, 2016

sorry to ruin the joke if that's what it is, but this is sarcasm, right? (my point being: our instrumentation constantly gets better, and there are things that we can't now measure/might never be able to measure which have a material impact on things we can measure)

njohnson41 · on July 7, 2016

> ... all possible combinations of 8-16 characters with 100 character possibilities ...

Yes, but this doesn't even come close to describing the typical users' password, which is most likely a 6-letter English word with a capital letter and a 1! appended to the end. Your calculation here isn't really relevant, because it's all about the worst or common case. (You also assume that people are using a GPU for a compute-bound problem, when much faster FPGAs are also available, but either way it's moot.)

Security through obscurity, which is what you're proposing with the shuffled salt idea, is also not normally considered the right way to go. If you wanted to use a similar but much simpler and straightforward method, you could just encrypt the salted hashes before storing them in the database.

brokenwren · on July 7, 2016

Simple passwords are mitigated by salting and slat modification. The guide covers both of these and hashing. This should be sufficient even against thousands of GPUs.

aj-code · on July 7, 2016

In short, it's not. A robust scheme should be secure if the attacker knows the algorithm or not. The secret should not be the algorithm, the secret should be a sufficiently long secret value.

njohnson41 · on June 6, 2016

This should be marked (2004), not (2014).

njohnson41 · on May 29, 2016

There could also be a simpler causal link: books and lucrative jobs could both caused by higher family socio-economic status.

qb45 · on May 29, 2016

Or higher social status being the result of having books and a good job.

dragonwriter · on May 31, 2016

> There could also be a simpler causal link: books and lucrative jobs could both caused by higher family socio-economic status.

ISTR in the last few years seeing similar research that controlled for most typical measures of SES that are associated with positive outcomes (parents income, parents educational attainment, etc.) which found that growing up with more books in the home had a positive impact on a number of outcome measures (leveling off once at about 100 books.)

I suspect that there is some degree in which it is serving as a proxy for as-yet-unidentified socio-economic, genetic, and parenting style and parental personality traits, but I also suspect even once all those were identified and taken into account, simply having books handy has some positive effects.

robotresearcher · on May 29, 2016

I'm pretty sure the study controls for that. It's too obvious an important factor. They probably control for parental education level too.

njohnson41 · on May 18, 2016

That's because this result is not about combining weak deterministic PRNGs, it's about combining entropy sources (like two hardware random number generators).

This has always been possible, but it sounds like they've lowered the minimum entropy needed in the source streams to produce a high-quality output.

electrograv · on May 18, 2016

Thanks for the explanation, that makes sense. I think this quote threw me off:

> The academics’ latest work hurdles those restrictions allowing the use of sequences that are only weakly random

What does "weakly random" mean, if not a PRNG? Just low pure entropy per bit of sequence data? What's the threshold then between strong random and weak random -- wouldn't it be a continuum of entropy?

Minor nitpick: Also, how can a deterministic PRNG have less entropy (0) than that of its seed?

njohnson41 · on May 18, 2016

Right, the amount of entropy per bit of sequence is always between 0 (deterministic) and 1 (every bit is independent and 50/50) (... or between 0 and log2(k) in general if the element varies over a set of k things). These "weak" sources just have low entropy per bit. They could be biased (more 0s than 1s) or correlated (long runs of 0s/1s or periodicity), or just have some other pattern that sometimes holds.

A deterministic PRNG's sequence has exactly the entropy of it's seed, actually, but it has 0 bits of entropy per symbol, because its sequence is infinite.

The thing most people get confused about with entropy is in thinking that entropy is a property of some single object, like a bit string. Really, entropy is always a measurement about a probability distribution, just like mean or variance is. In the usual case with random streams, the distribution is P(x_i | x_i-1 ... x_0) for bits x_i in the stream, i.e. the distribution remaining for the current bit even if we know all previous bits. For a deterministic PRNG, once we can extract the key from the history (given unlimited compute power) that distribution becomes deterministic, so the entropy is 0.

dietrichepp · on May 18, 2016

The entropy of a single object is a meaningful concept. It is usually called Kolmogorov complexity.

njohnson41 · on May 18, 2016

Kolmogorov complexity is definitely meaningful, but it's not (Shannon) entropy, just conceptually similar. Many people think of something like Kolmogorov-complex sequences when they think of "random" sequences, which is (IMO) why they have trouble thinking of entropy as being about a probability distribution.

The one case where they coincide (sort of) is if you believe your random sequence is generated by a randomly chosen Turing machine, which I've only really seen in philosophical settings.

A uniformly chosen 64-bit integer still has exactly 64 bits of entropy, regardless of how much Kolmogorov complexity the actual bits you generate have.

x0x0 · on May 18, 2016

I'm not sure if I know the precise definition, but I think I remember how it's used.

weakly random:

let X a random variable over {0,1}^n. That is, X is a distribution over bit sequences of length n. Distribution means I assign each of the 2^n bit sequences a nonnegative probability that sums to 1.

If X were uniform (true random), then each sequence in the 2^n such possible sequences would have probability 2^-n. Unfortunately, we have no such sequence generators, or life would be easy.

Let the min entropy of X be the largest natural k such that P[X=x] <= 2^-k for every x. ie roughly how likely are such sequences to take their most likely value. Or how biased is our RV.

eg: uniform random over sequences of length n would have min-entropy n.

Again, the problem is we have no uniform random sequences available to us. Instead, we have biased sequences. So there is a long history of asking how can we take a biased random sequence and turn it into a random sequence. You do this with an extractor.

For example, suppose we had a sequence which produced independent bits 1 with probability p and 0 with probability q=1-p. However, we don't know p. What we could do (an extractor!) is use two sequence bits to produce one output bit, as so: sequence 0,1 has probability p x q, and sequence 1,0 has probability q x p. Map 0,1 to 0 and 1,0 to 1, and ignore 0,0 and 1,1. Now we can produce 0 with probability 1/2 and 1 with probability 1/2, ie we have uniform random from an input sequence of unknown bias (but perfectly uncorrelated, which also doesn't exist in nature. Life is hard.)

We measure the quality of extractors by calculating how bad (lower entropy) a sequence they can take and how close (statistical distance, which has a technical definition) they can output a distribution close to uniform.

Anyway, I got distracted, but in general weakly random means distributions or sequences which are biased. The less biased they are the more they're called "almost random." In general, you want to build extractors that run on sequences with unknown bias.

hth

Sophistifunk · on May 18, 2016

> how can a deterministic PRNG have less entropy (0) than that of its seed

If there's any way two seeds can eventually lead to identical internal states would be one example.

hinkley · on May 18, 2016

Let's say I have a machine that runs one cron job. It looks for work, writes a few files if it finds some. Over a three day holiday weekend, no work is queued so the machine hardly does anything. And it's on a vlan so there's little to no broadcast traffic hitting it.

Tuesday morning I ssh into the box. How much high quality entropy do you think I got from the nearly deterministic network and disc traffic on this machine? Maybe a few bits.

What I'm hoping comes of this is that we can find a better set of inputs for several/random, and that we can use more conservative estimates of entropy from the sources that are already used. Also I hope we find a way to treat intel's hardware RNG as a low quality input (currently it's a no quality input.)

njohnson41 · on April 1, 2016

Interestingly, that may be the new ordering if the disks are SSDs, but the typical seek latency on a spinning disk (~5 ms) is definitely higher than the latency to read data from another machine's memory across ethernet (a few hundred us), and even the bandwidths are comparable (~150 MB/s).

So, now it has jumped from (disk -> network -> memory -> ...) to (network -> disk -> memory -> ...), which is a big change.