Hacker News new | past | comments | ask | show | jobs | submit login
Creating randomness Without Math.random (healeycodes.com)
78 points by healeycodes on July 12, 2020 | hide | past | favorite | 32 comments



There are a few algorithms I think are simple enough to memorise yet seem universally useful enough to be bound to be good to know by heart at some point or other. LCG is one of them!

Something else that might be useful to know are methods to transform the uniform output of LCG into various other types of distribution shapes:

When U is uniform between 0 and 1 exclusive,

One can get an Exp(lambda) distributed variable with -log(U)/lambda

One can get a heavy-tailed Pareto(m,a) distributed variable with m/(1-u)^(1/a)

I'm still missing something easily memorable for bell-shaped distributions. The normal distribution would be nice because it's so easy to transform N(0,1) into a distribution with arbitrary parameters, but there's no simple way to compute the standard normal.


By way of the central limit theorem, summing a dozen uniformly distributed variables (and normalizing) will give a a reasonable approximation to a normal distribution.

https://en.wikipedia.org/wiki/Central_limit_theorem


Very clever. My first instinct was that that would be inefficient, but addition is super cheap and the uniform distribution converges rathe quickly to normal. I like it a lot!

Even if LCG itself would be considered expensive, for quick prototyping needs this should be sufficient.


Once you have U([0,1]) you can generate anything you want. Just take Finv(u) where Finv is the inverse CDF of the thing you want. See [1] property 5.

https://en.wikipedia.org/wiki/Cumulative_distribution_functi...


Obviously. But as further reading exposes, the inverse is not always easily memorable -- or even easy to to compute! The inverse of the normal distribution, specifically, needs estimation of an integral to compute.


I guess it depends on what you assume are mathematical "primitives".

Log is an integral. Exp is an infinite series.

Normal cdf (erf) and its inverse (erfinv) are implemented almost everywhere exp and log are implemented. E.g. MKL and CUDA. Both can be computed via series or integral (like exp and log).

[1] https://en.wikipedia.org/wiki/Error_function

[2] https://people.maths.ox.ac.uk/gilesm/files/gems_erfinv.pdf


Now that! Is much more constructive information. I'll look for those methods sooner in the future.


One can get two normally distributed random variables Z1 and Z2 from two uniformly distributed random variables U1 and U2, but the formulas are a little harder to remember:

https://en.wikipedia.org/wiki/Box%E2%80%93Muller_transform


There's good intuition for this. The key moral of the story is this:

> The normal distribution provides uniform sampling on the > unit n-sphere.

What does that mean? Assume one wants to sample a point uniformly from the unit circle. So we want to uniformly sample from the set { (x, y) : xx + yy = 1 }.

The naive method of uniformly sampling x over [0, 1] and then uniformly sampling y in [0, 1 - xx] doesn't work. It doesn't work because for a given change in dx, the change in dy is not uniform.

For example, if we are the point (x=0, y=1), in an infinitesimal region around this point, it looks like "flat land", so moving in dx does not change dy. That is, the tangent to the point (x=0, y=1) is horizontal.

On the other hand, if we are at the point (x=1, y=0), in an infinitesimal region around this point, it looks like a cliff face: all we have is vertical. So the dy change is "infinity". That is, the tangent to the point (x=1, y=0) is vertical.

Let's sample both x and y from the normal distribution. So the probability of getting x=r is e^{-rr}. The probability of getting y=s is e^{-ss}. The probability of getting some random point (x, y) = (r, s) is e^{-(rr+s*s)} = e^{-1} = constant.

So if we sample two points using the normal distribution, the points are uniformly distributed on the unit circle.

### Box–Muller transform

On the flip side, assume we have a uniformly distributed point on the circle. This means that the angle theta is uniformly distributed in [0, 2pi].

Given such a uniformly distributed point, we can generate two gaussian distributed numbers by taking their x and y component.

The intuition for why this works comes from the above:

- We saw that (x ~ normal, y ~ normal) => (theta ~ uniform) [as explained before].

- We can also show that (theta ~ uniform) => (x ~ normal, y ~ normal)

which is what the box-muller transform does.


Brilliant! I suspect this might help me remember the Box--Muller transform, which would be even better than relying on the CLT!


Great article, I love "because why not articles".

Checkout Perlin Noise, for when you need "smooth" and "natural" looking random numbers (making terrain, skies, waves, etc)

https://www.youtube.com/watch?v=8ZEMLCnn8v0


Deterministic pseudorandomness is really useful for seeding dummy data, data that you’d like to be fairly random (though it doesn’t need to be high-quality randomness, so LCG is fine) but consistent across runs/page loads.


Good stack overflow answer about js implementation of pseudorandom number generator: https://stackoverflow.com/a/47593316/1248177


I just watched a really good video explaination of this the other day: https://www.youtube.com/watch?v=4sYawx70iP4


The real value is the reference to the API calls which inherently represent less predictable seed state. If you can't call out, then these are acheing to be used alongside some kind of introspection into the runtime.


“Each Math.random function created for distinct realms must produce a distinct sequence of values from successive calls.”

Nitpick: it would surprise me if any implementation implemented this requirement to the letter.

I think neither of the claims (FTA) “PRNGs are deterministic” and “A PRNG eventually repeats its sequence” are strictly required (for the first you can’t seed Javascript’s Math.Random, so there is no need to be able to regenerate a sequence; for the second, one could generate good pseudo random sequences that never repeat), but the implementations I’m aware of all have those properties, and use fixed-size state.

Given that, the number of PRNGs whose period is at most P is finite. So, one could exhaust that set by creating realms and random number sequences in each of them in a loop (a looooong loop)

The only ways to avoid generating a copy would be to gradually move to generators with more state or to crash.


Any prng with a fixed (finite) amount of state will eventually return to a state it has already been in, if run long enough. Determinism then dictates that it will begin repeating outputs. It's odd as a requirement, but it's a natural property of a prng.


> “Each Math.random function created for distinct realms must produce a distinct sequence of values from successive calls.”

If this were not true, it would have security implications.

Thought experiment: If there was a single state then a web page could pull a bunch of numbers, use that data to derive the seed AND the next number that will be pulled without pulling it.

Then the predictor page can STOP pulling numbers when they know the next number is an auspicious one. The predictor page has now successfully "rigged" the dice roll for another page that shares the same state.


I know that an am not disputing it. My claim is that most, if not all, implementations do not satisfy that requirement.

Also, that requirement isn’t sufficient to prevent such attacks, by a long stretch.

An implementation that has only one sequence of random numbers r1, r2, r3, r4…, but drops a single item of the sequence for each subsequent realm (the second instance returns r2, r3, r4…, the third one r3, r4…, etc.) satisfies the requirement, and the individual sequences are as perfectly random as the original one), but still is vulnerable to attacks.

And it can be worse: an implementation that generates a unique r0 for each realm (could even be an easily predictable value) and returns r0, r1, r2, r3, r4…, with r1 and subsequent identical for all instances also satisfies the requirement.


Sounds to me the standard was trying to say "diffrent domains must use diffrent seeds." without requiring you to use seeds I.e. genuine phisicaly random sources should comply. I don't feel you need to know anything about other domains to implement this requirment.


TIL Hedy Lamarr's frequency-hopping patent https://patents.google.com/patent/US2292387 used a OTP instead of a PRNG.


Both of the claims apply to PRNGs. Math.random isn't required to be a PRNG.


> Both of the claims apply to PRNGs.

Why? A reasonable interpretation of "pseudo-random number generator" is a generator that yields a sequence of numbers that looks pretty random, but isn't. Why require it to be repeating?

For an example of a PRNG that doesn't repeat (infinite period), take your favourite periodic PRNG and XOR it with bits from pi or e or the Champernowne constant.

I do think the definition implies it's deterministic, though. Things that are not deterministic are random, so they are not pseudorandom. The fact that you can't seed the state of the Javascript PRNG doesn't mean that it's not deterministic: what matters is that once you know its complete state you can predict its output.


Technically, as long as you only have a finite amount of memory, there is only a finite number of states the computer can be in, so the output must repeat eventually. Though "eventually" can be long after the heat death of the universe.


We normally ignore that technicality. Like when we say adding two numbers is constant-time, or that a computer can simulate a Turing machine.


True, but in this case, xorshift128+ already maximizes its period by going through every possible state once before repeating. It only has 128 bits of state, but still, the difference between "128 bits" and "all of RAM" is only a constant factor. :) And of course, a period of 2^128 is already more than high enough that it will never repeat in practice.


I was simply clarifying the author's claims, which did not match what GGP said they were.

It turns out both of the author's claims are correct. Sibling comment shows why the repetition claim is correct, and you showed why the deterministic claim is correct (by definition).



It would be great if JavaScript had also a built-in seedable random number generator.


It depends on the implementation, e.g. Node has crypto.getRandomBytes()¹.

If you mean the standard browser implementation, Crypto.getRandomValues()² may be what you're looking for. Note that the browser implementation of this is only a recommendation, so YMMV:

> To guarantee enough performance, implementations are not using a truly random number generator, but they are using a pseudo-random number generator seeded with a value with enough entropy. The pseudo-random number generator algorithm (PRNG) may vary across user agents, but is suitable for cryptographic purposes. Implementations are required to use a seed with enough entropy, like a system-level entropy source.

> [...]

> There is no minimum degree of entropy mandated by the Web Cryptography specification. User agents are instead urged to provide the best entropy they can when generating random numbers, using a well-defined, efficient pseudorandom number generator built into the user agent itself, but seeded with values taken from an external source of pseudorandom numbers, such as a platform-specific random number function, the Unix /dev/urandom device, or other source of random or pseudorandom data.

¹ https://nodejs.org/api/crypto.html#crypto_crypto_randombytes...

² https://developer.mozilla.org/en-US/docs/Web/API/Crypto/getR...


From what I can see at https://developer.mozilla.org/en-US/docs/Web/API/Crypto/getR... or https://nodejs.org/api/crypto.html#crypto_crypto_randombytes...

... neither crypto.getRandomBytes() nor Crypto.getRandomValues() are SEEDABLE. There is no argument I could pass in that would produce the same random sequence for the same seed-value.

A reproducible random sequence would be useful in many situations I believe. I understand such a thing is not too difficult to program but it may be difficult to get verifiably good quality randomness out of your own implementation.


It's true that the standard is currently just a recommendation, but the implementation exists in every major browser already since, and including IE 11 https://caniuse.com/#search=getRandomValues




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: