FWIW, Sollya's Remez algorithm takes intermediate floating-point accuracy into account. At the very least, one should start from such a polynomial and then tweak it by hand, rather than dragging coefficients around from scratch. Numerics isn't “just messing around with numbers” as he claims; it's a large field with lots of cleverness.
As a quick example (using Maple, though, not Sollya): f(x) = x * (1.020875931 + x * (-0.0683760152 - x * 0.1122036320)) gives f(0) = 0 (exactly), f(Pi/2) = 1 (to at least nine decimal digits) and has a max error around 0.0019, significantly better than his hand-tweaked one (max error 0.007). For comparison, his fourth-order function has a max error of 0.002787 and the Bhaskara formula about 0.0016. I don't know how many bytes it needs in MIPS assembly, but I normally count function speed in cycles :-)
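In C that cubic looks roughly like this (a quick sketch, assuming M_PI is available; the coefficients are just the Maple ones above, truncated to the digits shown):

    #include <math.h>
    #include <stdio.h>

    /* Cubic approximation of sin(x) on [0, pi/2], Horner form:
       three multiplies and two adds. Max error ~0.0019. */
    static double sin_cubic(double x) {
        return x * (1.020875931 + x * (-0.0683760152 - x * 0.1122036320));
    }

    int main(void) {
        printf("f(0)    = %.10f\n", sin_cubic(0.0));      /* exactly 0 */
        printf("f(pi/2) = %.10f\n", sin_cubic(M_PI / 2)); /* 1 to ~nine digits */
        return 0;
    }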
If you allow a divide between two second-degree polynomials, as in the Bhaskara formula, you can go down an order of magnitude in error, at extra cycle cost of course.
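For reference, the Bhaskara formula already has exactly that shape, one divide between two quadratics (a sketch, valid on [0, π]):

    #include <math.h>

    /* Bhaskara I's approximation: sin(x) ~= 16x(pi - x) / (5*pi^2 - 4x(pi - x)),
       valid on [0, pi], max absolute error around 0.0016. */
    static double sin_bhaskara(double x) {
        double t = x * (M_PI - x);
        return 16.0 * t / (5.0 * M_PI * M_PI - 4.0 * t);
    }

The order-of-magnitude improvement would come from actually fitting both quadratics' coefficients rather than keeping Bhaskara's fixed ones.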
I looked at the video a bit more closely, and it seems he relies on the symmetry around 0 (he doesn't explicitly do range reduction to positive values), so you want only odd-order coefficients. If so, you can use a formula of the form x * (c + x*x*(b + x*x*a)) (i.e. ax⁵ + bx³ + cx; this is essentially the same as his formula with c=1 and x prescaled by a constant), which Sollya readily gives for single precision:
> fpminimax(sin(x*2*Pi/65536.0),[|x,x^3,x^5|],[|SG...|],[0;16384]);
Warning: For at least 2 of the constants displayed in decimal, rounding has happened.
x * (9.58634263952262699604034423828125e-5 + x^2 * (-1.46252571143166976153082714517950080335140228271484e-13 + x^2 * 6.1585575698811928868593794069233315902067715796875e-23))
This has a maximum error of 0.000108 over that range. This is pretty close to optimal, but you can squeeze it ever so slightly lower (just below 0.0001) if you're willing to spend a lot of CPU time. :-)
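Translated into single-precision C, that's roughly (a sketch; the constants are just the ones Sollya printed above, and the input is the range-reduced angle in [0, 16384]):

    /* Odd fifth-order approximation of sin(2*pi*x/65536) for x in [0, 16384],
       evaluated with Horner's scheme entirely in single precision. */
    static float sin_poly(float x) {
        float x2 = x * x;
        return x * (9.58634263952262699604034423828125e-5f
                    + x2 * (-1.46252571143166976153082714517950080335140228271484e-13f
                            + x2 * 6.1585575698811928868593794069233315902067715796875e-23f));
    }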
Take an interval; for sine it might be, say, [0,π/4] (I think it was [0,π] in the video, most probably a mistake). You're interested in the maximum approximation error over that interval, so you could, for example, sample ten thousand points uniformly from it, evaluate the error at each one, and take the maximum.
Technically, what you're really looking for is the supremum, not the maximum: the supremum over the interval is the limit of those sampled maxima as the sampling gets denser.
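A rough sketch of that, reusing the cubic from earlier in the thread as the approximation under test; the sampled maximum creeps up toward the supremum as the sampling gets denser:

    #include <math.h>
    #include <stdio.h>

    /* The cubic approximation from earlier in the thread. */
    static double sin_cubic(double x) {
        return x * (1.020875931 + x * (-0.0683760152 - x * 0.1122036320));
    }

    /* Maximum error over [0, pi/2], estimated with n+1 uniform samples. */
    static double sampled_max_err(int n) {
        double max_err = 0.0;
        for (int i = 0; i <= n; i++) {
            double x = (M_PI / 2) * i / n;
            double err = fabs(sin_cubic(x) - sin(x));
            if (err > max_err) max_err = err;
        }
        return max_err;
    }

    int main(void) {
        /* Denser sampling => the estimate approaches the supremum from below. */
        for (int n = 100; n <= 1000000; n *= 100)
            printf("n = %7d: max error ~= %.8f\n", n, sampled_max_err(n));
        return 0;
    }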
Sollya does something smarter than just sampling uniformly, though: it knows about the derivatives of all the functions it works with (and the derivatives of those derivatives, and so on), and uses that information to compute an arbitrarily small interval that is guaranteed to contain the supremum: https://www.sollya.org/sollya-weekly/help.php#supnorm
My package also does something more complicated than just sampling uniformly, but it doesn't know anything about the derivatives of the relevant functions, so the maximum it finds is only approximate.
In this case, you know the input is one of 16384 distinct values (originally 65536, but thanks to range reduction and symmetry, you only need to consider a quarter of them). So you can simply test them all.
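A sketch of that exhaustive check, reusing the single-precision polynomial from the Sollya output above and comparing against double-precision sin() as the reference:

    #include <math.h>
    #include <stdio.h>

    /* Same odd fifth-order polynomial as in the fpminimax output above. */
    static float sin_poly(float x) {
        float x2 = x * x;
        return x * (9.58634263952262699604034423828125e-5f
                    + x2 * (-1.46252571143166976153082714517950080335140228271484e-13f
                            + x2 * 6.1585575698811928868593794069233315902067715796875e-23f));
    }

    int main(void) {
        /* Exhaustive test over all 16384 range-reduced inputs (one quadrant). */
        double max_err = 0.0;
        for (int i = 0; i < 16384; i++) {
            double ref = sin(2.0 * M_PI * i / 65536.0);
            double err = fabs((double)sin_poly((float)i) - ref);
            if (err > max_err) max_err = err;
        }
        printf("max error over all inputs ~= %g\n", max_err); /* ~0.000108 */
        return 0;
    }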
But in the case of the minimax polynomial, I just got the error bound out of Maple (it can report it as part of computing the polynomial). That doesn't account for floating-point inaccuracies, though.