The audio CD thing is pretty clever. Even if you don't know important factors like the maximum frequency, you can get a great guess based on what you already know... like knowing Freddie Mercury could sing four octaves, starting from probably somewhere above "transformer hum sound".

You'd have to know each octave doubles in frequency.

Side quest: When you play the bugle, the available notes are whole-number MULTIPLES of the base frequency, NOT powers of 2. Suppose this base frequency is 250 Hz. There is an octave from 250 to 500, but there's a note at 750 Hz inside the octave from 500 to 1000, and a few notes between 1000 and 2000 Hz, which is the part of the musical scale where something like Reveille is played. If Reveille jumped from octave to octave, it would just sound like the intro to Justin Hawkins's cover of This Town Ain't Big Enough for Both of Us.
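To make the bugle point concrete, here is a minimal Python sketch that lists the multiples of the base frequency next to the pure octave doublings. The 250 Hz base and the 2000 Hz ceiling come from the paragraph above; the function names are made up for illustration, and a real bugle won't hit these numbers exactly.

```python
BASE_HZ = 250  # base frequency assumed in the text above

def harmonics(base_hz, top_hz):
    """Whole-number multiples of base_hz, up to and including top_hz."""
    return [base_hz * n for n in range(1, top_hz // base_hz + 1)]

def octaves(base_hz, top_hz):
    """Repeated doublings of base_hz, up to and including top_hz."""
    notes = []
    f = base_hz
    while f <= top_hz:
        notes.append(f)
        f *= 2
    return notes

bugle_notes = harmonics(BASE_HZ, 2000)
octave_notes = octaves(BASE_HZ, 2000)

print(bugle_notes)   # [250, 500, 750, 1000, 1250, 1500, 1750, 2000]
print(octave_notes)  # [250, 500, 1000, 2000]
```

The higher you go, the more multiples fit inside each octave: none between 250 and 500, one between 500 and 1000, three between 1000 and 2000.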

So, if you know transformer hum is 50 or 60 Hz and Queen's frontman starts his singing at 100 Hz, then four octaves up (100, 200, 400, 800, 1600) puts the top of his range at 1600 Hz. Mentally recalling what his falsetto sounds like, you can imagine a really high-pitched guitar solo an octave above this, and you can still imagine what an octave above that would sound like. (Maybe you're getting close to dog whistle territory in your imagination.)
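The octave arithmetic is just repeated doubling. A tiny sketch, assuming the 100 Hz starting pitch guessed above (the function name is hypothetical):

```python
def octaves_up(start_hz, n_octaves):
    """Each octave doubles the frequency."""
    return start_hz * 2 ** n_octaves

low_note_hz = 100                               # guessed starting pitch from the text
top_of_voice_hz = octaves_up(low_note_hz, 4)    # four-octave vocal range
imagined_top_hz = octaves_up(top_of_voice_hz, 2)  # guitar solo, then one octave more

print(top_of_voice_hz, imagined_top_hz)  # 1600 6400
```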

This, then, is 6400 Hz you are imagining: 6400 wave tops go by every second. To record this, you'd need the top AND bottom of each sound wave, because the speaker cone moving from maximum to minimum displacement is how the sound is made. That's at least two samples per wave. If you want to make sure you aren't accidentally recording the middle (zero crossing) of each wave, you can even take three or four or five samples per sound wave instead of two. Four samples per wave at 6400 Hz is 25600 samples per second. It's a lot of thought, but you can reasonably decide that 25000 Hz is a good sampling rate for capturing much of the range of human hearing. Going too far beyond that, you're wasting storage space.
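Here's a rough sketch of the zero-crossing worry. It samples a pure 6400 Hz sine wave, starting exactly on a zero crossing, at two and at four samples per wave. The pure sine and the unlucky starting phase are assumptions picked to show the worst case; real music is messier.

```python
import math

TONE_HZ = 6400  # highest pitch imagined in the text

def sample_sine(tone_hz, samples_per_cycle, n_samples=8):
    """Sample sin(2*pi*f*t) starting at t = 0, which is a zero crossing."""
    rate_hz = tone_hz * samples_per_cycle
    return [math.sin(2 * math.pi * tone_hz * k / rate_hz) for k in range(n_samples)]

two_per_cycle = sample_sine(TONE_HZ, 2)    # every sample lands on a zero crossing
four_per_cycle = sample_sine(TONE_HZ, 4)   # samples land on zero, top, zero, bottom

peak2 = max(abs(s) for s in two_per_cycle)
peak4 = max(abs(s) for s in four_per_cycle)

print(f"2 samples per wave: largest sample = {peak2:.3f}")
print(f"4 samples per wave: largest sample = {peak4:.3f}")
print(f"4 samples per wave means {TONE_HZ * 4} samples per second")
```

With two samples per wave and bad luck on the timing, the recording is silence. With four, the tops and bottoms are captured, and the rate works out to 25600 per second.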

A CD holds a bit more than an hour of music, so call it an hour, or 3600 seconds. If you've listened to Dire Straits, Eagles, Cyndi Lauper, Metallica, David Bowie, Led Zeppelin, ELP, or nearly any other band, you're probably aware the recordings have independent left and right channels. That's 2 channels.

Finally, each sample is going to be somewhere between "speaker fully retracted" and "speaker fully extended". With 5 bits you get 32 positions, or 16 "stops" from the middle point to fully extended. But we know that music can get really quiet when it fades out, and a lot of volume knobs can go from zero to thirty and sometimes higher. When you have the volume at one, you can still tell the difference between loud parts and quiet parts, so you'd need an extra 5 bits just to get good dynamic range at both the loudest and quietest volume settings, for 10 bits total.

What happens when you double this? If you have 20 bits, you are probably close to wasting bits. You have a million places the speaker coil can move to. For a speaker that moves a few millimeters, 20-bit resolution allows steps of a few nanometers. This is the scale of features on computer chips, and about a hundred times smaller than the wavelength of visible light. If you took the color blue and shifted its wavelength by a few nanometers, it would still be practically the same shade of blue! Without knowing anything about bit depth, you can reasonably assume 16 bits is good because it's a power of two and will give a lot of dynamic range. 8 would be too low. 32 is just wasteful.
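A quick sketch of the step-size reasoning, assuming "a few millimeters" of cone travel means 3 mm (that number is my assumption, not a measured value) and that the positions are evenly spaced:

```python
TRAVEL_M = 3e-3  # assumed cone travel: "a few millimeters", taken here as 3 mm

def levels(bits):
    """Number of distinct positions a sample of this bit depth can encode."""
    return 2 ** bits

def step_size_m(bits, travel_m=TRAVEL_M):
    """Distance between adjacent positions, assuming evenly spaced steps."""
    return travel_m / levels(bits)

for bits in (5, 10, 16, 20):
    step_nm = step_size_m(bits) * 1e9
    print(f"{bits:2d} bits: {levels(bits):>9,} positions, step of about {step_nm:,.1f} nm")
```

At 20 bits the step comes out just under 3 nm, which is where the "few nanometers" figure comes from.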

With 32 bits, a speaker capable of moving 1 cm end-to-end would have a linear resolution of about 2 picometers, which is a small fraction of the diameter of a single carbon atom. The ears are impressive, but I doubt they can differentiate the air displacement of (speaker cone area) x (a fraction of a carbon atom). Even having 0 to 100 on the volume knob, which uses up about 7 bits, this leaves 25 bits of range at each volume setting. This is audiophile (and arguably, snake oil) territory.

So then, you can say 3600 seconds is pretty close to 3000 seconds, 2 channels is close to 3, 16 bits is close to 10, 25000 Hz is close to 30000 Hz... 3 x 3 x 3 x 10 x 1000 x 10000 ≈ 3,000,000,000 bits. Since a byte has about 10 bits (really 8, but we're rounding), divide by ten, and this yields a first approximation, based on logical reasoning from what we know, of 300 MB. It's wrong, but it's not "very" wrong. (It's off by a factor of two, not a factor of ten! Not bad for four rounded intermediate terms...)
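For anyone who wants to check the arithmetic, here is a small sketch comparing the rounded estimate against the unrounded guesses. The last figure uses the actual CD audio format (44,100 Hz, 16 bits, 2 channels), which is not part of the guesswork above and is only there as a yardstick for the same one hour.

```python
# The guesses from the text, before rounding
seconds, channels, bits_per_sample, sample_rate_hz = 3600, 2, 16, 25000

exact_bits = seconds * channels * bits_per_sample * sample_rate_hz
exact_mb = exact_bits / 8 / 1e6            # 8 bits per byte

# The same terms rounded to values starting with 1 or 3
rounded_bits = 3000 * 3 * 10 * 30000
rounded_mb = rounded_bits / 10 / 1e6       # "a byte has about 10 bits"

# Yardstick (not from the text): one hour at the real CD audio format
cd_hour_mb = 3600 * 2 * 16 * 44100 / 8 / 1e6

print(f"unrounded guesses: {exact_mb:.0f} MB")
print(f"rounded guesses:   {rounded_mb:.0f} MB (the text rounds this to 300 MB)")
print(f"real format, 1 hr: {cd_hour_mb:.0f} MB, {cd_hour_mb / 300:.1f}x the 300 MB estimate")
```

The rounding barely moves the answer (360 MB unrounded versus 270 MB rounded); most of the factor-of-two gap comes from guessing 25000 Hz instead of 44,100 Hz.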

(The idea is to round each term to a value starting with 1 or 3, because 3 times 3 is close to 10. The reason 2 rounds to 3 rather than 1: 10^(1/2) = 3.16 is the multiplicative midpoint of 1 and 10, because if you square each of 1, 3.16, 10 you get 1, 10, 100. In the same way, 10^(1/4) = 1.78 is the midpoint of 1 and 3.16. This means that any value less than 1.78 would be closer to 1 after squaring, and any value higher will be closer to 10. Since 2 is above 1.78, it goes with 3.)
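The rounding rule can be sketched as a small function. This is one possible reading of the rule, with the cutoffs placed at 10^(1/4) ≈ 1.78 and 10^(3/4) ≈ 5.62; the upper cutoff and the function name are my assumptions, since the text only spells out the lower one.

```python
import math

def fermi_round(x):
    """Round a positive number to 1, 3, or 10 times a power of ten,
    choosing whichever is nearest on a logarithmic scale."""
    exponent = math.floor(math.log10(x))
    mantissa = x / 10 ** exponent        # leading part, in the range [1, 10)
    if mantissa < 10 ** 0.25:            # below about 1.78: closer to 1
        lead = 1
    elif mantissa < 10 ** 0.75:          # below about 5.62: closer to 3.16
        lead = 3
    else:                                # otherwise: closer to 10
        lead = 10
    return lead * 10 ** exponent

for value in (3600, 2, 16, 25000):
    print(value, "->", fermi_round(value))
```

Run on the four terms of the CD estimate, it gives back the same rounded values used in the estimate: 3000, 3, 10, and 30000.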

You can even take the analysis further and back-calculate things like how fast the CD might spin by guessing the track width and bit area, how long a track skip would be, whether the size limitation of the CD is due to optical or material properties, how far the laser would need to be to converge at one bit while being close enough that any deviation in the surface flatness doesn't send the return beam away from the sensor, etc. (This is all the info you'd probably use to begin the approximation if you weren't aware an audio CD holds an hour of music, like if you were asked in 1975 to "back of envelope" whether a compact, non-contact, vinyl-like, LP-length recording medium was possible.)