I don't want to detract from the images, or the work that went into this, but, er, there are a lot of types of birds. Just among passerines (songbirds), we're talking over 6,000 species.
So any sweeping statement like "birds don’t seem to bother to create a complex multi-layered harmonics pattern" is practically guaranteed to be wrong. And so it is. Lots and lots of birds sing incredibly harmonically complex songs. Browse any of these (https://www.remoteenvironmentalassessmentlaboratory.com/expl...) if you're interested - it's a tiny sample of birds, and many, many of them do in fact have harmonically complex songs.
I should have been clearer. I am also talking about moment-to-moment spectral complexity or entropy.
I agree that the point TFA is suggesting seems to be about the spectral complexity at any given moment in song - what you might call the timbre - and how nearly pure the tones in their spectrograms are. (They don't specify what species they're showing. Looks like several, but I can't ID them by eye.)
My point is that no, in fact, many bird species produce vocalizations that are indeed spectrally complex (beyond even just harmonic stacks) from moment-to-moment.
Take a look at song from a blue jay (https://www.remoteenvironmentalassessmentlaboratory.com/expl... ; not the best example, or one I produced, but an easy one to hand), particularly the syllables near the end of the clip. That's an example of a complex timbre.
And lots of species produce song and calls with features like this.
Harmonic complexity is unrelated to that (unless they do something like Tuvan throat singing), but the species point is a good one.
Compare a parrot or a crow vocalizing. From that it seems some can control the amount of overtones, or at least switch between two different modes or something.
Seems almost like comparing humans whistling vs speaking or something (assuming whistling is closer to a pure sine tone).
See my reply to your sibling comment - I do in fact mean spectral complexity at any given moment, not just over the course of the song. Many species of birds can and do vocalize with complicated timbre, including harmonic stacks, buzzes, clicks, and others harder to characterize.
And it's great that you bring up the possibility that two modes might be engaged. The avian vocal organ, the syrinx, has two sets of membranes which can vibrate as air is passed over them. Many species (particularly, as you'd expect, the ones best at imitation, like corvids, parrots, lyrebirds, etc) are able to control these two sources independently (but even those which generally don't control them separately can produce syllables with rich timbre), layering a harmonic stack with a click or a buzz.
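To make the two-source idea concrete, here's a toy numpy sketch of layering a harmonic stack with a buzz. Purely illustrative, not a physiological model - the frequencies and buzz rate are invented:

```python
import numpy as np

fs = 44100                      # sample rate (Hz)
t = np.arange(0, 0.5, 1 / fs)   # half a second

# Source 1: a harmonic stack on a (made-up) 2 kHz fundamental.
stack = sum(np.sin(2 * np.pi * 2000 * k * t) / k for k in range(1, 4))

# Source 2: a ~30 Hz "buzz" -- noise gated by a pulse train.
buzz = np.random.randn(t.size) * (np.sin(2 * np.pi * 30 * t) > 0.9)

# The two "membranes" layered independently into one syllable.
syllable = stack + 0.5 * buzz
syllable /= np.abs(syllable).max()
```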
I feel I should reiterate: nothing here is meant to detract from TFA's demonstration of what looks like a nice acoustic analysis tool. But TFA is, unfortunately, just plain wrong in its conclusion that birdsong is mostly pure tones.
Seems like selection bias: plenty of birds have both simple-tone songs and harmonically complex vocalizations. If you just draw images from the first kind, you will get nice single-tone pictures.
It makes sense to me that bird song would have few harmonics. Although we may appreciate the beauty of bird song they don't sing for aesthetics. They are signaling. Concentrating energy into a narrow band delivers greater range for the same energy than harmonically rich signals. We do likewise with electromagnetic communication.
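A back-of-the-envelope version of the energy argument (numbers invented, assuming flat background noise and a receiver listening in matched narrow bands):

```python
total_power = 1.0             # same total signal power in both cases
noise_per_band = 0.01         # flat noise power in each narrow band

snr_narrow = total_power / noise_per_band            # everything in one band
snr_spread = (total_power / 5) / noise_per_band      # split over 5 harmonics

print(snr_narrow, snr_spread)  # 100.0 vs 20.0: each harmonic is ~7 dB weaker
```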
For a loud, high-pitched sound produced from a tiny object like a bird, is it even physically possible to have the loud resonances needed for overtones and timbre? I think it's just physics.
Whales are "signaling" over huge distances too but have plenty of overtones, since that's what low frequencies in huge cavities produce.
I'm not sure the dichotomy between "aesthetics" and "signaling" you suggest actually exists. A peacock's plumage is certainly aesthetics but also certainly signaling. And what is a birdsong's melody if not aesthetics?
When a bird sings to attract a mating partner, competing against other birds, does it not sing for aesthetics?
Also, couldn’t harmonics be used to improve decoding in noisy environments? Spreading the signal over a wide band is not unheard of in man-made electromagnetic communication either.
When using spread-spectrum transmission it would be counterproductive to utilize harmonics. Non-sinusoidal noise at frequency F would pollute frequencies k*F. Also the transmission band is likely less than an octave.
> However it would be a mistake to call flute sound simple: as you see, every level has its own regular pattern that can’t be recreated with a simple mix of sinusoidal tones.
The resonance of the flute's air chamber is driven by the noise of the forced air stream, which is why it contains high frequencies at all. Of course you can't recreate the flute sound with just some periodic sinusoids, because the noise component has to be present. The noise is complex and is itself filtered by the flute's chamber, in a different way depending on which valves are closed. You can play a recognizable scale on a flute without getting a tone out of it, just using air noise, the same way you can produce a musical tune using chhhh sounds out of your mouth. Those notes, or something like them, are still there in the background when a tone is being produced. There is no flute tone without them.
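You can fake that tone-less "scale" with filtered noise. A rough sketch, not a flute model - the center frequencies are just set to a C-major-ish run:

```python
import numpy as np
from scipy.signal import butter, lfilter

fs = 44100

def breathy_note(center_hz, dur=0.3):
    """White noise through a narrow bandpass: a pitched 'chhhh', no tone."""
    noise = np.random.randn(int(fs * dur))
    b, a = butter(2, [0.95 * center_hz, 1.05 * center_hz],
                  btype="bandpass", fs=fs)
    return lfilter(b, a, noise)

# An ascending run played with filtered noise alone (no periodic component).
scale = np.concatenate([breathy_note(f) for f in (523, 587, 659, 698, 784)])
```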
Those are interesting. They remind me of the sonograms we collected of bat echolocation calls when we were doing bat surveys with Anabat detectors. Just by the shape of the calls you could narrow the bat down to a likely species and determine whether it was just flying around, actively hunting, or about to eat a hapless insect.
Audio person here. I found the post fascinating, but I wish they did more to explain what they were talking about to a layperson.
Basically, all sounds that you hear are composed of many layered sine waves of different frequencies and intensities. The graphs in the post are spectrograms, which graph those frequencies over time. The Y axis represents pitch, the X axis represents time, and the brightness at any given point represents how loud that particular frequency was at that particular time.
Most sounds, even seemingly simple ones, look very complex on a sonogram, like a smudged pen stroke. The images of different instruments below demonstrate this; these are all very complex sounds, even though we hear each as a single note being played. The voice one is one of my favorites, because it shows just how weird and complicated everyday sounds can get.
But bird songs are different; on a sonogram, they appear as a single line. The complexity of the bird songs here comes from the fact that they're taking a single sine wave and changing the pitch over time. Where most sounds look like a complex mix of smudged paint strokes, bird songs look like a single, precise, bouncing stroke.
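If you want to see the difference yourself, here's a minimal scipy sketch contrasting the two (the glide parameters are arbitrary):

```python
import numpy as np
from scipy.signal import spectrogram

fs = 44100
t = np.arange(0, 1.0, 1 / fs)

# "Birdsong-like": one sine whose pitch glides upward over time.
glide = np.sin(2 * np.pi * (2000 * t + 1500 * t ** 2))   # 2 kHz -> 5 kHz

# "Instrument-like": a fixed note carrying a stack of harmonics.
note = sum(np.sin(2 * np.pi * 440 * k * t) / k for k in range(1, 8))

f, times, S_glide = spectrogram(glide, fs)   # one bright, bouncing line
f, times, S_note = spectrogram(note, fs)     # several parallel lines
```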
Which is akin to dragging a single finger across different piano keys. Only a single frequency, or note, played at a time. This is common among songbirds.
Contrast that with the sound of a crow. The sonogram is much more broadband in signature. This is akin to mashing a bunch of keys on a piano all at the same time: many frequencies present simultaneously.
> Which is akin to dragging a single finger across different piano keys. Only a single frequency, or note, played at a time.
I think there's a key difference.
Assuming this is the spectrogram of a single note being played on the piano (https://soundshader.github.io/hss/gallery/piano/2.jpg) (which I can't be certain of, since the audio sample wasn't provided), it seems a single piano note fires on multiple frequencies, and our ear 'aggregates' them so we hear it as a single note.
A songbird belts out a single frequency at each point in time. We still hear a single note, but there's nothing to aggregate.
At least that's my interpretation of the parent comments. Again, can't be sure.
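Easy to check with a synthetic stand-in (a real piano recording would be better, but I don't have the sample either):

```python
import numpy as np

fs = 44100
t = np.arange(0, 1.0, 1 / fs)

piano_like = sum(np.sin(2 * np.pi * 220 * k * t) / k for k in range(1, 6))
bird_like = np.sin(2 * np.pi * 3300 * t)

for x in (piano_like, bird_like):
    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(x.size, 1 / fs)
    print(np.round(freqs[spectrum > 0.1 * spectrum.max()]))
    # ~[220 440 660 880 1100] for the note, ~[3300] for the "bird"
```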
I'd like to see fMRI of the listening birds' brains.
Crows, in the morning, seem to be broadcasting work-gang-related information, organising their crew to go and harvest certain regions, then reporting back on the yield.
If songbirds are courting, and hence broadcasting different information, for different purposes, I wonder if some generalisable differences might be apparent in the receiving birds brains.
That's really interesting... I wonder if it has something to do with birds having such small resonating chambers, so I looked for bigger birds. Apparently, emus make several calls that have interesting harmonics.
Andrew Huang does a good job of explaining harmonics and overtones[0]. You only need to watch until about the 4 minute mark (from the timestamp) to get an explanation of harmonics and what the sonograms represent.
The short of it is that most natural sounds produce a root tone plus a varying amount of related tones above it. Our ears hear the root tone, and the other tones above it are what give the sound its uniqueness. That's why a guitar, a clarinet, and 3 singers can produce the same note while sounding distinct.
Birds seem to produce a natural sound without a lot of the related tones above it. Their sound is, relatively speaking, much purer than most other natural sounds. That's quite unusual.
For a moment I thought they would hypothesize that the patterns might be an actual “rendering” of what the birds were looking at, especially when I saw the 7th image which looks remarkably similar to a rodent.
Apparently I'm pretty far off, but just this idea of animals communicating images with sound waves was something I'd otherwise never have come across.
What if groups/species of birds had "collective synesthesia"?
Some people see colors or feel tastes... and it doesn't seem that unlikely that there would be selection pressure for parts of the bird brain to connect their spatial awareness to their auditory system and then to songs. There could be a nice feedback loop, where their own songs would evoke similar experiences in themselves (if they hear their own songs the way others hear them).
Of course we know that bats have this type of neural connection in echolocation, and dolphins/whales may even be using it to communicate in similar ways with their songs.
If you place migratory birds in a big round cage, they show 'Zugunruhe'[0], or migratory restlessness: they jump and flutter in the direction in which they would otherwise be migrating if they were out in the wild. Rotating the magnetic field (e.g., by putting magnets around the aviary) also rotates the direction of their Zugunruhe.
No one totally understands how this works, but the magnetic information is thought to be 'overlaid' on other sensory information. One candidate involves a light-dependent pathway in the retina. When a cryptochrome absorbs light, it generates radical pairs that affect how visual information is perceived[1]. This could give the birds something like a HUD that displays magnetic field lines on top of the visual scene. Consistent with this idea, birds can only orient to magnetic fields under certain colors of light, with the color varying a bit from bird to bird[2]. It's almost as if the colored light washes out the HUD.
There's another parallel pathway involving bits of magnetite in the beak[3,4]. These signals flow through the trigeminal nerve, which carries a lot of different signals; it would not be impossible for this to manifest as "pressure", as it carries touch/somatosensory information in many animals.
I remember that "cymascope" experiment that used a dolphin's sound as input. It turned out the water surface had an image (after lots of noise deblurring) of a human the dolphin had just seen. This means the organ that decodes sound is a simple membrane with water. How dolphins encode images into sound is a separate question.
I heard a crazy theory that this is how dolphins communicate. As they perceive using sound and are able to produce sound, perhaps they project the sound of the thing they want to talk about. It would be the same as a human being able to shine any image we like onto a wall to communicate. Can they recreate the echo of a fish with high enough fidelity that it's recognisable to another dolphin?
Perhaps it's good to know that the x axis (from left to right) is temporal. So for someone/something to translate this to an image with some special sense would also require some kind of memory.
With a big stretch of imagination, you could continue your hypothesis by comparing it to describing an object you see with words, by reciting the height of its outline from left to right.
This is amazing. The GitHub repo has more information too, about how the common explanation of how our ears work (isolating sounds in an FFT-like-process) may be incorrect:
> Contrary to what you might think, our ears don't seem to rely on an FFT-like process to extract isolated frequencies. Instead, our ears detect periodic parts in the signal, although in most cases those periodic parts closely match the FFT frequencies. There is a simple experiment that proves this point:
Put in earbuds or a headset and (quietly) play two tones that are slightly different in frequency (1-10 Hz apart).
You will still hear a sort of ethereal beat tone between them that's different than the beat you would hear if you were listening to the same thing through speakers. I don't see how this would occur with purely frequency-domain perception, and it's too present (at least in my experiment) to be attributed to bone conduction across the skull.
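If anyone wants to reproduce the experiment, here's a quick way to generate the stimulus (440/444 Hz are arbitrary choices ~4 Hz apart):

```python
import numpy as np
from scipy.io import wavfile

fs = 44100
t = np.arange(0, 10.0, 1 / fs)

left = np.sin(2 * np.pi * 440 * t)    # pure tone in the left ear
right = np.sin(2 * np.pi * 444 * t)   # pure tone in the right ear

stereo = np.stack([left, right], axis=1)
wavfile.write("binaural_4hz.wav", fs, (0.2 * 32767 * stereo).astype(np.int16))
# Over headphones each ear gets a constant pure tone, yet a ~4 Hz "beat"
# is perceived; over speakers the tones mix in the air and beat physically.
```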
As far as I know, we don't know the exact mechanism by which binaural beats are produced, but it does seem to be derived from FFT-like information.
The hair cells in our cochlea function essentially as a great big FFT -- this is well-established -- and so our source neural input begins with that. The brain doesn't have access to the underlying waveform at all, as far as I'm aware. It is incredibly sensitive to the timing from each ear though (just as FFT includes phase information).
Our brain does perform advanced signal processing to condense sets of overtones into a single fundamental frequency, which even works in the case of a "missing fundamental", and binaural beats are conceivably explained by this mechanism. Though it could be a different mechanism at work as well, related to how we process audio spatially.
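The missing-fundamental effect is easy to demo numerically. A sketch, with the 200 Hz fundamental chosen arbitrarily:

```python
import numpy as np

fs = 44100
t = np.arange(0, 0.5, 1 / fs)

# Harmonics 2..6 of 200 Hz; the 200 Hz component itself is absent.
x = sum(np.sin(2 * np.pi * 200 * k * t) for k in range(2, 7))

spectrum = np.abs(np.fft.rfft(x))
freqs = np.fft.rfftfreq(x.size, 1 / fs)
print(spectrum[np.argmin(np.abs(freqs - 200))] < 1e-6 * spectrum.max())  # True

# ...yet the waveform still repeats every 1/200 s, the pitch we'd perceive:
ac = np.correlate(x, x, mode="full")[x.size - 1:]
lag = 50 + np.argmax(ac[50:])    # skip the zero-lag peak
print(fs / lag)                  # ~200
```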
Whoa! At least that gives me some comfort I wasn't imagining it.
I wish I understood some of the neurological explanation of how the beats are perceived because it doesn't seem to match with my understanding of the explanation of how our ears work. If everything is pushed into the frequency domain based on the stimulation of different parts of the cochlea, where does the time domain beat emerge?
Part of why we evolved two ears is to be able to locate sounds within our perceptive field. I think the best example is to listen to some recordings made with two head-mounted microphones. I like these; one of them was the closest I’ve come to believing I was standing in a pond while in my own house. It requires headphones to get the effect: https://quietamerican.org/field_vietnam.html
There’s another component at play too: beat frequencies. This happens anytime you have different frequencies playing simultaneously. This is a result of simple waveform interference. Lots of examples, but I’ll never pass up a chance to link to Julius Sumner Miller[0]: https://youtu.be/7dxkW5bsUgs
So the brain is doing lots of work to integrate the stereo “image,” in much the same way we can wear 3D glasses and perceive depth[1]. Binaural beats reduce things down to a more fundamental level: you’re playing with how your mind integrates the stereo field in a weird way, and it produces a beat frequency that does not exist in the pulsed air. This may be learned behavior.
[0] I’m eagerly waiting for some music producer to sample this video: “all the music fell out”, “we should have this mechanism called beats”, “beats: wonderful!”, etc.
[1] I wonder what the effect of rapidly switching the left/right components of a stereo image would be. Probably nausea.
The part that doesn't make sense to me is that the binaural beat frequency corresponds with the physical beat frequency of the sound. So if you got 432Hz in one ear and 428Hz in the other, you're going to hear a 4Hz beat frequency between the two.
If the cochlea is effectively taste buds for sound, the only thing the brain is going to get is which part of the cochlea is being tickled. There's no time domain information there, just some ambiguous 'pitch'.
If that's the case, though, how does the brain know to synthesize the 4 Hz differential between these two frequencies? The 432 Hz and the 428 Hz aren't making it to the brain, just the fact that both ears are getting tickled in very close but different places.
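For contrast, when the two tones mix physically (speakers, or both in one ear), the 4 Hz really is in the waveform, via sin(a) + sin(b) = 2 sin((a+b)/2) cos((a-b)/2). A tiny check:

```python
import numpy as np

fs = 44100
t = np.arange(0, 1.0, 1 / fs)

# Mixed signal: a 430 Hz carrier whose amplitude swells 4 times per second.
mono = np.sin(2 * np.pi * 432 * t) + np.sin(2 * np.pi * 428 * t)
envelope = np.abs(mono)   # crude envelope; clearly modulated at 4 Hz

# In the binaural case nothing ever mixes: each ear's waveform is a
# constant-amplitude pure tone, so the 4 Hz has to be constructed neurally.
```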
(Also my dad absolutely LOVED watching JSM and would always call us into the room any time he was doing one of his crazy experiments on TV. I agree his stuff is very 'sampleable'.)
Edit: Just watched the video, it's actually a gold mine for hip hop lol. Just play this in the background and scrub around his videos - https://www.youtube.com/watch?v=JVISRjhXzzM
Oh that simultaneous site is neat! Doesn’t work for me on iOS but what a great idea. I’ve been using two separate devices for that kind of thing (mostly confirming mashups that I will likely never follow through with, but it’s still fun to do).
Looks like you know more than me about how our brains process audio. I was running on the assumption that some kind of frequency analysis made it to our higher processing centers, is that not the case? Given that what we hear all the time is incredibly chaotic (multiple pitches that we hear as chords, lyrics vs the rest of the music, focusing on one person speaking, etc) I thought we must at least be running some kind of internal spectrum analyzer and continuously comparing new input to previous averages or something.
There almost has to be a clock-like reference construct somewhere, right? The ability of some people to perceive perfect pitch points to it IMO.
The hair cells in our inner ear are activated in response to specific frequencies, virtually identical to how an FFT operates.
There is then further processing attached to that, in the same way that we perceive colors rather than cones directly in our vision. With audio, we need to collapse entire series of overtones into a fundamental frequency with a timbre (color).
And specifically in the experiment linked, it's showing how the ear intelligently restores the missing fundamental.
But I don't believe there's any biological/neurological evidence to support that the ear detects periodic repetitions of an overall waveform, and that this explains the missing fundamental. To the contrary, how our hearing works biologically suggests that we are indeed performing something like FFT, but then applying sophisticated pattern recognition on it -- on the "FFT", not on the waveform.
Slight related tangent: For all those interested in birding, I highly recommend the BirdNet app available on Android and iOS! You can record and send audio data, and it will try to classify the bird for you based on the recording.
In the process of recording, it will show you the spectrogram of the bird's song, which is really cool!
I wanted to add that the nature of those thin horizontal layers and the patterns they form is similar to diffraction. A laser beam passing through a thin slit forms a pattern with a sinc-shaped distribution of intensity, and likewise the FFT of the rectangular window used in these spectrograms has a sinc shape that "diffracts" the "true" sound frequency.
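You can see those sinc sidelobes directly by FFT-ing a rectangular frame of a pure tone that doesn't land exactly on a bin (N and the frequency here are arbitrary):

```python
import numpy as np

fs = 44100
N = 1024                          # rectangular (un-tapered) analysis frame
t = np.arange(N) / fs

x = np.sin(2 * np.pi * 1000 * t)  # 1000 Hz: off the ~43 Hz bin grid
spectrum = np.abs(np.fft.rfft(x))

# Instead of one clean spike, energy spreads into sinc-shaped sidelobes
# around 1000 Hz -- the transform of the rectangular window itself.
# Tapering (e.g. x * np.hanning(N)) suppresses those sidelobes.
```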