My question: why focus on "can we talk with whales" when we already have the problem of "can we talk with humans", which has a known right answer?
We could have a Spanish speaker (or speakers) sit down, feed us hours and hours of content (heck, that's already out there), and then work with AI to try to create a black-box interpreter that can turn Spanish into English.
Has anyone done this? If the goal is a universal translator, why not start off with a simple case?
My worry with trying to do this first with whales or other animals is that we don't know the answer. Nobody can look at the whale AI translation and say "You know what, this was a good translation". Sort of the "Mars Attacks" problem.
All this said, I do recognize that this creates a "human bias": the way humans talk may not readily translate to animal speech (but perhaps it would for other simians?).
But the problem is not a black box. Presumably at least some (likely most) of whale conversation is things like “I see some fish to our left”, where you can measure a result, i.e., does the pod go left. (Some whales perform highly complex coordinated hunting strategies, which you’d think include some verbal coordination.)
Or at least, you shoot for that and maybe discover that 99% of it is philosophizing you can’t understand. But maybe you can bootstrap from present-tense descriptions to notice when they are reminiscing about previous fish seasons. And so on.
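A crude sketch of what "measure a result" could look like, assuming someone has already segmented the recordings into discrete call types (which is itself a hard, unsolved step). The call labels, headings, and counts below are entirely made up for illustration:

```python
# Toy sketch: given recordings already segmented into discrete call types and
# the pod's observed heading shortly afterwards, measure how strongly each
# call predicts behavior. All data here is invented for illustration.
from collections import Counter
import math

observations = [
    ("call_A", "left"), ("call_A", "left"), ("call_A", "right"),
    ("call_B", "right"), ("call_B", "right"), ("call_B", "left"),
    ("call_C", "left"), ("call_C", "left"), ("call_C", "left"),
]

calls = Counter(c for c, _ in observations)
headings = Counter(h for _, h in observations)
joint = Counter(observations)
n = len(observations)

# Pointwise mutual information: which (call, heading) pairs co-occur more
# often than chance? High PMI marks a candidate "this call means go that way".
for (call, heading), count in joint.items():
    pmi = math.log2((count / n) / ((calls[call] / n) * (headings[heading] / n)))
    print(f"{call} -> {heading}: PMI = {pmi:+.2f}")
```

High-PMI pairs are only candidates, of course; you'd still need to control for confounds like which whale is calling and what the pod was already doing.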
Perhaps, though I could see exchanges like "I see fish on our left" / "well, you're always wrong, so we are going right" being an issue. There's also the issue of "shorthand" that might not be universalizable. For example, with a hunting strategy you could imagine "Execute the Janeway protocol" being an issue for translation.
You can see some of the hunting-strategy problems play out with sheep dogs. A well-trained sheep-herding dog can execute really impressive actions based only on specific whistles trained by the handler [1]. It'd be wrong to conclude from those whistles that there's a specific element of language shared by all dogs.
> Perhaps, though I could see exchanges like "I see fish on our left" / "well, you're always wrong, so we are going right" being an issue.
This isn't a black box because the whalesong encodes some type of behavior that we could, in theory, observe and decode even if we didn't understand the atomic pieces of the whalesong. It's how you study any unknown language--you learn a few relations and then gradually build up a richer lexicon and grammar that gives finer-grained understanding.
I'm not sure what the dog whistle example is intended to demonstrate, as even human language has a significant learned component.
I agree you could find higher-order concepts, but I don’t see any reason to assume they would be remotely as common as object-level world-descriptions.
When you are asking someone to “pass the spoon, it’s to the left of the cup”, you don’t dip into a codebook and say “play 22, hut” even if you are a quarterback.
The problem with this type of thinking is that it undervalues the role non-vital communication plays in social species. Look at uncontacted peoples around the world: they have music and dance even though their tribes consist of only a couple hundred people. These forms of communication serve a role in their communities, but it isn't "there are fish in the pond over there"; it's more special and abstract.
If we are to treat whalesong the same way then we can't immediately assume that they're just trying to communicate base concepts.
In any given human language, at least, most vocabulary tends to refer to concrete objects or physical processes. The parent is suggesting that we begin by looking for that subset of the language by correlating the speech sounds of whales with their behavior toward objects in their environment.
Not sure exactly what you're saying, but: Bible translators regularly translate the Bible into other languages, including the kind of "tribal" languages you refer to. The people they translate for are fine with saying "there are fish in the pond over there", and they can also talk about abstract concepts. They may not have vocabulary for all the same concepts we have, whether abstract ("justification") or concrete (the Inuit didn't have words for sheep), but they have plenty of words for things for which we don't have individual words, and those other concepts can be described in phrases. And of course they have grammars.
I think that if you were trying to decode one of those languages you would take the same approach and observe coordination behavior, “there is a deer to our left” -> tribe goes left.
You need to start somewhere. Maybe you would never be able to decode the non-functional communication without a Rosetta Stone or common language root.
The point is that at least some of language needs to cash out into a description of the physical world. I’d argue almost all of it, even in modern societies.
It's easy to start learning another language without an interpreter or dictionary: you just start with concrete objects and actions. The hard part is representing the sounds (phonemes), which are often quite different from those of the language the learner knows. It helps that there is a way of writing all the known sounds of all the studied languages (the IPA, which Wikipedia uses), and it's pretty rare these days that a "new" sound is found.
I also see a problem with "talking" to whales being: "Hello, I am human who learned your click-language." "I don't care one bit, now where are the squid..."
Basically, imagine them like some remote people eking out an existence, but ten times more incompatible.
Kinda. As I recall this happened by accident with French in (GPT-2? Not confident which LLM) — even though it wasn't meant to be in the training set, there were phrases just lifted directly from the other language.
"Hasta la vista" et cetera have a certain je ne sais quoi, but I suggest a better Gedankenexperiment would be the language of the people of the North Sentinel Island — with whom you cannot interact by both law and them being homicidally xenophobic, and will therefore be limited to remote sensing with parabolic or laser microphones only.
Ideally, you'd use a language with native speakers who can ultimately verify the translation efforts. Perhaps a language like Icelandic, a Celtic language, or Korean would be better: languages with little crossover into English that also have accessible translators.
North Sentinel Island would be a good stress test, but not a good first step, as we can't ask them if we got the language right.
They are solving a different problem. They are doing "white box" translation, that is, translation where you already know good inputs and outputs.
I'm talking about something more akin to what these researchers are doing. Without feeding the AI information about the correct translation, can we make an AI that can translate Spanish to English? That is what the researchers are trying to do with this whale translation.
You don't need parallel corpora for all language pairs in a "predict the next token" LLM.
What I'm saying is that if an LLM is trained on English, French and Spanish and there is Eng to French data, you don't need Eng to Spa data to get Eng to Spa translations.
Spanish may have been a bad first language to pick, but others like Korean, Icelandic, or even Russian would work, as there is very little crossover and there are few closely related languages.
You'd have to be careful with the input data, though, as it would be easy to corrupt your dataset with translations if you try to do this fast.
Those are trained differently, being exposed to translations during training. My understanding is that the person you are replying to is proposing "discovering the meaning" of Spanish using just the audio, without any translations in the training: pretending we don't know Spanish, as if it were an animal/alien language.
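For what it's worth, there is published work on roughly that "no translations in the training" setup for text (unsupervised machine translation, e.g. the MUSE/vecmap line of work): train embeddings for each language separately, then look for a linear map that lines the two spaces up. Below is a toy sketch of just the alignment step, with synthetic stand-in embeddings and with the matching pairs assumed rather than discovered; discovering them is the genuinely unsupervised, hard part, and doing any of this from raw audio is harder still.

```python
# Toy sketch of cross-lingual embedding alignment with no parallel text.
# Everything here (dimensions, the "English"/"Spanish" embeddings, the known
# anchor pairs) is synthetic and purely illustrative.
import numpy as np

rng = np.random.default_rng(0)
d = 50
eng = rng.normal(size=(1000, d))                 # stand-in "English" embeddings
true_W = np.linalg.qr(rng.normal(size=(d, d)))[0]
spa = eng @ true_W                               # stand-in "Spanish" embeddings (a rotation)

# Pretend we somehow guessed a handful of matching rows (in practice this is
# bootstrapped adversarially or from shared strings). Solve the orthogonal
# Procrustes problem for the rotation that best maps eng onto spa.
idx = rng.choice(1000, size=200, replace=False)
X, Y = eng[idx], spa[idx]
U, _, Vt = np.linalg.svd(X.T @ Y)
W = U @ Vt

# With W in hand, any "English" vector can be mapped into "Spanish" space and
# matched to its nearest neighbour, inducing a dictionary without parallel text.
print("alignment error:", np.linalg.norm(eng @ W - spa) / np.linalg.norm(spa))
```

The catch for whalesong is that this trick leans on the two embedding spaces having similar shapes, which is plausible for two human languages and entirely unknown for another species.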
I wonder how much animal body language could help with that. Maybe instead of trying to focus purely on the language (like with human language translation), the algorithm could try to observe and infer the meaning from more than just audio? Dogs were domesticated long ago and can communicate with humans, sometimes purely through "facial" expressions and body movement.
I'd think it would depend a lot on the animal. Whales, for example, don't really have great eyesight. They depend a lot more on sound and have been observed communicating over long ranges (particularly because water carries sound quite well). So it seems natural to conclude that whales would communicate primarily through sound rather than through other mechanisms.