Talking to Whales II

DolphinGemma is an audio-in, audio-out model that processes natural dolphin sound sequences to identify patterns and predict subsequent sounds.

The Long Conversation: From Whistles to AI

The Wild Dolphin Project (WDP) has been conducting what might be the longest ongoing conversation in scientific history. Since 1985, they've been studying a specific community of Atlantic spotted dolphins in the Bahamas, creating a uniquely rich dataset spanning generations of dolphin lives, complete with individual identities, life histories, and behavioral contexts.

This longitudinal approach has allowed researchers to correlate specific sound types with particular behaviors: signature whistles functioning as names, burst-pulse "squawks" during confrontations, and click "buzzes" during courtship or shark encounters. As Dr. Denise Herzing, Research Director and Founder of WDP, explains, knowing the individual dolphins involved provides crucial context for accurate interpretation [1].

DolphinGemma differs from previous efforts in being an audio-in, audio-out model: it processes natural dolphin sound sequences to identify patterns and predict subsequent sounds, much as a language model predicts the next word in a sentence. Rather than simply categorizing sounds, it looks for the underlying structure of dolphin communication.
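To make the next-sound prediction idea concrete, here is a minimal sketch of a bigram predictor over discrete tokens. It is a toy illustration only, not DolphinGemma's method: the token names, the `recordings` data, and the functions `train_bigram` and `predict_next` are all hypothetical.

```python
from collections import Counter, defaultdict

def train_bigram(sequences):
    """Count how often each token is followed by each other token."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts

def predict_next(counts, token):
    """Return the most frequently observed successor of `token`, or None if unseen."""
    followers = counts.get(token)
    if not followers:
        return None
    return followers.most_common(1)[0][0]

# Hypothetical token sequences standing in for discretized sound units.
recordings = [
    ["whistle_a", "click", "buzz"],
    ["whistle_a", "click", "squawk"],
    ["whistle_a", "click", "buzz"],
]
model = train_bigram(recordings)
print(predict_next(model, "click"))
```

A real sequence model conditions on far more context than one preceding token, but the objective is the same in spirit: given what came before, estimate what comes next.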

Beyond Translation: Generating Dolphin-Like Communication

While previous AI models have focused on classifying animal sounds, DolphinGemma takes a generative approach. Developed by Google in collaboration with Georgia Tech and trained on WDP data, the model uses the SoundStream tokenizer to represent dolphin sounds efficiently as tokens, which are then processed by a model architecture suited to complex sequences [1].
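SoundStream is a neural audio codec that relies on learned vector quantization. The sketch below shows only the basic idea of turning continuous feature frames into discrete tokens by nearest-codebook lookup. It assumes a tiny fixed codebook and made-up 2-D frames; `nearest_code` and `tokenize` are hypothetical helpers, and nothing here reflects the actual SoundStream implementation.

```python
def nearest_code(frame, codebook):
    """Return the index of the codebook vector closest to `frame`
    (squared Euclidean distance)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda i: sq_dist(frame, codebook[i]))

def tokenize(frames, codebook):
    """Map each feature frame to the index of its nearest codebook entry."""
    return [nearest_code(frame, codebook) for frame in frames]

# Hypothetical 3-entry codebook and 2-D feature frames (assumed values).
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
frames = [(0.1, -0.1), (0.9, 0.2), (0.2, 0.8), (1.1, 0.1)]

tokens = tokenize(frames, codebook)
print(tokens)  # [0, 1, 2, 1]
```

Once audio is expressed as a sequence of integer tokens like this, it can be fed to a sequence model in much the same way text tokens are.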

The ~400M parameter model is designed to run directly on Pixel phones that WDP uses in the field, making it practical for real-time deployment in oceanic environments. This represents a significant departure from lab-based studies, bringing AI analysis directly into the dolphins' natural habitat.
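A back-of-the-envelope calculation suggests why a model of this size is plausible on a phone. The sketch below estimates the raw weight size for roughly 400M parameters under a few common numeric formats. The formats are assumptions for illustration; the source does not say which precision DolphinGemma uses, and the estimate ignores activations and runtime overhead.

```python
PARAMS = 400_000_000  # approximate parameter count given in the text

# Bytes per parameter for common formats (assumed, not confirmed for DolphinGemma).
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}

def weight_size_gb(params, bytes_per_param):
    """Size of the raw weights in gigabytes (1 GB = 10**9 bytes)."""
    return params * bytes_per_param / 1e9

sizes = {fmt: weight_size_gb(PARAMS, b) for fmt, b in BYTES_PER_PARAM.items()}
for fmt, gb in sizes.items():
    print(f"{fmt}: {gb:.1f} GB")
```

Under these assumptions the weights occupy between about 0.4 GB and 1.6 GB.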

What's perhaps most intriguing is DolphinGemma's potential to generate novel, dolphin-like sound sequences. This isn't merely about translating dolphin to human or vice versa, but potentially allowing researchers to produce contextually appropriate responses in a form that dolphins might recognize as meaningful communication.

The Anthropomorphism Paradox

Here the project faces a real tension. Scientists must avoid anthropomorphizing dolphin communication, that is, assuming it works like human language, while analyzing it with human-designed AI architectures originally developed for human language. We might call this the anthropomorphism paradox: we need models built around human communication to help us understand non-human communication precisely because it may be structured so differently from our own.

DolphinGemma attempts to navigate this paradox by focusing on identifying patterns rather than forcing human linguistic frameworks onto dolphin communication. The model isn't trying to find dolphin "words" or "grammar" directly analogous to human language; instead, it seeks statistical patterns and regularities that might reveal the underlying structure of a potentially alien communication system.
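One simple way to quantify "statistical patterns and regularities" without assuming words or grammar is to compare how unpredictable tokens are on their own with how unpredictable they are given the previous token. The sketch below does this with entropy estimates on a made-up sequence. It is a generic information-theoretic illustration, not a description of DolphinGemma's internals, and the function names are hypothetical.

```python
import math
from collections import Counter

def entropy(counts):
    """Shannon entropy in bits of the distribution implied by `counts`."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def unigram_entropy(seq):
    """Entropy of individual tokens, ignoring order."""
    return entropy(Counter(seq))

def conditional_entropy(seq):
    """H(next | previous) in bits, estimated from adjacent token pairs."""
    pair_counts = Counter(zip(seq, seq[1:]))
    prev_counts = Counter(seq[:-1])
    total = sum(pair_counts.values())
    h = 0.0
    for (prev, _nxt), c in pair_counts.items():
        h -= (c / total) * math.log2(c / prev_counts[prev])
    return h

# Made-up, perfectly repetitive sequence: each token fully determines the next.
structured = list("ABCABCABCABC")
print(unigram_entropy(structured), conditional_entropy(structured))
```

A large gap between the two numbers indicates sequential structure. In this artificial example the gap is maximal, whereas real recordings would fall somewhere in between.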

This approach acknowledges that dolphin communication might be organized around fundamentally different principles than human language—perhaps prioritizing emotional states, spatial relationships, or cooperative hunting strategies over the object-oriented concepts that dominate human communication.

Beyond Dolphins: The Wider Implications

While DolphinGemma is specifically focused on Atlantic spotted dolphins, its methodological approach could ultimately influence multiple fields. Unlike broader animal-communication AI efforts such as Earth Species Project's NatureLM or the BEANS (Benchmark of Animal Sounds) initiative, DolphinGemma combines species specificity with generative capabilities, potentially offering the best of both worlds [1].

The model's planned release as an open model in summer 2025 signals Google's intention to foster broader research applications [1]. Marine biologists studying other cetacean species could fine-tune the model for bottlenose or spinner dolphins, while the underlying architecture might inform approaches to analyzing communication in entirely different species.

Beyond biology, DolphinGemma's approach to sequence analysis could inform advances in various fields:

  1. Environmental monitoring: Detecting changes in acoustic environments that might signal ecosystem stress or recovery
  2. Bioacoustics: Creating more sophisticated models of how animals use sound across species and environments
  3. Grounded AI generation: Improving how AI systems produce outputs tied to real-world contexts and citations [1]
  4. Human-machine interaction: Developing more natural interfaces between humans and artificial intelligence

The Ocean as Information Space

Perhaps most profoundly, DolphinGemma invites us to reconsider the ocean not just as a physical environment but as an information space—a medium through which complex messages propagate, carrying meaning between intelligent beings.

Dolphins evolved in an environment where sound travels more than four times faster than in air, with significantly less attenuation over distance. Their acoustic world is rich with information: echolocation may let them sense structures inside each other's bodies, potentially sharing sensory experiences we can barely imagine. Their communication may encode information in dimensions that human languages typically ignore, such as sound reflection patterns or precise timing intervals.
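As a rough check of the speed comparison above, the ratio can be computed from typical textbook values. The figures below are approximations: the speed of sound in seawater varies with temperature, salinity, and depth, and the speed in air varies with temperature.

```python
SPEED_IN_SEAWATER_M_S = 1500.0  # typical approximate value for seawater
SPEED_IN_AIR_M_S = 343.0        # dry air at about 20 degrees Celsius

ratio = SPEED_IN_SEAWATER_M_S / SPEED_IN_AIR_M_S
print(f"Sound travels about {ratio:.1f}x faster in seawater than in air")
```

With these values the ratio comes out at a little over four.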

If DolphinGemma and similar projects succeed, even partially, in helping us understand the structure of dolphin communication, we may gain not just scientific knowledge but philosophical insight. We might begin to grasp how intelligence can manifest in forms radically different from our own—an understanding that could prove valuable not just for interspecies communication on Earth, but for how we approach potential non-human intelligence beyond our planet.

References

[1] Google, "DolphinGemma: How Google AI is helping decode dolphin communication"