The Future of Audio AI: Insights from ElevenLabs

with Mati Staniszewski

Founder Talks
Machine Learning
Platform Engineering
Startup
Technical Deep Dive

Chapters

Introduction and Welcome
[00:00:00]
Mati Staniszewski's Background and ElevenLabs Origin Story
[00:01:00]
Early Technical Breakthroughs in AI Audio
[00:04:00]
The Pivot to Text-to-Speech and Voice Cloning
[00:10:00]
Developing the Platform Approach
[00:20:00]
Challenges and Innovations in Conversational AI
[00:30:00]
Organizational Structure and No Titles Concept
[00:40:00]
The Future of AI Audio and Multimodal Models
[00:50:00]
Closing Thoughts and Future Excitements in AI
[01:01:00]

In this episode

In this episode of AI Native Dev, host Guy Podjarny sits down with Mati Staniszewski, the visionary CEO and co-founder of ElevenLabs, a leader in AI audio technology. Mati shares the origin story of ElevenLabs, detailing how a frustration with subpar dubbing in Polish movies sparked a mission to revolutionize audio processing. The conversation delves into the technical challenges and breakthroughs that have positioned ElevenLabs at the forefront of AI-powered audio experiences. Mati also discusses the company's unique organizational culture and its commitment to pushing the boundaries of audio AI. Whether you're a developer, entrepreneur, or AI enthusiast, this episode offers valuable insights into the future of audio technology and the role ElevenLabs is playing in shaping it.

The Genesis of ElevenLabs

ElevenLabs began its journey with the combined vision of Mati Staniszewski and his co-founder Piotr. Both had rich backgrounds in tech, with Mati having worked at Palantir and Piotr at Google. Their personal bond, formed over 15 years of friendship and professional collaboration, played a crucial role in the founding of the company. Mati explains, "We always wanted to potentially find a problem that we are both excited about and can work on together." The frustration with subpar dubbing in Polish movies sparked their motivation to innovate in the audio space. They identified significant technological gaps, particularly in the quality of dubbing and the limitations of existing audio technologies. This realization set the stage for ElevenLabs' mission to revolutionize audio processing.

Breaking Down the Audio Problem

The audio problem is multifaceted, encompassing challenges in transcription, translation, and text-to-speech conversion. Mati describes how early audio technology was limited, often resulting in robotic-sounding outputs. To tackle this, ElevenLabs broke down the problem into manageable components. They began by addressing speech-to-text accuracy, speaker diarization, and timestamp precision. As Mati notes, "The speech to text was okay for English, but the element of when things are being said was hard." This segmentation allowed them to focus on improving each aspect, setting the groundwork for more advanced solutions.
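To make the "when things are being said" part concrete, here is a minimal Python sketch of how word-level timestamps and speaker labels might be merged into speaker turns before translation and re-voicing. It is an illustration only, not ElevenLabs' implementation: the `Word` and `Segment` types, the `group_into_segments` function, and the `max_gap` threshold are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    # One transcribed word with timing and a diarization label (hypothetical shape).
    text: str
    start: float  # seconds from the start of the audio
    end: float
    speaker: str


@dataclass
class Segment:
    # A contiguous stretch of speech by one speaker.
    speaker: str
    start: float
    end: float
    text: str


def group_into_segments(words: List[Word], max_gap: float = 0.8) -> List[Segment]:
    """Merge consecutive words into segments.

    A new segment starts when the speaker changes or when the silence
    since the previous word exceeds max_gap seconds (an assumed threshold).
    """
    segments: List[Segment] = []
    for w in words:
        last = segments[-1] if segments else None
        if last and last.speaker == w.speaker and w.start - last.end <= max_gap:
            # Same speaker, short pause: extend the current segment.
            last.end = w.end
            last.text += " " + w.text
        else:
            segments.append(Segment(w.speaker, w.start, w.end, w.text))
    return segments


words = [
    Word("Hello", 0.0, 0.4, "A"),
    Word("there", 0.45, 0.8, "A"),
    Word("Hi", 1.0, 1.2, "B"),
    Word("again", 3.0, 3.4, "B"),
]
segments = group_into_segments(words)
for s in segments:
    print(f"[{s.start:.2f}-{s.end:.2f}] {s.speaker}: {s.text}")
```

Each resulting segment carries the three pieces the paragraph names: what was said, who said it, and when. If any of those is imprecise, a dubbed line would land on the wrong speaker or at the wrong moment, which is why timestamp accuracy matters as much as the transcript text.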

Initial Innovations and Discoveries

ElevenLabs' initial innovations centered around developing a robust text-to-speech solution. Although they started with a focus on dubbing, feedback from potential users highlighted the immediate demand for high-quality text-to-speech capabilities. Mati shares, "We took a step back and that's to your second part of the question, which was to be able to solve this problem truly, instead of relying on the technologies that exist." This pivot was crucial, as it aligned their offerings with market needs, leading to the creation of their first prototype that integrated innovative AI models for voice synthesis.

The Role of Transformers and Diffusion Models

Transformers and diffusion models have been pivotal in enhancing ElevenLabs' audio processing capabilities. These models allow for nuanced voice cloning and the expression of emotions in synthesized speech. Unlike traditional methods that relied on hardcoded characteristics, ElevenLabs' approach lets AI models autonomously determine voice features. Mati explains, "We took a slightly different approach where instead of us hard coding those features... Let model decide what those components should be." This innovation has significantly improved the naturalness and emotional depth of their audio outputs.

The Evolution to a Platform Approach

As ElevenLabs evolved, so did their approach to service delivery. Initially focused on discrete audio components, they shifted towards providing comprehensive APIs and conversational AI solutions. This transition not only expanded their market reach but also reinforced their role as a platform provider. Mati elaborates, "We decided, let's meet our customers where they are. Let's try to build solutions that are actually solving their entire problem versus part of their problem." This strategy has enabled them to offer both foundational tools and end-to-end solutions tailored to diverse use cases.

Organizational Structure and Culture

A unique aspect of ElevenLabs is its organizational culture, characterized by the absence of formal titles. This flat structure promotes innovation and collaboration, allowing ideas to flow freely across the company. As Mati puts it, "Impact shouldn't be defined by the title. It should be defined by individuals." This approach has fostered an environment where the best ideas win, encouraging team members to contribute meaningfully regardless of their tenure or position.

The Future of AI Audio Technology

Looking ahead, Mati envisions a future where AI audio technology enables real-time dubbing and enriched conversational interfaces. He is particularly excited about the potential of multimodal AI models, which could create more immersive and interactive experiences. ElevenLabs is committed to maintaining its research excellence while expanding its product offerings, ensuring they remain at the cutting edge of audio innovation. Mati concludes, "We want to be known as one of the research hubs... and have the full-fledged audio AI platform."

Summary

In this discussion, Mati Staniszewski traces the evolution and future direction of ElevenLabs. From their early challenges in the audio industry to pioneering breakthroughs in AI audio, ElevenLabs has consistently pushed the boundaries of what's possible. Their flat organizational structure and commitment to research excellence position them as a leader in AI audio technology. As they continue to expand their offerings, we can look forward to new product launches and ongoing advancements that will redefine our engagement with audio content.
