Episode Description
In this episode of AI Native Dev, host Guy Podjarny sits down with Mati Staniszewski, the visionary CEO and co-founder of ElevenLabs, a leader in AI audio technology. Mati shares the origin story of ElevenLabs, detailing how a frustration with subpar dubbing in Polish movies sparked a mission to revolutionize audio processing. The conversation delves into the technical challenges and breakthroughs that have positioned ElevenLabs at the forefront of AI-powered audio experiences. Mati also discusses the company's unique organizational culture and its commitment to pushing the boundaries of audio AI. Whether you're a developer, entrepreneur, or AI enthusiast, this episode offers valuable insights into the future of audio technology and the role ElevenLabs is playing in shaping it.
Overview
The Genesis of ElevenLabs
ElevenLabs began its journey with the combined vision of Mati Staniszewski and his co-founder Piotr. Both had rich backgrounds in tech, with Mati having worked at Palantir and Piotr at Google. Their personal bond, formed over 15 years of friendship and professional collaboration, played a crucial role in the founding of the company. Mati explains, "We always wanted to potentially find a problem that we are both excited about and can work on together." The frustration with subpar dubbing in Polish movies sparked their motivation to innovate in the audio space. They identified significant technological gaps, particularly in the quality of dubbing and the limitations of existing audio technologies. This realization set the stage for ElevenLabs' mission to revolutionize audio processing.
Breaking Down the Audio Problem
The audio problem is multifaceted, encompassing challenges in transcription, translation, and text-to-speech conversion. Mati describes how early audio technology was limited, often resulting in robotic-sounding outputs. To tackle this, ElevenLabs broke down the problem into manageable components. They began by addressing speech-to-text accuracy, speaker diarization, and timestamp precision. As Mati notes, "The speech to text was okay for English, but the element of when things are being said was hard." This segmentation allowed them to focus on improving each aspect, setting the groundwork for more advanced solutions.
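To make that decomposition concrete, the sketch below shows how the stages can be kept separate behind one interface. It is a hypothetical illustration, not ElevenLabs code; the stage callables stand in for whichever speech-to-text, translation, and text-to-speech systems are plugged in.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Segment:
    speaker: str   # who said it (speaker diarization)
    start: float   # when it starts, in seconds (timestamp precision)
    end: float
    text: str      # what was said (transcription)

def dub(
    audio_path: str,
    transcribe: Callable[[str], list[Segment]],   # speech-to-text stage
    translate: Callable[[str, float], str],       # translation stage: (text, max_seconds) -> text
    synthesize: Callable[[str, str], bytes],      # text-to-speech stage: (text, speaker) -> audio
) -> list[bytes]:
    """Run the three stages in sequence; each stage can be improved or swapped independently."""
    dubbed = []
    for seg in transcribe(audio_path):
        # Keep the translated line roughly within the original segment's duration
        # so the dub stays in sync with the picture.
        line = translate(seg.text, seg.end - seg.start)
        # Re-synthesize in a voice matched to the original speaker so some of the
        # emotion and identity carries across languages.
        dubbed.append(synthesize(line, seg.speaker))
    return dubbed
```

Breaking the problem into these segments, as the episode describes, is what let each stage be improved (or replaced by the wider ecosystem) on its own.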
Initial Innovations and Discoveries
ElevenLabs' initial innovations centered around developing a robust text-to-speech solution. Although they started with a focus on dubbing, feedback from potential users highlighted the immediate demand for high-quality text-to-speech capabilities. Mati shares, "We took a step back and that's to your second part of the question, which was to be able to solve this problem truly, instead of relying on the technologies that exist." This pivot was crucial, as it aligned their offerings with market needs, leading to the creation of their first prototype that integrated innovative AI models for voice synthesis.
The Role of Transformers and Diffusion Models
Transformers and diffusion models have been pivotal in enhancing ElevenLabs' audio processing capabilities. These models allow for nuanced voice cloning and the expression of emotions in synthesized speech. Unlike traditional methods that relied on hardcoded characteristics, ElevenLabs' approach lets AI models autonomously determine voice features. Mati explains, "We took a slightly different approach where instead of us hard coding those features... Let model decide what those components should be." This innovation has significantly improved the naturalness and emotional depth of their audio outputs.
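A rough way to picture that difference, purely as an illustration and not the ElevenLabs architecture: the hardcoded route predicts a short list of named attributes, while the learned route lets a small reference encoder produce its own latent voice vector from a sample clip.

```python
import torch
import torch.nn as nn

# Traditional approach: hand-pick a few interpretable attributes, predict each one,
# and reconstruct the voice from them.
HARDCODED_FEATURES = ["gender", "age_group", "emotionality"]  # a handful of fixed labels

# Alternative described in the episode: let the model choose its own representation.
# A reference encoder maps a short audio sample to a latent vector whose dimensions
# carry no human-assigned meaning.
class ReferenceEncoder(nn.Module):
    def __init__(self, n_mels: int = 80, embed_dim: int = 256):
        super().__init__()
        self.rnn = nn.GRU(n_mels, embed_dim, batch_first=True)

    def forward(self, mel: torch.Tensor) -> torch.Tensor:
        # mel: (batch, time, n_mels) spectrogram frames of the reference clip
        _, hidden = self.rnn(mel)
        # hidden[-1]: (batch, embed_dim), the learned "voice vector" that conditions
        # the text-to-speech decoder instead of hardcoded labels.
        return hidden[-1]

encoder = ReferenceEncoder()
voice_vector = encoder(torch.randn(1, 120, 80))  # 120 frames of a reference clip
print(voice_vector.shape)  # torch.Size([1, 256])
```

The design point is that the embedding dimensions are chosen by training rather than by hand, which allows far more, and subtler, voice characteristics than a fixed attribute list.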
The Evolution to a Platform Approach
As ElevenLabs evolved, so did their approach to service delivery. Initially focused on discrete audio components, they shifted towards providing comprehensive APIs and conversational AI solutions. This transition not only expanded their market reach but also reinforced their role as a platform provider. Mati elaborates, "We decided, let's meet our customers where they are. Let's try to build solutions that are actually solving their entire problem versus part of their problem." This strategy has enabled them to offer both foundational tools and end-to-end solutions tailored to diverse use cases.
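At the building-block end of that spectrum, a single text-to-speech request is still just one API call. A minimal Python sketch follows; the endpoint path, header, and field names reflect the public ElevenLabs documentation at the time of writing and should be verified against the current API reference, and the key and voice ID are placeholders.

```python
import requests

API_KEY = "your-api-key"     # placeholder
VOICE_ID = "your-voice-id"   # placeholder: any voice from your voice library

resp = requests.post(
    f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}",
    headers={"xi-api-key": API_KEY},
    json={
        "text": "What a wonderful day!",
        "model_id": "eleven_multilingual_v2",  # model choice; see current docs
        "voice_settings": {"stability": 0.5, "similarity_boost": 0.75},
    },
)
resp.raise_for_status()
with open("output.mp3", "wb") as f:
    f.write(resp.content)  # the response body is the synthesized audio
```

The end-to-end offerings such as conversational AI sit on top of building blocks like this one, with turn-taking, knowledge-base hooks, and configuration handled by the platform.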
Organizational Structure and Culture
A unique aspect of ElevenLabs is its organizational culture, characterized by the absence of formal titles. This flat structure promotes innovation and collaboration, allowing ideas to flow freely across the company. As Mati puts it, "Impact shouldn't be defined by the title. It should be defined by individuals." This approach has fostered an environment where the best ideas win, encouraging team members to contribute meaningfully regardless of their tenure or position.
The Future of AI Audio Technology
Looking ahead, Mati envisions a future where AI audio technology enables real-time dubbing and enriched conversational interfaces. He is particularly excited about the potential of multimodal AI models, which could create more immersive and interactive experiences. ElevenLabs is committed to maintaining its research excellence while expanding its product offerings, ensuring they remain at the cutting edge of audio innovation. Mati concludes, "We want to be known as one of the research hubs... and have the full-fledged audio AI platform."
Summary
In this insightful discussion, Mati Staniszewski shares the evolution and future direction of ElevenLabs. From their early challenges in the audio industry to pioneering breakthroughs in AI audio, ElevenLabs has consistently pushed the boundaries of what's possible. Their innovative approach to organizational structure and commitment to research excellence positions them as a leader in AI audio technology. As they continue to expand their offerings, we can look forward to new product launches and ongoing advancements that will redefine our engagement with audio content.
Resources
Chapters
[00:00:00] - Introduction and Welcome
[00:01:00] - Mati Staniszewski's Background and ElevenLabs Origin Story
[00:04:00] - Early Technical Breakthroughs in AI Audio
[00:10:00] - The Pivot to Text-to-Speech and Voice Cloning
[00:20:00] - Developing the Platform Approach
[00:30:00] - Challenges and Innovations in Conversational AI
[00:40:00] - Organizational Structure and No Titles Concept
[00:50:00] - The Future of AI Audio and Multimodal Models
[01:01:00] - Closing Thoughts and Future Excitements in AI
Full Script
**Mati Staniszewski:** [00:00:00] What you would try to do. It's effectively a map set of characteristics of that voice in the model. So try to predict what's the gender of the voice, what's the potential age group of the voice, what's the emotionality of the voice, and you would hard code all those components. And then the model will try to predict those components and then recreate that voice based on those components.
**Simon Maple:** You're listening to the AI native dev brought to you by Tessl.
**Guy Podjarny:** Hello, everyone. Welcome back to the AI Native Dev. I'm really excited today to have with us Mati Staniszewski. I don't know if I got that right. Who is the CEO and co-founder of ElevenLabs. Super excited. I know Mati from the sort of the London scene for a little bit here and really happy to have you here.
Thanks for taking the time.
**Mati Staniszewski:** Guy, thanks for having me and a pleasure to be here.
**Guy Podjarny:** So if you don't know ElevenLabs first, I'm a little [00:01:00] bit surprised. Second is, I think it's really established itself quickly as the leader in AI audio, right, in generating anywhere from just some text to audio.
We'll talk about that some more, to much more elaborate dubbing processes. Just a little bit of vital signs: you have recently raised a Series C, a couple of hundred million bucks, or 180 in the press release. And the stats are, first of all, interesting in the choice of metrics, but also impressive: you've recorded that you have millions of users and they've generated a thousand years of audio content.
It's a fun stat on it. And clearly it's widely adopted by well over half of the Fortune 500 companies. So first of all. Amazing. Well done.
**Mati Staniszewski:** Thank you. No it's the right moment for these additional funds as well. I'm sure every company says that, but it feels like it's still at the very beginning of what's possible across audio space.
And the combination that we are trying to achieve, both building the research and the product work and the application side, requires a lot of those resources to be able to do both at the same time. So building the models requires a high amount of GPUs, and that's actually needed to get [00:02:00] that off the ground. And we still think there's a number of innovations coming that we want to be at the forefront of, as we think about multimodal models, making it more interactive, more emotional, and then the application side, still so many products as we think about AI in the space and bringing voice, dubbing content internationally or just producing the content at scale.
**Guy Podjarny:** Yeah. No. Amazing. And yeah, there's a lot to build. And I guess on top of that, the AI world moves at a lightning pace which I guess I didn't point out that the company is less than three years old, right?
**Mati Staniszewski:** That's right. We started at the beginning of 2022. So both my co-founder and I left our previous companies.
Piotr used to work at Google before, I used to work at Palantir. And then we swapped over at the beginning of 2022, and for the first few months worked deeply to understand whether the problem that we want to solve actually exists out there. And second, whether we can actually achieve anything from the product research side.
And then after a few months, we raised our first round as we started scaling the team and infra too, to bring a little bit more of those costs outside of our own pockets into a few others. And then started [00:03:00] hiring. So at the end of 2022 we brought the first few people in, the first three people.
And so it was a team of five at the end of 2022. And then we publicly launched at the beginning of 2023. So the product has been out there for two years, the company for three years. And the idea, I've known my co-founder for 15 years, we've been best friends since high school, took all the same classes, then traveled, studied, worked together over the years and are still best friends.
Yeah, no, yeah,
**Guy Podjarny:** that's amazing. And that's, it's so important, being a co-founder, it's a marriage and it's hard to find them on the spot. It happens. It's sometimes successful, but it's much better when it's someone you know so well.
**Mati Staniszewski:** We are so lucky from that perspective.
Like we met in high school, we've been best friends through the years. And through the years we always wanted to potentially find a problem that we are both excited about and can work on together. And then everything aligned, where we had a good and comfortable set up at the previous companies.
Then we had the idea that we were excited about and we wanted to work together. So it was perfect, and we have a clear sense of the things we enjoy in our work together. So it's also a [00:04:00] clear split of responsibilities and some things that we like working on together, which was some of those parts.
**Guy Podjarny:** So very lucky. Sounds like a very good beginning to it. So let's indeed maybe talk about the early days and maybe the sort of the technical breakthroughs, right? I know it's easy to find write-ups about ElevenLabs' origin story when you look for them. How you were frustrated with the dubbing experience.
You and your co-founder Piotr, right? With Polish movies. I grew up in Israel, I'm familiar. Israel fortunately has embraced that nobody's going to bother dubbing to Hebrew, not enough Hebrew speakers, and therefore you see things in English. But dubbing in Polish was underwhelming.
And he wanted to fix that. But I think that's a nice, succinct, easy thing to say. From there to actually technically executing it, it's not one thing. And I know you had to break through a bunch of walls, maybe in those months that you were exploring the ideas.
I guess I'm curious, what were the sort of core original innovations or breakthroughs, or even just problem spaces you had [00:05:00] to work through to deliver on this, hey, we want to do AI-powered dubbing?
**Mati Staniszewski:** And you are right. It's even worse than just poor dubbing, where if you watch a foreign movie in Polish, not only would you usually have dubbing with low quality, but you have one voice narrating all the characters.
So whether it's a male or female voice, you have one person narrating everyone. No emotions, no intonation. They keep it flat on purpose so you can infer the emotions of what's happening yourself. And the crazy thing is, of course, we knew it when we grew up, but it's still happening for the majority of content today.
And it's something that we know will change as the technology evolves. So maybe a quick background there. My co-founder worked a lot over the years, and he's a brilliant researcher. He was doing that at university, initially spending time on the image space, then working at Google on Knowledge Graph, so closer to a tech space, and he hadn't worked on audio.
But a lot of the ideas that we've seen in those other sister fields across AI were [00:06:00] potentially applicable in audio. And we knew that this would be something that is possible and fixable. And on the other side, from what I was seeing, there was no great product in the space to really do anything with audio AI.
And then a very clear problem set of people that we were outbounding to, asking, would it actually help in your process, when you started the company?
**Guy Podjarny:** What was the sort of state of the art for audio at the time, like three years ago? What would have been like the best product out there at the time?
**Mati Staniszewski:** There were some early indications for 2022 of some open source projects that were a bit better, but like most of the voice quality was what we were experiencing with Alexa or Siri.
Like something that you can immediately tell is that it's robotic. Exactly. Yeah.
But we, so we started. And when we speak about this initial problem, we started actually with building a dubbing solution. So in early 2022, the first prototype we built was effectively a stitch-up of a few of those different steps.
So the first step is understanding the speech to text part, who is saying what and what they are saying. And already in that step, the [00:07:00] process isn't perfect, because most of the models weren't able to detect different speakers or overlaps accurately. The speech to text was okay for English, but the element of when things are being said was hard.
And the second hard part was the timestamps, of exactly knowing when it is being said. So that was the first, speech to text, part.
**Guy Podjarny:** It's more than just transcription, which maybe was a little bit more evolved, at least for English, but it's a lot more of that sort of, Exactly.
Segregation of the conversation and the timestamps of it.
**Mati Staniszewski:** Yeah, exactly. Exactly. And that's where we've seen already the gaps that we might need to address. So speech to text is the first one, and maybe just to give a wider overview, then you have the translation part of changing from one language to another.
Of course, in dubbing, you need to keep the length roughly the same, given you will be dubbing into the other language in audio, unless you extend the video, but that's something that most platforms wouldn't support. So there's a translation step. And then there's the last one, which is the text to speech step.
So speech to text, we knew that there are some faults. There were good open source models for the transcription side, like NeMo from NVIDIA, but really the other parts were lacking [00:08:00] and that's where we spent a lot of time: okay, can we build that additional speaker diarization to detect the speakers?
Can we build some better understanding of timestamps and how we chunk that together? The translation, of course, is something that's a massive problem that we know we won't be able to fix ourselves. So here we would rely on whatever best exists at the time.
**Guy Podjarny:** And this is three years ago, so pre-ChatGPT, but it's not pre-LLM, GPT-2 is around. Were you already using those types of technology? Exactly.
**Mati Staniszewski:** Like for 2022, you could see some of that momentum. So both GPT-2 and other just callable APIs that we would try. And we would try all of them. And the truth is, already then you had 80 percent of the quality of the translation, and then it's really dependent on the type of content.
If you have a voiceover documentary, translation was pretty good. The moment it started being like short sentences with a lot of emotions, that's when it would fall through. But we would use any external API or start to use the models, which was also good for us because it gave [00:09:00] us the signal of what's happening in the field.
And then the text to speech step, here I think in the very first prototypes, we just used whatever was best. There was a famous open source project, for some people in the field, called Tortoise, which was incredible, where you could effectively take the first step change from those Siri, Alexa voices into something that sounds human.
The problem with that model was that it only performed well on very short segments. So when you had a longer one, it would deteriorate, it wouldn't be stable, you couldn't create that voice stability over time. So we started looking into how we could potentially modify it for our own setup, using Tortoise as a base.
But that was early 2022. And you're,
**Guy Podjarny:** In my kind of layman view a little bit, thinking about it, there's also the notion of capturing emotion, right? Transcription doesn't capture emotions. So you'd have to somehow annotate that, or maybe, with the Polish background, you started flat, not expressing any emotions, like the numbers there, right?
Is that correct?
**Mati Staniszewski:** So like the [00:10:00] main thing, like you want to capture in that process, of course, it's what is being said, how it's being said and who says it. And the how, hasn't been solved before. And the way we thought about this in those early 2022 days was, could we effectively take the original voice, clone or reproduce that?
If we do it in a short enough sample, In that short enough sample, there will be enough of the emotions that we can effectively take from the original. And then when we recreate that in the other language, that short sample will carry some of that emotionality across. But we've built this prototype.
It was like this end to end dubbing with those three steps. And it was okay. It was not perfect, especially looking now at what's possible. And we knew that all those components weren't good enough to produce the dubbing solution. Also, as you think about dubbing, it's combining three different technology stacks, which were underdeveloped at the time.
And if you combine them, the sum of the parts wasn't greater than the parts. And at the same time, we were starting to ask a lot of the potential users, creators, [00:11:00] would you be interested in dubbing, trying to prototype based on our work.
And it was very clear quickly that the quality was slightly below where they would actually be interested in pushing ahead with what we could deliver at the time. And then we took a step back, and that's the second part of your question, which was: to be able to solve this problem truly, instead of relying on the technologies that exist, what we really need to build is our own models.
So not rely on any of the ones that do exist for audio, but actually go a step deeper and try to create our own, starting with a text to speech model and a voice cloning or voice creation model to fix that last step. Which at the same time coincided with understanding the pain point when we outbounded about dubbing, which was: yes, dubbing is a problem out there in the future. But really, if I could just do the voiceovers more easily without my voice, if I could correct and pre-process or post-process, understand how the script can sound or change the corrections after I've done it, that would be so much more immensely helpful.
So we had those two very clear signals: there is this big other problem that is on the way [00:12:00] to solving what we set out with, and the research isn't good enough. So we actually need to take a step back and build our own models.
**Guy Podjarny:** And in those models, did you still break it up? Like when I think about building a company in the space and we do have kind of a lot of developer audience.
So if you think about how you chunk, how you modularize this problem, did it require thinking about dubbing as one thing, in which you'll train some sort of brilliant model to just take voice in one language and spit out correct kind of alternatives in another? Or did you still find a kind of need and value in breaking it up into these different stages that you talked about, and training a model for each of them?
**Mati Staniszewski:** Definitely breaking it up and training each of them. It's an interesting question, because this is very true for our work over the last three years, like breaking it up and working on separate models. But interestingly, as we think about the future, we actually think it will be more like a joint, one model that does all those actions.
And we can talk about that too. But in that space, it was like, let's break down the model and let's focus on where [00:13:00] we see the biggest value that we can create, both in terms of the innovation, the delta to the space, and then in terms of value to the user. So the first one was specifically the text to speech model.
Can we create a model that understands the text better than the models that exist? And based on that understanding delivers better emotions and intonation. So to make it like more specific on an example, if you say, what a wonderful day, the model should infer that, okay, this is likely a positive sentence and pronounce that in that positive way.
Now, if I said, what a wonderful day, and it's, let's say a fragment of the book, what a wonderful day I said, ironically. The model should know that and pronounce that very differently. And then if you have like a dialogue sentence, the model should once again know this and be able to reflect this dialogue between two, two interactions.
So that was the first innovation we brought into the space: this much better context understanding of the text itself, to bring that now into speech. So that was in the text to speech [00:14:00] model. But
**Guy Podjarny:** But there still was no product, or I don't remember a stage with ElevenLabs where you were offering sort of the transcription piece of it. This is still the text to speech piece. So you're still reliant on existing models in the first phase.
**Mati Staniszewski:** So yeah, that's a great point. So here, after we realized how dubbing is still a complex problem where the quality isn't good enough, we at the same time realized that in that text to speech space or narration space, there's such a much bigger opportunity.
Let's take a step back. Let's not do dubbing.
Let's still keep it as our North Star in terms of the technical problem, but initially, let's start on the thing that we know we can solve. So we took a step back, put dubbing to the side. Let's start with text to speech, and then we'll get to dubbing as we develop the company.
It makes sense.
**Guy Podjarny:** Big vision, small steps. Like you have to, it's not a very small step but you still have to aspire to that big technical problem. But there's real value in just the text to voice piece.
**Mati Staniszewski:** [00:15:00] Exactly. And then, like we knew that even if we fix one of those components best in the, hopefully in the world, then we knew that other components are also improving based on what we see in the field.
So there is this great help of the field just in generally doing better where at some point translation will hopefully be fixed to the extent we can adopt it instead of building something ourselves, which would be such a different and hard problem. So text to speech was first where we've done that innovation.
And that was like the first product around that. And the second was how you can create or clone voices, how you can create synthetic voices or clone existing voices. And here also, that was the second big innovation that we brought, where traditionally, when you try to recreate a voice, what you would try to do is effectively map a set of characteristics of that voice in the model.
So try to predict what's the gender of the voice, what's the potential age group of the voice, what's the emotionality of the voice, and you would hard code all those components. And then the model would try to predict [00:16:00] those components and then recreate that voice based on those components. We took a slightly different approach where, instead of us hard coding those features, which we didn't feel represented the voices in the right way,
let the model decide what those components should be. So keep it abstract to the human, and the model will effectively pick its own vector numbers for each of those characteristics. And then we found a way where we can have a lot more of them than any model usually would, and reconstruct that in a better way.
**Guy Podjarny:** Which is very much aligned with, in general, the transformer and LLM world type of models, in which instead of trying to encode it across a thousand or 2,000 or 10,000 different properties, now you can have millions and billions. And how many parameters, if that's the right term to use, did you end up creating for that?
**Mati Staniszewski:** The initial models were pretty small. It was like, I think the first model was a hundred million.
**Guy Podjarny:** Yeah, so it's pretty small for a model, but there's no way you could have defined a hundred million different sort of distinct attributes of a piece of voice.
**Mati Staniszewski:** [00:17:00] No, It was 100%.
It's both the transformer innovation and the diffusion work that's where a lot of the ideas and the initial basis for the models that we've created came from. Which is exactly what hadn't been happening in the audio space. Most didn't yet bring transformer or diffusion models into the space, and that's where we could leapfrog a little bit of that space and start with text to speech and voices.
**Guy Podjarny:** Exciting. So you had to combine I don't know if it's really two things, but I think if I go back the journey, you started with a sort of big aspiration of dubbing. You broke it down into these different pieces of it as you should any problem. You started with the state of the art in the moment got to a proof of concept, shall we say, right?
Or sort of some offering. But then customer conversations got you to appreciate that actually this middle piece there of text to speech is actually very valuable in its own right. And so you can really focus on making that amazing. And you identify the pieces that are maybe don't have to be the things that you excel at, like translation where you can say, okay, I can rely on the ecosystem for [00:18:00] that.
And then you just set about making the text to speech piece be incredible and ahead of the competition and leaned into that.
**Mati Staniszewski:** Yeah, exactly. And it's good timing that this conversation is happening today. Of course, through the years we then adopted a lot of those components from the field into the models. But speech to text, interestingly, while the transcription is incredible with what Whisper did, Gemini and a few other solutions, it still is relatively poor on those components that were lacking, like speaker diarization; timestamps are still not there. And we are, as we speak today or over the next few days, releasing our own speech to text model, which will solve all those components that traditionally weren't good enough. So it feels like all those components are finally, three years on, getting to that place where they're technically solvable.
**Guy Podjarny:** Amazing. First of all, congrats, and very excited to get my hands on it and try it out. I think we've had an exchange as I was building an app with my son.
Creating sort of small [00:19:00] YouTube shorts, from some facts and things like that. As he was playing with that at the time, we're talking about getting the timings and the subtitles and the understandings of it at the time. So I'm very much keen to that play.
Did you end up applying it? We ended up creating, it's like specific, very simple text that calls out facts in some YouTube shorts. So yeah, there's actually a channel for it right now, and we get some, okay, amazing, some tens of thousands of views of it. And I think you were just a few months in, the product was out for a few months, when I was, okay, a user. So very excited, but keen to do that. And also I love the expansion element of
**Mati Staniszewski:** Are you still a user? Do you know?
**Guy Podjarny:** I am still a user. Yeah. Okay. I used it.
**Mati Staniszewski:** So if you check your inbox, we've done this thing as part of the Series C where we wanted to figure out, like, we wouldn't exist without our users and our community, which really propelled this set of possibilities, both to tell the world what's possible, but to tell us what's possible too.
So we did give all the early users, I think the first thousand users, a small early gift of appreciation where they will get the same swag that we have as a team. So I think you should check your email.
**Guy Podjarny:** I will check my email. I don't know if I made the cut to the first [00:20:00] thousand.
Maybe. I don't know. I don't know what to count.
**Mati Staniszewski:** And they need to be active still. So it's maybe it's,
**Guy Podjarny:** I am still a user, including some workshops for the kids. So I just want to get a little bit into sort of the product side. I love the journey, and I love both the sort of growth aspect of it as well as the focusing on it; I think there's a lot of entrepreneurial learnings from that build-out. I do want to talk a little bit about the way you package the product and the developer experience. But I just want to hone in a little bit on this: when you got started, you had to innovate and do these big things, indeed big research breakthroughs.
But now you have a product, you have millions of users, and you probably have this tension between how much you swing for the fences and try to think about some brand new thing that is amazing, versus where is it that you probably have a million incremental things, like how in Polish these words don't get pronounced correctly, or whatever it is, right?
Making it up. How do you think about balancing the more incremental aspects of research versus trying to reinvent the foundation? Because it is [00:21:00] a fast moving world. You probably can't do zero in the latter bucket.
**Mati Staniszewski:** Yeah. I think there's an easy answer here and a harder truth around it, but an easy answer.
We still think you cannot just not do innovation on new models, or to make it more concrete, we actually think that there will be one or two new model breakthroughs that are required in the audio space to really get it to another level. If you were to rely on current infrastructure and just improve some of the pieces, it's not going to solve the problems you want to solve, especially as we think about agentic workflows where you need to add a voice and interactions, in customer support call centers or education where you learn something; that's going to require a very different type of experience.
So the thing that will happen over this year, and hopefully we will be one of the first to do it well, is going to be a multimodal approach, which effectively combines the LLMs, the reasoning side, and the audio state of the art. Here, what we would do is effectively use our state of the art models, use the open source LLMs, and fuse them together.
Fusing [00:22:00] isn't an easy task. And then create a multimodal experience that is effectively what people will refer to as voice to voice. You don't go through those steps in the middle, which should help for everything from understanding the emotion better to doing dubbing better to doing those interactive experiences better.
And this is so important that if we didn't do it, what I think would happen is that we would create an okay company that will be there for the next five years doing a set of use cases, but not the company that we would love to create, which is still leading on the research side.
Hopefully it's here for a few more decades, and it's leading from that research perspective in terms of what's possible. So that's the easy answer, that you need to. And that's where we actually took an active choice to overweight towards doing the innovation on the models, and potentially sacrifice some of our lead with the current generation of models, because of how much we believe the new generation will be important.
**Guy Podjarny:** Although that is very use case specific voice to voice. I can intuitively see how that would be beneficial. It feels like [00:23:00] any intermediary representation, like getting it down to text, probably loses so much information.
You have to be able to get it end to end, but also we just spoke about how text to speech is also on its own very valuable. So I guess the question, maybe the software kind of geek in me, might be: is there a format in which you can represent something as an intermediate so that people can create text like that?
Does it have to be the sort of black box brain that just gets trained, the sort of all-knowing, I'm being a little bit facetious here? Or can you represent something in the middle?
**Mati Staniszewski:** It's a tricky question, but it's like still, even with that fused approach you will definitely have the same components.
So even if the backbone is a model that has a combination of audio and LLMs, you can still effectively create a fine-tuned version which is a text to speech model, and it's better than the existing ones. And maybe to make it a specific example, let's take a narration of a book. Today, you give it text, it understands somewhat [00:24:00] the context and can deliver the audio pretty well, to human-level quality in some cases of narration.
But what it couldn't do is, for example, detect the different speakers in that book and assign different voices, or create sounds. And that's what we think the multimodal approach will bring, where if you listen to the book, it will be more like a movie or immersive experience, where all the voices are created and assigned, so that you can have that different set of narration or narrators.
When you hear a scene about, say, a thunderstorm, you hear raindrops in the background. So there's a much bigger brain behind that model. So yes, you will definitely have those separate models with higher brain power than what was possible before. But your point earlier on is spot on, which is it will take a while before those models are stable and as good in their quality in audio. And as you think about a lot of our big set of clients on the enterprise side, we work with 500 different companies of a very big size. It [00:25:00] will likely not be at the level of quality they require for the next year, maybe next two years.
And that's the second and harder part of this question: initially we overweighted to almost, no, we need to do it all on the research and innovate on the next model, right? Let's not do anything on the current one. And then we realized, pretty quickly, but after a few months, maybe a bit slowly, once we had some good customers:
Yeah, you cannot just abandon a lot of the work that's required to make that one or two percent difference to get a little bit better. So now we fuse it: we have a core research team that builds the new models, and then you have a research engineering team that tries to improve the current generation of models, like how we can really close that gap and make sure that where it's 95 percent today, we can get it to 96. Or recently, one of the things we've done: we had a text to speech model which was great for interactive use cases, with roughly 200 milliseconds of time to first byte. And then another model got created by a competitor that was [00:26:00] roughly 100 milliseconds. And it felt like, okay, how much time do we invest? And we still wanted to show our customers that yes, we are the leading lab. So our research engineering team brought it down to 75 milliseconds.
We have great researchers, Max and Tony, that pushed it across. And now we have the fastest model.
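For developers who want to sanity-check that kind of latency themselves, the snippet below times the first audio bytes of a streaming request. The /stream endpoint path and header follow the public ElevenLabs docs at the time of writing and should be treated as an assumption to verify; the key and voice ID are placeholders.

```python
import time
import requests

API_KEY = "your-api-key"     # placeholder
VOICE_ID = "your-voice-id"   # placeholder

start = time.perf_counter()
with requests.post(
    f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}/stream",
    headers={"xi-api-key": API_KEY},
    json={"text": "Hello there, how can I help?"},
    stream=True,  # don't buffer the whole body; surface bytes as they arrive
) as resp:
    resp.raise_for_status()
    first_chunk = next(resp.iter_content(chunk_size=1024))
    ttfb_ms = (time.perf_counter() - start) * 1000
    print(f"time to first byte: {ttfb_ms:.0f} ms ({len(first_chunk)} bytes)")
```

Measured this way, the number includes network round trips, so it will sit above the model-side figures quoted in the conversation.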
**Guy Podjarny:** And that's interesting. I'm going to go back a little bit to organization later. But you describe these not just as a percentage investment, but actually literally different teams.
So you have one team, there might be multiple under that, but one part of the org focusing on the more sort of big swings, trying to reinvent what is the status quo, and then another that is more on optimization. And what would be the difference between the profiles of the individuals you'd have in one team versus the other? Is it just allocation or is it actually different skills?
**Mati Staniszewski:** There's definitely some overlap, where you could imagine one person being closer to the other, but roughly, the research team is extremely small, and here these are mostly the people that we believe can create net new ideas, net new work, [00:27:00] make new breakthroughs for the world.
So they could create a paper that we believe could be one of the top papers at NeurIPS, ICML or any of the other conferences. So they need to be really well versed in the space. And I think we've been able to accomplish that, beginning with my co-founder, who I think is genius level and pushing all of the research, and one of the best researchers that we have, who was able to bring in a lot of the other researchers that love working in that space and are at that level.
So a few world class researchers that are the best in the world. And then research engineers are usually more people that we think you could give any paper, and they could implement that paper. It would be a bit harder for them to create those net new ideas or create the papers themselves.
But if you give them any piece of work, they could likely replicate it and make it slightly better or tweak it. And for research engineers, to also make it a little bit more concrete on how we frequently staff those roles: researchers we just get from wherever we see a piece of achievement, on open source or a paper or from a recommendation; [00:28:00] these are like the top people.
We think there's maybe hundreds in the world in that pool of candidates. But research engineers, we usually start with people that join as software engineers, maybe with some ML experience, and then if they are proving themselves in the core product, we can bring them into the research engineering space.
**Guy Podjarny:** Got it.
Yeah, very interesting on it. And numbers wise, probably that first team, the research team, is extremely selective in the skills that you bring into it, but probably numbers wise a lot smaller than the research engineering.
**Mati Staniszewski:** Both are still pretty small. Impressive what they can do, but it's
**Guy Podjarny:** a it's a part of the funding, part of getting the funding.
**Mati Staniszewski:** We have five researchers and five to 10 research engineers. Oh yeah.
**Guy Podjarny:** Very tiny. Yeah.
**Mati Staniszewski:** Yeah.
**Guy Podjarny:** Amazing. How much output they
**Mati Staniszewski:** A hundred percent. Yeah, a hundred percent.
**Guy Podjarny:** So let's talk a little bit about by way
**Mati Staniszewski:** you've been at IBM, right?
**Guy Podjarny:** I sadly, I don't know, sadly. Actually, when I was at IBM, which I got acquired into with my company, IBM Research was quite interesting to me.
But I do not have a degree. And so the one primary thing that was well out of my reach was IBM research.
**Mati Staniszewski:** Got it. Got it. I've [00:29:00] recently read about this concept. Now, of course, this applies across the organization and some of the ideas we try to steal. But I think they had one, which, when IBM became this big behemoth, like, how do you keep that innovation going?
And they created this concept of wild ducks. So you had the wild ducks, eight people selected and reporting directly to the founders, who just got a free mandate to go after any idea that they thought would push the business forward. And sometimes I feel our research is kept a little bit separate so they can really innovate, so they can do something well.
**Guy Podjarny:** Yeah. Yeah. That's interesting, I'd not heard of it when I was at IBM. And who knows, maybe I would have stayed at IBM a little bit longer if I got to be part of those. Maybe switching a little bit back to interacting with your platform. I want to talk about developer experience and APIs. I just mentioned how I've used your APIs; it's a good, pleasant experience using those. And I was very much using the modular pieces, like literally, here's a piece of text, give me back some audio.
Here's more parameters, more sort of tweaks, not AI parameters. And I [00:30:00] think for a while that was your only API type: building blocks, a platform, a model. And I think we'll talk in general about the evolution towards apps. But I think on the API front now, if you open up the ElevenLabs API page, you'll see conversational AI.
You'd see pieces that are more end to end of sorts, like they provide a more complete functionality. So I guess my question is, one, I'm interested a little bit in the journey from one to the other. What made you do it? And maybe even when you think about someone building on top of the platform, what are you seeking or optimizing for?
Is it more about the full unit? Is it more about the pieces? How do you think about it?
**Mati Staniszewski:** That's a big question. And definitely our approach evolved over time, or expanded over time. But
**Guy Podjarny:** Let's start with the story. Just like how did this conversational AI kind of piece came to be and how do you think about it?
**Mati Staniszewski:** So starting from there, as you've said, initially we had text to speech voices and we didn't have much more of the other components in the conversational stack. But what we've seen happen is both developers and our enterprise [00:31:00] clients were effectively using that component and building a conversational experience.
Now, it's not a simple thing to do. It takes usually months of implementation time to do it well. Like we're speaking now, you will adjust your emotions. You want to make it relatively smooth and quick when you respond to each other. Maybe you want to interrupt each other and then get responses back.
**Guy Podjarny:** I wouldn't interrupt you in the middle. Sorry, can't do that.
**Mati Staniszewski:** But this is exactly what we've seen being tried to solve across so many of the clients. And it wasn't easy. Everybody had a slightly different approach. It was not always a great experience for the end user. And it was a complex problem, which was like, great.
One thing we knew we should and want to provide is an end-to-end solution, so people don't have to build from scratch, but something that we can bring to the table. And then two, as we've built so closely to the research pieces, we can actually optimize it even better, for turn taking or interruptions, to make it smoother, because we can embed a few predictors on [00:32:00] both sides of, like, when will the speech stop, when will I stop speaking, or will the content go to a pause.
And that's how the element came to be, where we decided, let's meet our customers where they are. Let's try to build solutions that are actually solving their entire problem versus part of their problem. And conversational AI was one of the big steps in doing that.
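For a sense of what handling interruptions means mechanically, here is a deliberately naive sketch of a barge-in loop. Every callback is a placeholder supplied by the application, and the learned turn-taking predictors described above would replace the simple voice-activity check.

```python
import asyncio

async def speak_until_interrupted(audio_chunks, play_chunk, user_is_speaking):
    """play_chunk and user_is_speaking are placeholders: an audio output callback
    and a voice-activity detector supplied by the application."""
    for chunk in audio_chunks:
        if user_is_speaking():          # barge-in: the user started talking
            return False                # stop mid-utterance and hand the turn back
        await play_chunk(chunk)
    return True                         # finished the utterance without interruption

async def conversation_loop(listen, think, synthesize, play_chunk, user_is_speaking):
    # listen: speech to text; think: LLM / dialogue logic; synthesize: text to speech
    while True:
        user_text = await listen()
        reply = await think(user_text)
        await speak_until_interrupted(synthesize(reply), play_chunk, user_is_speaking)
```

The hard parts the episode alludes to live inside those placeholders: deciding when the user has actually finished speaking, and doing it fast enough that the exchange feels natural.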
**Guy Podjarny:** Yeah, but conversational sorry to interrupt.
I'm gonna be self-conscious every time I interrupt you, talking about interruptions as you do the work to allow your customers to handle interruptions. But why pick that, right? You have these millions of users. You have conversational AI as one common use case, but there are probably many others.
And I don't know the exact prevalence, but I don't even know if conversational AI is any sort of majority of the use cases. It's probably some single digits, maybe sub 20 percent usage.
**Mati Staniszewski:** No, so it's low double digits. No, maybe not low, medium double digit percent. But the interesting thing is that, in general, we think voice will be the future of interactions, the digital interactions of how [00:33:00] you interact with the interfaces of the digital world.
And it can carry so much more emotion, so much more understanding than text. And the plethora of use cases is wide, and happy to speak about some of those too, but maybe to give two quick examples: from the healthcare space, where you can work with nurses, we work with a company like Hippocratic, which will try to automate the process where nurses call patients to remind them about taking medicine or ask how they are feeling. People that frequently cannot use a computer at the same level of proficiency. Such a great use case, so much potential for reaching people. And then on a completely different side, a company like ThisGen, which helps train 911 responders so they can speak with the potential person, understand how to respond in that emotional situation.
And similar massive scale. Then customer support, call centers, new ways of digital media. So we see it, even if today it's a little bit smaller, although it's starting to get some momentum; in the next five years it will be one of the key models. And then the second, it [00:34:00] was definitely one of the places where we knew we can provide that solution.
So conversational AI is probably one of the categories which we believe will be one of the biggest. The flip side of that question is going to be more interesting: we've probably seen so many different companies trying to build agentic workflows. It's clearly becoming a buzzword.
So there will be an interesting competition, competing for talent, competing for attention; that will be harder. But going back to the other part of your question, as we think about our work, we have on one side creators that use the platform and the entirety of our audio tools effectively across the interface.
Then we have a big group of developers that will use and deploy the API or try new use cases. And from the start, one thing we were trying to do was bring the interface and the API out roughly at the same time, so people can easily build with that and show what's possible to the world. And then of course the third group, the big one, is enterprises, where you do need end-to-end solutions.
And that's where our focus is being built. So at the core, we are providing a horizontal layer for everything audio AI, whether it's text to speech, voices, text to sound effects, and in the future speech to text. And then for our business parts, we are hoping and building end-to-end solutions for specific things that we believe are important; conversational AI is one, for specific sectors.
And the second one is media entertainment, where we provide the dubbing solution for end-to-end work.
**Guy Podjarny:** Yeah, interesting. As the complete path. And I guess, can you share: today, indeed, you can use ElevenLabs through the UI and create a bunch of audio pieces.
And I've done the amateur version of it, but I'm sure there are full-on business use cases for it, and you can go the developer route. What would you say is the rough split today in terms of platform usage? Are you mostly a platform or mostly an offering today, like mostly a product?
**Mati Staniszewski:** Yeah, probably the most frustrating answer, which is it depends on the user group, given we have so many. But starting with high level numbers, it's mostly platform for self-serve subscription users. So the [00:36:00] creators of the space, where we have millions of people coming to our platform, are going to be using the platform as the interface for combining those components.
And then as you shift further up, to either small, medium businesses or enterprises, it will be a majority of API users. Now within those groups, there are very clear distinctions that we will see. So for example, gaming companies that we work with will mostly use the API. Any media entertainment company that's creating ads, voiceovers, or dubs will mostly use the UI.
Conversational AI, this is an interesting one, because conversational AI, at the end of the day, you configure things in the interface, but you're an API user; you need to configure that in your application, connected to your knowledge base.
**Guy Podjarny:** It is a solution though, it's not a building block.
You might need to build on top of it, but it does the lion's share of your conversational AI functionality.
**Mati Staniszewski:** 100 percent, 100 percent. But then you do need that link at the end, of connecting to your CRM or knowledge bases, [00:37:00] to go through for the API pieces. But it is an end-to-end solution where you can configure all those components on the platform.
And what are the set of constraints that you need to put in place? How do you do the end tooling? What are some of the security parameters you want to set? Voices, latency configuration. So there's a pretty wide set that we provide, which is common across the problem set that we see.
And of course it opens up so many incredible use cases too.
**Guy Podjarny:** And I guess maybe this is a slightly tricky question, but there are a lot of AI applications today that are being built, all sorts of startups, innovative teams and if they need an audio component of it, they might go to ElevenLabs as their solution for it.
And oftentimes for companies building like that, there's a bit of a scary relationship with the foundation models, though maybe it's a little bit less scary in audio. I don't know, maybe just less discussed. But there's a bit of a concern. It's, okay, if I'm building something that is audio related, will ElevenLabs, and I'm just using you as an example of any of the foundation models, expand [00:38:00] to cover this space, and how do I kind of balance or manage that risk?
Do you have any guidance? People must ask you that question of, hey, I'm building this on top of your platform, are you going to expand into my space? What type of answer do you give them?
**Mati Staniszewski:** So it's interesting, because we have also seen the opposite of that problem, and I will answer your question too, but we've seen exactly the opposite with the approach we've taken, which probably caused some of the hardest moments at the end of 2023, where we kept our APIs extremely open and we wanted to solve a set of use cases, and we've seen people solve the use cases that we wanted to solve before us and take all the users and the market opportunity.
Exactly. The closest one, and probably one of the moments which was emotionally frustrating, was with dubbing, where we started with dubbing, we created text to speech, we created all those components. And then in 2023 we were offering all those APIs openly, and we were planning to launch our own new dubbing solution, our first true packaged dubbing solution with our [00:39:00] APIs in the product, in roughly Q3 of 2023.
We told it to one of our clients, maybe it was unrelated, and this client released their own dubbing two weeks before us. And took all the users that we wanted, took our vision away from us, which was like, okay, not happy. Especially how close this was to our origin.
Of course, we doubled down and are continuing to work on that. But the guidance in general for us is: we will keep the horizontal solution out there. We are really focusing on a very small subset of the work. One is within the conversational AI space, how we can create more immersive media experiences, more interactive media experiences, how we can work in call centers and build that audio element there, and potentially some of the close-to-call-center elements, but not entirely customer support.
And then for the rest, we are a partner. Effectively, wherever the research allows us to provide an advantage from the product perspective, that's where we'll go; the rest [00:40:00] we'll keep open. And on the media entertainment side, the thing that we started with, building an entire process for creating
dubbing, is going to be important. And maybe to take another element, what we want to do is focus on video, focus on text manipulation to a large extent. So any of the use cases we're building with that have a natural advantage. And similarly, what we've seen, and if you work with a lot of the companies or users you probably know this better than I do,
is that there's so much end integration that will need to happen for most use cases. Apart from the ones mentioned, we are not planning to cover that, and that's where startups or companies building on top of ElevenLabs go. Yeah, that's good.
**Guy Podjarny:** I love the clarity. I think probably when you're building on top of a platform, there's no guarantees in life and company strategies change and opportunities arise.
But fundamentally you want the, the challenge I think that happens in the foundation world is more when there just isn't a declarative statement of, look, these are the domains. If you're going to be in these domains and you're building on ElevenLabs, you might get yourself in [00:41:00] trouble.
And outside of that you can do that, versus you might be competing with the platform that you are building on top of.
**Mati Staniszewski:** A hundred percent. Although we saw interesting parts, sorry to interrupt you, but we saw the same. It's something that we think about too.
It's like, we want to allow so many companies and individuals to build. How do we make sure that they feel that transparency too? And when we were entering the conversational space, for some of the companies that were closer to that, we let them know beforehand that this is something that is happening.
We had a conversation, and an interesting thing happened where, for most of them, they effectively saw this as a potential advantage, where they can now provide the conversational components inside their platform and focus on the additional parts serving the end customer, with the specific integration for healthcare and how you connect to the systems in the hospital.
So that was an interesting one. When the ocean rises, all the boats rise too.
**Guy Podjarny:** Yeah. You want to know which parts, I guess one aspect of it is where is it that you might compete with me? And the other aspect is how do I know that you will continue [00:42:00] investing in this space? And so the declaration helps on both fronts.
So I think maybe just continuing a little bit on this sort of app and product path, and we'll talk a little bit about org. I guess if you cast your eyes five years out, and when you think about where is the value that you are providing, how does the company look? I think today I would characterize you as still predominantly a core model company: you are able to do this text to speech, now speech to text, better than anyone.
And you're building that out, and people build on top of you, and you have those solutions. First of all, do you agree with that characterization today? And when you cast your eyes three, five years out, which is more than double the company's lifespan, do you think that shifts?
Do you think the foundation model gave you like a right to play, and it becomes a critical feature, but no longer a defining feature?
**Mati Staniszewski:** Yeah, the world will be interesting in the next two years. So on the characterization: what we started the company with, and still so much believe, is that you need both.
You [00:43:00] need to innovate. If you are to create like a company of the scale we want, you need to do research side and you need to do the product side. And I think we've been able to build some of the major innovations on the research side. We haven't yet built some of the major innovations on the product side.
Although I think the conversational AI solution is the first one that brings some of that to reality, where you can really have the full packaged workflow, and dubbing similarly has that product aspect. But I think it's still early in terms of the adoption of that product as we think about the world.
So maybe a spin on that first characterization piece: from a company perspective, we've built incredible research and we've built a great set of products. In terms of adoption, the research is already heavily adopted as we think about our work; the product is just getting started on where it will go.
Now to the second question. As you think about five years from now, the thing we would love to continue being characterized by is that we were able to push the envelope of breakthrough research forward, the multimodal AI [00:44:00] audio experiences, and really being the ones that fixed speech as a category.
Audio intelligence is a category where you can understand who was saying what, recreate that at human-level or better performance, and whether that's an interactive experience or a multilingual experience, do it across all languages. So we want to be known as one of the research hubs.
If you want to join, this is the place to go. But the second one is to have the full-fledged audio AI platform where, regardless of the workflow or use case you want to build on top, you have the components and the solutions ready for that. And I hope it's going to be much quicker than the next five years.
We think it's going to be done in the next year or two. It's exactly like we spoke about a bit earlier: have those few cases where we go super deep. Let's take dubbing as an example. We want you to be able to create a dub on our platform, and we want to connect it to live streaming events.
So you could dub it live as it happens. We worked with Lex Fridman, for example, dubbing his conversation with [00:45:00] President Volodymyr Zelenskyy. Of course we dubbed that in a process afterwards, but what if in the future you could have that conversation between them where they each speak their original languages?
**Guy Podjarny:** Yeah, and that is generally the holy grail: being able to communicate across languages live, in real time.
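A minimal sketch of the pipeline being discussed here, speech-to-text with timestamps and diarization, then translation, then text-to-speech, with every function stubbed as a hypothetical placeholder rather than ElevenLabs' actual API:

```python
# Illustrative sketch only: the high-level stages of a (live) dubbing pipeline.
# All functions are hypothetical stand-ins; a real pipeline would call
# dedicated models for each stage.
from dataclasses import dataclass


@dataclass
class Segment:
    speaker: str   # who is speaking (diarization)
    start: float   # when the segment starts, in seconds (timestamps)
    end: float
    text: str      # transcribed text


def transcribe(audio_chunk: bytes) -> list[Segment]:
    """Speech-to-text with speaker diarization and timestamps (stubbed)."""
    return [Segment(speaker="host", start=0.0, end=2.1, text="Welcome back to the show.")]


def translate(segment: Segment, target_lang: str) -> Segment:
    """Translate the text while keeping speaker identity and timing (stubbed)."""
    return Segment(segment.speaker, segment.start, segment.end, f"[{target_lang}] {segment.text}")


def synthesize(segment: Segment) -> bytes:
    """Text-to-speech in the original speaker's voice (stubbed)."""
    return segment.text.encode("utf-8")  # placeholder for generated audio


def dub_live(audio_chunk: bytes, target_lang: str) -> list[bytes]:
    """Run one chunk of a live stream through the whole pipeline."""
    return [synthesize(translate(seg, target_lang)) for seg in transcribe(audio_chunk)]


if __name__ == "__main__":
    print(dub_live(b"...raw audio...", target_lang="pl"))
```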
**Mati Staniszewski:** 100 percent, the cultural implications of this are incredible. And then, how can we help in the conversational space go deep with the solution for call centers, where if you are building that application, like recently as part of the Series C we brought in Deutsche Telekom as one of the backers.
How would that look in the future, where you are taking those calls at scale? Not like it is today, where you call any customer support and usually you are waiting for a long time, you're frustrated, they don't understand you, they don't have the right details. Swapping that for an immediate, great experience on the fly.
**Guy Podjarny:** And that's a good example where you'd need to think: do you go all the way to also knowing what it is that you're supposed to say, so understanding the support cases and the needs? Or are you the enabler of the conversation aspect of it, but [00:46:00] someone else comes along and you plug in whatever solution of choice provides the actual knowledge base?
Interesting. So I take from that you're saying: we still think that, fundamentally, being at the cutting edge will be a defining feature, but we will also continue to invest in making it more accessible and aligned with the specific use cases that we focus on.
**Mati Staniszewski:** 100%.
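To illustrate the split Guy describes, a conversation layer that stays the enabler while the "what to say" part is pluggable, here is a small hypothetical sketch; the interfaces are assumptions for illustration, not ElevenLabs' actual SDK:

```python
# Illustrative only: a voice/conversation layer that delegates domain knowledge
# to a pluggable knowledge base supplied by someone else (e.g. a support vendor).
from abc import ABC, abstractmethod


class KnowledgeBase(ABC):
    """Provided by the domain owner, not the audio platform."""

    @abstractmethod
    def answer(self, question: str) -> str:
        ...


class FAQKnowledgeBase(KnowledgeBase):
    def __init__(self, faq: dict[str, str]):
        self.faq = faq

    def answer(self, question: str) -> str:
        # Naive keyword lookup; a real system would retrieve over support cases.
        for key, value in self.faq.items():
            if key.lower() in question.lower():
                return value
        return "Let me connect you with a human agent."


class VoiceAgent:
    """The conversation enabler: handles speech in/out, delegates 'what to say'."""

    def __init__(self, kb: KnowledgeBase):
        self.kb = kb

    def handle_turn(self, transcribed_user_speech: str) -> str:
        # In a real stack this text would come from speech-to-text and the
        # reply would go to text-to-speech; here we keep it as plain strings.
        return self.kb.answer(transcribed_user_speech)


if __name__ == "__main__":
    agent = VoiceAgent(FAQKnowledgeBase({"opening hours": "We are open 9 to 5, Monday to Friday."}))
    print(agent.handle_turn("What are your opening hours?"))
```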
**Guy Podjarny:** And when you envision this world of foundation models, where everybody's throwing multiple billions and many kinds of resources of all types at it, do you still envision the audio-focused model, even with multimodal meaning audio-to-audio, a different definition, maybe, than the one others have in mind when they say multimodal?
How do you think about that as a discrete slice of the foundation model world, versus the all-knowing brain, some sort of AGI, ASI-type path that the OpenAIs and the likes are trying to reach?
**Mati Staniszewski:** It's a complex one. At the core, we [00:47:00] know and believe that the set of use cases where having the highest audio quality and the highest audio controllability matters is going to continue to both grow and be the slice that we want to solve.
And there are so many other companies that of course are building, but they usually falter on the specific quality or on building the solution for a specific use case. So as we think about those two components: research will allow you a set of breakthroughs.
The models optimized for audio will still be where we can build that advantage, with the specific data sets that we create and how we create them, and a set of breakthroughs. But the second part will be increasingly important in the next five years: how do you provide that solution, that platform element, so the research can be used at scale and in a good way.
Now, what is interesting: the LLM space now is so much about scale, getting more data, getting more compute. In the audio space, we still think it's a little bit earlier in terms [00:48:00] of the breakthroughs that you need to create to compress or decompress audio. So that's a good advantage for us, where we can focus the research resources.
And now it will be complex, and that's the part which will be interesting: today we'll have a growing size of our audio models where we see those breakthroughs, and as you think about the multimodal experience, we'll use the LLMs that are out there and fuse them in.
**Guy Podjarny:** Yeah,
**Mati Staniszewski:** What would happen if open source stops?
That would be a potential problem, where we don't have access to state-of-the-art LLMs in the same way. But in audio, I think we are still earlier than LLMs in terms of diminishing returns and in terms of the use cases that are possible. So hopefully, by the time companies stop doing open source, if they ever stop it, and that's unlikely...
**Guy Podjarny:** That's a different sort of conversation.
**Mati Staniszewski:** Exactly. But I think what we can bring to the table is audio expertise that none of those companies focus on, and pairing that with LLMs will help. And there are so many components in that research work.
So we're talking about breakthroughs, but we [00:49:00] now even have a big team of data labelers with voice experience. We have voice coaches that teach them how to understand audio, how to annotate the emotions in the right way, so we can then use that.
**Guy Podjarny:** I think that's the virtuous cycle: you get the product, you get more users, and that in turn gets you more data, which in turn allows you to improve it.
And I think the more specific the domain is, the more you can accumulate data that is unique. So, very cool. I would love to see those innovations. And I have all sorts of deep thoughts about the open weights models, even calling them open source is a stretch, though I guess some of them in audio are actually truly open source, but a lot of them, anyway, that's its own rant.
Yeah, those are usually pretty bad conversations. And also the strategic aspect of it. Not to rant too much here, but one of my key beefs with them is that they're not really open. Even the open weights ones are more freeware,
in part because the community can't fork them and continue them if the company behind them chooses not to continue releasing. But some of them [00:50:00] in the audio space are indeed truly open source.
**Mati Staniszewski:** True. The audio ones are; there are some recent good ones, especially on the architecture that we created in 2024 there are now good examples, and they are truly open. But there's always that lag, and the lag is an interesting one, even as we think about our approach: what is the lag on which you can still build an advantage? And we think we can, especially because we are at the forefront of audio, and our goal is to be roughly a year ahead of anything that's possible out there. So to keep making this true, we need to release our new model this quarter, which I think is possible. That's kind of the guiding light for the research work. But it'll be interesting. Five years is a tricky question in general.
**Guy Podjarny:** Yeah, it's hard. In this world, in this space, it's hard to imagine. Hey, time flies. I've got a million more follow-up questions, but I want to spend maybe the last few minutes here talking a little bit about organization. You already described the focused [00:51:00] research organization, the tiny, high-impact team, and the product. You've published a decision that is controversial, or at least unusual, in which you did away with titles. You have no titles at ElevenLabs. Can you tell us a bit more about what it is that you decided to do there and what drove it?
**Mati Staniszewski:** So we are now a team of over 100, soon approaching the magic number of 150, but behind the scenes it's really more like 20 teams of five, or 10 teams of five to ten, depending on the specific period of time. And that's what we try to do: extremely small teams, a high degree of ownership, a high degree of independence, and they can just run. That helps us move with building, on one side, the research, of course, but also very specific solutions that go deep, very specific markets where we try to get the go-to-market part right. That responsibility, that high degree of both impact and ownership, is there, and that's how we think [00:52:00] about effectively nurturing the flat organization, where almost everybody is an IC, you are expected to do a lot of that IC work, and where we are today is down to ICs pushing the barrier of what's possible.
**Guy Podjarny:** ICs being individual contributors, if anybody's unfamiliar with the acronym here.
**Mati Staniszewski:** Sorry, yes, that's right. And the thing that we wanted to create is an environment where people feel that when they join this organization, especially at this stage where things are moving so quickly, they can have any size of impact in any short period of time, and the title shouldn't be a limiting piece in terms of how much you can do or the perspective others in the organization have of you.
So that was the first one: impact shouldn't be defined by the title, it should be defined by individuals. No titles helps with that; you can get anywhere in the organization and bring those ideas forward. Second, and this caused some questions on social media, we are trying to keep it so the best idea wins, even when you have new people.
We know there are great people that can contribute on day one, and we want to keep that idea that if you join, you can compete on those ideas. Of course, if there is really no decision emerging, we have team leads for each of those five-person teams who will make that end decision.
**Guy Podjarny:** And then there are identified leads even though they are,
**Mati Staniszewski:** yeah, they don't have a
**Guy Podjarny:** title, but everybody knows who's the sort of the lead.
**Mati Staniszewski:** Exactly. And the structure that we have, as you think about those teams, is still heavily evolving. So we wanted to keep a little bit of the dynamism at that higher level of how we are thinking about restructuring. That is all to say that as we think about the future, those structures will solidify in some ways, and maybe there will be a good point to introduce titles that give that element of additional recognition and understanding for new joiners of who is where and who is doing what. But at this stage, impact matters. Ownership [00:54:00] matters. The best idea wins. And climbing a ladder is a distraction that we don't want to nurture; you are building for the entire company, not for a specific component. You've probably heard about how, frequently, when companies build themselves up, individuals build for their team rather than for the company. Of course those should be aligned, but they aren't always. So that hopefully helps avoid this. But then on the other side, which is the tricky part of your question, some of the thinking is that you do want to keep some form of recognition, and we are now investing in different ways of recognizing individuals: being much more open in our releases about who created that release, giving them credit and putting that out there.
And then there's always this interesting thing of how titles can inflate, the inflation of titles in the organization.
**Guy Podjarny:** There are many problems that might exist with titles, but behind the scenes not everybody gets paid the [00:55:00] same. And there's a notion of seniority, whether within ElevenLabs or from past experience, that carries a different type of expectation, right? If you're paying someone 100 grand, you'd have different expectations of them than of someone you're paying 200 grand. So if the person getting the lower salary has an amazing idea,
you want to allow them to unleash it and drive it, but at the same time you're not really looking at them the same way as an employer or as a boss. I guess, how do you think about that? Because the risk with this type of change, and I absolutely see the appeal of it, although I have doubts about its ability to scale, so maybe it's right for now and not right for later, but I think the risk really is that there's still seniority.
It's just hidden, instead of being on people's titles. I assume I'm not wrong about the salary comment. So how do you control for a kind of hidden seniority versus an explicit one?
**Mati Staniszewski:** Yeah. And you mentioned a line which I will [00:56:00] echo, which is: right for now, maybe not right for later.
And that's exactly how we approach many of those decisions. Here, we think it's the right moment for the current state, where the organization is evolving as a whole, and likely we'll need some other component or structure in the future. Second, on your structure piece: our structure is actually very clear and transparent. We know what the teams are, who the person responsible is, how they roll up further. Most of those teams are small, so they are on those equal levels, but it starts to create that structure. And we still have clear, identified leads that everybody knows.
You just don't have the title attribute that is going to stay and be there forever, because we expect
**Guy Podjarny:** that it's more an assignment than it is a title. You're not a manager, you are currently managing this team.
**Mati Staniszewski:** Exactly. Exactly. So that structure is still very transparent. And the compensation is an interesting one, because given the nature of our work, where we have research and engineering, [00:57:00] some of the individuals in those teams, or even in the product teams, are exceptional and want to do more of the individual contributor work.
Of course you do pay a lot for that caliber of talent. So maybe where I'm going with this is: in a standard organization, if you're a manager, you are usually paid more as you progress than on the individual contributor path. In our case, some of the people who are ICs can be paid as much as or more than anyone else, and we want to make sure people know that if you are a brilliant mind, you can have that in any part of the organization. Of course, it's harder because you need to be truly exceptional, but it's definitely possible.
**Guy Podjarny:** Look, I think it's very interesting.
I've toyed with it as a thought in the past, never as bold a statement as what you've done here. And I do think the stage question is an interesting one, so it will be interesting to see it evolve. I guess the one more immediate challenge I have for you is this: if you harken back to the product versus research [00:58:00] lens, or what's at the center, as you build up a bigger sales organization, a bigger support organization, product organizations, and I appreciate the humility of "maybe not right for later, maybe yes", is your anticipation that those would also fit here, or do you think, hey, this works as long as our center is this type of activity?
**Mati Staniszewski:** That's also an interesting one.
One thing that I think is also valuable for us: we did have a number of people who told us, hey, we don't think titles are useful, so there's good internal support for making the move. But also externally, as we think about hiring talent, we want people who will join for the impact that they can bring to the organization, which actually allows us to hire a lot more people who traditionally are at the senior levels of organizations, because they're really joining to execute for the company rather than for...
**Guy Podjarny:** They don't need a title for it, though.
**Mati Staniszewski:** Exactly. Which [00:59:00] actually helped. And we now have a number of incredible people reaching out with the expectation that if they are doing well, they will rightly have a high degree of ownership in the company. You did ask the right thing, which is, of course, there are a lot of external roles where the title itself can be helpful as people interact with clients, as they work a sale.
There are some markets where a title is the way of gaining and understanding respect. And here we are avoiding this as well, so you still don't have a title, because then everybody would have a high title. But in transient exchanges, if you think that a title can help you in a specific business activity, you are allowed to use it.
We have roughly pre-agreed what is right,
what is legitimate. A good example is the Japanese market, where people really need to know who you are interacting with; there you do use it. And in some business contexts, or if you are outbounding to a client, we do potentially allow that.
We've actually not seen [01:00:00] any positive effect where having a title leads to better outbound conversion rates. In presentations too, if you say you're leading a region, that's as good as, I don't know, VP or director or head of. Yes.
**Guy Podjarny:** And I think that part I wholeheartedly agree with.
And we do the same. We don't have any VP or director or such; we have head of, we have lead of a team, here at Tessl. It was harder to do at 1,200 people at Snyk. It's interesting. Hey, we're really coming tight on time here. Just one kind of final question.
Beyond the amazing future that ElevenLabs, I'm sure, has ahead of it, and you'll build a lot. What other aspect of AI excites you the most when you cast your eyes out?
**Mati Staniszewski:** I feel like this is a common answer you get, but it is extremely exciting how the LLM or multimodal space will evolve, and how you can have an even higher degree of understanding, effectively that infinite context window, or a pretty large context window.
And as I think about the education space as a whole and [01:01:00] how AI and education can come together, I think this would be incredible. If something knew all the knowledge I've gained since I was born, it could be my personal tutor through those years, be my mathematics voice when I'm solving a mathematical problem, and then help me on a sales deal where I don't know the process. It will be an interesting innovation in terms of whether the LLM and multimodal space can carry that very large context across my life and provide that personalized experience.
So what companies like Anthropic, OpenAI, or Google are doing on that side is just incredible for the world and for what's possible.
**Guy Podjarny:** Yeah, exciting to see that as well. Mati, ElevenLabs is amazing. I really wish you all the best with it.
And I really love the insights that you shared over here. So huge thanks for coming on to the show.
**Mati Staniszewski:** Thank you Guy. Thanks for having me. It was a pleasure speaking together.
**Guy Podjarny:** And thanks everybody for tuning in and I hope you join us for the next one.
**Simon Maple:** You're listening to the AI Native Dev, brought to you by [01:02:00] Tessl.