Podcast

The $2B Nvidia Backed AI Video Platform

With

Victor Riparbelli

10 Jun 2025

The AI Video Platform Doing 100M In ARR

Episode Description

In this episode of AI Native Dev, Guy Podjarny is joined by Victor Riparbelli, Co-founder and CEO of Synthesia, who is leading the charge towards AI driven video that lets anyone become a video creation pro with no prior field experience. On the docket:
• how Synthesia enables full video editability even after generation
• Victor’s vision for fully coded videos that enable personalized, customizable viewing
• Victor’s belief that future generations will prefer visuals over any kind of text

Overview

The Synthesia Story

Synthesia enables anyone to create professional videos without cameras or video expertise. Users type scripts, select AI avatars, and use a PowerPoint-like interface to produce videos that once required entire production teams. The platform doesn't just generate video—it helps users understand video storytelling, offering AI assistance that can transform PDFs or presentations into properly paced video content.

The company's journey is a Silicon Valley classic. After 80-100 investor rejections during the "AI winter," a cold email to Mark Cuban led to their first $1M investment. Cuban had actually implemented their underlying technology at home and immediately understood the vision. But the real breakthrough came from a strategic pivot: instead of targeting Hollywood with AI dubbing, they focused on corporate users who compared AI video quality to PowerPoint, not blockbuster films.

Democratization Through Design

What sets Synthesia apart is their focus on editability and control. Unlike pure prompt-based systems, everything remains editable after generation. As Victor explains: "We invested a lot in the editor... because if you have that, everything is fully editable, right? And that's great because then you can start to help people."

This approach mirrors the evolution in software development tools. Just as no-code platforms opened up app creation, Synthesia is making video accessible to non-specialists. The key insight: different use cases require different tools. A personal to-do app and enterprise CRM have different stakes—similarly, corporate training videos and Super Bowl ads need different approaches.

The Future of Content

Victor's provocative prediction—"Your kids' kids won't be reading and writing"—reflects deeper trends. As video creation becomes as easy as writing, why compress ideas into text? He points to his own behavior: "I'll buy a book on Amazon and spend a day reading it, feeling like half the time they're just rehashing the same examples... when I could watch a 25-minute YouTube video."

But the future isn't just video replacing text. It's about adaptive content that personalizes to viewers—educational videos that adjust pacing, use relevant examples, and answer questions in real-time. Just as websites evolved from digital newspapers into dynamic experiences, AI video will evolve beyond traditional formats into interactive, personalized media we can't yet imagine.

Key Takeaways

The conversation reveals important parallels between AI video and AI-assisted software development. In both fields, experienced practitioners who embrace AI tools will thrive by focusing on higher-level decisions—storytelling and user experience in video, architecture and product design in software. The quality of human input matters: expert users write better prompts and get superior results.

For businesses, the message is clear: video creation is becoming as fundamental as document creation. The tools are democratizing rapidly, and early adopters will have significant advantages. We're witnessing transformative technology following its predictable pattern—first replicating existing formats more efficiently, then evolving into entirely new mediums that reshape how we create, share, and consume information.

Resources

Synthesia - AI video generation platform discussed throughout the episode

Victor Riparbelli on LinkedIn - CEO and Co-founder of Synthesia

AI Native Dev Podcast - Subscribe for more episodes

Face2Face Paper (2016) - The foundational research that inspired Synthesia

Guy Podjarny on Twitter - Host of AI Native Dev

MAYA Principle - Most Advanced Yet Acceptable design principle

Chapters

0:00:00 Trailer
0:00:58 Introduction
0:01:45 Largest leading AI video platform
0:05:14 Experience and parallels
0:08:50 Fine-tuning automation
0:13:22 The craft really matters
0:25:22 Creativity and control
0:33:56 AI winter-type period
0:39:30 The founder story
0:50:22 Power users
0:55:04 Societal predictions
1:10:05 Outro

Full Script

[00:00:00] Guy: Hello everyone. Welcome back to the AI Native Dev. Uh, today we have Victor Riparbelli with us on it. I've known Victor for maybe, what, five years now. And he is the CEO and co-founder of Synthesia, which is a super exciting company and the leading company today when it comes to AI native video.

[00:00:27] Guy: Uh, and we'll talk a lot about kind of creation with AI and, uh, and decisions made along the journey. And also a little bit about the journey itself, which is really interesting in its own right. Uh, and apply, you know, what type of learnings, uh, we can take from that to the world of that. So Victor, thanks a lot for coming onto the show.

Victor: Of course. Happy to be here.

[00:00:45] Guy: So Victor, maybe just to get us started a little bit, tell us, uh, a little bit about what Synthesia is, uh, and what it does.

[00:00:55] Victor: So Synthesia is the, the world's largest leading, um, platform for AI video for businesses. Essentially, what we do is we help our customers land their messages with the highest degree of effectiveness, right?

[00:01:05] Victor: And the highest degree of understanding of whoever is on the other end of that message. And in 2025, that's not with slide decks or text; that is with video. Today, people want to watch and listen to their content, and everyone is different. Of course, I'm sure you read a lot. I also still read a lot. But if you’re the average consumer, right?

[00:01:20] Victor: People really want to watch and listen. That's what they do in their free time when no one is watching. They listen to podcasts, watch TikTok videos, YouTube, Netflix, and so on. But when it comes to our work life, most of the content most of us consume is still in text format. It's in slide format. And we've essentially built a platform, um, that kind of started with this avatar technology, which really was a way to replace a camera that you use in the real world with a sensor,

[00:01:44] Victor: where you go around and you film things. With AI, we could generate the footage, if you will, and of people talking specifically. Um, but since then, we've now built an entire platform around it. So we have a video editor that enables you to actually finalize the video. You can add in your, you know, screen recordings, text animations, a whole bunch of other things to actually finalize the video.

[00:02:02] Victor: Uh, it's a content management system. It's a publishing platform with our own proprietary video player. It essentially is one platform that helps you go from an idea all the way to delivering that video to the end consumer. And understanding, uh, how that, how this is impacting, uh, them. One of the key things around our platform...

[00:02:20] Victor: Which I'm sure we'll talk more about today, right? Really is this, this aspect of enabling people who had never made a video before to make a video, right? And so when you enter the app today and you begin making a video, if you've ever made a PowerPoint in your life before, you should be able to pick it up in kind of five minutes.

[00:02:36] Victor: And that means that what we've seen happen is that in some of the world's largest companies, we've created thousands of video creators who used to only be able to create documents and slides, but now they're, they're making videos. And that's kind of like where we are today.

[00:02:51] Victor: One of the things that's always been a huge part of our thesis is that the first sort of chapter of AI video is going to be around essentially using new fancy AI technology to do old media formats as in linear video you watch from A to B, right, broadcasting. Um, but of course, once you could generate video with code, you open up so many more new possibilities.

[00:03:12] Victor: Videos can be interactive, they could be two way conversational and they could do a whole bunch of other things we couldn't do with normal video, right. And so I think the next couple of years, that's really what's going to define the next chapters and features.

[00:03:25] Guy: Yeah, no, super exciting. And we're going to drill a lot more because like each of these things is, is sort of a whole topic to unpack.

[00:03:32] Guy: Um, maybe so starting, so I love how the sort of the platform has this ability for someone who hasn't created a video to create it. Today, I use it as the kind of really kind of a great example of how you can kind of enter, disrupt the business world by really taking something in the world of AI that allows, you know, powerful creation but requires a lot of change.

[00:03:54] Guy: You know, a lot of. Uh, it, uh, it's very different, it's an entirely different way of creating video as compared to whatever, taking a camera and starting to shoot it. Uh, so I think that's really powerful, but it's also probably like a reasonably high friction type environment on it. So maybe first of all, like, describe a little bit more like what is involved when someone does create a video in Synthesia.

[00:04:22] Guy: Like what is that experience, just for context, and then maybe let's talk a little bit about who's like the ideal creator in this context. Does it even matter if they have previous video creation skills?

[00:04:34] Victor: I think there's so many, um, you know, parallels to code generation. So I think when you make a video, right?

[00:04:43] Victor: We tend to think a lot about the production of the video, right, which is like technically how are each of the frames kind of created, right? Is it with a camera? Is it with AI? Mm hmm. You sort of technically make the video. Um, and that part I think, you know, we've made super easy, right?

[00:04:58] Victor: That's essentially you go in, you select an avatar, you type a script, you use this like basic PowerPoint editing functionality to put something together. And most people who are office workers, I think, should be proficient at figuring out how this works. You know, that's really not that difficult.

[00:05:12] Victor: Especially if you compare it to using Adobe Premiere, like some expert tool where most people would go in and just blank, right. They wouldn't know what to do.

[00:05:19] Guy: Right. This is a, is it fair to sort of think about this as the storyboard or this like, like it's the it's the it's the script. It's the narrative.

[00:05:25] Guy: Like whatever the chapters of the video.

[00:05:28] Victor: Yeah, it's, it's kind of, if you think of a PowerPoint, like making a slide deck, right? Like you have different scenes, um, you have the scripts. We made a whole bunch of like design decisions around not giving you a timeline, for example, with different layers in the video.

[00:05:41] Victor: Um, because as soon as people see that they begin to freak out, they don't know what a video is, right? So we intentionally kept it almost like PowerPoint with speaker notes and kind of a relatively simplistic UX. Kind of, I would say from a technical standpoint, most people can create a video, right? But the thing is that a video is not just a technical part of it.

[00:05:58] Victor: The video is also storytelling, right? It's how we actually put together what is a good video, right? How long is a good video? What should we include in the video? What should you not include in the video? And I think what we found, generally speaking, right, is that actually, that's actually the thing that people struggle with the most.

[00:06:13] Victor: By now most of us know how to write a document, like we know how to write an email, we know how to text with people, but video is a new format, right? You can't just write a PowerPoint doc and then just copy, paste that into Synthesia and make a video. That's not going to be a great video. In a video you have a visual language that you want to use to communicate, right?

[00:06:28] Victor: You have animations, all these other things. Um, so there's like the technical part of actually making a video, and then it's kind of more the creative aspect of actually putting a good video together. And that's actually what people struggle with the most in Synthesia. So we built a bunch of tech to ease this, right?

[00:06:44] Victor: Um, we've built what we call our assist creation, which is essentially kind of like an agent that we've taught how to use the editor, on your behalf. So what that could do is you can give it, either you can give it just a script you've just written, kind of freeform, but you can also take a URL, you can take a PowerPoint deck, you can take a PDF document, basically any content, existing content you have.

[00:07:02] Victor: Mm hmm. And it'll take it, it'll parse it, then it'll turn it into a video. So it'll write a script for you. It'll delineate it into different scenes. It'll then do the design of those different scenes. You know, some of those scenes may have a big image in it, and it'll pull images from stock providers.

[00:07:17] Victor: It'll use some of your own assets, and it gives you kind of an outline. I would say from where it is today, it's not like a final video end to end, good to go. But it essentially changes your role as someone who makes a video from coming up with something from scratch to being an editor.

[00:07:32] Victor: Right? The draft is maybe like 80 percent there, 85 percent. You still want to go around, change part of the design. You want to add something, you want to remove something. And it's been pretty interesting to see the progression of how many more people we can kind of bring on as creators when you give them a draft to work from with their content as opposed to asking them to come up with something from scratch.

[00:07:53] Guy: That's awesome. If I kind of echo back a little bit of the pieces here, the first part is okay. You use the lowest common denominator or maybe the core piece in which you want the human to decide, or the creator to decide just what it is that you want to say. And I guess the storytelling piece and the video formation, or how to tell that story in video format is the part that, if you want to democratize it, then you need to help people.

[00:08:20] Guy: And so you assist in that. But you have also mentioned another aspect. It's not like you can imagine that type of assistance happening as someone types, like auto-completion or a little reviewer or something like that. And I think you even had, and have probably still, some of those capabilities.

[00:08:36] Guy: But you're describing something that is more than that, which is just kind of point me at sources of information and I will help you form that. Are those just onboarding exercises like, give me a PDF and I will create a thing, but now you will engage with something that is back into that format of I've got the slides, like you just sort of pre-create the slides, but then from here on, you work with the slide?

[00:09:00] Guy: Or are these actual degrees of freedom where you're leaving it to the LLM, saying here's a PDF, extract info from here, extract it from there. And maybe if you run it again, it would extract it a little bit differently.

[00:09:11] Victor: It is like an entry point to making a video, which a lot of people use as the starting point.

[00:09:14] Victor: It is just like an onboarding exercise. Um, you can kind of tune it. Basically, when you upload your content, you can tune the agent in different ways. You can tell it how long you want the video to be, what audience you are making it for, and it will first give you an outline of the content, in bullet form.

[00:09:32] Victor: So it'll give you like a bulleted formatted list of what narrative it will be. And then from that you can actually edit the bullet yourself. So you can add another bullet saying, add in something about blah blah. You can take a bullet out. Then after the outline, which is sort of the easiest way for most people to sort of have an overview of like, what's gonna be this video?

[00:09:48] Victor: Then from there, it'll expand that into individual scenes, which is an actual video.

[00:09:57] Victor: It's less iterative than, say, like a completely free form texting where you just keep prompting it over and over again, and it's just spitting out new versions. We do ask you to kind of like fine tune it a little bit in the beginning to like what specifically you're after. Obviously this is, you know, this is like work in progress and I think, in all modalities right now, people are trying to figure out what is the right degree of automation for different types of tasks, right?

[00:10:22] Victor: For example, I think if you're making a Super Bowl ad, which is not our target market at all. There, I would expect, uh, just a low degree of automation, right? Because this is a really important piece of content. Like those 30 seconds, every second, uh, really, really, really matters so you want that to be absolutely perfect, and that probably means you wanna have a human being very much in charge of every single pixel, right?

[00:10:45] Victor: On the other hand, if you're making a corporate training video, and your main goal is to help people understand, like, I don't know, your vacation policy in your company, or something like that, right? There you really want a high degree of automation because it's a much lower stakes piece of content.

[00:10:58] Victor: Um, and we operate more in that category than we operate in the Super Bowl category. So what we're kind of figuring out is like, what's the right UX for people to get to that draft, but then also iterate on the draft after, right? We have a bunch of tools now, so you, you know, once you're in your video, you're editing it, you can of course use LLMs to kind of help you,

[00:11:14] Victor: extend the scripts, uh, make it, you know, more happy, more sad, like do a whole bunch of things in there. But, uh, we, we very much believe in, in augmenting humans as opposed to like trying to kind of replace, uh, like creating a fully automatic content generation machine. I'm not saying that won't happen, I think that's definitely the direction the world is moving in, but we very much believe that putting the human at the center of all this just nets you the best content, right?

[00:11:38] Victor: Like there's, so I, I think we've seen so many startups both in our space, but also in other spaces the last two years, who try to do completely like end-to-end one shot generation of X, Y, Z. And it's a cool demo, but very often, right, it just doesn't actually work if you wanna use it in production, right? If you get a video that's kind of like 70% there, but it's not your brand colors, uh, it doesn't really speak about the product the way you thought it should speak about it.

[00:12:03] Victor: Then, then these things are just, they, they kind of remain like very cool demos. Um, but they don't have a lot of utility. Right. I think that's, that's one thing we've always been like, obsessed about is really working backwards from like utility, which, which sounds very banal and obvious, but I think in, uh, when we have these hype cycles, right, it could be easy to, to, to forget that sometimes.

[00:12:23] Guy: Yeah. Because it might introduce a bit more friction to sort of the initial creation versus just sort of saying a thing and creating it, which I think I, I'd love to draw some analogies to sort of the world of software development right now, which, you know, feels fairly clear here. So you have, uh, you know, for starters, you have code that describes a video.

[00:12:41] Guy: So you have a sort of PowerPoint structure that transforms into video. And that is, in that case, the sort of the, the text or whatever sort of guidance or configuration that you have about every slide or every one of those portions, uh, uh, gets compiled if you will, into, into, into kind of a video format.

[00:12:58] Guy: Um, and then you have other helpers that you know are more similar to our, you know, current world coding assistants in which you can text your way or auto complete your way or such into, into creating it. And so that lowers the barrier, 'cause you might engage with the system with less video creation, uh, familiarity or less, you know, even code familiarity.

[00:13:19] Guy: I think in that world it's, it's a little bit easier. 'Cause even after you do all that process, eventually you get words so everybody can engage, you know, with that sort of text that describes what will be said. Uh, but still you might not have video familiarity. So you have more of those, uh, interactive LLM powered, like, uh, tap the intelligence that is built into the system to be able to create the narrative.

[00:13:41] Guy: Um, and I guess both of those are like avenues for improvement, but eventually you, you create code so you're not, you're not hiding, you know, in a or you're not entirely delegating video creation to the AI, you're, you're assisting.

[00:13:57] Victor: Yeah, for sure. And I, and I think one of the, one of, one of the things that was always very important to us, right, is you could, you could approach video generation as kind of like a completely end-to-end task, right?

[00:14:06] Victor: Like you put in text and on the other end you get out pixels basically, which is how a lot of the, like, you know, uh, like the very generalized video, um, generation models work. Like if you look at Veo 3 for example, or Runway, that approach certainly has a, like a lot of great things about it.

[00:14:22] Victor: Um, but we always knew that for our customers and for our workflows, right, we needed to have an intermediate layer of editability, right? So for example, in, in kinda our videos, our videos are composed of the avatar models, right? Which produces pixels and footage essentially, like you would get it off a camera.

[00:14:41] Victor: Then you have all the other elements of the video, which would be, you know, you have like a lower third, like I'm like, I'm having right here, right? That's a piece of text on screen. There's some animations. There's, um, a transition to the next scene, which right wipes out everything and fades in, or like, there's all these things that make a video video, right?

[00:14:56] Victor: Um, and we were always of the opinion that we need to actually build that as like representations we can control ourselves, right? So that you can just simply move a shape around the screen. You can change the text, whatever. If you had to prompt that into the video, right? That clearly just doesn't work.

[00:15:11] Victor: And so what we did was we said, okay, we have the footage and we have the editor, and we invested a lot in the editor. It's really, uh, you know, that's a pretty hard piece of software to build well, because the craft really matters, like what it's like to use an editor, right?

[00:15:26] Guy: How do you represent, how do you sort of, uh, convey, uh, all of these, all of these desires.

[00:15:30] Victor: Exactly, it is like, you know, building a pitch, like all these kind of creative tools, right?

[00:15:34] Victor: That's actually, you know, it's pretty hard. Um, building a great product here, but you need that because if you have that, everything is fully editable, right? And that's great because then you can start to help people again. Like you can prompt something and if they can edit everything after the fact, then it's great, right?

[00:15:48] Victor: We've seen some products like, you know, there's like slide deck generators and video generators, etc., that try to do everything end to end. And when things aren't editable and you just have to keep re-prompting and re-prompting and re-prompting, it works for some use cases, you know. Like if you're using some of these general video models to make clips for a music video or B-roll footage for an ad or something like that.

[00:16:12] Victor: That's probably okay, right? 'Cause you're not trying—you just need a ten second clip or something. And what you want there is like a lot of creativity in the model. The ability to just like, you know, throw in 40 different prompts, get 40 different outputs, and just pick the one that you like.

[00:16:25] Victor: But for the use case we're really targeting, it's very different, right? Like here people don't wanna prompt five times to get a video that actually works, right? Because this is like a volume game. People are creating lots of videos, and they need to be able to edit everything after the fact. And so when we think about automation, the important thing about automation was always that the automation, whatever the agent spits out, right?

[00:16:44] Victor: Is something you as a user can sit down to correct yourself and edit. And that also gives us a great signal in terms of like, what's the agent good at and what's the agent less good at? Right? So that kind of data that we start to collect there is actually pretty interesting. It's almost like our RLHF data, um, in, in, in some, some way.

[00:16:58] Guy: Yeah. In what people edit. So I love that. And I guess that implies a certain level of determinism, uh, between, you know, taking, taking that, those instructions, I don't wanna say that text because it's more than text, but that representation of the video that you have in every, every scene, every slide, uh, in the deck and creating it, I guess.

[00:17:15] Guy: How do you think about, um, uh, I don't know, the room for creativity or, or like the degrees of freedom that you want to leave, uh, the LLM? I'm, I'm certain that, you know, if you take the same description that you've sort of written today, uh, and you, you, you generate or something that you've created a year ago and you've generated a video from it, that video is not gonna be the same one that probably gets generated if you kind of run the same thing today, a year later.

[00:17:43] Guy: Just by virtue of the system getting better, right? The platform is improving, I guess. How do you envision today and tomorrow this kind of determinism versus kind of freedom to evolve on the other side?

[00:18:03] Victor: Yeah, so there's like two components of this.

[00:18:05] Victor: There's one, the first one is like the actual, you know, video models. Think of those as like generating the footage, right? That would come off camera. And then the other part of it is kinda like the video editing, which again is like all the, the lower thirds, the shapes, the animations, the transitions, all this stuff. These are like today, the two kinds of distinct things, the two different models, right?

[00:18:23] Victor: The first one replaces the cameras and it gives you footage of someone talking to the camera, and the other one takes that footage and edits it into something that is, uh, a coherent video end to end, right? The first one, on the avatars. Um, the thing that's, that's unique about, uh, and I think the way, when you look at like the AI video model market, there's, there's a bunch of different ways you can approach AI video, and of course it is again super use case dependent, right?

[00:18:46] Victor: So if you really try and, and cook it, or really try and like distill it down, I think the two main and most important dimensions of AI video models are creativity and controllability. So creativity is a bit as I said before, right? How creative can the model get? And creative models are super fun, right?

[00:19:05] Victor: It's like when you use DALL-E, whatever, you can type in literally anything in the model and something will come up, right? You can have a dog skateboarding on the beach. You can have an elephant walking around in blue. You can have all these amazing things, right? And that's super important if you're creating creative content, storytelling content, theme content, viral content, whatever.

[00:19:22] Victor: Then I really think that creative dimension is incredibly important, right? On the other hand, you have controllability. Controllability is, um, in our case for example, does the avatar look exactly the same every time you make a video, right? Or does it change slightly depending on the prompt? Like maybe sometimes, you know, you'll have a beard, maybe sometimes you won't have a beard.

[00:19:41] Victor: Um, does it say exactly what you put in the script? Can you even actually decide what the person says? Um, how fast is it to run? Um, you know, there's all these things that are essentially around like controllability, right? How deterministic is it? Does it work every single time? Or are we in the kind of slot machine world of using big video models where you just put something in and you hope something comes out?

[00:20:02] Victor: And here there's a very clear distinction of what people are focusing on. We're very clearly focused on controllability, reliability, cost, ease of use, uh, right? Our avatars need to look exactly the same every single time. They need to say exactly what we put in the script box and all this stuff. And then there's other models that start with creativity.

[00:20:19] Victor: Of course we are moving towards more creativity in our models, the new flagship models we're releasing soon. They'll let you, you know, you'll be able to prompt yourself into new environments and prompt your avatars. We're getting, like, we're taking some of that magic, but it's all kind of built around this idea of like controllable models, because that's what our users need, right?

[00:20:36] Victor: On the other hand we have the other models, which are built around creativity and sort of ideas, right? This is where you get more of your creative directors, like, um, agencies, etc., right? Who don't mind prompting something 50 times to get to the right scene. Because if you have to prompt something 50 times, it's still, like, a hundred thousand times more affordable than actually going and

[00:20:59] Victor: shooting an elephant on the moon in a studio somewhere, right? Yeah. But of course everyone wants to have controllable models that are also very creative. And I think what we'll see is that eventually probably most of all AI video will converge into like one model at some point. And the different products will be, you know, kind of rooted in the same models, but the use cases will kind of determine what the products look like, and the UX will be the thing that differentiates different products, right?

[00:21:27] Victor: On the LLM side of like actually taking that footage and then putting it into videos, um, that is essentially a task of like, um, uh, creating a model, right? That understands like our editor, that can move shapes around, write text, etc. And the main vector of improvement here is just like, can we teach a model to go from today's more templatized, right?

[00:21:52] Victor: So the way it works, sort of like you, you write the scripts, of course, with like, mm-hmm, LLMs to do that. And then based on the scripts we try and figure out like, okay, what kind of design, which is basically like a whole bunch of.

[00:22:04] Guy: What's the sentiment? Do you sound angry? Do you sound sort of laughing?

[00:22:07] Victor: Well this is more like around like, shouldn't there be, should it be like a process diagram or should it be like a big picture or something like that?

[00:22:12] Victor: Right. And then the model will figure out, okay, for this scene I think it makes the most sense to have three bullet points, or I think it makes the most sense to have a big image here, or a process diagram, or something like that. And that works pretty well, actually, when you're making simple videos. Think of it as if you have

[00:22:32] Victor: 150 PowerPoint templates, and the model picks which one works best for your go-to-market slide, which one works best for your product vision slide, and so on, right? And it fills those in with your brand colors, your fonts, and a whole bunch of other things that make it look real. The thing we're working on now, though, is how you go above that, which is that you actually teach a model the language of video, right?
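
[Editor's sketch] The template approach Victor describes (pick a layout per scene, then fill in brand colors and fonts) can be illustrated with a toy example. Everything here is an assumption for illustration only: the template names, the keyword sets, and the `pick_template` and `build_scene` helpers are hypothetical, and simple keyword overlap stands in for the model that does the picking in the real product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Template:
    name: str
    keywords: frozenset  # words that hint this layout fits the scene


# Hypothetical three-entry bank; the transcript mentions roughly 150 templates.
TEMPLATES = [
    Template("bullet_points", frozenset({"benefits", "features", "reasons"})),
    Template("process_diagram", frozenset({"step", "steps", "process", "workflow"})),
    Template("big_image", frozenset({"product", "vision", "picture"})),
]


def pick_template(scene_text, templates=TEMPLATES):
    """Return the template whose keywords overlap most with the scene text.

    Keyword overlap is a stand-in for a learned model. On a tie (including
    zero overlap) the first template in the list wins.
    """
    words = {w.strip(".,!?:;").lower() for w in scene_text.split()}
    return max(templates, key=lambda t: len(words & t.keywords))


def build_scene(scene_text, brand):
    """Fill the chosen template with the scene text and brand styling."""
    template = pick_template(scene_text)
    return {
        "template": template.name,
        "text": scene_text,
        "color": brand["color"],
        "font": brand["font"],
    }


if __name__ == "__main__":
    brand = {"color": "#1A73E8", "font": "Inter"}  # made-up brand values
    for line in ["Our onboarding process has three steps.",
                 "Here is our product vision."]:
        print(build_scene(line, brand))
```

The point of the sketch is the shape of the pipeline: scene text goes in, a layout is chosen from a fixed bank, and brand styling is applied afterwards. Learning "the language of video", as described next, would replace the fixed bank rather than just the picker.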

[00:22:53] Victor: How could you do that? Well, you could have a model that watches a lot of video and tries to figure out not just what kind of template should I pick, but what should this video actually look like? What does great editing look like, right? That's something like, you know, while you are talking, the camera slightly zooms in and an animation pops up on the screen.

[00:23:09] Victor: This is much harder, of course, but obviously this is going to be possible, right? Like, I'm sure if you sat that model down and it watched a lot of videos, and we have a whole bank of videos that we've made ourselves, right? Can we learn to, instead of just using a template approach to taking you from that, you know,

[00:23:27] Victor: idea to a video, to actually just learning what your brand looks like? What does your font look like? What is the language of video? For example, if you look at modern video, people cut every eight seconds, right? It's this very ADHD way of making videos. But people like it, so the model would probably learn to automatically cut, you know, to a side angle, to a straight-on angle, to cool animations.

[00:23:47] Victor: This is definitely more in the research phase at the moment. But imagine that at some point you will have models that can spit out content and footage that looks as good as Hollywood films, and it's as controllable, so you can really make it do whatever you want it to do, and you have a model that understands the language of video, right?

[00:24:05] Victor: Then you could combine the footage you can generate with the world's best video editor, which can take that and turn it into something. And I think those are the two main vectors of how you get to, kind of, Steven Spielberg as one AI product.

[00:24:23] Guy: Yeah, of creating.

[00:24:23] Guy: I love a lot of this sort of analogy. And I think it might be a little bit of, you know, when all you have is a hammer, everything looks like a nail, but I think of it a lot as being spec-centric, like you are defining to a level of specification.

Victor: Yes. 

[00:24:38] Guy: What is it that is in the video? And I think what you've built in Synthesia is a mechanism that allows a fairly substantial level of control, though probably, as compared to code, still a low level of control, right? You still don't say, hey, raise the corner of the mouth by, whatever, two centimeters at this point.

[00:24:59] Guy: You're actually still allowing a decent amount of interpretation to the model, because it's just kind of unfeasible to describe the video at that level of granularity. So as compared to code, it probably already has some built-in degrees of freedom, but still you control it substantially.

[00:25:19] Guy: But then the other side of that is adaptability. You talk about creativity, and I agree with that, but that in turn enables adaptability. So if you were to generate a video 10 years ago, or for, whatever, 40-year-olds like me, you might go easy on the ADHD and allow a moment for a set of scenes to change, while if you're doing it for a teenager, you might be much more snappy in the modifications.

[00:25:47] Guy: So the same video can be generated in a way that is adaptable to that context. Right. And I think a lot about that in the context of code because you know code has its own stacks. Like sometimes it's uh financial needs sometimes it comes back again to sort of user needs and sort of preference

[00:26:02] Guy: And do people want to, you know, get a clean page or a rich page? Do they want it corporate looking or modern looking? And even on the functionality, right? Do you wanna optimize for latency? Do you wanna optimize for costs? Do you want to use specific stacks? All of those are different contexts, right?

[00:26:19] Guy: And you will adapt uh the code to that. But you need that spec that allows you to control that sort of slider between predictability and adaptability. Uh and and I think you're using the better words for the world of video which is uh controllability and uh and uh creativity. Um but is that am I on track here in terms of describing

[00:26:40] Victor: It's, it's essentially all the kind of knobs you can tune, right?

[00:26:43] Victor: And different knobs will work differently. Like, different people, different users will have different needs. Um, as I said, I do think all these things eventually will converge, but it's kind of a matter of, what's your wedge into the market, and how do you position yourself?

[00:26:57] Victor: Um, the exciting thing, to your point here, right, and I kind of also said that a bit in the intro, is that right now video and AI video look very similar, right? It's basically just how it was produced that's different, but that's going to change pretty dramatically, right?

[00:27:17] Victor: Um, as to your point, right, what we're going to be able to do when every video is generated with code and there's no sensors involved, everything is literally just like zeros and ones. Then we can start to make an entirely new type of video experience where maybe, um, you don't like the ADHD experience of TikTok cutting like every eight seconds.

[00:27:37] Victor: That might really annoy you if you had to watch a video that's built like that. You could just not watch it like that. You get a version of the video that is maybe a bit more slow paced. It's not like TikTok. But then maybe your daughter, or someone else who's younger than you, prefers to watch that style of information.

[00:27:53] Victor: And they can do that. Right. There's a lot of parallels, right? The one I usually use is, you know, when the first websites were conceived, back in the day, they kind of looked like newspapers on screens to some extent, because that was what we imagined back then, right?

[00:28:07] Victor: That's like the most obvious thing to do, because everyone knows what a newspaper is: a page with some information. Let's replicate that on our screens. And of course that was a great product, and I think that's kind of where we are right now in AI video. But then eventually we figured out, well, actually, a website and a newspaper are two very different things, right?

[00:28:23] Victor: A newspaper, like, you print it every single morning, you send it out to 10 million people. They all read exactly the same thing. Um, on a website, you can update the information in real time, right? You can have a feed that's personalized to whoever's actually watching it. You can drive interactivity between users, with one click you can access a gazillion pages on the internet, and all the other things that make a website very different from a newspaper, right?

[00:28:44] Victor: And so I think what will happen is, we'll, like these mediums, will diverge and I think what we think about as AI video right now, right, will eventually be that. Right. It'll be more about a system that has been instructed by someone like me, for example, a user to communicate a message, and then the system can kind of figure out how itself, like, well, maybe Guy wants it delivered this way, Victor wants it delivered this way.

[00:29:05] Victor: That's both across languages, body languages, what avatar we want to use, etc. But that's very clearly the way all, all, all of this stuff is going to go. Right. And, and that's always actually been the thing that excites me the most. I think in our space there's going to be two main directions. It's going to be one direction, which is going to be the Hollywood film, uh, entertainment thing.

[00:29:24] Victor: There's so much to do there still, just like enabling someone to create.

[00:29:27] Guy: Yeah. Just to sort of replicate the current status quo. Yeah. Or like, you know, produce the thing that can be done, uh, today with sort of physical, uh, means, you know, or digital, but, uh, but production means, but in a much, much cheaper and sort of, uh, iterated way, but eventually producing the same type of like output.

[00:29:44] Victor: Exactly. It's kind of like when people, you know, use a computer to write a book, right? You write the book and you publish it. And of course that's like, I mean, way easier than sitting with a typewriter back in the day, right? Or at some point carving in a stone or something like that. Like it's, it's much, much easier and much faster and has a huge kind of value added to it.

[00:30:02] Victor: But if you're someone with a message, right? A lot of young people, they'll start a blog, or there'll be a blog and Twitter, right? That's constantly evolving, and it's not really like the form of a book, it's something very different. And I think it's all these trends that we see: every time we've been through new media technologies, this exact same thing happens.

[00:30:19] Victor: And that's about to happen for AI video as well. And that's definitely the direction we're more excited about. I think there'll be other companies focused on, you know, entertainment, fictional content, video as we kind of know video today. Um, what we're most passionate about is, how do we evolve video into something new, right?

[00:30:35] Victor: I think that's, that's, that's just both on a personal level, I think it's very, very exciting and fun to do. Um, but I also think from a business opportunity perspective, um, there's just so much, so much knowledge to be driven here. If you take training and education, which is something that's, you know, very close to our use cases today, um, when you can watch a video, when you can ask the video a question, or the video can quiz you or examine you on the content you've just consumed, um, change the content, uh, depending on what you're interested in, right?

[00:31:06] Victor: You may have a kid learning mathematics, and this particular kid loves football, right? So let's make all the examples, let's make everything that you're learning about math be about football. Maybe for me, when I'm learning math, it would be about music, because that's the thing that I'm the most passionate about.

[00:31:22] Victor: If it's you, it's something else, right? Yes. Just the, the, the societal value we can create with this kind of personalized learning. It's essentially like everyone will have kind of a, a personalized tutor in some way. Right? That's so exciting. And there's, I guess, certain other examples, you know, in, in, in the corporate world, in the business world where, where a lot of value will be created. It's just going to be, it's going to be immense.

[00:31:42] Guy: Uh, I love, uh, I love that both kind of, you know, with the sort of the, the mission aspect of it, uh, but also, uh, the, the technology and the kind of lens on creation. Uh, and I guess in that sense, when you're creating things of that nature, you, the user is co-creating, uh, with, uh, with a platform and you end up having sort of these three way, uh, creation, uh, whatever combo here, right?

[00:32:03] Guy: In which there's the original creator that scaffolded the system and defined, you know, these things have to be there, and they are, um, they're set in stone. There's no, there's no degrees of freedom over here. Like it would always be these brand colors. It would always be, you know, these are the topics that need to be examined on, etc.

[00:32:21] Guy: Um, then you have the AI platform itself, or let's say the Synthesia platform in this context, filling in the gaps and simplifying creation. And then you have the user, with whatever their inputs are. Maybe it is whatever the favorite, uh, step that interacts with the platform.

[00:32:41] Guy: So it's a, it's a new type of creation, not just delivery, but you need to enable the delivery of that. Otherwise, you’re hindered.

[00:32:50] Guy: Yeah. Super excited. I've got like a million more sort of, uh, topics to sort of ask about, uh, about the creation side. Maybe we'll come back to that, but I wanna, uh, switch a little bit to talk about the journey of, uh, Synthesia because it's just super interesting. So you started the company back in 2017, uh, uh, kind of the AI winter type period.

[00:33:09] Guy: At the time it wasn't, uh, uh, it wasn't like the best fundraising strategy, uh, to come, uh, and come up with AI. And you know, today you're this sort of, you know, sort of high flyer, you know, I think you announced that you're over a hundred million, uh, in ARR and you grew there like massively fast. And, uh, and that's incredible.

[00:33:28] Guy: Uh, but it didn't look like that in the early years of it. Can you tell us a little bit about, just a bit of an abbreviated story of the, uh, of the early years? 

[00:33:38] Victor: Yeah, for sure. For sure.

[00:33:41] Victor: Yeah, I, I think in many ways, I think we're a kind of a, a classical story of, you know, looking, looking like fools for three or four years.

[00:33:49] Victor: Um, but then, you know, eventually it turned out that the bet we made back in 2017, when we founded the company, was right. But the story is that I'm from Copenhagen, Denmark. I grew up there, did a bunch of things by myself, worked in the Danish startup ecosystem, knew that I wanted to build a company, knew I didn't want to build, like, accounting tools and business process things.

[00:34:12] Victor: I love science fiction, fiction. 

Guy: Say you haven't, you've sort of built something a bit more creative than that. Yeah, uh, uh, and, 

Victor: And back then I was building, like, HR tools, and I learned a lot from it, right? But on a personal level, I love sci-fi, I love frontier tech, and I wanted to do something with that. So I moved to London and I spent a year trying to figure out what I wanted to build.

[00:34:30] Victor: This was just when AR and VR were in kind of a hype cycle. The Oculus had just come out and everybody thought that by now we would definitely all be in VR headsets all the time. I love VR still to this day, but back then I felt it's not really there yet. Like, there's just too many obstacles that are out of my control to build a big company around it.

[00:34:50] Victor: But I met Professor Matthias Niessner, my co-founder today, who had also looked at VR, but he'd come to look at more of the content creation angle. And he'd done this paper called Face2Face, which really was the first time the world saw a neural network producing photorealistic video frames automatically.

[00:35:08] Victor: And this paper made quite a big stir. And I saw it as well, and I just got obsessed with it. You know, I felt like this is super early, and of course it barely worked. And when we look back today, it doesn't look very impressive, but I just felt like there's something very magical here, right?

[00:35:23] Victor: I, I, this is gonna change first how we create content, and then it's gonna change content itself, as we've just discussed. And I just felt there were so many parallels. You know, like, uh, I, I love music. Uh, music is my big passion when I'm, when I'm not working, and especially electronic music. Uh, and I saw so many parallels to how we, you know, like,

[00:35:39] Victor: by now, like a lot of years ago, we started to digitize how you make music, right? You can sample things, right? You're having to use those, and with which you can build synthesizers, you can create sounds yourself out of nothing. Right? And everything that brought to us in terms of like new music genres, democratization of access to music production and so on.

[00:35:55] Victor: And it felt like, you know, what's, what's happening now is just that we're digitizing, um, video, right? And there's so many analogies to all the creative music tools that we're using. So I got very obsessed with that idea.

[00:36:06] Guy: Yeah. By the way, I didn't clue into that analogy. That makes a lot of sense.

[00:36:10] Guy: Which you've gone from whatever, a physical guitar to being able to reproduce that sound on, uh, on the computer as an analogy to you could produce a thing with a camera, uh, and now you can generate that on screen, so good job on that, like I love that analogy.

[00:36:24] Victor: Yeah, I, I, I think there was actually so much of like, um, my experience like thinking around these tools, uh, in my, in my, in my childhood that really, you know, it's one of the things that it's like in the back of your head that you're not like consciously thinking about all the time, but understanding how you make content digitally.

[00:36:40] Victor: I think it was definitely helpful. Anyway, um, we felt like, you know, let's try and start a company around this. It took me a very long time to convince Matthias, 'cause at this point in time I was 25 years old. I had no PhD and had absolutely zero qualifications to start this company at all, but I was just obsessed with the idea.

[00:36:55] Victor: Yep. Um, we managed to get a, get kind of a team together. Um, we tried to raise some money, totally impossible, got turned down by like 80 - 100 investors, something like that. Everybody just thought, uh, you know, the idea was kind of crazy, which it was like, it, it was, you had to extrapolate a lot to, to see this actually panning out.

[00:37:11] Victor: But also, at that point in time, especially in London, everybody wanted to fund, like, PhDs, right? Who would go out and hire their three best PhD friends and go on some journey to create some big AI company. And I think our strength actually ended up becoming the fact that I'm not a PhD and that I'm someone who thinks commercially, but people definitely didn't like that.

[00:37:33] Victor: At some point, um, my co-founder Steffen sends an email to Mark Cuban, cold. We found his email in a leaked Sony hack. He sent a very short email just saying, "Hey, we're building this thing. We think you'd fit very well." You know, he understands technology, he understands media. To which he replies within three minutes, very interested, and sends a bunch of questions.

[00:37:51] Victor: We email with him for 10 hours nonstop, never get on a call with him. Um, and he ends up investing a million dollars, with which the company was founded. And the lesson there was that Mark had himself implemented the Face2Face paper at home with a tutor. We did not have to convince him at all that in 10 years' time you can make a Hollywood movie from your laptop.

[00:38:10] Victor: He completely bought into that, he was essentially just evaluating the team. Right. Um, and I think that was, uh, like if you're doing something crazy, you want to find someone who shares your crazy vision of the future. Because with VCs, right, it’s so hard to convince them of something.

[00:38:25] Victor: They're not, they don't already believe. If it's not like, uh, if it's, if it's not like part of the current dark mouth, like the tech music.

[00:38:32] Guy: Yeah. And the VC world is very much, you only need to find one, but sometimes you need to talk to so many. I mean, uh, exactly. Uh, the Mark Cuban email is like, it's like a part of startup lore now.

[00:38:42] Guy: You know, sort of a, sort of, I've heard that story now being told not by you, you know, of people just sort of in part as like, they'll hold on sort of faith on it, uh, and part sort of the craze of it. But I, I guess what I don't know is how many such emails, like the ones, it's not that you sort of picked, hey there, one person we're gonna email is Mark Cuban.

[00:39:00] Guy: Yeah. 

[00:39:02] Victor: We sent, I mean a hundred, probably definitely in the hundreds, maybe within the thousands, right? We're just desperately trying to find someone to, um, 

Guy: Kinda help you get going on the journey.

Victor: Um, and, and I think, you know, looking back like it's, I understood why it was a difficult pitch. Like I had no, I was 25 years old, had no qualifications to start the company at all, at least on paper.

[00:39:24] Victor: Um, it was a crazy idea. It wasn't clear what the real use case was, if the technology was even gonna work. Um, I'm very grateful that, you know, that Mark took the bet on us. Um, and as it always is, right? Like raising money is like a milestone. But the real hard part began like after we got the money and we're like, okay, great.

[00:39:42] Victor: So no one believes in us. We have a million dollars. What can we realistically build and sell?

[00:39:48] Guy: And this is, and this is what year is this? Like 2017, you started the company. When are we now?

[00:39:51] Victor: This is 17. So we, we kind of, uh, we, we kind of in spirit, the idea originated in 16, but we raised the first money, in October, 2017.

[00:40:02] Victor: Um, and I think kind of like what we, what we essentially did when we founded the company, we did what you're supposed to not do, which is we bet on our technology, not on a specific use case or product. Right? Um, and so we had to kind of figure out, okay, what can we build backwards from? So we ended up with this idea of AI dubbing, which is like, you give us a real video, not a generated video.

[00:40:19] Victor: We'll translate that to a different language for you, but we'll not just replace the voiceover, we'll also reanimate the face, so it looks like you're actually speaking in Spanish, Italian, whatever. And I think the thesis we had there was, you know, it makes a lot of sense. Say you're an advertiser. If you live in Europe, you've definitely seen really bad ads, right?

[00:40:38] Victor: Where the mouth is completely out of sync in the German version. If we can, you know, for 10K or whatever, fix that for you, and all your ads look like they were shot natively in whatever language you're putting them in, that makes sense. If you're Netflix and you invest a lot of money creating content, and you could take all those films and make them look like they were shot natively in German or French, there should be plenty of economic opportunities there.

[00:41:01] Victor: Right.

[00:41:02] Guy: I love how this is a European pain point, or a pain point felt more in Europe. We had Mati from Eleven Labs on the podcast as well, and they also, a bunch of years later, started in that pain. But I think you just feel it more closely when you live in a non-single-language ecosystem compared to the,

[00:41:21] Victor: It's very true because like the first, the first two funding rounds were very, very, very difficult.

[00:41:25] Victor: Um, not just because of this, but it was definitely also a thing that the American VC ecosystem didn't understand the pain, right? You could explain it to them and they'd say, yeah, I kind of guess that makes sense, but they haven't actually felt it firsthand. Like, you know, if you're a company and you're trying to penetrate many different markets, it's actually really hard, right?

[00:41:42] Victor: And it's a big value proposition of the platform today. Like it is for Eleven Labs. But essentially that was the thesis. We built this thing. This is back when GANs were, like, the thing, and back then the technology basically did work if you looked straight at the camera without looking away.

[00:41:58] Victor: And we had two PhDs sitting for like two weeks to create a 32-second clip. So it was more like a visual effect. We were kind of like a visual effects studio with proprietary technology. Um, and after having done it for a few years, we did like a million dollars in revenue.

[00:42:15] Victor: Um, which is, which is not terrible at our stage, right? It kept us alive. But it was very clear to us that first of all what we're building was a painkiller. And I'm sorry, it was a vitamin, not a painkiller. Um, people thought it was cool, but no one would scream and yell if we disappeared the day after.

[00:42:29] Victor: Right. We were not in control of the go to market because we were selling this to video professionals, like agencies, production companies, Hollywood studios. And we were such a small part of, like, we did this very famous David Beckham ad, right. And obviously, you know, 98% of the bulk of the work of making that ad was like someone comes up with the idea that David Beckham should do this ad, someone pitches it to David Beckham, someone arranges to film it.

[00:42:52] Victor: And then we were like the very small part at the end where we translated to different languages. Right. So we were not in control of our go to market at all. Like, we were basically at the whim of all these agencies deciding to work with us. Right. Um, but more importantly, it just didn’t scale.

[00:43:03] Victor: Like the economics just didn’t make any sense. And so we kind of went back to the drawing board. We just went out and spoke to hundreds, maybe even thousands of people, trying to understand like, what is video? You know, why do people make videos, um, who’s not making video but wants to make video? And I think that was one of the advantages we had being very young and being very naive, is that you can really approach a space with zero baggage, right?

[00:43:25] Victor: Um, and what we discovered was all these people in the corporate world, and like smaller companies, they're contacting us because they think our technology does something it doesn't actually do, but they really are very intrigued by it, right? And we can tell these people, like, they're all telling us they wanna make a video, but they can't today.

[00:43:40] Victor: They don't have a camera, they don't know how to use a camera, they can't get budget from their boss. And they're all saying roughly the same thing. Like, I'm making all this text content and this slide deck content, and I know that no one reads it. And if they do, they forget everything immediately after.

[00:43:54] Victor: I wanna make videos. I've tried to make videos and it works much better. But the scalability of video production, it's just not there, right? It's so much of a pain to make a video. You can't edit them after. And so we spoke to these people, and we figured out pretty quickly that these people are not trying to make Steven Spielberg films, right?

[00:44:10] Victor: They're generally making very simple videos, like corporate video of someone talking to the camera interspersed with a screen recording or someone showing how to do something, picture of a product. It's a very simple concept, right? And of course for us it was like, well, that's actually pretty good, right?

[00:44:25] Victor: Because there's two things that are interesting here. First is that the domain of video they want us to generate is much, much smaller. It's basically a person talking to the camera, right? And secondly, the quality threshold is much, much lower for these people. If you go to a Hollywood director or some video agency, they will not use your AI tooling unless it looks exactly as good as the real thing, because

[00:44:45] Victor: that's the bar, right? For these people, the bar is not real video, the bar is a PowerPoint. So they're valuing the outputs very differently. And that was the thing that we latched onto. We thought that was very interesting, and that got us to this idea of avatars: that you create this sort of digital persona of yourself.

[00:45:01] Victor: Or you take one of our off-the-shelf avatars, you just type in the script, and you'd get a video out. And then, you know, we launched that, and that started working really well. Clearly, we hit something there. We built the editor around it so that you could finalize the videos yourself inside this sort of PowerPoint-type UX, a little collaboration platform, a content management system, publishing, and

[00:45:20] Victor: that's how we got to where we are today. So yeah, I think it took us three or four years of pain to really get through. It's also a classical story of, you'd rather be too early than too late, right? We managed, like,

Guy: As long as it's arrived. 

Victor: When, when the wave sort of hit us, we were just so well positioned, right?

[00:45:39] Victor: And we're years ahead of, uh, any of our competitors.

[00:45:42] Guy: Amazing story from a journey perspective. Also, I think super instructional for, for the world of sort of software development today. Because I think when you look at, you know, agentic creation today with software development and you look at the revolutions, I think that distinction about who it is for is so critical.

[00:45:58] Guy: You get such extremely conflicting opinions from people about coding assistants and about vibe coding and those creations. And I think what people often overlook is, there's a key distinction between someone saying, hey, I could have written this code. You know, in fact, I'm very good at it.

[00:46:19] Guy: And this is just about optimization. So does it produce code that really thinks about all the different small details that I think about? And I'm amazing, and that's a very, very high bar. Clearly something you'd want to attain. But also you kind of start doubting, you know, is it really producing value here, or am I just spending more time babysitting than I would if I had written it?

[00:46:39] Guy: And then you've got people on the other side who are using whatever it is, Bolt or Lovable. Uh, and you know, they, before they couldn't produce software, couldn't create software just like, you know, the user base you're talking about, which is they just couldn't create a video. So now you make it available.

[00:46:57] Guy: So anything that is functional is already magic, is already something that was not available. It still needs to be good enough to be something that they will embrace. And I guess those worlds eventually converge, or they don't really converge, they become a gradient of the same domain, because I guess we also anticipate that eventually AI-powered creation will permeate the stack, you know, all the way up to quality.

[00:47:27] Guy: But there will still be different tools that you would need at different kinds of, uh, quality caliber and skillset orientation. Sure.

[00:47:34] Victor: I think, I think there's like the, in our world we talk about like, is this like high stakes content or low stakes content, right? And I think it's like the same in, in, in the code dev world where, what I think is gonna happen, right?

[00:47:47] Victor: Is that we're gonna see a lot of this low-stakes stuff, like for personalized things. Like I make the to-do lists app. I feel like everyone always thinks they have the perfect recipe for a to-do lists app, right? And there’s nothing out there that really suits exactly what you want.

[00:48:00] Victor: That’s probably one of the things for, like, vibe coding it, uh, for someone like me who’s not, like, uh, like I can sort of duct tape things together—that might be really cool for me, right? And maybe I’ll even pay, like, I dunno, $5, $10 a month to just, like, run this somewhere without any kind of pain.

[00:48:14] Victor: But that’s very different from me saying, Hey, let’s, like, vibe code, um, a CRM system for Synthesia, right? Like, that is such a high stakes environment where the quality of the software really matters. The security really matters. So, it also feels like there’ll just be, as to your point and what we talked about before, right?

[00:48:30] Victor: Mine is like Hollywood films or like, uh, you know, a corporate person just like creating product marketing content. And in your world it's like, are you vibe coding a to-do list app for yourself or something like small and fun for your family or maybe like an individual tool for your team that does some like data transformation?

[00:48:47] Victor: Or are you like building a CRM system? And it feels to me at least in a lot of the conversation the last two years, people kind of conflate the two or they think of it as one. Right? And I think it's just, it's not one, like it's, it's, it's two, like, very different use cases. And of course there are many others, but I think it's like where we, where we've seen a lot of the, the new AI tools that work well and are actually delivering value

[00:49:06] Victor: is this kind of more like low stakes environments where you enable people who haven't been able to do it before to do something new. Right. But in, in the kind of high stakes, no matter if you're building like enterprise CRM software or if you're making like Hollywood films, the tools are just like, not there yet.

[00:49:21] Victor: Right. But, they obviously will be at some point.

[00:49:23] Guy: I think, uh, maybe indeed. Let's sort of use this moment to, to talk a little bit about the people, uh, involved in, in creation here. Um, and so I, I asked you before, you know, whether, you know, video skill sets, uh, or video editing skill sets of today are relevant.

[00:49:38] Guy: And as I understood it, they're sort of like, they're helpful in the kind of crafting of the visual storytelling. Uh, so look, they're, they're a little bit valuable, but you're mostly sort of making video accessible to people that are, that are not, uh, um, not requiring that skill. I guess, how do you, you know, maybe two questions in one, you know, one is like the, the response when you do engage with people that are maybe stretching kind of your tool's capabilities that do come from some, like, they're in charge of videos in the system.

[00:50:07] Guy: They do have some video editing, uh, capabilities, even if they're not the producers. Um, you know, how is, kind of their response to it. And then maybe we, we kind of go from there a little bit to say like, how do you see this playing out? Do you, do you, do you still see a, a place for a role for kind of expert video creators?

[00:50:26] Guy: Uh, or, or is this really meant to be like, quite substantially offloaded? Uh, from, uh, from, from the world of, you know, of, of jobs.

[00:50:36] Victor: Alright. So what we definitely see is that people with video experience are much, much better at creating in Synthesia. They become power users much faster and they have a suite of tools they use depending on what they're doing, right?

[00:50:48] Victor: So obviously, I mean, they’re not trying to create Super Bowl ads inside Synthesia, but they really appreciate the tool for its ease of use and for the speed at which they can kind of churn out content, right? Um, I think what sets them apart is there’s some degree of technical ability—like understanding how video works and, like, you know, how you should place things, et cetera.

[00:51:08] Victor: But really actually think a lot about something as statutory as taste right. It's like mm-hmm. How do you make a video that catches your attention the first 10 seconds, right? That's not an, that's not a science. That's, that's kinda like an art, right? Um, what avatar should you pick for this market?

[00:51:23] Victor: Like, well, how do you write a script that's like fun and engaging? How do we make sure the video's not like, way too long or like way too short? All this is like taste and that stuff that, that's taste that you build because you spend a lot of time thinking about videos and making videos. And I don't think, uh. The more videos you make, of course you'll get better at it at Synthesia.

[00:51:39] Victor: But it’s very clear to me that the people who understand the language of video from the start are much better at making videos. And I actually don’t think that will go away. I think that’ll remain true for, for, for like a very long time. The other end of the scale, with like these kind of new users, that’s something that they had to learn, right?

[00:51:55] Victor: So we help them with tooling. We talked about this AI agent we have that can kind of, you know, make content on behalf of you. And we've come up with kind of our own framework that we call FOCA for, like, how do you take a message that was sort of delivered in text the first time, but then you wanna make it into video?

[00:52:09] Victor: What are some good ground rules for doing that? Um. I'm very much of the opinion that I think, uh, people with, with video skills. Right, of course. You know, assuming that they kind of migrate onto this new tool stack, no matter if it's Synthesia or if it's like a more creative tool,

Guy: That they don't resist it too much.

[00:52:23] Victor: Yeah, exactly. They will be, they will be the best at doing this stuff. They would have to learn new skills, such as, what does it mean when a video is not just broadcasting? How do you personalize a video in a way that's meaningfully different (mm-hmm, mm-hmm) from just, like, a slot filled with words, like saying, “Hey Guy, hey Victor, love Synthesia, um, you know, love Tessl,” et cetera.

[00:52:41] Victor: Like that's not gonna cut it. How, what can you do that's creative and using LLMs and agents to kind of, um, create this experience, right? Because there's so many open-ended questions there. I think video editors will be much better at doing it, at like kind of evolving into that, right? For the casual users,

[00:52:56] Victor: It’s, uh, it’s more a journey of getting up to speed with this. Um, and in designing the app, I love this design principle called MAYA, which is “most advanced yet acceptable.” Which essentially means you kind of have this sort of window, right, of like the things that are like, too scary for people to do.

[00:53:11] Victor: And then there's things people are very familiar with and you wanna figure out where you can, where can you sit in under this kind of spectrum, right? Where you can, in our case, this is said many times now, we kind of anchored on, on PowerPoint users actually. So like, okay, make it 75% like PowerPoint because that makes people feel familiar.

[00:53:26] Victor: But then we also push the boundaries of what you can do, right? Like there's some new functionalities you have to learn, and then slowly, as more and more people onboard, more and more people try Synthesia, we move that window, right? So I think in, in, in, probably in, in like five years time or something like that, maybe, maybe less, probably the app will look very different than it does today.

[00:53:42] Victor: It won't be as, as anchored on, on PowerPoint. But we've taken the user base from something that they're familiar with and pushed them into something that's new and exciting. And, and it should almost for the people who've been with us for a long time, feel so seamless that you don't really think about it.

[00:53:55] Victor: It's like all of a sudden now you're just making conversational videos that are interactive. That are personalized by default. Um, and so on and so forth. Uh, right. So I think both those users are really, really, really important.

[00:54:04] Guy: I think, uh, I love the MAYA principle as well, and I, I think probably the best, uh, uh, kind of example of that is the gestures on the, on the smartphones, right?

[00:54:14] Guy: Where initially it’s like, what’s the gesture? And at some point, there were some big changes where you can actually start being demanding there. And people want some form of like, you know, funky way you would twist three fingers to, uh, sort of achieve some specific, uh, examples, and that might be okay.

[00:54:28] Guy: Um, I think so. So with that evolution, I guess, do, do you still see, uh, kind of, I guess this remains the notion of like video creation, it sounds like you still see that as a craft, uh, and it's just a different craft than it is today. What about the, the, the, the production, I guess sort of the whatever, the person with the camera sort of shooting, uh, shooting a video.

[00:54:49] Guy: Do you see that mostly disappearing? Take a longer timeline, right. 10 years out?

[00:54:55] Victor: I, I do think that's, that's probably like, uh, more at risk for sure. Um. I would still say that something like video, I mean, I came to video having never done video before, right? So I learned everything from starting a company.

[00:55:11] Victor: I think it is one of those disciplines that when you get into the weeds of it, most people don’t understand how much work it actually takes to create even a good looking corporate video of someone just talking to the camera. When you watch the video back on LinkedIn, it looks like, oh, whatever, someone put up their smartphone and hit play and said something.

[00:55:27] Victor: In reality, like, there's, how does the light fall in, right? How does the camera move? What should the angle be? What should the framing be? Um, I guess all these are kind of like stylistic decisions that you make. And of course those are not usually made by camera operators, but there are degrees of like choices camera operators have.

[00:55:42] Victor: So I would still say, I think a lot of these people are like, you know, have a much better foundation for being the one who picks up these AI tools and then become the experts. But I think it will require a migration of their skillset. And I definitely think that the job of, uh, carrying a camera and filming things in the real world, that will probably be a less attractive, uh, business to be in, in, in five or 10 years time.

[00:56:05] Victor: Right. Just like being a typewriter, um, is also not very, um, a very true business to be in today. It's always very hard to, I think, you know, predict exactly what's gonna happen. And I actually don't think that a real video is gonna die and all of a sudden everything is gonna be generated. I think we'll put a lot of value on real video in some sense.

[00:56:22] Victor: Just like we still like to see people playing piano or guitar or, um, we still listen to vinyl. Like there's, there's an emotional part of it and there's also something I think we just, as humans, we, we, we ascribe some, some value to that, right? We'll still read physical books despite the fact you could read everything on an e-book. I think it’s not going to go away, it’s going to coexist.

[00:56:43] Victor: Um, probably what is gonna be more risky is a lot of the business content, you know, where that, that's sort of per definition. There's less romanticism, nostalgia, creativity around it. Um, so I think it is a market that will shrink, but I don't think it's gonna disappear completely.

[00:56:59] Guy: Yes. No, I love that.

[00:57:00] Guy: Sort of super succinct on it and I, uh, um, I love how you keep coming back to kind of, I guess, historical references of music. Here is a great analogy of, you know, what, what has happened there. And, uh, I do that for a bit and I think clearly we kind of face that question in the world of software development of what will happen to software developers.

[00:57:17] Guy: And I think it's hard to predict the specificity, but like software development as a craft, hard to sort of see that going away or becoming any less important. Uh, coding as a skill, as a sort of the means of doing it probably changes, uh, or, or diminishes and maybe becomes a bit more the, you know, artisanal carpentry versus the factory line, uh, production, uh, example.

[00:57:43] Victor: I think also one thing we see is that, we did an analysis recently of how people prompt like our AI agent to help them make videos. Right? And I don't think it's a surprise and I'm, I'm curious what you're seeing, but I'm pretty sure it's probably the same in software development, right? Which is just, um, maybe an expression of like this idea of like, taste.

[00:57:59] Victor: It's just like how good people are prompting these things, right? Like if you take someone who's never made a video before and you ask them to like, prompt a video, describe what you want in the video, they're clueless most of the time, right? It's like, make me a video about our vacation policy. It's like if you get someone who knows video and who's, uh, an expert on video, right?

[00:58:17] Victor: You get this beautiful prose. They clearly have a vision in their mind, like what they want or what they think the video should look like. And so they can help guide the system to actually create that for them, right? And I think that's the thing that's pretty hard to automate, right? Like we still can't go into brains and, like, you know, kind of hijack internal LLM to, like, come up with the best idea for how they should put something out in the real world.

[00:58:40] Victor: And if you give a very limited prompt, like that is, that is what the system has to work with most. I mean, you can add a lot of context, right? But I think that that's definitely something that's pretty interesting, right? Like you, you see this in, in kind of people's ability to prompt, um, that the more taste they have, the more expert they are, the much better they're likely to be with these systems.

[00:58:57] Victor: I have a lot of friends that are assistant illustrators, and like when I go in and try to use Midjourney, for example, I face the same problem. Like, I don't know what to write. Like, I want a picture in a jungle, but when I have my friends, the illustrators, do this, right? It's like a three paragraph thing, and that generates something, like, you know, 10 times cooler than whatever I can do.

[00:59:15] Guy: Yeah, yeah. As well as their assessment of what gets created. And uh, I think that's a really good spot. And I think in the world of software that translates to kind of architectural and product skills, uh, probably more so they're still kind of elevated at a higher resolution than, you know, use this library or write that line of code, uh, much more about, you know, the, the priorities and the, the bigger picture. So just Victor, before I, uh, again, so many more questions.

[00:59:41] Guy: So much I'm sort of, uh, keen to ask, but I think we're running outta time. Maybe just to, uh, uh, move out of all of these and really talk a bit more about society. You've had an interesting, uh, uh, sort of statement in a TED talk recently where you kinda made this bold claim where you think maybe even as a society, as video creation becomes easier, uh, text will play a smaller and smaller role in our, in our world.

[01:00:07] Guy: And switch to video. I guess, what's your, what's your prediction around the societal change that this type of video generation will bring.

[01:00:14] Victor: Yeah. So, uh, I had this, uh, very provocative one-liner, right? Which is that, uh, your kids' kids are not gonna be reading and writing. They're gonna be watching or listening almost exclusively, which definitely like, set people off, uh, which is also, uh, 

Guy: it definitely gets people going.

[01:00:28] Victor: Yeah. It's really going, which is also intentional, right? But I think the core idea just is, is, is pretty simple. If you look at the history of media technology, right? It's very clear that we're always in this, you know, kind of trajectory towards more visual content and more interactive content, which essentially is content or like information consumption

[01:00:46] Victor: that feels more like the real world, you know? Um. There has been a huge barrier to that historically speaking in the fact that, like making a video, for example, is way more work and cost and, you know, time et cetera and skill than it is to write a document. So most of the things we've kind of been consuming have just been text.

[01:01:02] Victor: Text is an amazing technology. Um, text is sort of like a compression, um, algorithm in some sense, right? Like, we have something that happens to us in the real world and we can kind of like, put that down and we can store that in a scalable way. And of course, you know, the world is built on textbooks. We wouldn't have been here without, without text.

[01:01:16] Victor: But the question is, if the friction to storing information, ideas, or knowledge in a video format versus a text format, if it literally is as easy as that, right? What would that mean for text? Will it mean we'll still be reading and writing? Or will it mean that we'll just default consume almost everything as, as, uh, as, as visual or kind of like real life content, right?

[01:01:38] Victor: It's a freaky idea. But if you imagine that, if you believe that AR and VR in 50 years or a hundred years, the form factor will be that we're just wearing a lens or something, and we have screens at all times. I don't think it's a completely preposterous claim to say that we probably will be consuming most of the information, um, as, uh, as as video and audio and, and in interactive formats.

[01:02:02] Victor: A lot of people have, um, very strong feelings about that. And, and I, I kind of get that right. I think we, we, we, we, we kind of really value reading in society. Mm-hmm. Like sit down with a book is like, is it probably represents a lot

[01:02:13] Guy: It represents the, the focus, uh, area, the, the exactly explicit attempted learning or sort of, you know, immersing in a story.

[01:02:23] Victor: But I think we have to ask ourselves the question, right? Of like, if you're learning something, um, I think few people would say, oh, we think it's better to sit down and like read a book than to be in a classroom with the world's best tutor who can talk to you and interact with you and also has a whiteboard, where they can show you relevant pictures or images or graphs or flow diagrams for whatever the thing you're learning is, right?

[01:02:43] Victor: Like most people would probably agree that that's a better way of learning. Um. That's probably gonna be possible in the not so distant future. Right. Um, and so I think it'll be interesting to see how all this stuff pans out. Right. Of course. I'm being a bit tongue in cheek when I say that. I think there'll probably still be a use for text, like texting, et cetera.

[01:02:59] Victor: But I do think a majority of the way we consume information is going to be video driven. Um, and I kind of ended on this, I think the thought that I have a lot, which is it's, it's so interesting how we put so much moral value on reading books, including myself, right. I read less than I did five years ago for sure, because I've just found that often

[01:03:16] Victor: I'll buy a book on Amazon and it's 200 pages and I'll spend like a day reading it and feeling half the time reading it, feeling like I'm, they're just rehashing the same examples over and over with so much filler and a lot of these book fight. Right. Which makes sense. 'cause you can't, you can't, you can't publish a book that's 50 pages long.

[01:03:32] Victor: Right. You can't sell Exactly. 

Guy: You can’t sell it for sort of any real amount. Yeah. You'd feel cheated. 

Victor: Yeah, exactly. And so often I've just found that when I wanna learn something new, right? I go on YouTube and I watch a 25 minute video, but I feel almost guilty for not reading the book and watching the YouTube video.

[01:03:46] Victor: Um, and that's, so that's, that's, that's, that's something that's just like, I think philosophically kind of interesting. Like why that clearly, like if it's, again, if it's music or something, right. Obviously much better to learn that on a YouTube video where we can hear something and see someone playing as opposed to sitting and reading sheet music or reading a book.

[01:04:01] Victor: It's just. It makes no sense to try and read music from a book. Right. And I think a lot of cases like that. Um, and then maybe the last, uh, thought I had on this was, I think there's this, uh, there's always this sort of moral panic about like, new technologies that's been consistent throughout all world history.

[01:04:18] Victor: And one of the things we're hearing a lot about now is that young people can't focus, the attention spans are decreasing. Uh, they just need, like, you know, quick dopamine hits from like TikTok videos. Um, and you know, maybe that's right, but I have this provocative idea, like, what if young people are just becoming so good at consuming information because they consume at such high velocity all the time, that they're actually just really bored by a 200 page book?

[01:04:41] Victor: You can't, you can't keep our attention because why would you, you know, you can watch a YouTube video that's super dense, super interesting, it's exciting, it communicates extremely well. And then now you're forced to sit and read this like, 200 page book, which is, is, is not very exciting. Right? So the question is like, are we getting better at consuming information?

[01:04:57] Victor: And that's kinda reflected in the content that we consume. Or are we becoming lazier and our ADHD brains have just like taken over and we can't focus on anything? I don't have the answer, but I feel like that's an angle that's very rarely discussed, right? Like, everyone always goes to kind of like the negative impact of saying like, people can only watch like 30 second TikTok videos.

[01:05:15] Victor: And I just, I, I just, I thought that was kind of an interesting idea to toy around with.

[01:05:16] Guy: First of all. Like, I love that you're sort of, uh, challenging the conversation. I think like as long as, uh, as long as you're sort of doing it to sort of facilitate a conversation versus, you know, I don't think there was any like, uh, agenda in sort of, you know, sharing this.

[01:05:30] Guy: It's more about sort of the, uh, the narrative. And I, you know, I initially was dismissive when I sort of saw this like, oh, Victor's just being, uh, just sort of, you know, stirring, uh, stirring a mess here. Uh, but then I did think about, uh, emojis, you know, are actually like a good example probably of a live, uh mm-hmm.

[01:05:45] Guy: A live example of something that, that literally is happening. People are switching to a visual. Uh, and, and I, even like an old person like me, sort of still sort of, uh, appreciate sort of the value of conveying an emotion in that fashion versus a bunch of words. Um, and I guess the, the part that I, uh, am still kind of uncomfortable with is, is almost thinking about the degrees of freedom that we leave ourselves.

[01:06:10] Guy: Uh, and so I think the difference between watching a video or having someone illustrate to you versus reading a book is that it leaves room for imagination, right? When you read a book, then, then you actually get a lot less, you get it takes more time and, but to an extent you get a lot less information.

[01:06:25] Guy: 'cause there's only so much that can be described, uh, in the book, especially when you're talking about, uh, uh, uh, maybe in in business books it's a little bit different, but like in a storytelling type setting. And so it leaves you the degrees of freedom, uh, you know, that, that we're talking about leaving it to the LLM.

[01:06:41] Guy: Uh, and I guess the question also is how do we, how do we balance those on it? But, I agree with you that society will move towards kind of the path of least resistance, uh, and, you know, what is, uh, what is sort of a, a means of sort of running and scale. Um, so it's interesting to sort of see how things play out.

[01:06:58] Guy: Well,

[01:06:59] Victor: I totally agree and I think the argument around, um, your imagination, right, is the one that I’ve met with most often, and I think that's very true for sure. Um, in a video, you're generally served everything and you have to do less imagining, um, but on the other end, right? Like a lot of that book could be consumed as like a podcast, for example, or like audio, and you still get that imagination.

[01:07:19] Guy: Yeah. And, and that, that's an example. Many people listening to this are doing precisely that and, you know, I'm a big audio book fan and so,

[01:07:26] Victor: yeah. And, and I mean, I'm, I don't have kids, right? But, uh, but if I did, I'm also pretty sure I wouldn't be like, Hey guys, you know, just, just stay on TikTok, you know, let's ditch the books.

[01:07:34] Victor: It’s just that, you know, uh, I think we'll look back at this period in time and this evolution very differently than we do right now. 'Cause we always do, right? Like my favorite example of this is, if you go back to like the forties, um, when radios kind of became mainstream, something everyone had, right?

[01:07:51] Victor: The parents, you, you could find all these like old newspaper clips, uh, right? Where parents are outraged that their kids are like sitting in their rooms and they're listening to these like cheap low-quality radio novels, right? They never leave their room. They don't wanna go out, they just wanna sit in their room and listen to these novels, which is exactly what people say about things like TikTok videos or computer games, whatever, today.

[01:08:10] Victor: And, you know, just in my childhood, I grew up obviously spending a ton of time on my computer playing lots of computer games. I remember this was when I was, uh, I guess like around 10, 11, 12, 13 years old, something like that, right? And there was a very real fear that you would become a school shooter if you played too much Counter-Strike, right?

[01:08:28] Victor: Or that you become a violent person. Um, there’s famously a Senate hearing around Mortal Kombat, right? 2D model, which, um, people were outraged 'cause they were convinced that, you know, this would create a generation of young violent men that had learned to fight from playing too much 2D Mortal Kombat.

[01:08:47] Victor: And it’s like, we look back at this stuff today, right? It is absolutely ridiculous. There’s a kernel of truth to some of it for sure. But I think, well, I, I think we’re, I’m very curious to see like in 30 or 40, 50 years’ time when I’m gonna be sitting scrolling TikTok in the retirement home and I’m gonna be like, I was right at the time.

[01:09:03] Guy: We’ll see if that arrives. Uh, Victor, this has been excellent. Thanks a lot for, uh, coming on the show. Congrats on sort of the amazing journey and looking forward to seeing it continue both, uh, society-wise and as the business evolves.

Subscribe to our podcasts here

Welcome to the AI Native Dev Podcast, hosted by Guy Podjarny and Simon Maple. If you're a developer or dev leader, join us as we explore and help shape the future of software development in the AI era.

