Sar's Scatter Brain
Rise of Generative AI will be comparable to the rise of CGI in the early 90s

My chat with Cristóbal Valenzuela, Cofounder & CEO of Runway

November 01, 2022

Today’s Scatter Brain is brought to you by AngelList!

AngelList Stack is for startups that want faster fundraising, cleaner cap tables, and high-interest banking all in one place. Scaling companies such as Abound, Harness Wealth and Syndicate migrated from other vendors in less than a week with zero legal fees to gain an unfair advantage in managing their back-office. Learn more by signing up here.

Runway, the video-editing software startup, has received a lot of attention on Twitter over the past few weeks for its product demo videos of seemingly magical video-editing features. Those demos have captured the zeitgeist amid a wave of machine learning developments in content creation. What many might not know is that Runway is not a new startup. It raised a seed round from Lux Capital and Amplify Ventures in 2018 and has evolved considerably since then. I reached out to Runway’s CEO, Cris, for a deep dive into their journey.

In this conversation, Cris and I talked about:

Being in the limelight after years of toiling
Early sentiment around AI in content creation
Runway’s evolution from a models directory to video-editing
Video editing as a commodity feature
Importance of video-based storytelling to companies
Product focus on speed
Web browser as a tool-making platform
Command line as the most important UI of the next decade
High product engineering velocity
Underrated ideas in AI
Ethical dilemmas around generative AI

Sar: How is your team feeling? You spent years under the radar working on technical problems and are suddenly in the limelight.

Cris: It’s been really rewarding to see people get excited about what we’re building and start to understand not only what’s possible but what’s coming very soon in this space. As someone more used to building behind the scenes than being in the spotlight, it’s definitely been an adjustment. But while the buzz has been fun to see and be a part of, it’s really just a manifestation of what we’ve always believed was possible and of all the hard work the team has done to make it a reality. Seeing the world at large come to the same realization we’ve always held, and embrace it the way it has, fuels us to keep building products and features that push the boundaries of what’s possible. So the team is feeling excited about what we’re building.

Sar: You did research at NYU and were around emerging technologies used for art and expression. The state of machine learning in creative domains wasn’t nearly as talked about outside of research circles back then. What were the ideas you were most excited about? Talk about those days.

Cris: Before studying at NYU, I spent a lot of time on computer vision applied to image and video understanding. I was fascinated by new paradigms of computational creativity and by what neural techniques might enable for artists and mean to them. As you said, at the time the application of machine learning in creative domains was pretty nascent, and we had a lot to learn about what was possible in that space. It’s something I wanted to investigate. Long story short, I went down a rabbit hole and became so obsessed with the topic that I eventually quit my job, left Chile, and enrolled in NYU’s ITP to study computational creativity full-time. While I was at NYU, I was surrounded by artists, so naturally I spent significant time building tools for them. This has heavily inspired how we build Runway today. The first iteration of Runway was built on the same principle that guides us now: AI is a new computing platform that requires new expressive tools.

When we set out to build Runway, market maturity was the biggest challenge. The technology was such a new concept that people either weren’t fully aware of it or were struggling to grasp it, so we had to make sure there would be a market for what we were building once we built it. We had to define that market ourselves, since the application of AI in various industries, especially for content creators, was either brand new or didn’t really exist yet. It seems like a no-brainer today when you look at how many companies across so many industries are incorporating AI, but that wasn’t always a given.

Once we determined market viability, the challenge became realizing our vision with lower-quality models. Initially, we were working with models and techniques that were extremely nascent, and we worked every day to get the outputs we wanted from them.

Sar: You started Runway in 2018. It was a directory for others to experiment with machine learning models. Over time, people started asking for models that would help them simplify and speed up the workflows involved in video editing.

Cris: As you said, in the first iteration of Runway, we built thin abstraction layers on top of models. The product enabled users to deploy and run machine learning models easily, abstracting the inference and training process behind a visual interface. We started there because we knew a fundamental shift was happening around a nascent technology that needed to be experimented with and tested.

As the model directory and user base kept growing, we started seeing a usage pattern emerge around video creation. Video editors and filmmakers were coming to Runway because they saw huge potential in using models to reduce their manual workflows, such as rotoscoping objects by hand, frame by frame. So a common request we started to get was for workflow optimizations: “Hey guys, I saw you have a model to do X. Can you also do Y and Z? I hate having to do this by hand.” That is how our Magic Tools like Rotoscoping, Inpainting, and Clean Audio were born, and it just made sense to keep building for the video space.

Sar: How did the vision for what Runway is today come about?

Cris: As we started to build the first prototype of this new approach to video editing, using everything we had learned during the model directory phase, we quickly realized how much broader its impact could be. By simplifying the editing process, besides cutting the time and cost of making video, we are democratizing video editing and filmmaking at large, opening the door for a whole new generation of creators to use software in a more intuitive, collaborative, and powerful way. We started by building models specific to video editing, which we now call AI Magic Tools, such as Green Screen and Inpainting. Then we realized we needed to serve the entire video production process.

Sar: What was the top challenge as you made that transition?

Cris: The biggest challenge in transitioning from the model directory to a video editing platform wasn’t on the AI front. We already knew how our models would help enhance and speed up our users’ workflows. The challenge was building out the fundamental editing features to make sure we provided enough baseline value. We had to figure out which core features mattered most and how to build them.

Sar: Right. You have to reach some level of feature parity in a well-defined market to win over professional users who have spent years using other tools. Novelty alone is often not enough to drive adoption. How do you think about the market?

Cris: Video editing is a commodity, so it’s not enough to build a video engine with marginal improvements in a pretty package and expect users to switch away from the legacy tools. It’s a very saturated market, and feature parity on classic editing is the baseline, not a differentiator. To actually leapfrog the incumbents, you need a strategy for consistently building innovative products that are net new to the market. If you build those great products, everything else will follow.

At Runway, we don’t settle for incremental improvements that merely maintain the industry standard and make us marginally better. We constantly think about how to optimize the process and create brand-new tools for a new generation of creators and companies. That’s where new tools like text-to-video come into play. These tools will open up video creation to a much wider audience of people who don’t know how to use complicated, technical legacy products.

Sar: Let’s talk about the text-to-video creation you mentioned. People can type prompts to produce a video that fits the description. I, like everyone else, had a “holy shit” reaction to your demo video! Until now, Runway has been about the video editing process, which means your users have had to capture videos with other tools and then bring them to Runway to edit. You mentioned that video editing is a commodity. Talk about how you think about differentiation. Has creation always been part of the vision?

Cris: We consider video editing in Runway a feature inside a general-purpose tool, not a core differentiator. That’s where our research in video and image synthesis and content automation comes in, and that is what ultimately differentiates Runway. To date, the technology has largely been applied to editing, but we’re seeing more and more of what’s possible in creation and are leaning into it with new tools like Text-to-Video. These tools help automate creative processes and help our users bring their visions to life. This push into creation is not a new vision for Runway. We’ve always envisioned it; it’s just been a matter of building the tools to make it possible. The technology is now at a point that allows us to expand in this area.

Sar: Your homepage emphasizes speed. Why do you care about speed?

Cris: One of Runway’s core moats is speed, and another important one is accessibility. The emphasis on these two areas is intentional, and together they simplify the entire video editing and creation process.

More and more companies realize they need to produce professional videos. Being able to build consistent narratives is extremely important for many businesses to reach their audiences and clients. The truth is that, when it comes to customer engagement, video converts much better than any other format.

The obvious question, then, is how do you create good content? How do you create professional videos without necessarily engaging an agency or a professional studio? Most importantly, how can you keep making new content daily? That’s where speed is paramount. Hiring a production house or an agency might yield one video a month at a high cost, and that applies to ads and creative videos alike. If you need to edit a webinar or an interview to post on social media, hiring an editor might be too expensive and slow to consider in the first place. You need video, and you need it fast. This explains why video continues to see strong tailwinds. Customers across many segments consistently rank video among the most important creative mediums to invest in and work with in the future.

Sar: The Adobe-Figma deal has put multiplayer mode and browser-based creative software front and center in the zeitgeist again. Talk about how you think about them when making creative tools.

Cris: The browser has become the new tool-making platform. We’ve always felt strongly about it as a foundation for our products. Today, you need to build on the web. It’s that simple. The browser unlocks game-changing functions like multiplayer, which speeds up collaboration and expands its scope. From a technical standpoint, a few factors have contributed to the browser’s consolidation as a sandbox on which you can build almost any application. WebAssembly has matured remarkably, enabling tools like Figma to feel native. Talent also matters: finding and hiring people to build on the web is easier than for any other stack. The browser is becoming a full-featured OS. We can do things on the web that weren’t possible a couple of years ago.

The idea of opening a browser tab and immediately having access to countless creative tools is very powerful. For Runway, making a video should feel similar to how you collaborate on a Google Doc. There’s no need to update versions, share assets among your collaborators, or come up with a complicated version control system. This is becoming table stakes for any meaningful tool today.

Sar: You recently tweeted “the most important user interface of the next decade” with a picture of a text input box. On the one hand, it is a thought-provoking take that appreciates how far we have come with machine learning models that convert text inputs into images and video outputs. Text, as a medium of expression, is more accessible than pictures and videos. On the other hand, it’s really funny that we have come full circle in the long arc of computing that started with command-line interfaces. Can you expand on your thinking on this topic?

Cris: Making professional content should be fast and easy, on both the creation and editing fronts. To date, content creation and editing have been locked behind the complexity of legacy tools and platforms that either take a lot of time to learn and use or cost a lot of money to outsource to people who already know how to use them (and who still take a long time to create). Creative tools should be available to everyone, and that’s what we’re trying to help unlock. Using text and natural language as the engine for creating video levels the playing field for anyone who has an idea and can express it in words.

But even with all of these advancements, we’re only scratching the surface of what will be possible. I believe that in the near future we will be able to generate a film completely. When I say generate, I mean the actors, voices, scores, B-roll, and sound effects. We will essentially be able to generate every scene in a full-length film. The rise of generative AI, among other techniques, will enable a new class of filmmaking and video-making possibilities, comparable to what happened with the rise of CGI in the early 90s. If you pair this generative content with the automation tools we’re already seeing, the next generation of creators will be equipped with a unique toolbox of expressive tools that were previously unavailable. It will revolutionize industries forever.

Sar: I cannot wait for a future where I can co-create everything from the script to the video for a TV show with my computer! Your team has research and creative backgrounds. When I look at the backgrounds of early folks at similar companies in similar domains, I rarely see a team similar to yours.

Cris: We have an incredible team, and that’s because our hiring intentionally focuses on outcomes rather than social status. We place outsized weight on hiring people with diverse backgrounds because it’s critical to our product evolution. AI is a new platform, and new things have to be invented. The best people to invent new things are deeply creative people who come from unique backgrounds and can solve problems in ways domain experts simply can’t. Our team isn’t afraid to try, fail, and learn in the quest to build what we want to build, and that flexibility and comfort in uncharted territory are paramount in this space. Because of this mindset, our team had no problem pivoting the product along the way.

Sar: Runway does seem to have a high shipping velocity engineering culture.

Cris: Startups have to move quickly to survive; they have to innovate and build fast. Maintaining a team culture that values speed is incredibly important, and it’s something we look for in our hiring process. Occasionally things ship before they’re 100% ready, but part of our culture is failing fast and learning fast, so we treat those moments as learning opportunities and iterate on our process going forward.

Sar: What do you think we are not paying enough attention to in the domain of creative AI?

Cris: Multimodal systems. Large language models and models with shared text-image latent spaces, such as CLIP, enable new ways of interacting with software and synthesizing media. Diffusion models are a prime example of the power of such approaches. Runway Research is at the forefront of these developments, and we work to ensure that the future of content creation is accessible, controllable, and empowering for users.

Sar: There’s a raging debate around thorny issues like creator ownership rights and whether we are rooting for a world with less and less original creative work done by humans. I’m curious about your thoughts on those debates.

Cris: Like any emerging technology that challenges long-standing paradigms, it raises questions. We don’t believe this new technology will replace human creativity; we view it as a tool to enhance human creativity. And not only will we see human creativity enhanced, we’ll see many new people able to express themselves through creative multimodal systems that make 3D, image, video, and audio more accessible mediums. Some skeptics will focus on the perceived negative impacts, but the benefits are vast, which is why we build what we build.