From Letters to Lights, Camera, Action: Text-to-Video AI Generation

Written by Ivana Kotorchevikj AI for Business & Education

Published 2 years ago / updated 2 years ago - 12 min read

You know how AI is changing pretty much everything we know about the world, right? Self-driving cars, voice assistants that almost seem human, you name it. One frontier that AI is currently reshaping, and one that is generating a lot of excitement, is text-to-video generation.

This powerful concept allows us to transform mere strings of text into a series of moving images, enabling a seamless journey from a writer's mind to the viewer's screen. Imagine typing a script or a story on your computer, and then watching as an AI tool magically brings your words to life as a video. This is no longer a subject of science fiction. Instead, it is a rapidly emerging reality, where prose is pixelated, and narratives get a new dimension.

In this article, we explore the fascinating world of text-to-video AI generation. We'll look at how these AI models work, the tools that are leading the charge, and the transformative potential of these advancements in storytelling and content creation.

What is text-to-video generation?

Text-to-video AI generation is an advanced area of artificial intelligence that deals with the automatic creation of video content from written text. In essence, this technology can take a piece of written text, understand its context, and generate a video that visually represents the ideas and narratives within the text.

This process involves several different areas of AI, including natural language processing (NLP), computer vision, and machine learning.

Natural Language Processing helps the AI understand the content, context, and structure of the written text. It allows the AI to interpret the text much like a human would, understanding not just individual words but also sentences and the relationships between different pieces of information.

Computer Vision and Machine Learning techniques are then used to create the video. These might include object detection and recognition (to understand what objects need to be in the video), image synthesis (to generate images from scratch that match the text), and temporal modeling (to understand how the video should change over time).
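To make these stages a little more concrete, here is a toy sketch in Python. It is only an illustration of the flow described above (read the text, decide which objects belong in each scene, then lay out frames over time), not how any real text-to-video model is built. The names KNOWN_OBJECTS, Scene, plan_scenes, render_frames, and text_to_storyboard are made up for this example, and the "frames" are plain text descriptions rather than images.

```python
import re
from dataclasses import dataclass

# Hypothetical toy vocabulary; a real system learns what objects look like from data.
KNOWN_OBJECTS = {"dog", "ball", "car", "tree"}


@dataclass
class Scene:
    sentence: str
    objects: list
    num_frames: int


def plan_scenes(text, frames_per_scene=4):
    """Language step: split the text into sentences and pick out known objects."""
    sentences = [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]
    scenes = []
    for sentence in sentences:
        words = re.findall(r"[a-z]+", sentence.lower())
        objects = [w for w in words if w in KNOWN_OBJECTS]
        scenes.append(Scene(sentence, objects, frames_per_scene))
    return scenes


def render_frames(scene):
    """Stand-in for image synthesis and temporal modeling.

    Produces one text description per time step instead of a real image.
    """
    frames = []
    for t in range(scene.num_frames):
        # Progress runs from 0.0 (first frame) to 1.0 (last frame).
        progress = t / (scene.num_frames - 1) if scene.num_frames > 1 else 0.0
        frames.append(f"t={progress:.2f} objects={scene.objects}")
    return frames


def text_to_storyboard(text):
    """Chain the steps: text -> scenes -> ordered frame descriptions."""
    return [frame for scene in plan_scenes(text) for frame in render_frames(scene)]
```

Calling text_to_storyboard("A dog chases a ball. A car passes a tree.") returns eight frame descriptions, four per sentence. In a real model, each of these hand-written steps is replaced by a learned neural network.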

🔎 Peep under the AI image generators' hood and explore how they work.

Text-to-video tools

Now that we've covered how text-to-video AI works, let’s meet the key players in this exciting field. These are the tools and technologies that are making it happen - the rockstars of text-to-video AI.

Runway’s Gen-2

Runway, a Google-backed AI startup that co-developed the Stable Diffusion image generator, recently created a buzz in the generative AI community.

In early June 2023, the startup launched Gen-2, a model that generates videos from text or images. Gen-2 is the successor to Gen-1, which Runway released in February 2023. Gen-2 made headlines as the first “commercially available” text-to-video model, whereas many other models are still in the research phase.

https://www.youtube.com/watch?v=jWXiMBwkfwc

Runway’s platform has an interface similar to those of image generators. Signing up and accessing the video generation dashboard from the Runway home page is fairly easy. Once you enter the app, you can generate videos with a text prompt or by uploading an image, and you can choose between Gen-2 and the previous Gen-1.

Runway

For the sake of this article, I tried Gen-2 to generate a video with a text prompt. After entering the prompt, you have the option to preview 4 image variations and choose 1 to generate a video. It takes up to several minutes for Gen-2 to generate the video.

Enter the prompt in Runway

Choose a preview to generate a video

Gen-2 generated video

The free Runway version offers 101 seconds of video generation. If you want more seconds and faster generation, you can upgrade your plan, which also adds benefits such as upscaled resolution, watermark removal, unlimited video editor projects, and more asset storage space.

However, as with all other pioneering AI breakthroughs, Runway’s Gen-2 videos leave much room for improvement. Users who have tested Gen-2 point to some glaring shortcomings.

Kyle Wiggers from TechCrunch points out several limitations of Gen-2. The first and most obvious is the low frame rate of the 4-second clips, which gives them a slideshow-like look. He also highlights the fuzziness and graininess of the AI-generated videos. Another Gen-2 shortcoming, common to many AI models, is its inconsistency with human anatomy and physics. Wiggers notes that the human faces in Gen-2 videos look doll-like, with glossy, emotionless eyes and pasty skin that resembles plastic.

Nevertheless, we have to give Runway some credit for breaking new frontiers in text-to-video AI generation. Gen-2 is still an amazing breakthrough in generative AI, and is only the beginning of AI-video models and what they’ll be capable of in the future.

More importantly, Gen-2 helps people create videos only with their imagination and a computer, without the need for fancy equipment, expensive cameras, and painstaking editing - just like Runway’s slogan says, “No lights. No camera. Just action.”

After all, Gen-2 may not be taking on the Hollywood blockbusters or putting animators, CGI artists, and filmmakers out of jobs (thank God!), but it can be used to create some amazing videos. Check out these 12 incredible videos generated with Gen-2:

https://twitter.com/heyBarsee/status/1651961767810179072

🚀 If you are looking to create AI images, check out how Leonardo AI works and find out example prompts to use.

Vimeo’s AI-powered video editor

The video platform Vimeo announced that it is integrating AI-powered editing features into its platform. Vimeo’s AI tool suite has been available in its Standard plan since July. Vimeo states that its AI features will allow customers to create a complete video in a matter of minutes - everything from writing scripts and making reels and announcement videos to extracting quotes for marketing snippets and hosting virtual events and meetings.

Vimeo AI video-editing features (Image source: Vimeo)

In essence, Vimeo added three new features to its service:

AI script generator that creates scripts based on text prompts, including parameters such as tone and length.
Built-in teleprompter that allows users to record footage and adjust font size and timing.
AI text editor that automatically detects long pauses, awkward breaks, and filler words such as ‘ah’ and ‘um.’
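To give a rough idea of what detecting filler words and long pauses can look like, here is a minimal Python sketch. It is not Vimeo's implementation. It assumes a transcript is already available as a list of (word, start time, end time) entries, and the FILLER_WORDS set, the flag_transcript function, and the one-second pause threshold are all hypothetical choices for this example.

```python
# Hypothetical list of filler words to look for.
FILLER_WORDS = {"ah", "um", "uh", "er"}


def flag_transcript(words, max_pause=1.0):
    """Flag filler words and long pauses in a timestamped transcript.

    words: list of (text, start_sec, end_sec) tuples in spoken order.
    Returns a list of ("filler" | "pause", start_sec, end_sec) tuples.
    """
    flags = []
    previous_end = None
    for text, start, end in words:
        # A gap longer than max_pause between two words counts as a long pause.
        if previous_end is not None and start - previous_end > max_pause:
            flags.append(("pause", previous_end, start))
        # Ignore case and trailing punctuation when matching filler words.
        if text.lower().strip(".,!?") in FILLER_WORDS:
            flags.append(("filler", start, end))
        previous_end = end
    return flags
```

An editor built on top of this could show each flagged time range to the user and offer to cut it from the video.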

The new Vimeo features aim to lower the barrier to entry for beginners and non-specialists who want to create videos but don’t have the necessary video editing skills and knowledge.

Although Vimeo doesn’t generate videos from scratch based on a text prompt, its AI features help users with crucial parts of the video-making process.

The new AI capabilities are similar to what Adobe did with their Generative Fill feature, which allows users to add, extend, or remove content from images using simple text prompts without damaging the original image.

Although video is one of the most effective ways to communicate a message, the effort involved in video production keeps many people from using it well. Vimeo tries to make video editing easy and bring the process within reach of all video creators.

Google’s Dreamix

Google’s Dreamix is a cutting-edge AI video editor rooted in diffusion techniques. It has the power to alter existing videos guided by text prompts or even craft entirely new videos from just one existing image.

Dreamix (Image source: Dreamix)

This innovative video editor is a major leap forward in the realm of AI, building on the achievements of generative AI models like OpenAI's DALL-E 2, Stable Diffusion, and Google's own Imagen, all of which convert text descriptions into images. While earlier methods like Prompt-to-Prompt or InstructPix2Pix use Stable Diffusion for image editing, AI video models are typically restricted to synthesis. Dreamix breaks this limitation by offering video editing features.

Here's how Dreamix works: it adds what is called 'noise' to the source images, which are then run through a video diffusion model. This model takes the noisy source images and produces new ones, guided by the provided text prompt.

These fresh images are then strung together into a video. The source images essentially serve as a loose blueprint, capturing the basic form or motion of an object but still allowing for changes.
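The noise-then-regenerate idea can be sketched in a few lines of Python. This is a simplified illustration, not Dreamix's actual code. Frames are represented as plain lists of pixel values, the square-root blending formula is a common diffusion-style choice assumed here for illustration, and add_noise, edit_video, and placeholder_denoise are hypothetical names. The denoising step is left as a placeholder, since a real system would use a trained, text-guided video diffusion model there.

```python
import math
import random


def add_noise(frame, strength, rng):
    """Blend a frame (a list of pixel values) with Gaussian noise.

    strength=0 keeps the frame intact; strength=1 replaces it with pure noise.
    """
    keep = math.sqrt(1.0 - strength)
    mix = math.sqrt(strength)
    return [keep * pixel + mix * rng.gauss(0.0, 1.0) for pixel in frame]


def edit_video(frames, prompt, denoise, strength=0.5, seed=0):
    """Corrupt every source frame, then let a text-guided model rebuild them."""
    rng = random.Random(seed)
    noisy_frames = [add_noise(frame, strength, rng) for frame in frames]
    return denoise(noisy_frames, prompt)


def placeholder_denoise(noisy_frames, prompt):
    # Hypothetical stand-in: a trained video diffusion model would go here
    # and would use the prompt to decide what the rebuilt frames show.
    return noisy_frames
```

The strength parameter captures the "loose blueprint" idea: with little noise the output stays close to the source frames, while with more noise the model has more freedom to follow the text prompt.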

And Dreamix doesn't stop at editing existing videos - it's also capable of generating completely new ones. It can either tweak a single image in various ways and assemble the results into a video, or use several images of a certain object to create a video centered around a theme.

While there's potential for improvement in quality and computational efficiency, Google's main goal with Dreamix is to push forward research on tools that enable users to breathe life into their personal content.

✨ Explore 8 Midjourney alternatives for generating AI images.

Conclusion

In the rapidly changing world of AI, the frontier of text-to-video generation is drawing widespread attention and excitement. Text-to-video AI generators transform simple lines of text into moving visuals, bringing the written word to life on screen.

Leading the charge are tools such as Runway's Gen-2, Vimeo's AI-powered video editor, and Google's Dreamix. Gen-2, described as the first "commercially available" text-to-video model, creates videos from text or an image, breaking new ground despite some limitations in video quality and realism. Vimeo, meanwhile, is catering to entry-level video creators, introducing AI features such as an AI script generator, a built-in teleprompter, and an AI text editor to help ease the video-making process.

Dreamix, on the other hand, is a game-changer, with the ability to alter existing videos or create completely new ones from a single image. Despite areas for improvement, its introduction marks a significant step in AI, driving research on tools that allow users to animate their own content.

Together, these tools are some of the notable players in the landscape of AI video generators. Each of them offers unique capabilities and contributes to a future where creating engaging video content becomes increasingly accessible to anyone armed with imagination and a computer.