So what if OpenAI Sora didn’t create the mind-blowing Balloon Head video without assistance – I still think it’s incredible

Missing the point


Sora fans just learned a hard lesson: filmmakers will be filmmakers and will do what’s necessary to make their creations as convincing and eye-popping as possible. But if this made them think less of OpenAI’s generative AI video platform, they’re wrong.

When OpenAI handed an early version of the generative video AI platform to a bunch of creatives, one team – Shy Kids – created an unforgettable video of a man with a yellow balloon for a head. Many declared Air Head to be a weird and powerful breakthrough, but a behind-the-scenes video has put a rather different spin on it. It turns out that, as good as Sora is at generating video from text prompts, there were many things the platform either couldn’t do or didn’t produce quite as the filmmakers wanted.

The video’s post-production editor, Patrick Cederberg, offered, in an interview with FxGuide, a lengthy list of changes his team made to Sora’s output to create the stunning effects we see in the final 1-minute, 22-second Air Head video.

Sora, for instance, has no built-in understanding of typical film shots like panning, tracking, and zooming, so the team sometimes had to create a pan-and-tilt shot out of an existing, more static clip.

Plus, while Sora is capable of outputting lengthy videos based on long text prompts, there is no guarantee that the subjects in each prompt will remain consistent from one output clip to another. It took considerable work and prompt experimentation to stitch disparate shots into a semi-coherent whole.

As Cederberg notes in the Air Head behind-the-scenes video: “What ultimately you’re seeing took work, time, and human hands to get it looking semi-consistent.”

The balloon head sounds particularly challenging, as Sora understands the idea of a balloon but doesn’t base its output on, say, an individual video or photo of a balloon. In Sora’s original output, every balloon had a string attached; Cederberg’s team had to paint it out of each frame. More frustratingly, Sora often wanted to put the impression, outline, or drawing of a face on the balloons. And while the final video features a yellow balloon in each shot, the Sora output usually delivered balloons of varying colors that Shy Kids would adjust in post.

Shy Kids told FxGuide that all the video they used is Sora output; it’s just that, had they used it untouched, the film would’ve lacked the continuity and cohesion of the final, wistful product.

This is good news

Does this news turn the charming Shy Kids video into Sora’s Milkshake Duck? Not necessarily.

If you look at some of the unretouched videos and images in the behind-the-scenes video, they’re still remarkable, and while post-production was necessary, Shy Kids never shot a single bit of real film to produce the initial images and video.

Even as AI innovation races forward and we see huge generational leaps as often as every three months, AI of almost any stripe is far from perfect. ChatGPT’s responses are usually accurate, but they can still miss context and get basic facts wrong. With text-to-imagery, the results are even more varied because, unlike AI-generated text responses – which can draw on fact-based sources and mostly predict the right next word – generative image models base their output on a learned representation of an idea or concept. That’s particularly true of diffusion models, which use their training data to figure out what something should look like, so output can vary wildly from image to image.

“It’s not as easy as a magic trick: type something in and get exactly what you’re hoping for,” Shy Kids producer Sydney Leeder says in the behind-the-scenes video.

These models may have a general idea of what a balloon or a person looks like, but asking such a system to imagine a man on a bike six times will get you six different results. They may all look good, but it’s unlikely the man or the bicycle will be the same in every image. Video generation only compounds the issue: the odds of maintaining scene and subject consistency across thousands of frames and from clip to clip are extremely low.
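Sora isn’t publicly scriptable, but you can see this seed-to-seed drift for yourself in any open diffusion model. Here’s a minimal sketch using Hugging Face’s diffusers library and a Stable Diffusion checkpoint – my choices for illustration, not anything Shy Kids or OpenAI used – where the same prompt run with six different seeds produces six plausible but different men on six different bikes:

```python
# Illustration of why subjects drift between generations: the same prompt
# with six different random seeds yields six different "man on a bike"
# images. Uses open-source Stable Diffusion, not Sora (whose API isn't public).
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # example checkpoint; any SD model works
    torch_dtype=torch.float16,
).to("cuda")

prompt = "a man riding a bicycle down a city street, photorealistic"

for seed in range(6):
    # Each seed fixes the initial noise the diffusion process denoises from.
    # Different noise means a different (if plausible) man and bicycle each time.
    generator = torch.Generator(device="cuda").manual_seed(seed)
    image = pipe(prompt, generator=generator).images[0]
    image.save(f"man_on_bike_seed_{seed}.png")
```

Now imagine holding that man and that bicycle steady across 24 frames per second for an entire film, and the scale of Shy Kids’ continuity problem becomes clear.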

With that in mind, Shy Kids’ accomplishment is even more noteworthy. Air Head manages to maintain both the otherworldliness of an AI video and a cinematic essence.

This is how AI should work

Automation doesn’t mean the complete removal of human intervention. This is as true for video as it is on the factory floor, where the introduction of robots has not meant people-free production. I vividly recall Elon Musk’s efforts to automate as much of the Tesla Model 3’s production as possible. It was a near disaster, and production went more smoothly when he brought humans back into the process.

A creative process such as filmmaking or video production will always require the human touch. Shy Kids needed an idea before they could start feeding prompts to Sora, and when Sora didn’t understand their intentions, they had to adjust its output by hand. As most creative endeavors do, the project became a partnership: one where Sora provided a tremendous shortcut, but still didn’t take the project to completion.

Instead of bursting Air Head’s bubble, these revelations remind us that the marriage of traditional media and AI still requires a human’s guiding hand, and that’s unlikely to change – at least for the time being.


A 38-year industry veteran and award-winning journalist, Lance has covered technology since PCs were the size of suitcases and “on line” meant “waiting.” He’s a former Lifewire Editor-in-Chief, Mashable Editor-in-Chief, and, before that, Editor-in-Chief of PCMag.com and Senior Vice President of Content for Ziff Davis, Inc. He also wrote a popular, weekly tech column for Medium called The Upgrade.

Lance Ulanoff makes frequent appearances on national, international, and local news programs including Live with Kelly and Mark, the Today Show, Good Morning America, CNBC, CNN, and the BBC.
