"Roommate From Hell", a short film made entirely with AI
A discussion with Tyler B. Cohen, by François Reumont

How was this film conceived?
Tyler B. Cohen: I started with the constraints of the tool and worked backward. Once I identified Jesus Christ as a consistently depicted character—for better or worse—the idea practically wrote itself: what if Jesus had a room for lease and, in a moment of divine folly, rented it to the devil? The premise was absurd but clicked immediately—simple, funny, and packed with visual potential. There’s also a sly lesson in there about creatives learning to embrace AI.

From there, I structured the narrative around a few key beats—how we get from point A to B to C—ensuring every step felt purposeful. Jesus became the anchor, not for any lofty reason at first, but because his image is universally recognizable. The robe, the beard, the hair—it’s shorthand everyone gets. The devil, as his foil, gave me the freedom to play, which only made it more fun.
ChatGPT handled the grunt work, but the ideas? Those were all mine, from start to finish. This tool only works with text-to-video, so everything had to be written like a narrative—no storyboards, no visuals, just meticulously crafted text. I built the entire framework upfront: the apartment, the characters, the tone—everything that defined the world. The tool’s role was straightforward: take the structure I created and help fill in the gaps. I’d say, “Here’s the setting, here’s the scene, here’s the scenario—now give me something that fits my vision.” It spat out tight, 200–300-word prompts, keeping the process efficient and consistent while leaving the creative direction firmly in my hands.
Ultimately, AI’s role was to handle the busywork. The humor, the story, the chaos—that was all human. It just made the process faster, more flexible, and allowed it to come together as a solo effort—top to bottom.
Did you have shots in mind from the beginning?
TBC: The water-into-wine scene at the party was locked in from the start. I knew it was the film’s turning point, so everything else was built to lead up to and flow from that moment. It gave me a clear structure—beginning, middle, and end—with that scene acting as the axis.

From there, it became a process of generating and layering. AI makes that absurdly flexible and cheap, so once I had three or four anchor shots, I started generating batches. Tons of options, ready to be dropped into the edit. It wasn’t a linear process, like shooting everything first and then editing. If something didn’t land, I’d cut it, generate a replacement, and keep moving.
That’s the beauty of working this way. It’s not rigid; it’s fluid, like jazz improvisation. But you need structure—those key moments—to keep the chaos from spiraling out of control.
Moviemaking is a team effort by nature. How do you see the AI moviemaking experience?
TBC: AI empowers the generalist. If you’ve got a decent eye, a good sense of timing, and can write a halfway decent line of copy, you suddenly have the tools to pull off a coherent film without leaning on an entire crew. For this project, 75% of the sound design was AI-generated, and the editing—just a day and a half—was all about nailing the pacing. The real challenge wasn’t the creation itself; it was knowing what to cut. When you can generate anything and everything, budget stops being the limiter—restraint does. Letting go of shots you love but don’t need is harder than it sounds.
That said, let’s not ignore the human element. My wife was the MVP here. She has no problem calling out what doesn’t work—“Drop that shot; swap this one”—and she was always right. Her feedback made the whole thing better. So while AI made it possible to handle the technical work solo, the process still had collaboration at its core. It might look like a one-man show, but it’s always better with a second set of eyes.
Is AI difficult to control?
TBC: The idea that AI is uncontrollable is a myth. For still images, the workflows are advanced enough that you can dial in nearly every detail if you know what you’re doing. However, when it comes to video, it’s still a bit like pulling the lever on a slot machine. You can nudge it in the right direction, but there’s always an element of unpredictability.
Some of the best moments come from happy accidents—those unplanned surprises that elevate a shot beyond what you imagined. But there’s a flip side. You’ll get a great shot, love 90% of it, and that remaining 10% is just wrong. Fixing that one piece can be a special kind of torture, and sometimes you’re stuck either living with it or starting over.
To Google’s credit, VEO 2 is miles ahead of anything else out there. Light years, honestly. When I tested it, it felt like the tool actually understood what I was asking for. That’s no accident—they’re clearly leveraging billions of data points from YouTube, and it shows. The results are sharper, more consistent, and overall just better than anything the competition offers.
So, still images ? Plenty of control. Video ? It’s not there yet, but VEO 2 is a huge step forward. For now, you work with the chaos, shaping it as best you can. And sometimes, the chaos shapes you back.

What was the most challenging shot?
TBC: Without question, the Medusa shot. Picture this: a desperate demon trying to chat up Medusa while she casually smokes a cigarette, totally uninterested, and every single one of her snakes is doing the same. Hilarious, but a nightmare in execution. Getting the AI to understand and deliver on that level of absurd detail was like trying to teach a cat calculus. Iteration after iteration—it ate up so much time that I’d call it the most “expensive” shot in the film. But it had to work, even if only on screen for a second.
Then there was Jesus walking Satan’s three-headed dog. Simple enough, right? Except I fixated on the dog poop. It had to look just right, but the system kept flagging it as something obscene. The solution? A transparent plastic bag with copious mounds of steaming chocolate. Ridiculous in hindsight.

As for happy accidents, they were everywhere—peppered throughout the entire project. When you’re working with prompts capped at 200–300 words, you’re forced to leave gaps. And those gaps are where the chaos creeps in. You don’t get every object or movement exactly as you envisioned, but what you do get can be surprising, even serendipitous. The film itself is essentially a collage of these accidents. My job was to spot the ones with potential, double down on them, and let the unplanned moments shape parts of the narrative. It’s less about control and more about harnessing the storm.
Do you sometimes feel like you’re running after technology?
TBC: In some ways, yes. The pace is relentless—new tools drop weekly, sometimes daily. I’m fortunate to work in an agency surrounded by big-brained people who eat, sleep, and breathe this stuff. But outside that bubble, I can see how overwhelming it must feel. The flood of content, most of it garbage, is endless. Spotting the gems takes constant vigilance.
Miss a week, and it’s like missing a dog year. The challenge isn’t just keeping up—it’s cutting through the noise to find the signal. It’s an exhausting loop, sure, but if you’re up for it, the chaos is also what makes it exhilarating.
Do you think humans might get tired of AI images, like we got tired of 3D movies?
TBC: The problem isn’t AI imagery itself—it’s the glut of mediocre output giving it a bad reputation. There’s an assumption that AI-generated art has a specific "look," but that’s only true when the tools are used poorly. When AI is in the hands of someone who knows what they’re doing, the results just work. The line blurs, and you stop thinking about how it was made and start focusing on how it makes you feel.
We’re heading toward a moment where the tools are so advanced, and the curation so nuanced, that distinguishing between AI and human-crafted visuals will be irrelevant. The flexibility will be there to create everything—animation, film, stop-motion, hybrids—and the audience won’t know, because they won’t need to. What matters is the result, not the process.
Right now, you’ve got filmmakers slapping disclaimers on their work—“No AI was used” or, conversely, “This was made with AI”—like some kind of ethical virtue signaling or a warning label on a pack of smokes. That’s a phase. Once AI becomes ubiquitous, those labels will disappear, and we’ll stop caring whether a shot was hand-drawn, machine-learned, or both. The economics are inescapable. AI tools aren’t just faster—they’re cheaper. And in industries like advertising, where cost-cutting is effectively a religion, AI is the new messiah. It’s not a question of if; it’s a matter of when.
Think about typesetting with metal blocks or how Photoshop workflows felt twenty years ago—laborious, slow, inefficient. The difference between then and now is staggering. I’ve used Photoshop for two decades, and what used to take hours can now be done in seconds. AI is the next logical step in that evolution. It’s not here to replace creativity but to democratize it, offering creators more tools, more speed, and more reach. It’s not going away, and it’s not something we’ll tire of—it’s just something we’ll learn to expect. The cream will always rise to the top.

I discovered your film on the 10th anniversary of the Charlie Hebdo killings in Paris. Naturally, cartoons and religion immediately came to mind. As a creator, are you free to do anything with AI?
TBC: First off, what a wild coincidence. The choice to depict Jesus wasn’t about making a statement—at least not initially. It boiled down to practicality. Jesus is baked into the visual lexicon: the white robe, the red shawl, the wavy hair, the beard. These are shorthand symbols that ensure consistency, which is critical when you’re working with generative tools that sometimes like to, well, wander.
The devil, though—that was another story entirely. He’s a character with a thousand faces : the horned beast, the sharp-suited dealmaker, the shadow lurking in the corner. I had to lay down strict rules from the start to keep things cohesive. VEO 2 did an impressive job connecting the dots where I hadn’t spelled things out explicitly, but even with that, it was a balancing act to maintain visual alignment across scenes.
As for Muhammad? Yes, I’d consider exploring any subject. But here’s the thing: there’s no data. It’s a void, a cultural black hole. That absence isn’t about the limitations of the technology—it’s a reflection of the data shaping it. These tools can only pull from the underlying data they’ve been fed.
And that leads us to the bigger question of freedom in AI. Right now, we’re in a golden moment—a creative sweet spot where control largely rests in the hands of creators. But cracks are forming. Some tools already flag prompts involving political figures or sensitive topics. Others outright block you from generating certain content. We’re creeping into the era of nanny-state AI, and while I get why it’s happening, it’s deeply troubling.
Open Source models offer a lifeline, though their quality often trails behind the polished outputs of proprietary systems. Still, the ethos of open source feels truer to the spirit of creativity—scrappy, democratic, and deeply human. It’s David versus Goliath, and I’ll always bet on David, even if the slingshot needs an upgrade.
The root of the issue is data. AI reflects the biases of its training sets, and those are shaped by the institutions behind them. Take Jesus, for example. He almost always comes back as a white man—an algorithmic echo of cultural bias. And if you’re using a model trained in China, typing “Taiwan” or “Tiananmen Square” is an exercise in futility. These tools are only as free as their datasets allow them to be.
At my core, I’m a troublemaker. I don’t want guardrails dictating what I can or can’t create. Creative freedom isn’t a luxury; it’s the engine of innovation. The challenge is protecting that freedom as these tools become more regulated and commodified. For now, we’re living in the Wild West of AI. I intend to ride this wave for as long as I can.
So Open Source could be the solution?
TBC: It’s a gross generalization, but Open Source does seem to attract people who care about giving credit where it’s due. These are the ones who believe creators should be recognized and compensated for the work that feeds these datasets. Compare that to the big players profiting off pilfered content, then slapping a license fee on it with a straight face. Laughable.
This is David versus Goliath, plain and simple. Open Source isn’t perfect, but it’s principled. It’s a necessary counterbalance to a system that too often rewards exploitation over fairness.
Technical info:
Total shots: 125 made it into the final cut.
Total generations: 700 clips were generated.
Timeline: 5 days total—3.5 days of image generation and 1.5 days of editing.
Dialogue: While dialogue is currently possible with other tools, I chose not to use it. The story’s humor and relatability shine through visual storytelling alone, as shitty roommates are a universal language.
This project was created entirely using Google VEO 2, leveraging text-to-video generation only. Here’s a step-by-step breakdown of the process:
1. Structuring the Story
The project called for three acts, so I outlined them at the start and identified a few key shots early on.
The scenarios were inspired by personal experiences with bad roommates, but with a twist of applying classic tropes through a biblical lens.
There were no storyboards, only a sequence of detailed written prompts and outlines.
2. Leveraging ChatGPT for Efficiency
I wrote the story out in ChatGPT, outlining the act structure and refining the narrative flow.
Then, I added character details, ensuring they were injected into every text prompt to save time and keep the characters consistent across iterations.
Text prompts were 250-300 words on average.
I kept the work to a single thread, where I described scenes and scenarios, which GPT optimized into efficient prompts. This streamlined the text-to-video process significantly.
Here’s an example of a prompt:
“A gritty, medium-wide shot captures Jesus walking away through a rundown urban neighborhood. Dressed in his white robe with a red shawl, his glowing halo ring softly lights the cracked sidewalks and graffiti-covered rowhouses. He carries a massive semi-transparent green plastic bag bulging with steaming chunky chocolate fudge.
Beside him, Cerberus strides calmly—a massive three-headed dog, each head scanning the surroundings with intense, synchronized gazes. Jesus holds a golden leash attached to intricate silver chains as they jingle faintly with each step.
In the background, neighbors peek cautiously through their windows, while a stray cat perched on a chain-link fence arches its back and hisses. The contrast between Jesus’s calm presence and Cerberus’s intimidating three-headed form highlights the surreal humor of the scene.”
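The character-injection step described above is easy to picture in code. The sketch below is a hypothetical reconstruction, not Cohen's actual tooling: the character sheet, function name, and word budget are all illustrative, standing in for the fixed character details that were prepended to every prompt.

```python
# Hypothetical sketch of the prompt-assembly step: a fixed character
# sheet is injected into every scene description so each generation
# stays visually consistent. All names and details are illustrative.

CHARACTER_SHEET = {
    "Jesus": "white robe, red shawl, wavy brown hair, full beard, glowing halo ring",
    "Devil": "crimson skin, short curved horns, sharp tailored black suit",
}

def build_prompt(scene: str, characters: list[str], max_words: int = 300) -> str:
    """Prepend fixed character descriptions to a scene description,
    enforcing the word budget the text-to-video tool works best with."""
    details = ". ".join(f"{name}: {CHARACTER_SHEET[name]}" for name in characters)
    prompt = f"{details}. {scene}"
    if len(prompt.split()) > max_words:
        raise ValueError("prompt exceeds the word budget")
    return prompt

prompt = build_prompt(
    "Jesus walks Cerberus through a rundown urban neighborhood at dusk.",
    characters=["Jesus"],
)
```

Because the same character strings ride along with every scene, iterations on a shot never drift from the established look.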
3. Generating with Google Veo 2
I generated visuals with a text-to-video workflow, producing 4 clips of 8 seconds each per prompt.
The tool allowed for batch generation and iterative refinement. Of the roughly 700 total clips generated, 125 shots made it into the final cut.
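The yield of this workflow follows directly from the figures above (4 clips per prompt, roughly 700 clips generated, 125 kept). This is just my own back-of-the-envelope arithmetic, sketched as a small helper:

```python
# Back-of-the-envelope generation stats from the figures quoted above:
# 4 clips per prompt, ~700 clips generated, 125 used in the final cut.

CLIPS_PER_PROMPT = 4

def generation_stats(clips_generated: int, clips_kept: int) -> dict:
    """Derive the number of prompt runs and the keep rate from clip totals."""
    prompts_run = clips_generated // CLIPS_PER_PROMPT
    keep_rate = clips_kept / clips_generated
    return {"prompts_run": prompts_run, "keep_rate": round(keep_rate, 3)}

stats = generation_stats(clips_generated=700, clips_kept=125)
```

That works out to about 175 prompt runs and a keep rate just under 18%—a concrete measure of the "slot machine" unpredictability described earlier.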
4. Editing in Premiere Pro
After generating the clips, I imported them into Premiere Pro for structuring and editing. This revealed gaps, which I filled by generating additional clips to help tell the story.
Editing took 1.5 days, during which I fine-tuned pacing, transitions, and visual consistency.
One minor visual adjustment was made—a single shot had the shawl color changed for better continuity.
No color grading was applied to the project, only modest lighting cues via prompting.
5. Audio Design
After the picture was locked, I moved to audio. I used a mix of licensed audio and generated sound effects, with about 75% of the SFX created using ElevenLabs.
6. Final Touches and Upload
After the final audio mix, I added the title over the opening shot and uploaded the film to social media.
7. Why Google VEO 2 Stands Out
Google VEO 2 was unmatched for text-to-video generation. As of this writing, no other tool came close in quality, consistency, or prompt adherence.
(Interview by François Reumont for the AFC)