Gemini Omni: Clone yourself with AI in under 15 minutes

Claire Vo takes listeners through a real-time experiment to clone herself using artificial intelligence. She dives into Google Flow and the new Gemini Omni video generation model to create a personal AI avatar in under 15 minutes.

The discussion details every step of this cutting-edge process. Claire walks through scanning her face, developing a full video storyboard with AI as a creative partner, generating multiple video scenes with her new avatar, and stitching them together into a complete hype video for the podcast.

This demonstrates how powerful AI video tools open creative avenues for anyone, regardless of their production background. While navigating moments where the AI clone doesn't quite capture every nuance, the episode showcases the remarkable speed and accessibility of current generative AI for content creation.

Key takeaways

AI avatar creation can be achieved through a simple phone scanning process, often by scanning a QR code to access the camera.
The process involves taking multiple photos and following prompts for head movements to ensure a comprehensive facial scan.
Google Flow acts as a full creative suite, integrating avatar generation with other content creation capabilities like storyboarding.
Generative AI models, especially multimodal ones, enable individuals to create content they might otherwise lack the specialized skills or experience to produce independently.
Providing specific aesthetic and thematic details to AI tools helps refine creative outputs, translating general ideas into detailed visual plans.
AI-generated storyboards can capture specific, detailed visual concepts like multi-keyboard hacking and glowing mechanical keyboards, often exceeding expectations for creative fidelity.
Integrating a specific user avatar into AI video generation remains a technical challenge, requiring explicit referencing in prompts to ensure the AI uses the intended character.
Iterative testing and explicit prompting are necessary when working with AI to achieve desired outcomes, especially when the AI struggles with specific references like user avatars.
Users may inadvertently generate still images instead of video if an incorrect setting, such as "image generation," is selected.
Video generation often requires more processing time, typically minutes, compared to image generation, which takes seconds.
Queuing multiple scenes can be an efficient way to experiment with different avatar looks or camera angles.
Integrated video editing timelines simplify the process of stitching together AI-generated clips directly within the creation platform.
The entire production, from concept to assembled video, was completed in under fifteen minutes, demonstrating the speed of current AI content generation tools.
Despite minimal effort, Claire found the AI-produced hype video surprisingly effective and compelling: not 80% of the way to professional quality, but emphatically 50% there, which she considered a strong outcome for the effort involved.
Avatar generation models frequently struggle with character consistency, producing varied facial accuracy, inconsistent hair length, and shifting backgrounds within a single generated video.
AI's understanding and depiction of "impressive technology" in generated avatars are often outdated and nonsensical, showing generic 2000s-era tech concepts rather than current realities.
While static facial features can be accurately replicated, dynamic emotional expressions in generated avatars often fall into the "uncanny valley," making the character appear unnatural or distorted.
Despite current limitations in expressing emotions and advanced graphics, AI avatar video tools can produce surprisingly good results with minimal effort.
Creating a basic one-minute AI avatar video can take as little as 10-15 minutes, even for a beginner with no prior experience with the tool.
With additional effort in prompting and input, these AI video generation tools show significant potential for creating high-quality, convincing promotional or "hype" videos.

00:59 - 02:00

Claire plans an experiment to create and animate an AI avatar of herself using Google Flow and Gemini Omni.

Claire initiates an ambitious experiment leveraging Google Flow and the new Gemini Omni video generation model. Her primary objective is to produce an AI avatar of herself that can then be animated for various cinematic video creations.

Google Flow, in combination with the Omni model, is promoted as having the capability to generate personal avatars. However, an initial attempt to use this specific feature upon its release was unsuccessful, prompting a second dedicated effort.

The successful creation of a fully functional avatar would enable the development of consistent character videos, which could significantly streamline future content production workflows.

I'm gonna try really hard to create an AI avatar of myself that we can animate or, I guess, cinematically create using AI.

02:00 - 03:56

Creating an AI Avatar using Phone Scans in Google Flow

To create an AI avatar within Google Flow, the process starts by scanning a QR code with a phone, granting camera access. This initiates a sequence where the system captures a user's likeness for digital rendering.

The avatar creation involves taking numerous photos and following specific prompts to turn the head left and right. The application confirms each step, ensuring a comprehensive capture of facial features from various angles to build the avatar.

After completing the capture process, the system generates a 'fisheye lens version' of the user, which becomes their AI avatar. Although the process can sometimes require multiple attempts, the aim is to produce a functional digital representation.

The generated avatar can then be used within Google Flow's broader creative suite. The user plans to leverage her new avatar to storyboard and create a hype video for her podcast, demonstrating how the platform supports a range of creative functions beyond just video generation.

There's this fisheye lens version of me that is now an avatar.

03:56 - 06:01

Using AI to Brainstorm a Podcast Hype Video Storyboard

Claire uses an AI tool to brainstorm a seven-scene storyboard for a podcast hype video. She provides specific creative direction, describing a dark home office with dark green walls, books about AI, and fun posters. The goal is an authentic, lifestyle-focused video that also feels high-tech and has a hacker vibe, centered around coding. The AI acts as a creative producer, asking guiding questions about the desired feel and style.

Claire highlights how new generative AI models, particularly multimodal ones for image and video, unlock new creative abilities. She admits she would struggle to produce a hype video on her own: she would find it hard to brainstorm, frame, or block. With the AI acting as a producer, she can overcome these creative hurdles and bring her vision to life.

The AI then outlines specific shots for the video based on her input. These include an extreme close-up of Claire typing on a mechanical keyboard, a wide shot of her office, and a reveal of her in an ergonomic chair. This structured output helps transform a broad concept into a concrete, scene-by-scene plan for the video.

I would have a hard time brainstorming it, I wouldn't know how to frame it, I wouldn't know how to block it. But now I have this AI producer here that can help me with this effort.

06:01 - 08:02

Claire reviews AI storyboard frames and attempts video generation with her avatar

Claire described the envisioned storyboard for her "How I AI" hype video, detailing elements like a digital heads-up display, an AI montage, a lifestyle shot, a call to action with a podcast microphone, and the final slogan. She anticipated a potentially cheesy outcome but was ready to proceed if the initial frames looked good.

Upon reviewing the AI-generated storyboard grid, Claire expressed delight at the visuals. She noted specific details like a "glowy mechanical keyboard," "hacking on three keyboards," and an image of her making eye contact with trendy glasses, dragging and dropping a file, and speaking into a podcast mic.

A recurring challenge Claire observed was the AI's difficulty in accurately referencing her pre-created "me character" or avatar in initial tests. Despite this, she planned to generate the first video scene by specifically referencing her avatar in the prompt, hoping to see an improved result with her likeness incorporated into the video.

Oh, I mean, this is delightful. Look at this glowy mechanical keyboard. Look at how I am hacking on three keyboards. I'm gonna make a little eyes at you with my, my fake glasses, my very trendy glasses.

08:02 - 10:04

Correcting a Setting to Generate First Avatar Video

Claire initially attempted to generate an avatar scene but mistakenly produced still images instead of video. This was due to an incorrect setting where "image generation" was selected instead of "video generation" in the bottom right corner of the interface.

After identifying the error, Claire corrected the setting and re-entered the scene description, which included details like her "me avatar" with fingers flying across a mechanical keyboard. Video generation typically takes a couple of minutes, partly because the platform generates two versions, whereas image generation completes in seconds.

While waiting for the first video to process, Claire also queued up a second scene, specifically "Frame three," to further experiment with the face avatar. The first video successfully generated, featuring the avatar with blue nail polish, marking a successful step in the video creation process.

Okay, I got that wrong. I actually generated images instead of videos. Totally messed up. Didn't click the right thing down here in the bottom right. I had image generation instead of video generation.

13:00 - 14:05

Assembling AI-Generated Avatar Scenes into a Hype Video

Claire is progressing with creating a hype video for her podcast, using AI-generated avatar scenes. She reviews the quality of the avatar's appearance in different takes, specifically commenting on how her hair was rendered.

The next step involves assembling these individual video segments. Claire plans to stitch all the clips together into a complete hype video, following a recommended form factor provided by the AI tool itself.

She demonstrates the platform's integrated video editor, which features a timeline interface. This allows her to easily combine the various avatar-generated clips directly within the tool, arranging them in the suggested order to form the final promotional video, a process that took her about five minutes.

I'm gonna stitch all these videos together in the form factor that Gemini told me I should, that Flow told me I should. We're gonna bring this hype video together.

14:05 - 16:05

Claire Assembles and Debuts the 'How I AI' Hype Video

Claire stitched together seven AI-generated avatar scenes using an in-browser editor to create a complete hype video for 'How I AI'. This process, including recording her face as an avatar and generating all the videos, was completed rapidly, taking less than fifteen minutes.

The resulting video opened with a provocative statement about AI replacing humanity, then introduced Claire and the podcast's focus on tools that change how we live and work, inviting listeners to deconstruct the future one prompt at a time. It concluded with a call to subscribe to 'How I AI'.

Claire expressed her obsession with the final product, highlighting that it required zero time and effort on her part. She critically assessed the video, stating that while it wasn't 80% professional quality, it was definitely 50% there, making it incredibly effective given the minimal effort invested. She planned to immediately tweet the video.

What I love, this took zero time and effort, and it is, I wouldn't say it's like eighty percent there, but is it fifty percent there? A hundred percent, yes.

16:05 - 18:05

Successes and Uncanny Moments in Avatar Generation

Claire notes that the avatar generation only accurately represents her face about 50% of the time, often creating an "uncanny" version. The avatars exhibit significant character inconsistencies, such as giving her long wavy hair despite her recent haircut and showing backgrounds that shift between different colors, books, and plants.

The AI's portrayal of technology is notably outdated and comical. It depicts Claire holding a large 24-inch iPad while examining a schematic that resembles a church, or coding a robot in Gemini with a heads-up display. These examples highlight how current AI models interpret impressive technology with a sensibility reminiscent of the early 2000s.

Despite these inconsistencies, some individual frames achieve a high degree of facial accuracy, capturing details like sun damage and precise side profiles as Claire turns her head. However, emotional expressions pose a significant challenge; for instance, a laughing scene resulted in a "hundred percent uncanny valley" effect, making her appear "very strange."

Overall, the technology is "ninety percent there," showing significant progress in replicating facial features. Yet, the persistent issues with character consistency, outdated technological representations, and particularly the failure to convincingly convey emotions prevent it from being fully realistic or authentic.

I look very strange, like I'm on some sort of medication, perhaps.

18:05 - 20:05

Claire Shares Final Impressions on AI Avatar Video Creation

Claire Vo discusses her final thoughts on the AI avatar video she produced, expressing genuine amazement at the results despite encountering some limitations. She notes that the avatar struggled with conveying emotions effectively, and there were minor timing issues where she spoke over herself.

While admitting that the typography and overall graphics in the video's ending were "lame," Claire highlighted that a specific scene in the video was "legitimately pretty good." She believes that with a little additional effort, such as more consistent background prompting and extra images fed into the Gemini Omni model, she could create a highly convincing "hype video."

A key takeaway was the incredibly short time investment required. Claire mentioned it took her roughly 10 to 15 minutes to go from knowing nothing about the tool to having a one-minute video ready. She found herself "pretty blown away" and is considering exploring these AI video tools, like Gemini Omni and Google Flow, as a new hobby project.

Claire encourages listeners to experiment with these "incredible new video models" themselves. She invites them to try generating consistent characters and share their experiences, noting that achieving such a good outcome with very little prior knowledge is a testament to the tool's accessibility and potential.

I really think I got an outcome that was much better than I expected with very little knowledge of the tool.
