Bringing Static Archviz 3D People to Life with AI Video


Introduction

When you first see AI video demos, your brain does something funny. It takes the 75% you're seeing and draws a straight line to 100%: everything is possible, we're all doomed.

Then you actually sit down with the tools, and you learn to be careful with your assumptions. That gap between the demo and the deliverable is what this tutorial is about. Bringing static 3D people in your archviz render to life with AI video works, and sometimes it's even easy.

The gap between 'easy' and 'exactly what you wanted' is just a fair amount of work. Pick your starting point below.


Choose your route

Everyone reads differently, so feel free to jump:

→ I just want a quick, good-looking result

→ I want full control over who does what

→ First things first: which AI tool should I use?

→ Tell me everything, I'm reading A to Z. (Just keep scrolling, this one's for you.)

1. The playing field

At the time of writing (July 2026) I gave Veo 3.1, Grok, Kling 3.0, Seedance 2.0 and a few others the same simple request: keep the camera still and only move the people. Nearly all of them acted like they hadn't heard me.

My original idea was to render two versions of every scene: a clean backplate and one with 3D people in it. Then animate the people and composite them back over the clean plate, keeping the architecture untouched. Spoiler: this is roughly where I ended up, but the road there taught me three things.

1. Video AI changes your original input far less than I assumed.

2. Video AI is expensive!

3. Video AI does not, I repeat, does not want to keep the camera absolutely fixed without some serious prompting.

That third issue is why I landed on Seedance 2.0. It wasn't the cheapest model I tested, but it was the only one that kept the camera locked with any degree of reliability.

A quick note on the different flavours you'll encounter: text-to-video, video-to-video, video upscaling, and the one we're using here: image-to-video. Whatever platform you choose, the workflow is similar. Upload an input image, write a prompt, and set your duration, resolution and whether you want generated sound.

Make sure your input image (your render) matches an aspect ratio the video AI supports. I went with 16:9. My input was 4K because I need that resolution later, but keep in mind that most video AIs max out at 1080p. If a platform advertises 4K output, that's usually a built-in upscale feature, not native 4K generation.
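If you like to check this before uploading, here is a minimal Python sketch (my own helper, not part of any platform) that tests whether a render's pixel size reduces exactly to a supported ratio, and what size it probably lands at when the model caps the short side at 1080 px. The SUPPORTED_RATIOS list and the 1080 cap are assumptions; check what your platform actually accepts.

```python
from fractions import Fraction
from typing import Optional, Tuple

# Hypothetical list: check which ratios your platform actually supports.
SUPPORTED_RATIOS = {"16:9": Fraction(16, 9), "9:16": Fraction(9, 16), "1:1": Fraction(1, 1)}


def matching_ratio(width: int, height: int) -> Optional[str]:
    """Name of the supported ratio the render matches exactly, or None."""
    ratio = Fraction(width, height)
    for name, target in SUPPORTED_RATIOS.items():
        if ratio == target:
            return name
    return None


def generated_size(width: int, height: int, max_short_side: int = 1080) -> Tuple[int, int]:
    """Size the clip is likely generated at if the model caps the short side."""
    short = min(width, height)
    if short <= max_short_side:
        return width, height
    scale = Fraction(max_short_side, short)  # exact, no rounding drift
    return int(width * scale), int(height * scale)
```

For a 3840×2160 render this gives 16:9 and 1920×1080, so the video AI works at half the resolution per axis, which is the factor your upscaler has to win back later.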

For the record: I ended up using atlascloud.ai for video generation and gigapixelai.com for upscaling. Atlascloud can go straight to 4K, but doing the upscale at Gigapixel turned out to be considerably cheaper.

What does it cost?

Compared to rigging and believably animating these characters, the cost is nothing and the result looks better. That is exactly why I wanted to write this tutorial: we are so close to getting awesome results at a fraction of the cost. But it's definitely not free.

A 15-second clip at 4K costs me about $16 to $18 at atlascloud.ai. The same clip at 1080p, without their upscaling, is about $6 to $8, which is why I chose to do the upscaling elsewhere. And none of that comes with a guarantee that the resulting video won't show something weird. The built-in answer seems to be generating four videos in one go. Sure, you pay four times as much, but the chance of getting at least one usable clip is greater.

Do your testing at lower resolutions ($1.60) or combine that with shorter lengths ($0.50 for 5 seconds).
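The "four in one go" trade-off is easy to put in numbers. A rough sketch, assuming every generation independently succeeds with some probability p_single (a made-up parameter; I have no real success rate to give you) and using the $7 midpoint of the 1080p price:

```python
def chance_of_usable(p_single: float, batch: int) -> float:
    """Probability that at least one clip in a batch is usable, assuming each
    clip independently succeeds with probability p_single."""
    return 1 - (1 - p_single) ** batch


def expected_cost_one_by_one(cost_per_clip: float, p_single: float) -> float:
    """Expected spend until the first usable clip, generating one at a time
    (mean of a geometric distribution times the clip price)."""
    return cost_per_clip / p_single


def expected_cost_in_batches(cost_per_clip: float, p_single: float, batch: int) -> float:
    """Expected spend until a batch contains at least one usable clip."""
    return batch * cost_per_clip / chance_of_usable(p_single, batch)
```

With p_single = 0.5, a batch of four has a 93.75% chance of containing at least one usable clip. If you only need one clip per shot, going one at a time is cheaper on average ($14 versus about $29.87 at $7 per clip); what batching buys you is fewer rounds of waiting.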

Some of you may remember paying for dial-up internet by the minute. I'm sure prices will drop in the future, but for now there are no all-you-can-eat AI plans for us!

2. The fast route

Sometimes good enough is simply good enough. Drop your render into Veo or Seedance, add a short prompt about what the camera is allowed to do, and odds are you'll get something usable. The AI turns out to be surprisingly gentle with your architecture.

Here it's mainly a game of learning what to write as a prompt. Negative prompts like "don't do this" or "avoid that" do not work all that well. Positive instructions for the camera look something like this:

"A subtle, believable handheld camera shot, the camera is held by a steady human hand with gentle, organic micro-movements and a slow, soft drift, natural and barely-there, as if someone is quietly standing in the room looking at the scene."

Then you can add what you want the people in your scene to do, like:

"The couple stands in place, chatting as if at a party. No walking."
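The split into a camera sentence plus a people sentence is easy to keep consistent with a tiny helper. A sketch under my own assumptions: the function names and the list of "negative" words are mine, not a platform feature, and the word check is only a nudge to rephrase. It deliberately ignores a short "No ..." like the one in the example above.

```python
import re
from typing import List

CAMERA_HANDHELD = (
    "A subtle, believable handheld camera shot, the camera is held by a steady "
    "human hand with gentle, organic micro-movements and a slow, soft drift, "
    "natural and barely-there, as if someone is quietly standing in the room "
    "looking at the scene."
)

# Words that usually signal a negative instruction; a rough, hypothetical list.
NEGATIVE_WORDS = re.compile(r"\b(?:don't|do not|avoid|never)\b", re.IGNORECASE)


def build_prompt(camera: str, people: str) -> str:
    """Camera instruction first, then what the people should do."""
    return f"{camera.strip()} {people.strip()}"


def negative_phrases(prompt: str) -> List[str]:
    """Negative-sounding words worth rephrasing as positive instructions."""
    return [match.group(0).lower() for match in NEGATIVE_WORDS.finditer(prompt)]
```

For example, negative_phrases("Don't move the camera and avoid zooming.") returns ["don't", "avoid"], while the handheld prompt plus the couple instruction comes back clean.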

Happy with the result? Then you're done, and this was a short tutorial.

Want to make sure the woman in the red dress doesn't suddenly start dancing with her partner, or that the couple in the doorway doesn't walk backwards out of the room? Keep reading, because now it becomes work.

How to make an AI video from an image at atlascloud.ai

The result, using a 3D interior scene from the mighty Bertrand Benoit and 3D people from our very own Humanalloy.com collection.

3. The control route

This is where the frustration starts, but also the directing. Forget hunting for the perfect prompt. The real trick is breaking the problem apart: one person at a time, one video at a time, then bringing it all back together in After Effects.

Granted, the example scene I used (kindly provided by bbb3viz) might have more people in it than you would have in an animation for a typical client. If your scene has fewer people, everything that follows becomes easier. Having this many people in one scene really exposed the weakness of the fast route described above: somehow there was always one person doing something unnatural.

The final result, using a 3D interior scene from the mighty Bertrand Benoit and 3D people from our very own Humanalloy.com collection.

3.1 Locking the camera

First the foundation: a camera that actually stands still. I suspect that since most video AI is trained on moving footage, having the camera do nothing is surprisingly hard. As mentioned, Seedance 2.0 was the only model that seemed to hear me.

I ended up with this prompt:

"Locked-off camera. Motion occurs only within the fixed perspective of the motionless camera. Start and end on the exact same framing."

I also added "lock focus on mid scene" for shots where people move closer to the camera. Without it, the AI tended to shift focus onto them instead of keeping it on the fixed middle ground of the original render.
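If you generate many clips, it helps to assemble this prompt the same way every time. A minimal sketch; the helper name and the example action are hypothetical, the two prompt fragments are the ones above:

```python
LOCK_OFF = (
    "Locked-off camera. Motion occurs only within the fixed perspective of the "
    "motionless camera. Start and end on the exact same framing."
)
FOCUS_LOCK = "Lock focus on mid scene."


def locked_prompt(action: str, lock_focus: bool = False) -> str:
    """Locked-camera prompt, optional focus lock, then the action for this shot."""
    parts = [LOCK_OFF]
    if lock_focus:  # for shots where people move closer to the camera
        parts.append(FOCUS_LOCK)
    parts.append(action.strip())
    return " ".join(parts)
```

Usage: locked_prompt("The couple walks slowly towards the camera.", lock_focus=True) puts the focus lock between the camera instruction and the action; shots without movement towards the camera simply leave lock_focus off.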

One more quirk: sometimes Seedance stripped a ~20 pixel margin from the original render. I ended up scaling the video down slightly over the clean plate in After Effects and cropping the whole thing a bit. You can also render some extra margin around your frame if cropping is not an option.
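The numbers behind that fix, as a sketch. It assumes the model cropped the same margin from both ends of an axis and stretched the rest back to full size (my reading of the behaviour, not something the platform documents), and that the ~20 px applies at the resolution you comp at:

```python
import math
from typing import Tuple


def compensating_scale(size: int, margin: int) -> float:
    """Scale (%) to put on the generated clip for one axis, assuming the model
    cropped `margin` px from both ends and stretched the rest back to `size`."""
    return 100 * (size - 2 * margin) / size


def overscan_16x9(width: int, height: int, margin: int) -> Tuple[int, int]:
    """Smallest exact 16:9 render size with at least `margin` px of extra
    image on every side of a width x height frame."""
    units = max(math.ceil((width + 2 * margin) / 16), math.ceil((height + 2 * margin) / 9))
    return 16 * units, 9 * units
```

For a 3840×2160 plate and a 20 px margin this gives roughly 98.96% horizontally and 98.15% vertically (unlink the scale values in After Effects to set them separately), or a 3920×2205 render if you prefer to add margin up front.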

Example of focus shifting

Example of the camera not wanting to stay fixed

3.2 Why not the whole room at once?

Why not animate the whole room in one go? Because there's always one figure who decides to do something strange. Always. Like the ninja appearing behind the lady in red in this example...

3.3 One person at a time

The fix: input images with just one person or one couple at a time.

In Photoshop I layered my clean plate with the render containing all the people, and since the people did not overlap, I could hide all of them except one person or couple at a time and save each variation separately.
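What the Photoshop step does per pixel, sketched on tiny grayscale "images" (lists of numbers) so it runs without any imaging library. The names are hypothetical; in practice each mask is whatever hides all people except one person or couple. It relies on the people not overlapping, as in my scene, and you would probably want each mask to include that person's contact shadow so it survives in the isolated image.

```python
from typing import Dict, List

Image = List[List[int]]  # tiny grayscale stand-in: rows of pixel values
Mask = List[List[bool]]  # True where one person (or couple) is visible


def isolate(clean_plate: Image, full_render: Image, mask: Mask) -> Image:
    """Full render inside the mask, clean plate everywhere else: the same as
    hiding every people-layer except one in Photoshop."""
    return [
        [person if inside else empty for empty, person, inside in zip(p_row, r_row, m_row)]
        for p_row, r_row, m_row in zip(clean_plate, full_render, mask)
    ]


def isolate_all(clean_plate: Image, full_render: Image, masks: Dict[str, Mask]) -> Dict[str, Image]:
    """One input image per person or couple, keyed by a name of your choice."""
    return {name: isolate(clean_plate, full_render, mask) for name, mask in masks.items()}
```

With one mask per person or couple, isolate_all returns one input image each, which is how my scene ended up as five images.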

I ended up with 5 images for this very posh but uncomfortable-looking party scene.

Now use each image as the input image for Seedance, with the prompt from section 3.1.

For this example, after some testing, I used the 4K option provided by atlascloud.ai because... well, my brain has just enough bandwidth to solve one issue at a time.

Generate a version of each person in your scene that you are happy with, and let's head to After Effects.

3.4 Compositing in After Effects

Now you have a stack of separate videos and one empty interior or exterior. Time to comp!

One thing to sort out before you begin: frame rates. You might get a warning about the generated videos not matching the frame rate of your composition. Either adjust the composition frame rate or that of the generated videos, so they are the same.
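The two options are not quite the same thing. If you change the clip via Interpret Footage > Conform to frame rate, every generated frame is kept and the playback speed changes; changing the composition frame rate instead leaves the footage timing alone. A small sketch of the first case, using a hypothetical 24 fps clip going into a 25 fps composition:

```python
from fractions import Fraction


def conformed_duration(frames: int, new_fps: Fraction) -> Fraction:
    """Seconds a clip lasts after conforming it to new_fps: every frame is
    kept, so only the playback speed changes."""
    return Fraction(frames) / new_fps


def speed_change_percent(source_fps: Fraction, new_fps: Fraction) -> Fraction:
    """Playback speed relative to the original after conforming."""
    return 100 * new_fps / source_fps
```

A 15-second clip at 24 fps (360 frames) conformed to 25 fps lasts 14.4 seconds and plays at 625/6, about 104.17%, of its original speed. For subtle party chatter that may be invisible, but it is worth knowing before you line up timings.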

Since the next part is a bit specific, I've recorded a 10-minute video tutorial covering everything described below. Also, note to self: do not become a YouTube influencer...

Import your clean backplate first. This is the base layer on which we will stack the generated videos.

Import your first generated video and place it on top of the backplate. Check if the size is correct by toggling the video layer on and off and looking for large differences. Sometimes the size seems fine, but the AI squashed the video a bit on one or two axes. A handy way to spot this is to set the blending mode of the video layer to Difference.

Everything that matches the background turns black, and everything that doesn't turns a bright color. Now use the transform tools and try to fix it by scaling alone. You might need to scale each axis independently. If a few pixels refuse to match perfectly, don't worry too much, we are going to cut away the generated background anyway.
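The Difference check and the per-axis scale, sketched as plain arithmetic. Difference mode outputs |top - bottom| per channel, which is why matching pixels go black. For the scale, one possible approach (my own, with hypothetical numbers): pick two landmarks far apart on one axis, say two window edges, read their positions in the plate and in the video (the Info panel shows cursor coordinates), and take the ratio of the distances.

```python
from typing import List


def difference(top: List[int], bottom: List[int]) -> List[int]:
    """Difference blend per channel: |top - bottom|, so identical pixels give 0 (black)."""
    return [abs(a - b) for a, b in zip(top, bottom)]


def axis_scale(plate_a: float, plate_b: float, video_a: float, video_b: float) -> float:
    """Scale (%) for one axis from two landmarks measured in the plate and in the video."""
    return 100 * (plate_b - plate_a) / (video_b - video_a)
```

Landmarks at x = 400 and 3400 in the plate and at 390 and 3420 in the video give a horizontal scale of about 99.01%. Scaling happens around the layer's anchor point, so you may still need to nudge the position afterwards.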

After Effects has a cool AI object selection tool called the Object Matte Tool. It's located at the top center of the interface, just below the Help menu, or press Alt+W.

To use it, first double-click the video layer. It opens the video in a separate panel. Click the Object Matte Tool and hover over your person(s), and you'll see them light up, showing what it thinks the object is. Click to select. By clicking and holding the Object Matte Tool button you'll find a few extra tools. If not everything is properly selected, use the Select Brush Tool to add or subtract bits.

When happy with the selection, press Space to render the matte. Best to let it finish the whole clip before continuing. When scrubbing through the video you might see frames where the people are not, or not completely, selected. Move through the video in chunks from beginning to end and use the Object Matte Tool and Select Brush Tool to refine the selection until you're happy with the result. This sounds harder than it actually is.
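If you want a second pair of eyes on those frames, one option (my own idea, not an After Effects feature) is to measure how much of each frame the matte covers, for example from an exported matte sequence, and flag frames that drop sharply below their neighbours. A sketch, assuming you already have that per-frame coverage as a list of fractions:

```python
from statistics import median
from typing import List


def suspicious_frames(coverage: List[float], window: int = 2, drop: float = 0.2) -> List[int]:
    """Frames whose matte coverage (fraction of the frame selected) falls more
    than `drop` (relative) below the median of the frames around it."""
    flagged = []
    for i, value in enumerate(coverage):
        start, end = max(0, i - window), min(len(coverage), i + window + 1)
        neighbours = coverage[start:i] + coverage[i + 1:end]
        if neighbours and value < (1 - drop) * median(neighbours):
            flagged.append(i)
    return flagged
```

For coverage [0.10, 0.10, 0.05, 0.10, 0.10] only frame 2 is flagged, because its coverage halves while its neighbours stay constant. The window and drop values are guesses to tune per clip.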

When you're done, and you've verified in your main composition that the video appears solid and doesn't flicker, freeze the cached object matte by clicking the button labeled Freeze at the bottom of the video panel where you made your matte selections. This saves you from re-rendering each matte every time you reopen your After Effects project.

3.5 The floating feet problem

Something's still missing: the feet are floating. The Object Matte Tool is really good at selecting objects with a solid edge, but it ignores contact shadows.

The best way I found to fix this is to add a second copy of the generated video underneath the one you just matted. We are going to use that layer to show only a small piece of the shadow (or reflection) around the feet.

Why a second layer? It turns out After Effects has an order of matting that makes it impossible to combine the Object Matte Tool's matte and a mask on the same layer.

Use the Pen tool to draw a shape around the part of the shadow you want to keep. Apply a bit of feather to the mask so there are no hard edges. The generated video likely differs enough in color from your backplate to make that necessary.
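Here is the layer stack as arithmetic, for one pixel and one channel. It is a simplified model under my own assumptions: straight alpha, the standard "over" operation, and a linear feather (After Effects' actual feather falloff differs). The Object Matte on the top layer and the feathered Pen mask on the copy below are simply two separate alphas:

```python
def over(fg: float, alpha: float, bg: float) -> float:
    """'Over' for one channel with straight alpha: fg on top of bg."""
    return alpha * fg + (1 - alpha) * bg


def linear_feather(distance_inside: float, feather_px: float) -> float:
    """Mask opacity ramping from 0 at the mask edge to 1 at feather_px inside it.
    A simple stand-in; After Effects' own feather falloff is different."""
    if feather_px <= 0:
        return 1.0 if distance_inside > 0 else 0.0
    return min(1.0, max(0.0, distance_inside / feather_px))


def comp_pixel(plate: float, video: float, object_matte: float, shadow_mask: float) -> float:
    """Bottom: clean plate. Middle: generated video through the feathered shadow
    mask. Top: the same video through the Object Matte."""
    middle = over(video, shadow_mask, plate)
    return over(video, object_matte, middle)
```

Where the shadow mask is only partly open, the generated video's background blends into the plate, which is exactly where a colour difference would show up as a hard line without the feather.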

If all went well, you now have your original render with naturally animated people on top. And here is the upside: since you rendered the 3D people with your scene's lighting, everything blends perfectly, while you keep full control over the placement and the type of people you use.

4. Pushing it further

When I made this example, I upscaled all generated videos directly in the atlascloud.ai interface. The upside is that the Object Matte Tool then gets to work on a higher resolution video.

It is a bit hard to compare prices when every AI video platform uses its own credit system, but I think gigapixelai.com (which runs on Topaz Gigapixel AI) does upscaling more cost-effectively. So it can pay off to separate these steps.

Alternatively, you can set up your whole After Effects scene in 1080p and, if needed, upscale the final After Effects output in one go.

There is another step you could add: upscaling your input images before video generation.

The point here is less about resolution and more about adding a layer of realism to the 3D people.

I showed how to upscale static images in the upscaling tutorial.

Do the same to the input renders you feed the video AI. The upscaling adds micro details that might improve your result, but be warned: it also has a tendency to change the ethnicity of your chosen 3D people. The choice is yours.

Another tip: render more space around the part of the image you want to show. This way you can add slow zooms or pans in After Effects, giving the whole thing just that extra bit of life.
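How much extra to render is simple arithmetic. A sketch with hypothetical helper names, assuming a 3840×2160 delivery and that you never want After Effects to enlarge the footage past 100% at the resolution you comp at:

```python
from typing import Tuple


def render_size_for_zoom(out_w: int, out_h: int, zoom_percent: int) -> Tuple[int, int]:
    """Render size for a slow zoom whose widest framing shows zoom_percent %
    of the final frame, without enlarging pixels past 100 %.
    (n + 99) // 100 is an integer ceiling, avoiding floating-point surprises."""
    return (out_w * zoom_percent + 99) // 100, (out_h * zoom_percent + 99) // 100


def render_width_for_pan(out_w: int, pan_px: int) -> int:
    """Render width for a horizontal pan of pan_px pixels at 100 %."""
    return out_w + pan_px
```

A zoom whose widest framing shows 110% of the final frame needs a 4224×2376 render, which is still exactly 16:9. A 200 px horizontal pan needs 4040 px of width, which is no longer 16:9, so you would have to add height as well to keep the input at a ratio the video AI supports.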

And if you still have some money left, you could try using the final video you exported from After Effects as input for a video-to-video AI. Quite a few video AIs enable editing by prompting, with example prompts like: "Turn the source clip into a cinematic interior shot with smooth camera movement and richer studio lighting." I haven't tried this yet, so let me know how it turns out!

Before

After

5. Closing

And that brings us back to an old creative dilemma.

Control is never free. The fast route gives you something beautiful in five minutes that you had little say over.

The long route gives you exactly what you wanted, at the cost of an afternoon's work. AI video didn't invent this trade-off. Creativity has always worked this way. The only thing that's new is that you now get to pick.

So no, we're not doomed. The demos show you the 75% that comes for free. The last 25% still requires a craftsman who knows what he wants. That part hasn't changed, and I think that's good news.

All models in this tutorial come from the HumanAlloy catalogue: 4,700+ scanned people, ready to be brought to life.
