Game vs Model: How VTuber Clip Layouts Should Be Framed

Most generic clip tools make a vertical short the same way: they find a human face and crop the frame around it. That works beautifully for a talking-head streamer and badly for a VTuber, whose on-screen presence is an avatar, not a face. The fix isn't a better face tracker. It's framing the clip around the right layout.

If you've ever auto-clipped a VTuber stream and watched the short cut your model in half, center on the gameplay, or zoom into empty background, you've hit the core problem. Most auto-reframing is tuned to track a human face and re-center the vertical crop on it. A VTuber's Live2D or 3D avatar doesn't behave like a face. It's a placed element, often a small webcam-style box parked in a corner over a 16:9 game capture. Face detection either misses it, locks onto a character in the game instead, or leaves the box outside the crop entirely.

This page is about the vocabulary nobody else uses: the two layouts a VTuber clip should actually be in, when to use each, and how a layout-aware tool decides. One disclosure up front — we make VTubeClip, and these two layouts are exactly what it's built around. We'll keep the explanation tool-agnostic so it's useful no matter what you clip with.

Why face-cropping misframes a VTuber

A normal vertical-clip pipeline assumes the interesting thing on screen is a person, and that the person's face is the anchor. So it runs face detection, picks the dominant face, and slides a 9:16 window to keep that face centered. For a webcam streamer this is exactly right.
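
To make that concrete, here's a minimal Python sketch of the face-centered crop. It illustrates the general idea only and is not any particular tool's code: the name `face_centered_crop` is ours, and we assume a face detector has already handed us the horizontal center of the dominant face as `face_cx`.

```python
def face_centered_crop(frame_w: int, frame_h: int, face_cx: float) -> tuple[int, int]:
    """Return (x_offset, crop_w) of a full-height 9:16 window centered on face_cx.

    Assumption: the window always uses the full frame height, so only its
    horizontal position can change.
    """
    crop_w = frame_h * 9 // 16              # width of a 9:16 window at full height
    x = int(face_cx - crop_w / 2)           # center the window on the face
    x = max(0, min(x, frame_w - crop_w))    # keep the window inside the frame
    return x, crop_w


# A 1920x1080 frame with the "face" dead center
x_offset, crop_w = face_centered_crop(1920, 1080, face_cx=960)
```

On a 1920×1080 frame the window is only 607 pixels wide, so roughly two thirds of the picture is always discarded. Everything depends on `face_cx` pointing at the right subject.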

A VTuber breaks every assumption in that chain:

There is no real face. A Live2D rig or 3D model is rendered art. A face tracker may not register it at all, or may register it weakly and drift.
The avatar is often tiny and off-center. A typical gaming layout is full-screen gameplay with the model in a 300×300-ish corner box. Center the crop on "the action" and the avatar falls off the edge.
There can be other faces on screen. NPCs, character portraits, a game's cutscene — the tracker may lock onto those instead of the streamer's model.
The story is split between two things. In gameplay, the clip-worthy moment is usually the reaction (the avatar) plus the cause (what happened in the game). Crop to one and you lose the joke.
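
The arithmetic behind the corner-box problem is easy to check. The sketch below assumes a 1920×1080 capture, a 300×300 avatar box in the bottom-right corner, and a full-height 9:16 crop (607 pixels wide). The `Box` type and `visible_fraction` helper are hypothetical names for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    x: int
    y: int
    w: int
    h: int


def visible_fraction(box: Box, crop_x: int, crop_w: int) -> float:
    """Share of the box's width that lands inside a full-height crop window."""
    left = max(box.x, crop_x)
    right = min(box.x + box.w, crop_x + crop_w)
    return max(0, right - left) / box.w


avatar = Box(x=1620, y=780, w=300, h=300)   # assumed corner box

# Crop centered on the middle of the gameplay (covers x = 656..1263)
print(visible_fraction(avatar, crop_x=656, crop_w=607))    # 0.0

# Crop pushed to the right edge to catch the avatar (covers x = 1313..1920)
print(visible_fraction(avatar, crop_x=1313, crop_w=607))   # 1.0
```

The first window shows none of the avatar. The second shows all of it, but it no longer contains the center of the gameplay at x = 960. A single sliding window has to give up one of the two.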

The result is the familiar failure: a vertical short that's technically "auto-reframed" but frames the wrong subject. The answer is to stop asking "where is the face?" and start asking "which layout is this moment?"

The two correct VTuber layouts

Almost every VTuber clip belongs in one of two framings. Naming them is half the battle, because once you can name them you can choose deliberately instead of letting a face tracker decide by accident.

1. Game layout — keep both the gameplay and the avatar

The Game layout stacks the gameplay and the avatar box in the 9:16 frame at once. The gameplay takes the larger region, the avatar sits in a dedicated band (commonly top or bottom), and neither one gets cropped away. It's the layout for any moment where the game and the streamer's reaction are both part of the payoff.

Use it when: the stream is a gaming stream and the clip is about something that happened in the game — a clutch win, a jump-scare, a rage moment, a funny death, a "did you see that?!" beat. The viewer needs to see the gameplay and the avatar reacting to it. Concrete example: a horror game jump-scare. The Game layout shows the scare in the game region and the avatar flinching in its band, in the same vertical frame. A face-crop tool would either show the scare with no reaction, or the reaction with no scare — and the clip dies either way.
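
As a rough sketch of the geometry, the split can be described as two stacked rectangles on the output canvas. The 1080×1920 output size, the 35% avatar band, and the names `Rect` and `game_layout` are all assumptions for illustration, not VTubeClip's actual values.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    x: int
    y: int
    w: int
    h: int


def game_layout(out_w: int = 1080, out_h: int = 1920,
                avatar_share: float = 0.35,
                avatar_on_top: bool = False) -> tuple[Rect, Rect]:
    """Return (game_region, avatar_region) for a stacked 9:16 layout.

    avatar_share is the fraction of the output height given to the avatar
    band; the gameplay gets the rest, so it is the larger region whenever
    avatar_share is below 0.5.
    """
    avatar_h = round(out_h * avatar_share)
    game_h = out_h - avatar_h
    if avatar_on_top:
        return Rect(0, avatar_h, out_w, game_h), Rect(0, 0, out_w, avatar_h)
    return Rect(0, 0, out_w, game_h), Rect(0, game_h, out_w, avatar_h)


game, avatar = game_layout()
```

With these defaults the gameplay region is 1080×1248 and the avatar band is 1080×672. How each source is scaled into its region (trimming the sides versus padding) is a separate choice this sketch leaves open.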

2. Model layout — avatar full-screen

The Model layout fills the entire vertical frame with the avatar and drops the gameplay region altogether. The model gets the full 9:16 canvas, framed like a portrait. It's the layout for moments where the avatar is the content.

Use it when: there's no game that matters to the moment — just-chatting segments, story time, a hot take, a reaction to chat, singing, an emotional beat, an ASMR-style whisper. Concrete example: a just-chatting stream where the VTuber tells a funny story. There's nothing to "split" with; cramming a half-empty gameplay region next to the model just shrinks the avatar for no reason. Model layout gives the model the whole screen, the way a creator would frame it by hand.
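
One way to picture the Model layout is as a 9:16 window built around the avatar's region instead of around a face. The sketch below assumes the avatar's bounding box in the source frame is already known. The `model_crop` name and the choice to center the window on the box are ours, not a description of any tool's internals.

```python
def model_crop(box_x: int, box_y: int, box_w: int, box_h: int,
               frame_w: int, frame_h: int) -> tuple[int, int, int, int]:
    """Return (x, y, w, h) of a 9:16 window centered on the avatar box.

    The window is grown until it covers the box, then limited to the frame
    height. If the avatar is wider than a full-height 9:16 window, its sides
    are trimmed, like a portrait crop.
    """
    needed_h = -(-box_w * 16 // 9)                 # ceil(box_w * 16 / 9)
    crop_h = min(max(box_h, needed_h), frame_h)
    crop_w = crop_h * 9 // 16
    x = box_x + (box_w - crop_w) // 2              # center on the box
    y = box_y + (box_h - crop_h) // 2
    x = max(0, min(x, frame_w - crop_w))           # keep inside the frame
    y = max(0, min(y, frame_h - crop_h))
    return x, y, crop_w, crop_h


# Just-chatting scene: a large avatar in the middle of a 1920x1080 frame
crop = model_crop(560, 60, 800, 1020, frame_w=1920, frame_h=1080)
```

For that assumed scene the window comes out 607 pixels wide and 1080 tall, starting at x = 656, which is then scaled up to fill the vertical canvas.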

That's the whole taxonomy. Game when two things share the moment; Model when the avatar owns it. A good clipper picks the right one per clip, not once per video — a single VOD usually contains both (gameplay highlights and a chatty intro), so the layout should switch moment to moment.

How VTubeClip applies layouts vs face-cropping tools

Instead of asking "where is the dominant face?", a layout-aware pipeline asks "what kind of moment is this, and where is the avatar?" In practice that means detecting whether a clip-worthy segment is gameplay-with-avatar or avatar-only, finding the avatar region as a placed element rather than a tracked face, and then composing the vertical frame around the matching layout — Game split or Model full-screen — so the model is never cropped out and the gameplay is never lost.

It also means the framing is a deliberate choice you can see and override, not an opaque crop you have to accept. The point isn't magic; it's that the tool is reasoning about VTuber layouts rather than human faces. For the bigger picture of how a clip gets made end to end, see what a VTuber clip is and the how-to-use guide.
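
The per-moment decision, including the override, can be sketched in a few lines. This is a simplified illustration under our own assumptions: the `Moment` type, the `has_gameplay` flag (standing in for whatever detection runs upstream), and the `override` field are hypothetical names, not VTubeClip's API.

```python
from dataclasses import dataclass
from typing import Optional

LAYOUTS = ("game", "model")


@dataclass
class Moment:
    start_s: float
    end_s: float
    has_gameplay: bool               # assumed output of an upstream detector
    override: Optional[str] = None   # set by the creator, if they disagree


def choose_layout(moment: Moment) -> str:
    """Pick a layout for one clip-worthy moment; an explicit override wins."""
    if moment.override is not None:
        if moment.override not in LAYOUTS:
            raise ValueError(f"unknown layout: {moment.override!r}")
        return moment.override
    return "game" if moment.has_gameplay else "model"


# One VOD, three moments: chatty intro, gameplay highlight, overridden highlight
vod = [
    Moment(0, 45, has_gameplay=False),
    Moment(1310, 1340, has_gameplay=True),
    Moment(2710, 2735, has_gameplay=True, override="model"),
]
layouts = [choose_layout(m) for m in vod]
```

The decision runs once per moment rather than once per video, so the same VOD yields a mix of layouts.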

Face-crop tools vs layout-aware: side by side

Situation	Generic face-crop tool	Layout-aware (VTubeClip)
Avatar in a corner box over gameplay	May crop the box out or center on gameplay	Game layout keeps both in frame
Just-chatting, no game	Tracks the avatar's "face," may zoom oddly	Model layout — avatar full-screen
Other faces on screen (NPCs, portraits)	Can lock onto the wrong face	Targets the streamer's avatar region
Live2D or 3D model	Treated as a face to track	Treated as a placed element to frame
One VOD, mixed gameplay + chatting	One crop strategy for everything	Switches layout per moment
Choosing the framing	Opaque auto-crop	Detected, and overridable per clip

None of this means face-cropping tools are bad — they're excellent for real faces. It means a VTuber's screen is a different problem, and matching the layout to the moment is what makes the short read correctly.

Quick rule of thumb

"There's a game and my reaction both matter." Game layout — split, keep both.
"It's just me talking / singing / reacting to chat." Model layout — avatar full-screen.
"My avatar keeps getting cut off." You're being face-cropped; you want layout-aware framing.
"My VOD has both." The layout should switch per clip, not once per video.

Frame the layout, not the face. That single shift is the difference between a clip that looks hand-made and one that looks like a generic crop that happened to land on a VTuber.

Frequently asked questions

Why do my clips cut off my avatar?

Most clip tools crop a vertical 9:16 by tracking a human face and re-centering on it. A VTuber's avatar is usually a small Live2D or 3D box over gameplay, so the face tracker either locks onto the wrong thing or pushes the avatar box out of the vertical frame. A layout-aware tool places the avatar deliberately instead of guessing from face detection.

What is a Model layout?

A Model layout fills the vertical frame with the avatar itself — no gameplay underneath. It's the right choice for just-chatting, reactions, singing, and any moment where the model and what it's saying are the whole story. The avatar gets the full 9:16 canvas instead of being squeezed into a corner box.

Does this work for Live2D and 3D?

Yes. Both Live2D (2D rigged) and 3D avatars sit on screen as a placed element rather than a real human face, so both benefit from layout-aware framing instead of face cropping. The Game and Model layouts apply the same way to either kind of model.

Can I choose the layout?

Yes. VTubeClip detects whether a moment is gameplay-with-avatar or avatar-only and applies the matching layout, and you can override it per clip if you want a game moment in Model framing or a chatting moment in Game framing. You stay in control of how each short is framed.

Is it free to try?

VTubeClip has no monthly subscription. It uses pay-per-clip credits and only charges when a job actually delivers clips, so you can submit a single VOD and pay only for the clips you receive. That makes trying it on one stream low-risk.
