You upload your avatar to HeyGen. You render the first clip. It looks great. You render the next clip an hour later, with a slightly different script, and the character looks slightly off. Lighting different. Cap angle different. Maybe a hair tone you don't remember setting.
Welcome to the second-most-common HeyGen complaint. (The first is the rendering queue on weekends.) The good news: the fix is mostly on your side, not HeyGen's.
The reference photo is the contract
HeyGen renders the talking video from one reference photo of your avatar. That photo is the contract — it's the source of truth for everything visual about the character. If the reference changes between renders (one clip uses a side-angle photo, the next a head-on one, the next one with different lighting), each clip will inherit that inconsistency.
The fix is to lock in one canonical reference photo per avatar. Same angle, same lighting, same expression, same crop. That single image is what HeyGen animates. Use it for every clip. Don't swap it in for variety. Don't update it because you "got a better render this week." The lock is what makes the consistency.
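If you render regularly, you can enforce the lock mechanically instead of by discipline. Here's a small sketch (the paths and lockfile name are made up for illustration, not anything HeyGen provides): pin the canonical photo's SHA-256 on first use, then refuse to render if the file's bytes ever change.

```python
# Sketch: pin the canonical reference photo by its SHA-256 hash.
# File paths and the lockfile format are illustrative assumptions.
import hashlib
import json
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Hash the file contents so any pixel-level change is detected."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

def check_reference(photo: Path, lockfile: Path) -> bool:
    """Return True if `photo` matches the pinned hash.

    On first run there is no lockfile yet, so the current photo
    becomes the canonical one and gets pinned.
    """
    digest = sha256_of(photo)
    if not lockfile.exists():
        lockfile.write_text(json.dumps({"sha256": digest}))
        return True
    pinned = json.loads(lockfile.read_text())["sha256"]
    return digest == pinned
```

Run the check before every render; if it returns False, you (or a teammate) swapped in a "better render this week" and the contract is broken.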
The crop matters more than you think
Most people upload a wide reference photo with the avatar's full body. HeyGen crops in for the talking head, but the cropping inherits whatever you uploaded. If your reference is a three-quarter shot, you'll get a wider talking-head frame. If it's a chest-up shot, you'll get a tighter one.
The sweet spot for talking-avatar reels is shoulders-up. Frame your reference photo at that crop and you skip the "why does the framing keep shifting" complaint. It also makes lip sync look more deliberate, because the mouth occupies more of the frame.
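If you want to pre-crop a full-body reference yourself, the box math is simple. This sketch makes two assumptions that are mine, not HeyGen's: that a 4:5 portrait crop anchored near the top of the frame approximates "shoulders-up," and that a small headroom margin above the hair looks right. It returns a `(left, top, right, bottom)` pixel box, the format image libraries like Pillow expect.

```python
# Sketch: compute a shoulders-up crop box for a reference photo.
# The 4:5 aspect and 5% headroom are assumptions, not HeyGen specs.
def shoulders_up_box(width: int, height: int,
                     aspect: float = 4 / 5,
                     top_margin: float = 0.05) -> tuple[int, int, int, int]:
    crop_w = width
    crop_h = int(crop_w / aspect)       # 4:5 means height = width * 5/4
    if crop_h > height:                 # image too short: shrink width instead
        crop_h = height
        crop_w = int(crop_h * aspect)
    top = int(height * top_margin)      # small headroom above the hair/cap
    top = min(top, height - crop_h)     # keep the box inside the image
    left = (width - crop_w) // 2        # center the subject horizontally
    return (left, top, left + crop_w, top + crop_h)

# For a 1000x2000 full-body shot this keeps the top quarter-plus:
# shoulders_up_box(1000, 2000) -> (0, 100, 1000, 1350)
```

Crop once, save the result as the canonical reference, and never re-crop per clip — re-cropping is just another way of breaking the lock.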
Script context that helps
The visuals are fixed by the reference photo. But the energy of the talking-head video — eyebrow movement, head tilt subtleties, mouth shape — picks up from the script. So writing scripts that match your character's energy helps.
For Jeff, my scripts are calm, observational, dry. The animation that comes out matches that — small head movements, eyebrow lifts, no big gestures. If I wrote hype-bro scripts, the animation would look hype-bro on the same avatar. So write scripts that fit who the avatar is supposed to be. The animation is more responsive to script tone than people realize.
What to ignore
Don't try to "tune" the avatar by uploading multiple reference photos for different moods. The system isn't built for that and you'll get inconsistency back. One canonical photo, used across all renders, is the play.
HeyGen does the heavy lifting on the animation side. Your only job is to lock the visual contract. Do that, and the consistency takes care of itself across a hundred renders.
— Jeff
Free tier is more than enough to ship your first talking avatar — no card required.