Method · 2026-06-02

How I keep one face consistent across AI-generated portraits

The breakthrough was one rule: never let the model render the identity-bearing pixels. Generate the world around the face and lift the face itself from a real photo. Everything in this essay follows from that.

A single ink portrait head held inside a clean rectangular frame while a generated scene of brushstrokes, lighting, and backdrop is painted in the space around it, the face itself left untouched as a pasted real cutout

I wanted professional, on-brand photos of myself without booking a photographer for every variation: a website hero, a clean headshot, a few editorial portraits in different outfits and settings. The job sounded simple. Generate as many looks as I want while the face stays unmistakably, identically mine in every single one. The trap is that I have a calibrated eye for my own face. A five percent drift in the jaw, the proportions, the set of the eyes reads instantly as "not me." And every standard text-to-image path failed that test, every time.

So the rule I landed on: never let the model render the identity-bearing pixels. Let it build the outfit, the background, the composition, even the field of light. The face itself must be real photographic pixels, lifted from a real photo and put back in. I did not start there. I got there by failing at every other approach first.

Everything that invents the face fails

The first thing most people try, and the first thing I tried, is to train a personal model on a set of your own photos. Fine-tune a FLUX adapter on fifteen to thirty real images and it will happily generate you in any setting. The problem is what it actually learns. It learns an average of your face and then invents that average from scratch on every render. Best case, the output is "a guy who resembles me." Turning the adapter strength up and down only moves where the failure lands: too strong and it over-bakes, too weak and it drifts toward a generic stranger. Train it on low-variety selfies and it faithfully learns a slightly distorted face and reproduces the distortion perfectly.

Then I tried the one-photo identity adapters, the family of methods built around IP-Adapter and InstantID. These encode a face into an embedding and condition the diffusion on it. They are genuinely impressive, and they all share the same flaw for my purpose: they guess the face from a vector. One made the beard fuller, another came out thinner, a third shifted the proportions. Each was a plausible person. None was the specific person.

Face-swap had the same disease in a different coat: it rebuilds a face from an identity representation, so the result is an approximation, smoothed and idealized. Scene-from-cutout generators produced beautiful frames but re-rendered the whole thing, face included, and quietly cleaned it up.

Two ink portraits side by side: on the left a generated face subtly wrong with smoothed waxy skin and shifted proportions, on the right the same face built from real photographic texture with pores and asymmetry intact

Why my eye is right and the model is wrong

After enough rounds I stopped treating the drift as a tuning problem and accepted it as a property. Every generated face carries a signature my eye correctly flags, because a denoiser doesn't copy my face; it synthesizes a face from a description. Even the edit-and-relock approaches, where you hand the model a real image and ask it to change only the outfit, run the face through the denoiser during the edit, so it comes out close but shifted, and the shift is different on every shot. Stacking more reference photos made it worse, not better. Off-angle references average together and soften the result, the same way a composite sketch loses the specific person.

That gave me the rule directly. My eye only accepts my real pixels. So the face has to be real pixels. The model gets the entire rest of the frame, and the face gets transplanted.

There was a second source of drift that took me longer to see. It was not the model; it was the process making free choices. When the pipeline got to pick which reference photo to use and decide what to keep, that variance alone was enough to cause drift. The fix was to remove the choices: pin reference selection to a fixed lookup keyed on head angle, use fixed prompts, and match the real-photo angle to the scene angle. Frontal to frontal, three-quarter to three-quarter. And never mirror a reference, because a face is asymmetric and a flipped face is a subtly different person.
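A minimal sketch of what such a fixed lookup could look like, assuming Python. The angle names follow the ones used in this essay; the file paths and the function name `pick_reference` are hypothetical, chosen only for illustration. The point is that the table is written once and the code has no fallback path.

```python
from pathlib import Path

# Hypothetical file names. What matters is that the table is fixed:
# one real photo per head angle, decided once, never at render time.
REFERENCES = {
    "frontal": Path("refs/frontal.png"),
    "three_quarter_left": Path("refs/three_quarter_left.png"),
    "three_quarter_right": Path("refs/three_quarter_right.png"),
    "profile": Path("refs/profile.png"),
}


def pick_reference(scene_angle: str) -> Path:
    """Return the one real photo pinned to this head angle.

    There is deliberately no fallback: no nearest-angle guess and no
    mirroring of the opposite side, because a flipped face is a
    subtly different person.
    """
    try:
        return REFERENCES[scene_angle]
    except KeyError:
        raise ValueError(
            f"no angle-matched reference for {scene_angle!r}; "
            "shoot that angle instead of substituting or mirroring"
        ) from None
```

Failing loudly on a missing angle is a design choice in this sketch: it turns "the pipeline quietly picked something close" into an error that sends you back to the camera.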

The pipeline, step by step

Here is what actually runs now.

1. Shoot yourself once against a clean wall in even light, hitting every head angle: frontal, three-quarter left and right, profile, plus a few tight close-ups for the chin and jaw. These photos are your identity source for everything downstream.
2. Decide the scene's head angle first, then look up the matching real photo from a fixed list. No on-the-fly "which one looks best." That judgment call is exactly where the drift creeps in.
3. Background-remove the chosen photo down to a clean head cutout on transparency.
4. Hand the model only the cutout with a precise prompt: a real person's head on a transparent background, build a photorealistic scene around it, keep the face exactly as the cutout, do not redraw it. Models in the current Gemini image class are strong at this, building plausible bodies, clothing, and rooms while leaving the supplied face mostly alone.
5. Composite the real face back in. This is the load-bearing step: align the real face to the scene face using facial landmarks and a similarity transform, warp it into place, and feather the edge of the mask slightly. In my own comparisons, handing that transplant of the angle-matched real face to a high-fidelity edit model beat the manual warp, leaving minimal seam or warp distortion.
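The landmark alignment and feathering in the compositing step can be sketched as follows. This is an illustration under assumptions, not the code behind this pipeline: it assumes you already have matched 2D landmarks (for example eye corners and mouth corners) for the real photo and the scene, it fits the similarity transform by least squares using complex numbers, and the function names and the 6-pixel feather width are made up for the example.

```python
def fit_similarity(src, dst):
    """Least-squares similarity transform mapping src landmarks onto dst.

    Points are (x, y) pairs treated as complex numbers, so the transform
    is z -> a * z + b, where complex a carries scale and rotation and
    complex b carries the shift. A reflection cannot be expressed this
    way, so the fit can never mirror the face.
    """
    if len(src) != len(dst) or len(src) < 2:
        raise ValueError("need at least two matched landmark pairs")
    s = [complex(x, y) for x, y in src]
    d = [complex(x, y) for x, y in dst]
    s_mean = sum(s) / len(s)
    d_mean = sum(d) / len(d)
    num = sum((q - d_mean) * (p - s_mean).conjugate() for p, q in zip(s, d))
    den = sum(abs(p - s_mean) ** 2 for p in s)
    if den == 0:
        raise ValueError("source landmarks are all the same point")
    a = num / den
    b = d_mean - a * s_mean
    return a, b


def apply_similarity(a, b, point):
    """Map one (x, y) point from real-photo space into scene space."""
    z = a * complex(*point) + b
    return (z.real, z.imag)


def feather_alpha(distance_inside_px, feather_px=6.0):
    """Mask opacity for a pixel this far inside the cutout edge.

    Ramps linearly from 0 at the edge to 1 at feather_px (an assumed
    width), so only the rim of the real face blends into the scene.
    """
    return max(0.0, min(1.0, distance_inside_px / feather_px))


def blend(real_value, scene_value, alpha):
    """Standard alpha blend of one channel value."""
    return alpha * real_value + (1.0 - alpha) * scene_value


# Usage: three landmarks in the real photo and where they sit in the scene.
real_pts = [(100, 100), (200, 100), (150, 200)]
scene_pts = [(80, 90), (130, 90), (105, 140)]
a, b = fit_similarity(real_pts, scene_pts)
scale = abs(a)  # how much the real face shrinks or grows in the scene
```

A similarity transform only scales, rotates, and shifts. That is the reason to prefer it here over a full affine or perspective warp: it cannot stretch the proportions of the face.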

An ink flow diagram reading left to right: a real photo, a clean head cutout, a generated scene built around the empty face area, the real face composited back into the frame, and a final graded print, each stage drawn as a simple labeled card

If the face needs to match the scene's light, give the model a strict instruction: change only how light and shadow fall, keep features, bone structure, texture, and identity one hundred percent identical, do not redraw or beautify. The deterministic fallback I trust more is frequency separation: take the low-frequency lighting from the scene and lay the real face's high-frequency texture over it, so the light moves and the face does not. Finish with a fixed photographic color grade, a print-film look applied the same way every time, instead of a generative "polish" pass. That kills the clean, plastic, video-game quality of raw model output and, because it is deterministic, it never drifts.
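Frequency separation is easier to trust once you see how little it does. The sketch below is a toy version under assumptions: it works on small same-size, already-aligned luminance grids stored as nested lists, it uses a plain box blur as the low-pass filter, and the names `box_blur` and `relight` and the default radius are illustrative. A real pipeline would use an image library and a Gaussian blur, but the arithmetic is the same.

```python
def box_blur(img, radius):
    """Blur a 2D grid of floats with a (2 * radius + 1) square window.

    Windows are cropped at the borders, so edge pixels average only
    the in-bounds neighbours.
    """
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total, count = 0.0, 0
            for yy in range(max(0, y - radius), min(h, y + radius + 1)):
                for xx in range(max(0, x - radius), min(w, x + radius + 1)):
                    total += img[yy][xx]
                    count += 1
            out[y][x] = total / count
    return out


def relight(face, scene, radius=2):
    """Keep the face's fine detail, take the broad light from the scene.

    detail = face - blur(face)   high frequencies: pores, hair, edges
    light  = blur(scene)         low frequencies: where light and shadow fall
    result = light + detail
    """
    face_low = box_blur(face, radius)
    scene_low = box_blur(scene, radius)
    return [
        [s_low + (f - f_low) for f, f_low, s_low in zip(f_row, fl_row, sl_row)]
        for f_row, fl_row, sl_row in zip(face, face_low, scene_low)
    ]
```

The blur radius decides what counts as "light" and what counts as "face". Too small a radius lets scene features leak into the result; too large a radius leaves the original lighting mostly in place. Whatever value you pick, it is a fixed number, so the same inputs always give the same output.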

The gotchas that cost me time

A few specifics, because they each cost me a day. Warm, golden-hour scenes wash out olive skin and read as a lighter, different person, so I clamp the skin tone. If your reference set is all frontal and you ask for a turned head, you will get drift, because there is no angle-matched real photo to lift from. If all your real photos are smiling and you prompt a neutral face, the model invents the expression, so shoot the expressions you actually want. Do not over-enumerate anatomy in the prompt; a simpler prompt plus selecting against drift beats a paragraph of facial description. And never edit an already-synthetic image to make a variant, because you are compounding a fake face. Re-composite from the real photo every time.
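"Clamp the skin tone" can be read several ways. One possible reading, sketched here as an assumption and not as the exact method used in this pipeline, is to measure the mean skin color of the real reference and of the finished frame, and to limit how far the finished mean may move away per channel. The function name and the limit of 8 levels per channel are hypothetical.

```python
def clamp_skin_shift(reference_mean, graded_mean, max_shift=8.0):
    """Per-channel correction that keeps the skin mean near the reference.

    reference_mean and graded_mean are (R, G, B) means over the skin
    region. The returned offsets, added to every skin pixel, pull the
    graded mean back to within max_shift (an assumed tolerance) of the
    reference. Channels already inside the tolerance get a zero offset.
    """
    correction = []
    for ref, got in zip(reference_mean, graded_mean):
        shift = got - ref
        allowed = max(-max_shift, min(max_shift, shift))
        correction.append(allowed - shift)
    return tuple(correction)


# Usage: a golden-hour scene pushed the skin 20 levels brighter in red.
offsets = clamp_skin_shift((180.0, 140.0, 120.0), (200.0, 150.0, 118.0))
```

Because the correction is a fixed rule applied to measured numbers, it behaves the same on every frame, which keeps it in the deterministic part of the pipeline.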

Generate the world, not the face

If I compress everything to one principle, it is this. I treat generation as the world builder and the real photo as the identity source. Never let the model render the identity-bearing pixels, never feed an AI image of yourself back in as a reference, and select and lift one real reference instead of blending many into an average. Then finish the seam and grade in deterministic code. The model imagines a place to stand. The face comes from a real photo.

Some operational details in these essays have been changed for narrative or privacy reasons. The arguments, the numbers, and the lessons are real.