Abstract
Despite impressive advancements in human motion transfer based on diffusion models, existing methods still
struggle to generate temporally consistent and realistic human motion, often exhibiting motion discontinuities,
appearance (or identity)-motion conflicts, and visual artifacts such as phantom limbs. These issues largely
stem from inaccurate 2D pose or 3D SMPL mesh estimation and the lack of explicit motion modeling to capture
coherent temporal dependencies across frames. To address these challenges, we introduce CoMotion, a flow-driven
dual-path diffusion model designed for Consistent human Motion transfer. Specifically, it consists of three components:
(1) Dual-Path Motion Coordination integrates global motion priors from an auxiliary temporal branch into the main path:
the main path captures fine-grained local motion via interleaved video-flow embeddings, while the auxiliary path
encodes long-range temporal dependencies through external temporal blocks, ensuring globally coherent motion (sketched below).
(2) A Structure-Aware Flow mechanism embeds 3D structural priors into 2D optical flow, guided by surface-normal
and Euler continuity constraints, enabling motion synthesis that is geometrically consistent and perceptually stable
with respect to the underlying 3D geometry (sketched below). (3) A dual single-layer ViT module mitigates motion-appearance discrepancies.
Extensive experiments demonstrate that CoMotion significantly improves both local body-motion continuity and
global motion coherence, as well as overall generation quality, achieving competitive performance on benchmark datasets.
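To make the dual-path idea concrete, here is a minimal, self-contained PyTorch sketch of how such coordination could be wired: the main path attends within each frame over interleaved video and flow tokens, while an auxiliary branch attends across time and its global motion prior is fused back into the main path. The token layout, fusion layer, and attention configuration are illustrative assumptions, not the paper's implementation.

import torch
import torch.nn as nn

class DualPathCoordination(nn.Module):
    """Sketch: a main path over interleaved video/flow tokens, plus an
    auxiliary temporal branch whose global motion prior is fused back
    into the main path. Shapes and the fusion scheme are assumptions."""

    def __init__(self, dim=256, heads=8):
        super().__init__()
        # Main path: attention within each frame over video+flow tokens.
        self.spatial_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        # Auxiliary path: attention across frames (long-range motion).
        self.temporal_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.fuse = nn.Linear(2 * dim, dim)  # hypothetical fusion layer

    def forward(self, video_tokens, flow_tokens):
        # video_tokens, flow_tokens: (B, T, N, D) per-frame embeddings
        B, T, N, D = video_tokens.shape
        # Interleave video and flow embeddings along the token axis.
        x = torch.stack([video_tokens, flow_tokens], dim=3).reshape(B, T, 2 * N, D)

        # Main path: attend within each frame (fine-grained local motion).
        local = x.reshape(B * T, 2 * N, D)
        local, _ = self.spatial_attn(local, local, local)
        local = local.reshape(B, T, 2 * N, D)

        # Auxiliary path: attend across time per token (global motion prior).
        glob = x.permute(0, 2, 1, 3).reshape(B * 2 * N, T, D)
        glob, _ = self.temporal_attn(glob, glob, glob)
        glob = glob.reshape(B, 2 * N, T, D).permute(0, 2, 1, 3)

        # Coordinate the two paths: inject the global prior into the main path.
        return self.fuse(torch.cat([local, glob], dim=-1))

# Tiny smoke test with made-up shapes.
if __name__ == "__main__":
    v = torch.randn(1, 4, 16, 256)   # 4 frames, 16 tokens per frame
    f = torch.randn(1, 4, 16, 256)   # matching flow embeddings
    out = DualPathCoordination()(v, f)
    print(out.shape)  # torch.Size([1, 4, 32, 256])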
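Likewise, a hedged sketch of what structure-aware flow constraints might look like when expressed as training losses: a normal-guidance term that penalizes flow gradients between adjacent pixels whose rendered surface normals (e.g., from an SMPL mesh) agree, and an Eulerian temporal-continuity term on consecutive flow fields. The exact formulation is an assumption for illustration, not the paper's.

import torch
import torch.nn.functional as F

def structure_aware_flow_loss(flow, normals, w_normal=1.0, w_cont=1.0):
    """Assumed formulation of structure-aware flow constraints.

    flow:    (B, T, 2, H, W) 2D optical flow per frame pair
    normals: (B, T, 3, H, W) surface normals rendered from a 3D prior
    """
    # Normal-guidance term: flow should vary smoothly where the surface
    # is smooth, so penalize flow gradients weighted by the normal
    # similarity of horizontally/vertically adjacent pixels.
    n = F.normalize(normals, dim=2)
    sim_x = (n[..., :, 1:] * n[..., :, :-1]).sum(2, keepdim=True).clamp(min=0)
    sim_y = (n[..., 1:, :] * n[..., :-1, :]).sum(2, keepdim=True).clamp(min=0)
    grad_x = (flow[..., :, 1:] - flow[..., :, :-1]).abs()
    grad_y = (flow[..., 1:, :] - flow[..., :-1, :]).abs()
    normal_term = (sim_x * grad_x).mean() + (sim_y * grad_y).mean()

    # Continuity term: consecutive flow fields should change smoothly in
    # time (an Eulerian smoothness assumption), discouraging motion
    # discontinuities across frames.
    cont_term = (flow[:, 1:] - flow[:, :-1]).abs().mean()

    return w_normal * normal_term + w_cont * cont_term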
Citation
@article{wang2025comotion,
  title={CoMotion: Flow-Driven Dual-Path Diffusion Models for Consistent Human Motion Transfer},
  author={Wang, Xiangyang and Cai, Yuqing and Wang, Rui and Cheng, Erkang},
  year={2025}
}