HuMo is an open‑source video generation project from teams at Tsinghua University and ByteDance that focuses on generating videos of human subjects.

Highlights:

- Accepts multiple inputs: text, images, and audio
- Produces high‑fidelity, controllable human videos
- Adheres closely to textual instructions
- Maintains subject identity across frames and produces accurate audio‑driven motion
- Flexible pipelines: text+image, text+audio, or text+image+audio
- Generates video at up to 720p resolution
- Suited to content creation, virtual avatars, education, and artistic work
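
The three input combinations listed above can be pictured as a simple dispatch on which optional inputs are supplied. The sketch below is illustrative only: `Mode`, `select_mode`, and the parameter names are hypothetical and are not taken from HuMo's code or configuration. It also assumes that a text prompt is always required and that a text-only request is rejected, since that combination is not among the listed pipelines.

```python
from enum import Enum
from typing import Optional


class Mode(Enum):
    """Hypothetical labels for the three input combinations listed above."""
    TEXT_IMAGE = "text+image"
    TEXT_AUDIO = "text+audio"
    TEXT_IMAGE_AUDIO = "text+image+audio"


def select_mode(prompt: str,
                image_path: Optional[str] = None,
                audio_path: Optional[str] = None) -> Mode:
    """Pick a pipeline label from the inputs provided (illustrative helper)."""
    if not prompt:
        # Assumption: every listed pipeline includes text.
        raise ValueError("a text prompt is required")
    if image_path and audio_path:
        return Mode.TEXT_IMAGE_AUDIO
    if image_path:
        return Mode.TEXT_IMAGE
    if audio_path:
        return Mode.TEXT_AUDIO
    # Assumption: text-only is not one of the listed combinations.
    raise ValueError("provide a reference image, an audio clip, or both")


if __name__ == "__main__":
    # File names here are placeholders, not files shipped with the project.
    print(select_mode("a woman waves at the camera", image_path="ref.png").value)
    print(select_mode("a man sings", image_path="ref.png", audio_path="song.wav").value)
```

A real integration would map the selected label onto whatever options the project's own inference scripts expose; consult the repository for the actual interface.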