# Go-with-the-Flow: Motion-Controllable Video Diffusion Models Using Real-Time Warped Noise

[![Project Page](https://img.shields.io/badge/Project-Page-green?logo=googlechrome&logoColor=green)](https://eyeline-research.github.io/Go-with-the-Flow/) [![Paper](https://img.shields.io/badge/Paper-arXiv-b31b1b?logo=arxiv&logoColor=red)](https://arxiv.org/abs/2501.08331) [![YouTube Tutorial](https://img.shields.io/badge/YouTube-Tutorial-red?logo=youtube&logoColor=red)](https://www.youtube.com/watch?v=IO3pbQpT5F8) [![Hugging Face](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Go--with--the--Flow-blue)](https://huggingface.co/Eyeline-Research/Go-with-the-Flow/tree/main)

[Ryan Burgert](https://ryanndagreat.github.io)<sup>1,3</sup>, [Yuancheng Xu](https://yuancheng-xu.github.io)<sup>1,4</sup>, [Wenqi Xian](https://www.cs.cornell.edu/~wenqixian/)<sup>1</sup>, [Oliver Pilarski](https://www.linkedin.com/in/oliverpilarski/)<sup>1</sup>, [Pascal Clausen](https://www.linkedin.com/in/pascal-clausen-a179566a/?originalSubdomain=ch)<sup>1</sup>, [Mingming He](https://mingminghe.com/)<sup>1</sup>, [Li Ma](https://limacv.github.io/homepage/)<sup>1</sup>, [Yitong Deng](https://yitongdeng.github.io)<sup>2,5</sup>, [Lingxiao Li](https://scholar.google.com/citations?user=rxQDLWcAAAAJ&hl=en)<sup>2</sup>, [Mohsen Mousavi](https://www.linkedin.com/in/mohsen-mousavi-0516a03)<sup>1</sup>, [Michael Ryoo](http://michaelryoo.com)<sup>3</sup>, [Paul Debevec](https://www.pauldebevec.com)<sup>1</sup>, [Ning Yu](https://ningyu1991.github.io)<sup>1†</sup>

<sup>1</sup>Netflix Eyeline Studios, <sup>2</sup>Netflix, <sup>3</sup>Stony Brook University, <sup>4</sup>University of Maryland, <sup>5</sup>Stanford University
<sup>†</sup>Project Lead

### Table of Contents

- [Abstract](#abstract)
- [Quick Start: Cut-and-drag Motion Control](#quick-start-cut-and-drag-motion-control)
  - [Animation Template GUI (Local)](#1-animation-template-gui-local)
  - [Running Video Diffusion (GPU)](#2-running-video-diffusion-gpu)
- [TODO](#todo)
- [Citation](#citation)

## :book: Abstract

Go-with-the-Flow is an easy and efficient way to control the motion patterns of video diffusion models. It lets a user decide how the camera and objects in a scene will move, and can even transfer motion patterns from one video to another.

We simply fine-tune a base model, with no changes to the original pipeline or architecture, except for one thing: instead of pure i.i.d. Gaussian noise, we use **warped noise**. Inference has exactly the same computational cost as running the base model.

If you create something cool with our model and want to share it on our website, email rburgert@cs.stonybrook.edu. We will be creating a user-generated content section, starting with whoever submits the first video!

If you like this project, please give it a ★!

## :rocket: Quick Start: Cut-and-drag Motion Control

Cut-and-drag motion control lets you take an image and create a video by cutting out different parts of that image and dragging them around.

Cut-and-drag motion control has two parts: a GUI to create a crude animation (no GPU needed), then a diffusion script to turn that crude animation into a pretty one (requires a GPU).

**YouTube Tutorial**: [![YouTube Tutorial](https://img.shields.io/badge/YouTube-Tutorial-red?logo=youtube&logoColor=red)](https://www.youtube.com/watch?v=IO3pbQpT5F8)
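To make the idea of a "crude animation" concrete, here is a minimal NumPy sketch of cutting a masked region out of an image and dragging it across frames. It is only an illustration under stated assumptions, not the code behind `cut_and_drag_gui.py`: the function name `cut_and_drag_frames`, the black fill left behind the cut-out, and the wrap-around behavior of `np.roll` at the borders are all hypothetical choices made for brevity.

```python
import numpy as np

def cut_and_drag_frames(image, mask, offsets):
    """Build a crude animation by translating a cut-out region.

    image:   (H, W, C) array holding the source image
    mask:    (H, W) boolean array, True where the cut-out object is
    offsets: sequence of integer (dy, dx) shifts, one per output frame
    Returns an array of shape (num_frames, H, W, C).
    """
    # Assumption: the hole left behind by the cut-out is filled with black.
    background = image.copy()
    background[mask] = 0

    frames = []
    for dy, dx in offsets:
        # np.roll wraps around the image border, which is fine for a sketch
        # but is probably not what a real tool would do.
        moved_mask = np.roll(mask, shift=(dy, dx), axis=(0, 1))
        moved_pixels = np.roll(image, shift=(dy, dx), axis=(0, 1))
        # Paste the shifted object over the background.
        frames.append(np.where(moved_mask[..., None], moved_pixels, background))
    return np.stack(frames)

# Usage: drag a white 2x2 square one pixel to the right per frame.
image = np.zeros((8, 8, 3), dtype=np.uint8)
image[2:4, 2:4] = 255
mask = np.zeros((8, 8), dtype=bool)
mask[2:4, 2:4] = True
offsets = [(0, t) for t in range(4)]
frames = cut_and_drag_frames(image, mask, offsets)
print(frames.shape)  # (4, 8, 8, 3)
```

The resulting frame stack is the kind of rough motion template that the diffusion stage is meant to turn into a polished video; writing it out as an MP4 is left to whichever video library you prefer.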

### 1. Animation Template GUI (Local)

1. Clone this repo, then `cd` into it.
2. Install local requirements: `pip install -r requirements_local.txt`
3. Run the GUI: `python cut_and_drag_gui.py`
4. Follow the instructions shown in the GUI.

After completion, an MP4 file will be generated. You'll need to move this file to a computer with a decent GPU to continue.

### 2. Running Video Diffusion (GPU)

1. Clone this repo on the machine with the GPU, then `cd` into it.
2. Install requirements: `pip install -r requirements.txt`
3. Warp the noise (replace `` accordingly): `python make_warped_noise.py --output_folder noise_warp_output_folder`
4. Run inference:

   ```
   python cut_and_drag_inference.py noise_warp_output_folder \
       --prompt "A duck splashing" \
       --output_mp4_path "output.mp4" \
       --device "cuda" \
       --num_inference_steps 5
   ```

Adjust folder paths, prompts, and other hyperparameters as needed. The output will be saved as `output.mp4`.

## :clipboard: TODO

- [x] Upload All CogVideoX Models
- [x] Upload Cut-And-Drag Inference Code
- [x] Release to arXiv
- [ ] Google Colab for people without GPUs
- [ ] Depth-Warping Inference Code
- [ ] T2V Motion Transfer Code
- [ ] ComfyUI Node
- [ ] Release 3D-to-Video Inference Code + Blender File
- [ ] Upload AnimateDiff Model
- [ ] Replicate Instance
- [ ] Fine-Tuning Code

## :black_nib: Citation

If you use this in your research, please consider citing:

```
@misc{burgert2025gowiththeflowmotioncontrollablevideodiffusion,
  title={Go-with-the-Flow: Motion-Controllable Video Diffusion Models Using Real-Time Warped Noise},
  author={Ryan Burgert and Yuancheng Xu and Wenqi Xian and Oliver Pilarski and Pascal Clausen and Mingming He and Li Ma and Yitong Deng and Lingxiao Li and Mohsen Mousavi and Michael Ryoo and Paul Debevec and Ning Yu},
  year={2025},
  eprint={2501.08331},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2501.08331},
}
```